KNN / Naive Bayes

Pradeep Ankem
2 min read · Aug 1, 2019


A centroid from KNN walks into a bar and asks: what is everyone having?

KNN

Notes: Recap on Linear Regression/KNN: Euclidean distance / Practice example on Titanic / Jake VanderPlas book (link)

Wolfram Demonstrations (link)

NetLogo demo (link) + distance (link)

PowerPoint walkthrough (link)

Sample example in AzureML Studio

Action Points: Agenda / ppt / .ipynb file to Medium post / Pros and Cons / Network Intrusion Dataset

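from sklearn.neighbors import KNeighborsClassifier

# X_train, Y_train and X_test are assumed to be prepared already
knn = KNeighborsClassifier(n_neighbors=3)  # 3 neighbours vote on each prediction
knn.fit(X_train, Y_train)
Y_pred = knn.predict(X_test)
# Accuracy in percent, measured on the training data (not a held-out estimate)
acc_knn = round(knn.score(X_train, Y_train) * 100, 2)
acc_knn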

[Gap]

Import the libraries.

Get the data.

Add headers to the dataframe

Handle missing data

Preprocess the data on a duplicate copy of the original data frame.

Perform one-hot encoding

Initialise the encoded categorical columns

Split the data into train and test

Implement Gaussian Naive Bayes

Calculate Accuracy
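
The steps above can be sketched end to end roughly as follows. This is a minimal sketch and not the original notebook: the data is a small made-up table generated in the code, and the column names (`duration`, `protocol`, `label`) are hypothetical stand-ins for whatever the real dataset contains.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Get the data: a made-up table (hypothetical values), loaded without headers.
rng = np.random.default_rng(0)
n = 200
duration = rng.normal(40, 10, n)
protocol = rng.choice(["tcp", "udp", "icmp"], n)
label = (duration > 40).astype(int)   # label depends on the numeric column
duration[::25] = np.nan               # knock out a few values to imitate missing data
raw = pd.DataFrame({0: duration, 1: protocol, 2: label})

# Add headers to the dataframe (hypothetical names).
raw.columns = ["duration", "protocol", "label"]

# Handle missing data: fill numeric gaps with the column median.
raw["duration"] = raw["duration"].fillna(raw["duration"].median())

# Preprocess on a copy so the original frame stays untouched.
data = raw.copy()

# One-hot encode the categorical column.
encoded = pd.get_dummies(data, columns=["protocol"], dtype=float)

# Split the data into train and test.
X = encoded.drop(columns="label")
y = encoded["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Implement Gaussian Naive Bayes.
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate accuracy on the held-out test set.
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.2f}")
```

Gaussian Naive Bayes treats every column as a continuous, normally distributed feature, so feeding it one-hot columns is a simplification that is common in quick walkthroughs rather than a modelling recommendation.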

What does k really mean?

The k-nearest neighbours algorithm uses a very simple approach to perform classification. When tested with a new example, it looks through the training data and finds the k training examples that are closest to the new example. It then assigns the most common class label (among those k training examples) to the test example.

k is therefore just the number of neighbors “voting” on the test example’s class.

If k=1, each test example is given the same label as the closest example in the training set. If k=3, the labels of the three closest training examples are checked and the most common label is assigned (in a two-class problem, the one occurring at least twice), and so on for larger values of k.
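
To make the voting concrete, here is a small from-scratch sketch using Euclidean distance. The function name `knn_predict` and the toy points are made up for illustration; in practice you would normally reach for scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training example
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Closest examples first
    distances.sort(key=lambda pair: pair[0])
    # Labels of the k nearest neighbours, then a majority vote
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (hypothetical): one "b" point sits closest to the query,
# followed by two "a" points, then two distant "b" points.
train_points = [(1.0, 0.0), (2.0, 0.0), (2.5, 0.0), (6.0, 0.0), (7.0, 0.0)]
train_labels = ["b", "a", "a", "b", "b"]
query = (0.0, 0.0)

for k in (1, 3, 5):
    print(k, knn_predict(train_points, train_labels, query, k))
```

With this toy data the predicted label changes as k changes, which is exactly why the choice of k matters.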

When you build a k-nearest neighbor classifier, you choose the value of k. You might have a specific value of k in mind, or you could divide up your data and use something like cross-validation to test several values of k in order to determine which works best for your data.
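
One way to run that comparison is sketched below with 5-fold cross-validation. The Iris dataset and the list of candidate values are assumptions chosen only to keep the example self-contained; swap in your own data and range of k.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (assumption): any feature matrix X and label vector y will do.
X, y = load_iris(return_X_y=True)

candidate_ks = [1, 3, 5, 7, 9, 11]   # odd values reduce the chance of tied votes
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: train on 4 folds, score on the 5th, rotate
    scores = cross_val_score(model, X, y, cv=5)
    mean_scores[k] = scores.mean()

# Pick the k with the highest mean cross-validated accuracy
best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"k={k}: mean accuracy {score:.3f}")
print("best k:", best_k)
```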

Network Intrusion Sample

Cancer Analysis

Naive Bayes

In probability, we meet two types of people: Frequentists and Bayesians.

Let’s take a small example of rolling two dice 1,000 times (independent events).
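
A quick simulation shows the frequentist reading of this example: probabilities are estimated as long-run relative frequencies. This is only a sketch; the seed and the particular events checked (a sum of 7, a double six) are arbitrary choices for illustration.

```python
import random

random.seed(42)   # fixed seed so the run is repeatable
n_rolls = 1000

# Each roll is a pair (first die, second die); the two dice are independent.
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(n_rolls)]

# Relative frequencies observed in the simulation
freq_sum_7 = sum(a + b == 7 for a, b in rolls) / n_rolls
freq_first_6 = sum(a == 6 for a, _ in rolls) / n_rolls
freq_second_6 = sum(b == 6 for _, b in rolls) / n_rolls
freq_double_6 = sum(a == 6 and b == 6 for a, b in rolls) / n_rolls

print(f"P(sum = 7): observed {freq_sum_7:.3f}, exact {6 / 36:.3f}")
print(f"P(double six): observed {freq_double_6:.3f}, exact {1 / 36:.3f}")
# For independent events the joint probability is the product of the marginals
print(f"P(first = 6) * P(second = 6): {freq_first_6 * freq_second_6:.3f}")
```

The observed frequencies will not match the exact values perfectly, but they get closer as the number of rolls grows.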

Example on Monty Hall Problem

Stat Trek (Calculation) + Sample Example (link)

Strong Example (link)

Dataset(link) + Excel working (link)

Titanic Kaggle link

Spam or Ham
