KNN / Naive Bayes
A centroid from KNN walks into a bar and asks: what is everyone having?
KNN
Notes: Recap on Linear Regression / KNN: Euclidean distance / Practice example on Titanic / Jake VanderPlas book (link)
Wolfram Demonstrations (link)
NetLogo demo (link) + distance (link)
PowerPoint walkthrough (link)
Sample example in AzureML Studio
Action Points: Agenda/ppt/.ipynb file to medium post/Pros and Cons/Network Intrusion Dataset
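from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)  # k = 3 neighbours vote on each prediction
knn.fit(X_train, Y_train)
Y_pred = knn.predict(X_test)
# Note: this scores the model on the training data, so it is training accuracy, not test accuracy
acc_knn = round(knn.score(X_train, Y_train) * 100, 2)
acc_knn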
[Gap]
Import the libraries.
Get the data.
Add headers to the dataframe
Handle missing data
Preprocess the data on a copy of the original dataframe.
Perform one-hot encoding
Initialise the encoded categorical columns
Split the data into train and test
Implement Gaussian Naive Bayes
Calculate Accuracy
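The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the original notebook: the inline rows, the column names (age, workclass, hours_per_week, label) and the "?" missing-value marker are all invented for illustration, and filling gaps with the most frequent value is just one possible way to handle missing data.

```python
import io

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical headerless data standing in for the real dataset; "?" marks a missing value.
RAW_CSV = """\
39,Private,40,no
50,Self-emp,13,no
38,Private,40,no
28,?,35,no
23,Private,30,no
31,Gov,38,no
52,Self-emp,45,yes
45,Private,50,yes
43,Gov,45,yes
56,Private,60,yes
48,Self-emp,55,yes
41,?,50,yes
"""

# Get the data and add headers to the dataframe (column names are assumptions).
df = pd.read_csv(io.StringIO(RAW_CSV), header=None, na_values="?")
df.columns = ["age", "workclass", "hours_per_week", "label"]

# Handle missing data: fill the categorical gaps with the most frequent value.
df["workclass"] = df["workclass"].fillna(df["workclass"].mode()[0])

# Preprocess on a copy so the original dataframe stays untouched.
data = df.copy()

# One-hot encode the categorical column into numeric indicator columns.
encoded = pd.get_dummies(data, columns=["workclass"], dtype=float)

# Split into features/target, then into train and test sets.
X = encoded.drop(columns=["label"])
y = encoded["label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit Gaussian Naive Bayes and measure accuracy on the held-out rows.
model = GaussianNB()
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

With only a dozen invented rows the accuracy figure itself means little; the point is the order of the steps.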
What does k really mean?
The k-nearest neighbours algorithm uses a very simple approach to perform classification. When tested with a new example, it looks through the training data and finds the k training examples that are closest to the new example. It then assigns the most common class label (among those k training examples) to the test example.
k is therefore just the number of neighbors “voting” on the test example’s class.
If k=1, then test examples are given the same label as the closest example in the training set. If k=3, the labels of the three closest training examples are checked and the most common label (in a two-class problem, the one occurring at least twice) is assigned, and so on for larger values of k.
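The voting idea can be sketched in a few lines of plain Python. This is an illustrative from-scratch version, not scikit-learn's implementation; the function name knn_predict and the toy points are made up for the example, and Euclidean distance is assumed.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Pair every training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Let the k nearest neighbours vote and keep the most common label.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data: three "a" points, plus one "b" point that sits inside the "a" cluster.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.2, 2.1),
          (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
labels = ["a", "a", "a", "b", "b", "b", "b"]

print(knn_predict(points, labels, (2.0, 2.0), k=1))  # prints "b": the single nearest point
print(knn_predict(points, labels, (2.0, 2.0), k=3))  # prints "a": two of three neighbours
```

The same query gets a different label for k=1 and k=3, which shows how a larger k smooths over a single odd neighbour.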
When you build a k-nearest neighbor classifier, you choose the value of k. You might have a specific value of k in mind, or you could divide up your data and use something like cross-validation to test several values of k in order to determine which works best for your data.
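One way to try several values of k is a simple cross-validation loop. The sketch below is a hand-rolled illustration in plain Python, assuming Euclidean distance; the helper names (knn_predict, cross_val_accuracy), the toy points and the candidate values 1, 3 and 5 are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points nearest to query."""
    nearest = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(points, labels, k, n_folds=5):
    """Average accuracy of k-NN when each fold is held out once."""
    correct = 0
    for fold in range(n_folds):
        # Every n_folds-th example (starting at `fold`) is held out for testing.
        test_idx = set(range(fold, len(points), n_folds))
        train_pts = [p for i, p in enumerate(points) if i not in test_idx]
        train_lbl = [l for i, l in enumerate(labels) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_pts, train_lbl, points[i], k) == labels[i]
    return correct / len(points)


# Toy data: two clusters, with the last "b" point lying close to the "a" cluster.
points = [(1.0, 1.2), (1.5, 1.8), (2.0, 1.0), (2.5, 2.2), (1.2, 2.5),
          (5.0, 5.5), (5.5, 4.8), (6.0, 6.1), (4.8, 6.0), (2.8, 2.6)]
labels = ["a"] * 5 + ["b"] * 5

# Score each candidate k and keep the one with the best held-out accuracy.
scores = {k: cross_val_accuracy(points, labels, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```

In practice scikit-learn's cross_val_score or GridSearchCV would do this work; the loop is spelled out here only to show what is being measured.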
Network Intrusion Sample
Cancer Analysis
Naive Bayes
In probability, we meet two types of people: frequentists and Bayesians.
Let’s take a small example of rolling two dice 1000 times (independent events).
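The dice example can be simulated to show the frequentist reading of probability as a long-run frequency. This is a small sketch: the function name roll_two_dice, the fixed seed, and the choice to look at a sum of 7 are assumptions made for illustration.

```python
import random
from collections import Counter


def roll_two_dice(n_rolls=1000, seed=0):
    """Roll two fair dice n_rolls times and count how often each sum appears."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))


counts = roll_two_dice()

# Frequentist estimate: the share of rolls in which the sum was 7.
observed = counts[7] / 1000
# Theoretical value: 6 of the 36 equally likely outcomes add up to 7.
expected = 6 / 36

print(f"observed P(sum = 7) = {observed:.3f}, expected = {expected:.3f}")
```

As the number of rolls grows, the observed frequency should settle closer to 1/6.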
Example on Monty Hall Problem
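A quick simulation is one way to check the Monty Hall result, where switching wins about two thirds of the time. The sketch below uses the standard three-door setup; the function names play and win_rate, the seed and the number of games are assumptions for the example.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant ends up with the car."""
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # The host opens a goat door: never the contestant's pick, never the car.
    # (When two doors qualify, the lowest-numbered one is used; this does not
    # change the win rates.)
    opened = next(d for d in range(3) if d != pick and d != car)
    if switch:
        # Move to the one remaining closed door.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car


def win_rate(switch, n_games=10_000, seed=0):
    """Fraction of games won when always switching (or always staying)."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(n_games)) / n_games


print(f"stay:   {win_rate(switch=False):.3f}")
print(f"switch: {win_rate(switch=True):.3f}")
```

The stay strategy should land near 1/3 and the switch strategy near 2/3, matching the Bayesian calculation.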
Stat Trek (Calculation) + Sample Example (link)
Strong Example (link)
Dataset(link) + Excel working (link)
Titanic Kaggle link
Spam or Ham