K-Nearest Neighbor

Characteristics

Lazy learning (no explicit training phase; computation is deferred until a test point is given)
Non-parametric algorithm (no assumption about the underlying data distribution)
Heavy computation at prediction time, so poorly suited to real-time use
A low K-value is sensitive to outliers; a higher K-value is more resilient because more voters decide the prediction
A higher K-value takes more computation time

Pros

Easy to understand
No assumptions about data
Classification and regression (average of the output value for the top K nearest neighbors)
Works easily on multi-class problems

Cons

Memory/Computationally expensive
Sensitive to scale of data (do standardization before KNN)
Struggles when the number of attributes is high
Does not work well with categorical features, since it is difficult to define a distance along categorical dimensions (use one-hot encoding with Hamming distance)

Improve

Principal Component Analysis (PCA) to reduce dimensionality
KD-tree
Parallelization for distance computations
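
As an illustration of the KD-tree idea, here is a minimal pure-Python sketch. The names `build_kdtree` and `nearest` are hypothetical, the tree is stored as nested dictionaries for brevity, and the search returns only the single nearest point; a K > 1 search would additionally need a bounded list of best candidates. It assumes Python 3.8+ for `math.dist`.

```python
import math

def build_kdtree(points, depth=0):
    """Recursively split points on alternating axes; returns nested dicts or None."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2  # the median point becomes this node
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, query, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    # Descend first into the side of the splitting plane that contains the query.
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the other side only if the plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best

points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
print(nearest(tree, (9, 2)))
```

The pruning test is what saves work: whole subtrees are skipped when they cannot contain a closer point, whereas the brute-force approach always measures the distance to every training sample.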

Applications

Hand-written character recognition
Fast content-based image retrieval
Intrusion detection
Fault detection for semiconductor manufacturing processes

Steps

Prepare training and test dataset
Select K
Determine distance function
Compute distances to n training samples
Sort the distances and find K-nearest data samples
Assign to class based on majority vote
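
The steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a tuned implementation: the function names `euclidean` and `knn_predict` and the toy data are assumptions made for the example, and Euclidean distance is just one possible choice of distance function.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Compute the distance from the query to every training sample.
    distances = [(distance(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by distance and keep the k closest samples.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated clusters.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (2.0, 2.0), k=3))  # a
```

For regression, the vote in the last step would be replaced by the average of the neighbors' output values.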

Standardization

X_{new} = \frac{X - μ}{σ}
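
A small sketch of this formula applied to one feature column. The helper name `standardize` and the sample values are made up for the example, and the population standard deviation is assumed for σ.

```python
import statistics

def standardize(column):
    """Return (x - mean) / standard deviation for each value in the column."""
    mu = statistics.mean(column)
    sigma = statistics.pstdev(column)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant feature has no spread; map it to zeros to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
print(z)
```

After standardization each feature has mean 0 and standard deviation 1, so no single feature dominates the distance just because of its units. In practice μ and σ are computed on the training set and then reused to transform the test set.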

Distance Measures

Euclidean distance
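
Euclidean distance is the straight-line distance between two points: the square root of the sum of squared differences over all features. A minimal sketch, with `euclidean_distance` as a hypothetical helper name:

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared per-feature differences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```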

Manhattan distance

Cheaper to compute than Euclidean distance (no squares or square root)
May give worse results than Euclidean distance, depending on the data
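
Manhattan distance is the sum of absolute per-feature differences. A minimal sketch, with `manhattan_distance` as a hypothetical helper name:

```python
def manhattan_distance(a, b):
    """Sum of absolute per-feature differences (no squaring, no square root)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan_distance((0, 0), (3, 4)))  # 7
```

For the same pair of points the Euclidean distance is 5, so the two measures can rank neighbors differently.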

Cosine distance

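The smaller the distance, the more similar the two points

cos θ = \frac{a \cdot b}{∣ a ∣ ∣ b ∣} = \frac{\sum _{i = 1}^{n} ( x _{i}^{Train} \times x _{i}^{Test} )}{\sqrt{\sum _{i = 1}^{n} ( x _{i}^{Train} ) ^{2}} \sqrt{\sum _{i = 1}^{n} ( x _{i}^{Test} ) ^{2}}}

Cosine Distance = 1 - cos θ

Cosine distance vs cosine similarity: cosine similarity is cos θ itself (larger means more similar), while cosine distance is 1 - cos θ (smaller means more similar)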
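
A minimal sketch of both quantities, taking cosine similarity as the dot product divided by the product of the two vector lengths, and cosine distance as one minus that value. The function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) between two vectors: dot product over the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cos(theta): 0 for the same direction, 1 for orthogonal, 2 for opposite."""
    return 1.0 - cosine_similarity(a, b)

print(cosine_distance((1, 0), (0, 1)))   # orthogonal vectors
print(cosine_distance((1, 2), (2, 4)))   # same direction, different length
```

Because only the angle matters, vectors that point the same way are treated as near neighbors regardless of their magnitude.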

Hamming Distance

The number of bit positions in which the bits are different
Used for binary data strings
Also applies to categorical data (count the positions where the categories differ)
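
A minimal sketch that covers both uses, counting mismatched positions in two equal-length sequences. The helper name `hamming_distance` and the sample values are made up for the example.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

# Binary strings: the third and fifth bits differ.
print(hamming_distance("10110", "10011"))  # 2

# Categorical features: only the second attribute differs.
print(hamming_distance(["red", "small", "round"], ["red", "large", "round"]))  # 1
```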

K Value

With two classes, choosing an odd K prevents ties in the vote. With more than two classes, ties remain possible even when K is odd
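
The tie condition can be checked directly on the neighbor labels. In this sketch the helper `vote` is hypothetical; it returns the leading label together with a flag that says whether the lead is shared. With two classes an odd K cannot tie, while the three-class example shows a tie at K = 5.

```python
from collections import Counter

def vote(labels):
    """Return (leading_label, is_tie) for the labels of the K nearest neighbors."""
    counts = Counter(labels).most_common()
    top_count = counts[0][1]
    leaders = [label for label, count in counts if count == top_count]
    return leaders[0], len(leaders) > 1

print(vote(["a", "b", "a", "b"]))       # K = 4, two classes
print(vote(["a", "b", "a"]))            # K = 3, two classes
print(vote(["a", "a", "b", "b", "c"]))  # K = 5, three classes
```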

D-fold Cross-Validation

Split the data into d groups
Select 1 group for testing and use the remaining groups for training
For each value of K, classify the test group and record the error
Repeat steps 2-3 with each group in turn as the test group, for every candidate value of K
For each value of K, find the average error across validation sets, choose K with the lowest error
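
The procedure can be sketched as follows. Everything here is illustrative: `knn_predict` and `cross_validate_k` are hypothetical helpers, the toy data is made up, and the error is simplified to the plain misclassification rate rather than a score-based error. It assumes Python 3.8+ for `math.dist`.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """`train` is a list of (features, label) pairs; returns the majority label."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_validate_k(samples, candidate_ks, d=5):
    """Return {K: average error rate across the d held-out groups}."""
    folds = [samples[i::d] for i in range(d)]  # d groups of roughly equal size
    avg_error = {}
    for k in candidate_ks:
        fold_errors = []
        for i, test_fold in enumerate(folds):
            # Hold out one group for testing and train on the remaining groups.
            train = [s for j, fold in enumerate(folds) if j != i for s in fold]
            wrong = sum(knn_predict(train, x, k) != y for x, y in test_fold)
            fold_errors.append(wrong / len(test_fold))
        avg_error[k] = sum(fold_errors) / d
    return avg_error

# Toy data (made up): two clusters that are far apart along the first feature.
samples = [((float(i), float(i % 3)), "low") for i in range(10)]
samples += [((float(i) + 20.0, float(i % 3)), "high") for i in range(10)]

errors = cross_validate_k(samples, candidate_ks=[1, 3, 5], d=5)
best_k = min(errors, key=errors.get)  # K with the lowest average error
print(errors, best_k)
```

Real data should be shuffled before it is split, so that each group contains a mix of classes; the interleaved slicing above only works because the toy data is evenly ordered.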

Error Measurement for Classification Problems

Precision

Precision = \frac{TP}{TP + FP}

Recall

Recall = \frac{TP}{TP + FN}

F1-Score

F1-Score = \frac{2 \times Precision \times Recall}{Precision + Recall}

Error = 1 - F1-Score
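
A minimal sketch that computes the three scores and the resulting error for a binary problem. The helper `precision_recall_f1`, the `positive` parameter, and the sample labels are assumptions made for the example; returning 0.0 when a denominator is zero is one common convention, not the only one.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall and F1 for the class labeled `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(actual, predicted)
error = 1 - f1
print(precision, recall, f1, error)
```

Here TP = 3, FP = 1 and FN = 1, so precision and recall are both 3/4.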

Error Measurement for Regression Problems

Mean Absolute Error (MAE)

MAE = \frac{\sum _{i = 1}^{m} ∣ a _{i} - p _{i} ∣}{m}

Mean Square Error (MSE)

MSE = \frac{\sum _{i = 1}^{m} ( a _{i} - p _{i} ) ^{2}}{m}

Mean Absolute Percentage Error (MAPE)

MAPE = \frac{\sum _{i = 1}^{m} ∣ \frac{a _{i} - p _{i}}{a _{i}} ∣}{m}
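
A minimal sketch of the three regression errors, where `actual` holds the values a_i and `predicted` holds the values p_i. The function names and sample numbers are made up for the example. MAPE takes the absolute value of each relative error and is undefined when an actual value is zero.

```python
def mae(actual, predicted):
    """Mean of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mse(actual, predicted):
    """Mean of the squared differences; penalizes large errors more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean of the absolute relative errors, as a fraction (multiply by 100 for percent)."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(mae(actual, predicted))   # (10 + 20 + 0) / 3
print(mse(actual, predicted))   # (100 + 400 + 0) / 3
print(mape(actual, predicted))  # (0.1 + 0.1 + 0) / 3
```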
