Python Outlier Detection Algorithm — KNN
K-nearest neighbors (KNN) is one of the most popular algorithms in machine learning, and it is used in both supervised and unsupervised settings. In either setting, the core operation is the same: compute the distances from an observation to its k nearest neighbors, and use those distances to define outliers. In PyOD, the KNN algorithm is used for unsupervised outlier detection. This article discusses how KNN is applied in supervised and unsupervised learning and how its outlier scores are defined. For more anomaly detection techniques, please refer to the recommended article at the end of this article.
KNN as unsupervised learning
The unsupervised KNN method computes the distance between each observation and every other observation, typically using Euclidean distance, and requires no parameter tuning to improve performance. The procedure has three steps:
- Step 1: Calculate the distance between each data point and every other data point.
- Step 2: Sort the data points from smallest to largest distance.
- Step 3: Select the top K entries.
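The three steps above can be sketched in plain NumPy; the function name `knn_distances` and the toy data are illustrative, not part of PyOD:

```python
import numpy as np

def knn_distances(X, k):
    """Return each point's distances to its k nearest neighbors."""
    # Step 1: pairwise Euclidean distances between all data points
    diff = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Step 2: sort each row's distances from smallest to largest
    sorted_d = np.sort(dists, axis=1)
    # Step 3: keep the top K entries (skip column 0, the point itself)
    return sorted_d[:, 1:k + 1]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
nearest = knn_distances(X, k=2)
```

The last point's neighbor distances are much larger than everyone else's, which is exactly the signal an outlier score is built from.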
There are multiple options for calculating the distance between two data points; the most commonly used is Euclidean distance.
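For concreteness, Euclidean distance is the square root of the sum of squared coordinate differences; a small self-contained helper (the name `euclidean` is ours, for illustration):

```python
import math

def euclidean(p, q):
    """Euclidean distance: sqrt of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d = euclidean((0, 0), (3, 4))  # classic 3-4-5 right triangle
```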