Linear Regression vs K-Nearest Neighbors
One of the simplest and best-known non-parametric methods is K-nearest neighbors regression (KNN regression).
Given a value for K and a prediction point x0, KNN regression first identifies the K training observations that are closest to x0, represented by N0. It then estimates f(x0) using the average of all the training responses in N0, i.e. f^(x0) = (1/K) * sum of yi over all xi in N0.
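A minimal sketch of this estimator in plain NumPy. The synthetic data and the knn_regress helper are illustrative assumptions, not taken from the source:

```python
import numpy as np

def knn_regress(x0, X_train, y_train, K):
    """Predict f(x0) as the average response of the K nearest training points."""
    # Euclidean distance from x0 to every training observation
    dists = np.linalg.norm(X_train - x0, axis=1)
    # Indices of the K closest observations (the neighborhood N0)
    nearest = np.argsort(dists)[:K]
    # KNN estimate: mean of the training responses in N0
    return y_train[nearest].mean()

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=100)
print(knn_regress(np.array([0.5]), X, y, K=5))
```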
In general, the optimal value for K will depend on the bias-variance trade-off: a small K gives the most flexible fit, with low bias but high variance, while a large K gives a smoother, less variable fit whose bias grows as the averaging smooths over structure in f(X).
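That trade-off can be seen by comparing test MSE across K on held-out data, sketched here with scikit-learn's KNeighborsRegressor on synthetic data (the particular K values and noise level are arbitrary choices):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=1)

# Small K: low bias, high variance; large K: smoother fit, higher bias.
for K in (1, 5, 20, 50):
    knn = KNeighborsRegressor(n_neighbors=K).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, knn.predict(X_te))
    print(f"K={K:3d}  test MSE={mse:.3f}")
```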
The parametric approach will outperform the non-parametric approach if the parametric form that has been selected is close to the true form of f.
Note that as the extent of non-linearity increases, there is little change in the test set MSE for the non-parametric KNN method, but there is a large increase in the test set MSE of linear regression.
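One way to see this is a small simulation, sketched below with scikit-learn on made-up data: the true f is varied from linear to strongly non-linear, and the test MSE of least squares is compared with that of KNN (the specific functions, sample size, and K=9 are assumptions, not the source's experiment):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)

def test_mse(f, n=200):
    """Test MSE of least squares and KNN (K=9) on data from y = f(x) + noise."""
    X = rng.uniform(-2, 2, size=(2 * n, 1))
    y = f(X[:, 0]) + rng.normal(scale=0.3, size=2 * n)
    X_tr, y_tr, X_te, y_te = X[:n], y[:n], X[n:], y[n:]
    ols = LinearRegression().fit(X_tr, y_tr)
    knn = KNeighborsRegressor(n_neighbors=9).fit(X_tr, y_tr)
    return (mean_squared_error(y_te, ols.predict(X_te)),
            mean_squared_error(y_te, knn.predict(X_te)))

# As f becomes more non-linear, the linear model's test MSE grows sharply
# while KNN's changes comparatively little.
for name, f in [("linear", lambda x: 2 * x),
                ("mildly non-linear", lambda x: 2 * x + x ** 2),
                ("strongly non-linear", lambda x: np.sin(3 * x))]:
    ols_mse, knn_mse = test_mse(f)
    print(f"{name:20s}  OLS MSE={ols_mse:.3f}  KNN MSE={knn_mse:.3f}")
```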
Even when the true relationship is highly non-linear, KNN may still provide inferior results to linear regression; in particular, KNN often performs worse in higher dimensions. This decrease in performance as the dimension increases is a common problem for KNN, and results from the fact that in higher dimensions there is effectively a reduction in sample size: when the number of predictors p is large, the K observations nearest to a test point x0 may actually be very far from x0 in p-dimensional space, yielding a poor estimate of f(x0). This is the curse of dimensionality.
As a general rule, parametric methods will tend to outperform non-parametric approaches when there is a small number of observations per predictor.
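The dimension effect can be sketched the same way: keep one informative predictor, pad the feature matrix with pure-noise predictors, and watch KNN's test MSE deteriorate while linear regression's barely moves. Again, a toy setup under assumed data, not the source's experiment:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
n = 100

# One informative predictor (column 0) plus p-1 pure-noise predictors.
for p in (1, 2, 5, 20):
    X = rng.uniform(-2, 2, size=(2 * n, p))
    y = 2 * X[:, 0] + rng.normal(scale=0.5, size=2 * n)
    X_tr, y_tr, X_te, y_te = X[:n], y[:n], X[n:], y[n:]
    ols = mean_squared_error(
        y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))
    knn = mean_squared_error(
        y_te, KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr).predict(X_te))
    print(f"p={p:2d}  OLS MSE={ols:.3f}  KNN MSE={knn:.3f}")
```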
Even when the dimension is small, we might prefer linear regression to KNN from an interpretability standpoint. If the test MSE of KNN is only slightly lower than that of linear regression, we might be willing to forgo a little bit of prediction accuracy for the sake of a simple model that can be described in terms of just a few coefficients, and for which p-values are available.
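For instance, a fitted linear model exposes exactly those few coefficients and their p-values; a quick sketch with statsmodels on illustrative synthetic data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=100)

# Fit least squares with an intercept and inspect the summary quantities.
model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)    # fitted coefficients (intercept first)
print(model.pvalues)   # p-value for each coefficient
```

KNN offers no analogous compact summary: its "fit" is the entire training set, so there is nothing comparable to report or test.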