Naive Bayes estimates $f_k(x)$ under the assumption that, within the $k$th class, the $p$ predictors are independent. Mathematically, for $k = 1, \dots, K$:

$$f_k(x) = f_{k1}(x_1) \times f_{k2}(x_2) \times \cdots \times f_{kp}(x_p),$$

where $f_{kj}$ is the density function of the $j$th predictor among observations in the $k$th class.
Naive Bayes is a good estimator when $n$ is not large enough relative to $p$ for us to effectively estimate the joint distribution of the predictors within each class.

The posterior probability under naive Bayes, for $k = 1, \dots, K$, is:

$$\Pr(Y = k \mid X = x) = \frac{\pi_k \times f_{k1}(x_1) \times f_{k2}(x_2) \times \cdots \times f_{kp}(x_p)}{\sum_{l=1}^{K} \pi_l \times f_{l1}(x_1) \times f_{l2}(x_2) \times \cdots \times f_{lp}(x_p)}$$
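To make the formula concrete, here is a minimal sketch of the posterior computation, assuming each per-feature density $f_{kj}$ is Gaussian; the function name and the toy numbers below are illustrative, not from the text.

```python
import numpy as np
from scipy.stats import norm

def naive_bayes_posterior(x, priors, means, stds):
    """Posterior Pr(Y = k | X = x) under the naive Bayes factorization,
    assuming each per-feature density f_kj is Gaussian (hypothetical setup).

    x      : (p,)   a single observation
    priors : (K,)   class priors pi_k
    means  : (K, p) per-class, per-feature means mu_jk
    stds   : (K, p) per-class, per-feature standard deviations sigma_jk
    """
    # Numerator for each class k: pi_k * f_k1(x_1) * ... * f_kp(x_p)
    numerators = priors * np.prod(norm.pdf(x, loc=means, scale=stds), axis=1)
    # Denominator: the same quantity summed over all classes l
    return numerators / numerators.sum()

# Toy example: two classes, two predictors
posterior = naive_bayes_posterior(
    x=np.array([1.0, -0.5]),
    priors=np.array([0.6, 0.4]),
    means=np.array([[0.0, 0.0], [1.0, -1.0]]),
    stds=np.array([[1.0, 1.0], [1.0, 1.0]]),
)
print(posterior)  # probabilities over the two classes, summing to 1
```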
Estimate the one-dimensional density function $f_{kj}$ from the training data using one of the following methods (a sketch of all three appears after the list):
- If $X_j$ is quantitative, then we can assume that $X_j \mid Y = k \sim N(\mu_{jk}, \sigma^2_{jk})$.
- If $X_j$ is quantitative, we can instead use a non-parametric estimate for $f_{kj}$: think of a histogram or a kernel density estimator (a smoothed version of a histogram).
- If $X_j$ is qualitative, count the proportion of training observations for the $j$th predictor corresponding to each class.
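A minimal sketch of the three estimation options, using toy data for a single predictor within one class; the variable names (`xj_train`, `f_kj_gaussian`, and so on) are illustrative assumptions, not from the text.

```python
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical training values of the jth predictor among observations in class k
xj_train = rng.normal(loc=2.0, scale=1.5, size=200)           # quantitative
xj_train_qual = rng.choice(["low", "med", "high"], size=200)  # qualitative

# (1) Quantitative, parametric: assume X_j | Y = k ~ N(mu_jk, sigma_jk^2)
mu_jk, sigma_jk = xj_train.mean(), xj_train.std()
f_kj_gaussian = lambda x: norm.pdf(x, loc=mu_jk, scale=sigma_jk)

# (2) Quantitative, non-parametric: kernel density estimate (a smoothed histogram)
f_kj_kde = gaussian_kde(xj_train)

# (3) Qualitative: proportion of training observations in each category
categories, counts = np.unique(xj_train_qual, return_counts=True)
f_kj_qual = dict(zip(categories, counts / counts.sum()))

print(f_kj_gaussian(2.0), f_kj_kde(2.0)[0], f_kj_qual["low"])
```

Any of these per-predictor estimates can then be plugged into the posterior formula above in place of $f_{kj}$.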