Imports
import numpy as np
import pandas as pd
from matplotlib.pyplot import subplots
import statsmodels.api as sm
from ISLP import load_data
from ISLP.models import (ModelSpec as MS,
summarize)
from ISLP import confusion_table
from ISLP.models import contrast
from sklearn.discriminant_analysis import \
(LinearDiscriminantAnalysis as LDA,
QuadraticDiscriminantAnalysis as QDA)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
Logistic Regression
- sm.GLM() fits generalized linear models - pass family=sm.families.Binomial() to fit a logistic regression instead of some other GLM (see the sketch below)
- sm.Logit() fits a logistic regression model directly
- results.pvalues returns the parameter p-values
- results.predict() predicts for new predictor values; called with no arguments it returns the fitted probabilities for the training data
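A minimal sketch of the sm.GLM() route, assuming the Smarket data from ISLP (the variable names below are just illustrative):
Smarket = load_data('Smarket')
allvars = Smarket.columns.drop(['Today', 'Direction', 'Year'])
X = MS(allvars).fit_transform(Smarket)   # design matrix with intercept
y = Smarket.Direction == 'Up'            # binary response
glm = sm.GLM(y, X, family=sm.families.Binomial())
results = glm.fit()
summarize(results)                       # coefficient table with p-values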
Convert predicted probabilities to class labels:
probs = results.predict()
labels = np.array(['baseline'] * len(probs))
labels[probs > 0.5] = 'non-baseline'
Use the labels in a confusion matrix:
confusion_table(labels, df.y)
Calculate the fraction of correct predictions by hand or with:
np.mean(labels == df.y)
Split training and test data using a boolean array:
train = (Smarket.Year < 2005)
X_train, X_test = X.loc[train], X.loc[~train]
y_train, y_test = y.loc[train], y.loc[~train]
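Continuing the Smarket sketch above (assumes X, y, and train as defined there), refit on the pre-2005 rows and evaluate on the held-out rows:
glm_train = sm.GLM(y_train, X_train, family=sm.families.Binomial())
results = glm_train.fit()
probs = results.predict(exog=X_test)               # probabilities for the held-out rows
labels = np.array(['Down'] * len(probs))
labels[probs > 0.5] = 'Up'
confusion_table(labels, Smarket.Direction.loc[~train])
np.mean(labels == Smarket.Direction.loc[~train])   # test-set accuracy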
Linear Discriminant Analysis
LDA() from the preamble:
lda = LDA(store_covariance=True)
- means_ attribute extracts the average of each predictor within each class - these are the estimates of the class means
- priors_ attribute shows the estimated prior probabilities
- classes_ attribute shows which entry corresponds to which label
- scalings_ attribute shows the linear discriminant vectors
Use a posterior probability threshold (counts the predictions whose posterior probability for the first class exceeds 90%):
lda_prob = lda.predict_proba(X_test)
np.sum(lda_prob[:,0] > 0.9)
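A short sketch of fitting and evaluating LDA, assuming X_train/X_test hold the predictors (without an intercept column) and y_train/y_test the class labels:
lda = LDA(store_covariance=True)
lda.fit(X_train, y_train)
lda.means_                      # per-class predictor means
lda.priors_                     # estimated prior probabilities
lda_pred = lda.predict(X_test)
confusion_table(lda_pred, y_test)
np.mean(lda_pred == y_test)     # test-set accuracy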
Quadratic Discriminant Analysis
QDA() from the preamble:
qda = QDA(store_covariance=True)
qda.covariance_[0]
estimates the covariance matrix for the first class (change the index for other classes)
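Similarly for QDA, a hedged sketch under the same X_train/y_train assumptions as above:
qda = QDA(store_covariance=True)
qda.fit(X_train, y_train)
qda.covariance_[0]              # estimated covariance matrix for the first class
qda_pred = qda.predict(X_test)
np.mean(qda_pred == y_test)     # test-set accuracy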
Naive Bayes
NB = GaussianNB()
- class_prior_ attribute shows the class prior probabilities
- feature parameters are in the theta_ and var_ attributes (see the sketch below)
- find attribute names with ?NB or NB?
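A minimal sketch of fitting Gaussian naive Bayes and inspecting its fitted parameters (same X_train/y_train assumptions as above):
NB = GaussianNB()
NB.fit(X_train, y_train)
NB.class_prior_                 # class prior probabilities
NB.theta_                       # per-class mean of each feature
NB.var_                         # per-class variance of each feature
nb_pred = NB.predict(X_test)
confusion_table(nb_pred, y_test)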
K-Nearest Neighbors
KNeighborsClassifier() from the preamble:
knn1 = KNeighborsClassifier(n_neighbors=1)
- need to scale the data before using KNN - use StandardScaler()
- train_test_split() splits data into training and test sets (see the sketch below)
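A sketch of the scale-split-fit workflow; feature_df and response are placeholders for your predictor data frame and label vector:
scaler = StandardScaler(with_mean=True, with_std=True)
X_std = scaler.fit_transform(feature_df)        # standardize each predictor
(X_train, X_test,
 y_train, y_test) = train_test_split(X_std, response,
                                     test_size=0.5, random_state=0)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn1_pred = knn1.fit(X_train, y_train).predict(X_test)
np.mean(knn1_pred == y_test)                    # test-set accuracy for K=1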
Tuning Parameters
for K in range(1, 6):
    knn = KNeighborsClassifier(n_neighbors=K)
    knn_pred = knn.fit(X_train, y_train).predict(X_test)
    C = confusion_table(knn_pred, y_test)
    templ = ('K={0:d}: # predicted to rent: {1:>2},' +
             ' # who did rent {2:d}, accuracy {3:.1%}')
    pred = C.loc['Yes'].sum()
    did_rent = C.loc['Yes', 'Yes']
    print(templ.format(K, pred, did_rent, did_rent / pred))
- linear predictors are stored as the lin_pred attribute
- fitted values are stored in the fittedvalues attribute returned by the fit() method
Poisson
- use sm.GLM() with family=sm.families.Poisson()
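A hedged sketch of a Poisson fit, assuming the Bikeshare data from ISLP with 'bikers' as the count response (column names may differ in your data):
Bikeshare = load_data('Bikeshare')
X_bike = MS(['mnth', 'hr', 'workingday', 'temp', 'weathersit']).fit_transform(Bikeshare)
y_bike = Bikeshare['bikers']
pois = sm.GLM(y_bike, X_bike, family=sm.families.Poisson())
pois_results = pois.fit()
summarize(pois_results)
pois_results.fittedvalues       # fitted mean counts on the training data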
Gamma
- use sm.GLM() with family=sm.families.Gamma()