
observations is or isn't a Virginica.

We can see that as our petal width feature increases, the probability of being a Virginica increases.
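
All of the snippets that follow assume a feature matrix X and labels y are already defined. One minimal way to set them up, assuming the classic Iris dataset with petal width as the single feature, looks like this:

import numpy as np
from sklearn import datasets

# Load Iris, keep petal width as the single feature, and label
# whether each observation is a Virginica (class 2) or not.
iris = datasets.load_iris()
X = iris["data"][:, 3:]                  # petal width (cm), shape (150, 1)
y = (iris["target"] == 2).astype(int)    # 1 if Virginica, else 0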

import numpy as np
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(penalty="l2")      # L2-regularized logistic regression
log_reg.fit(X, y)                               # X: petal width, y: Virginica or not
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)  # petal widths from 0 to 3 cm
y_proba = log_reg.predict_proba(X_new)          # predicted class probabilities

Virginica Probability
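
The probability curve above can be reproduced with a few lines of plotting code; this is just a sketch, assuming matplotlib is available:

import matplotlib.pyplot as plt

# Plot both class probabilities against petal width.
plt.plot(X_new, y_proba[:, 1], label="Virginica")
plt.plot(X_new, y_proba[:, 0], label="Not Virginica")
plt.xlabel("Petal width (cm)")
plt.ylabel("Probability")
plt.legend()
plt.show()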

We can extend Logistic Regression to multiple classes, and it turns out to be very powerful. In this example we can see a very low classification error amongst the three classes.

from sklearn.linear_model import LogisticRegression

softmax_reg = LogisticRegression(multi_class="multinomial",
                                 solver="lbfgs", C=5)
softmax_reg.fit(X, y)               # y here should hold all three Iris classes
pred = softmax_reg.predict(X_test)  # X_test: a held-out test set (see sketch below)
Classifying with Logistic Regression
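
The snippet above assumes a held-out X_test. One way to produce it and check the classification error, sketched here with train_test_split and accuracy_score and assuming the full three-class Iris targets:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 30% of the data, refit, and measure three-class accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.3, random_state=42)
softmax_reg.fit(X_train, y_train)
pred = softmax_reg.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))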

2. Support Vector Machines


Support Vector Machines work by finding a hyperplane that separates the classes in the dataset. This can be done in any number of dimensions. Check out this article if you're interested in diving deeper into the details.
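
With a linear kernel, the hyperplane is simply w·x + b = 0, and you can pull those coefficients out of a fitted model. This is only a sketch, assuming a linear-kernel SVC rather than the RBF kernel used below:

from sklearn import svm

lin_clf = svm.SVC(kernel="linear")   # linear kernel, so the hyperplane is explicit
lin_clf.fit(X, y)                    # binary labels assumed
w = lin_clf.coef_[0]                 # normal vector of the separating hyperplane
b = lin_clf.intercept_[0]            # offset of the hyperplane
print("Hyperplane: w =", w, " b =", b)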

from sklearn import svm

# probability=True is required for predict_proba on an SVC
clf = svm.SVC(gamma='scale', decision_function_shape='ovo', probability=True)
clf.fit(X, y)
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = clf.predict_proba(X_new)
SVM prediction probability of Virginica using Petal Width

clf = svm.SVC(gamma='scale', decision_function_shape='ovo')
clf.fit(X, y)
pred = clf.predict(X_test)

Classifying with our SVC

3. Naive Bayes
We now arrive at Naive Bayes, perhaps the simplest of all the models discussed in this article. Naive Bayes is great because it needs only a small amount of data to estimate its parameters. It applies Bayes' theorem and is called naive because of its assumption of conditional independence between the features. In this example I apply Gaussian Naive Bayes:

Gaussian Naive Bayes
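
The naive independence assumption means the class-conditional likelihood factorizes over the features, and in the Gaussian case each factor is a normal density. Here is a tiny illustrative sketch of that idea for our single petal-width feature (for intuition only, not how scikit-learn implements it internally):

import numpy as np

# p(class | x) is proportional to p(class) * N(x; mean_class, var_class)
def gaussian_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x = 1.8                                    # an example petal width (cm)
posteriors = {}
for c in np.unique(y):
    prior = np.mean(y == c)                # class prior from the data
    mean, var = X[y == c].mean(), X[y == c].var()
    posteriors[c] = prior * gaussian_pdf(x, mean, var)
total = sum(posteriors.values())
print({c: p / total for c, p in posteriors.items()})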

from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
clf.fit(X, y)
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = clf.predict_proba(X_new)

Naive Bayes prediction probability of Virginica using Petal Width

clf = GaussianNB()
clf.fit(X,y)
pred = clf.predict(X)
Classifying with Naive Bayes

4. Random Forest
Random Forest is a popular ensemble model used quite frequently, and you can see ensemble models popping up all over the place, especially in Kaggle competitions. Random Forest works by fitting decision tree classifiers on subsamples of the dataset and then averaging their predictions, which improves accuracy while helping to avoid overfitting; you can see this averaging at work in the sketch after the code below. We set n_estimators to 100, which sets the number of trees in the forest to 100, and max_depth sets the maximum depth of each tree.

from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100, max_depth=2,
                             random_state=0)
clf.fit(X, y)
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = clf.predict_proba(X_new)
Random Forest prediction probability of Virginica using Petal Width

clf = RandomForestClassifier(n_estimators=100, max_depth=2,
                             random_state=0)
clf.fit(X, y)
pred = clf.predict(X)

Classifying with Random Forest
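
Since the fitted forest exposes its individual trees through estimators_, you can watch the averaging described above in action. A small sketch, reusing the fitted clf and X_new from the snippets above and assuming the binary Virginica labels (per-tree votes and the forest probability can differ slightly, since the forest averages probabilities rather than hard votes):

import numpy as np

sample = X_new[500].reshape(1, -1)          # one petal-width value
# Collect each tree's vote for this sample, then compare with the ensemble.
tree_votes = [tree.predict(sample)[0] for tree in clf.estimators_]
print("Fraction of trees voting Virginica:", np.mean(tree_votes))
print("Forest probability of Virginica:", clf.predict_proba(sample)[0, 1])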

5. AdaBoost
AdaBoost is another popular ensemble model. It works by fitting a sequence of classifiers on the dataset, increasing the weights of incorrectly classified instances so that later classifiers focus on them. In the process, AdaBoost tends to favour the features that add the most classification power, which acts as a form of dimensionality reduction, a plus as long as classification performance is preserved; the feature-importance sketch after the code below illustrates this.

from sklearn.ensemble import AdaBoostClassifier

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)
X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = clf.predict_proba(X_new)

AdaBoost prediction probability of Virginica using Petal Width

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X,y)
pred = clf.predict(X)
Classifying with AdaBoost
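
To see the feature-selection effect described above, you can inspect the fitted model's feature_importances_. This sketch refits AdaBoost on all four Iris measurements (an assumption here, since the snippets above use petal width alone):

from sklearn.ensemble import AdaBoostClassifier
from sklearn import datasets

iris = datasets.load_iris()
clf_full = AdaBoostClassifier(n_estimators=100)
clf_full.fit(iris["data"], iris["target"])
# Importances sum to 1; larger values mean the feature was used more by the ensemble.
for name, importance in zip(iris["feature_names"], clf_full.feature_importances_):
    print(name, round(importance, 3))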

Congrats
Hooray! You made it to the end. Now it’s your job to ask questions and
try to understand these models on a deeper level. In the next article I
will dive into the pros and cons of each model. Until next time…

Some more Scikit-Learn examples: https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

All Sorts of Different Methods…Source

As a reminder, all of the models are available on GitHub if you want to learn more:
https://github.com/Poseyy/Articles/tree/master/5SkLearnModels
