
Machine learning

scikit-learn is one of many excellent machine learning libraries for Python.

Use a virtual environment

To have full control over installing the packages described in this section, consider using a virtual environment, as demonstrated here.

Installation

Use pip to install scikit-learn

pip install scikit-learn

The package is installed as scikit-learn, but the module you import is named sklearn

import sklearn

Iris data set

One of the canonical examples in the machine learning community is classifying Iris plants from their petal and sepal measurements. The example is so common that the Iris data set is included in sklearn. All you have to do is import the datasets module and load it

from sklearn import datasets

iris = datasets.load_iris()

The Iris data set contains 150 examples of Iris plants: 50 examples for each of 3 classes, or targets (setosa, versicolor, and virginica). Each example contains 4 measurements, or features: sepal length, sepal width, petal length, and petal width, all in centimeters.

The sklearn data set stores the features in iris.data and the corresponding classes in iris.target.
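To confirm the layout described above, a short sketch like the following prints the array shapes and the number of examples per class. It assumes NumPy is importable (it is installed as a scikit-learn dependency); the variable name class_counts is only illustrative.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()

# Feature matrix: one row per example, one column per measurement
print(iris.data.shape)      # (150, 4)

# Target vector: one integer class label per example
print(iris.target.shape)    # (150,)

# Names of the four feature columns
print(iris.feature_names)

# Number of examples in each class
class_counts = np.bincount(iris.target)
print(class_counts)         # [50 50 50]
```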

Building a classifier

In this example, we'll build a Decision Tree classifier to predict the Iris species from petal and sepal features. First, we'll load the tree module

from sklearn import tree

Next, we'll create a DecisionTreeClassifier with a maximum depth of 2

classifier = tree.DecisionTreeClassifier(max_depth=2)

Fitting data (training)

We can train our classifier on the data set using the fit method

classifier = classifier.fit(iris.data, iris.target)
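The fit method returns the estimator itself, so the reassignment above is optional. As a quick sanity check after fitting, one might inspect the depth of the tree and its accuracy on the training data. This is a minimal sketch: random_state=0 is an assumption added here only to make runs repeatable, and accuracy measured on the training data is optimistic and says little about unseen examples.

```python
from sklearn import datasets, tree

iris = datasets.load_iris()

# random_state is an added assumption, used only to make runs repeatable
classifier = tree.DecisionTreeClassifier(max_depth=2, random_state=0)
classifier.fit(iris.data, iris.target)

# Depth of the fitted tree; it cannot exceed max_depth
print(classifier.get_depth())

# Fraction of training examples the tree classifies correctly
train_accuracy = classifier.score(iris.data, iris.target)
print(train_accuracy)
```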

Visualizing the classifier

Before moving on to prediction, let's visualize our classifier using the export_text function from sklearn.tree

from sklearn.tree import export_text

text = export_text(classifier, feature_names=iris['feature_names'])

print(text)

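You should see output similar to the following. Because no random_state is set and two different splits separate class 0 equally well, the first rule may instead read petal width (cm) <= 0.80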

|--- petal length (cm) <= 2.45
|   |--- class: 0
|--- petal length (cm) >  2.45
|   |--- petal width (cm) <= 1.75
|   |   |--- class: 1
|   |--- petal width (cm) >  1.75
|   |   |--- class: 2

Class names for this example can be found in iris['target_names'].
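To relate the integer classes in the tree output to species names, a small sketch such as the following can help. The variable name first_and_last is only illustrative.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Pair each integer class label with its species name
for index, name in enumerate(iris.target_names):
    print(index, name)

# target_names is a NumPy array, so indexing it with an array of labels
# converts labels to names in one step
first_and_last = iris.target_names[iris.target[[0, -1]]]
print(first_and_last)
```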

Prediction

Given an unseen feature vector [ 1.0, 1.0, 2.5, 1.75 ], you can predict its class by calling the predict method on the trained model. The method expects a 2D array with one row per example, hence the double brackets

classifier.predict([[ 1.0, 1.0, 2.5, 1.75 ]])

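This should return array([1]), that is, class 1 (versicolor). You can verify this by running the input vector through the Decision Tree rules above by hand: petal length 2.5 is greater than 2.45, and petal width 1.75 is less than or equal to 1.75, which leads to class 1.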
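The manual check can also be written as code. In the sketch below, predict_by_hand is a hypothetical helper, not part of sklearn, and its thresholds are copied from the printed rules shown earlier. If your fitted tree printed different rules, the helper would need to be adjusted to match.

```python
from sklearn import datasets, tree

iris = datasets.load_iris()
classifier = tree.DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)


def predict_by_hand(features):
    """Hypothetical helper: apply the rules from the printed tree by hand."""
    sepal_length, sepal_width, petal_length, petal_width = features
    if petal_length <= 2.45:
        return 0
    if petal_width <= 1.75:
        return 1
    return 2


sample = [1.0, 1.0, 2.5, 1.75]

# Class chosen by following the printed rules
by_hand = predict_by_hand(sample)
print(by_hand, iris.target_names[by_hand])   # 1 versicolor

# Class chosen by the fitted model; predict returns an array, one entry per row
by_model = int(classifier.predict([sample])[0])
print(by_model, iris.target_names[by_model])
```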