Decision Trees

Building a single decision tree can be done through sklearn, as seen below:

from sklearn import tree
import matplotlib.pyplot as plt

# Create and train the model
classifier = tree.DecisionTreeClassifier()
classifier = classifier.fit(x_train, y_train)
# Get information on your model
plt.figure(figsize=(15, 15))
tree.plot_tree(classifier)
plt.show()
predictions = classifier.predict(x_test)
accuracy    = sum(label == predictions[i] for i, label in enumerate(y_test)) / len(y_test)
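
The snippets in this section assume x_train, y_train, x_test, and y_test already exist. As a minimal sketch, assuming the Iris dataset stands in for your own data, a split could look like:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; replace with your own dataset
x, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)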

Random Forest

We can also build random forests, as below:

from sklearn.ensemble import RandomForestClassifier

# Available parameters
n_estimators = 100     # number of trees
n_jobs       = -1      # number of cores to use. -1 is all
max_samples  = 0.1     # fraction of samples to draw for each tree
max_features = 'sqrt'  # default is the square root of the number of features; an int also works
max_depth    = 5       # If not set, trees may grow to any size
# Set the values for your model
forest = RandomForestClassifier(n_estimators=n_estimators, n_jobs=n_jobs,
                                max_samples=max_samples, max_features=max_features,
                                max_depth=max_depth)
# Train your model
forest.fit(x_train, y_train)
# Get information on your model
predictions = forest.predict(x_test)
accuracy    = sum(label == predictions[i] for i, label in enumerate(y_test)) / len(y_test)
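
Once fit, a random forest also exposes feature_importances_, which is another way to get information on your model. A short sketch (purely illustrative):

import numpy as np

# Impurity-based importances: one value per feature, summing to 1
importances = forest.feature_importances_
for rank in np.argsort(importances)[::-1]:
    print(f"feature {rank}: {importances[rank]:.3f}")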

Boosted Trees

We can also use the GradientBoostingClassifier. Rather than training trees independently, it builds them sequentially, fitting each new tree to the residual loss of the ensemble so far. Models built this way are called "boosted trees".

from sklearn.ensemble import GradientBoostingClassifier

# Set the values for your model
# Note: subsample is the percentage of training data that should be sampled for each tree
gbc = GradientBoostingClassifier(n_estimators=200, subsample=0.5, max_features='sqrt',
                                 verbose=1, learning_rate=0.003)
# Train your model
gbc.fit(x_train, y_train)
# Get information on your model
predictions = gbc.predict(x_test)
accuracy    = sum(label == predictions[i] for i, label in enumerate(y_test)) / len(y_test)
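
Because boosting is sequential, you can watch the residual loss being worked down as trees are added by using staged_predict. A minimal sketch, reusing the gbc model above:

import numpy as np

# Accuracy of the ensemble after each boosting stage
for stage, staged_preds in enumerate(gbc.staged_predict(x_test), start=1):
    print(f"after {stage} trees: {np.mean(staged_preds == y_test):.3f}")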