sklearn.tree.DecisionTreeRegressor

class sklearn.tree.DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, presort=False)

A decision tree regressor.

Read more in the User Guide.

criterion : string, optional (default=“mse”)
The function to measure the quality of a split. The only supported criterion is “mse” for the mean squared error, which is equivalent to using variance reduction as the feature selection criterion.
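To illustrate why the “mse” criterion amounts to variance reduction, the sketch below treats the impurity of a node as the mean squared error of its targets around their mean (their population variance) and measures how much a candidate split lowers it. The helper names node_mse and mse_reduction are hypothetical and are not part of scikit-learn.

```python
def node_mse(y):
    """Mean squared error of the targets around their mean (population variance)."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y) / len(y)


def mse_reduction(y_left, y_right):
    """Impurity decrease obtained by splitting a node into two children."""
    y_parent = list(y_left) + list(y_right)
    n = len(y_parent)
    # Children are weighted by the share of the parent's samples they receive.
    weighted_children = (
        len(y_left) / n * node_mse(y_left)
        + len(y_right) / n * node_mse(y_right)
    )
    return node_mse(y_parent) - weighted_children


# Two well-separated groups: splitting between them removes all variance.
print(mse_reduction([1.0, 1.0], [3.0, 3.0]))  # 1.0
```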
splitter : string, optional (default=“best”)
The strategy used to choose the split at each node. Supported strategies are “best” to choose the best split and “random” to choose the best random split.
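The difference between the two strategies can be sketched on a single feature. This is a simplified illustration and not the library's implementation: it assumes that “best” scans every midpoint between consecutive distinct feature values, and that “random” draws one threshold uniformly between the feature's minimum and maximum. The function names are hypothetical, and best_threshold assumes the feature has at least two distinct values.

```python
import random


def split_score(x, y, threshold):
    """Sum of squared errors of both children for a threshold (lower is better)."""
    left = [t for v, t in zip(x, y) if v <= threshold]
    right = [t for v, t in zip(x, y) if v > threshold]
    if not left or not right:
        return float("inf")  # not a valid partition of the node

    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    return sse(left) + sse(right)


def best_threshold(x, y):
    """'best'-style search: evaluate every midpoint between distinct values."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda t: split_score(x, y, t))


def random_threshold(x, rng):
    """'random'-style draw: a single threshold between the min and the max."""
    return rng.uniform(min(x), max(x))


x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 5.0, 5.0, 9.0, 9.0, 9.0]
print(best_threshold(x, y))  # 6.5
print(random_threshold(x, random.Random(0)))  # some value in [1.0, 12.0]
```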
max_features : int, float, string or None, optional (default=None)
The number of features to consider when looking for the best split:
  • If int, then consider max_features features at each split.
  • If float, then max_features is a fraction and int(max_features * n_features) features are considered at each split.
  • If “auto”, then max_features=n_features.
  • If “sqrt”, then max_features=sqrt(n_features).
  • If “log2”, then max_features=log2(n_features).
  • If None, then max_features=n_features.

Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if this requires inspecting more than max_features features.
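The rules for max_features listed above can be summarized in a small helper. This is only a sketch: resolve_max_features is a hypothetical name, and truncating the “sqrt” and “log2” results to an integer of at least 1 is an assumption, since the text does not say how non-integer results are rounded.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_features setting to a feature count, following the rules above."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        # Assumption: truncate to an integer and never go below 1.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        # Assumption: truncate to an integer and never go below 1.
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    if isinstance(max_features, float):
        return int(max_features * n_features)
    raise ValueError("unsupported max_features: %r" % (max_features,))


for setting in (None, "auto", "sqrt", "log2", 5, 0.5):
    print(setting, resolve_max_features(setting, 13))
```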

max_depth : int or None, optional (default=None)
The maximum depth of the tree. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples. Ignored if max_leaf_nodes is not None.
min_samples_split : int, optional (default=2)
The minimum number of samples required to split an internal node.
min_samples_leaf : int, optional (default=1)
The minimum number of samples required to be at a leaf node.
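How max_depth, min_samples_split and min_samples_leaf interact can be sketched as two checks: one deciding whether a node may be split at all, and one deciding whether a particular candidate split is acceptable. The function names are hypothetical and the sketch only restates the descriptions above.

```python
def may_split(n_samples, depth, max_depth=None, min_samples_split=2):
    """Return True if a node is still eligible to be split."""
    if max_depth is not None and depth >= max_depth:
        return False  # depth limit reached
    return n_samples >= min_samples_split


def split_is_valid(n_left, n_right, min_samples_leaf=1):
    """Return True if both children keep at least min_samples_leaf samples."""
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


print(may_split(n_samples=1, depth=0))                # False: too few samples
print(may_split(n_samples=10, depth=3, max_depth=3))  # False: depth limit
print(split_is_valid(1, 4, min_samples_leaf=2))       # False: left child too small
```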
min_weight_fraction_leaf : float, optional (default=0.)
The minimum weighted fraction of the input samples required to be at a leaf node.
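As a small worked example of the weighted constraint, the sketch below converts min_weight_fraction_leaf into an absolute weight threshold by multiplying it by the total sample weight, then checks candidate leaves against it. The helper names are hypothetical.

```python
def min_leaf_weight(min_weight_fraction_leaf, sample_weight):
    """Smallest total sample weight that a leaf is allowed to hold."""
    return min_weight_fraction_leaf * sum(sample_weight)


def leaf_is_heavy_enough(leaf_weights, threshold):
    """Return True if the weights falling in a leaf reach the threshold."""
    return sum(leaf_weights) >= threshold


weights = [1.0, 1.0, 2.0, 4.0]  # one weight per training sample
threshold = min_leaf_weight(0.25, weights)
print(threshold)                                     # 2.0
print(leaf_is_heavy_enough([1.0, 0.5], threshold))   # False
print(leaf_is_heavy_enough([4.0], threshold))        # True
```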
max_leaf_nodes : int or None, optional (default=None)
Grow a tree with max_leaf_nodes in best-first fashion. The best nodes are those that give the largest relative reduction in impurity. If None, the number of leaf nodes is unlimited. If not None, max_depth is ignored.
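Best-first growth can be sketched with a priority queue: at each step the leaf whose best split yields the largest impurity reduction is expanded, until max_leaf_nodes leaves exist. The sketch below is a simplified illustration on one feature, with hypothetical function names and the sum of squared errors as impurity. It is not the library's implementation.

```python
import heapq


def sse(values):
    """Sum of squared errors of the values around their mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(points):
    """Best (gain, left, right) for points sorted by x, or None if unsplittable."""
    parent = sse([y for _, y in points])
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # equal feature values cannot be separated
        left, right = points[:i], points[i:]
        gain = parent - sse([y for _, y in left]) - sse([y for _, y in right])
        if best is None or gain > best[0]:
            best = (gain, left, right)
    return best


def grow_best_first(points, max_leaf_nodes):
    """Repeatedly split the leaf with the largest impurity reduction."""
    leaves = []    # leaves that cannot be improved by a split
    frontier = []  # heap of splittable leaves, keyed by negated gain
    counter = 0    # tie-breaker so the heap never compares point lists

    def push(node):
        nonlocal counter
        split = best_split(node)
        if split is None or split[0] <= 0:
            leaves.append(node)
        else:
            heapq.heappush(frontier, (-split[0], counter, split))
            counter += 1

    push(sorted(points))
    while frontier and len(leaves) + len(frontier) < max_leaf_nodes:
        _, _, (_, left, right) = heapq.heappop(frontier)
        push(left)
        push(right)
    # Nodes still on the frontier remain leaves once the budget is used up.
    leaves.extend(left + right for _, _, (_, left, right) in frontier)
    return leaves


data = [(1, 1.0), (2, 1.0), (3, 5.0), (4, 5.0), (10, 20.0), (11, 20.0)]
for leaf in grow_best_first(data, max_leaf_nodes=3):
    print(leaf)
```

With max_leaf_nodes=2 the same data would be split only once, at the point where the reduction is largest (between x=4 and x=10).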
random_state : int, RandomState instance or None, optional (default=None)
If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by np.random.
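The int and RandomState cases can be sketched with NumPy alone. The helper as_random_state is a hypothetical name; the None case, which falls back to NumPy's global generator, is deliberately left out of the sketch.

```python
import numpy as np


def as_random_state(random_state):
    """Turn an int seed or a RandomState instance into a RandomState instance."""
    if isinstance(random_state, np.random.RandomState):
        return random_state  # already a generator: use it as is
    if isinstance(random_state, (int, np.integer)):
        return np.random.RandomState(random_state)  # int: use it as the seed
    raise TypeError("this sketch only handles int seeds and RandomState instances")


# The same seed yields the same sequence of draws.
first = as_random_state(0).rand(3)
second = as_random_state(0).rand(3)
print(np.array_equal(first, second))  # True
```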
presort : bool, optional (default=False)
Whether to presort the data to speed up the search for the best splits during fitting. With the default settings of a decision tree on large datasets, setting this to True may slow down the training process. With a smaller dataset or a restricted depth, it may speed up the training.
feature_importances_ : array of shape = [n_features]
The feature importances. The higher the value, the more important the feature. The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance [4].
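The normalization described above can be sketched as follows: the criterion reductions of all internal nodes are summed per feature and then divided by their grand total so that the importances sum to 1. The function name and the input layout (a list of (feature index, weighted reduction) pairs) are assumptions made for the illustration.

```python
def feature_importances(node_reductions, n_features):
    """Sum the criterion reductions per feature and normalize them to sum to 1."""
    totals = [0.0] * n_features
    for feature, reduction in node_reductions:
        totals[feature] += reduction
    norm = sum(totals)
    if norm == 0.0:
        return totals  # a tree without splits has no informative feature
    return [t / norm for t in totals]


# (feature index, weighted criterion reduction) for each internal node
nodes = [(0, 6.0), (2, 3.0), (0, 1.0)]
print(feature_importances(nodes, 3))  # [0.7, 0.0, 0.3]
```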
max_features_ : int
The inferred value of max_features.
n_features_ : int
The number of features when fit is performed.
n_outputs_ : int
The number of outputs when fit is performed.
tree_ : Tree object
The underlying Tree object.

See also: DecisionTreeClassifier

[1] http://en.wikipedia.org/wiki/Decision_tree_learning
[2] L. Breiman, J. Friedman, R. Olshen and C. Stone, “Classification and Regression Trees”, Wadsworth, Belmont, CA, 1984.
[3] T. Hastie, R. Tibshirani and J. Friedman, “Elements of Statistical Learning”, Springer, 2009.
[4] L. Breiman and A. Cutler, “Random Forests”, http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
>>> from sklearn.datasets import load_boston
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeRegressor
>>> boston = load_boston()
>>> regressor = DecisionTreeRegressor(random_state=0)
>>> cross_val_score(regressor, boston.data, boston.target, cv=10)
array([ 0.61..., 0.57..., -0.34..., 0.41..., 0.75...,
        0.07..., 0.29..., 0.33..., -1.42..., -1.77...])
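The values returned in this example are R^2 scores, the same quantity that the score method reports, which explains why some folds are negative: R^2 drops below zero whenever the predictions are worse than always predicting the mean of the true targets. A minimal sketch of the computation, with a hypothetical function name:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    # Note: ss_tot is zero for constant targets; that case is not handled here.
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0]
print(r_squared(y_true, [1.0, 2.0, 3.0]))  # 1.0: perfect predictions
print(r_squared(y_true, [2.0, 2.0, 2.0]))  # 0.0: no better than the mean
print(r_squared(y_true, [3.0, 2.0, 1.0]))  # -3.0: worse than the mean
```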
__init__(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, presort=False)

Methods

__init__([criterion, splitter, max_depth, ...])
apply(X[, check_input]) Return the index of the leaf that each sample ends up in.
fit(X, y[, sample_weight, check_input, ...]) Build a decision tree from the training set (X, y).
fit_transform(X[, y]) Fit to data, then transform it.
get_params([deep]) Get parameters for this estimator.
predict(X[, check_input]) Predict class or regression value for X.
score(X, y[, sample_weight]) Returns the coefficient of determination R^2 of the prediction.
set_params(**params) Set the parameters of this estimator.
transform(*args, **kwargs) DEPRECATED: Support to use estimators as feature selectors will be removed in version 0.19.
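A short usage sketch of the main methods listed above, on synthetic data invented for the illustration (a step function of the first feature, with an irrelevant second feature). Only calls from the method table are used; the data and variable names are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: the target depends only on the first feature.
rng = np.random.RandomState(0)
X = rng.uniform(0.0, 10.0, size=(200, 2))
y = np.where(X[:, 0] > 5.0, 3.0, -1.0)

reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)                     # build the tree from (X, y)

pred = reg.predict(X[:5])         # regression values for five samples
leaves = reg.apply(X[:5])         # index of the leaf each sample falls into
r2 = reg.score(X, y)              # R^2 on the training data
importances = reg.feature_importances_

print(pred, leaves, r2, importances)
print(reg.get_params()["max_depth"])

# set_params changes the configuration; fit must be called again to apply it.
reg.set_params(max_depth=3)
```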

Attributes

feature_importances_ Return the feature importances.