class sklearn.tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
A decision tree classifier.
Read more in the User Guide.
The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees, which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.
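The sketch below illustrates the effect of a size-controlling parameter. It assumes the iris data set and an arbitrary cap of max_depth=3, and it reads the fitted size from the low-level tree_ object (tree_.max_depth and tree_.node_count).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Default parameters: the tree keeps splitting until its leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Capping the depth bounds the number of nodes (at most 2**(max_depth + 1) - 1).
small = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

print("full tree:   depth", full.tree_.max_depth, "nodes", full.tree_.node_count)
print("max_depth=3: depth", small.tree_.max_depth, "nodes", small.tree_.node_count)
```

The cap trades some training accuracy for a smaller, cheaper tree; min_samples_leaf or max_leaf_nodes can be used in the same way.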
The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data and max_features=n_features, if the improvement of the criterion is identical for several splits enumerated during the search for the best split. To obtain deterministic behaviour during fitting, random_state has to be fixed.
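A minimal check of the determinism described above, assuming the iris data set and an arbitrary seed of 42: two trees fitted with the same random_state end up with the same split features and thresholds, which are read from the tree_.feature and tree_.threshold arrays.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

def fitted_tree(seed):
    # max_features=None means every feature is considered at each split,
    # but the order in which the features are visited is still randomized.
    return DecisionTreeClassifier(random_state=seed).fit(X, y)

a, b = fitted_tree(42), fitted_tree(42)

# With the same random_state, the two trees are structurally identical.
same_structure = (
    np.array_equal(a.tree_.feature, b.tree_.feature)
    and np.array_equal(a.tree_.threshold, b.tree_.threshold)
)
print(same_structure)
```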
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])
Method | Description
---|---
apply(X[, check_input]) | Returns the index of the leaf that each sample is predicted as.
decision_path(X[, check_input]) | Return the decision path in the tree.
fit(X, y[, sample_weight, check_input, …]) | Build a decision tree classifier from the training set (X, y).
get_params([deep]) | Get parameters for this estimator.
predict(X[, check_input]) | Predict class or regression value for X.
predict_log_proba(X) | Predict class log-probabilities of the input samples X.
predict_proba(X[, check_input]) | Predict class probabilities of the input samples X.
score(X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels.
set_params(**params) | Set the parameters of this estimator.
__init__(criterion='gini', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, class_weight=None, presort=False)
apply(X, check_input=True)
Returns the index of the leaf that each sample is predicted as.
New in version 0.17.
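A small sketch of apply, assuming the iris data set and a shallow tree (max_depth=2, chosen only to keep the output short). It checks that every returned index is a leaf, using the convention of the low-level tree_ object that a node with children_left equal to -1 has no children.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

leaf_ids = clf.apply(X)  # one node index per sample, shape (n_samples,)

# A node is a leaf when it has no left child (marked with -1 in tree_).
is_leaf = clf.tree_.children_left == -1
print(np.unique(leaf_ids), is_leaf[leaf_ids].all())
```

Grouping samples by leaf_ids is a common way to see which training samples end up sharing a prediction.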
decision_path(X, check_input=True)
Return the decision path in the tree.
New in version 0.18.
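The sketch below reads the path of a single sample from the result of decision_path, which is a sparse CSR indicator matrix with one row per sample and one column per node. It assumes the iris data set and max_depth=3; the row is sliced through the CSR indptr and indices arrays.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Entry [i, j] is nonzero when sample i passes through node j.
paths = clf.decision_path(X)

sample = 0
start, end = paths.indptr[sample], paths.indptr[sample + 1]
nodes_on_path = paths.indices[start:end]
print("nodes visited by sample 0:", nodes_on_path)
```

Every path starts at the root (node 0); because child nodes are numbered after their parent, the largest node index on a path is the leaf that apply returns for the same sample.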
feature_importances_
Return the feature importances.
The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
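To make "normalized total reduction of the criterion" concrete, the following sketch recomputes the importances from the fitted tree_ arrays (impurity, weighted_n_node_samples, children_left, children_right, feature), assuming the iris data set and the default criterion. Each internal node credits its split feature with the weighted impurity decrease it produces, and the totals are then scaled to sum to 1.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
t = clf.tree_

importances = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:  # leaves do not split, so they contribute nothing
        continue
    # Weighted impurity decrease produced by the split at this node.
    decrease = (
        t.weighted_n_node_samples[node] * t.impurity[node]
        - t.weighted_n_node_samples[left] * t.impurity[left]
        - t.weighted_n_node_samples[right] * t.impurity[right]
    )
    importances[t.feature[node]] += decrease

importances /= importances.sum()  # normalize so the importances sum to 1
print(importances)
print(clf.feature_importances_)
```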
fit(X, y, sample_weight=None, check_input=True, X_idx_sorted=None)
Build a decision tree classifier from the training set (X, y).
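A minimal usage sketch, assuming the iris data set: fit returns the estimator itself, and after fitting the class labels are available in classes_ and their count in n_classes_.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# fit returns the estimator itself, so calls can be chained.
fitted = clf.fit(X, y)
print(fitted is clf, clf.classes_, clf.n_classes_)
```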
get_params(deep=True)
Get parameters for this estimator.
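A short sketch of get_params, which returns a dict keyed by constructor argument name. The value max_depth=4 is an arbitrary choice for illustration; the other entries keep their defaults.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=4)
params = clf.get_params()  # dict mapping constructor argument names to values
print(params["criterion"], params["max_depth"], params["random_state"])
```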
predict(X, check_input=True)
Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is returned. For a regression model, the predicted value based on X is returned.
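For a single-output classifier, the returned class can be checked against predict_proba: in the sketch below (iris data set, max_depth=2 chosen arbitrarily), picking the class with the highest probability reproduces the output of predict. This is offered as a consistency check, not as a description of the library's internals.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

pred = clf.predict(X)
# The predicted label is the class with the highest leaf probability.
from_proba = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]
print(np.array_equal(pred, from_proba))
```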
predict_log_proba(X)
Predict class log-probabilities of the input samples X.
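A sketch relating predict_log_proba to predict_proba, assuming the iris data set and a shallow tree (max_depth=2). Classes with zero probability in a leaf show up as -inf; the np.errstate context only silences NumPy's divide-by-zero warning for log(0).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

proba = clf.predict_proba(X[:5])
with np.errstate(divide="ignore"):  # log(0) -> -inf without a warning
    log_proba = clf.predict_log_proba(X[:5])

# Exponentiating the log-probabilities recovers the probabilities (exp(-inf) == 0).
print(np.allclose(np.exp(log_proba), proba))
```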
predict_proba(X, check_input=True)
Predict class probabilities of the input samples X.
The predicted class probability is the fraction of samples of the same class in a leaf.
Parameters:
check_input : boolean, (default=True)
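The "fraction of samples of the same class in a leaf" can be recomputed by hand. The sketch assumes the iris data set, no sample_weight or class_weight, and integer labels 0..n_classes-1 (so they line up with classes_); it groups training samples by the leaf that apply returns and counts their labels.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

leaves = clf.apply(X)
proba = clf.predict_proba(X)

# Recompute each sample's probabilities from the training labels sharing its leaf.
manual = np.empty_like(proba)
for i, leaf in enumerate(leaves):
    in_leaf = leaves == leaf
    manual[i] = np.bincount(y[in_leaf], minlength=clf.n_classes_) / in_leaf.sum()

print(np.allclose(manual, proba))
```

With sample weights or class weights the counts would have to be weighted accordingly, so this check only holds for the unweighted case.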
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric because it requires the entire label set of each sample to be predicted correctly.
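A sketch showing that, for ordinary single-label classification, score matches sklearn.metrics.accuracy_score on the model's own predictions. The iris data set and the default train_test_split proportions are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

acc = clf.score(X_test, y_test)
# score is the fraction of test samples whose predicted label matches the true label.
print(acc, accuracy_score(y_test, clf.predict(X_test)))
```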
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
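A sketch of both forms. The parameter values are arbitrary, and the pipeline step names "scale" and "tree" are hypothetical labels chosen for this example; the <component>__<parameter> key is built from whatever step name the pipeline uses.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Simple estimator: parameter names are used directly; set_params returns the estimator.
clf = DecisionTreeClassifier().set_params(max_depth=3, criterion="entropy")

# Nested object: "tree" is the (hypothetical) step name, so the key is tree__max_depth.
pipe = Pipeline([("scale", StandardScaler()), ("tree", DecisionTreeClassifier())])
pipe.set_params(tree__max_depth=2)
print(clf.max_depth, clf.criterion, pipe.named_steps["tree"].max_depth)
```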
sklearn.tree.DecisionTreeClassifier
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html