class sklearn.ensemble.RandomForestClassifier(n_estimators='warn', criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)
A random forest classifier.
A random forest is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting. The sub-sample size is always the same as the original input sample size, but the samples are drawn with replacement if bootstrap=True (the default).
Read more in the User Guide.
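The following is a minimal sketch, not scikit-learn's internal implementation, of the bootstrap-and-average idea described above: each tree is fit on a with-replacement resample of the data, and the forest averages the per-tree probability estimates.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)
rng = np.random.RandomState(0)

trees = []
for _ in range(10):
    # Bootstrap sample: same size as X, drawn with replacement.
    idx = rng.randint(0, X.shape[0], X.shape[0])
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# Average the per-tree probabilities and predict the most probable class.
mean_proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
y_pred = np.argmax(mean_proba, axis=1)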
See also
DecisionTreeClassifier, ExtraTreesClassifier
Notes

The default values for the parameters controlling the size of the trees (e.g. max_depth, min_samples_leaf, etc.) lead to fully grown and unpruned trees, which can potentially be very large on some data sets. To reduce memory consumption, the complexity and size of the trees should be controlled by setting those parameter values.

The features are always randomly permuted at each split. Therefore, the best found split may vary, even with the same training data, max_features=n_features and bootstrap=False, if the improvement of the criterion is identical for several splits enumerated during the search of the best split. To obtain deterministic behaviour during fitting, random_state has to be fixed.
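A hedged sketch of both notes above: capping tree size to curb memory use, and fixing random_state for reproducible fits. The parameter values here are illustrative, not recommendations.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,
    max_depth=10,         # cap depth instead of growing trees fully
    min_samples_leaf=5,   # require at least 5 samples per leaf
    random_state=0,       # fix the seed for deterministic fitting
)
clf.fit(X, y)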
Examples

>>> from sklearn.ensemble import RandomForestClassifier
>>> from sklearn.datasets import make_classification
>>>
>>> X, y = make_classification(n_samples=1000, n_features=4,
...                            n_informative=2, n_redundant=0,
...                            random_state=0, shuffle=False)
>>> clf = RandomForestClassifier(n_estimators=100, max_depth=2,
...                              random_state=0)
>>> clf.fit(X, y)
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
            oob_score=False, random_state=0, verbose=0, warm_start=False)
>>> print(clf.feature_importances_)
[0.14205973 0.76664038 0.0282433 0.06305659]
>>> print(clf.predict([[0, 0, 0, 0]]))
[1]
Methods

apply(X) | Apply trees in the forest to X, return leaf indices.
decision_path(X) | Return the decision path in the forest.
fit(X, y[, sample_weight]) | Build a forest of trees from the training set (X, y).
get_params([deep]) | Get parameters for this estimator.
predict(X) | Predict class for X.
predict_log_proba(X) | Predict class log-probabilities for X.
predict_proba(X) | Predict class probabilities for X.
score(X, y[, sample_weight]) | Returns the mean accuracy on the given test data and labels.
set_params(**params) | Set the parameters of this estimator.
__init__(n_estimators='warn', criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)
apply(X)
Apply trees in the forest to X, return leaf indices.
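A usage sketch, assuming the fitted clf and data X from the Examples above: the returned array has shape (n_samples, n_estimators), where entry (i, j) is the index of the leaf that sample i reaches in tree j.

leaves = clf.apply(X)
print(leaves.shape)  # (1000, 100): n_samples x n_estimators for the Examples data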
decision_path(X)
Return the decision path in the forest.
New in version 0.18.
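A usage sketch, assuming the fitted clf and data X from the Examples above: the first return value is a sparse node-indicator matrix, and the second gives the column offsets delimiting each tree's nodes within it.

indicator, n_nodes_ptr = clf.decision_path(X)
# The nodes of tree i occupy columns n_nodes_ptr[i]:n_nodes_ptr[i + 1].
path_0 = indicator[0, n_nodes_ptr[0]:n_nodes_ptr[1]]
print(path_0.nnz)  # number of nodes on sample 0's path through tree 0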
feature_importances_
Return the feature importances (the higher, the more important the feature).
fit(X, y, sample_weight=None)
Build a forest of trees from the training set (X, y).
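A usage sketch, assuming the clf, X and y from the Examples above: sample_weight up-weights selected training samples; doubling the weight of the second half of the data is an arbitrary choice for illustration.

import numpy as np

weights = np.ones(len(y))
weights[len(y) // 2:] = 2.0  # double the weight of the second half
clf.fit(X, y, sample_weight=weights)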
get_params(deep=True)
Get parameters for this estimator.
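A usage sketch, assuming the clf from the Examples above: get_params returns the constructor parameters as a dict; with deep=True (the default), parameters of any nested estimators are included under <component>__<parameter> names.

params = clf.get_params()
print(params["n_estimators"], params["max_depth"])  # 100 2 for the Examples estimator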
predict(X)
Predict class for X.
The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.
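A sketch of the relationship stated above, assuming the fitted clf and data X from the Examples: predict returns the class with the highest mean probability estimate, i.e. the argmax of predict_proba mapped through classes_.

import numpy as np

proba = clf.predict_proba(X)
assert np.array_equal(clf.predict(X), clf.classes_[np.argmax(proba, axis=1)])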
predict_log_proba(X)
Predict class log-probabilities for X.
The predicted class log-probabilities of an input sample is computed as the log of the mean predicted class probabilities of the trees in the forest.
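A sketch of the definition above, assuming the fitted clf and data X from the Examples: the result is the elementwise log of predict_proba (classes with a zero mean probability yield -inf).

import numpy as np

log_proba = clf.predict_log_proba(X)
assert np.allclose(log_proba, np.log(clf.predict_proba(X)))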
predict_proba(X)
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
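A sketch of the averaging described above, assuming the fitted clf and data X from the Examples: the forest's probabilities equal the mean of the per-tree probabilities, which are reachable through the estimators_ attribute.

import numpy as np

per_tree = np.array([tree.predict_proba(X) for tree in clf.estimators_])
assert np.allclose(clf.predict_proba(X), per_tree.mean(axis=0))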
score(X, y, sample_weight=None)
Returns the mean accuracy on the given test data and labels.
In multi-label classification, this is the subset accuracy, which is a harsh metric since it requires that each label set be correctly predicted for every sample.
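A sketch, assuming the fitted clf, X and y from the Examples: score is the mean accuracy of the hard predictions, so it matches accuracy_score.

from sklearn.metrics import accuracy_score

assert clf.score(X, y) == accuracy_score(y, clf.predict(X))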
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
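A usage sketch: setting a parameter directly on the estimator, and the nested <component>__<parameter> form on a pipeline (the step name "forest" is illustrative).

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

clf = RandomForestClassifier().set_params(n_estimators=200, max_depth=None)

pipe = Pipeline([("scale", StandardScaler()), ("forest", RandomForestClassifier())])
pipe.set_params(forest__n_estimators=50)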
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html