sklearn.model_selection.train_test_split

sklearn.model_selection.train_test_split(*arrays, **options)

Split arrays or matrices into random train and test subsets.

Quick utility that wraps input validation, `next(ShuffleSplit().split(X, y))`, and application of the resulting indices to the input data into a single call, so that data can be split (and optionally subsampled) in one line.
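The relationship to `ShuffleSplit` described above can be checked directly. The following is a minimal sketch, assuming the default settings (`shuffle=True`, `stratify=None`) and an integer seed shared by both calls; the array contents, seed value and variable names are illustrative only.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

X = np.arange(24).reshape((12, 2))
y = np.arange(12)

# One-line split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The same steps spelled out: draw one set of indices from ShuffleSplit,
# then apply those indices to the data by hand.
splitter = ShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(X, y))

# Expected to be True when both calls use the same integer seed.
same_split = (np.array_equal(X_train, X[train_idx])
              and np.array_equal(X_test, X[test_idx])
              and np.array_equal(y_train, y[train_idx])
              and np.array_equal(y_test, y[test_idx]))
print(same_split)
```

Working with the splitter directly is mainly useful when the indices themselves are needed, or when more than one random split is wanted.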

Parameters:

`*arrays : sequence of indexables with same length / shape[0]`
Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

`test_size : float, int or None, optional (default=0.25)`
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.25. The default will change in version 0.21. It will remain 0.25 only if `train_size` is unspecified, otherwise it will complement the specified `train_size`.

`train_size : float, int, or None, (default=None)`
If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples. If None, the value is automatically set to the complement of the test size.

`random_state : int, RandomState instance or None, optional (default=None)`
If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by `np.random`.

`shuffle : boolean, optional (default=True)`
Whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.

`stratify : array-like or None (default=None)`
If not None, data is split in a stratified fashion, using this as the class labels.

Returns:

`splitting : list, length=2 * len(arrays)`
List containing train-test split of inputs.
New in version 0.16: If the input is sparse, the output will be a `scipy.sparse.csr_matrix`. Else, output type is the same as the input type.
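As an illustration of the parameters above, here is a short sketch contrasting a float `test_size`, an int `test_size`, and a stratified split. The toy data (20 samples with a 16/4 class imbalance) and the variable names are assumptions made for the example, not part of the original documentation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape((20, 2))
y = np.array([0] * 16 + [1] * 4)  # imbalanced labels: 80% / 20%

# Float test_size: a proportion of the dataset (25% of 20 samples).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Int test_size: an absolute number of test samples.
parts = train_test_split(X, y, test_size=5, random_state=0)

# stratify=y keeps the class proportions of y in both subsets.
_, _, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(len(X_train), len(X_test))  # 15 5
print(len(parts))                 # 4, i.e. 2 * len(arrays)
print(np.bincount(y_train_s))     # [12  3]
print(np.bincount(y_test_s))      # [4 1]
```

Without `stratify`, a small test set drawn from imbalanced labels may contain very few samples of the minority class, or none at all, which is the usual reason to pass the labels here.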

Examples

>>> import numpy as np
>>> from sklearn.model_selection import train_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]

>>> X_train, X_test, y_train, y_test = train_test_split(
...     X, y, test_size=0.33, random_state=42)
...
>>> X_train
array([[4, 5],
       [0, 1],
       [6, 7]])
>>> y_train
[2, 0, 3]
>>> X_test
array([[2, 3],
       [8, 9]])
>>> y_test
[1, 4]

>>> train_test_split(y, shuffle=False)
[[0, 1, 2], [3, 4]]

Examples using `sklearn.model_selection.train_test_split`

Faces recognition example using eigenfaces and SVMs

Prediction Latency

Probability Calibration curves

Probability calibration of classifiers

Classifier comparison

Column Transformer with Mixed Types

Effect of transforming the targets in regression model

Comparing random forests and the multi-output meta estimator

Partial Dependence Plots

Early stopping of Gradient Boosting

Feature transformations with ensembles of trees

Gradient Boosting Out-of-Bag estimates

Pipeline Anova SVM

Comparing various online solvers

MNIST classification using multinomial logistic + L1

Multiclass sparse logistic regression on 20newsgroups

Early stopping of Stochastic Gradient Descent

Parameter estimation using grid search with cross-validation

Confusion matrix

Receiver Operating Characteristic (ROC)

Precision-Recall

Classifier Chain

Varying regularization in Multi-layer Perceptron

Restricted Boltzmann Machine features for digit classification

Using FunctionTransformer to select columns

Importance of Feature Scaling

Map data to a normal distribution

Feature discretization

Understanding the decision tree structure

© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
