Time Series cross-validator.
Provides train/test indices to split time series data samples that are observed at fixed time intervals into train/test sets. In each split, the test indices must be higher than in the previous split, so shuffling in the cross-validator is inappropriate.
This cross-validation object is a variation of KFold. In the k-th split, it returns the first k folds as the train set and the (k+1)-th fold as the test set.
Note that unlike standard cross-validation methods, successive training sets are supersets of those that come before them.
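You can check the superset and ordering properties described above directly. This is a minimal sketch that assumes only NumPy and scikit-learn are installed. The ten-row array X is made up for illustration, because the splitter only looks at the number of rows.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Ten time-ordered placeholder samples; only the row count matters to the splitter.
X = np.arange(20).reshape(10, 2)
tscv = TimeSeriesSplit(n_splits=4)

prev_train = np.array([], dtype=int)
prev_test_end = -1
for train_index, test_index in tscv.split(X):
    # Each training set contains every index of the previous one (superset).
    assert set(prev_train).issubset(train_index)
    # Every test index comes after every training index ...
    assert train_index.max() < test_index.min()
    # ... and each test block starts after the previous test block ended.
    assert test_index.min() > prev_test_end
    prev_train, prev_test_end = train_index, test_index.max()
    print(f"train={train_index} test={test_index}")
```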
Read more in the User Guide.
For a visualisation of cross-validation behaviour and a comparison between common scikit-learn split methods, refer to Visualizing cross-validation behavior in scikit-learn.
Added in version 0.18.
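The visual comparison referenced above can be roughly approximated in plain text by drawing each split as one row of characters. The helper render_splits and its legend ('x' for train, 'o' for test, '.' for unused) are hypothetical and were chosen for this sketch. They are not part of scikit-learn.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def render_splits(cv, n_samples):
    """Hypothetical helper: one text row per split ('x' train, 'o' test, '.' unused)."""
    rows = []
    for train_index, test_index in cv.split(np.zeros((n_samples, 1))):
        row = np.full(n_samples, ".")
        row[train_index] = "x"
        row[test_index] = "o"
        rows.append("".join(row))
    return rows

# With gap=1, one unused sample separates each training set from its test set.
rows = render_splits(TimeSeriesSplit(n_splits=3, gap=1), n_samples=10)
for r in rows:
    print(r)
```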
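Parameters:
n_splits : int, default=5
Number of splits. Must be at least 2.
Changed in version 0.22: n_splits default value changed from 3 to 5.
max_train_size : int, default=None
Maximum size for a single training set. When set, each training set keeps only the max_train_size samples immediately preceding the gap (or the test set when gap=0), so the training window slides forward instead of growing.
test_size : int, default=None
Used to limit the size of the test set: every test set contains test_size samples. Defaults to n_samples // (n_splits + 1), which is the maximum allowed value with gap=0.
Added in version 0.24.
gap : int, default=0
Number of samples to exclude from the end of each train set before the test set.
Added in version 0.24.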
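Here is a small sketch of how max_train_size changes the training window, using 10 placeholder samples. Once the expanding window would exceed max_train_size, only the most recent samples are kept. The sketch also checks that n_splits=1 is rejected with a ValueError, which matches the "at least 2" requirement.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((10, 1))  # placeholder data; only the number of rows is used
tscv = TimeSeriesSplit(n_splits=3, max_train_size=3)
splits = [(tr.tolist(), te.tolist()) for tr, te in tscv.split(X)]
for train_index, test_index in splits:
    # The training window slides: it never holds more than 3 samples.
    print(f"train={train_index} test={test_index}")

# n_splits below 2 is rejected when the splitter is constructed.
try:
    TimeSeriesSplit(n_splits=1)
    rejected = False
except ValueError:
    rejected = True
print("n_splits=1 rejected:", rejected)
```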
The training set has size i * (n_samples // (n_splits + 1)) + n_samples % (n_splits + 1) in the i-th split (counting from i = 1), with a test set of size n_samples // (n_splits + 1) by default, where n_samples is the number of samples. Note that this formula is only valid when test_size, max_train_size and gap are left to their default values.
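The size formula can be compared against the splitter's actual output. This sketch assumes default test_size, max_train_size and gap. It reads the product as i * (n_samples // (n_splits + 1)) and counts splits from i = 1. The helper expected_train_size is hypothetical.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def expected_train_size(i, n_samples, n_splits):
    """Hypothetical helper: default-parameter training size in split i (i >= 1)."""
    fold = n_samples // (n_splits + 1)
    return i * fold + n_samples % (n_splits + 1)

mismatches = []
for n_samples in range(6, 40):
    for n_splits in range(2, 6):
        tscv = TimeSeriesSplit(n_splits=n_splits)
        splits = tscv.split(np.zeros((n_samples, 1)))
        for i, (train_index, test_index) in enumerate(splits, start=1):
            ok = (
                len(train_index) == expected_train_size(i, n_samples, n_splits)
                and len(test_index) == n_samples // (n_splits + 1)
            )
            if not ok:
                mismatches.append((n_samples, n_splits, i))
print("mismatches:", mismatches)
```

The parentheses matter. With n_samples=8 and n_splits=2, reading the product as (i * n_samples) // (n_splits + 1) would predict 7 training samples in the second split, but the splitter produces 6.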
>>> import numpy as np
>>> from sklearn.model_selection import TimeSeriesSplit
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]])
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> tscv = TimeSeriesSplit()
>>> print(tscv)
TimeSeriesSplit(gap=0, max_train_size=None, n_splits=5, test_size=None)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test: index={test_index}")
Fold 0:
  Train: index=[0]
  Test: index=[1]
Fold 1:
  Train: index=[0 1]
  Test: index=[2]
Fold 2:
  Train: index=[0 1 2]
  Test: index=[3]
Fold 3:
  Train: index=[0 1 2 3]
  Test: index=[4]
Fold 4:
  Train: index=[0 1 2 3 4]
  Test: index=[5]
>>> # Fix test_size to 2 with 12 samples
>>> X = np.random.randn(12, 2)
>>> y = np.random.randint(0, 2, 12)
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test: index={test_index}")
Fold 0:
  Train: index=[0 1 2 3 4 5]
  Test: index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5 6 7]
  Test: index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7 8 9]
  Test: index=[10 11]
>>> # Add in a 2 period gap
>>> tscv = TimeSeriesSplit(n_splits=3, test_size=2, gap=2)
>>> for i, (train_index, test_index) in enumerate(tscv.split(X)):
...     print(f"Fold {i}:")
...     print(f"  Train: index={train_index}")
...     print(f"  Test: index={test_index}")
Fold 0:
  Train: index=[0 1 2 3]
  Test: index=[6 7]
Fold 1:
  Train: index=[0 1 2 3 4 5]
  Test: index=[8 9]
Fold 2:
  Train: index=[0 1 2 3 4 5 6 7]
  Test: index=[10 11]
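The index arithmetic behind these examples can be made explicit with a hypothetical re-derivation in plain NumPy. This sketch was written for illustration and is not scikit-learn's source. It omits input validation and treats only max_train_size=None as "no limit". The loop compares it against TimeSeriesSplit on a few assumed configurations.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def sketch_time_series_split(n_samples, n_splits=5, test_size=None, gap=0,
                             max_train_size=None):
    """Hypothetical re-derivation of the split indices (no input validation)."""
    if test_size is None:
        test_size = n_samples // (n_splits + 1)
    # The last test block ends at n_samples; earlier blocks step back by test_size.
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_end = test_start - gap  # leave `gap` samples out before the test block
        train_start = 0
        if max_train_size is not None and max_train_size < train_end:
            train_start = train_end - max_train_size  # keep only the most recent samples
        yield (np.arange(train_start, train_end),
               np.arange(test_start, test_start + test_size))

X = np.zeros((12, 2))
configs = [
    dict(n_splits=3, test_size=2, gap=2),
    dict(n_splits=5),
    dict(n_splits=3, max_train_size=3, gap=1),
]
all_match = True
for params in configs:
    ours = list(sketch_time_series_split(len(X), **params))
    theirs = list(TimeSeriesSplit(**params).split(X))
    match = len(ours) == len(theirs) and all(
        np.array_equal(a_tr, b_tr) and np.array_equal(a_te, b_te)
        for (a_tr, a_te), (b_tr, b_te) in zip(ours, theirs)
    )
    all_match = all_match and match
print("sketch matches TimeSeriesSplit:", all_match)
```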
For a more extended example, see Time-related feature engineering.
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing mechanism works.
Returns: routing : MetadataRequest
A MetadataRequest encapsulating routing information.
get_n_splits(X=None, y=None, groups=None)
Returns the number of splitting iterations in the cross-validator.
X : object
Always ignored, exists for compatibility.
y : object
Always ignored, exists for compatibility.
groups : object
Always ignored, exists for compatibility.
Returns: n_splits : int
The number of splitting iterations in the cross-validator.
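Because X, y and groups are ignored, get_n_splits simply reports the configured number of splits. Here is a quick check, using an arbitrary 20-row array, that this agrees with the number of pairs split actually yields.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=4)
X = np.zeros((20, 3))  # arbitrary placeholder data

n_reported = tscv.get_n_splits()           # no data needed; arguments are ignored
n_reported_with_x = tscv.get_n_splits(X)   # passing X does not change the answer
n_generated = sum(1 for _ in tscv.split(X))
print(n_reported, n_reported_with_x, n_generated)
```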
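split(X, y=None, groups=None)
Generate indices to split data into training and test sets.
X : array-like of shape (n_samples, n_features)
Training data, where n_samples is the number of samples and n_features is the number of features.
y : array-like of shape (n_samples,)
Always ignored, exists for compatibility.
groups : array-like of shape (n_samples,)
Always ignored, exists for compatibility.
Yields: train : ndarray
The training set indices for that split.
test : ndarray
The testing set indices for that split.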
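Because split yields (train, test) index pairs, a TimeSeriesSplit instance can be passed wherever scikit-learn accepts a cv argument. This minimal sketch uses a noiseless linear trend made up for illustration and cross_val_score's default scorer, which is R^2 for a regressor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic, noiseless linear trend: every fold can be fitted exactly.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

tscv = TimeSeriesSplit(n_splits=3)
# Each fold trains only on samples that precede its test block.
scores = cross_val_score(LinearRegression(), X, y, cv=tscv)
print(scores)
```

On real, noisy series the scores would vary from fold to fold. The perfect fit here only reflects the synthetic data.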
© 2007–2025 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/1.6/modules/generated/sklearn.model_selection.TimeSeriesSplit.html