class sklearn.feature_extraction.FeatureHasher(n_features=1048576, input_type='dict', dtype=<class 'numpy.float64'>, alternate_sign=True, non_negative=False)
Implements feature hashing, aka the hashing trick.
This class turns sequences of symbolic feature names (strings) into scipy.sparse matrices, using a hash function to compute the matrix column corresponding to a name. The hash function employed is the signed 32-bit version of MurmurHash3.
Feature names of type byte string are used as-is. Unicode strings are converted to UTF-8 first, but no Unicode normalization is done. Feature values must be (finite) numbers.
This class is a low-memory alternative to DictVectorizer and CountVectorizer, intended for large-scale (online) learning and situations where memory is tight, e.g. when running prediction code on embedded devices.
Read more in the User Guide.
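The column assignment described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper name `hash_features` is hypothetical, and MD5 truncated to 32 bits stands in for MurmurHash3 so that the example needs only the standard library. The sign handling mirrors what the `alternate_sign` parameter name suggests and should be read as an assumption.

```python
import hashlib


def hash_features(sample, n_features=10):
    """Map a {feature_name: value} dict to a dense row via the hashing trick.

    Hypothetical helper for illustration only.
    """
    row = [0.0] * n_features
    for name, value in sample.items():
        # Byte strings are used as-is; Unicode names are encoded as UTF-8.
        data = name.encode("utf-8") if isinstance(name, str) else name
        # Stand-in hash (assumption): first 4 bytes of MD5 read as a signed
        # 32-bit integer. FeatureHasher itself uses MurmurHash3.
        h = int.from_bytes(hashlib.md5(data).digest()[:4], "little", signed=True)
        column = abs(h) % n_features       # hash value picks the column
        sign = 1.0 if h >= 0 else -1.0     # hash sign picks the value's sign
        row[column] += sign * value        # colliding names accumulate
    return row


print(hash_features({"dog": 1, "cat": 2, "elephant": 4}))
```

Because no vocabulary is stored, memory use depends only on `n_features`, at the cost of possible collisions between distinct names.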
Parameters: 


See also
DictVectorizer
sklearn.preprocessing.OneHotEncoder
>>> from sklearn.feature_extraction import FeatureHasher
>>> h = FeatureHasher(n_features=10)
>>> D = [{'dog': 1, 'cat': 2, 'elephant': 4}, {'dog': 2, 'run': 5}]
>>> f = h.transform(D)
>>> f.toarray()
array([[ 0.,  0., -4., -1.,  0.,  0.,  0.,  0.,  0.,  2.],
       [ 0.,  0.,  0., -2., -5.,  0.,  0.,  0.,  0.,  0.]])
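With the default `alternate_sign=True`, hashed values may carry a negative sign. As a hedged variation on the example, the sketch below passes `alternate_sign=False` from the constructor signature; under that setting the values are accumulated without sign flips, so non-negative inputs give a non-negative matrix whose total equals the sum of the input values.

```python
from sklearn.feature_extraction import FeatureHasher

D = [{"dog": 1, "cat": 2, "elephant": 4}, {"dog": 2, "run": 5}]

# Disable sign alternation so that hashed values keep their input sign.
h = FeatureHasher(n_features=10, alternate_sign=False)
X = h.transform(D).toarray()

# Colliding features add up instead of partially cancelling.
print(X.min() >= 0, X.sum())
```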
fit([X, y])  No-op.
fit_transform(X[, y])  Fit to data, then transform it.
get_params([deep])  Get parameters for this estimator.
set_params(**params)  Set the parameters of this estimator.
transform(raw_X)  Transform a sequence of instances to a scipy.sparse matrix.
__init__(n_features=1048576, input_type='dict', dtype=<class 'numpy.float64'>, alternate_sign=True, non_negative=False)
fit(X=None, y=None)
No-op.
This method doesn’t do anything. It exists purely for compatibility with the scikit-learn transformer API.
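A short sketch of what the no-op means in practice: the hasher is stateless, so `transform` can be called without a prior `fit`, and `fit` simply hands back the same object. The variable names are illustrative.

```python
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=8)

# fit() needs no data and returns the hasher itself.
same = hasher.fit()

# transform() works without any prior fit because nothing is learned.
X = hasher.transform([{"dog": 1, "cat": 2}])

print(same is hasher, X.shape)
```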
Parameters: 


Returns: 

fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
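Since fitting a FeatureHasher does nothing, `fit_transform` should give the same matrix as `transform` alone. The following sketch checks this on the sample data from the class example; it is an illustration, not a statement about every scikit-learn version.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher

D = [{"dog": 1, "cat": 2, "elephant": 4}, {"dog": 2, "run": 5}]
hasher = FeatureHasher(n_features=10)

a = hasher.fit_transform(D)  # fit (a no-op), then transform
b = hasher.transform(D)      # transform only

print(np.array_equal(a.toarray(), b.toarray()))
```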
Parameters: 


Returns: 

get_params(deep=True)
Get parameters for this estimator.
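As a brief illustration, `get_params` returns the constructor arguments as a dictionary, which can then be used to build an equivalent hasher. The variable names below are illustrative.

```python
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=16, input_type="string")

# Dictionary keyed by constructor argument name.
params = hasher.get_params()
print(params["n_features"], params["input_type"])

# The same dictionary can configure a second, independent hasher.
copy = FeatureHasher(**params)
```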
Parameters: 


Returns: 

set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
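The `<component>__<parameter>` form can be sketched with a small pipeline. The step names `hasher` and `clf` and the choice of `SGDClassifier` are assumptions made for this example only.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Step names "hasher" and "clf" are arbitrary labels chosen for this sketch.
pipe = Pipeline([("hasher", FeatureHasher()), ("clf", SGDClassifier())])

# Nested form: <step name>__<parameter name>.
pipe.set_params(hasher__n_features=32, clf__alpha=0.001)

# On a simple estimator the parameter name is used directly.
hasher = FeatureHasher().set_params(n_features=64)

print(pipe.named_steps["hasher"].n_features, hasher.n_features)
```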
Returns: 


transform(raw_X)
Transform a sequence of instances to a scipy.sparse matrix.
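A hedged usage sketch with `input_type="string"`, where each instance is an iterable of feature names and every occurrence is assumed to count as a value of 1. The result is a sparse matrix with one row per instance and `n_features` columns.

```python
import scipy.sparse as sp
from sklearn.feature_extraction import FeatureHasher

hasher = FeatureHasher(n_features=8, input_type="string")

# Two instances, each given as a list of feature names.
X = hasher.transform([["dog", "cat", "dog"], ["run"]])

print(sp.issparse(X), X.shape)
```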
Parameters: 


Returns: 

© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html