sklearn.metrics.pairwise_distances_chunked(X, Y=None, reduce_func=None, metric=’euclidean’, n_jobs=None, working_memory=None, **kwds)
[source]
Generate a distance matrix chunk by chunk with optional reduction
In cases where not all of a pairwise distance matrix needs to be stored at once, this is used to calculate pairwise distances in working_memory
-sized chunks. If reduce_func
is given, it is run on each chunk and its return values are concatenated into lists, arrays or sparse matrices.
Parameters: |
|
---|---|
Yields: |
|
Without reduce_func:
>>> X = np.random.RandomState(0).rand(5, 3) >>> D_chunk = next(pairwise_distances_chunked(X)) >>> D_chunk array([[0. ..., 0.29..., 0.41..., 0.19..., 0.57...], [0.29..., 0. ..., 0.57..., 0.41..., 0.76...], [0.41..., 0.57..., 0. ..., 0.44..., 0.90...], [0.19..., 0.41..., 0.44..., 0. ..., 0.51...], [0.57..., 0.76..., 0.90..., 0.51..., 0. ...]])
Retrieve all neighbors and average distance within radius r:
>>> r = .2 >>> def reduce_func(D_chunk, start): ... neigh = [np.flatnonzero(d < r) for d in D_chunk] ... avg_dist = (D_chunk * (D_chunk < r)).mean(axis=1) ... return neigh, avg_dist >>> gen = pairwise_distances_chunked(X, reduce_func=reduce_func) >>> neigh, avg_dist = next(gen) >>> neigh [array([0, 3]), array([1]), array([2]), array([0, 3]), array([4])] >>> avg_dist array([0.039..., 0. , 0. , 0.039..., 0. ])
Where r is defined per sample, we need to make use of start
:
>>> r = [.2, .4, .4, .3, .1] >>> def reduce_func(D_chunk, start): ... neigh = [np.flatnonzero(d < r[i]) ... for i, d in enumerate(D_chunk, start)] ... return neigh >>> neigh = next(pairwise_distances_chunked(X, reduce_func=reduce_func)) >>> neigh [array([0, 3]), array([0, 1]), array([2]), array([0, 3]), array([4])]
Force row-by-row generation by reducing working_memory
:
>>> gen = pairwise_distances_chunked(X, reduce_func=reduce_func, ... working_memory=0) >>> next(gen) [array([0, 3])] >>> next(gen) [array([0, 1])]
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances_chunked.html