class sklearn.utils.Parallel(n_jobs=None, backend=None, verbose=0, timeout=None, pre_dispatch=‘2 * n_jobs’, batch_size=’auto’, temp_folder=None, max_nbytes=‘1M’, mmap_mode=’r’, prefer=None, require=None)
[source]
Helper class for readable parallel mapping.
Read more in the User Guide.
Parameters: |
|
---|
This object uses workers to compute in parallel the application of a function to many different arguments. The main functionality it brings in addition to using the raw multiprocessing or concurrent.futures API are (see examples for details):
A simple example:
>>> from math import sqrt >>> from sklearn.externals.joblib import Parallel, delayed >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10)) [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
Reshaping the output when the function has several return values:
>>> from math import modf >>> from sklearn.externals.joblib import Parallel, delayed >>> r = Parallel(n_jobs=1)(delayed(modf)(i/2.) for i in range(10)) >>> res, i = zip(*r) >>> res (0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5, 0.0, 0.5) >>> i (0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0)
The progress meter: the higher the value of verbose
, the more messages:
>>> from time import sleep >>> from sklearn.externals.joblib import Parallel, delayed >>> r = Parallel(n_jobs=2, verbose=10)(delayed(sleep)(.2) for _ in range(10)) [Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.6s [Parallel(n_jobs=2)]: Done 4 tasks | elapsed: 0.8s [Parallel(n_jobs=2)]: Done 10 out of 10 | elapsed: 1.4s finished
Traceback example, note how the line of the error is indicated as well as the values of the parameter passed to the function that triggered the exception, even though the traceback happens in the child process:
>>> from heapq import nlargest >>> from sklearn.externals.joblib import Parallel, delayed >>> Parallel(n_jobs=2)(delayed(nlargest)(2, n) for n in (range(4), 'abcde', 3)) #... --------------------------------------------------------------------------- Sub-process traceback: --------------------------------------------------------------------------- TypeError Mon Nov 12 11:37:46 2012 PID: 12934 Python 2.7.3: /usr/bin/python ........................................................................... /usr/lib/python2.7/heapq.pyc in nlargest(n=2, iterable=3, key=None) 419 if n >= size: 420 return sorted(iterable, key=key, reverse=True)[:n] 421 422 # When key is none, use simpler decoration 423 if key is None: --> 424 it = izip(iterable, count(0,-1)) # decorate 425 result = _nlargest(n, it) 426 return map(itemgetter(0), result) # undecorate 427 428 # General case, slowest method TypeError: izip argument #1 must support iteration ___________________________________________________________________________
Using pre_dispatch in a producer/consumer situation, where the data is generated on the fly. Note how the producer is first called 3 times before the parallel loop is initiated, and then called to generate new data on the fly:
>>> from math import sqrt >>> from sklearn.externals.joblib import Parallel, delayed >>> def producer(): ... for i in range(6): ... print('Produced %s' % i) ... yield i >>> out = Parallel(n_jobs=2, verbose=100, pre_dispatch='1.5*n_jobs')( ... delayed(sqrt)(i) for i in producer()) Produced 0 Produced 1 Produced 2 [Parallel(n_jobs=2)]: Done 1 jobs | elapsed: 0.0s Produced 3 [Parallel(n_jobs=2)]: Done 2 jobs | elapsed: 0.0s Produced 4 [Parallel(n_jobs=2)]: Done 3 jobs | elapsed: 0.0s Produced 5 [Parallel(n_jobs=2)]: Done 4 jobs | elapsed: 0.0s [Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s remaining: 0.0s [Parallel(n_jobs=2)]: Done 6 out of 6 | elapsed: 0.0s finished
__call__ (iterable) | |
dispatch_next () | Dispatch more data for parallel processing |
dispatch_one_batch (iterator) | Prefetch the tasks for the next batch and dispatch them. |
format (obj[, indent]) | Return the formatted representation of the object. |
print_progress () | Display the process of the parallel execution only a fraction of time, controlled by self.verbose. |
debug | |
retrieve | |
warn |
__init__(n_jobs=None, backend=None, verbose=0, timeout=None, pre_dispatch=‘2 * n_jobs’, batch_size=’auto’, temp_folder=None, max_nbytes=‘1M’, mmap_mode=’r’, prefer=None, require=None)
[source]
dispatch_next()
[source]
Dispatch more data for parallel processing
This method is meant to be called concurrently by the multiprocessing callback. We rely on the thread-safety of dispatch_one_batch to protect against concurrent consumption of the unprotected iterator.
dispatch_one_batch(iterator)
[source]
Prefetch the tasks for the next batch and dispatch them.
The effective size of the batch is computed here. If there are no more jobs to dispatch, return False, else return True.
The iterator consumption and dispatching is protected by the same lock so calling this function should be thread safe.
format(obj, indent=0)
[source]
Return the formatted representation of the object.
print_progress()
[source]
Display the process of the parallel execution only a fraction of time, controlled by self.verbose.
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.utils.Parallel.html