class sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric=’euclidean’, metric_params=None, algorithm=’auto’, leaf_size=30, p=None, n_jobs=None)
[source]
Perform DBSCAN clustering from vector array or distance matrix.
DBSCAN - Density-Based Spatial Clustering of Applications with Noise. Finds core samples of high density and expands clusters from them. Good for data which contains clusters of similar density.
Read more in the User Guide.
Parameters: |
|
---|---|
Attributes: |
|
For an example, see examples/cluster/plot_dbscan.py.
This implementation bulk-computes all neighborhood queries, which increases the memory complexity to O(n.d) where d is the average number of neighbors, while original DBSCAN had memory complexity O(n). It may attract a higher memory complexity when querying these nearest neighborhoods, depending on the algorithm
.
One way to avoid the query complexity is to pre-compute sparse neighborhoods in chunks using NearestNeighbors.radius_neighbors_graph
with mode='distance'
, then using metric='precomputed'
here.
Another way to reduce memory and computation time is to remove (near-)duplicate points and use sample_weight
instead.
Ester, M., H. P. Kriegel, J. Sander, and X. Xu, “A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise”. In: Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, OR, AAAI Press, pp. 226-231. 1996
>>> from sklearn.cluster import DBSCAN >>> import numpy as np >>> X = np.array([[1, 2], [2, 2], [2, 3], ... [8, 7], [8, 8], [25, 80]]) >>> clustering = DBSCAN(eps=3, min_samples=2).fit(X) >>> clustering.labels_ array([ 0, 0, 0, 1, 1, -1]) >>> clustering DBSCAN(algorithm='auto', eps=3, leaf_size=30, metric='euclidean', metric_params=None, min_samples=2, n_jobs=None, p=None)
fit (X[, y, sample_weight]) | Perform DBSCAN clustering from features or distance matrix. |
fit_predict (X[, y, sample_weight]) | Performs clustering on X and returns cluster labels. |
get_params ([deep]) | Get parameters for this estimator. |
set_params (**params) | Set the parameters of this estimator. |
__init__(eps=0.5, min_samples=5, metric=’euclidean’, metric_params=None, algorithm=’auto’, leaf_size=30, p=None, n_jobs=None)
[source]
fit(X, y=None, sample_weight=None)
[source]
Perform DBSCAN clustering from features or distance matrix.
Parameters: |
|
---|
fit_predict(X, y=None, sample_weight=None)
[source]
Performs clustering on X and returns cluster labels.
Parameters: |
|
---|---|
Returns: |
|
get_params(deep=True)
[source]
Get parameters for this estimator.
Parameters: |
|
---|---|
Returns: |
|
set_params(**params)
[source]
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter>
so that it’s possible to update each component of a nested object.
Returns: |
|
---|
sklearn.cluster.DBSCAN
© 2007–2018 The scikit-learn developers
Licensed under the 3-clause BSD License.
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html