DistanceMetric

class sklearn.metrics.DistanceMetric

Uniform interface for fast distance metric functions.

The DistanceMetric class provides a convenient way to compute pairwise distances between samples. It supports various distance metrics, such as Euclidean distance, Manhattan distance, and more.

The pairwise method computes the distance between every sample in X and every sample in Y, or among the samples of X when Y is omitted. It returns a distance matrix with one row per sample of X and one column per sample of Y.

The get_metric method allows you to retrieve a specific metric using its string identifier.

Examples

>>> from sklearn.metrics import DistanceMetric
>>> dist = DistanceMetric.get_metric('euclidean')
>>> X = [[1, 2], [3, 4], [5, 6]]
>>> Y = [[7, 8], [9, 10]]
>>> dist.pairwise(X,Y)
array([[ 8.48..., 11.31...],
       [ 5.65...,  8.48...],
       [ 2.82...,  5.65...]])

Available Metrics

The following lists the string metric identifiers and the associated distance metric classes:

Metrics intended for real-valued vector spaces:

identifier	class name	args	distance function
“euclidean”	EuclideanDistance		`sqrt(sum((x - y)^2))`
“manhattan”	ManhattanDistance		`sum(|x - y|)`
“chebyshev”	ChebyshevDistance		`max(|x - y|)`
“minkowski”	MinkowskiDistance	p, w	`sum(w * |x - y|^p)^(1/p)`
“seuclidean”	SEuclideanDistance	V	`sqrt(sum((x - y)^2 / V))`
“mahalanobis”	MahalanobisDistance	V or VI	`sqrt((x - y)' V^-1 (x - y))`

Metrics intended for two-dimensional vector spaces: Note that the haversine distance metric requires data in the form of [latitude, longitude] and both inputs and outputs are in units of radians.

identifier	class name	distance function
“haversine”	HaversineDistance	`2 arcsin(sqrt(sin^2(0.5dx) + cos(x1)cos(x2)sin^2(0.5dy)))`

Metrics intended for integer-valued vector spaces: Though intended for integer-valued vectors, these are also valid metrics in the case of real-valued vectors.

identifier	class name	distance function
“hamming”	HammingDistance	`N_unequal(x, y) / N_tot`
“canberra”	CanberraDistance	`sum(|x - y| / (|x| + |y|))`
“braycurtis”	BrayCurtisDistance	`sum(|x - y|) / (sum(|x|) + sum(|y|))`

Metrics intended for boolean-valued vector spaces: Any nonzero entry is evaluated to “True”. In the listings below, the following abbreviations are used:

N: number of dimensions
NTT: number of dims in which both values are True
NTF: number of dims in which the first value is True, second is False
NFT: number of dims in which the first value is False, second is True
NFF: number of dims in which both values are False
NNEQ: number of non-equal dimensions, NNEQ = NTF + NFT
NNZ: number of nonzero dimensions, NNZ = NTF + NFT + NTT

identifier	class name	distance function
“jaccard”	JaccardDistance	NNEQ / NNZ
“matching”	MatchingDistance	NNEQ / N
“dice”	DiceDistance	NNEQ / (NTT + NNZ)
“kulsinski”	KulsinskiDistance	(NNEQ + N - NTT) / (NNEQ + N)
“rogerstanimoto”	RogersTanimotoDistance	2 * NNEQ / (N + NNEQ)
“russellrao”	RussellRaoDistance	(N - NTT) / N
“sokalmichener”	SokalMichenerDistance	2 * NNEQ / (N + NNEQ)
“sokalsneath”	SokalSneathDistance	NNEQ / (NNEQ + 0.5 * NTT)

User-defined distance:

identifier	class name	args
“pyfunc”	PyFuncDistance	func

Here func is a function that takes two one-dimensional numpy arrays and returns a distance. To be used within the BallTree, the distance must be a true metric, i.e. it must satisfy the following properties:

Non-negativity: d(x, y) >= 0
Identity: d(x, y) = 0 if and only if x == y
Symmetry: d(x, y) = d(y, x)
Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)

Because of the Python object overhead involved in calling the Python function for each pair of samples, this will be fairly slow, but it will have the same scaling as other distances.

classmethod get_metric(metric, dtype=<class 'numpy.float64'>, **kwargs)

Get the given distance metric from the string identifier.

See the docstring of DistanceMetric for a list of available metrics.

Parameters:

metric (str or class name): The string identifier or class name of the desired distance metric. See the documentation of the DistanceMetric class for a list of available metrics.
dtype ({np.float32, np.float64}, default=np.float64): The data type of the input on which the metric will be applied. This affects the precision of the computed distances.
**kwargs: Additional keyword arguments passed to the requested metric, such as p for “minkowski” or V for “seuclidean”.

Returns:

metric_obj (instance of the requested metric): An instance of the requested distance metric class.

© 2007–2025 The scikit-learn developers
Licensed under the 3-clause BSD License.
https://scikit-learn.org/1.6/modules/generated/sklearn.metrics.DistanceMetric.html