seqsim package

Submodules

seqsim.common module

Module for defining common functions and variables used in different circumstances.

This module works as a repository of the functions and variables that are used by different methods (such as the computation of an edit distance with the Wagner-Fischer algorithm), including lower-level book-keeping functions, such as those for interfacing with the system.

seqsim.common.collect_subseqs(sequence: Sequence, sort: bool = True) → List[Sequence]

Collects all possible sub-sequences in a given sequence.

When sorting is requested, sub-sequences will be sorted first by their length and then by their contents. Mixing types, such as strings and integers, can lead to unexpected results and is not recommended if the element type cannot be guaranteed.

Note that this function performs simple comprehensions, using neither padding symbols nor the more complex n-gram collection methods ultimately based on ngrams_iter().
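
For illustration, such a comprehension can be sketched as follows (a minimal sketch, not the library's actual code; collect_subseqs_sketch is a hypothetical name):

def collect_subseqs_sketch(sequence, sort=True):
    # every contiguous slice of the sequence
    subseqs = [
        sequence[i:j]
        for i in range(len(sequence))
        for j in range(i + 1, len(sequence) + 1)
    ]
    if sort:
        # first by length, then by content
        subseqs.sort(key=lambda s: (len(s), s))
    return subseqs

Applied to "abcde", the sketch reproduces the output in the example below.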

Example

>>> seqsim.common.collect_subseqs('abcde')
['a', 'b', 'c', 'd', 'e', 'ab', 'bc', 'cd', 'de', 'abc', 'bcd', 'cde', 'abcd', 'bcde', 'abcde']
Parameters
  • sequence – The sequence that shall be converted into its ngram representation.

  • sort – Whether to sort the list of ngrams by length and by identity (default: True).

Returns

A list of all ngrams of the input sequence.

seqsim.common.equivalent_string(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable]) → Tuple[str, str]

Returns a string equivalent to a sequence, for comparison.

As some methods offered by third-party libraries only operate on strings, while seqsim is designed to offer all methods of comparison for generic sequences of hashable elements, in some cases it is necessary to convert a sequence to an equivalent string. Using a normal str conversion is not possible or satisfactory for a number of reasons, including elements that have no string representation and individual string representations of different lengths that can overlap (consider cases like [1, 12, 123, 23]).

This function accepts a pair of sequences and returns an equivalent textual representation, that is, a pair of strings where the order is preserved and each token is mapped to a single, unique character. While the information in the strings is meaningless, they are built to facilitate inspection and debugging as much as possible, trying to use only ASCII printable characters or Unicode characters that are expected to be supported for visualization in the majority of systems.

If two strings are passed, the same strings will be returned. Note that, in the case of mixed types (such as a string and a list of strings), strings will be treated as sequences of characters and will be modified upon return, as the tokens of the second sequence may be longer than one character (e.g., “abc” and [“a”, “bc”]).
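
For illustration, the token-to-character mapping described above can be sketched as follows (a minimal sketch; the character inventory is an assumption, the special handling of two string inputs is omitted, and equivalent_string_sketch is a hypothetical name):

def equivalent_string_sketch(seq_x, seq_y):
    charset = "0123456789abcdefghijklmnopqrstuvwxyz"  # assumed inventory
    mapping = {}
    # map each distinct token, in order of first appearance, to one character
    for token in list(seq_x) + list(seq_y):
        if token not in mapping:
            mapping[token] = charset[len(mapping)]
    return (
        "".join(mapping[token] for token in seq_x),
        "".join(mapping[token] for token in seq_y),
    )

With the inputs of the example below, the sketch also yields ('012', '0134').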

Example

>>> seqsim.common.equivalent_string([1, 2, 3], [1, 2, 4, 5])
('012', '0134')
Parameters
  • seq_x – The first sequence to be mapped to an equivalent string.

  • seq_y – The second sequence to be mapped to an equivalent string.

Returns

A tuple of two strings that are equivalent, for purposes of comparison and distance computation, to the provided sequences.

seqsim.common.sequence_find(hay: Sequence, needle: Sequence) → Optional[int]

Return the starting index of a sub-sequence within a sequence.

The function is intended to work similarly to the built-in .find() method for Python strings, but accepting all types of sequences (including different types for hay and needle).
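
For illustration, a generic sub-sequence search of this kind can be sketched as follows (a minimal sketch, not necessarily seqsim's implementation; sequence_find_sketch is a hypothetical name):

def sequence_find_sketch(hay, needle):
    # compare a sliding window of the hay against the needle
    hay, needle = list(hay), list(needle)
    for idx in range(len(hay) - len(needle) + 1):
        if hay[idx:idx + len(needle)] == needle:
            return idx
    return None

With the inputs of the example below, the sketch also returns 1.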

Example

>>> seqsim.common.sequence_find([1, 2, 3, 4, 5], [2, 3])
1
Parameters
  • hay – The sequence to be searched within.

  • needle – The sub-sequence to be located in the sequence.

Returns

The starting index of the sub-sequence in the sequence, or None if not found.

seqsim.compression module

Module implementing various methods for similarity and distance from compression methods.

Most of the methods are commonly used in string comparison via normalized compression distance (NCD), such as those based on Arithmetic Coding, but in this module we need to make sure we can operate on arbitrary iterable data structures.
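
For reference, the general normalized compression distance that these methods approximate can be sketched, for a compressed-length function C, as follows (illustrative only, not the library's internals):

def ncd(c_x: float, c_y: float, c_xy: float) -> float:
    # c_x = C(x), c_y = C(y), c_xy = C(x concatenated with y)
    return (c_xy - min(c_x, c_y)) / max(c_x, c_y)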

seqsim.compression.arith_ncd(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes a distance between two sequences based on Arithmetic Coding.

Example

>>> seqsim.compression.arith_ncd("abc", "bcde")
1.2222222222222223

References

MacKay, D.J.C. (2003), Information Theory, Inference and Learning Algorithms, Cambridge University Press, ISBN 978-0-521-64298-9

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, accepted only so that calls are equivalent to those of other methods.

Returns

The Arithmetic Coding NCD between the two sequences.

seqsim.compression.entropy_ncd(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes a distance between two sequences based on entropy.

Example

>>> seqsim.compression.entropy_ncd("abc", "bcde")
0.21698794996929216

References

MacKay, D.J.C. (2003), Information Theory, Inference and Learning Algorithms, Cambridge University Press, ISBN 978-0-521-64298-9

Shannon, C.E., Weaver, W. (1949) The Mathematical Theory of Communication, Univ of Illinois Press. ISBN 0-252-72548-4

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, accepted only so that calls are equivalent to those of other methods.

Returns

The Entropy NCD between the two sequences.

seqsim.edit module

Module implementing various methods for similarity and distance from edit methods.

Most of the methods are commonly used in string comparison, such as Levenshtein distance, but in this module we need to make sure we can operate on arbitrary iterable data structures.

seqsim.edit.birnbaum_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Compute the Birnbaum similarity distance with the original method.

This implementation uses the original method we developed following the description in Birnbaum (2003). See comments for birnbaum_simil().

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the distance is already in range [0..1].

Example

>>> seqsim.edit.birnbaum_dist("abc", "bcde")
0.5

References

Birnbaum, David J. (2003). “Computer-Assisted Analysis and Study of the Structure of Mixed-Content Miscellanies”. Scripta & Scripta 1:15-64.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The distance between the two sequences. A distance of 0.0 indicates identical sequences, and a distance of 1.0 indicates the maximum theoretical distance between two sequences.

seqsim.edit.birnbaum_simil(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Compute the Birnbaum similarity score with the fast method.

This implementation uses the experimental method we developed following the description in Birnbaum (2003), which is much faster and less memory-intensive than the original implementation. While in most cases the results are comparable, particularly after scaling/normalization, and the ones provided by this method might be considered more adequate due to their handling of duplicate information, the values are not identical.

Example

>>> seqsim.edit.birnbaum_simil("abc", "bcde")
3.0

References

Birnbaum, David J. (2003). “Computer-Assisted Analysis and Study of the Structure of Mixed-Content Miscellanies”. Scripta & Scripta 1:15-64.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.

Returns

The similarity score between the two sequences. The higher the similarity score, the more similar the two sequences are; a similarity score of zero indicates the theoretical maximum difference between two sequences.

seqsim.edit.bulk_delete_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], max_del_len: int = 5, normal: bool = False) → float

Compute the “bulk delete” distance between two sequences.

This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function. This distance measure is not used directly in the paper and was a proof-of-concept while working toward the “Stemmatological distance”.

Example

>>> seqsim.edit.bulk_delete_dist("abc", "bcde")
3.0

References

Göransson, Elisabet; Maurits, Luke; Dahlman, Britt; Sarkisian, Karine Å.; Rubenson, Samuel; Dunn, Michael. “Improved distance measures for ‘mixed-content miscellania’” (in prep.).

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • max_del_len – The maximum length of a deletion block.

  • normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.

Returns

The computed “bulk delete” distance.

seqsim.edit.fast_birnbaum_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Compute the Birnbaum similarity distance with the fast method.

This implementation uses the experimental method we developed following the description in Birnbaum (2003), which is much faster and less memory-intensive than the one implemented in the birnbaum_dist() function. While in most cases the results are comparable, and the ones provided by this method might be considered more adequate due to their handling of duplicate information, the values are not identical.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the distance is already in range [0..1].

Example

>>> seqsim.edit.fast_birnbaum_dist("abc", "bcde")
0.5

References

Birnbaum, David J. (2003). “Computer-Assisted Analysis and Study of the Structure of Mixed-Content Miscellanies”. Scripta & Scripta 1:15-64.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The distance between the two sequences. A distance of 0.0 indicates identical sequences, and a distance of 1.0 indicates the maximum theoretical distance between two sequences.

seqsim.edit.fragile_ends_simil(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Compute the “fragile ends” similarity between two sequences.

The “fragile ends” similarity is defined as equal to the Levenshtein one, but with deletions in the initial or final 10% of the positions being discounted.

This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function. This similarity measure is not used directly in the paper and was a proof-of-concept while working toward the “Stemmatological distance”.

Example

>>> seqsim.edit.fragile_ends_simil("abc", "bcde")
3.0

References

Göransson, Elisabet; Maurits, Luke; Dahlman, Britt; Sarkisian, Karine Å.; Rubenson, Samuel; Dunn, Michael. “Improved distance measures for ‘mixed-content miscellania’” (in prep.).

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.

Returns

The computed “fragile ends” similarity.

seqsim.edit.jaro_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes the Jaro distance between two sequences.

This function returns the value from the implementation provided by the textdistance library.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaro distance is already in range [0..1].
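
For reference, the Jaro similarity between sequences s1 and s2 with m matching tokens and t half the number of transposed matches is typically defined as sim_j = (1/3) * (m/|s1| + m/|s2| + (m - t)/m), and the distance corresponds to 1 - sim_j; in the example below, m = 2 and t = 0, giving 1 - (1/3) * (2/3 + 2/4 + 2/2) ≈ 0.2778.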

Example

>>> seqsim.edit.jaro_dist("abc", "bcde")
0.2777777777777778

References

Jaro, M. A. (1989). “Advances in record linkage methodology as applied to the 1985 census of Tampa Florida”. Journal of the American Statistical Association. 84 (406): 414–20. doi:10.1080/01621459.1989.10478785.

Jaro, M. A. (1995). “Probabilistic linkage of large public health data files”. Statistics in Medicine. 14 (5–7): 491–8. doi:10.1002/sim.4780140510. PMID 7792443.

Winkler, W. E. (1990). “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage”. Proceedings of the Section on Survey Research Methods. American Statistical Association: 354–359.

Winkler, W. E. (2006). “Overview of Record Linkage and Current Research Directions” (PDF). Research Report Series, RRS.

Parameters
  • seq_x – The first sequence of elements to be compared.

  • seq_y – The second sequence of elements to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The Jaro distance between the two sequences.

seqsim.edit.jaro_winkler_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes the Jaro-Winkler distance between two sequences.

This function returns the value from the implementation provided by the textdistance library.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaro-Winkler distance is already in range [0..1].
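
For reference, the Jaro-Winkler similarity is typically defined as sim_w = sim_j + l * p * (1 - sim_j), where sim_j is the Jaro similarity, l is the length of the common prefix (capped at four tokens), and p is a scaling factor (commonly 0.1); since the sequences in the example below share no common prefix, the value equals the plain Jaro distance.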

Example

>>> seqsim.edit.jaro_winkler_dist("abc", "bcde")
0.2777777777777778

References

Jaro, M. A. (1989). “Advances in record linkage methodology as applied to the 1985 census of Tampa Florida”. Journal of the American Statistical Association. 84 (406): 414–20. doi:10.1080/01621459.1989.10478785.

Jaro, M. A. (1995). “Probabilistic linkage of large public health data files”. Statistics in Medicine. 14 (5–7): 491–8. doi:10.1002/sim.4780140510. PMID 7792443.

Winkler, W. E. (1990). “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage”. Proceedings of the Section on Survey Research Methods. American Statistical Association: 354–359.

Winkler, W. E. (2006). “Overview of Record Linkage and Current Research Directions” (PDF). Research Report Series, RRS.

Parameters
  • seq_x – The first sequence of elements to be compared.

  • seq_y – The second sequence of elements to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The Jaro-Winkler distance between the two sequences.

seqsim.edit.levdamerau_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Compute the Damerau-Levenshtein distance between two sequences.

This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function.

Example

>>> seqsim.edit.levdamerau_dist("abc", "bcde")
3.0

References

Damerau, Fred J. (March 1964), “A technique for computer detection and correction of spelling errors”, Communications of the ACM, 7 (3): 171–176, doi:10.1145/363958.363994.

Levenshtein, Vladimir I. (February 1966). “Binary codes capable of correcting deletions, insertions, and reversals”. Soviet Physics Doklady. 10 (8): 707–710

Wagner, Robert A., and Michael J. Fischer. “The string-to-string correction problem.” Journal of the ACM (JACM) 21.1 (1974): 168-173.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.

Returns

The computed Damerau-Levenshtein distance.

seqsim.edit.levenshtein_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Compute the Levenshtein distance between two sequences.

This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levenshtein_costs() function.
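
For illustration, the Wagner-Fischer dynamic programming can be sketched with unit costs as follows (a minimal sketch; seqsim's actual costs come from its internal cost functions, and wagner_fischer_sketch is a hypothetical name):

def wagner_fischer_sketch(seq_x, seq_y):
    m, n = len(seq_x), len(seq_y)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # i deletions
    for j in range(n + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if seq_x[i - 1] == seq_y[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # deletion
                dist[i][j - 1] + 1,        # insertion
                dist[i - 1][j - 1] + sub,  # substitution
            )
    return dist[m][n]

With the inputs of the example below, the sketch also returns 3.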

Example

>>> seqsim.edit.levenshtein_dist("abc", "bcde")
3.0

References

Levenshtein, Vladimir I. (February 1966). “Binary codes capable of correcting deletions, insertions, and reversals”. Soviet Physics Doklady. 10 (8): 707–710

Wagner, Robert A., and Michael J. Fischer. “The string-to-string correction problem.” Journal of the ACM (JACM) 21.1 (1974): 168-173.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.

Returns

The computed Levenshtein distance.

seqsim.edit.mmcwpa_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes an MMCWPA distance between two sequences.

MMCWPA is the Modified Moving Contracting Window Pattern Algorithm, adapted by Tiago Tresoldi from a method published by Yang et al. (2001). In order to simplify the logic, the function uses the auxiliary internal function _mmcwpa().

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the MMCWPA distance is already in range [0..1].

Example

>>> seqsim.edit.mmcwpa_dist("abc", "bcde")
0.4285714285714286

References

Tresoldi, Tiago. “Newer method of string comparison: the Modified Moving Contracting Window Pattern Algorithm.” arXiv preprint arXiv:1605.01079 (2016).

Yang, Q. X.; Yuan, Sung S.; Chun, Lu; Zhao, Li; Peng Sun. “Faster Algorithm of String Comparison”, eprint arXiv:cs/0112022, December 2001.

Parameters
  • seq_x – The first sequence of elements to be compared.

  • seq_y – The second sequence of elements to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The MMCWPA distance between the two sequences.

seqsim.edit.stemmatological_simil(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], frag_start: float = 10.0, frag_end: float = 10.0, max_del_len: int = 5, normal: bool = False) → float

Compute the “stemmatological” similarity between two sequences.

This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function. This similarity measure is essentially a combination of the “fragile ends” and “bulk delete” methods, with the first one generalised a little to allow specifying the size of both fragile regions.

Example

>>> seqsim.edit.stemmatological_simil("abc", "bcde")
3.0

References

Göransson, Elisabet; Maurits, Luke; Dahlman, Britt; Sarkisian, Karine Å.; Rubenson, Samuel; Dunn, Michael. “Improved distance measures for ‘mixed-content miscellania’” (in prep.).

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • max_del_len – The maximum length of a deletion block.

  • frag_start – The size of the fragile region at the beginning of the sequence (default: 10.0).

  • frag_end – The size of the fragile region at the end of the sequence (default: 10.0).

  • normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.

Returns

The computed “stemmatological” similarity.

seqsim.ngrams module

Module for collecting ngrams on sequences of arbitrary elements.

Most of the code in this module follows the original implementation by Tiago Tresoldi for the lingpy library, later moved into the independent lpngram package.

seqsim.ngrams.get_all_ngrams_by_order(sequence, orders=None, pad_symbol='$$$')

Build an iterator for collecting all ngrams of a given set of orders.

If no set of orders (i.e., “lengths”) is provided, this will collect all possible ngrams in the sequence.

Parameters
  • sequence – The sequence from which the ngrams will be collected.

  • orders – An optional list of the orders of the ngrams to be collected. Can be larger than the length of the sequence, in which case the latter will be padded accordingly if requested. Defaults to the collection of all possible ngrams in the sequence with the minimum padding.

  • pad_symbol – An optional symbol to be used as start-of- and end-of-sequence boundaries. The same symbol is used for both boundaries. Must be a value different from None, defaults to “$$$”.

Returns

An iterable over the ngrams of the sequence, returned as tuples.

seqsim.ngrams.ngrams_iter(sequence: Sequence, order: int, pad_symbol: collections.abc.Hashable = '$$$')

Build an iterator for collecting all ngrams of a given order.

The sequence can optionally be padded with boundary symbols, which are the same before and after the sequence.
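
For illustration, padded ngram collection of a single order can be sketched as follows, assuming (order - 1) padding symbols on each side (a minimal sketch; the actual padding behavior is defined by the lpngram implementation, and ngrams_sketch is a hypothetical name):

def ngrams_sketch(sequence, order, pad_symbol="$$$"):
    # pad both ends, then slide a window of the requested order
    padded = [pad_symbol] * (order - 1) + list(sequence) + [pad_symbol] * (order - 1)
    for idx in range(len(padded) - order + 1):
        yield tuple(padded[idx:idx + order])

Under these assumptions, list(ngrams_sketch("ab", 2)) yields [('$$$', 'a'), ('a', 'b'), ('b', '$$$')].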

Parameters
  • sequence – The sequence from which the ngrams will be collected.

  • order – The order of the ngrams to be collected.

  • pad_symbol – An optional symbol to be used as start-of- and end-of-sequence boundaries. The same symbol is used for both boundaries. Must be a value different from None, defaults to “$$$”.

seqsim.sequence module

Module implementing various methods for similarity and distance from sequence methods.

Most of the methods are commonly used in string comparison, such as Ratcliff-Obershelp, but in this module we need to make sure we can operate on arbitrary iterable data structures.

seqsim.sequence.ratcliff_obershelp(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes a distance between two sequences based on the Ratcliff-Obershelp similarity.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Ratcliff-Obershelp distance is already in range [0..1].

Example

>>> seqsim.sequence.ratcliff_obershelp("abc", "bcde")
0.4285714285714286

References

John W. Ratcliff and David Metzener: Pattern Matching: The Gestalt Approach, Dr. Dobb’s Journal, Issue 46, July 1988

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The Ratcliff-Obershelp distance between the two sequences.

seqsim.token module

Module implementing various methods for similarity and distance from token methods.

Most of the methods are commonly used in string comparison, such as the Jaccard index, but in this module we need to make sure we can operate on arbitrary iterable data structures.

seqsim.token.jaccard_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes a Jaccard distance between two sequences.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaccard distance is already in range [0..1].
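
For illustration, the token-set computation behind a Jaccard distance can be sketched as follows (a minimal sketch, not necessarily seqsim's exact implementation; jaccard_dist_sketch is a hypothetical name):

def jaccard_dist_sketch(seq_x, seq_y):
    # one minus the ratio of shared tokens to all observed tokens
    set_x, set_y = set(seq_x), set(seq_y)
    return 1.0 - len(set_x & set_y) / len(set_x | set_y)

With the inputs of the example below, the sketch also yields 0.6 (an intersection of 2 tokens over a union of 5).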

Example

>>> seqsim.token.jaccard_dist("abc", "bcde")
0.6

References

Tan PN, Steinbach M, Kumar V (2005). Introduction to Data Mining. ISBN 0-321-32136-7.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The Jaccard distance between the two sequences.

seqsim.token.sorensen_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes a distance between two sequences based on the Sørensen–Dice coefficient.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Sørensen–Dice distance is already in range [0..1].
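
For illustration, a Sørensen–Dice distance over token sets can be sketched as follows (a minimal sketch, not necessarily seqsim's exact implementation; sorensen_dist_sketch is a hypothetical name):

def sorensen_dist_sketch(seq_x, seq_y):
    # one minus twice the shared tokens over the summed set sizes
    set_x, set_y = set(seq_x), set(seq_y)
    return 1.0 - 2 * len(set_x & set_y) / (len(set_x) + len(set_y))

With the inputs of the example below, the sketch also yields 1 - 4/7 ≈ 0.4286.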

Example

>>> seqsim.token.sorensen_dist("abc", "bcde")
0.4285714285714286

References

Kondrak, Grzegorz; Marcu, Daniel; Knight, Kevin (2003). “Cognates Can Improve Statistical Translation Models” (PDF). Proceedings of HLT-NAACL 2003: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. pp. 46–48.

Sørensen, T. (1948). “A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons”. Kongelige Danske Videnskabernes Selskab. 5 (4): 1–34.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The Sørensen–Dice distance between the two sequences.

seqsim.token.subseq_jaccard_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float

Computes a Jaccard distance between two sequences using sub-sequence occurrence.

The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaccard distance is already in range [0..1].

Example

>>> seqsim.token.subseq_jaccard_dist("abc", "bcde")
0.6857496100000001

References

Tan PN, Steinbach M, Kumar V (2005). Introduction to Data Mining. ISBN 0-321-32136-7.

Parameters
  • seq_x – The first sequence to be compared.

  • seq_y – The second sequence to be compared.

  • normal – Dummy parameter, see comment above.

Returns

The Subseq-Jaccard distance between the two sequences.

Module contents

Main module of the seqsim package.

We follow the mathematical definitions for distinguishing between “similarity” and “distance”, as the latter must have the following properties:

  • positivity: d(x,y) >= 0

  • symmetry: d(x,y) = d(y,x)

  • identity-discerning: d(x,y) = 0 => x = y

  • triangle inequality: d(x,z) <= d(x,y) + d(y,z)

seqsim.distance(seqs: Sequence[Sequence[collections.abc.Hashable]], method: str = 'levenshtein', normal: bool = False) → float

Computes the distance between sequences according to a specified method.

This function acts as a wrapper to all the methods offered by the package, including those that are not properly “distances” but measures of “similarity” (that is, those that do not offer all the distance properties). It is intended as a single point of call for all the methods that are offered.

Contrary to the individual methods, which accept two sequences as arguments, this wrapper accepts a sequence of sequences, allowing the computation of multiple distances.

Examples

>>> seqsim.distance(["abc", "bcde"])
3.0
>>> seqsim.distance(["abc", "bcde", "fgh"])
3.3333333333333335
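
For illustration, the pairwise-mean behavior for more than two sequences can be sketched as follows (a minimal sketch, not seqsim's internals; pairwise_dist and multi_distance_sketch are hypothetical names):

from itertools import combinations
from statistics import mean

def multi_distance_sketch(seqs, pairwise_dist):
    # mean of all pairwise distances between the given sequences
    return mean(pairwise_dist(x, y) for x, y in combinations(seqs, 2))

With a Levenshtein-like pairwise_dist, the three-sequence example above gives (3 + 3 + 4) / 3 ≈ 3.333.
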
Parameters
  • seqs – A group of sequences of hashable elements to be compared. Currently, if more than two sequences are passed, the function simply returns the mean value of all pairwise comparisons, but this behavior might change in the future, at least for some methods.

  • method – The method for comparison to be used. The list of methods, and the function they call, can be obtained from the keys of the METHODS dictionary exported by this module. Defaults to “levenshtein”.

  • normal – Whether to return a normalized score for the comparison in range [0..1]. Note that the function will accept a True value for all methods, but not all methods offer normalization and some methods always return normalized values. In those cases, the standard value will be returned and a warning message will be sent to the standard logger (which can be silenced as usual with the Python logging standard library). Defaults to False.

Returns

The distance score.