seqsim package
Submodules
seqsim.common module
Module for defining common functions and variables used in different circumstances.
This module collects the functions and variables shared by the different methods (for example, the computation of an edit distance using the Wagner-Fischer algorithm), along with lower-level, book-keeping helpers such as those that interface with the system.
seqsim.common.collect_subseqs(sequence: Sequence, sort: bool = True) → List[Sequence]
Collects all possible sub-sequences in a given sequence.
When sorting is requested, sub-sequences will first be sorted by their length and, later, by comparing one with the other. Mixing types, like strings and integers, can lead to unexpected results and is not suggested if the type cannot be guaranteed.
Note that this function performs simple comprehensions, using neither padding symbols nor the more complex n-gram collection methods ultimately based on ngrams_iter().
Example
>>> seqsim.common.collect_subseqs('abcde')
['a', 'b', 'c', 'd', 'e', 'ab', 'bc', 'cd', 'de', 'abc', 'bcd', 'cde', 'abcd', 'bcde', 'abcde']
- Parameters
sequence – The sequence that shall be converted into its ngram-representation.
sort – Whether to sort the list of ngrams by length and by identity (default: True).
- Returns
A list of all ngrams of the input sequence.
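The collection and sorting described above can be sketched with a single comprehension. This is a minimal illustration, not the package's source: the name collect_subseqs_sketch is hypothetical, and the sort key (length first, then ordinary comparison) is an assumption drawn from the description.

```python
from typing import List, Sequence


def collect_subseqs_sketch(sequence: Sequence, sort: bool = True) -> List[Sequence]:
    """Collect every contiguous, non-empty slice of `sequence`."""
    n = len(sequence)
    # All slices sequence[i:j] with 0 <= i < j <= n.
    subseqs = [sequence[i:j] for i in range(n) for j in range(i + 1, n + 1)]
    if sort:
        # Shorter sub-sequences first; ties are broken by ordinary comparison,
        # which is why mixing incomparable element types is risky.
        subseqs.sort(key=lambda s: (len(s), s))
    return subseqs


print(collect_subseqs_sketch("abcde"))
```

Because slicing preserves the input type, a string yields strings and a list yields lists.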
seqsim.common.equivalent_string(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable]) → Tuple[str, str]
Returns a string equivalent to a sequence, for comparison.
As some methods offered by third-party libraries only operate on strings, while seqsim is designed to offer all methods of comparison for generic sequences of hashable elements, in some cases it is necessary to convert a sequence to an equivalent string. Using a normal str conversion is not possible or satisfactory for a number of reasons, including elements not having a string representation, and individual string representations of different lengths and potentially overlapping (consider cases like [1, 12, 123, 23]).
This function accepts a pair of sequences and returns an equivalent textual representation, that is, a pair of strings where the order is preserved and each token is mapped to a single, unique character. While the information in the strings is meaningless, they are built to facilitate inspection and debugging as much as possible, trying to use only ASCII printable characters or Unicode characters that are expected to be supported for visualization in the majority of systems.
If two strings are passed, the same strings will be returned. Note that in case of mixed types (such as a string and a list of strings), strings will be considered sequences of characters and will be modified upon return, as the tokens of the second sequence could be of length over one character (e.g., “abc” and [“a”, “bc”]).
Example
>>> seqsim.common.equivalent_string([1, 2, 3], [1, 2, 4, 5])
('012', '0134')
- Parameters
seq_x – The first sequence to be mapped to an equivalent string.
seq_y – The second sequence to be mapped to an equivalent string.
- Returns
A tuple of two strings equivalent, for matters of comparison and distance computation, to the provided sequences.
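One way to picture the token-to-character mapping is the sketch below. It is an assumption-laden illustration: the function name, the alphabet, and the fallback to code points from U+0100 are all hypothetical choices, and the package may select its characters differently.

```python
import string
from typing import Dict, Hashable, Sequence, Tuple

# Hypothetical alphabet: printable ASCII first, so small examples stay readable.
_ALPHABET = string.digits + string.ascii_letters


def equivalent_string_sketch(
    seq_x: Sequence[Hashable], seq_y: Sequence[Hashable]
) -> Tuple[str, str]:
    # Two plain strings are already comparable character by character.
    if isinstance(seq_x, str) and isinstance(seq_y, str):
        return seq_x, seq_y

    # One shared mapping, so equal tokens get the same character in both strings.
    mapping: Dict[Hashable, str] = {}

    def encode(seq: Sequence[Hashable]) -> str:
        chars = []
        for token in seq:
            if token not in mapping:
                idx = len(mapping)
                # Assumption: fall back to code points from U+0100 once ASCII runs out.
                if idx < len(_ALPHABET):
                    mapping[token] = _ALPHABET[idx]
                else:
                    mapping[token] = chr(0x100 + idx - len(_ALPHABET))
            chars.append(mapping[token])
        return "".join(chars)

    return encode(seq_x), encode(seq_y)


print(equivalent_string_sketch([1, 2, 3], [1, 2, 4, 5]))
```

Characters are assigned in order of first appearance, so overlapping representations such as 1, 12 and 123 each receive a distinct single character.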
seqsim.common.sequence_find(hay: Sequence, needle: Sequence) → Optional[int]
Returns the starting index of a sub-sequence within a sequence.
The function is intended to work similarly to the built-in .find() method for Python strings, but accepting all types of sequences (including different types for hay and needle).
Example
>>> seqsim.common.sequence_find([1, 2, 3, 4, 5], [2, 3])
1
- Parameters
hay – The sequence to be searched within.
needle – The sub-sequence to be located in the sequence.
- Returns
The starting index of the sub-sequence in the sequence, or None if not found.
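A simple sliding-window search gives the behaviour described here. The sketch below is an assumption about how such a lookup could be written (the name sequence_find_sketch is hypothetical); both arguments are compared as lists so that hay and needle may be of different sequence types.

```python
from typing import Optional, Sequence


def sequence_find_sketch(hay: Sequence, needle: Sequence) -> Optional[int]:
    needle_items = list(needle)
    width = len(needle_items)
    # Slide a window of the needle's length over the haystack.
    for start in range(len(hay) - width + 1):
        # Compare as lists so that, e.g., a tuple can be searched with a list.
        if list(hay[start:start + width]) == needle_items:
            return start
    return None


print(sequence_find_sketch([1, 2, 3, 4, 5], [2, 3]))
```

Unlike str.find(), which signals a miss with -1, this version returns None, as the documented function does.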
seqsim.compression module
Module implementing various methods for similarity and distance from compression methods.
Most of the methods are commonly used for string comparison through a normalized compression distance (NCD), for example one based on Arithmetic Coding, but in this module we need to make sure we can operate on arbitrary iterable data structures.
seqsim.compression.arith_ncd(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes a distance between two sequences based on Arithmetic Coding.
Example
>>> seqsim.compression.arith_ncd("abc", "bcde")
1.2222222222222223
References
MacKay, D.J.C. (2003), Information Theory, Inference and Learning Algorithms, Cambridge University Press, ISBN 978-0-521-64298-9
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, accepted so that calls are equivalent to those of other methods.
- Returns
The Arithmetic Coding NCD between the two sequences.
seqsim.compression.entropy_ncd(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes a distance between two sequences based on entropy.
Example
>>> seqsim.compression.entropy_ncd("abc", "bcde")
0.21698794996929216
References
MacKay, D.J.C. (2003), Information Theory, Inference and Learning Algorithms, Cambridge University Press, ISBN 978-0-521-64298-9
Shannon, C.E., Weaver, W. (1949) The Mathematical Theory of Communication, Univ of Illinois Press. ISBN 0-252-72548-4
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, accepted so that calls are equivalent to those of other methods.
- Returns
The Entropy NCD between the two sequences.
seqsim.edit module
Module implementing various methods for similarity and distance from edit methods.
Most of the methods are commonly used in string comparison, such as Levenshtein distance, but in this module we need to make sure we can operate on arbitrary iterable data structures.
seqsim.edit.birnbaum_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Compute the Birnbaum similarity distance with the original method.
This implementation uses the original method we developed following the description in Birnbaum (2003). See comments for birnbaum_simil().
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Birnbaum distance is already in range [0..1].
Example
>>> seqsim.edit.birnbaum_dist("abc", "bcde")
0.5
References
Birnbaum, David J. (2003). “Computer-Assisted Analysis and Study of the Structure of Mixed-Content Miscellanies”. Scripta & Scripta 1:15-64.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, see comment above.
- Returns
The distance between the two sequences. A distance of 0.0 indicates identical sequences, and a distance of 1.0 indicates the maximum theoretical distance between two sequences.
seqsim.edit.birnbaum_simil(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Compute the Birnbaum similarity score with the fast method.
This implementation uses the experimental method we developed following the description in Birnbaum (2003), which is much faster and less memory-intensive than the one implemented in the birnbaum_simil() function. While in most cases the results are comparable, particularly after scaling/normalization, and while the ones provided by this method might be considered more adequate due to their handling of duplicate information, the values are not identical.
Example
>>> seqsim.edit.birnbaum_simil("abc", "bcde")
3.0
References
Birnbaum, David J. (2003). “Computer-Assisted Analysis and Study of the Structure of Mixed-Content Miscellanies”. Scripta & Scripta 1:15-64.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.
- Returns
The similarity score between the two sequences. The higher the similarity score, the more similar the two sequences are; a similarity score of zero is the theoretical maximum difference between two sequences.
seqsim.edit.bulk_delete_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], max_del_len: int = 5, normal: bool = False) → float
Compute the “bulk delete” distance between two sequences.
This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function. This distance measure is not used directly in the paper and was a proof-of-concept while working toward the “Stemmatological distance”.
Example
>>> seqsim.edit.bulk_delete_dist("abc", "bcde")
3.0
References
Göransson, Elisabet; Maurits, Luke; Dahlman, Britt; Sarkisian, Karine Å.; Rubenson, Samuel; Dunn, Michael. “Improved distance measures for ‘mixed-content miscellania’” (in prep.).
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
max_del_len – The maximum length of deletion block.
normal – Whether to normalize the distance in range [0..1] using sequence lengths.
- Returns
The computed “bulk delete” distance.
seqsim.edit.fast_birnbaum_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Compute the Birnbaum similarity distance with the fast method.
This implementation uses the experimental method we developed following the description in Birnbaum (2003), which is much faster and less memory-intensive than the one implemented in the birnbaum_dist() function. While in most cases the results are comparable, and the ones provided by this method might be considered more adequate due to their handling of duplicate information, the values are not identical.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Birnbaum distance is already in range [0..1].
Example
>>> seqsim.edit.fast_birnbaum_dist("abc", "bcde")
0.5
References
Birnbaum, David J. (2003). “Computer-Assisted Analysis and Study of the Structure of Mixed-Content Miscellanies”. Scripta & Scripta 1:15-64.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, see comment above.
- Returns
The distance between the two sequences. A distance of 0.0 indicates identical sequences, and a distance of 1.0 indicates the maximum theoretical distance between two sequences.
seqsim.edit.fragile_ends_simil(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Compute the “fragile ends” similarity between two sequences.
The “fragile ends” similarity is defined as equal to the Levenshtein one, but with deletions in the initial or final 10% of the positions being discounted.
This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function. This similarity measure is not used directly in the paper and was a proof-of-concept while working toward the “Stemmatological distance”.
Example
>>> seqsim.edit.fragile_ends_simil("abc", "bcde")
3.0
References
Göransson, Elisabet; Maurits, Luke; Dahlman, Britt; Sarkisian, Karine Å.; Rubenson, Samuel; Dunn, Michael. “Improved distance measures for ‘mixed-content miscellania’” (in prep.).
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.
- Returns
The computed “fragile ends” similarity.
seqsim.edit.jaro_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes the Jaro distance between two sequences.
This function returns the value from the implementation provided by the textdistance library.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaro distance is already in range [0..1].
Example
>>> seqsim.edit.jaro_dist("abc", "bcde")
0.2777777777777778
References
Jaro, M. A. (1989). “Advances in record linkage methodology as applied to the 1985 census of Tampa Florida”. Journal of the American Statistical Association. 84 (406): 414–20. doi:10.1080/01621459.1989.10478785.
Jaro, M. A. (1995). “Probabilistic linkage of large public health data file”. Statistics in Medicine. 14 (5–7): 491–8. doi:10.1002/sim.4780140510. PMID 7792443.
Winkler, W. E. (1990). “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage”. Proceedings of the Section on Survey Research Methods. American Statistical Association: 354–359.
Winkler, W. E. (2006). “Overview of Record Linkage and Current Research Directions” (PDF). Research Report Series, RRS.
- Parameters
seq_x – The first sequence of elements to be compared.
seq_y – The second sequence of elements to be compared.
normal – Dummy parameter, see comment above.
- Returns
The Jaro distance between the two sequences.
seqsim.edit.jaro_winkler_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes the Jaro-Winkler distance between two sequences.
This function returns the value from the implementation provided by the textdistance library.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaro-Winkler distance is already in range [0..1].
Example
>>> seqsim.edit.jaro_winkler_dist("abc", "bcde")
0.2777777777777778
References
Jaro, M. A. (1989). “Advances in record linkage methodology as applied to the 1985 census of Tampa Florida”. Journal of the American Statistical Association. 84 (406): 414–20. doi:10.1080/01621459.1989.10478785.
Jaro, M. A. (1995). “Probabilistic linkage of large public health data file”. Statistics in Medicine. 14 (5–7): 491–8. doi:10.1002/sim.4780140510. PMID 7792443.
Winkler, W. E. (1990). “String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage”. Proceedings of the Section on Survey Research Methods. American Statistical Association: 354–359.
Winkler, W. E. (2006). “Overview of Record Linkage and Current Research Directions” (PDF). Research Report Series, RRS.
- Parameters
seq_x – The first sequence of elements to be compared.
seq_y – The second sequence of elements to be compared.
normal – Dummy parameter, see comment above.
- Returns
The Jaro-Winkler distance between the two sequences.
seqsim.edit.levdamerau_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Compute the Damerau-Levenshtein distance between two sequences.
This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function.
Example
>>> seqsim.edit.levdamerau_dist("abc", "bcde")
3.0
References
Damerau, Fred J. (March 1964), “A technique for computer detection and correction of spelling errors”, Communications of the ACM, 7 (3): 171–176, doi:10.1145/363958.363994.
Levenshtein, Vladimir I. (February 1966). “Binary codes capable of correcting deletions, insertions, and reversals”. Soviet Physics Doklady. 10 (8): 707–710.
Wagner, Robert A., and Michael J. Fischer. “The string-to-string correction problem.” Journal of the ACM (JACM) 21.1 (1974): 168-173.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Whether to normalize the distance in range [0..1] using sequence lengths.
- Returns
The computed Damerau-Levenshtein distance.
seqsim.edit.levenshtein_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Compute the Levenshtein distance between two sequences.
This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levenshtein_costs() function.
Example
>>> seqsim.edit.levenshtein_dist("abc", "bcde")
3.0
References
Levenshtein, Vladimir I. (February 1966). “Binary codes capable of correcting deletions, insertions, and reversals”. Soviet Physics Doklady. 10 (8): 707–710
Wagner, Robert A., and Michael J. Fischer. “The string-to-string correction problem.” Journal of the ACM (JACM) 21.1 (1974): 168-173.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Whether to normalize the distance in range [0..1] using sequence lengths.
- Returns
The computed Levenshtein distance.
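For readers unfamiliar with the Wagner-Fischer algorithm, the following is a minimal sketch over generic sequences of hashable elements. It assumes unit costs for insertion, deletion and substitution, and it normalises by the length of the longer sequence; both are assumptions, since the costs returned by _levenshtein_costs() and the package's normalisation rule are not shown here. The name levenshtein_sketch is hypothetical.

```python
from typing import Hashable, Sequence


def levenshtein_sketch(
    seq_x: Sequence[Hashable], seq_y: Sequence[Hashable], normal: bool = False
) -> float:
    # Wagner-Fischer needs only the previous row of the edit matrix.
    previous = list(range(len(seq_y) + 1))
    for i, x in enumerate(seq_x, start=1):
        current = [i]
        for j, y in enumerate(seq_y, start=1):
            current.append(
                min(
                    previous[j] + 1,  # deletion from seq_x
                    current[j - 1] + 1,  # insertion into seq_x
                    previous[j - 1] + (x != y),  # substitution, free on a match
                )
            )
        previous = current
    dist = float(previous[-1])
    if normal:
        # Assumption: divide by the longer length to land in [0..1].
        longest = max(len(seq_x), len(seq_y))
        return dist / longest if longest else 0.0
    return dist


print(levenshtein_sketch("abc", "bcde"))
print(levenshtein_sketch([1, 2, 3], [1, 3], normal=True))
```

Elements are only compared for equality, so the same code works for strings, lists of integers, or tuples of tokens.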
seqsim.edit.mmcwpa_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes an MMCWPA distance between two sequences.
MMCWPA is the Modified Moving Contracting Window Pattern Algorithm, modified by Tiago Tresoldi from a method published by Yang et al. (2001). In order to simplify the logic, the function uses the auxiliary internal function _mmcwpa().
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the MMCWPA distance is already in range [0..1].
Example
>>> seqsim.edit.mmcwpa_dist("abc", "bcde")
0.4285714285714286
References
Tresoldi, Tiago. “Newer method of string comparison: the Modified Moving Contracting Window Pattern Algorithm.” arXiv preprint arXiv:1605.01079 (2016).
Yang, Q. X.; Yuan, Sung S.; Chun, Lu; Zhao, Li; Peng Sun. “Faster Algorithm of String Comparison”, eprint arXiv:cs/0112022, December 2001.
- Parameters
seq_x – The first sequence of elements to be compared.
seq_y – The second sequence of elements to be compared.
normal – Dummy parameter, see comment above.
- Returns
The MMCWPA distance between the two sequences.
seqsim.edit.stemmatological_simil(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], frag_start: float = 10.0, frag_end: float = 10.0, max_del_len: int = 5, normal: bool = False) → float
Compute the “stemmatological” similarity between two sequences.
This function will use the standard Wagner-Fischer algorithm with the default costs provided by the internal _levdamerau_costs() function. This similarity measure is essentially a combination of the “fragile ends” and “bulk delete” methods, with the first one generalised a little to allow specifying the size of both fragile regions.
Example
>>> seqsim.edit.stemmatological_simil("abc", "bcde")
3.0
References
Göransson, Elisabet; Maurits, Luke; Dahlman, Britt; Sarkisian, Karine Å.; Rubenson, Samuel; Dunn, Michael. “Improved distance measures for ‘mixed-content miscellania’” (in prep.).
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
max_del_len – The maximum length of deletion block.
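frag_start – The size of the fragile region at the start of the sequence (default: 10.0).
frag_end – The size of the fragile region at the end of the sequence (default: 10.0).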
normal – Whether to normalize the similarity score in range [0..1] using sequence lengths.
- Returns
The computed “stemmatological” similarity.
seqsim.ngrams module
Module for collecting ngrams on sequences of arbitrary elements.
Most of the code in this module follows the original implementation by Tiago Tresoldi for the lingpy library, later moved into the independent lpngram package.
seqsim.ngrams.get_all_ngrams_by_order(sequence, orders=None, pad_symbol='$$$')
Build an iterator for collecting all ngrams of a given set of orders.
If no set of orders (i.e., “lengths”) is provided, this will collect all possible ngrams in the sequence.
- Parameters
sequence – The sequence from which the ngrams will be collected.
orders – An optional list of the orders of the ngrams to be collected. Can be larger than the length of the sequence, in which case the latter will be padded accordingly if requested. Defaults to the collection of all possible ngrams in the sequence with the minimum padding.
pad_symbol – An optional symbol to be used as start-of- and end-of-sequence boundaries. The same symbol is used for both boundaries. Must be a value different from None, defaults to “$$$”.
- Returns
An iterable over the ngrams of the sequence, returned as tuples.
seqsim.ngrams.ngrams_iter(sequence: Sequence, order: int, pad_symbol: collections.abc.Hashable = '$$$')
Build an iterator for collecting all ngrams of a given order.
The sequence can optionally be padded with boundary symbols, which are the same before and after the sequence.
- Parameters
sequence – The sequence from which the ngrams will be collected.
order – The order of the ngrams to be collected.
pad_symbol – An optional symbol to be used as start-of- and end-of-sequence boundaries. The same symbol is used for both boundaries. Must be a value different from None, defaults to “$$$”.
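Padded n-gram collection can be sketched as below. This is an illustration under an explicit assumption: the sequence is padded with order - 1 boundary symbols on each side, a common convention that the package may or may not follow exactly. The name ngrams_iter_sketch is hypothetical.

```python
from itertools import chain, repeat
from typing import Hashable, Iterator, Sequence, Tuple


def ngrams_iter_sketch(
    sequence: Sequence, order: int, pad_symbol: Hashable = "$$$"
) -> Iterator[Tuple]:
    if order < 1:
        raise ValueError("order must be a positive integer")
    # Assumption: (order - 1) boundary symbols on each side of the sequence.
    pad = list(repeat(pad_symbol, order - 1))
    padded = list(chain(pad, sequence, pad))
    # Slide a window of width `order` over the padded sequence.
    for start in range(len(padded) - order + 1):
        yield tuple(padded[start:start + order])


for ngram in ngrams_iter_sketch("abc", 2):
    print(ngram)
```

With this convention every element appears in exactly `order` n-grams, including the first and last ones.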
seqsim.sequence module
Module implementing various methods for similarity and distance from sequence methods.
Most of the methods are commonly used in string comparison, such as Ratcliff-Obershelp, but in this module we need to make sure we can operate on arbitrary iterable data structures.
seqsim.sequence.ratcliff_obershelp(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes a distance between two sequences based on the Ratcliff-Obershelp similarity.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Ratcliff-Obershelp distance is already in range [0..1].
Example
>>> seqsim.sequence.ratcliff_obershelp("abc", "bcde")
0.4285714285714286
References
John W. Ratcliff and David Metzener: Pattern Matching: The Gestalt Approach, Dr. Dobb’s Journal, Issue 46, July 1988
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, see comment above.
- Returns
The Ratcliff-Obershelp distance between the two sequences.
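A comparable measure can be obtained from the Python standard library, whose difflib.SequenceMatcher implements a variant of the Ratcliff-Obershelp "gestalt pattern matching" approach and accepts any sequences of hashable elements. The sketch below is a swapped-in technique for illustration, not the package's implementation, and the name ratcliff_obershelp_sketch is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Hashable, Sequence


def ratcliff_obershelp_sketch(
    seq_x: Sequence[Hashable], seq_y: Sequence[Hashable]
) -> float:
    # autojunk=False disables difflib's popularity heuristic for long sequences.
    matcher = SequenceMatcher(None, seq_x, seq_y, autojunk=False)
    # ratio() is 2 * matches / (len(seq_x) + len(seq_y)); turn it into a distance.
    return 1.0 - matcher.ratio()


print(ratcliff_obershelp_sketch("abc", "bcde"))
print(ratcliff_obershelp_sketch([1, 2, 3], [1, 2, 3]))
```

Since the similarity ratio is bounded by 0 and 1, the resulting distance needs no further normalization.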
seqsim.token module
Module implementing various methods for similarity and distance from token methods.
Most of the methods are commonly used in string comparison, such as Jaccard index, but in this module we need to make sure we can operate on arbitrary iterable data structures.
seqsim.token.jaccard_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes a Jaccard distance between two sequences.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaccard distance is already in range [0..1].
Example
>>> seqsim.token.jaccard_dist("abc", "bcde")
0.6
References
Tan PN, Steinbach M, Kumar V (2005). Introduction to Data Mining. ISBN 0-321-32136-7.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, see comment above.
- Returns
The Jaccard distance between the two sequences.
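The Jaccard distance is one minus the ratio between the size of the intersection and the size of the union of the two element sets. A minimal sketch follows; the name jaccard_dist_sketch and the treatment of two empty sequences as identical are assumptions.

```python
from typing import Hashable, Sequence


def jaccard_dist_sketch(
    seq_x: Sequence[Hashable], seq_y: Sequence[Hashable]
) -> float:
    set_x, set_y = set(seq_x), set(seq_y)
    union = set_x | set_y
    if not union:
        # Assumption: two empty sequences are treated as identical.
        return 0.0
    return 1.0 - len(set_x & set_y) / len(union)


print(jaccard_dist_sketch("abc", "bcde"))
```

For "abc" and "bcde" the sets share two elements out of five, which gives the 0.6 shown in the example above. Order and repetition are ignored, so "abc" and "cba" are at distance zero.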
seqsim.token.sorensen_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes a distance between two sequences based on the Sørensen–Dice coefficient.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Sørensen–Dice distance is already in range [0..1].
Example
>>> seqsim.token.sorensen_dist("abc", "bcde")
0.4285714285714286
References
Kondrak, Grzegorz; Marcu, Daniel; Knight, Kevin (2003). “Cognates Can Improve Statistical Translation Models” (PDF). Proceedings of HLT-NAACL 2003: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. pp. 46–48.
Sørensen, T. (1948). “A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons”. Kongelige Danske Videnskabernes Selskab. 5 (4): 1–34.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, see comment above.
- Returns
The Sørensen–Dice distance between the two sequences.
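The Sørensen–Dice coefficient doubles the size of the intersection and divides it by the sum of the two set sizes. The sketch below works on sets of elements, which is an assumption (a multiset variant is equally plausible), and the name sorensen_dist_sketch is hypothetical.

```python
from typing import Hashable, Sequence


def sorensen_dist_sketch(
    seq_x: Sequence[Hashable], seq_y: Sequence[Hashable]
) -> float:
    set_x, set_y = set(seq_x), set(seq_y)
    total = len(set_x) + len(set_y)
    if total == 0:
        # Assumption: two empty sequences are treated as identical.
        return 0.0
    # Dice coefficient: 2 * |X & Y| / (|X| + |Y|), turned into a distance.
    return 1.0 - 2 * len(set_x & set_y) / total


print(sorensen_dist_sketch("abc", "bcde"))
```

Compared with the Jaccard distance, shared elements weigh more heavily here, so the same pair of sequences scores as closer.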
seqsim.token.subseq_jaccard_dist(seq_x: Sequence[collections.abc.Hashable], seq_y: Sequence[collections.abc.Hashable], normal: bool = False) → float
Computes a Jaccard distance between two sequences using sub-sequence occurrence.
The function accepts the normal parameter to have calls equivalent to those of other methods, but it is redundant as the Jaccard distance is already in range [0..1].
Example
>>> seqsim.token.subseq_jaccard_dist("abc", "bcde")
0.6857496100000001
References
Tan PN, Steinbach M, Kumar V (2005). Introduction to Data Mining. ISBN 0-321-32136-7.
- Parameters
seq_x – The first sequence to be compared.
seq_y – The second sequence to be compared.
normal – Dummy parameter, see comment above.
- Returns
The Subseq-Jaccard distance between the two sequences.
Module contents
Main module of the seqsim package.
We follow the mathematical definitions for distinguishing between “similarity” and “distance”, as the latter must have the following properties:
positivity: d(x,y) >= 0
symmetry: d(x,y) = d(y,x)
identity-discerning: d(x,y) = 0 => x = y
triangle inequality: d(x,z) <= d(x,y) + d(y,z)
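The four properties can be probed empirically on a finite sample, which can refute a property but never prove it. The sketch below is illustrative only: check_distance_properties and the toy measure length_gap are hypothetical names, not part of the package.

```python
from itertools import product
from typing import Callable, Dict, Sequence


def check_distance_properties(
    dist: Callable[[Sequence, Sequence], float],
    samples: Sequence[Sequence],
    tol: float = 1e-9,
) -> Dict[str, bool]:
    """Test the four distance properties on a finite sample (not a proof)."""
    ok = {
        "positivity": True,
        "symmetry": True,
        "identity-discerning": True,
        "triangle inequality": True,
    }
    for x, y in product(samples, repeat=2):
        d_xy = dist(x, y)
        if d_xy < 0:
            ok["positivity"] = False
        if abs(d_xy - dist(y, x)) > tol:
            ok["symmetry"] = False
        if d_xy == 0 and x != y:
            ok["identity-discerning"] = False
    for x, y, z in product(samples, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            ok["triangle inequality"] = False
    return ok


# Hypothetical toy measure: the gap in length between two sequences.
def length_gap(x: Sequence, y: Sequence) -> float:
    return float(abs(len(x) - len(y)))


report = check_distance_properties(length_gap, ["a", "ab", "cd", "abc"])
print(report)
```

The toy measure fails only the identity-discerning test, because "ab" and "cd" differ yet have a gap of zero; a measure that fails any of the four tests is not a distance in the strict sense.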
seqsim.distance(seqs: Sequence[Sequence[collections.abc.Hashable]], method: str = 'levenshtein', normal: bool = False) → float
Computes the distance between sequences according to a specified method.
This function acts as a wrapper to all the methods offered by the package, including those that are not properly “distances” but measures of “similarity” (that is, those that do not offer all the distance properties). It is intended as a single point of call for all the methods that are offered.
Unlike the individual methods, which accept two sequences as arguments, this wrapper accepts a sequence of sequences, which allows multiple distances to be computed in one call.
Examples
>>> seqsim.distance(["abc", "bcde"])
3.0
>>> seqsim.distance(["abc", "bcde", "fgh"])
3.3333333333333335
- Parameters
seqs – A sequence of sequences of hashable elements to be compared. Currently, if more than two sequences are passed, the function returns the mean value of all pairwise comparisons, but this behavior might change in the future, at least for some methods.
method – The method for comparison to be used. The list of methods, and the function they call, can be obtained from the keys of the METHODS dictionary exported by this module. Defaults to “levenshtein”.
normal – Whether to return a normalized score for the comparison in range [0..1]. Note that the function will accept a True value for all methods, but not all methods offer normalization and some methods always return normalized values. In those cases, the standard value will be returned and a warning message will be sent to the standard logger (which can be silenced as usual with the Python logging standard library). Defaults to False.
- Returns
The distance score.
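The mean over all pairwise comparisons described for seqs can be sketched with itertools.combinations. This is an illustration only: mean_pairwise and the toy measure symmetric_gap are hypothetical names, and the toy measure stands in for whichever method the wrapper would dispatch to.

```python
from itertools import combinations
from statistics import mean
from typing import Callable, Hashable, Sequence


def mean_pairwise(
    seqs: Sequence[Sequence[Hashable]],
    dist_func: Callable[[Sequence, Sequence], float],
) -> float:
    # Every unordered pair is compared exactly once.
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("at least two sequences are required")
    return mean(dist_func(x, y) for x, y in pairs)


# Hypothetical toy measure: number of elements found in only one sequence.
def symmetric_gap(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    return float(len(set(x) ^ set(y)))


print(mean_pairwise(["abc", "bcde"], symmetric_gap))
print(mean_pairwise(["abc", "bcde", "fgh"], symmetric_gap))
```

With n sequences this performs n(n-1)/2 comparisons, so the cost grows quadratically with the number of sequences.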