### Install RapidFuzz from Git Source: https://rapidfuzz.github.io/RapidFuzz/Installation.html Clone the repository and install RapidFuzz directly from GitHub. This method requires a C++17 capable compiler. ```bash git clone --recursive https://github.com/rapidfuzz/rapidfuzz.git cd rapidfuzz pip install . ``` -------------------------------- ### Install RapidFuzz using pip Source: https://rapidfuzz.github.io/RapidFuzz/Installation.html Use this command to install the latest version of RapidFuzz via pip. Pre-built binaries are available for common operating systems. ```bash pip install rapidfuzz ``` -------------------------------- ### Install RapidFuzz using conda Source: https://rapidfuzz.github.io/RapidFuzz/Installation.html Install RapidFuzz from the conda-forge channel using this command. This is an alternative to pip installation. ```bash conda install -c conda-forge rapidfuzz ``` -------------------------------- ### Get Opcodes for String Transformation Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html Use this to get a list of opcodes that describe how to transform the first string into the second. This is useful for understanding the differences between strings. ```python >>> from rapidfuzz.distance import Levenshtein >>> Levenshtein.opcodes('spam', 'park') [Opcode(tag=delete, src_start=0, src_end=1, dest_start=0, dest_end=0), Opcode(tag=equal, src_start=1, src_end=3, dest_start=0, dest_end=2), Opcode(tag=replace, src_start=3, src_end=4, dest_start=2, dest_end=3), Opcode(tag=insert, src_start=4, src_end=4, dest_start=3, dest_end=4)] ``` -------------------------------- ### Get Opcodes for LCSseq Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/LCSseq.html Use `opcodes` to get a detailed list of operations (equal, delete, insert, replace) that transform one string into another, including the start and end indices for slices in both strings. This provides a more granular view of the sequence alignment. 
```python >>> from rapidfuzz.distance import LCSseq ``` ```python >>> a = "qabxcd" >>> b = "abycdf" >>> for tag, i1, i2, j1, j2 in LCSseq.opcodes(a, b): ... print(('%7s a[%d:%d] (%s) b[%d:%d] (%s)' % (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) delete a[0:1] (q) b[0:0] () equal a[1:3] (ab) b[0:2] (ab) delete a[3:4] (x) b[2:2] () insert a[4:4] () b[2:3] (y) equal a[4:6] (cd) b[3:5] (cd) insert a[6:6] () b[5:6] (f) ``` -------------------------------- ### Levenshtein Distance with Custom Weights Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Levenshtein.html Allows for custom costs for insertion, deletion, and substitution operations by providing a tuple of weights. This example uses a higher cost for substitutions. ```python >>> Levenshtein.distance("lewenstein", "levenshtein", weights=(1,1,2)) 3 ``` -------------------------------- ### Get Edit Operations for LCSseq Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/LCSseq.html Use `editops` to retrieve a list of edit operations (delete, insert, replace) required to transform the first string into the second, based on the LCS alignment. This is useful for understanding the minimal changes needed. ```python >>> from rapidfuzz.distance import LCSseq >>> for tag, src_pos, dest_pos in LCSseq.editops("qabxcd", "abycdf"): ... print(('%7s s1[%d] s2[%d]' % (tag, src_pos, dest_pos))) delete s1[0] s2[0] delete s1[3] s2[2] insert s1[4] s2[2] insert s1[6] s2[5] ``` -------------------------------- ### Calculate Quick Ratio using QRatio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Use QRatio for a fast string similarity calculation. It's similar to fuzz.ratio but returns 0 for two empty strings. Pre-processing and score cutoff are optional. 
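To make the relationship between the two scorers concrete, here is a minimal pure-Python sketch. It assumes that `fuzz.ratio` is the normalized Indel similarity scaled to 0-100, and that `QRatio` differs only in the empty-string case described above. The names `lcs_length`, `ratio_sketch` and `qratio_sketch` are hypothetical helpers for illustration, not part of RapidFuzz.

```python
def lcs_length(s1, s2):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(s2) + 1)
    for ch1 in s1:
        curr = [0]
        for j, ch2 in enumerate(s2, start=1):
            if ch1 == ch2:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def ratio_sketch(s1, s2):
    """Assumed model of fuzz.ratio: 100 * (1 - indel_distance / (len1 + len2))."""
    total = len(s1) + len(s2)
    if total == 0:
        return 100.0  # two empty strings are treated as identical
    indel_distance = total - 2 * lcs_length(s1, s2)
    return 100.0 * (1 - indel_distance / total)


def qratio_sketch(s1, s2):
    """Same as ratio_sketch, except that two empty strings score 0."""
    if not s1 and not s2:
        return 0.0
    return ratio_sketch(s1, s2)


score = qratio_sketch("this is a test", "this is a test!")
empty_ratio = ratio_sketch("", "")
empty_qratio = qratio_sketch("", "")
```

The only difference between the two sketches is the early return for two empty inputs; every other pair goes through the same calculation.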
```python
>>> from rapidfuzz import fuzz
>>> fuzz.QRatio("this is a test", "this is a test!")
96.55171966552734
```

--------------------------------

### Get Edit Operations for Levenshtein Distance

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Levenshtein.html

Use `editops` to get the list of edit operations required to transform one string into another. Each operation reports its tag together with the position it affects in the source and in the destination string.

```python
>>> from rapidfuzz.distance import Levenshtein
>>> for tag, src_pos, dest_pos in Levenshtein.editops("qabxcd", "abycdf"):
...     print(('%7s s1[%d] s2[%d]' % (tag, src_pos, dest_pos)))
 delete s1[0] s2[0]
replace s1[3] s2[2]
 insert s1[6] s2[5]
```

--------------------------------

### Basic Usage of extractOne with ratio scorer

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/process.html

Demonstrates finding the best match with `ratio` passed explicitly as the scorer. The result is returned as a tuple of (match, score, index).

```python
>>> extractOne("abcd", ["abce"], scorer=ratio)
("abce", 75.0, 0)
```

--------------------------------

### Get Opcodes for Levenshtein Distance

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Levenshtein.html

Use `opcodes` to get a sequence of operations (equal, replace, delete, insert) that transform one string into another. Each operation covers a slice of the source string and a slice of the destination string.

```python
>>> from rapidfuzz.distance import Levenshtein
```

```python
>>> a = "qabxcd"
>>> b = "abycdf"
>>> for tag, i1, i2, j1, j2 in Levenshtein.opcodes(a, b):
...     print(('%7s a[%d:%d] (%s) b[%d:%d] (%s)' %
...            (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2])))
 delete a[0:1] (q) b[0:0] ()
  equal a[1:3] (ab) b[0:2] (ab)
replace a[3:4] (x) b[2:3] (y)
  equal a[4:6] (cd) b[3:5] (cd)
 insert a[6:6] () b[5:6] (f)
```

--------------------------------

### opcodes

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Indel.html

Returns Opcodes describing how to turn one string into another.
```APIDOC ## opcodes ### Description Return Opcodes describing how to turn s1 into s2. ### Parameters #### Path Parameters - **s1** (Sequence[Hashable]) - Required - First string to compare. - **s2** (Sequence[Hashable]) - Required - Second string to compare. - **processor** (callable, optional) - Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour. ### Returns **opcodes** (Opcodes) - edit operations required to turn s1 into s2 ### Example ```python >>> from rapidfuzz.distance import Indel >>> a = "qabxcd" >>> b = "abycdf" >>> for tag, i1, i2, j1, j2 in Indel.opcodes(a, b): ... print(('%7s a[%d:%d] (%s) b[%d:%d] (%s)' % ... (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) delete a[0:1] (q) b[0:0] () equal a[1:3] (ab) b[0:2] (ab) delete a[3:4] (x) b[2:2] () insert a[4:4] () b[2:3] (y) equal a[4:6] (cd) b[3:5] (cd) insert a[6:6] () b[5:6] (f) ``` ``` -------------------------------- ### Get Opcodes for Indel Distance Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Indel.html Use `opcodes` to get a list of tuples describing the edit script to convert one string to another. Each tuple contains the operation type and the slices of the source and destination strings involved. This provides a more detailed view of the alignment compared to `editops`. ```python from rapidfuzz.distance import Indel a = "qabxcd" b = "abycdf" for tag, i1, i2, j1, j2 in Indel.opcodes(a, b): print(("%7s a[%d:%d] (%s) b[%d:%d] (%s)" % (tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) ``` -------------------------------- ### QRatio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Calculates a quick ratio between two strings using fuzz.ratio. Since v3.0, it behaves similarly to fuzz.ratio but returns 0 for empty strings. ```APIDOC ## QRatio `rapidfuzz.fuzz.QRatio(_s1_ , _s2_ , _*_ , _processor =None_, _score_cutoff =None_)` Calculates a quick ratio between two strings using fuzz.ratio. 
Since v3.0 this behaves similar to fuzz.ratio with the exception that this returns 0 when comparing two empty strings. ### Parameters * **s1** (_Sequence_ _[__Hashable_ _]_) – First string to compare. * **s2** (_Sequence_ _[__Hashable_ _]_) – Second string to compare. * **processor** (_callable_ _,__optional_) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour. * **score_cutoff** (_float_ _,__optional_) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour. ### Returns * **similarity** – similarity between s1 and s2 as a float between 0 and 100 ### Return type float ### Examples ``` >>> fuzz.QRatio("this is a test", "this is a test!") 96.55171966552734 ``` ``` -------------------------------- ### Opcodes.as_matching_blocks() Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html Converts the opcodes into matching blocks. ```APIDOC ## Opcodes.as_matching_blocks() ### Description Converts Opcodes to matching blocks. ### Returns - **matching_blocks** (list[MatchingBlock]) - Opcodes converted to matching blocks. ``` -------------------------------- ### WRatio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Calculates a weighted ratio based on other ratio algorithms. It preprocesses strings and can apply a score cutoff. ```APIDOC ## WRatio `rapidfuzz.fuzz.WRatio(_s1_ , _s2_ , _*_ , _processor =None_, _score_cutoff =None_)` Calculates a weighted ratio based on the other ratio algorithms. ### Parameters * **s1** (_str_) – First string to compare. * **s2** (_str_) – Second string to compare. * **processor** (_callable_ _,__optional_) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour. 
* **score_cutoff** (_float_, _optional_) – Optional argument for a score threshold as a float between 0 and 100. For ratio < score_cutoff 0 is returned instead. Default is 0, which deactivates this behaviour.

### Returns
* **similarity** – similarity between s1 and s2 as a float between 0 and 100

### Return type
float
```

--------------------------------

### rapidfuzz.utils

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/index.html

Utility functions for RapidFuzz.

```APIDOC
## `default_process(sentence)`

### Description
Preprocesses a single string before matching: removes all non-alphanumeric characters, trims whitespace, and converts all characters to lower case.

### Method
`default_process(sentence)`

### Parameters
* **sentence** (str) - The string to preprocess.

### Response
Returns the processed string.
```

--------------------------------

### Get Edit Operations for Indel Distance

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Indel.html

Use `editops` to retrieve the list of edit operations (tag, source position, destination position) needed to transform one string into another under the Indel distance, which allows only insertions and deletions.

```python
from rapidfuzz.distance import Indel

for tag, src_pos, dest_pos in Indel.editops("qabxcd", "abycdf"):
    print(("%7s s1[%d] s2[%d]" % (tag, src_pos, dest_pos)))
```

--------------------------------

### WRatio

Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/fuzz.rst.txt

Calculates a weighted ratio based on multiple fuzzy matching strategies.

```APIDOC
## WRatio

### Description
Calculates a weighted ratio by combining the results of several fuzzy matching strategies. This provides a more robust similarity score.
### Method
`rapidfuzz.fuzz.WRatio(s1, s2, *, processor=None, score_cutoff=None) -> float`

### Parameters
* **s1** (str) - The first string to compare.
* **s2** (str) - The second string to compare.
* **processor** (callable | None, optional) - A function to preprocess strings before comparison.
* **score_cutoff** (float | None, optional) - A score threshold between 0 and 100. For a ratio below the threshold, 0 is returned instead.

### Returns
* float - The weighted similarity ratio, between 0 and 100.
```

--------------------------------

### Opcodes.apply()

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Applies the opcodes to a source string to transform it into a destination string.

```APIDOC
## Opcodes.apply()

### Description
Applies opcodes to source_string to transform it into the destination_string.

### Parameters
#### Path Parameters
- **source_string** (str | bytes) - The string to apply opcodes to.
- **destination_string** (str | bytes) - The string to use for replacements or insertions into source_string.

### Returns
- **mod_string** (str) - The modified source_string.
```

--------------------------------

### Import necessary functions

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/process.html

Import the functions used by the `extractOne` examples: the function itself, the scorers, and the `utils` module that provides `default_process`.

```python
from rapidfuzz import utils
from rapidfuzz.process import extractOne
from rapidfuzz.distance import Levenshtein
from rapidfuzz.fuzz import ratio
```

--------------------------------

### Invert Editops with Levenshtein

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Shows how to invert a sequence of edit operations so that it describes the transformation from the destination string back to the source string.
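The inversion rule can be sketched in plain Python by modelling each edit operation as a `(tag, src_pos, dest_pos)` tuple. This is an illustrative assumption about the rule, checked only against the `'spam'` / `'park'` outputs shown in this document; `invert_editops` is a hypothetical helper, not the library implementation.

```python
def invert_editops(editops):
    """Swap insert/delete tags and exchange the source and destination positions."""
    swapped_tag = {"insert": "delete", "delete": "insert", "replace": "replace"}
    return [(swapped_tag[tag], dest_pos, src_pos) for tag, src_pos, dest_pos in editops]


# Edit operations for 'spam' -> 'park', written as plain tuples.
forward = [("delete", 0, 0), ("replace", 3, 2), ("insert", 4, 3)]
inverse = invert_editops(forward)
```

Applying the helper twice returns the original list, which is a quick sanity check that the rule is symmetric.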
```python >>> Levenshtein.editops('spam', 'park').inverse() [Editop(tag=insert, src_pos=0, dest_pos=0), Editop(tag=replace, src_pos=2, dest_pos=3), Editop(tag=delete, src_pos=3, dest_pos=4)] ``` -------------------------------- ### Find Optimal Substring Alignment with fuzz.partial_ratio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Searches for the optimal alignment of the shorter string within the longer string and returns the fuzz.ratio for that alignment. This function is optimized for performance based on the length of the shorter string. ```python >>> fuzz.partial_ratio("this is a test", "this is a test!") 100.0 ``` -------------------------------- ### Calculate Editops with Levenshtein Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html Demonstrates how to calculate the edit operations required to transform one string into another using the Levenshtein distance. This is useful for understanding the differences between two strings. ```python >>> from rapidfuzz.distance import Levenshtein >>> Levenshtein.editops('spam', 'park') [Editop(tag=delete, src_pos=0, dest_pos=0), Editop(tag=replace, src_pos=3, dest_pos=2), Editop(tag=insert, src_pos=4, dest_pos=3)] ``` -------------------------------- ### Compare Token Sets with fuzz.token_set_ratio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Compares strings based on unique and common words using fuzz.ratio. It returns 100.0 if one string is a subset of the other, regardless of extra content. The score is reduced only when there is explicit disagreement. 
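The subset rule described above can be sketched without the library. The version below is a simplified approximation: it splits on whitespace, uses `difflib.SequenceMatcher` as a stand-in for `fuzz.ratio` (so scores for disagreeing strings will differ from RapidFuzz), and the names `stand_in_ratio` and `token_set_ratio_sketch` are hypothetical.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stand-in scorer on a 0-100 scale; NOT the RapidFuzz implementation."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_set_ratio_sketch(s1, s2):
    tokens1, tokens2 = set(s1.split()), set(s2.split())
    if not tokens1 or not tokens2:
        return 0.0
    common = sorted(tokens1 & tokens2)
    only1 = sorted(tokens1 - tokens2)
    only2 = sorted(tokens2 - tokens1)
    # One token set is contained in the other: perfect score, extra words are ignored.
    if common and (not only1 or not only2):
        return 100.0
    sect = " ".join(common)
    combined1 = " ".join(common + only1)
    combined2 = " ".join(common + only2)
    scores = [stand_in_ratio(combined1, combined2)]
    if sect:
        scores.append(stand_in_ratio(sect, combined1))
        scores.append(stand_in_ratio(sect, combined2))
    return max(scores)


subset_score = token_set_ratio_sketch("fuzzy was a bear but not a dog", "fuzzy was a bear")
duplicate_score = token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear")
disagree_score = token_set_ratio_sketch(
    "fuzzy was a bear but not a dog", "fuzzy was a bear but not a cat"
)
```

Because tokens are reduced to sets, repeated words and word order have no effect; only words that appear in exactly one of the strings can lower the score.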
```python
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
```

```python
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
```

```python
# Returns 100.0 if one string is a subset of the other, regardless of extra content in the longer string
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
100.0
```

```python
# Score is reduced only when there is explicit disagreement in the two strings
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear but not a cat")
92.3076923076923
```

--------------------------------

### default_process

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/utils.html

Preprocesses a string by removing non-alphanumeric characters, trimming whitespace, and converting to lowercase.

```APIDOC
## default_process

### Description
This function preprocesses a string by:
* removing all non alphanumeric characters
* trimming whitespaces
* converting all characters to lower case

### Parameters
#### Path Parameters
- **sentence** (str) - Required - String to preprocess

### Returns
**processed_string** (str) - processed string
```

--------------------------------

### default_process

Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/utils.rst.txt

The default_process function is used to preprocess strings before comparison. It removes non-alphanumeric characters, trims whitespace, and converts the string to lowercase.

```APIDOC
## default_process

### Description
This function preprocesses strings by removing non-alphanumeric characters, trimming leading/trailing whitespace, and converting them to lowercase.

### Signature
`rapidfuzz.utils.default_process(sentence: str) -> str`

### Parameters
#### Arguments
- **sentence** (str) - The string to process.

### Returns
- str - The processed string.
```

--------------------------------

### QRatio

Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/fuzz.rst.txt

Calculates a quick ratio between two strings using fuzz.ratio.
```APIDOC
## QRatio

### Description
Calculates a quick ratio between two strings using fuzz.ratio. Since v3.0 it behaves like fuzz.ratio, except that it returns 0 when comparing two empty strings.

### Method
`rapidfuzz.fuzz.QRatio(s1, s2, *, processor=None, score_cutoff=None) -> float`

### Parameters
* **s1** (Sequence[Hashable]) - The first string to compare.
* **s2** (Sequence[Hashable]) - The second string to compare.
* **processor** (callable | None, optional) - A function to preprocess strings before comparison.
* **score_cutoff** (float | None, optional) - A score threshold between 0 and 100. For a ratio below the threshold, 0 is returned instead.

### Returns
* float - The similarity between s1 and s2, between 0 and 100.
```

--------------------------------

### opcodes

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/LCSseq.html

Returns Opcodes describing the sequence of operations required to transform one string into another.

```APIDOC
## opcodes

### Description
Returns Opcodes describing how to turn s1 into s2.

### Parameters
#### Path Parameters
- **s1** (Sequence[Hashable]) - Required - First string to compare.
- **s2** (Sequence[Hashable]) - Required - Second string to compare.
- **processor** (callable, optional) - Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.

### Returns
**opcodes** (Opcodes) - edit operations required to turn s1 into s2

### Example
```python
>>> from rapidfuzz.distance import LCSseq
>>> a = "qabxcd"
>>> b = "abycdf"
>>> for tag, i1, i2, j1, j2 in LCSseq.opcodes(a, b):
...     print(('%7s a[%d:%d] (%s) b[%d:%d] (%s)' %
... 
(tag, i1, i2, a[i1:i2], j1, j2, b[j1:j2]))) delete a[0:1] (q) b[0:0] () equal a[1:3] (ab) b[0:2] (ab) delete a[3:4] (x) b[2:2] () insert a[4:4] () b[2:3] (y) equal a[4:6] (cd) b[3:5] (cd) insert a[6:6] () b[5:6] (f) ``` ``` -------------------------------- ### Editop Class Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/distance/index.rst.txt Documentation for the Editop class, which represents edit operations. ```APIDOC ## rapidfuzz.distance.Editop ### Description Provides details about the Editop class. ### Members (Members are documented in the source) ``` -------------------------------- ### Opcodes.from_editops() Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html Creates Opcodes from a given Editops object. ```APIDOC ## Opcodes.from_editops(_editops_) ### Description Creates Opcodes from Editops. ### Parameters #### Path Parameters - **editops** (Editops) - The Editops object to convert to opcodes. ### Returns - **opcodes** (Opcodes) - Editops converted to Opcodes. ``` -------------------------------- ### Token Sort Ratio Comparison Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Calculates the similarity between two strings by sorting their words alphabetically and then applying the standard ratio. Useful when word order is not important. ```python >>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear") 100.0 ``` -------------------------------- ### rapidfuzz.process Source: https://rapidfuzz.github.io/RapidFuzz/Usage/index.html Functions for processing lists of strings to find the best matches. ```APIDOC ## `cdist()` ### Description Calculates the pairwise distance between all strings in two lists. ### Method `cdist(list1, list2, scorer=, **kwargs)` ### Parameters * **list1** (list) - The first list of strings. * **list2** (list) - The second list of strings. * **scorer** (function) - The scoring function to use (default: `rapidfuzz.fuzz.ratio`). 
* **kwargs** - Additional keyword arguments to pass to the scorer.

### Response
Returns a matrix (NumPy array) of the pairwise scores, with one row per string in the first list and one column per string in the second list.
```

```APIDOC
## `cpdist(queries, choices, scorer=, **kwargs)`

### Description
Calculates the distance or similarity between corresponding elements of two sequences, comparing `queries[i]` with `choices[i]`.

### Method
`cpdist(queries, choices, scorer=, **kwargs)`

### Parameters
* **queries** (list) - The first list of strings.
* **choices** (list) - The second list of strings, with the same length as `queries`.
* **scorer** (function) - The scoring function to use (default: `rapidfuzz.fuzz.ratio`).
* **kwargs** - Additional keyword arguments to pass to the scorer.

### Response
Returns a NumPy array with one score per pair.
```

```APIDOC
## `extract(query, choices, scorer=, limit=5, **kwargs)`

### Description
Extracts the best matching strings from a list of choices for a given query.

### Method
`extract(query, choices, scorer=, limit=5, **kwargs)`

### Parameters
* **query** (str) - The string to search for.
* **choices** (list) - A list of strings to search within.
* **scorer** (function) - The scoring function to use (default: `rapidfuzz.fuzz.WRatio`).
* **limit** (int) - The maximum number of matches to return (default: 5).
* **kwargs** - Additional keyword arguments to pass to the scorer.

### Response
Returns a list of tuples sorted by similarity, where each tuple contains the matching string, its score, and its index in the original choices list.
```

```APIDOC
## `extract_iter(query, choices, scorer=, **kwargs)`

### Description
An iterator that yields the matching strings from a list of choices for a given query. Unlike `extract`, it takes no `limit` and does not sort its results.

### Method
`extract_iter(query, choices, scorer=, **kwargs)`

### Parameters
* **query** (str) - The string to search for.
* **choices** (list) - A list of strings to search within.
* **scorer** (function) - The scoring function to use (default: `rapidfuzz.fuzz.WRatio`).
* **kwargs** - Additional keyword arguments to pass to the scorer.

### Response
Yields tuples in the order of the choices, where each tuple contains the matching string, its score, and its index in the original choices list.
```

```APIDOC
## `extractOne(query, choices, scorer=, **kwargs)`

### Description
Extracts the single best matching string from a list of choices for a given query.

### Method
`extractOne(query, choices, scorer=, **kwargs)`

### Parameters
* **query** (str) - The string to search for.
* **choices** (list) - A list of strings to search within.
* **scorer** (function) - The scoring function to use (default: `rapidfuzz.fuzz.WRatio`).
* **kwargs** - Additional keyword arguments to pass to the scorer.

### Response
Returns a tuple containing the best matching string, its score, and its index in the original choices list, or None if no match is found (for example, when choices is empty).
```

--------------------------------

### Opcodes.as_list()

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Converts the opcodes into a list of tuples, compatible with difflib's SequenceMatcher.

```APIDOC
## Opcodes.as_list()

### Description
Converts Opcodes to a list of tuples, compatible with the opcodes of difflib's SequenceMatcher.

### Returns
- A list of opcode tuples.
```

--------------------------------

### Opcodes.copy()

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Creates a copy of the Opcodes object.

```APIDOC
## Opcodes.copy()

### Description
Performs a copy of the Opcodes object.

### Returns
- A new Opcodes object that is a copy of the original.
```

--------------------------------

### Opcodes.as_editops()

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Converts the opcodes into the equivalent Editops.

```APIDOC
## Opcodes.as_editops()

### Description
Converts Opcodes to Editops.

### Returns
- **editops** (Editops) - Opcodes converted to Editops.
``` -------------------------------- ### Normalized Levenshtein Similarity with Custom Weights Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Levenshtein.html It is possible to select different weights for insertion, deletion, and substitution by passing a weight tuple. This allows for custom cost calculations in the similarity metric. ```python Levenshtein.normalized_similarity("lewenstein", "levenshtein", weights=(1,1,2)) ``` -------------------------------- ### Postfix Distance Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/distance/index.rst.txt Documentation for the Postfix distance calculation. ```APIDOC ## rapidfuzz.distance.Postfix ### Description Calculates the Postfix distance between two strings. (Further details and methods would be listed here if available in the source) ``` -------------------------------- ### similarity Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/distance/Prefix.rst.txt Calculates the prefix similarity between two strings. ```APIDOC ## similarity ### Description Calculates the prefix similarity between two strings. ### Signature `similarity(s1: str, s2: str) -> int` ``` -------------------------------- ### partial_token_set_ratio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Compares strings based on unique and common words using fuzz.partial_ratio. It preprocesses strings and can apply a score cutoff. ```APIDOC ## partial_token_set_ratio Compares the words in the strings based on unique and common words between them using fuzz.partial_ratio. ### Parameters * **s1** (str) – First string to compare. * **s2** (str) – Second string to compare. * **processor** (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None. * **score_cutoff** (float, optional) – Optional argument for a score threshold between 0 and 100. For ratio < score_cutoff, 0 is returned. Default is 0. 
``` -------------------------------- ### editops Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/Indel.html Returns Editops describing how to turn one string into another. ```APIDOC ## editops ### Description Return Editops describing how to turn s1 into s2. ### Parameters #### Path Parameters - **s1** (Sequence[Hashable]) - Required - First string to compare. - **s2** (Sequence[Hashable]) - Required - Second string to compare. - **processor** (callable, optional) - Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour. ### Returns **editops** (Editops) - edit operations required to turn s1 into s2 ### Example ```python >>> from rapidfuzz.distance import Indel >>> for tag, src_pos, dest_pos in Indel.editops("qabxcd", "abycdf"): ... print(('%7s s1[%d] s2[%d]' % (tag, src_pos, dest_pos))) delete s1[0] s2[0] delete s1[3] s2[2] insert s1[4] s2[2] insert s1[6] s2[5] ``` ``` -------------------------------- ### Inverse Opcodes for String Transformation Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html Invert the opcodes to describe how to transform the destination string back into the source string. This is useful for understanding the reverse transformation. ```python >>> Levenshtein.opcodes('spam', 'park').inverse() [Opcode(tag=insert, src_start=0, src_end=0, dest_start=0, dest_end=1), Opcode(tag=equal, src_start=0, src_end=2, dest_start=1, dest_end=3), Opcode(tag=replace, src_start=2, src_end=3, dest_start=3, dest_end=4), Opcode(tag=delete, src_start=3, src_end=4, dest_start=4, dest_end=4)] ``` -------------------------------- ### rapidfuzz.fuzz.partial_ratio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html Finds the optimal alignment of the shorter string within the longer string and returns the fuzz.ratio for that alignment. This is useful for finding substrings. 
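The alignment search described above can be illustrated with a brute-force sliding window. This is a simplified sketch only: it uses `difflib.SequenceMatcher` as a stand-in scorer instead of `fuzz.ratio`, it checks only full-length windows, and the names `stand_in_ratio` and `partial_ratio_sketch` are hypothetical. The library itself uses optimized implementations.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stand-in scorer on a 0-100 scale; NOT the RapidFuzz implementation."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def partial_ratio_sketch(s1, s2):
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0.0  # this sketch simply gives up on empty input
    window = len(shorter)
    best = 0.0
    # Compare the shorter string against every window of the same length in the longer one.
    for start in range(len(longer) - window + 1):
        best = max(best, stand_in_ratio(shorter, longer[start:start + window]))
    return best


suffix_case = partial_ratio_sketch("this is a test", "this is a test!")
embedded_case = partial_ratio_sketch("abc", "xxabcxx")
```

Both calls find a window that equals the shorter string exactly, so the best alignment scores 100.0 regardless of the surrounding characters.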
```APIDOC ## rapidfuzz.fuzz.partial_ratio ### Description Searches for the optimal alignment of the shorter string in the longer string and returns the fuzz.ratio for this alignment. ### Parameters * **s1** (Sequence[Hashable]) - The first string to compare. * **s2** (Sequence[Hashable]) - The second string to compare. * **processor** (callable, optional) - A callable to preprocess strings before comparison. Defaults to None. * **score_cutoff** (float, optional) - A score threshold between 0 and 100. If the ratio is less than this value, 0 is returned. Defaults to 0. ### Returns * **similarity** (float) - The similarity score between s1 and s2, ranging from 0 to 100. ### Notes Different implementations are used for short and long needles to optimize performance. ### Example ```python >>> fuzz.partial_ratio("this is a test", "this is a test!") 100.0 ``` ``` -------------------------------- ### partial_token_ratio Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html A helper method that returns the maximum of fuzz.partial_token_set_ratio and fuzz.partial_token_sort_ratio, providing a faster comparison. ```APIDOC ## partial_token_ratio Helper method that returns the maximum of fuzz.partial_token_set_ratio and fuzz.partial_token_sort_ratio (faster than manually executing the two functions). ### Parameters * **s1** (str) – First string to compare. * **s2** (str) – Second string to compare. * **processor** (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None. * **score_cutoff** (float, optional) – Optional argument for a score threshold between 0 and 100. For ratio < score_cutoff, 0 is returned. Default is 0. ``` -------------------------------- ### extractOne with score_cutoff Source: https://rapidfuzz.github.io/RapidFuzz/Usage/process.html Shows how to use score_cutoff to filter out matches below a certain threshold. Returns None if no match meets the cutoff. 
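The cutoff behaviour can be sketched as a plain loop. The helper below is an assumption-laden illustration, not the library code: `extract_one_sketch` and `stand_in_ratio` are hypothetical names, and `difflib.SequenceMatcher` stands in for the real scorer.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stand-in scorer on a 0-100 scale; NOT the RapidFuzz implementation."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def extract_one_sketch(query, choices, scorer=stand_in_ratio, score_cutoff=None):
    """Return (choice, score, index) for the best choice, or None if nothing qualifies."""
    best = None
    for index, choice in enumerate(choices):
        score = scorer(query, choice)
        if score_cutoff is not None and score < score_cutoff:
            continue  # below the threshold: treated as no match
        if best is None or score > best[1]:
            best = (choice, score, index)
    return best


without_cutoff = extract_one_sketch("abcd", ["abce"])
with_cutoff = extract_one_sketch("abcd", ["abce"], score_cutoff=80)
```

With a single choice scoring 75.0, a cutoff of 80 leaves no candidate, so the sketch returns None, mirroring the result described above.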
```python
>>> extractOne("abcd", ["abce"], scorer=ratio, score_cutoff=80)
None
```

--------------------------------

### Jaro Distance

Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/distance/index.rst.txt

Documentation for the Jaro distance calculation.

```APIDOC
## rapidfuzz.distance.Jaro

### Description
Calculates the Jaro distance between two strings.
```

--------------------------------

### token_ratio

Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/fuzz.rst.txt

Returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio.

```APIDOC
## token_ratio

### Description
Helper method that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio. It is faster than calling the two functions separately.

### Method
`rapidfuzz.fuzz.token_ratio(s1, s2, *, processor=None, score_cutoff=None) -> float`

### Parameters
* **s1** (str) - The first string to compare.
* **s2** (str) - The second string to compare.
* **processor** (callable | None, optional) - A function to preprocess strings before comparison.
* **score_cutoff** (float | None, optional) - A score threshold between 0 and 100. For a ratio below the threshold, 0 is returned instead.

### Returns
* float - The similarity ratio based on token matching, between 0 and 100.
```

--------------------------------

### Preprocess strings before comparison

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/process.html

Use the `processor` argument to apply a preprocessing function to both the query and the choices before they are compared, for example to make matching case-insensitive.
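The effect of a processor can be sketched with a small helper that applies the same callable to both inputs before scoring. `score_with_processor` and `stand_in_ratio` are hypothetical names, and `difflib.SequenceMatcher` is used as a stand-in scorer, so this only illustrates the idea rather than the library's behaviour in detail.

```python
from difflib import SequenceMatcher


def stand_in_ratio(a, b):
    """Stand-in scorer on a 0-100 scale; NOT the RapidFuzz implementation."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def score_with_processor(query, choice, processor=None):
    """Apply the processor to both strings, then score the processed values."""
    if processor is not None:
        query, choice = processor(query), processor(choice)
    return stand_in_ratio(query, choice)


raw_score = score_with_processor("abcd", "abcD")
upper_score = score_with_processor("abcd", "abcD", processor=str.upper)
trimmed_score = score_with_processor(" abcd ", "ABCD", processor=lambda s: s.strip().lower())
```

Without a processor the differing case of the last character costs a quarter of the score; with a case-normalizing processor the two strings compare as equal.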
```python
from rapidfuzz import utils
from rapidfuzz.fuzz import ratio
from rapidfuzz.process import extractOne
extractOne("abcd", ["abcD"], scorer=ratio)  # no processor: the comparison is case-sensitive
extractOne("abcd", ["abcD"], scorer=ratio, processor=utils.default_process)  # built-in processor, lowercases both sides
extractOne("abcd", ["abcD"], scorer=ratio, processor=lambda s: s.upper())  # custom processor
```

--------------------------------

### token_ratio

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html

A helper method that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio, which is faster than calling the two functions separately.

```APIDOC
## token_ratio
Helper method that returns the maximum of fuzz.token_set_ratio and fuzz.token_sort_ratio (faster than manually executing the two functions).

### Parameters
* **s1** (str) – First string to compare.
* **s2** (str) – Second string to compare.
* **processor** (callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None.
* **score_cutoff** (float, optional) – Optional argument for a score threshold between 0 and 100. For ratio < score_cutoff, 0 is returned. Default is 0.
```

--------------------------------

### rapidfuzz.fuzz.token_set_ratio

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html

Compares strings based on the set of words they contain, using fuzz.ratio on common and unique words. It is robust to word order and duplicates.

```APIDOC
## rapidfuzz.fuzz.token_set_ratio

### Description
Compares the words in the strings based on unique and common words between them using fuzz.ratio.

### Parameters
* **s1** (str) - The first string to compare.
* **s2** (str) - The second string to compare.
* **processor** (callable, optional) - A callable to preprocess strings before comparison. Defaults to None.
* **score_cutoff** (float, optional) - A score threshold between 0 and 100. If the ratio is less than this value, 0 is returned. Defaults to 0.

### Returns
* **similarity** (float) - The similarity score between s1 and s2, ranging from 0 to 100.

### Notes
Returns 100.0 if the words of one string are a subset of the words of the other, regardless of extra content in the longer string. The score is reduced only when the two strings explicitly disagree.

### Examples
```python
>>> from rapidfuzz import fuzz
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
83.8709716796875
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100.0
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear")
100.0
>>> fuzz.token_set_ratio("fuzzy was a bear but not a dog", "fuzzy was a bear but not a cat")
92.3076923076923
```
```

--------------------------------

### similarity

Source: https://rapidfuzz.github.io/RapidFuzz/_sources/Usage/distance/Jaro.rst.txt

Calculates the Jaro similarity between two strings.

```APIDOC
## similarity

### Description
Calculates the Jaro similarity between two strings.

### Signature
`rapidfuzz.distance.Jaro.similarity(s1, s2, *, processor=None, score_cutoff=None) -> float`
```

--------------------------------

### rapidfuzz.process.extract

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/process.html

Finds the best matches in a list of choices and sorts them by similarity. It supports custom scorers, processors, limits, and score cutoffs.

```APIDOC
## extract

### Description
Finds the best matches in a list of choices. The list is sorted by similarity. When multiple choices have the same similarity, they are sorted by their index.

### Parameters
* **query** (Sequence[Hashable]) – string we want to find
* **choices** (Collection[Sequence[Hashable]] | Mapping[Sequence[Hashable]]) – list of all strings the query should be compared with, or dict with a mapping {<result>: <string to compare>}
* **scorer** (Callable, optional) – Optional callable that is used to calculate the matching score between the query and each choice.
  This can be any of the scorers included in RapidFuzz (both scorers that calculate the edit distance or the normalized edit distance), or a custom function, which returns a normalized edit distance. fuzz.WRatio is used by default.
* **processor** (Callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
* **limit** (int, optional) – maximum number of results to return. None can be passed to disable this behavior. Default is 5.
* **score_cutoff** (Any, optional) – Optional argument for a score threshold. When an edit distance is used this represents the maximum edit distance and matches with a distance > score_cutoff are ignored. When a normalized edit distance is used this represents the minimal similarity and matches with a similarity < score_cutoff are ignored. Default is None, which deactivates this behaviour.
* **score_hint** (Any, optional) – Optional argument for an expected score to be passed to the scorer. This is used to select a faster implementation. Default is None, which deactivates this behaviour.
* **scorer_kwargs** (dict[str, Any], optional) – any other named parameters are passed to the scorer. This can be used to pass e.g. weights to Levenshtein.distance.

### Returns
The return type is always a List of Tuples with 3 elements. However, the values stored in the tuple depend on the types of the input arguments.

* The first element is always the choice, which is the value that’s compared to the query.
* The second value represents the similarity calculated by the scorer. This can be:
  * An edit distance (distance is 0 for a perfect match and > 0 for non perfect matches). In this case only choices which have a distance <= score_cutoff are returned. An example of a scorer with this behavior is Levenshtein.distance.
  * A normalized edit distance (similarity is a score between 0 and 100, with 100 being a perfect match).
    In this case only choices which have a similarity >= score_cutoff are returned. An example of a scorer with this behavior is Levenshtein.normalized_similarity.

  Note that for scorers not provided by RapidFuzz, only normalized edit distances are supported.
* The third element depends on the type of the choices argument. It is:
  * The index of choice when choices is a simple iterable like a list
  * The key of choice when choices is a mapping like a dict, or a pandas Series

The list is sorted by similarity or distance depending on the scorer used. The first element in the list has the highest similarity/smallest distance.

### Return type
list[tuple[Sequence[Hashable], Any, Any]]
```

--------------------------------

### Editops.copy

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Creates a copy of the Editops object.

```APIDOC
## copy

### Description
Creates a copy of the Editops object.

### Returns
* **a copy of Editops**

### Return type
Editops
```

--------------------------------

### MIT License Text

Source: https://rapidfuzz.github.io/RapidFuzz/License.html

This is the full text of the MIT license agreement. It grants broad permissions for use, modification, and distribution of the software, with the condition that the copyright notice and permission notice are included in all copies or substantial portions of the Software.
```plaintext
Copyright © 2020-present Max Bachmann
Copyright © 2011 Adam Cohen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
```

--------------------------------

### rapidfuzz.process.cpdist

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/process.html

Computes the pairwise distance/similarity between corresponding elements of the queries and choices collections.

```APIDOC
## cpdist
rapidfuzz.process.cpdist(queries, choices, *, scorer=fuzz.ratio, processor=None, score_cutoff=None, score_hint=None, score_multiplier=1, dtype=None, workers=1, **kwargs)

### Description
Compute the pairwise distance/similarity between corresponding elements of the queries & choices.

### Parameters
* **queries** (Collection[Sequence[Hashable]]) – list of strings used to compute the distance/similarity.
* **choices** (Collection[Sequence[Hashable]]) – list of strings the queries should be compared with. Must be the same length as the queries.
* **scorer** (Callable, optional) – Optional callable that is used to calculate the matching score between the query and each choice. This can be any of the scorers included in RapidFuzz (both scorers that calculate the edit distance or the normalized edit distance), or a custom function, which returns a normalized edit distance. fuzz.ratio is used by default.
* **processor** (Callable, optional) – Optional callable that is used to preprocess the strings before comparing them. Default is None, which deactivates this behaviour.
* **score_cutoff** (Any, optional) – Optional argument for a score threshold to be passed to the scorer. Default is None, which deactivates this behaviour.
* **score_hint** (Any, optional) – Optional argument for an expected score to be passed to the scorer. This is used to select a faster implementation. Default is None, which deactivates this behaviour.
* **score_multiplier** (Any, optional) – Optional argument to multiply the calculated score with. This is applied as the final step, so e.g. score_cutoff is applied on the unmodified score. This is mostly useful to map from a floating point range to an integer to reduce the memory usage. Default is 1, which deactivates this behaviour.
* **dtype** (data-type, optional) – The desired data-type for the result array. Depending on the scorer type the following dtypes are supported:
  * similarity:
    - np.float32, np.float64
    - np.uint8 -> stores fixed point representation of the result scaled to a range 0-100
  * distance:
    - np.int8, np.int16, np.int32, np.int64

  If not given, then the type will be np.float32 for similarities and np.int32 for distances.
* **workers** (int, optional) – The calculation is subdivided into workers sections and evaluated in parallel. Supply -1 to use all available CPU cores. This argument is only available for scorers using the RapidFuzz C-API so far, since it releases the Python GIL.
* **scorer_kwargs** (dict[str, Any], optional) – any other named parameters are passed to the scorer. This can be used to pass e.g. weights to Levenshtein.distance

### Returns
Returns a matrix of size (n x 1) with the requested dtype, holding the distance/similarity between each pair of corresponding elements of the two input collections.

### Return type
ndarray
```

--------------------------------

### Calculate Normalized Indel Similarity using fuzz.ratio

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/fuzz.html

Calculates the normalized Indel similarity between two strings. An optional processor can be used for preprocessing, and a score_cutoff can be set to return 0 if the similarity is below the threshold.

```python
>>> from rapidfuzz import fuzz
>>> fuzz.ratio("this is a test", "this is a test!")
96.55171966552734
```

--------------------------------

### Editops.apply

Source: https://rapidfuzz.github.io/RapidFuzz/Usage/distance/index.html

Applies a sequence of edit operations to a source string to transform it into a destination string.

```APIDOC
## apply

### Description
Applies editops to source_string.

### Parameters
* **source_string** (str | bytes) – string to apply editops to
* **destination_string** (str | bytes) – string to use for replacements / insertions into source_string

### Returns
* **mod_string** – modified source_string

### Return type
str
```
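
To make the semantics of applying edit operations concrete, the following is a minimal pure-Python sketch. It is not RapidFuzz's implementation: `apply_editops` is a hypothetical helper written for illustration, and it assumes the operations arrive as `(tag, src_pos, dest_pos)` triples sorted by source position, which is the shape printed in the `editops` examples in this reference. The sample triples are copied from the LCSseq `editops` output for "qabxcd" and "abycdf".

```python
def apply_editops(editops, source_string, destination_string):
    """Hypothetical helper: apply (tag, src_pos, dest_pos) triples to source_string."""
    pieces = []
    src_idx = 0  # next unread position in source_string
    for tag, src_pos, dest_pos in editops:
        # Copy the unchanged characters that precede this operation.
        pieces.append(source_string[src_idx:src_pos])
        src_idx = src_pos
        if tag == "delete":
            src_idx += 1  # drop one source character
        elif tag == "insert":
            pieces.append(destination_string[dest_pos])  # take one character from the destination
        elif tag == "replace":
            pieces.append(destination_string[dest_pos])  # substitute one source character
            src_idx += 1
        else:
            raise ValueError(f"unknown edit operation: {tag!r}")
    # Copy whatever remains after the last operation.
    pieces.append(source_string[src_idx:])
    return "".join(pieces)


# Triples taken from the LCSseq.editops("qabxcd", "abycdf") example output.
ops = [("delete", 0, 0), ("delete", 3, 2), ("insert", 4, 2), ("insert", 6, 5)]
result = apply_editops(ops, "qabxcd", "abycdf")
print(result)
```

Applying the complete list of operations reproduces the destination string. With the library itself, the corresponding call would be the `apply` method on the Editops object described above, passing the same source and destination strings.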