FuzzyWuzzy Python Library
Introduction
FuzzyWuzzy is a Python library for fuzzy string matching, that is, finding strings that approximately (rather than exactly) match a given pattern. It uses Levenshtein distance to measure the differences between sequences.
FuzzyWuzzy was developed and open-sourced by SeatGeek, a service for finding sports and concert tickets. Their blog post discusses the original use case.
FuzzyWuzzy is a popular choice for string matching: each comparison yields a score from 0 to 100 indicating how similar the two strings are.
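To make the Levenshtein distance mentioned above concrete, here is a minimal sketch of the classic dynamic-programming computation. The function name levenshtein is hypothetical and this is not FuzzyWuzzy's own code; the library reports a normalized 0 to 100 similarity rather than the raw edit count shown here.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn a into b (illustrative helper)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A distance of 0 means the strings are identical; the larger the distance, the less similar they are.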
Installation
FuzzyWuzzy can be installed with pip:
    pip install fuzzywuzzy
The scoring functions can then be imported as follows:
    from fuzzywuzzy import fuzz
Parameters
The scoring functions in fuzz (ratio, partial_ratio, token_sort_ratio, token_set_ratio and WRatio) each take two parameters:
- s1: the first string to compare.
- s2: the second string to compare.
Return Value
Each function returns an integer score from 0 to 100, where 100 means the two strings are considered identical.
Simple Ratio
The ratio function compares two strings in full and returns a score describing how similar they are. The comparison is case-sensitive and counts whitespace.
    fuzz.ratio('abcdefghijklm', 'abcdeijklm')  # 87
    # Exact match
    fuzz.ratio('boardinfinity', 'boardinfinity')  # 100
    fuzz.ratio('abcde fgh ijklm', 'Abcde Fgh Ijklm ')  # 77
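As a rough illustration of where such a score comes from, the sketch below uses the standard library's difflib.SequenceMatcher, which FuzzyWuzzy falls back to when the optional python-Levenshtein package is not installed. The helper name simple_ratio_sketch is hypothetical, and the rounding is an assumption about how the 0 to 1 ratio becomes an integer score.

```python
from difflib import SequenceMatcher


def simple_ratio_sketch(a: str, b: str) -> int:
    """Scale SequenceMatcher's 0..1 similarity to an integer 0..100 score.

    The ratio is 2 * M / T, where M is the number of matching characters
    and T is the combined length of both strings.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


print(simple_ratio_sketch("abcdefghijklm", "abcdeijklm"))        # 87
print(simple_ratio_sketch("boardinfinity", "boardinfinity"))     # 100
```

In the first call, 10 of the characters match and the combined length is 23, so the score is 2 * 10 / 23, or about 87.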
Partial Ratio
The partial_ratio function compares the shorter string against the best-matching part of the longer one, so a string that appears inside a longer string still scores high.
    fuzz.partial_ratio("abcdefghijklm", "abcdefghijklm#!")  # 100
    fuzz.partial_ratio("abcde fgh ijklm", "abcde ijklm")  # 64
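The idea behind partial matching can be sketched with the standard library alone. The version below slides a window the length of the shorter string across the longer one and keeps the best score. This is a simplification: FuzzyWuzzy chooses its candidate windows differently, so scores may not always agree, and the function names here are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio_sketch(a: str, b: str) -> int:
    """Integer 0..100 similarity of two whole strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best score of the shorter string against any equally long
    slice of the longer string (brute-force illustration)."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # assumption: an empty string matches nothing
    window = len(shorter)
    return max(
        simple_ratio_sketch(shorter, longer[start:start + window])
        for start in range(len(longer) - window + 1)
    )


print(partial_ratio_sketch("abcdefghijklm", "abcdefghijklm#!"))  # 100
```

Because only a slice of the longer string is compared, the trailing "#!" does not reduce the score.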
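Token Sort Ratio
The token_sort_ratio function splits each string into tokens (words), sorts the tokens alphabetically, rejoins them and then compares the results, so word order does not affect the score.
    fuzz.token_sort_ratio("abcdefghijklm", "abcdefghijklm")  # 100
    fuzz.token_sort_ratio("abcde fgh ijklm", "abcde fgh fgh ijklm")  # 88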
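The sort-then-compare step can be sketched as follows, again using only difflib. The helper names are hypothetical, and the normalization is reduced to lowercasing and whitespace splitting; the library's own preprocessing also handles punctuation.

```python
from difflib import SequenceMatcher


def simple_ratio_sketch(a: str, b: str) -> int:
    """Integer 0..100 similarity of two whole strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio_sketch(a: str, b: str) -> int:
    """Compare two strings after sorting their words alphabetically."""
    def normalize(text: str) -> str:
        # Lowercase, split on whitespace, sort the tokens, rejoin them.
        return " ".join(sorted(text.lower().split()))

    return simple_ratio_sketch(normalize(a), normalize(b))


print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))  # 100
print(token_sort_ratio_sketch("abcde fgh ijklm", "abcde fgh fgh ijklm"))            # 88
```

The first pair differs only in word order, so both strings normalize to the same text and score 100.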
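WRatio
WRatio (weighted ratio) first preprocesses both strings, lowercasing them and ignoring punctuation, and then combines several of the ratios above with different weights. As a result, differences in letter case or punctuation do not lower the score.
    fuzz.WRatio("abcdefghijklm", "ABCDEFGHIJKLM")  # 100
    fuzz.WRatio("abcde fgh ijklm", "abcde fgh ijklm&&")  # 100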
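The preprocessing step explains why both WRatio calls above score 100. The sketch below approximates it with a regular expression; the function name preprocess_sketch and the exact character class are assumptions, and the weighting of the individual ratios is not reproduced here.

```python
import re


def preprocess_sketch(text: str) -> str:
    """Lowercase the text, turn runs of non-alphanumeric characters
    into single spaces and trim the ends (approximate normalization)."""
    return re.sub(r"[^0-9a-z]+", " ", text.lower()).strip()


print(preprocess_sketch("ABCDEFGHIJKLM"))      # abcdefghijklm
print(preprocess_sketch("abcde fgh ijklm&&"))  # abcde fgh ijklm
```

After this normalization each pair of inputs becomes identical, which is consistent with the scores of 100.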
Example: Let's take a full code example that compares two strings with each of the library's ratios.
    # Python code showing all the ratios together.
    # Make sure the fuzzywuzzy module is installed.
    from fuzzywuzzy import fuzz

    s1 = "abcdefghijklm"
    s2 = "abcdefghijklm"

    print("FuzzyWuzzy Ratio:", fuzz.ratio(s1, s2))
    print("FuzzyWuzzy PartialRatio:", fuzz.partial_ratio(s1, s2))
    print("FuzzyWuzzy TokenSortRatio:", fuzz.token_sort_ratio(s1, s2))
    print("FuzzyWuzzy TokenSetRatio:", fuzz.token_set_ratio(s1, s2))
    print("FuzzyWuzzy WRatio:", fuzz.WRatio(s1, s2))
Output:
    FuzzyWuzzy Ratio: 100
    FuzzyWuzzy PartialRatio: 100
    FuzzyWuzzy TokenSortRatio: 100
    FuzzyWuzzy TokenSetRatio: 100
    FuzzyWuzzy WRatio: 100
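The full example also calls fuzz.token_set_ratio, which the sections above do not describe. Roughly, it treats each string as a set of tokens, separates the shared tokens from the ones unique to each string, and takes the best of three pairwise comparisons. The following is a minimal sketch of that idea under simplified preprocessing; the helper names are hypothetical and the result may differ from the library in edge cases.

```python
from difflib import SequenceMatcher


def simple_ratio_sketch(a: str, b: str) -> int:
    """Integer 0..100 similarity of two whole strings."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_set_ratio_sketch(a: str, b: str) -> int:
    """Compare strings as token sets, ignoring order and repeated words."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())

    common = " ".join(sorted(tokens_a & tokens_b))  # tokens in both strings
    only_a = " ".join(sorted(tokens_a - tokens_b))  # tokens unique to a
    only_b = " ".join(sorted(tokens_b - tokens_a))  # tokens unique to b

    combined_a = (common + " " + only_a).strip()
    combined_b = (common + " " + only_b).strip()

    # The best of the three comparisons becomes the score.
    return max(
        simple_ratio_sketch(common, combined_a),
        simple_ratio_sketch(common, combined_b),
        simple_ratio_sketch(combined_a, combined_b),
    )


print(token_set_ratio_sketch("abcde fgh ijklm", "abcde fgh fgh ijklm"))  # 100
```

Here the repeated "fgh" disappears when the tokens are collected into a set, so the pair that scored 88 with token sorting scores 100.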