FuzzyWuzzy Python Library

Introduction

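Python's FuzzyWuzzy library is used for fuzzy string matching: finding strings that approximately, rather than exactly, match a given pattern. Under the hood, it measures how different two sequences are using Levenshtein distance, the minimum number of single-character insertions, deletions and substitutions needed to turn one sequence into the other.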
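To make the idea concrete, here is a minimal pure-Python sketch of the classic Levenshtein distance, computed with dynamic programming over two rows. It is for illustration only: FuzzyWuzzy itself delegates this work to python-Levenshtein (or to difflib as a fallback), and the function name `levenshtein` is our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the previous prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))            # 3
print(levenshtein("abcdefghijklm", "abcdeijklm"))  # 3 (delete "fgh")
```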

FuzzyWuzzy was developed at SeatGeek, a service for finding sports and concert tickets, and later released to the public as open source. A SeatGeek blog post discusses their initial use case.

FuzzyWuzzy is a widely used library for string matching: each of its scoring functions returns a similarity score from 0 to 100, where 100 is a perfect match.

Installation Syntax

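The FuzzyWuzzy library can be installed with pip. The python-Levenshtein package is optional but recommended: without it, FuzzyWuzzy falls back to the slower difflib.SequenceMatcher from the standard library and emits a warning.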

pip install fuzzywuzzy
pip install python-Levenshtein
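If you are unsure whether the optional speed-up is installed, one quick check is to look for the `Levenshtein` module that the python-Levenshtein package provides. This is a small sketch using only the standard library; the variable name `has_fast_backend` is illustrative.

```python
import importlib.util

# python-Levenshtein installs a module named "Levenshtein";
# FuzzyWuzzy uses it when present and falls back to difflib otherwise.
has_fast_backend = importlib.util.find_spec("Levenshtein") is not None
print("python-Levenshtein available:", has_fast_backend)
```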

The two main modules are imported as follows: fuzz contains the scoring functions, and process applies them to lists of choices.

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

Parameters

The scoring functions in fuzz covered below (ratio, partial_ratio, token_sort_ratio, token_set_ratio and WRatio) take two strings as their main parameters; some also accept optional flags such as full_process.

  • s1: the first string to compare.
  • s2: the second string to compare.

Return Value

Each function returns an integer between 0 and 100 indicating how similar the two strings are, where 100 is a perfect match.

Simple Ratio

fuzz.ratio() compares the two strings as whole sequences and returns a score of how similar they are. It is case-sensitive and does no preprocessing, so differences in capitalization or whitespace lower the score.

fuzz.ratio('abcdefghijklm', 'abcdeijklm')

87


# Exact match

fuzz.ratio('boardinfinity', 'boardinfinity')

100


fuzz.ratio('abcde fgh ijklm', 'Abcde Fgh Ijklm ')

77
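The score from ratio() is the similarity ratio 2*M/T scaled to 100 and rounded, where M is the number of matching characters and T is the combined length of both strings. The sketch below reproduces this with the standard library's difflib.SequenceMatcher, which is the fallback FuzzyWuzzy uses when python-Levenshtein is not installed; results can occasionally differ slightly from that backend, and `simple_ratio` is a hypothetical helper name, not part of FuzzyWuzzy.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Approximate fuzz.ratio(): 100 * 2*M / T, rounded to an integer."""
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matched characters and T is len(s1) + len(s2).
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(simple_ratio("abcdefghijklm", "abcdeijklm"))          # 87 (2*10/23)
print(simple_ratio("boardinfinity", "boardinfinity"))       # 100
print(simple_ratio("abcde fgh ijklm", "Abcde Fgh Ijklm "))  # 77 (2*12/31)
```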

Partial Ratio

fuzz.partial_ratio() compares the shorter string against the best-matching substring of the same length in the longer string, so extra characters in the longer string do not lower the score.

fuzz.partial_ratio("abcdefghijklm", "abcdefghijklm#!")

100


fuzz.partial_ratio("abcde fgh ijklm", "abcde ijklm")

64
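A simplified way to picture partial_ratio() is to slide the shorter string across the longer one and keep the best ratio of any same-length window. FuzzyWuzzy's actual implementation only tries windows aligned with matching blocks, so this brute-force sketch (with the hypothetical helpers `simple_ratio` and `simple_partial_ratio`, built on difflib) may differ in edge cases, but it reproduces both scores above.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    # 100 * 2*M / T, rounded, in the spirit of fuzz.ratio()
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def simple_partial_ratio(s1: str, s2: str) -> int:
    """Best ratio of the shorter string against every equal-length
    window of the longer string (brute-force approximation)."""
    shorter, longer = sorted((s1, s2), key=len)
    width = len(shorter)
    best = 0
    for start in range(len(longer) - width + 1):
        window = longer[start:start + width]
        best = max(best, simple_ratio(shorter, window))
        if best == 100:  # a perfect substring match cannot be beaten
            break
    return best


print(simple_partial_ratio("abcdefghijklm", "abcdefghijklm#!"))  # 100
print(simple_partial_ratio("abcde fgh ijklm", "abcde ijklm"))    # 64
```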

Token Sort Ratio

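fuzz.token_sort_ratio() lowercases both strings, strips punctuation, splits them into tokens (words), sorts the tokens alphabetically and joins them back together before comparing them with ratio(). As a result, the order of the words does not affect the score.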

fuzz.token_sort_ratio("abcdefghijklm", "abcdefghijklm")

100


fuzz.token_sort_ratio("abcde fgh ijklm", "abcde fgh fgh ijklm")

88
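The token-sorting step can be sketched with the standard library alone. The helper name `simple_token_sort_ratio` is hypothetical, difflib stands in for FuzzyWuzzy's matcher, and for brevity the sketch only lowercases (it skips the punctuation stripping), so treat it as an approximation rather than the library's code.

```python
from difflib import SequenceMatcher


def simple_token_sort_ratio(s1: str, s2: str) -> int:
    """Sort the words of each string, then compare the results."""
    # Lowercase, split on whitespace, sort the tokens, and rejoin them
    sorted1 = " ".join(sorted(s1.lower().split()))
    sorted2 = " ".join(sorted(s2.lower().split()))
    return int(round(100 * SequenceMatcher(None, sorted1, sorted2).ratio()))


print(simple_token_sort_ratio("abcde fgh ijklm", "abcde fgh fgh ijklm"))  # 88
print(simple_token_sort_ratio("ijklm abcde fgh", "abcde fgh ijklm"))      # 100
```

The second call shows the point of sorting: the same words in a different order still score 100.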

WRatio

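fuzz.WRatio() (weighted ratio) first normalizes both strings, lowercasing them and replacing non-alphanumeric characters with spaces, then computes several of the scores above and returns the best weighted result. This is why differences in case or trailing punctuation can still yield a score of 100.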

fuzz.WRatio("abcdefghijklm", "ABCDEFGHIJKLM")

100


fuzz.WRatio("abcde fgh ijklm", "abcde fgh ijklm&&")

100
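Both WRatio examples score 100 because the normalization step makes each pair of inputs identical before any comparison runs. The sketch below contrasts a raw difflib ratio with the ratio after normalization. It assumes the normalization behaves like FuzzyWuzzy's default processing; `normalize` and `compare` are hypothetical helper names, and this is not the full WRatio weighting.

```python
import re
from difflib import SequenceMatcher


def normalize(s: str) -> str:
    """Approximation of FuzzyWuzzy's default preprocessing: replace
    non-alphanumeric characters with spaces, lowercase, and trim."""
    return re.sub(r"\W", " ", s).lower().strip()


def compare(a: str, b: str) -> tuple[int, int]:
    """Return (raw score, score after normalization), both 0-100."""
    raw = SequenceMatcher(None, a, b).ratio()
    cleaned = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return int(round(100 * raw)), int(round(100 * cleaned))


print(compare("abcdefghijklm", "ABCDEFGHIJKLM"))        # (0, 100)
print(compare("abcde fgh ijklm", "abcde fgh ijklm&&"))  # (94, 100)
```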

Example: Let's look at a complete program that compares two strings with all of the scorers above, plus fuzz.token_set_ratio().

# Python code showing all the ratios together
# Make sure the fuzzywuzzy package is installed first

from fuzzywuzzy import fuzz
from fuzzywuzzy import process

# Two identical strings, so every scorer should return 100
s1 = "abcdefghijklm"
s2 = "abcdefghijklm"

print("FuzzyWuzzy Ratio: ", fuzz.ratio(s1, s2))
print("FuzzyWuzzy PartialRatio: ", fuzz.partial_ratio(s1, s2))
print("FuzzyWuzzy TokenSortRatio: ", fuzz.token_sort_ratio(s1, s2))
print("FuzzyWuzzy TokenSetRatio: ", fuzz.token_set_ratio(s1, s2))
print("FuzzyWuzzy WRatio: ", fuzz.WRatio(s1, s2))

Output:

FuzzyWuzzy Ratio:  100
FuzzyWuzzy PartialRatio:  100
FuzzyWuzzy TokenSortRatio:  100
FuzzyWuzzy TokenSetRatio:  100
FuzzyWuzzy WRatio:  100
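The process module imported at the start of this article applies these scorers across a list of candidates; for example, process.extractOne(query, choices) returns the best-scoring choice together with its score. The idea can be sketched with the standard library alone. Here `best_match` is a hypothetical helper, and a case-insensitive difflib ratio stands in for FuzzyWuzzy's default scorer (WRatio), so the scores will not match the library exactly.

```python
from difflib import SequenceMatcher


def best_match(query: str, choices: list[str]) -> tuple[str, int]:
    """Return the choice most similar to query and its 0-100 score."""
    def score(choice: str) -> int:
        # Compare case-insensitively, loosely mimicking default preprocessing
        ratio = SequenceMatcher(None, query.lower(), choice.lower()).ratio()
        return int(round(100 * ratio))

    best = max(choices, key=score)  # first choice with the highest score
    return best, score(best)


teams = ["New York Jets", "New York Giants", "Dallas Cowboys"]
print(best_match("new york jets", teams))  # ('New York Jets', 100)
```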
