Fuzzy string matching

A few months ago, I wrote about a tool I made called the Fictograph, which graphs the awesomeness of authors’ works over time. It leans heavily on data from the Goodreads API. I expected the Goodreads API to be reliable, but it turns out it has some design problems. For example, if you query an author name with a minor spelling mistake, you sometimes get back data on a random author who is totally unrelated to your search.

This behavior is irritating to users, who get results for a different author than they intended. Plus, Fictograph users aren’t going to have much sympathy for my whining and blaming the underlying API for the problem. So I needed to find a programmatic way to compensate for this unhelpful API behavior.

I was stuck on this problem until I saw a presentation at PyGotham that touched on fuzzy string matching. This was a plausible solution: fuzzy string matching can measure how closely the name entered by the user matches the name returned from the API. If they’re pretty much the same, great! If they’re not, the API is probably returning an unexpected result, so the Fictograph should return an “author not found” error.
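Here's a minimal sketch of that check, using only `difflib` from the Python standard library. It is not the Fictograph's actual code: the function names and the 0.8 cutoff are assumptions made up for illustration, and a real cutoff would need tuning against real queries.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff, chosen for illustration only.
SIMILARITY_THRESHOLD = 0.8


def name_similarity(a, b):
    """Score two names from 0.0 to 1.0, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def check_author(query_name, api_name):
    """Return the API's author name if it is close to the query, else None."""
    if name_similarity(query_name, api_name) >= SIMILARITY_THRESHOLD:
        return api_name
    return None  # the caller would show an "author not found" error
```

With this sketch, a small typo such as "Ursula K. Le Guinn" still matches "Ursula K. Le Guin", while an unrelated name like "Dan Brown" falls below the cutoff and is rejected.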

The best part is that I didn’t have to write any string matching code myself; Python has libraries like fuzzywuzzy that will take care of fuzzy string matching for you.
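As a rough idea of what that can look like, here's a sketch built on fuzzywuzzy's `fuzz.ratio`, which returns an integer score from 0 to 100. The `is_same_author` helper and its cutoff of 80 are assumptions for illustration, not what the Fictograph uses. The `difflib` fallback is only there so the snippet still runs when fuzzywuzzy isn't installed.

```python
try:
    # Third-party package: pip install fuzzywuzzy
    from fuzzywuzzy import fuzz

    def similarity(a, b):
        return fuzz.ratio(a, b)  # integer score from 0 to 100
except ImportError:
    # Stand-in so the sketch runs without the package installed.
    from difflib import SequenceMatcher

    def similarity(a, b):
        return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def is_same_author(query_name, api_name, cutoff=80):
    """Hypothetical helper: True if the two names score at or above cutoff."""
    return similarity(query_name.lower(), api_name.lower()) >= cutoff
```

Calling `is_same_author("Kurt Vonegut", "Kurt Vonnegut")` should pass the check, while comparing "Kurt Vonegut" against "Agatha Christie" should not.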
