
Fuzzy join and sounds like

Hi,
I have two questions:
1. Is it possible to do a join between databases with fuzzy matching? In other words, I would like the join to be based on fuzzy matching rather than exact matching. I don't think there is a direct way to do this. Can you let me know how to go about it?
2. Phonetic matching using sounds like: I would like to identify duplicates using sounds like within the same column, for example identifying duplicate vendors by performing sounds like on the vendor name. Currently I think sounds like can only be done between two columns. If I use Soundex for this there are too many false positives. Is there a better way of doing this?
 
Regards
Padma 

Brian Element Sat, 06/24/2017 - 06:13

Hi Padma,

Sorry for taking so long to get back to you.

For the fuzzy matching, right now the join only looks for exact matches. I think they might be thinking of adding this feature in the future, but don't quote me on this. For now, fuzzy matching only works when looking for duplicates or when comparing fields within a row.
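
If you are able to export both databases, a fuzzy join can be roughed out in a script outside the built-in join. Below is a minimal Python sketch, not a feature of the tool: the function name `fuzzy_join`, the `vendor` field, and the 0.85 threshold are all made up for illustration, and the similarity score comes from the standard library's `difflib.SequenceMatcher`. It compares every left record against every right record, so it is only practical for modest file sizes.

```python
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def fuzzy_join(left, right, key, threshold=0.85):
    """Pair left and right records whose `key` values are similar enough.

    `left` and `right` are lists of dicts (hypothetical exported records).
    Returns (left_record, right_record, score) tuples for every pair whose
    similarity ratio is at least `threshold` (an assumed cut-off to tune).
    """
    matches = []
    for l_rec in left:
        l_key = normalise(l_rec[key])
        for r_rec in right:
            score = SequenceMatcher(None, l_key, normalise(r_rec[key])).ratio()
            if score >= threshold:
                matches.append((l_rec, r_rec, round(score, 3)))
    return matches


if __name__ == "__main__":
    vendors = [{"vendor": "Acme Supplies Ltd", "id": 1}]
    payments = [
        {"vendor": "ACME Supplies Ltd.", "amount": 100},
        {"vendor": "Zenith Corp", "amount": 250},
    ]
    for l_rec, r_rec, score in fuzzy_join(vendors, payments, "vendor"):
        print(l_rec["id"], r_rec["vendor"], r_rec["amount"], score)
```

Raising the threshold gives fewer but safer matches; lowering it catches more variants at the cost of more pairs to review by hand.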

For the phonetic matching, I also think you are out of luck unless you want to create a script. You can find the algorithms for the different types of phonetic matching on the internet, and there are actually quite a few out there that will replace the text with a phonetic representation that you can then match (or join) on. I did some work on this several years ago but I don't think I ever finished that project; maybe I should.
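
To give an idea of what such a script could look like, here is a rough Python sketch (not my unfinished project, and not anything built into the tool). It computes a standard American Soundex key for each name, groups the names in a single column by that key, and then keeps only the pairs that are also close in spelling, which is one possible way to cut down the false positives. The function names and the 0.8 cut-off are assumptions for illustration, and the key is computed on the whole vendor name with spaces and punctuation dropped.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Soundex digit for each coded consonant; vowels, Y, H and W have no digit.
_CODES = {}
for _digit, _letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                         "4": "L", "5": "MN", "6": "R"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex key, or "" if no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (key + "000")[:4]


def likely_duplicates(names, min_ratio=0.8):
    """Pairs of names that share a Soundex key AND are similar in spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    pairs = []
    for members in groups.values():
        for i, a in enumerate(members):
            for b in members[i + 1:]:
                ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                if ratio >= min_ratio:
                    pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    vendors = ["Smith Supply", "Sandhu Co", "Smyth Supply", "Jones Ltd"]
    print(likely_duplicates(vendors))
```

The phonetic key does the cheap grouping and the spelling check does the filtering, so names that only happen to share a key are dropped. Any of the other phonetic algorithms could be swapped in for `soundex` without changing the grouping step.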

Brian

padmathiagarajan Mon, 06/26/2017 - 01:38

Thanks Brian.
Will you be able to share some links for the types of phonetic matching? If you happen to finish your project, please do share your scripts.
 
Regards
Padma

Brian Element Mon, 06/26/2017 - 06:19

The ones I found when I was doing my initial research are:

New York State Identification and Intelligence System - https://en.wikipedia.org/wiki/New_York_State_Identification_and_Intelli…

Soundex - https://en.wikipedia.org/wiki/Soundex

Here is a site with many different types:  http://ntz-develop.blogspot.ca/2011/03/phonetic-algorithms.html

I might relook at this with the next version.