Find similarity between two words in IDEAScript

Hi!
I have a huge database (1 million records) in which I would like to find similarities within one field. The problem is that the Fuzzy function is too slow. I have tried comparing each entry with all the others using IDEAScript, but I can't find a function that gives me the similarity between two names.
SimilarPhrase and SimilarWord aren't working in IDEAScript. Does anyone know how to check, from IDEAScript, whether two names are similar?
Thank you!

klmi Fri, 06/19/2020 - 04:01

Hi Clara,
there are many algorithms for checking the similarity between two strings (e.g. Levenshtein distance, Cologne phonetics, fuzzy matching, Russell Soundex, similarity score ...). I found various code examples on the internet (some in VBA) and have experimented a lot with these implementations. In my experience none of them is best in every case: which one works better depends heavily on the strings you want to compare. The other topic you mentioned is performance. Comparing every single entry with all the others in the database is expensive, especially if you use slow IDEAScript/VBA functions: with 1 million records that is roughly 500 billion pairwise comparisons.
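To make one of the measures above concrete, here is a minimal sketch of the Levenshtein distance in Python, together with a normalized score between 0 and 1. The function names `levenshtein` and `similarity` are my own choice for illustration and are not IDEA or IDEAScript functions; the same loops could be ported to IDEAScript/VBA, but I have not measured how fast that would be.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity(a, b):
    """Hypothetical helper: 1.0 for identical strings, 0.0 for
    strings that share nothing, based on the edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `levenshtein("Meyer", "Meier")` is 1, so the two names get a similarity of about 0.8. Whether a threshold of 0.8 is sensible depends on the data.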
Did I understand correctly that you only want to compare names, or do you have longer strings? Without knowing your data, I would suggest trying IDEA's @Soundex function in a new field. The advantage is that the result is calculated from a single text field per record, which is fast. After that operation you can sort on that column, so that names with the same code end up next to each other. For longer strings you could have a look at Python's ML tooling in scikit-learn, with feature extraction by the bag-of-words method. But that doesn't work for similar but not identical words without further steps.
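The idea of coding each name once and then grouping by the code can be sketched in Python as follows. This is a simplified American Soundex written for illustration; I am assuming it behaves roughly like IDEA's @Soundex, but the details of IDEA's implementation may differ. `soundex` and `group_by_soundex` are hypothetical names, and non-ASCII letters are simply ignored here.

```python
from collections import defaultdict

# Consonant classes of the classic Soundex scheme.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return a 4-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with equal codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def group_by_soundex(names):
    """Group names by code and keep only codes shared by several names."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return {code: members for code, members in groups.items()
            if len(members) > 1}
```

Each record is coded once, so the work grows linearly with the number of records instead of with the number of pairs. A slower measure then only has to be applied inside each group of candidates.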