New to Idea - Normalize Vendor Names
Forums
HELP! Is there a way to normalize vendor names in Idea? For example, AMAZON MKTPLACE PMTS to AMAZON, WAL-MART002 to WAL-MART, UPS 091238547 to UPS...etc?
The database I’m working with has at least 500,000 records.
Thanks in advance
Hi Beth,
Hi Beth,
One suggestion is you first look for duplicates, such as vendors that have the same phone number, address, email, etc. Those would usually be exact duplicates. You remove those from your population and then run the fuzzy duplicates on the rest. This will cut down on the number of transactions you will need to look at.
The fuzzy duplicates alows you to play with a tolerance, I think the default is .7, using a higher one and you will get less duplicates having a lower one you will get more. This is probalby the best place to start but you will still have to review as two vendors can have similar names but different companies. Depending on the other information in the file you might also look for duplicates that have the same address, same phone numbers, same email addresses. Usually performing something like this is doing several tests looking for duplicates and then putting them together.
Hopefully this will get you started.
Brian
Hi Beth,
Hi Beth,
You could clean up the file by creating one field with nested functions or a series of fields using each function separately until you got the desired end result. I will use your example to create a one equation field.
First right click in the vendor name field and choose append field from the pick list. Give the field a name like shrtvendname make it virtual character and use a length of 50 to make sure you don't tuncate a large name.
For your example the equation @left(@justletters(vendorname),@isini(" ", vendorname)-1) will return: AMAZON, WALMART, UPS. To explain the funciton, you are going inside vendorname and having IDEA look at just the letters then telling it to start at the left of the field and pickup letters up to a space, then back up one character to remove the trailing space.
This may not work for all your vendors becuse you could have vendors like FOREVER21, which is the name of the venodor and losing the 21 will be consfusing if you have another Vendor named FOREVER YOURS FASHIONS.
Cleaning up a field like you describe is not an easy task but through inventive use of functions and posible multiple fields it can be done.
Good luck,
Steve
Hi Beth,
Hi Beth,
Which version of IDEA are you using? V10 has options to look for fuzzy duplicates that could help you out. Once I know the version I will be in a better position to help you out but sometimes it is tough to normalize something like this.
Brian