Possible Workaround for Fuzzy Join Across Two Files

1 post / 0 new
Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57
Possible Workaround for Fuzzy Join Across Two Files

Hello Group Members,

 
We are all aware of the Fuzzy Duplicate feature in IDEA which allows us to identify similar (near) character string (field) pattern matches. The Fuzzy Duplicate necessitates the character data is in a single column like Address, Item Description, Beneficiary Name within a single IDEA database.
 
Now if you have two files ~ File A ~ contains Beneficiary Name with sample names like Wilford Brandt, Kevin Dune, Serene Joy and Gabe Rhames
 
File B ~ contains Prohibited Party Names with sample names like Wilfforrd Brand, Sarah Pack, Amy Dell and Bay Buch
 
If you look at the same names Wilford Brandt from File A is a near pattern match with Wilfforrd Brand from File B.
 
However if you try a Join or a Visual Connect between File A and File B on Beneficiary Name and Prohibited Party Name you will get nil cases as none of the Names are an exact match (precondition for the Join and Visual Connect).
 
So as a workaround you can do the following ~
 
(A) Change the Field Name in both files from Beneficiary Name to Name and Prohibited Party Name to Name using Modify Fields
 
(B) Append a virtual character field with length say 25 in File A with criteria in the Equation Editor being "File A - Beneficiary Payments" and in File B with criteria in the Equation Editor being "File B - Prohibited Party Listing"
 
(C) Now Append Databases. Keep File A open as the Primary file and append File B. The append will go through successfully as both files have the same Field viz Name
 
(D) Now with the both the data sets captured in a single file and with the Field 'Name' in a single column - Fuzzy Duplicate can be applied on the field 'Name' with say Similarity Degree being 85%
 
(E) The result will throw up Wilford Brandt from File A as a near pattern match with Wilfforrd Brand from File B but one below the other.
 
(F) The final step will be to apply a Duplicate Key Exclusion on the Fuzzy child file with Fields to Match being 'Group Name' and Fields that must be Different being File Name
 
This will give you similar names across the two files. A convenient and feasible workaround to a Fuzzy Join across two files.
 
Best Regards
 
Group Admin Team