Skip to main content

Replace characters in CSV

I have a csv file that is badly created and doesn't use "" to encapsulate a text field (which I have no control over). Consequently, when there is a comma in one of the fields it creates an issue for that record.
I think I can rely on a standard format for the error as it is a comma followed by a space followed by a text character, e.g. "Exec Director, Programs" (obviously without the quotes). Is there a way to look for that particular sequence of characters and remove the comma, other than manual review in a text editor? It might be achievable with Python script but I don't have Python. 
Any suggestions for an automated solution would be gratefully received!

klmi Wed, 06/24/2020 - 03:49

In reply to by laura.garner

Python was installed with IDEA. It comes with an older version 3.5.3 (I don't know whether IDEA 11 comes with a newer version because in our country IDEA V10.4 is the latest version) and supports important modules like numpy, pandas, malplotlib and scikit-learn.
For more information please have a look on the following resources:
https://www.youtube.com/watch?v=wNXWIX6LBq8
https://www.youtube.com/watch?v=oGntZi94Wck
http://ideascripting.com/forum/pythonwin-ide-and-gui-framework

klmi Mon, 06/22/2020 - 02:58

Another workaround I'm trying to use in such cases: If it's possible to select the separator for your export use rare chars like # or | as separator.

laura.garner Tue, 06/23/2020 - 18:54

In reply to by klmi

Thanks klmi. Unfortunately I have no control whatsoever over the format of the report - it was the first thing I checked! It's a report that is provided by a system, and it is pretty much a 'take it or leave it' situation.

scotchy33 Mon, 06/22/2020 - 12:06

Hi Laura,
You could use a text editor that supports regular expression (RegExp).  RegExp can be used to look for a particular sequence of characters and remove the comma.
I believe Notepad++ is free and supports RegExp. 
Also, depending on the line length of your file, if your max line length in the file is less than 1000 characters you could import file into IDEA and use IDEAs @RegExp function to clean the extra commas.  Basically you would import complete line as one field into IDEA and use @RegExp.
If you want, you can upload a sanitized version of your file and someone could show you how to clean it.
Scott

laura.garner Tue, 06/23/2020 - 18:58

In reply to by scotchy33

Thanks Scott! I was having a moment and couldn't remember the 'regular expression' term - this is exactly what I was looking for. I'll play around with it and come back if I get stuck.
Laura