Replace characters in CSV

8 posts / 0 new
Last post
laura.garner
Offline
Joined: 09/04/2014 - 19:12
Replace characters in CSV

I have a csv file that is badly created and doesn't use "" to encapsulate a text field (which I have no control over). Consequently, when there is a comma in one of the fields it creates an issue for that record.
I think I can rely on a standard format for the error as it is a comma followed by a space followed by a text character, e.g. "Exec Director, Programs" (obviously without the quotes). Is there a way to look for that particular sequence of characters and remove the comma, other than manual review in a text editor? It might be achievable with Python script but I don't have Python. 
Any suggestions for an automated solution would be gratefully received!

klmi
Offline
Joined: 02/13/2019 - 08:41

What's your version of IDEA? V10.3 and higher has Python support inside.

laura.garner
Offline
Joined: 09/04/2014 - 19:12

I'm on v11 which has Python support. But don't I have to have Python installed on my computer to either create a Python script or run the script through IDEA?

klmi
Offline
Joined: 02/13/2019 - 08:41

Python was installed with IDEA. It comes with an older version 3.5.3 (I don't know whether IDEA 11 comes with a newer version because in our country IDEA V10.4 is the latest version) and supports important modules like numpy, pandas, malplotlib and scikit-learn.
For more information please have a look on the following resources:
https://www.youtube.com/watch?v=wNXWIX6LBq8
https://www.youtube.com/watch?v=oGntZi94Wck
http://ideascripting.com/forum/pythonwin-ide-and-gui-framework

klmi
Offline
Joined: 02/13/2019 - 08:41

Another workaround I'm trying to use in such cases: If it's possible to select the separator for your export use rare chars like # or | as separator.

laura.garner
Offline
Joined: 09/04/2014 - 19:12

Thanks klmi. Unfortunately I have no control whatsoever over the format of the report - it was the first thing I checked! It's a report that is provided by a system, and it is pretty much a 'take it or leave it' situation.

scotchy33
Offline
Joined: 09/05/2012 - 15:51

Hi Laura,
You could use a text editor that supports regular expression (RegExp).  RegExp can be used to look for a particular sequence of characters and remove the comma.
I believe Notepad++ is free and supports RegExp. 
Also, depending on the line length of your file, if your max line length in the file is less than 1000 characters you could import file into IDEA and use IDEAs @RegExp function to clean the extra commas.  Basically you would import complete line as one field into IDEA and use @RegExp.
If you want, you can upload a sanitized version of your file and someone could show you how to clean it.
Scott

laura.garner
Offline
Joined: 09/04/2014 - 19:12

Thanks Scott! I was having a moment and couldn't remember the 'regular expression' term - this is exactly what I was looking for. I'll play around with it and come back if I get stuck.
Laura