
csvDefinition: CsvFileEncoding=2 does not recognize ä,ö,ü

Hi,
I am using the standard csvDefinition approach to import a CSV file into IDEA. The code is also posted here:
http://www.ideascripting.com/comment/3213#comment-3213
Sub Main

csvPath = "Source Files.ILB\Delimited.csv"
rdfPath = "Import Definitions.ILB\Delimited.RDF"
firstRowAsFieldNames = TRUE
CsvEncodingUTF8 = 2

' Create, configure, and save the definition file.
Set csvDefinition = Client.NewCsvDefinition
csvDefinition.DefinitionFilePath = rdfPath
csvDefinition.CsvFilePath = csvPath
csvDefinition.FieldDelimiter = ","
csvDefinition.TextEncapsulator = """"
csvDefinition.FirstRowIsFieldNames = firstRowAsFieldNames
csvDefinition.CsvFileEncoding = CsvEncodingUTF8
Client.SaveCSVDefinitionFile csvDefinition

' Define the output name and perform the import task.
dbName = "Delimited.IMD"
Client.ImportUTF8DelimFile csvPath, dbName, FALSE, "", rdfPath, firstRowAsFieldNames

' Open the result.
Client.OpenDatabase (dbName)

End Sub
 
My problem is that my original CSV contains letters such as ä, ö and ü. These are shown as "?" in the resulting IDEA file. The import itself runs without any error, but ä, ö and ü are not recognized properly.
 
I tried setting
CsvEncodingUTF8 = 1
but that does not work. I get an error saying that my definition file does not match my API.
 
What am I doing wrong? Is there a way to get this working?
 
Thanks for any help!

klmi Tue, 03/30/2021 - 03:45

Hi Bert,
I created a small CSV file with UTF-8 encoding and had no such issues (with CsvEncodingUTF8 = 2). However, an error was thrown when I tried to import that file with CsvEncodingUTF8 = 1. IDEA's language browser says that CsvEncodingUTF8 = 1 should be used with ASCII encoding (7 bits) and CsvEncodingUTF8 = 2 with UTF-8 encoding.
The good news is that I can reproduce your issue when I create the CSV file with ANSI encoding (8 bits).
So do you actually know the encoding of your CSV file?
I recommend trying Notepad++, which can also convert between encodings.
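
If Notepad++ is not available, the encoding question above can also be checked with a few lines of Python. This is only a rough sketch: the function name guess_csv_encoding and the returned labels are made up for illustration, and "ansi" here simply means "contains 8-bit bytes that are not valid UTF-8", not a specific code page.

```python
import codecs


def guess_csv_encoding(raw: bytes) -> str:
    """Return a rough label for the encoding of raw CSV bytes.

    Hypothetical helper for illustration; it cannot tell the
    different 8-bit ANSI code pages apart.
    """
    # A UTF-8 byte order mark at the start is a strong hint.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Pure 7-bit content decodes as ASCII (and is also valid UTF-8).
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    # 8-bit content that is well-formed UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Bytes such as 0xE4 (ä in Windows-1252) are not valid UTF-8.
        return "ansi"


if __name__ == "__main__":
    # Two in-memory samples standing in for the bytes of a real file.
    print(guess_csv_encoding("Name\nMüller\n".encode("utf-8")))
    print(guess_csv_encoding("Name\nMüller\n".encode("cp1252")))
```

For a real file, pass the bytes read in binary mode, e.g. open(path, "rb").read(). A result of "ansi" would match the case I could reproduce, where the file is imported with CsvEncodingUTF8 = 2 but is not actually UTF-8.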

Brian Element Tue, 03/30/2021 - 07:04

Hi Bert,

I think your problem is that you are using the ASCII version of IDEA, which might be why the characters are not coming in properly. IDEA cannot always display Unicode characters properly, which is why it replaces them with ? in the text. You can try opening the file with Notepad and resaving it as an ANSI file; sometimes that works if the characters have ANSI equivalents, which I think these might. Alternatively, you could think about switching your IDEA version to the Unicode version.

I would also recommend talking to your distributor about this problem.

Good luck

Brian

Bert_B Wed, 04/07/2021 - 03:35

First of all, a big thank you for these two good answers!
I checked it with a simple test file. I created a CSV that contains only two rows: a header and one row with text containing special characters such as "ö, ü". When I import it manually with the IDEA wizard, it works, and the resulting IMD file displays the characters correctly. However, when I import it with my csvDefinition macro (CsvEncodingUTF8 = 2), it does not work. The resulting IMD file shows "?" instead of ö and ü.
 
Unfortunately I do not have Notepad++ available. However, I can use the regular editor to save a file with ANSI or UTF-8 encoding. And indeed, it works when I save the file with UTF-8 encoding: my csvDefinition macro then runs and the resulting IMD file displays the characters correctly. So that hint was helpful, thank you very much!
 
The only thing I did not understand at first is why the manual import using the IDEA wizard works with an ANSI file. As far as I can see, the answer is this: the manual import offers an option called "UTF-8 source file", and the wizard apparently detects the source encoding automatically. With an ANSI file the option is unchecked and the import works (if I check it, the import does not work, which is logical because my source is ANSI and not UTF-8). With the UTF-8 version of the file the option is checked automatically, so the wizard detects that the source is UTF-8, and the import works. If I remove the check mark, it does not work.
 
So I only ran into this problem because I tried to automate the import using csvDefinition, where this automatic detection is missing. I do not think I can fix this in the IDEA code, as there seems to be no option for it. I have now fixed it by changing my Python code upstream so that it exports the file as UTF-8 instead of ANSI.
 
Problem solved, thanks for your help!
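
For anyone who cannot change the export step itself, here is a minimal sketch of re-encoding an existing ANSI file to UTF-8 before running the csvDefinition macro. The function name convert_to_utf8 and the file names are only examples, and the default source encoding cp1252 (Windows-1252) is an assumption about what "ANSI" means on a Western European Windows system; adjust it if your system uses a different code page.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file to UTF-8 (hypothetical helper).

    source_encoding defaults to cp1252, which is an assumption
    about the ANSI code page of the source file.
    """
    # Work on bytes so that line endings (e.g. CRLF) are left untouched.
    raw = Path(src).read_bytes()
    # Raises UnicodeDecodeError if a byte is undefined in the source code page.
    text = raw.decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Example file names only; point these at your own files.
    convert_to_utf8("Delimited_ansi.csv", "Delimited.csv")
```

The converted file can then be imported with CsvEncodingUTF8 = 2 as in the script at the top of the thread.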
