Import file and append to database in one go
Forums
Hi all,
I need to import a very large number of files and append them into one master database.
I have written a simple script to import the files, and another simple scripte to append them into one database. So, I can just run the two scripts one after the other and be done.
However, due to the very large number of files (>500k), I am a bit worried about performance: IDEA needs to 'open-read-write-close' the files twice: first for the import-routine and then for the append-routine.
Is there a possibility in IDEA to import a file and write all the records directly to the master database, before going to the next file? (I'm looking for something similar to ACL's "EXTRACT RECORD TO filename APPEND"-command).
Thanks for any suggestions,
Marc
Hi Brian,
Hi Brian,
Sorry for the late reply, I have been 'offline' for a while.
Yes, I did mean 500k files. They are very short and small electonic messages that contain individual order lines that need to be put in a database before they can be reconciled to some financial statements. I realize importing 500k is 'ambitious' to say the least, so I'll probably try a short period first, just to see the method works.
Anyway, thanks for your comment, much appreciated. I was not aware of the append limit of 32,768 files. That's good to know, as I'll need to use IDEA: I unfortunately don't know how to write scripts in Python...
Marc
Hi Ravi,
Hi Ravi,
The limitations of IDEA mentioned by Brian, made me realize I was perhaps a bit over ambitious. I have reconsidered the approach and decided to use a sample with a high reliability. This results in a number of records that is manageable with the simple scripts I created for importing and appending.
But thanks for offering your help.
Marc
Excellent. When problems are
Excellent. When problems are solved by self, it is often the most satisfying.
I was planning to use the following flow in a loop to address your requirement.
Read transaction file -> parse to get data for required fields -> append to a database.
Here, the restrictions of the APPEND command will not apply. Also, IDEA creates a new database when appended to rather than adding to an existing database like ACL does. Regarding speed, please refer to the new scripts I published to this website which use recordset based loops. I would still be interested to take a look at your data out of curosity, if you could to send some sample. However, please dont bother if you are not inclined.
Cheers,Ravi
Hi Ravi,
Hi Ravi,
I appreciate your help and in return was willing to send you some sample data 'to play with'. However, the files are XML-files from a client (a trucking company), containing both client and (client's) customer information. I started to anonimize the data, until I discovered that a typical file contains some three thousand lines I need to check and anonimize by hand (the data is litterally scattered over the file in various places; for my purpose I only import part of the file, typically resulting in a database with appr. 10 records with 15 fields each...) I hope you understand I didn't go through with it...
I will hava a look at your newly published scripts. I'm sure I'll learn from them.
Thanks again, Marc
HI Marc,
HI Marc,
When you say >500k do you really mean greater than 500,000 files? That is huge, how come so many files? I am not sure if IDEA can actually handle that many files. I know the append is limited to 32,768 files so you would have to do multiple appends and then append the appends. Unfortunately there is no way to do it in one shot, especially with that number of files. You could also probably create a script to create the database directly, because of the number of files it might have to be done using Python. You might want to run this question by IDEA support as I have never dealt with that many files before so I can't really answer your question except you will have to first do the import and then the append unless you create a script that reads the files and adds the records to the IDEA database directly.
Brian