Automated Data Import to IDEA from Python (Pandas DF)

6 posts / 0 new
Last post
dlkrue
Offline
Joined: 01/12/2022 - 05:26
Automated Data Import to IDEA from Python (Pandas DF)

Dear all,
I want to import several tables into IDEA which I basically currently have in memory as (Python) pandas dataframes with intact tables schema (thus, correct dtypes, there are strings, integers, dates, floats...). I would like to find an easy way to automatically import many such tables to IDEA and preserve schema information. To do so, I had a look at the IDEA Import Dialog and I considered several options but did not find a nice solution yet.
I considered the following options:
1) CSV would work, but schema gets lost (or has to be specified manually in IDEA)
2) SAP AIS could be generated on the Python side, but I do not see any option for importing many such files automatically into IDEA (I tried to generate a macro for import but it needs the rdf-file generated on import and I do not have it nor do I see an option to generate it from Python)
3) MS Access would be writable from Python side preserving table schema and can be imported in Idea. This seems to be a good solution. However, it does not work for me because I have a 32bit Office version with corresponding 32bit ODBC driver installed + 64bit Python /IDEA (and I cannot easily change that). 
4) Apart from IDEA we are also using SmartExporter (provided by AUDICON) and it implements are so called manufacturer specific interface ('Herstellerspezifische Schnittstelle' is the german term). It allows mass import of csv-files with the schema provided in an index.xml-File  + .info-xml file, both of which contain all table and schema information. This would be extendible to import more tables. The xml-files could be read from Python, extended and rewritten. Data itself could be exported as csv. Then the import works (via Herstellerspezifische Schnittstellen in SmartAnalyzer Ribbon of the Idea GUI, ignoring the import wizzard).
However, this seems to be a rather complicated solution. Therefore, my question: is there an easier way to export many tables from python and automatically import these tables again in IDEA automatically?
Thanks a lot in advance.
Daniel
 
 

klmi
Offline
Joined: 02/13/2019 - 08:41

The answer likely depends on the IDEA version which is used by you.
Unfortunately IDEA 10.4.1 has no high level interface to transfer data between IDEA and Python. The prefered solution is to import/export the data via csv/text-files as shown in CaseWare's DateUnification example. You will find that example in IDEA Passport.
I've read that IDEA 11 would almost have a higher level interface, however I'm still working on 10.4.1 I can't test that at the moment. Besides Audicon staff told me that the interface in the background also relies on text-files to transfer the data.

dlkrue
Offline
Joined: 01/12/2022 - 05:26

Thanks a lot for your reply. I'm using IDEA 11 (Version 11.2.2) but I did not discover any such interface for automated import of several tables (and preserving table schemata) or alternatively single-file import  automated by an IDEA Macro (which iterates over the tables). If there is such a way, it would be great and help me a lot.
 

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi dlkrue,

What you need is to access the IDEALib.py module.  In this module their is funcationality to take a Panda dataframe and import it into IDEA.  The module be found in:

C:\Program Files\CaseWare IDEA\IDEA\Lib\site-packages

If you are running your python directly from IDEA then you already have access to it.  If you are running it outside of IDEA you will have to make a copy and then put it in a post where your python script can access it.

The import is:

import IDEALib as ideaLib
 
The line to import your dataframe into IDEA is:
 
ideaLib.py2idea(dataframedatabaseNameclient=NonecreateUniqueFile = False)
 
Which calls this
 
'''
Takes in a dataframe,database name and idea client
Creates an IDEA database with the same name and data
Returns the IDEA database object if successful
'''
def py2idea(dataframe, databaseName, client=None, createUniqueFile = False):
 
Hopefully this will get your info into IDEA.
dlkrue
Offline
Joined: 01/12/2022 - 05:26

Thanks, that worked for me with one minor adjustment (line 383):
df.to_csv(open(exportPath, 'w', newline=""), sep=DELIMITER, index=False, header=True, encoding='utf-8', decimal=DECIMAL_SEPARATOR, quoting=csv.QUOTE_NONE)
Otherwise, additional newlines were included in the intermediately generated csv-files which yielded an empty DB in IDEA. 
 
This is my sample code in case someone wants to reproduce it (C:\Temp contains the adjusted IDEAlib.py).
 
import sys
import logging
#sys.path.append(r"C:\Program Files\CaseWare IDEA\IDEA\Lib\site-packages") # original file
sys.path.append(r"C:\Temp") # modified IDEALib.py with adjustment in line 383
 
import IDEALib as ideaLib
import pandas as pd
 
# Test-data
s1 = pd.Series(["a", "b", "c"], dtype=str)
s2 = pd.Series([1, 2, 3], dtype=int)
s3 = pd.Series([3.3, 2.2, 1.1], dtype=float)
s4 = pd.to_datetime(pd.Series(["2020-01-01", "2021-01-02", "2022-04-08"], dtype=str))
 
df = pd.concat([s1, s2, s3, s4], axis=1)
df.columns = ["string", "int", "float64", "date"]
df.info()
 
logging.basicConfig(filename="log.log", format='%(asctime)s %(levelname)s %(message)s', level=logging.INFO, filemode='w', force=True)
logging.getLogger().addHandler(logging.StreamHandler())
ideaLib.py2idea(df, databaseName="pandasDf")
 
 
 

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Thanks for sharing your code.