Skip to main content

Experiences with Python

Since installing IDEA 10.3 I tried how to use the included Python modules. Finally my experience is that numpy, matplotlib, scikit and pandas work very well. However to transfer data from IDEA to Python/pandas our support recommends to export all data from IDEA to a Excel or Access file because the performance with RecordSet isn't optimal. From my point of view the performance with a dataset of <20k elements was ok. But if you wouldn't make use of RecordSet export to a CSV file seems the better solution because pandas.read_excel won't work due to the missing module xlrd. Certainly openpyxl is included but it is not as easy to use as pandas.read_excel. In that case pandas.read_csv can be a good alternative.

Brian Element Tue, 02/19/2019 - 11:08

Hello klmi, when I did the word find script I exported to a CSV file which I then loaded into Python.  I think it is the way to go as reading the RecordSet I believe would take alot longer but I haven't actually done any speed tests.  If you have any examples of what you are doing or found I hope you will share.

Thanks

Brian

klmi Thu, 04/04/2019 - 10:10

Following I want to share the RecordSet variant:
 
import win32com.client as win32ComClient
import pandas as pd
 
# create empty pandas DataFrame
idea_df = pd.DataFrame()
 
if __name__ == "__main__":
       try:
             idea = win32ComClient.Dispatch(dispatch="Idea.IdeaClient")
             db = idea.CurrentDatabase()
            
             # Object to read table definitions
             tdef = db.TableDef()
            
             # Objects to read data
             rs = db.RecordSet()
             rec = rs.ActiveRecord()
            
             for col in range(1, rec.NumberOfFields + 1):
                    data_list = []
                    for row in range(1, rs.Count + 1):
                           rs.GetAt(row)
                           data_list.append(rec.ValueAt(col))
 
                    # append each column after reading to DataFrame
                    idea_df[tdef.GetFieldAt(col).Name] = data_list
            
       finally:
             db = None
             idea = None
             tdef = None
             rs = None
             rec = None

Brian Element Fri, 04/05/2019 - 07:10

In reply to by klmi

Thank-you for sharing the code.

One thing I would like to try out when I have time is a speed to test to see if it is faster to read directly from the IDEA file, like you did above, or to save the file as a delimited file and then read it in directly with Python.  Maybe a weekend project for myself.

Thanks

Brian

klmi Wed, 09/30/2020 - 04:57

In reply to by Brian Element

Hi Brian,
for bigger databases it's definately faster to export to a delimited file and to import that file into a pandas dataframe. Therefore I wrote a special IDEA connector class which is based on the example "DateUnification.py" from Passport. It manages the following main jobs: reading fieldnames and their types from IDEA database, export IDEA database to csv (temporary path), import csv to pandas dataframe, export dataframe to csv (temporary path), import csv to IDEA, cleaning temporary files.
Finally it's a lot of code and a elaborate technique. I'm wondering why IDEA doesn't support a user friendly high level interface by nature. Furthermore there are some problems especially regarding formatting issues with csv files.