Accuracy of field statistics in IDEA?
Forums
Dear Brian,
my question is about the specific accuracy of the field statistics (sum, numeric field) being calculated and shown in IDEA.
Suppose I have a large sample dataset with two columns, one name and the other some value. I create this sample dataset using Python:
import csv
import pandas as pd
counter=range(0, 5000000)
with open(r'C:\myfile\Test.csv', "w", newline='', encoding="utf-8") as fo:
cwriter = csv.writer(fo, delimiter=";", quoting=csv.QUOTE_MINIMAL)
cwriter.writerow(['ColumnName','Value'])
for i in counter:
cwriter.writerow(['A',500.01])
cwriter.writerow(['B',500.49])
cwriter.writerow(['C',1000.05])
df=pd.read_csv(r'C:\myfile\Test.csv', sep=";", encoding="utf-8")
print(df['Value'].sum())
This generates a csv with 5 million records with each having a value of 500.01. Then one line is additionally added with the value of 500.49 and one with 1000.05. So in total this would be 5000000*500,01+500,49+500,05 which gives 2500051000.54. And indeed the output of print(df['Value'].sum()) in Python gives me 2500051500.54. And if I set the options of decimals to maximum I can see 2500051500.5400214. So there is some calculation inaccuracy.
When I now import this csv into IDEA, specifying in the import wizard that "." is the decimal it works and I can see the dataset. Then I click on field statistics and the field statistics are being calculated. They show the number 2.500.051.500,66. So there is some deviation. I was a bit surprised when I saw this deviation. The inaccuracy in the last two digits (the ,66) is not correct. ,54 is the true value. So what is the issue here? Did I do something wrong or is this just the "typical" compuation inaccuracy due to the large volume of data?
I am using IDEA Client Version: 11.2.2 on Windows 10 Enterprise LTSC x64.
This is a "typical" because
This is a "typical" because of the computation inaccuracy. In IDEA I created a running total. You can see it starts to diverge at record 6593.
By the end this divergence gives you the value that you found.