Duplication on list
Hello,
I am Alice. I'm new to this forum and to IDEA.
I'm looking for a script or function which could analyse a list with several parameters and detect any duplication (either perfect duplication on every parameter, or duplication on only a few parameters). The result would be the list with each duplicated item kept only once.
EXAMPLE:
my list :
NAME / COUNTRY / AGE
1 - Alice - France - 30
2 - Paul - Canada - 30
3 - Alice - France - 30
4 - Alice - Spain - 20
RESULT 1 : detect perfect duplication =>
1 - Alice - France - 30
2 - Paul - Canada - 30
4 - Alice - Spain - 20
RESULT 2: detect duplication on "NAME"=>
1 - Alice - France - 30
2 - Paul - Canada - 30
Could you help me please :) ?
Thank U so much :))
Hi Alice,
From your results, it looks like you want to perform a summary, since you want to keep items even if they appear only once in your file.
For result one, select summary and summarize on Name, Country and Age. IDEA allows you to summarize on up to 8 different fields. This will give you the result you are looking for, along with another field that indicates the number of records.
For your second result, just perform another summary by Name. Note that your result lists all the fields even though you say you only want the Name field. The Alice row shows France, but there is also an Alice from Spain, which would be a different record. So is your result really based only on Name, in which case the other fields shouldn't be included, or are you actually looking for something different?
The solution klmi is suggesting can only be run in IDEA 10.3 or later, so you do not have this option with IDEA 9.2.
Here is a video on performing a summary: https://www.youtube.com/watch?v=0KTUZU8toBM&list=PLEE1l8LoXUCLS2GYi5QsvNuPuRoez3L2v&index=10&t=0s
Here is one on detecting duplicates: https://www.youtube.com/watch?v=7bMj0VMHYE4&list=PLEE1l8LoXUCLS2GYi5QsvNuPuRoez3L2v&index=14
Although IDEA can search for duplicates, it's a lot of work to produce your suggested output with IDEA/IDEAScript. I would prefer to use IDEA's Python functionality (IDEA 10.3 is necessary) because it's much easier:
1) pass all your data from IDEA to a pandas dataframe by
a) export data from IDEA to csv file and read csv with pandas.read_csv
OR
b) use RecordSet method (possibly not recommended for bigger databases)
2) find duplicates with DataFrame.duplicated() (or remove them directly with DataFrame.drop_duplicates()) ==> here you will find exactly your problem: https://thispointer.com/pandas-find-duplicate-rows-in-a-dataframe-based…
3) export dataframe to csv with DataFrame.to_csv
4) Import csv to IDEA
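The pandas part of the steps above (2 and 3) could look roughly like the sketch below. It is only an illustration: the DataFrame is built by hand from the example in the question instead of being read from an IDEA export, the column names NAME, COUNTRY and AGE are taken from that example, and the index values 1 to 4 simply mirror the record numbers. With a real export you would start from pandas.read_csv on your own file.

```python
import pandas as pd

# Sample data copied from the question. In practice this DataFrame would
# come from pd.read_csv(...) on a CSV file exported from IDEA.
df = pd.DataFrame(
    {
        "NAME": ["Alice", "Paul", "Alice", "Alice"],
        "COUNTRY": ["France", "Canada", "France", "Spain"],
        "AGE": [30, 30, 30, 20],
    },
    index=[1, 2, 3, 4],  # record numbers as written in the question
)

# RESULT 1: perfect duplication. Rows identical on every column are
# collapsed and the first occurrence is kept (records 1, 2 and 4 remain).
result_1 = df.drop_duplicates(keep="first")

# RESULT 2: duplication on NAME only. The first row for each name is kept
# (records 1 and 2 remain).
result_2 = df.drop_duplicates(subset=["NAME"], keep="first")

# duplicated() does not remove anything. It returns a boolean flag per row
# marking the rows that drop_duplicates() would discard.
flags = df.duplicated(subset=["NAME"], keep="first")

# Step 3: with no path argument, to_csv returns the CSV text as a string.
# Pass a file name instead to write a file that can be imported into IDEA.
csv_text = result_2.to_csv(index=False)

print(result_1)
print(result_2)
print(flags)
```

Note that keep="first" decides which record of each duplicate group survives, which is why RESULT 2 shows Alice from France and not Alice from Spain. Use keep="last" to keep the last occurrence instead, or keep=False with duplicated() to flag every member of a duplicate group.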