Question: Extracting

7 posts / 0 new
Last post
lelder
Offline
Joined: 05/08/2018 - 18:35
Question: Extracting

I am trying to extract records that have a gap in a time series. I have twelve months worth of data that includes a unique identifier and a month field. The typical unique identifier will have 12 records--one for each of the twelve months. Some unique identifiers have fewer than 12 records because there is not a record for one or more of the months. Also, a few of the records have duplicates for one or more of the months. What I want to do is extract all records for unique identifiers that do not have a record for all 12 months. Does anyone have a suggestion on how to do this?
lelder

Brian Element's picture
Brian Element
Offline
Joined: 07/11/2012 - 19:57

Hi lelder and welcome to the site.

Any chance you can post some example data, I am a visual person and seeing it will let me understand it better.

Thanks

Brian

lelder
Offline
Joined: 05/08/2018 - 18:35

My data looks like this:
ID                       Month    Payment     Group
9654324934  1/1/2018  10.14             Red
9654324934  2/1/2018  17.85             Red
9654324934  3/1/2018  49.50             Red
9654324934  4/1/2018  95.98           Red
9654324934   5/1/2018  42.78        Red
9654324934   6/1/2018  60.88          Red
9654324934   7/1/2018  15.58            Red
9654324934   8/1/2018  71.82            Red
9654324934   9/1/2018   3.62            Red
9654324934   10/1/2018  37.11           Red
9654324934    11/1/2018   78.78          Red
9654324934   12/1/2018   4.71          Red
6922129544    1/1/2018    40.34          Green
6922129544   2/1/2018    25.51            Green
6922129544    3/1/2018  82.02          Green
6922129544    4/1/2018   71.10           Green
6922129544    6/1/2018    64.16          Green
 

Steven Luciani
Offline
Joined: 07/31/2012 - 13:20

Hi lelder,

You can summarize the data Analysis Tab then summarization under the categorize group.

You stated you have a month field but your example shows a date field. You will have to create a month field by appending a numeric or virtual numeric field (Data tab, fields group) prior to summarizing your data.

@month(month) - field name in your example

Now summarize your data by the ID field and total on the month field.

A perfect ID set will show a number of records (field created by the summarization feature) of 12 and a month total of 78.

You would then extract all ID's where the number of records is less than 12 or the number of records is 12 and the month field total is not equal to 78.

No_Of_Recs <12 .or. ( No_of_Recs = 12 .and. month_sum <>78)

This will give you a listing of ID's that are not exactly 12 different months of data.

You now complete a mathces only join of this list back to your original data set.

Primary File: Original Data

Secondary file: Extracted list from the summarization

Match: ID field in both files

Join type: matches only.

You will end up with a new file with records for ID's that either do not contain 12 months of data or have 12 or more records per ID because of duplicate records.

Cheers,

Steve

lelder
Offline
Joined: 05/08/2018 - 18:35

Steve,
This is fantastic! Thanks for your time. Your instructions were great and I successfully produced a file. I wonder if there isn't still an issue with the duplicates. There are IDs that have greater than 12 records and that have a month_sum not equal 78 that could be missing a month. I don't think this scenario is too prevalent in my data though. Thanks again for your help.
lelder

Steven Luciani
Offline
Joined: 07/31/2012 - 13:20

Glad to help.

One way to deal with the duplicates prior to executing my previous instructions is to summarize the original file first by ID and the month field you created. This will leave you with a file where you will only have one record per id and month, therefore you will never have and ID with greater than 12 records or an ID with 12 records that doesn't contain all 12 months. you can then summarize by just ID and extract No_of_Rec less than 12 then do your matches only join back to the original file.

lelder
Offline
Joined: 05/08/2018 - 18:35

Thanks again Steve! That seemed to do the trick. That summary option is pretty powerful.
I started using IDEA two weeks ago so I can use all the help I can get. I hope after awhile I can help others as effectively as you.
Sincerely,
lelder