Random Sampling from multiple databases
Forums
Hi Brian,
I am new to scripting but hoping you can help assist. I have 100+ databases each named "Fund XXX". Each fund contains a column heading "trans" signifying a specific transaction code. I need to select 4 random samples with the "trans" code "SELL" from each of the funds. I then need to export these 4 selections to a database saved as "Fund XXX Sell Random Sample". Nothing needs to be in excel, everything can remain in IDEA.
I have watched your 10 part beginner's video and looked all over this site for a possible solution, but all seem to have minor hiccups. I cannot seem to figure out how to open or save multiple databases without going through each script and updating. I am very new to IDEA so any help is greatly appreciated.
Thanks,
Jon
Hi Jon,
Hi Jon,
Here is a demo for you to check out. It uses loops to read the files in a folder, the folder has to be under the project folder. It stores the file names in the sFiles() array. It then performs an extract on each file extracting all the Sell trans codes and stores the new filenames in the sExtractFileNames() array. It finally performs a random sample on each of the extracts and stores the file names in the sRandomFileNames() array. The final step is to take each of the random samples and combine them into one file.
Let me know if you have any questions. I have also attached the file for you.
Brian
'********************************************************************************************************************************************
'* Script: Multi Random Sample.iss
'* Author: Brian Element - brian.element@ideascripting.com
'* Date: Sept 24, 2016
'* Purpose: A demo script that shows how to read files and perform an extract and random sample and then combine them
'* This script is presented without any warranty or guarantee. Users are encouraged to validate the effectiveness and reliability on their own.
'********************************************************************************************************************************************
Option Explicit
Dim sFolder As String
Dim sFiles() As String
Dim sExtractFileNames() As String
Dim sRandomFileNames() As String
Sub Main
Dim i As Integer
sFolder = getFolder()
Call getFiles(sFolder)
Call performExtraction()
Call performRandomSample()
call appendRandomSamples
client.RefreshFileExplorer
MsgBox "Script Complete"
End Sub
'append the random files together to create one large file
Function appendRandomSamples()
Dim db As database
Dim task As task
Dim i As Integer
Dim dbName As String
Set db = Client.OpenDatabase(sRandomFileNames(0))
Set task = db.AppendDatabase
For i = 1 To UBound(sRandomFileNames)
task.AddDatabase sRandomFileNames(i)
Next i
dbName = client.uniqueFilename("Combined Random Sample")
task.PerformTask dbName, ""
Set task = Nothing
Set db = Nothing
Client.OpenDatabase (dbName)
End Function
'perform a random sample on each file and select 4 items
Function performRandomSample()
Dim db As database
Dim task As task
Dim i As Integer
Dim iSeed As Integer
Dim dbName As String
ReDim sRandomFileNames(ubound(sFiles))
For i = 0 To UBound(sFiles)
Set db = Client.OpenDatabase(sExtractFileNames(i))
Set task = db.RandomSample
task.IncludeAllFields
dbName = client.UniqueFileName(Left(sFiles(i), Len(sFiles(i)) - 4) & "-Random")
iSeed = Int((30000 - 10000 + 1) * Rnd + 10000) 'generate a new random seed for each file
task.PerformTask dbName, "", 4, 1, db.Count, iSeed, False
Set task = Nothing
db.close
Set db = Nothing
sRandomFileNames(i) = dbName
Next i
End Function
'Extract the Sell transactions for each of the files.
Function performExtraction()
Dim db As database
Dim task As task
Dim i As Integer
Dim dbName As String
ReDim sExtractFileNames(UBound(sFiles))
For i = 0 To UBound(sFiles)
Set db = Client.OpenDatabase(sFolder & sFiles(i))
Set task = db.Extraction
task.IncludeAllFields
dbName = client.UniqueFileName(Left(sFiles(i), Len(sFiles(i)) - 4) & "-Extract")
task.AddExtraction dbName, "", "TRAN_CODE == ""Sell"""
task.PerformTask 1, db.Count
Set task = Nothing
db.close
Set db = Nothing
sExtractFileNames(i) = dbName
Next i
End Function
'obtain the imd files from the selected directory
Function getFiles(sFolder As String)
Dim objFSO As Object
Dim objFolder As Object
Dim objFile As Object
Dim colFiles As Object
Dim bFirstTime As Boolean
bFirstTime = True
ReDim sFiles(0)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder(sFolder)
Set colFiles = objFolder.Files
For Each objFile In colFiles
If Right(UCase(objFile.Name), 3) = "IMD" Then 'only select IDEA files
If bFirstTime Then
bFirstTime = false
sFiles(0) = objFile.Name
Else
ReDim preserve sFiles(UBound(sFiles) + 1)
sFiles(UBound(sFiles)) = objFile.Name
End If
End If
Next
Set objFile = Nothing
Set colFiles = Nothing
Set objFolder = Nothing
Set objFso = Nothing
End Function
Function getFolder() 'this one uses the working directory as the highest level directory
Dim BrowseFolder As String
Dim oFolder, oFolderItem
Dim oPath, oShell, strPath
Dim Current As Variant 'per Windows documentation this is defined as a variant and not a string
Set oShell = CreateObject( "Shell.Application" )
Set oFolder = oShell.Namespace(17) 'the 17 indicates that we are looking at the virtual folder that contains everything on the local computer
Current = Client.WorkingDirectory()
Set oFolder = oShell.BrowseForFolder(0, "Please select the folder where the files are located:", 1, Current)
If (Not oFolder is Nothing) Then
Set oFolderItem = oFolder.Self
oPath = oFolderItem.Path
If Right(oPath, 1) <> "\" Then
oPath = oPath & "\"
End If
End If
getFolder = oPath
End Function
Hi Brian,
Hi Brian,
Thank you so much for all the help thus far. Really appreciate it.
I have imported all my databases into a single folder as instructed above. When I run the script, I select the folder with the databases and it gives me an "Error on line 79 - Bad Equation provided" error. Do you have any tips on how to address such an error?
Thanks,
Jon
Hi Jon,
Hi Jon,
The script was just an example as I didn't have access to your database. The problem would be in this line where you are extracting the Sell items, most likely I don't have the correct field name. So modify this line and it should work.
"TRAN_CODE == ""Sell"""
If you are not sure of the equation perform the extraction on one of the files for all the Sell items and then go look in the history to see the proper syntax for the extraction.
Hope this helps.
Brian
Hi Jon and welcome to the
Hi Jon and welcome to the site.
Looks like you have an interesting problem but quite doable. I will see if I can put something together for you to check out and start off with.
What the script will have to do is loop through each file, then extract all the transactions with a trans code of SELL, then select 4 items and finally append them all together.
Probably the easiest way to do this is to assume that all the files are ones you want to perform the sampling on, so you would move all the files into a separate directory and then select it from the script.
I will see what I can come up with for you.
Brian