7 Batch processing with PLMScoRe - Steph-Fulda/PLMScoRe GitHub Wiki

Batch processing refers to the automatic and sequential processing of more than one data set, for example running the PLMScoRe program on all records collected for the same study with the same specifications, one after the other and without further need for input. Batch processing is efficient and quick in R, and given that the processed data tables are small (compared to other data sets), there are no anticipated problems in processing 100s or 1000s of records in one go.

To set up a batch processing routine, the tasks are:

identify the REMLogic txt files to be processed
load an adequate RLs object
set up the batch processing routine in R
optionally, decide on specific output you want to save separately in a single table

There are also some final tips and final remarks.

For now, let's assume you want to process all files from the same study, all with identical specifications. In particular, the number of columns in the data matrix and the order of these columns must be the same. It also means that the names of the leg channels are exactly the same, as are the names of the sleep/wake stages. In addition, all possible arousal names and respiratory event names that have been used in the study must be part of the respective sub-lists of the RLs object. The best way to ensure this is to edit the RLs object manually and interactively by starting the StartPLMScoRe routine with the first txt file of the study. If you then choose the option "all annotations" (and not "scored annotation"), you make sure that you can select all possible annotations in the respective categories. At the end of the interactive phase, opt to save the RLs object (e.g. as "Study_X_RLs.RData"). See what could possibly go wrong (WCPGW).

For the remainder, unless otherwise specified, I will assume:

that all your REMLogic text files are in a common folder on your hard drive (here: D:/Study_X/)
that, in this particular example, there are three txt files labeled Study_X_S01.txt, Study_X_S02.txt, and Study_X_S03.txt. As you will see below, no specific naming convention is necessary; it is only important that they are txt files and that the folder contains no txt file that is not a REMLogic file.

##Locating the REMLogic txt files
The first step is to specify where the REMLogic txt files are located, so that the PLMScoRe program can cycle through them. This is achieved with the following line:

> study_x_fn<-list.files("D:/Study_X/", pattern="\\.txt$", full.names=TRUE)

In the above command, "D:/Study_X/" is the path to your folder, pattern="\\.txt$" ensures that only files whose names end in .txt are selected (the pattern is a regular expression), and full.names=TRUE ensures that the complete path is returned. Let's inspect the resulting vector:

> study_x_fn
[1] "D:/Study_X/Study_X_S01.txt" "D:/Study_X/Study_X_S02.txt" "D:/Study_X/Study_X_S03.txt"  

> length(study_x_fn)
[1] 3

Now study_x_fn contains the locations of all three files that are in your study folder.
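Before starting a long batch run, it can be worth checking the file list. The following is a minimal sketch using only base R. It builds a throw-away folder with empty stand-in files (the temporary folder and the extra `notes.csv` file are made up for illustration) so that it runs anywhere; with your own data you would point `study_dir` to `D:/Study_X/` instead.

```r
# Create a temporary folder with empty stand-in files (illustration only)
study_dir <- file.path(tempdir(), "Study_X")
dir.create(study_dir, showWarnings = FALSE)
invisible(file.create(file.path(study_dir,
                                c("Study_X_S01.txt", "Study_X_S02.txt",
                                  "Study_X_S03.txt", "notes.csv"))))

# The pattern is a regular expression: a literal dot followed by "txt"
# at the very end of the file name
study_x_fn <- list.files(study_dir, pattern = "\\.txt$", full.names = TRUE)

# Stop early if no file was found or if a file is not readable
stopifnot(length(study_x_fn) > 0)
stopifnot(all(file.access(study_x_fn, mode = 4) == 0))

# Show the file names without the folder
basename(study_x_fn)
```

If one of the checks fails, R stops with an error before any record is processed, which is usually easier to deal with than an error halfway through the batch.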

##Load the RLs object
The next step is to load the adequate RLs object, the REMLogic specifications. You either create it first or, if you have already created and saved one, you just load it.

Case 1: You interactively create the RLs object by starting the PLMScoRe routine as usual (e.g. plmresult<-StartPLMScoRe()). When inputting the annotations, make sure you choose all from the "all annotation" list. When asked by the program, save the RLs object, for example as Study_X_RLs.RData in the D:/Study_X/ folder. At this point your RLs object is available in R as plmresult$RLs. It is a good idea to create a separate object containing only the RLs object; otherwise there is a danger of overwriting it if you reuse plmresult. To do this, type Study_X_RLs<-plmresult$RLs, and from then on the RLs object will be available in R (in the current session) as Study_X_RLs.
Case 2: You have already created an adequate RLs object and now only have to load it. Let's say you saved it as Study_X_RLs.RData in the D:/Study_X/ folder. All you have to do is type load("D:/Study_X/Study_X_RLs.RData"); the loaded object, RLs, is now available in your R session. As before, it is best to assign it right away to a uniquely named object: Study_X_RLs<-RLs.
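Note that `load()` restores objects under the names they were saved with, so it can silently overwrite an object of the same name in your session. A minimal sketch of a more cautious pattern is shown here; it uses an invented stand-in list instead of a real RLs object and a temporary file instead of the file in `D:/Study_X/`, and loads the file into its own environment first.

```r
# Stand-in for a real RLs object (the contents are invented for illustration)
RLs <- list(channels = c("EMG.Tibialis-Leg.Left", "EMG.Tibialis-Leg.Right"))
rls_file <- file.path(tempdir(), "Study_X_RLs.RData")
save(RLs, file = rls_file)
rm(RLs)

# Load into a separate environment so that nothing in the workspace is overwritten
rls_env <- new.env()
loaded_names <- load(rls_file, envir = rls_env)  # load() returns the object names
loaded_names

# Copy the loaded object to a uniquely named variable
Study_X_RLs <- get(loaded_names[1], envir = rls_env)
str(Study_X_RLs)
```

Looking at `loaded_names` also tells you what the saved object is actually called, which is handy if you are not sure how the file was created.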

##Setting up the batch processing routine in R
You have created a vector with the exact file locations (study_x_fn) and have loaded the correct RLs object (Study_X_RLs); you are now ready to set up the proper batch processing routine in R. Basically, you will want to loop over each file, process it, and save the results. Here is how to do that:

#Set up the loop
> for (i in 1:length(study_x_fn)){
          study_x_result<-StartPLMScoRe(
                                       RLs=Study_X_RLs,     #specifies which RLs object to use
                                       fn=study_x_fn[i],    #cycles through all your files 
                                       silent=1)            #ensures that you won't be asked for anything
      }

#Of course you can also write this in fewer lines
> for (i in 1:length(study_x_fn)){
          study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i],silent=1)          
      }
#Or even
> for (i in 1:length(study_x_fn)){study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, study_x_fn[i],silent=1)}

If everything went ok, you should have seen the output tables on your R screen, and your Study_X folder should now contain several new files (only if you have also chosen screen, csv, and pdf as output).

#These are now all files in your Study_X folder
> list.files("D:/Study_X/")
 [1] "Study_X_RLs.RData"           "Study_X_S01.txt"             "Study_X_S01_plm_results.csv"
 [4] "Study_X_S01_summary.pdf"     "Study_X_S02.txt"             "Study_X_S02_plm_results.csv"
 [7] "Study_X_S02_summary.pdf"     "Study_X_S03.txt"             "Study_X_S03_plm_results.csv"
 [10] "Study_X_S03_summary.pdf"

In theory, it was unnecessary to assign the PLMScoRe results to study_x_result within the loop; you could simply have written StartPLMScoRe(RLs=Study_X_RLs, study_x_fn[i],silent=1) in the loop and the result would have been the same. I left it there because it helps with debugging in case of errors: if the routine throws an error, the first thing I do is type 'i' and 'study_x_fn[i]' to find out which file had problems, and next I inspect the last PLMScoRe object, which is then conveniently available in study_x_result.
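If you process hundreds of records, a single faulty file will stop the loop. One possible way around this, sketched here with base R's `tryCatch()`, is to record the error message and move on to the next file. `process_one()` is a hypothetical stand-in for the call to StartPLMScoRe (so that the sketch runs without the package), and the file names are invented.

```r
# Hypothetical stand-in for StartPLMScoRe(): fails for one of the files
process_one <- function(fn) {
  if (grepl("S02", fn)) stop("unexpected column layout")
  list(file = fn)
}

study_x_fn <- c("Study_X_S01.txt", "Study_X_S02.txt", "Study_X_S03.txt")
study_x_errors <- rep(NA_character_, length(study_x_fn))  # one slot per file

for (i in seq_along(study_x_fn)) {
  study_x_result <- tryCatch(
    process_one(study_x_fn[i]),
    error = function(e) {
      study_x_errors[i] <<- conditionMessage(e)  # remember what went wrong
      NULL                                       # and continue with the next file
    }
  )
}

# Files that could not be processed, together with the error message
data.frame(file = study_x_fn, error = study_x_errors)[!is.na(study_x_errors), ]
```

In a real run you would replace `process_one(study_x_fn[i])` with your StartPLMScoRe call and inspect the listed files one by one afterwards.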

##Separately saving selected output in a single table
The batch processing routine above will generate the selected output, which will be saved in separate files. Often it may be more efficient - if you are interested in only a few parameters - to save them directly in a single table. To do that we will make use of the pprint() method. Here is how you can do that:

#In this example, you want to save the PLMS and the PLMSnr index of all processed files
#First, set up an empty table
> study_x_table<-as.data.frame(                      #make the table a data frame
                          matrix(                    #create the table
                                 NA,                 #fill the table with NA
                                 length(study_x_fn), #Number of rows
                                  2))                #Number of columns
#Or in one line
> study_x_table<-as.data.frame(matrix(NA, length(study_x_fn), 2))

#Name the two variables
> names(study_x_table)<-c("PLMS", "PLMSnr")

#And this is what you created
> study_x_table
   PLMS PLMSnr
 1   NA     NA
 2   NA     NA
 3   NA     NA

#Integrate it into the batch processing

for (i in 1:length(study_x_fn)){
  study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i], silent=1)
  study_x_table[i,1]<-pprint(study_x_result$Stats, sel=c("PLM", "no./hour", "TST"), table=0, pretty=0)
  study_x_table[i,2]<-pprint(study_x_result$Stats, sel=c("PLMnr", "no./hour", "TST"), table=0, pretty=0)
  }

#And this is what you got:

> study_x_table
      PLMS    PLMSnr
1 17.81726  4.720812
2 23.59281 11.137725
3 42.15422 31.138311

Please, check out the WCPGW.
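The table built in the loop exists only in the current R session and does not record which row belongs to which file. A minimal sketch, using the values from the example as stand-in data and a temporary output file (both are assumptions for illustration), adds the file names and writes the table to disk with base R's `write.csv()`:

```r
# Stand-in data: the file names and index values from the example
study_x_fn <- c("D:/Study_X/Study_X_S01.txt", "D:/Study_X/Study_X_S02.txt",
                "D:/Study_X/Study_X_S03.txt")
study_x_table <- data.frame(PLMS   = c(17.81726, 23.59281, 42.15422),
                            PLMSnr = c(4.720812, 11.137725, 31.138311))

# Put the file name (without folder and extension) in the first column
study_x_table <- cbind(file = sub("\\.txt$", "", basename(study_x_fn)),
                       study_x_table)

# Write the table to disk; a temporary file here, in practice e.g. your study folder
out_file <- file.path(tempdir(), "Study_X_PLM_indices.csv")
write.csv(study_x_table, out_file, row.names = FALSE)

# Read it back as a check
read.csv(out_file)
```

Because the output is a csv file, it will not be picked up by a later `list.files()` call that selects only txt files.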

##Some final tips
Just to round everything off, here are two final tips that you might find helpful. First, to avoid running the routine and then loading each separate output file for further post-processing, you can also directly save all or parts of the output in a single object.

#We are going to save all statistics tables in a common list
#First create the list where you put the results
> study_x_all_stats<-vector("list", length(study_x_fn))

#Next, loop over the files
> for (i in 1:length(study_x_fn)){
  study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i], silent=1)
  study_x_all_stats[[i]]<-study_x_result$Stats
}

#The result is a list with 3 elements
> length(study_x_all_stats)
[1] 3

#Each element is the statistic table and can be best looked at with View
> View(study_x_all_stats[[1]])

You can do that for the statistic table, for the data table (study_x_result$Data), for the RLs object (study_x_result$RLs), or the total object. If you have not already done it, check out how to navigate lists and files.
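A list filled in a loop has no names, so you have to remember which position belongs to which file. One option, sketched here with small invented data frames in place of real statistics tables, is to name the elements after the files and to store the whole list in one file with base R's `saveRDS()`:

```r
study_x_fn <- c("D:/Study_X/Study_X_S01.txt", "D:/Study_X/Study_X_S02.txt",
                "D:/Study_X/Study_X_S03.txt")

# Invented stand-ins for the statistics tables collected in the batch loop
study_x_all_stats <- vector("list", length(study_x_fn))
for (i in seq_along(study_x_fn)) {
  study_x_all_stats[[i]] <- data.frame(record = i, value = i * 10)
}

# Name each element after its file (without folder and extension)
names(study_x_all_stats) <- sub("\\.txt$", "", basename(study_x_fn))
study_x_all_stats[["Study_X_S02"]]   # access by name instead of by position

# Save the complete list in a single file and restore it later
rds_file <- file.path(tempdir(), "Study_X_all_stats.rds")
saveRDS(study_x_all_stats, rds_file)
restored <- readRDS(rds_file)
names(restored)
```

Unlike `load()`, `readRDS()` returns the object, so you choose the name it gets in your session.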

The final tip is a bit more advanced and deals with the situation where the REMLogic specifications are not the same for all files. Let's take the case where in some of the recordings the left leg channel has the name "EMG.Tibialis-Leg.Left" and in some others - for reasons that will forever remain obscure - it is labeled "LLeg1-LLeg2". The following examples deal with two cases: in the first you know which files have which label; in the second you know only that some files have the other label but not which ones.
For simplicity, let's say you have 10 files, 4 of which have the other leg label.

#Case 1: You know that files no. 3,4,7,10 have the label LLeg1-LLeg2
#Your Study_X_RLs object contains the label "EMG.Tibialis-Leg.Left"

#identify the different files
> other_leg_name<-c(3,4,7,10)

#integrate it into the batch processing routine

> for (i in 1:length(study_x_fn)){
      if(is.element(i, other_leg_name)){           #for those files with the other leg name
            Study_X_RLs[[1]][[1]][[1]]<-c("LLeg1-LLeg2")     #change the leg label
            study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i], silent=1)   #run PLMScoRe
         }
      if(!is.element(i, other_leg_name)){          #for those files NOT with the other leg name
            Study_X_RLs[[1]][[1]][[1]]<-c("EMG.Tibialis-Leg.Left")     #change the leg label to the original
            study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i], silent=1)   #run PLMScoRe
         }
   }
#At the end make sure the RLs object has the original label
> Study_X_RLs[[1]][[1]][[1]]<-c("EMG.Tibialis-Leg.Left")

In the second case, you do not know which files have which label. In that case the way to go is to first run the PLMScoRe routine, check the resulting data table to see if the respective label was present and, if not, rerun the routine with the other label. Remember that the RLs object records which column contains the channel information (in Study_X_RLs[[1]][[6]][[6]]).

#Case 2: You do not know which files have which leg label
#Only that it is either "EMG.Tibialis-Leg.Left" or "LLeg1-LLeg2"

#make the labels available
> label_1<-c("EMG.Tibialis-Leg.Left")
> label_2<-c("LLeg1-LLeg2")

#and integrate it into the batch processing
> for (i in 1:length(study_x_fn)){
      study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i], silent=1)   #run PLMScoRe
      #Now check if label_1 is absent in the data table
      if(!is.element(label_1, study_x_result$Data[,Study_X_RLs[[1]][[6]][[6]]])){
            #in that case change the label
            Study_X_RLs[[1]][[1]][[1]]<-label_2
            #and run the routine again, overwriting the first files
            study_x_result<-StartPLMScoRe(RLs=Study_X_RLs, fn=study_x_fn[i], silent=1)
            #and change the label back again
            Study_X_RLs[[1]][[1]][[1]]<-label_1
         }
   }

I hope you can envision how you can use this basic routine to accommodate other inconsistencies!
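The same idea extends to more than two labels. The sketch below does not call PLMScoRe: `run_one()` is a hypothetical stand-in that returns a small data table, the file names are invented, and the label is passed as a plain argument rather than stored in the RLs object. It tries each candidate label in turn and keeps the first result whose channel column contains that label.

```r
# Hypothetical stand-in for one PLMScoRe run: returns a data table whose
# "channel" column holds whatever label the (invented) file actually uses
run_one <- function(fn, label) {
  true_label <- if (grepl("S02", fn)) "LLeg1-LLeg2" else "EMG.Tibialis-Leg.Left"
  list(Data = data.frame(channel = rep(true_label, 3)), used_label = label)
}

candidate_labels <- c("EMG.Tibialis-Leg.Left", "LLeg1-LLeg2")
study_x_fn <- c("Study_X_S01.txt", "Study_X_S02.txt", "Study_X_S03.txt")
matched_label <- rep(NA_character_, length(study_x_fn))

for (i in seq_along(study_x_fn)) {
  for (label in candidate_labels) {
    study_x_result <- run_one(study_x_fn[i], label)
    if (is.element(label, study_x_result$Data$channel)) {
      matched_label[i] <- label   # this label is present in the data
      break                       # keep this result and stop trying labels
    }
  }
}

# Which label was found for which file (NA means no candidate matched)
data.frame(file = study_x_fn, label = matched_label)
```

Keeping a record such as `matched_label` also documents the inconsistency, which is useful when you later check the data.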

###Final remarks
If you are an R novice and you managed to follow all examples so far: Compliments! You are already deep in R territory! I sincerely hope that on the way you have found out that R is quite useful. Personally, I'd say quite wonderful! So why not venture a little bit further? I would like to encourage you to think about further steps you could implement to process and check your data. The last point - checking your data - is extremely important. Real life data sets are messy! That is a fact of life! And in real life data analysis, a major part of the task is to try and find the hidden errors and inconsistencies in the data to make sure that your findings are as accurate as possible and ultimately may be independently replicated, an underestimated and rarely achieved distinction in science. Happy Hunting!