Overview

This R script cleans and scores data from the CFM2-4 and the Birbeck Epilepsy Questionnaire. The script is broken into chunks so that it is easier to read and document. There are 9 parts to this script:

Part 1: General data cleaning
Part 2: Creating Screener IDs (for HOH and children)
Part 3: Transforming numeric values into strings
Part 4: Scoring DD from the CFM
Part 5: Creating one variable for all CFM difficulties
Part 6: Scoring Epilepsy
Part 7: Creating a KBB DD status (from CFM + Birbeck)
Part 8: Creating an Exclusion status
Part 9: Saving the data

There is also a Quality Control section, where visual data inspection confirms that the code from these chunks is functioning as intended.

Part 1: General data cleaning

The script below will:

Load in the data
Rename variables and keep only the variables of interest
Clean string variables (date related)
Create the variable Child_age from the date of evaluation (DOE) and the date of birth (DOB), since only DOB is collected

# Load in packages
library(tidyverse)
library(readxl)
library(lubridate)


# Set working directory to import data
setwd("~/KBB_new_2/1_screener/raw_data")

# Load in CFM2_4 files
CFM2_4.uncleaned <- read_excel(list.files(pattern = "3_4"))

# Remove any redundant rows
CFM2_4.uncleaned <- unique(CFM2_4.uncleaned)

# Selecting and renaming variables to keep
CFM2_4.removed.variables <- CFM2_4.uncleaned %>%
  select(GPS.lat = `_GPS_latitude`,
         GPS.long = `_GPS_longitude`,
         Date_of_Evaluation = start,
         Evaluator_ID,
         Name_of_the_Village,
         Location_Type,
         HOH_First_Name,
         HOH_Last_Name,
         HOH_Date_of_Birth,
         Respondant_First_Name,
         Respondant_Last_Name,
         Child_First_Name,
         Child_Last_Name,
         Child_Gender,
         Child_Date_of_Birth,
         BF = Child_s_Biological_Father,
         BM = Child_s_Biological_Mother,
         Respondant_relationship = `_08_Respondant_s_Relationship_`,
         CF1,
         CF2,
         CF3,
         CF4 = `_04_SENA_ZYINA_ULABESYA_KANC`,
         CF5,
         CF6,
         CF7,
         CF8,
         CF9,
         CF10,
         CF11,
         CF12,
         CF13,
         CF14,
         CF15,
         CF16,
         E1 = `_17_Mumwaka_wamana_akakozyeka_kulesegwa`,
         E2 = `_18_Mwaka_wamana_mw_uwa_akucinca_cikanda`,
         E3 = `_19_Mumwaka_wamana_m_oyu_kuli_nakayongene`,
         E4 = `_20_Mumwaka_wamana_kazilukide_kumizeezo`,
         E5 = `_21_Mumwaka_wamana_akuwa_akuluma_mulaka`,
         E6 = `_22_Mumwaka_wamana_ilwa_kulijata_kusuba`,
         E7 = `_23_Mumwaka_wamana_umaulu_nokuba_kumeso`,
         E8 = `_24_Mumwaka_wamana_limwi_akuvwa_kununka`,
         E9 = `_25_Mumwaka_wamana_idwe_bulwazi_bwakuwa`,
         EN10 = `_26_Eeli_penzi_lyaka_na_buyo_ciindi_comwe`,
         EN11 =`_27_Ipenzi_eeli_lyak_o_wakalivwide_ampeyo`,
         EN12 = `_28_Ipenzi_eeli_lyak_e_mumalo_kucibbadela`)

# Cleaning variables with date data 
CFM2_4.cleaned.date <- CFM2_4.removed.variables %>%
  mutate(Date_of_Evaluation = substr(Date_of_Evaluation, start = 1, stop = 10),
         HOH_Date_of_Birth = substr(HOH_Date_of_Birth, start = 1, stop = 10),
         Child_Date_of_Birth = substr(Child_Date_of_Birth, start = 1, stop = 10))

# Creating the variable age
dob <- ymd(CFM2_4.cleaned.date$Child_Date_of_Birth)
doe <- ymd(CFM2_4.cleaned.date$Date_of_Evaluation)

age_weeks <- as.numeric(difftime(doe, dob, units = "weeks"))
age <- age_weeks/52
CFM2_4.cleaned.date$Child_age <- round(age,1)
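
Dividing the age in weeks by 52 slightly overstates age, because a year is a little longer than 52 weeks (about 52.18). As a possible alternative, and not the method this script uses, lubridate's interval() and time_length() can compute age in calendar years. The dates below are made up for illustration.

```r
library(lubridate)

# Hypothetical dates, for illustration only
dob <- ymd("2020-01-15")
doe <- ymd("2023-07-15")

# Age in calendar years between the two dates
age_years <- time_length(interval(dob, doe), unit = "years")
child_age <- round(age_years, 1)

# The approach used in the script above, for comparison
age_weeks <- as.numeric(difftime(doe, dob, units = "weeks"))
child_age_weeks <- round(age_weeks / 52, 1)
```

For children in this age range, the two approaches usually agree after rounding to one decimal place, so this mostly matters if ages are used for strict cutoffs.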

Part 2: Creating Screener IDs

In downstream processing scripts, we will need to keep track of which child is which. This is very important for the matching stage. Additionally, children will need to be matched with other children within the same household. Thus, we need variables that serve as identifiers. We will create screener IDs for the households and for each child.

CFM2_4.HOH.ID <- CFM2_4.cleaned.date %>%
  mutate(HOH_ID = paste(HOH_First_Name, 
                        HOH_Last_Name, 
                        HOH_Date_of_Birth),
         Child_ID = paste(Child_First_Name,
                          Child_Last_Name,
                          Child_Date_of_Birth,
                          Child_Gender))
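
Because these IDs are built by pasting names and dates together, they are only as unique as those fields are. The sketch below uses made-up names and dates (not study data) to show what the resulting ID looks like and one way to check for repeated Child_ID values.

```r
# Toy data (hypothetical names and dates)
children <- data.frame(
  Child_First_Name = c("Mary", "John", "Mary"),
  Child_Last_Name = c("Banda", "Phiri", "Banda"),
  Child_Date_of_Birth = c("2020-03-01", "2019-11-20", "2020-03-01"),
  Child_Gender = c("Female", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Same construction as in the script: paste() joins fields with a single space
children$Child_ID <- paste(children$Child_First_Name,
                           children$Child_Last_Name,
                           children$Child_Date_of_Birth,
                           children$Child_Gender)

# IDs that appear more than once (possible repeat screenings or data entry issues)
dup_ids <- children$Child_ID[duplicated(children$Child_ID)]
```

Here dup_ids contains the one repeated ID, which could then be reviewed by hand.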

Part 3: Transforming numeric value responses back to characters

Up to 16 questions can be asked from the CFM2-4, which means not everyone is asked the same number of questions. Some of these questions have binary responses (0 = No; 1 = Yes), but the majority have four possible responses (1 = No Difficulty; 2 = Some Difficulty; 3 = A lot of Difficulty; 4 = Cannot at All). The questions not asked universally are the ones about physical difficulties experienced with or without some type of assistive equipment.

We will convert these numeric values to the strings mentioned above. Below are the questions asked and the number of possible responses for each (2 for binary responses; 4 for four responses):

CF1: wearing glasses (2)
CF2: difficulty seeing while wearing glasses (4; not asked universally)
CF3: difficulty seeing (4)
CF4: using a hearing aid (2)
CF5: difficulty hearing while using a hearing aid (4; not asked universally)
CF6: difficulty hearing (4)
CF7: uses assistive equipment for walking (2)
CF8: difficulty walking without using assistive equipment (4; not asked universally)
CF9: difficulty walking while using assistive equipment (4; not asked universally)
CF10: difficulty walking (4)
CF11: difficulty picking up objects (4)
CF12: difficulty understanding (4)
CF13: difficulty being understood (4)
CF14: difficulty learning (4)
CF15: difficulty playing (4)
CF16: difficulty with aggressive behaviors (4)

# Changing binary responses to strings 
CFM2_4.HOH.ID <- CFM2_4.HOH.ID %>%
  mutate(glasses = ifelse(CF1 == 1, "Yes", "No"),
         hearing.aid = ifelse(CF4 == 1, "Yes", "No"),
         walking.equipment = ifelse(CF7 == 1, "Yes", "No"))

# Create a function to change numeric values to the associated difficulty severity
difficulty_type_fun <- function(value) {
  case_when(
    value == 1 ~ "No difficulty",
    value == 2 ~ "Some Difficulty",
    value == 3 ~ "A lot of Difficulty",
    value == 4 ~ "Cannot at all",
    TRUE ~ as.character(value)
  )
}

# Create new variables with these labels
CFM2_4.CFM.labeled <- CFM2_4.HOH.ID %>%
  mutate(CF3_Seeing = difficulty_type_fun(CF3),
         CF6_Hearing = difficulty_type_fun(CF6),
         CF10_Walking = difficulty_type_fun(CF10),
         CF11_Fine_Motor = difficulty_type_fun(CF11),
         CF12_Understanding = difficulty_type_fun(CF12),
         CF13_Communicating = difficulty_type_fun(CF13),
         CF14_Learning = difficulty_type_fun(CF14),
         CF15_Playing = difficulty_type_fun(CF15),
         CF16_Controlling_Behavior = difficulty_type_fun(CF16))
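
As a quick illustration of how the labeling function behaves, the sketch below applies the same definition to a made-up vector of responses. Note that a missing response (NA) stays missing rather than becoming a label, which is why later parts of the script replace NA with "No difficulty" before scoring.

```r
library(dplyr)

# Same definition as in the script above
difficulty_type_fun <- function(value) {
  case_when(
    value == 1 ~ "No difficulty",
    value == 2 ~ "Some Difficulty",
    value == 3 ~ "A lot of Difficulty",
    value == 4 ~ "Cannot at all",
    TRUE ~ as.character(value)
  )
}

# Hypothetical responses, including one missing value
responses <- c(1, 2, 3, 4, NA)
labels <- difficulty_type_fun(responses)
# labels: "No difficulty" "Some Difficulty" "A lot of Difficulty" "Cannot at all" NA
```

Any unexpected code outside 1 to 4 falls through to the final branch and is kept as a character string, which makes such values easy to spot during inspection.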

Part 4: Scoring DD from the CFM responses

This section of the code labels each child with their DD status according to the CFM. The label can be one of the following:

No difficulty (No DD)
Some Difficulty (DD)
A lot of Difficulty (DD)
Cannot at all (DD)

The DD label that is given is based on severity, with the most severe response determining the status. Thus, if a child has many reports of 'Some Difficulty' but one report of 'Cannot at all', they will be labeled as 'Cannot at all'. A child needs some type of difficulty in at least one domain to be classified as having DD according to the CFM.

Disclaimer: For our KBB study, we are interested in difficulties that are cognitive or behavioral, so children with physical difficulties only will not be counted as DD (variable = KBB_CFM_DD). However, they will be counted as DD in another variable that will also be added to the dataset (variable = CFM_DD).

Lastly, to make a clear distinction between DD and non-DD children, we will add an at least 'Some Difficulty' variable to the dataset.

# Function to score difficulty type or status
CFM_opr_fun <- function(...) {
  
  pmap_chr(list(...), function(...) {
    if ("Cannot at all" %in% c(...)) {
      return("Cannot at all")
      
    } else if ("A lot of Difficulty" %in% c(...)) {
      return("A lot of Difficulty")
      
    } else if ("Some Difficulty" %in% c(...)) {
      return("Some Difficulty")
      
    } else {
      return("No difficulty")
    }
  })
}

# Add this variable using the criteria for the CFM definition of DD
CFM2_4 <- CFM2_4.CFM.labeled %>%
  mutate(CFM_DD = CFM_opr_fun(CF3_Seeing, 
                              CF6_Hearing, 
                              CF10_Walking, 
                              CF11_Fine_Motor,
                              CF12_Understanding,
                              CF13_Communicating,
                              CF14_Learning,
                              CF15_Playing,
                              CF16_Controlling_Behavior))


# Add this variable using the criteria for the KBB definition of DD
CFM2_4 <- CFM2_4 %>%
  mutate(KBB_CFM_DD = CFM_opr_fun(CF12_Understanding,
                              CF13_Communicating,
                              CF14_Learning,
                              CF15_Playing,
                              CF16_Controlling_Behavior))

# At least some difficulty
CFM2_4 <- CFM2_4 %>%
  mutate(CFM_DD_at_some = ifelse(CFM_DD == "No difficulty",CFM_DD,"Some Difficulty"))

CFM2_4 <- CFM2_4 %>%
  mutate(KBB_CFM_DD_at_some = ifelse(KBB_CFM_DD == "No difficulty",KBB_CFM_DD,"Some Difficulty"))
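
The "most severe response wins" rule can also be expressed without row-wise iteration. The sketch below is one possible base R alternative, not the script's method: it ranks each label with match(), takes the row-wise maximum with pmax(), and maps the rank back to a label. The helper name most_severe and the example vectors are hypothetical.

```r
# Severity labels ordered from least to most severe (spelling as used in the script)
severity_levels <- c("No difficulty", "Some Difficulty",
                     "A lot of Difficulty", "Cannot at all")

# Hypothetical helper: returns the most severe label across the supplied columns
most_severe <- function(...) {
  # Convert each column of labels to its severity rank (NA if missing or unknown)
  ranks <- lapply(list(...), match, table = severity_levels)
  # Row-wise maximum rank, ignoring missing values
  top <- do.call(pmax, c(ranks, list(na.rm = TRUE)))
  # Rows where every response is missing are treated as "No difficulty"
  top[is.na(top)] <- 1L
  severity_levels[top]
}

# Made-up responses for three children
seeing <- c("No difficulty", "Some Difficulty", NA)
learning <- c("Cannot at all", "No difficulty", NA)
status <- most_severe(seeing, learning)
# status: "Cannot at all" "Some Difficulty" "No difficulty"
```

Like CFM_opr_fun, this treats a row with no reported difficulty (or only missing responses) as "No difficulty".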

Part 5: Listing all difficulties into one variable

It might be difficult to scan each column of each row to identify which difficulties a child has. An easier way is to have one column that lists every question for which at least some difficulty was reported. The code below, while neither pretty nor easy to interpret, does this using for loops. It does so for both the CFM definition of DD and the KBB CFM definition of DD.

# First for CFM questions 
CFM_data <- CFM2_4 %>% select(CF3_Seeing,
                              CF6_Hearing,
                              CF10_Walking,
                              CF11_Fine_Motor,
                              CF12_Understanding,
                              CF13_Communicating,
                              CF14_Learning,
                              CF15_Playing,
                              CF16_Controlling_Behavior)


CFM_data[is.na(CFM_data)] <- "No difficulty"

CFM_DD_type_list <- list()
for(ii in 1:nrow(CFM_data)) {
  current_row <- CFM_data[ii,]
  current_row_name <- names(current_row)
  current_CFM_DD_type_list <- list()
  
  for(iii in 1:length(current_row)) {
    
    if(current_row[[iii]] != "No difficulty") {
      
      current_CFM_DD_type_list[[iii]] <- current_row_name[iii]
      
    } 
    
    
  }
  CFM_DD_type_list[[ii]] <- paste(current_CFM_DD_type_list %>% unlist(), collapse="; ")
}

CFM2_4$CFM_DD_type <- CFM_DD_type_list %>% unlist()

# Next for how KBB is operationalizing it 
KBB_CFM_data <- CFM2_4 %>% select(CF12_Understanding,
                              CF13_Communicating,
                              CF14_Learning,
                              CF15_Playing,
                              CF16_Controlling_Behavior)


KBB_CFM_data[is.na(KBB_CFM_data)] <- "No difficulty"

KBB_CFM_DD_type_list <- list()
for(ii in 1:nrow(KBB_CFM_data)) {
  current_row <- KBB_CFM_data[ii,]
  current_row_name <- names(current_row)
  current_KBB_CFM_DD_type_list <- list()
  
  for(iii in 1:length(current_row)) {
    
    if(current_row[[iii]] != "No difficulty") {
      
      current_KBB_CFM_DD_type_list[[iii]] <- current_row_name[iii]
      
    } 
    
  }
  KBB_CFM_DD_type_list[[ii]] <- paste(current_KBB_CFM_DD_type_list %>% unlist(), collapse="; ")
}

CFM2_4$KBB_CFM_DD_type <- KBB_CFM_DD_type_list %>% unlist()
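
For readers who find the nested loops hard to follow, the same idea can be sketched more compactly. This is an illustrative base R alternative under the same assumptions as the loops above (missing values count as no difficulty); the helper name list_difficulties and the toy data are hypothetical.

```r
# Hypothetical helper: for each row, list the columns with a reported difficulty
list_difficulties <- function(df) {
  # TRUE where a response is present and is anything other than "No difficulty"
  flagged <- !is.na(df) & df != "No difficulty"
  # Collapse the names of the flagged columns into one string per row
  apply(flagged, 1, function(row) paste(names(df)[row], collapse = "; "))
}

# Toy data with two children (made-up responses)
toy <- data.frame(
  CF12_Understanding = c("Some Difficulty", "No difficulty"),
  CF14_Learning = c("Cannot at all", NA),
  stringsAsFactors = FALSE
)

dd_type <- list_difficulties(toy)
# dd_type: "CF12_Understanding; CF14_Learning" ""
```

A child with no reported difficulties gets an empty string, which matches what the loops produce.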

Part 6: Scoring Epilepsy

The Birbeck Epilepsy Questionnaire consists of 12 questions, each with a binary response (0 = negative; 1 = positive). The first 9 questions ask about experiences that are likely to be explained by epilepsy. Thus, if at least one of the first 9 questions is answered positively, the child is likely to have epilepsy.

However, 3 additional questions ask about the frequency of these experiences and whether they occurred during an illness. If any of these three questions is answered positively, the child will not be given the label of epilepsy.

Therefore, for a child to have epilepsy according to this screener, they need at least one positive response among the first 9 questions and only negative responses to the last 3 questions.

epilepsy_positive_questions <- CFM2_4 %>%
  select(E1,
         E2,
         E3,
         E4,
         E5,
         E6,
         E7,
         E8,
         E9)

epilepsy_negative_questions <- CFM2_4 %>%
  select(EN10,
         EN11,
         EN12)

# Convert any NAs to 0
epilepsy_positive_questions[is.na(epilepsy_positive_questions)] <- 0
epilepsy_negative_questions[is.na(epilepsy_negative_questions)] <- 0

# Score if epilepsy is present or not

epilepsy <- list()

for (ii in 1:nrow(epilepsy_positive_questions)) {
  if(any(epilepsy_positive_questions[ii,] > 0)) {
    if (any(epilepsy_negative_questions[ii,] == 1)) {
      epilepsy[[ii]] <- "No"
    } else {
      epilepsy[[ii]] <- "Yes"
    }
  } else {
    epilepsy[[ii]] <- "No"
  }
}

# Add this back to the screener
CFM2_4$Epilepsy <- unlist(epilepsy)
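
The scoring rule above (at least one positive among E1 to E9 and no positive among EN10 to EN12) can also be written without a loop. The sketch below is a possible vectorized version using rowSums(); the helper name score_epilepsy and the toy responses are hypothetical, and only a few of the columns are included to keep it short.

```r
# Hypothetical helper: pos holds the E questions, neg holds the EN questions
score_epilepsy <- function(pos, neg) {
  # At least one positive screening question (missing counts as negative)
  any_pos <- rowSums(pos > 0, na.rm = TRUE) > 0
  # At least one positive follow-up question that rules epilepsy out
  any_neg <- rowSums(neg == 1, na.rm = TRUE) > 0
  ifelse(any_pos & !any_neg, "Yes", "No")
}

# Toy responses for three children (made-up values, subset of columns)
pos <- data.frame(E1 = c(1, 1, 0), E2 = c(0, NA, 0))
neg <- data.frame(EN10 = c(0, 1, 0), EN11 = c(0, 0, NA))

epilepsy <- score_epilepsy(pos, neg)
# epilepsy: "Yes" "No" "No"
```

The second child has a positive screening question but also a positive follow-up question, so the label is "No", as in the loop above.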

Part 7: Creating our final KBB DD status

To be assigned a DD status in accordance with the grant, a child must have at least some difficulty in one of the CFM domains (not including the physical domains) and/or screen positive for epilepsy. This variable labels children as 'Yes' for having DD if they meet the criteria or 'No' if they do not.

CFM2_4 <- CFM2_4 %>%
  mutate(KBB_DD_status = case_when(
    Epilepsy == "Yes" ~ "Yes",
    KBB_CFM_DD_at_some != "No difficulty" ~ "Yes",
    TRUE ~ "No"
  ))

Part 8: Exclusion status (KBB_CFM)

There are two different exclusion criteria, each corresponding to whether a child has DD or not.

DD Exclusion: Have at least one 'Cannot at all' for a physical question

Non-DD Exclusion: Have at least one 'Some Difficulty' or more for a physical question

Ideally, we are not interested in children who have physical disabilities. However, children with cognitive or behavioral difficulties tend to have physical difficulties as well. Therefore, as long as their physical disability does not make testing too difficult (e.g., they are not blind, deaf, or immobile), they can be recruited as DD. For our non-DD children, we want them to have 'No difficulty' for every question asked.

The first chunk of the code creates a variable that identifies the severity of difficulty for the physical questions. Specifically, we are interested in those that are 'Cannot at all'. The next part of the code takes into account the child's DD/non-DD status and then labels them as excluded or not ('Yes' or 'No') based on the severity of their physical disability, if any.

# Extract only the physical (sensory and motor) questions
CFM2_4_physical_questions <- CFM2_4 %>%
  select(CF3_Seeing,
         CF6_Hearing,
         CF10_Walking,
         CF11_Fine_Motor)

# Some data cleaning
CFM2_4_physical_questions[is.na(CFM2_4_physical_questions)] <- "No difficulty"

# Create a for loop to obtain the rows that have "Cannot at all" for sensory or motor difficulties
physical_difficulty_type <- list()

for(ii in 1:nrow(CFM2_4_physical_questions)) {
  current_row <- CFM2_4_physical_questions[ii,]
  
  if(any(current_row == "Cannot at all")) {
    physical_difficulty_type[[ii]] <- "Cannot at all"
    
  } else if (any(current_row == "A lot of Difficulty")){
    physical_difficulty_type[[ii]] <- "A lot of Difficulty"
    
  } else if (any(current_row == "Some Difficulty")) {
    physical_difficulty_type[[ii]] <- "Some Difficulty"
    
  } else {
    physical_difficulty_type[[ii]] <- "No difficulty"
    
  }
  
}


CFM2_4$Physical_difficulty_type <- unlist(physical_difficulty_type)

# Create an exclusion variable
CFM2_4 <- CFM2_4 %>%
  mutate(Excluded = case_when(
    KBB_DD_status == "Yes" & Physical_difficulty_type == "Cannot at all" ~ "Yes",
    KBB_DD_status == "No" & Physical_difficulty_type != "No difficulty" ~ "Yes",
    TRUE ~ "No"
  ))
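
To make the two exclusion rules concrete, the sketch below applies the same case_when() logic to four made-up children, one for each combination worth checking. The toy tibble is hypothetical and only stands in for the real data.

```r
library(dplyr)

# Toy data: two DD children and two non-DD children (made-up values)
toy <- tibble(
  KBB_DD_status = c("Yes", "Yes", "No", "No"),
  Physical_difficulty_type = c("Cannot at all", "A lot of Difficulty",
                               "Some Difficulty", "No difficulty")
)

# Same exclusion logic as in the script above
toy <- toy %>%
  mutate(Excluded = case_when(
    KBB_DD_status == "Yes" & Physical_difficulty_type == "Cannot at all" ~ "Yes",
    KBB_DD_status == "No" & Physical_difficulty_type != "No difficulty" ~ "Yes",
    TRUE ~ "No"
  ))
# toy$Excluded: "Yes" "No" "Yes" "No"
```

The DD child with 'A lot of Difficulty' is kept, while the non-DD child with only 'Some Difficulty' is excluded, which reflects the stricter criterion for the non-DD group.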

Part 9: Saving the data

# Set working directory to save the data
setwd("~/KBB_new_2/1_screener/processed_data")

# Save the data
write_csv(CFM2_4, file = "CFM2_4_clean.csv")

Quality Control

The code below can be added to the bottom of the main script but should be removed after testing.

DD Status

We need to verify that the code above assigned the DD status correctly. To check this, we can create a smaller dataset from the one above that includes all questions from the CFM2_4, the Birbeck Epilepsy Questionnaire result, and the KBB_DD_status. Additionally, we can include KBB_CFM_DD_type, which stores all reported difficulties in one variable.

After visual inspection, most of the children with a DD status had at least one reported difficulty on a CFM question, a couple had epilepsy only, and a couple had both epilepsy and KBB CFM difficulties.

CFM2_4 %>%
  select(CF3_Seeing:CF16_Controlling_Behavior, Epilepsy, KBB_CFM_DD_type, KBB_DD_status) %>%
  filter(KBB_DD_status == "Yes") %>%
  view()

CFM2_4 %>%
  select(CF3_Seeing:CF16_Controlling_Behavior, Epilepsy, KBB_DD_status) %>%
  filter(KBB_DD_status == "No") %>%
  view()
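
Visual inspection can be complemented with a programmatic check. The sketch below is one possible approach, assuming the column names used in this script: it recomputes whether each child meets the criteria from Epilepsy and KBB_CFM_DD_type and compares that with KBB_DD_status. The data frame scored is a made-up stand-in for CFM2_4.

```r
# Toy stand-in for the scored data (hypothetical values)
scored <- data.frame(
  Epilepsy = c("Yes", "No", "No", "Yes"),
  KBB_CFM_DD_type = c("", "CF14_Learning", "",
                      "CF12_Understanding; CF13_Communicating"),
  KBB_DD_status = c("Yes", "Yes", "No", "Yes"),
  stringsAsFactors = FALSE
)

# A child meets the criteria with epilepsy and/or any listed KBB CFM difficulty
meets_criteria <- scored$Epilepsy == "Yes" | scored$KBB_CFM_DD_type != ""

# Rows where the assigned status disagrees with the recomputed criteria
mismatches <- scored[(scored$KBB_DD_status == "Yes") != meets_criteria, ]

# Stops with an error if any row disagrees
stopifnot(nrow(mismatches) == 0)
```

If the check fails, mismatches holds the rows to review by hand.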

Exclusion Status

Now that we are certain the DD status was assigned correctly, we can check whether the exclusion status was also assigned correctly. We can split the data by DD status and see whether each group was excluded correctly.

After visual inspection, it seems to be working as expected. Non-DD children were excluded where appropriate, since there were cases of non-DD children having some difficulty in physical domains. For DD, no one was excluded (as of the time this code was run) because no one had 'Cannot at all'.

CFM2_4 %>%
  select(CF3_Seeing:CF11_Fine_Motor,  KBB_DD_status, Excluded) %>%
  filter(KBB_DD_status == "Yes") %>%
  view()

CFM2_4  %>%
  select(CF3_Seeing:CF11_Fine_Motor,  KBB_DD_status, Excluded) %>%
  filter(KBB_DD_status == "No") %>%
  view()

KBB CFM2‐4 and Birbeck Epilepsy Scoring - LeoLedesma237/LeoWebsite GitHub Wiki
