KBB CFM5-17 and Birbeck Epilepsy Scoring
Overview
The function of this R script is to clean and score information from the CFM5-17 and the Birbeck Epilepsy Questionnaire. The script is broken into chunks so that it is easier to read and document. There are 9 parts to this script:
- Part 1: General data cleaning
- Part 2: Creating Screener IDs (for HOH and children)
- Part 3: Transforming numeric values into strings
- Part 4: Scoring DD from the CFM
- Part 5: Creating one variable for all CFM difficulties
- Part 6: Scoring Epilepsy
- Part 7: Creating a KBB DD status (from CFM + Birbeck)
- Part 8: Creating an Exclusion status
- Part 9: Saving the data
There is also a Quality Control section, where visual inspection of the data confirms that the code in these chunks is functioning as intended.
Part 1: General data cleaning
The script below will:
- Load the data
- Rename variables and keep only the variables of interest
- Clean the date-related string variables
- Create the variable Child_age from the date of birth (DOB) and the date of evaluation (DOE), since only DOB is collected
library(tidyverse)
library(readxl)
library(lubridate)
# Set working directory to import data
setwd("~/KBB_new_2/1_screener/raw_data")
# Load in the CFM5_17 file
CFM5_17.uncleaned <- read_excel(list.files(pattern = "5_18"))
# Remove any redundant rows
CFM5_17.uncleaned <- unique(CFM5_17.uncleaned)
CFM5_17.removed.variables <- CFM5_17.uncleaned %>%
select(GPS.lat = `_GPS_latitude`,
GPS.long = `_GPS_longitude`,
Date_of_Evaluation = start,
Evaluator_ID,
Name_of_the_Village,
Location_Type,
HOH_First_Name,
HOH_Last_Name,
HOH_Date_of_Birth = HOH_Date_of_birth,
Respondant_First_Name,
Respondant_Last_Name,
Child_First_Name,
Child_Last_Name,
Child_Gender,
Child_Date_of_Birth,
BF = Child_s_Biological_Father,
BM = Child_s_Biological_Mother,
Respondant_relationship = `_08_Respondent_Relationship_to`,
CF1,
CF2,
CF3,
CF4,
CF5,
CF6,
CF7,
CF8,
CF9,
CF10,
CF11,
CF12,
CF13,
CF14,
CF15,
CF16,
CF17,
CF18,
CF19,
CF20,
CF21,
CF22,
CF23,
CF24,
E1 = `_25_Mumwaka_wamana_akakozyeka_kulesegwa`,
E2 = `_26_Mmwaka_wamana_m_uwa_akucinca_cikanda`,
E3 = `_27_Mumwaka_wamana_m_oyu_kuli_na_nyongene`,
E4 = `_28_Mumwaka_wamana_kazilukide_kumizeezo`,
E5 = `_29_Mumwaka_wamana_akuwa_akuluma_mulaka`,
E6 = `_30_Mumwaka_wamana_ilwa_kulijata_kusuba`,
E7 = `_31_Mumwaka_wamana_umaulu_nokuba_kumeso`,
E8 = `_32_Mumwaka_wamana_limwi_akuvwa_kununka`,
E9 = `_33_Mumwaka_wamana_idwe_bulwazi_bwakuwa`,
EN10 = `_34_Eeli_penzi_lyaka_na_buyo_ciindi_comwe`,
EN11 = `_35_Ipenzi_eeli_lyak_o_wakalivwide_ampeyo`,
EN12 = `_36_Ipenzi_eeli_lyak_e_mumalo_kucibbadela`)
# Cleaning variables with date data
CFM5_17.cleaned.date <- CFM5_17.removed.variables %>%
mutate(Date_of_Evaluation = substr(Date_of_Evaluation, start = 1, stop = 10),
HOH_Date_of_Birth = substr(HOH_Date_of_Birth, start = 1, stop = 10),
Child_Date_of_Birth = substr(Child_Date_of_Birth, start = 1, stop = 10))
# Creating the variable age
dob <- ymd(CFM5_17.cleaned.date$Child_Date_of_Birth)
doe <- ymd(CFM5_17.cleaned.date$Date_of_Evaluation)
age_weeks <- as.numeric(difftime(doe, dob, units = "weeks"))
age <- age_weeks/52
CFM5_17.cleaned.date$Child_age <- round(age,1)
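The age calculation above divides weeks by 52, which treats a year as 364 days, so the result drifts slightly upward for older children. The base R sketch below compares that approach with dividing days by 365.25; the second approach and the dates used are assumptions for illustration, not part of the original script.

```r
# Hypothetical dates, used only to compare the two age calculations
dob <- as.Date("2006-03-01")
doe <- as.Date("2023-03-01")

age_days <- as.numeric(difftime(doe, dob, units = "days"))

# Approach used in the script: weeks divided by 52 (a 364-day year)
age_weeks_52 <- round((age_days / 7) / 52, 1)

# Alternative (an assumption, not the script's method): days divided by 365.25
age_days_365 <- round(age_days / 365.25, 1)

print(c(weeks_over_52 = age_weeks_52, days_over_365_25 = age_days_365))
```

For a child evaluated on their 17th birthday, the first approach rounds to 17.1 and the second to 17. Whether that difference matters depends on how Child_age is used near age cutoffs.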
Part 2: Creating Screener IDs
Later processing scripts need to keep track of which child is which, and this is especially important at the matching stage. Children also need to be matched with other children from the same household. We therefore create two screener ID variables: one for each household (HOH_ID) and one for each child (Child_ID).
CFM5_17.HOH.ID <- CFM5_17.cleaned.date %>%
mutate(HOH_ID = paste(HOH_First_Name,
HOH_Last_Name,
HOH_Date_of_Birth),
Child_ID = paste(Child_First_Name,
Child_Last_Name,
Child_Date_of_Birth,
Child_Gender))
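Because these IDs are pasted together from free-text names, small differences in spacing or capitalization produce different IDs for the same person. The sketch below shows the effect on a few hypothetical records and one possible normalization step; `clean_name` and the example data are assumptions for illustration and are not part of the original script.

```r
# Hypothetical records: rows 1 and 2 are the same child entered with
# different spacing and capitalization
children <- data.frame(
  Child_First_Name    = c("Mary ", "mary", "John"),
  Child_Last_Name     = c("Banda", "Banda", "Phiri"),
  Child_Date_of_Birth = c("2012-04-01", "2012-04-01", "2010-09-15"),
  Child_Gender        = c("Female", "Female", "Male"),
  stringsAsFactors    = FALSE
)

# ID built the same way as in the script (paste with a single space)
raw_id <- paste(children$Child_First_Name, children$Child_Last_Name,
                children$Child_Date_of_Birth, children$Child_Gender)

# Normalized ID (an assumption): trim whitespace and upper-case the names
clean_name <- function(x) toupper(trimws(x))
norm_id <- paste(clean_name(children$Child_First_Name),
                 clean_name(children$Child_Last_Name),
                 children$Child_Date_of_Birth, children$Child_Gender)

any(duplicated(raw_id))   # FALSE: the two entries look like different children
any(duplicated(norm_id))  # TRUE: the duplicate entry is detected
```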
Part 3: Transforming numeric value responses back to characters
The CFM5-17 has up to 24 questions, and not every respondent is asked all of them. Some questions have binary responses (0 = No; 1 = Yes), but the majority have four possible responses (1 = No Difficulty; 2 = Some Difficulty; 3 = A lot of Difficulty; 4 = Cannot at All). The questions that are not asked universally are the physical difficulty questions tied to the use of some type of assistive equipment. The last two questions, which cover affective difficulties, have five possible responses (1 = Daily; 2 = Weekly; 3 = Monthly; 4 = A few times a year; 0 = Never).
We will convert these numeric values to the strings listed above. Below are the questions asked and the number of possible responses for each (2 for binary responses; 4 for four responses; 5 for five responses):
- CF1: wearing glasses (2)
- CF2: difficulty seeing while wearing glasses (4; not asked universally)
- CF3: difficulty seeing (4)
- CF4: using a hearing aid (2)
- CF5: difficulty hearing while using a hearing aid (4; not asked universally)
- CF6: difficulty hearing (4)
- CF7: uses assistive equipment for walking (2)
- CF8: difficulty walking 100 yards without using assistive equipment (4; not asked universally)
- CF9: difficulty walking 500 yards without using assistive equipment (4; not asked universally)
- CF10: difficulty walking 100 yards while using assistive equipment (4; not asked universally)
- CF11: difficulty walking 500 yards while using assistive equipment (4; not asked universally)
- CF12: difficulty walking 100 yards (4)
- CF13: difficulty walking 500 yards (4)
- CF14: difficulty feeding or dressing (4)
- CF15: difficulty being understood inside (4)
- CF16: Difficulty being understood outside (4)
- CF17: Difficulty learning (4)
- CF18: Difficulty remembering (4)
- CF19: Difficulty concentrating (4)
- CF20: Difficulty accepting routine change (4)
- CF21: Difficulty controlling behavior (4)
- CF22: Difficulty making friends (4)
- CF23: Difficulty with anxiousness (5)
- CF24: Difficulty with depression (5)
CFM5_17.HOH.ID <- CFM5_17.HOH.ID %>%
mutate(glasses = ifelse(CF1 == 1, "Yes", "No"),
hearing.aid = ifelse(CF4 == 1, "Yes", "No"),
walking.equipment = ifelse(CF7 == 1, "Yes", "No"))
# Create a function to change numeric values to the associated difficulty severity
difficulty_type_fun <- function(value) {
case_when(
value == 1 ~ "No difficulty",
value == 2 ~ "Some Difficulty",
value == 3 ~ "A lot of Difficulty",
value == 4 ~ "Cannot at all",
TRUE ~ as.character(value)
)
}
# This is for the mental health related questions
mental_health_freq_fun <- function(value) {
case_when(
value == 1 ~ "Daily",
value == 2 ~ "Weekly",
value == 3 ~ "Monthly",
value == 4 ~ "A few times a year",
value == 0 ~ "Never",
TRUE ~ as.character(value)
)
}
# Create new variables with these labels
CFM5_17.CFM.labeled <- CFM5_17.HOH.ID %>%
mutate(CF3_Seeing = difficulty_type_fun(CF3),
CF6_Hearing = difficulty_type_fun(CF6),
CF12_Walking_100 = difficulty_type_fun(CF12),
CF13_Walking_500 = difficulty_type_fun(CF13),
CF14_Self_care = difficulty_type_fun(CF14),
CF15_Understood_Inside = difficulty_type_fun(CF15),
CF16_Understood_Outside = difficulty_type_fun(CF16),
CF17_Learning = difficulty_type_fun(CF17),
CF18_Remembering = difficulty_type_fun(CF18),
CF19_Concentrating = difficulty_type_fun(CF19),
CF20_Accepting_Challenge = difficulty_type_fun(CF20),
CF21_Controlling_Behavior = difficulty_type_fun(CF21),
CF22_Making_Friends = difficulty_type_fun(CF22),
CF23_Anxiety = mental_health_freq_fun(CF23),
CF24_Depression = mental_health_freq_fun(CF24))
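The same recoding can also be expressed as a named lookup vector in base R. The sketch below is an alternative for illustration, not the script's method; `difficulty_labels` and `label_difficulty` are hypothetical names. One behavioral difference: values outside 1 to 4 become NA here, whereas `difficulty_type_fun` returns them as character strings.

```r
# Lookup table from numeric code (as a string) to difficulty label
difficulty_labels <- c("1" = "No difficulty",
                       "2" = "Some Difficulty",
                       "3" = "A lot of Difficulty",
                       "4" = "Cannot at all")

# Hypothetical helper: index the lookup table by the response code
label_difficulty <- function(value) {
  unname(difficulty_labels[as.character(value)])
}

# Hypothetical responses, including a question that was not asked (NA)
responses <- c(1, 3, NA, 4, 2)
label_difficulty(responses)
```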
Part 4: Scoring DD from the CFM responses
This part is complex because we are not only grouping children by the severity of their difficulty, but also bringing in the responses to the affective questions. Each child therefore gets one of the following four severity groups plus one of the five responses to the affective questions.
- No difficulty
- Some Difficulty
- A lot of Difficulty
- Cannot at all
in combination with
- Never
- A few times a year
- Monthly
- Weekly
- Daily
The difficulty responses and the affective responses are summarized in separate variables. In accordance with the grant, the KBB study defines DD as having at least 'Some Difficulty' in a cognitive/behavioral domain and/or at least 'Weekly' in an affective domain. The study is not interested in children with only physical difficulties. Thus, part of the code below captures the KBB definition of DD (KBB_CFM_DD, which leaves out the physical questions), part defines DD the way the CFM does, where having only physical difficulties also counts as DD (CFM_DD), and the last part summarizes the affective questions (DD_mental).
# Row-wise helper (generated with ChatGPT): returns the most severe difficulty label found across the supplied columns
CFM_opr_fun <- function(...) {
# Use pmap to iterate over rows
pmap_chr(list(...), function(...) {
if ("Cannot at all" %in% c(...)) {
return("Cannot at all")
} else if ("A lot of Difficulty" %in% c(...)) {
return("A lot of Difficulty")
} else if ("Some Difficulty" %in% c(...)) {
return("Some Difficulty")
} else {
return("No difficulty")
}
})
}
# Row-wise helper (generated with ChatGPT): returns the highest-frequency affective label found across the supplied columns
mental_health_opr_fun <- function(...) {
# Use pmap to iterate over rows
pmap_chr(list(...), function(...) {
if ("Daily" %in% c(...)) {
return("Daily")
} else if ("Weekly" %in% c(...)) {
return("Weekly")
} else if ("Monthly" %in% c(...)) {
return("Monthly")
} else if ("A few times a year" %in% c(...)) {
return("A few times a year")
} else {
return("Never")
}
})
}
CFM5_17 <- CFM5_17.CFM.labeled %>%
mutate(CFM_DD = CFM_opr_fun(CF3_Seeing,
CF6_Hearing,
CF12_Walking_100,
CF13_Walking_500,
CF14_Self_care,
CF15_Understood_Inside,
CF16_Understood_Outside,
CF17_Learning,
CF18_Remembering,
CF19_Concentrating,
CF20_Accepting_Challenge,
CF21_Controlling_Behavior,
CF22_Making_Friends))
CFM5_17 <- CFM5_17 %>%
mutate(KBB_CFM_DD = CFM_opr_fun(CF14_Self_care,
CF15_Understood_Inside,
CF16_Understood_Outside,
CF17_Learning,
CF18_Remembering,
CF19_Concentrating,
CF20_Accepting_Challenge,
CF21_Controlling_Behavior,
CF22_Making_Friends))
CFM5_17 <- CFM5_17 %>%
mutate(DD_mental = mental_health_opr_fun(CF23_Anxiety,
CF24_Depression))
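The row-wise "most severe label wins" logic can also be written without iterating over rows, by converting each label to a rank and taking the parallel maximum. This is a base R sketch under the assumption that missing responses should count as 'No difficulty', which matches how the functions above behave; `severity_levels` and `worst_severity` are hypothetical names.

```r
# Severity labels ordered from least to most severe
severity_levels <- c("No difficulty", "Some Difficulty",
                     "A lot of Difficulty", "Cannot at all")

# Hypothetical helper: most severe label per row across any number of columns
worst_severity <- function(...) {
  # Convert each column of labels to its position in severity_levels
  ranks <- lapply(list(...), match, table = severity_levels)
  # Element-wise maximum across columns, ignoring missing responses
  worst <- do.call(pmax, c(ranks, list(na.rm = TRUE)))
  # Rows where every response is missing are treated as "No difficulty"
  worst[is.na(worst)] <- 1L
  severity_levels[worst]
}

# Hypothetical responses for three children
learning    <- c("Some Difficulty", "No difficulty", NA)
remembering <- c("Cannot at all",   "No difficulty", NA)
behavior    <- c("No difficulty",   NA,              NA)

worst_severity(learning, remembering, behavior)
```

The same pattern would apply to the affective questions with the levels ordered from 'Never' to 'Daily'.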
Part 5: Listing all difficulties into one variable
It can be difficult to scan every column of every row to identify which difficulties a child has. An easier way is to have a single column that lists every question for which a difficulty was reported (at least 'Some Difficulty' for the difficulty questions, or at least 'Weekly' for the affective questions). The code below, while neither pretty nor easy to interpret, builds that column using for loops. It does this for both the CFM definition of DD and the KBB CFM definition of DD.
# First for CFM questions
CFM_data <- CFM5_17 %>% select(CF3_Seeing,
CF6_Hearing,
CF12_Walking_100,
CF13_Walking_500,
CF14_Self_care,
CF15_Understood_Inside,
CF16_Understood_Outside,
CF17_Learning,
CF18_Remembering,
CF19_Concentrating,
CF20_Accepting_Challenge,
CF21_Controlling_Behavior,
CF22_Making_Friends,
CF23_Anxiety,
CF24_Depression)
CFM_data[is.na(CFM_data)] <- "No difficulty"
`%nin%` = Negate(`%in%`)
CFM_DD_type_list <- list()
for(ii in 1:nrow(CFM_data)) {
current_row <- CFM_data[ii,]
current_row_name <- names(current_row)
current_CFM_DD_type_list <- list()
for(iii in 1:length(current_row)) {
if(current_row[[iii]] %nin% c("No difficulty", "Never", "A few times a year", "Monthly" )) {
current_CFM_DD_type_list[[iii]] <- current_row_name[iii]
}
}
CFM_DD_type_list[[ii]] <- paste(current_CFM_DD_type_list %>% unlist(), collapse="; ")
}
CFM5_17$CFM_DD_type <- CFM_DD_type_list %>% unlist()
# Next for how KBB is operationalizing it
KBB_CFM_data <- CFM5_17 %>% select(CF14_Self_care,
CF15_Understood_Inside,
CF16_Understood_Outside,
CF17_Learning,
CF18_Remembering,
CF19_Concentrating,
CF20_Accepting_Challenge,
CF21_Controlling_Behavior,
CF22_Making_Friends,
CF23_Anxiety,
CF24_Depression)
KBB_CFM_data[is.na(KBB_CFM_data)] <- "No difficulty"
KBB_CFM_DD_type_list <- list()
for(ii in 1:nrow(KBB_CFM_data)) {
current_row <- KBB_CFM_data[ii,]
current_row_name <- names(current_row)
current_KBB_DD_type_list <- list()
for(iii in 1:length(current_row)) {
if(current_row[[iii]] %nin% c("No difficulty", "Never", "A few times a year", "Monthly" )) {
current_KBB_DD_type_list[[iii]] <- current_row_name[iii]
}
}
KBB_CFM_DD_type_list[[ii]] <- paste(current_KBB_DD_type_list %>% unlist(), collapse="; ")
}
CFM5_17$KBB_CFM_DD_type <- KBB_CFM_DD_type_list %>% unlist()
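The nested loops above can be condensed with `apply()`, which visits each row once and pastes together the names of the flagged columns. The sketch below uses a small hypothetical data frame with three of the labeled columns; it is an illustrative alternative, not the script's code, and `no_flag` is a hypothetical name.

```r
# Responses that should NOT be listed as a difficulty
no_flag <- c("No difficulty", "Never", "A few times a year", "Monthly")

# Hypothetical labeled responses for three children
responses <- data.frame(
  CF17_Learning             = c("Some Difficulty", "No difficulty", NA),
  CF21_Controlling_Behavior = c("No difficulty", "No difficulty", "A lot of Difficulty"),
  CF23_Anxiety              = c("Weekly", "Monthly", "Daily"),
  stringsAsFactors          = FALSE
)

# Questions that were not asked are treated as "No difficulty", as in the script
responses[is.na(responses)] <- "No difficulty"

# For each row, keep the names of the columns whose response is flagged
flagged <- unname(apply(responses, 1, function(row) {
  paste(names(row)[!(row %in% no_flag)], collapse = "; ")
}))

flagged
```

Children with nothing flagged get an empty string, which is also what the loop version produces.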
Part 6: Scoring Epilepsy
The Birbeck Epilepsy Questionnaire consists of 12 questions, each with a binary response (0 = negative; 1 = positive). The first 9 questions ask about experiences that are likely to be explained by epilepsy. Thus, if at least one of the first 9 questions is answered positive, the child is likely to have epilepsy.
However, 3 additional questions ask about frequency and whether the experiences above occurred during an illness. If any of these three questions is answered positive, the child will not be given the label of epilepsy.
Therefore, for a child to have epilepsy according to this screener, they need at least one positive response among the first 9 questions and only negative responses to the last 3 questions.
epilepsy_positive_questions <- CFM5_17 %>%
select(E1,
E2,
E3,
E4,
E5,
E6,
E7,
E8,
E9)
epilepsy_negative_questions <- CFM5_17 %>%
select(EN10,
EN11,
EN12)
# Convert any NAs to 0s
epilepsy_positive_questions[is.na(epilepsy_positive_questions)] <- 0
epilepsy_negative_questions[is.na(epilepsy_negative_questions)] <- 0
# Score if epilepsy is present or not
epilepsy <- list()
for (ii in 1:nrow(epilepsy_positive_questions)) {
if(any(epilepsy_positive_questions[ii,] > 0)) {
if (any(epilepsy_negative_questions[ii,] == 1)) {
epilepsy[[ii]] <- "No"
} else {
epilepsy[[ii]] <- "Yes"
}
} else {
epilepsy[[ii]] <- "No"
}
}
# Add this back to the screener
CFM5_17$Epilepsy <- unlist(epilepsy)
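The scoring rule (at least one positive among the first 9 questions and no positive among the last 3) can also be computed for all rows at once with `rowSums()`. The sketch below uses a hypothetical data frame with only two questions of each kind to keep it short; it is an alternative for illustration, not the script's code.

```r
# Hypothetical responses: two screening questions and two ruling-out questions
screen <- data.frame(
  E1   = c(1, 0, 1, NA),
  E2   = c(0, 0, 0, 0),
  EN10 = c(0, 0, 1, 0),
  EN11 = c(0, NA, 0, 0)
)

positive_questions <- screen[, c("E1", "E2")]
negative_questions <- screen[, c("EN10", "EN11")]

# na.rm = TRUE has the same effect as recoding NAs to 0 before scoring
any_positive   <- rowSums(positive_questions, na.rm = TRUE) > 0
any_ruling_out <- rowSums(negative_questions, na.rm = TRUE) > 0

# Epilepsy only when something screened positive and nothing ruled it out
epilepsy <- ifelse(any_positive & !any_ruling_out, "Yes", "No")
epilepsy
```

Here only the first child is labeled "Yes": the second and fourth have no positive screening response, and the third has a positive response to a ruling-out question.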
Part 7: Creating our final KBB DD status
To be assigned a DD status in accordance with the grant, a child must meet at least one of the following: at least 'Some Difficulty' in one of the non-physical CFM domains, at least 'Weekly' in an affective domain, or a positive screen for epilepsy. This variable labels children as 'Yes' for having DD if they meet the criteria or 'No' if they do not.
CFM5_17 <- CFM5_17 %>%
mutate(KBB_DD_status = case_when(
Epilepsy == "Yes" ~ "Yes",
KBB_CFM_DD != "No difficulty" ~ "Yes",
DD_mental %in% c("Daily", "Weekly") ~ "Yes",
TRUE ~ "No"
))
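A small worked example may make the three routes to a DD status easier to see. The data frame below is hypothetical, and the rule is written with base R `ifelse()` rather than `case_when()`; the logic is intended to mirror the code above.

```r
# Hypothetical children, one per scenario
status_data <- data.frame(
  Epilepsy   = c("Yes", "No", "No", "No", "No"),
  KBB_CFM_DD = c("No difficulty", "Some Difficulty", "No difficulty",
                 "No difficulty", "No difficulty"),
  DD_mental  = c("Never", "Never", "Weekly", "Monthly", "Never"),
  stringsAsFactors = FALSE
)

# DD if epilepsy is positive, OR any non-physical CFM difficulty,
# OR an affective response of at least "Weekly"
status_data$KBB_DD_status <- with(status_data, ifelse(
  Epilepsy == "Yes" |
    KBB_CFM_DD != "No difficulty" |
    DD_mental %in% c("Daily", "Weekly"),
  "Yes", "No"
))

print(status_data)
```

The first three children each qualify through a different route; the fourth ('Monthly' only) and the fifth (nothing reported) are labeled 'No'.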
Part 8: Exclusion status (KBB_CFM)
There are two different exclusion criteria, depending on whether a child has DD or not:
- DD Exclusion: at least one 'Cannot at all' for a physical question
- Non-DD Exclusion: at least one 'Some Difficulty' or more for a physical question
Ideally we are not interested in children with physical disabilities; however, children with cognitive/behavioral difficulties tend to have physical difficulties as well. Therefore, as long as their physical disability does not make testing too difficult (e.g., they are not blind, deaf, or immobile), they can be recruited as DD. For our non-DD children, we want 'No difficulty' for every question asked.
The first chunk of the code creates a variable that identifies the greatest severity of difficulty reported across the physical questions; we are specifically interested in 'Cannot at all'. The next part takes the child's DD/non-DD status into account and labels them as excluded or not ('Yes' or 'No') based on the severity of their physical difficulty, if any.
# Extract only the physical (sensory and motor) questions
CFM5_17_physical_questions <- CFM5_17 %>%
select(CF3_Seeing,
CF6_Hearing,
CF12_Walking_100,
CF13_Walking_500)
# Some data cleaning
CFM5_17_physical_questions[is.na(CFM5_17_physical_questions)] <- "No difficulty"
# Create a for loop to obtain the rows that have "Cannot at all" for sensory or motor difficulties
physical_difficulty_type <- list()
for(ii in 1:nrow(CFM5_17_physical_questions)) {
current_row <- CFM5_17_physical_questions[ii,]
if(any(current_row == "Cannot at all")) {
physical_difficulty_type[[ii]] <- "Cannot at all"
} else if (any(current_row == "A lot of Difficulty")){
physical_difficulty_type[[ii]] <- "A lot of Difficulty"
} else if (any(current_row == "Some Difficulty")) {
physical_difficulty_type[[ii]] <- "Some Difficulty"
} else {
physical_difficulty_type[[ii]] <- "No difficulty"
}
}
CFM5_17$Physical_difficulty_type <- unlist(physical_difficulty_type)
# Create an exclusion variable
CFM5_17 <- CFM5_17 %>%
mutate(Excluded = case_when(
KBB_DD_status == "Yes" & Physical_difficulty_type == "Cannot at all" ~ "Yes",
KBB_DD_status == "No" & Physical_difficulty_type != "No difficulty" ~ "Yes",
TRUE ~ "No"
))
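Since the exclusion rule depends on only two variables, every possible combination can be enumerated to see what the rule does. The sketch below builds that truth table with `expand.grid()`; it is for illustration only, and the rule is rewritten with base R `ifelse()` to mirror the `case_when()` above.

```r
severity_levels <- c("No difficulty", "Some Difficulty",
                     "A lot of Difficulty", "Cannot at all")

# Every combination of DD status and worst physical difficulty
grid <- expand.grid(
  KBB_DD_status            = c("Yes", "No"),
  Physical_difficulty_type = severity_levels,
  stringsAsFactors         = FALSE
)

# Same rule as the script: DD children are excluded only for "Cannot at all";
# non-DD children are excluded for any physical difficulty
grid$Excluded <- with(grid, ifelse(
  (KBB_DD_status == "Yes" & Physical_difficulty_type == "Cannot at all") |
    (KBB_DD_status == "No" & Physical_difficulty_type != "No difficulty"),
  "Yes", "No"
))

print(grid)
```

Of the eight combinations, four lead to exclusion: one for DD children and three for non-DD children.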
Part 9: Saving the data
# Set working directory to save the data
setwd("~/KBB_new_2/1_screener/processed_data")
# Save the data
write_csv(CFM5_17, file = "CFM5_17_clean.csv")
Quality Control
The code below can be added to the bottom of the main script but should be removed after testing.
DD Status
We need to verify that the code above assigned the DD status correctly. To check this, we can create a smaller dataset from the one above that includes the labeled CFM5-17 responses, the Epilepsy result from the Birbeck Epilepsy Questionnaire, and the KBB_DD_status. Additionally, we can include KBB_CFM_DD_type, which stores all reported difficulties in one variable.
Visual inspection confirmed that the children with a DD status had:
- at least "some difficulty" in one or more cognitive/behavioral domains; and/or
- at least "weekly" in one or both affective domains; and/or
- positive for epilepsy.
The KBB_CFM_DD_type variable was a great help in verifying this. The non-DD children did not have difficulties in the domains above and did not have epilepsy. However, some did have difficulties in physical domains, which is expected. The non-DD group was not inspected visually; instead, it was checked with code that outputs all unique values for each variable.
# Quality Control
CFM5_17 %>%
select(CF3_Seeing:CF24_Depression, Epilepsy, KBB_CFM_DD_type, KBB_DD_status) %>%
filter(KBB_DD_status == "Yes") %>%
view()
No.DD.Status <- CFM5_17 %>%
select(CF3_Seeing:CF24_Depression, Epilepsy, KBB_CFM_DD_type, KBB_DD_status) %>%
filter(KBB_DD_status == "No")
do.call(cbind,sapply(No.DD.Status, unique))
Exclusion Status
Now that we are certain the DD status was assigned correctly, we can check whether the exclusion status was also assigned correctly. We can split the data by DD status and exclusion status and check each of the four resulting groups.
Visual inspection, together with the unique combinations for each of the four groups created below, confirmed that:
- Excluded DD children contained 'Cannot at all' for at least one of the physical domains;
- Excluded non-DD children contained at least 'Some Difficulty' for at least one of the physical domains;
- Not Excluded DD children did not contain 'Cannot at all'; and
- Not Excluded non-DD children only contained no difficulty for all physical domains.
# Quality Control
CFM5_17.DD.Excluded <- CFM5_17 %>%
select(CF3_Seeing:CF13_Walking_500, KBB_DD_status, Excluded) %>%
filter(KBB_DD_status == "Yes" & Excluded == "Yes")
CFM5_17.DD.Not.Excluded <- CFM5_17 %>%
select(CF3_Seeing:CF13_Walking_500, KBB_DD_status, Excluded) %>%
filter(KBB_DD_status == "Yes" & Excluded == "No")
CFM5_17.Not.DD.Excluded <- CFM5_17 %>%
select(CF3_Seeing:CF13_Walking_500, KBB_DD_status, Excluded) %>%
filter(KBB_DD_status == "No" & Excluded == "Yes")
CFM5_17.Not.DD.Not.Excluded <- CFM5_17 %>%
select(CF3_Seeing:CF13_Walking_500, KBB_DD_status, Excluded) %>%
filter(KBB_DD_status == "No" & Excluded == "No")
# use the do.call function for these to get the unique values
do.call(cbind,sapply(CFM5_17.DD.Excluded, unique))
do.call(cbind,sapply(CFM5_17.DD.Not.Excluded, unique))
do.call(cbind,sapply(CFM5_17.Not.DD.Excluded, unique))
sapply(CFM5_17.Not.DD.Not.Excluded, unique)
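The four checks listed above can also be expressed as logical tests, so they can be re-run whenever the data change instead of relying only on visual inspection. The sketch below runs on a small hypothetical data frame that has the same three columns as the scored data; the `qc` object and the check names are assumptions for illustration.

```r
# Hypothetical scored rows with the columns created in Parts 7 and 8
qc <- data.frame(
  KBB_DD_status            = c("Yes", "Yes", "No", "No"),
  Physical_difficulty_type = c("Cannot at all", "Some Difficulty",
                               "Some Difficulty", "No difficulty"),
  Excluded                 = c("Yes", "No", "Yes", "No"),
  stringsAsFactors         = FALSE
)

dd       <- qc$KBB_DD_status == "Yes"
excluded <- qc$Excluded == "Yes"
cannot   <- qc$Physical_difficulty_type == "Cannot at all"
none     <- qc$Physical_difficulty_type == "No difficulty"

# One logical result per rule from the list above
checks <- c(
  dd_excluded_have_cannot         = all(cannot[dd & excluded]),
  dd_kept_lack_cannot             = all(!cannot[dd & !excluded]),
  non_dd_excluded_have_difficulty = all(!none[!dd & excluded]),
  non_dd_kept_have_no_difficulty  = all(none[!dd & !excluded])
)

print(checks)

# Cross-tabulation of the four groups
table(DD = qc$KBB_DD_status, Excluded = qc$Excluded)
```

If any element of `checks` is FALSE, the corresponding group contains at least one row that violates the exclusion rule.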