Day 2 Overview Test - GeoCenter/StataTraining GitHub Wiki

Fundamentals of Data Analysis & Visualization (Using Stata)

Instructors: Tim Essam, Aaron Chafetz, Laura Hughes

Day 2: Data Munging

15 January 2016

GOALS

  • Fun.

  • Learn how to automate repetitive data munging tasks

  • Learn how to plan and execute data munging operations through pseudocode.

  • Learn how to transform data (subset, aggregate, mutate, rename, label, etc.)

  • Learn about macros and how to structure

Assumptions (Day 1)

  • Importance of .do files

  • Basic data operations:

  • unique IDs: why to create them (merging), how to create them (egen newvar = group(var1 var2)), and how to test them (isid, assert). Once you can uniquely identify observations, you have a pivot point from which to explore your data, and you can use the by prefix at various levels.

  • how to use "help"

  • Data types: categorical, ordinal (string, char); labelled/factors; interval, ratio (int, double); boolean; missing

  • help functions: how do you figure out the input/output arguments for a function?
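
As a reminder of the unique-ID workflow assumed from Day 1, here is a minimal sketch. It uses the auto dataset that ships with Stata rather than the course data, and the names car_id and mean_price are hypothetical, chosen only for illustration.

```stata
* Illustration only: the auto dataset ships with Stata and is not
* the course's foreign-aid data.
sysuse auto, clear

* Test whether an existing variable already identifies each observation.
* isid exits with an error if make is missing or repeated.
isid make

* Create a numeric ID from one or more variables.
* car_id is a hypothetical name chosen for this sketch.
egen car_id = group(make)

* Test the new ID: no missing values, and unique across observations.
assert !missing(car_id)
isid car_id

* With a trusted ID in place, work at a coarser level with the by prefix:
* here, the mean price within each level of foreign.
bysort foreign: egen mean_price = mean(price)
```

If either isid or assert fails, the .do file stops at that line, which makes these checks useful guards before a merge.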

SYLLABUS / AGENDA

2:00 - 2:15 p.m. Review Day 1 & Introduction to Data Munging: Review the concepts covered in Day 1, field questions, and introduce the goals for the day.

2:15 - 2:35 p.m. Planning and pseudocoding: Students work in groups to diagram how they will accomplish a data processing task. Instructors then ask the students what they came up with and consolidate the ideas into a single pseudocode plan.

2:25 - 2:50 p.m. Munging functions (subset, reshape, rename, label, merge, egen, replace, summarize, collapse, etc… TBC): Discuss approaches and walk through one potential solution. Highlight the Stata functions used to execute each approach.

2:50 - 3:20 p.m. Converting pseudocode to Stata code: Students will practice implementing their pseudocode using Stata functions. Instructors will circulate and offer advice when students get stuck.

3:20 - 3:30 p.m. Break

3:30 - 3:45 p.m. Data exploration, tabular summaries, and queries: Can I trust my data? Outlier exploration. How can I quickly summarize data? How can I quickly plot data?

3:45 - 4:00 p.m. Exercise: practice creating data summaries and queries. Students will practice creating tabular summaries, queries, and collapse commands. Students will also begin to look for patterns and stories in the data.

4:00 - 4:20 p.m. Exporting results: Introduce how to export summary statistics and results to Excel or machine-readable file types (.csv, .txt, .tsv).

4:20 - 4:40 p.m. What exists to help you, within USAID and externally.

4:40 - 5:00 p.m. Exercise & Wrap-up: Discuss
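
The summarizing and exporting steps in the agenda could look something like the sketch below. It again uses Stata's built-in auto dataset as a stand-in for the course data, and the variable names and file names (summary_by_origin.*) are hypothetical.

```stata
* Illustration only: auto ships with Stata; file names are hypothetical.
sysuse auto, clear

* Collapse to one row per level of foreign, keeping the mean price
* and the number of cars with a non-missing price.
collapse (mean) mean_price = price (count) n_cars = price, by(foreign)
list

* Excel workbook, with variable names in the first row.
export excel using "summary_by_origin.xlsx", firstrow(variables) replace

* Comma-separated and tab-separated text files.
export delimited using "summary_by_origin.csv", replace
export delimited using "summary_by_origin.tsv", delimiter(tab) replace
```

Note that collapse replaces the data in memory, so in a real .do file you would typically wrap it in preserve and restore or reload the source data afterwards.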

Research questions for the day:

What are the types of foreign aid tracked?

What agency has the most observations in the data?

What sectors are tracked? What is the largest sector?

What countries receive the most foreign assistance? What type of assistance are they getting?

What do the temporal trends by sector look like? Do they make sense?
