Participant Profiles - brianhigh/data-workshop GitHub Wiki
Participants
Participant A
1.) Primarily a Windows user
2.) I have some experience with Stata, very little with R (not super
comfortable without being able to search Google and refer to my past do-files)
3.) I have my data in Stata and Excel
4.) I would like to learn more about what I am doing while managing data
(not just how to do it)
5.) I have to do 5 more sampling "runs" by June 22nd, 2014, and I have to
defend my proposal by December 12th, 2014. Otherwise I do not have any
long-term goals.
Participant B
1. Primarily a Mac user. However, I use Windows on a daily basis and for
all programming tasks (although curious about AppleScript and Linux).
2. Most experience with SAS and Stata programming. Very little R
experience. I also use ArcGIS, but only via menus/GUI. I'd really like to
learn Python for scripting in ArcGIS (and everywhere else, it seems).
3. My data are in .xls, .xlsx, .csv, .mdb, .dta, and .sas7bdat. Both
character and numeric data. These data originated from previous projects
that I was not involved with. Content and structure vary. I am currently
in the process of creating data dictionaries in an effort to keep myself
organized.
4. While I am in survival mode for some tasks (and thus need to know just
enough to finish them on time), I prefer to have a more complete
understanding because that is (usually) more edifying. I suppose this is on
a case-by-case basis for me.
5. For one project, I already have data described in #3 that I'd like to
have at least partially analyzed by the end of spring quarter. For another
project, I will be collecting interview data in May and June. I would like
to have those data analyzed by August 30 if at all possible.
Coaches
Brian High
Are you primarily a windows, mac, or linux/other user?
I primarily use Linux (since the late 90s), but use Mac OS X and Windows enough to stay "current". I used a Mac primarily for a few years after I started working with DEOHS in 2007 in order to become more familiar with it. I find Linux to be easier to use (without handicapping me) and therefore I am more productive using it. Somewhere along the line the other popular operating systems were so "tuned" for the absolute beginner that they actually became harder to use for someone with even a modest degree of experience. With each release of Windows, for example, I have to go through more clicks, menus, and windows to get to settings I want to change or attributes I want to see. And with Linux I can manage almost all of the software installation and downloads from one tool (of my choice), instead of having to search the internet for each and every application. And it is free. (Okay, no more soapbox. :)
What is your level of training, experience, and degree of comfort with programming and command-line interfaces and tools? (And what are your preferred programming tools or languages?)
I learned BASIC in 1984 (on the Apple ][ and the original IBM PC). I learned HTML, Bash, Perl and dBase IV in 1996-1997, then JavaScript and PHP in 1998. I took a bunch of programming courses in 1999-2000 including Visual Basic, VBScript, C, C++, Java, and SQL. A few years later I learned a little Ruby and Python. From 1996 to 2001 I was a database programmer and data manager for the Metals Lab at Analytical Resources, Inc. Since around 2000, I have primarily been a systems administrator, mostly with Linux systems (using Bash, Perl, and some Python). This is a heavily command-line oriented occupation. I prefer using a keyboard over a mouse when I can. (More buttons, better ergonomics.)
What types of data sources and files will you be using in your current research? (Formats, filetypes, content, structure, origination)
I am not doing any "research", but I have worked with a very large variety of file types. Many of the short programs I have written for people have been to help convert files from a human-oriented structure to a data-management-oriented structure.
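As an illustration of that kind of conversion, here is a minimal Python sketch. It is not one of the programs mentioned above: the column names (`site`, `2012`, `2013`) and the sample values are made up for the example. It reshapes a "wide", spreadsheet-style table with one column per year into a "long" table with one observation per row, which is usually easier to load into a database or a statistics package.

```python
import csv
import io
import sys

# Hypothetical "human-oriented" layout: one row per site, one column per year.
WIDE_CSV = """site,2012,2013
A,1.5,2.0
B,,3.25
"""


def wide_to_long(text, id_column="site", variable_name="year"):
    """Return one dict per (id, variable) pair, skipping empty cells."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, value in record.items():
            # The id column identifies the row; empty cells carry no observation.
            if column == id_column or value == "":
                continue
            rows.append({
                id_column: record[id_column],
                variable_name: column,
                "value": float(value),
            })
    return rows


if __name__ == "__main__":
    long_rows = wide_to_long(WIDE_CSV)
    writer = csv.DictWriter(sys.stdout, fieldnames=["site", "year", "value"])
    writer.writeheader()
    writer.writerows(long_rows)
```

The empty cell for site B in 2012 is simply dropped here; whether to drop missing values or keep them as explicit blanks is a choice that depends on the downstream tool.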
In general, do you prefer to know just enough to complete a task, or do you like to delve deeper for a more complete understanding?
When it comes to something I am truly interested in, I usually feel compelled to understand it as thoroughly as I possibly can and to gain as much knowledge about it as possible. But there is rarely enough time for this, so I try to prioritize and "pick my battles".
What sorts of near-term deadlines do you have regarding the processing of your data?
Since I will be here to help others, my intention is to provide some training and guidance on the topics which will help you meet your deadlines. I often see people waste a lot of time and get very frustrated with inadequate or inefficient tools, and this distresses me. I want to help people find slick, elegant solutions that are labor-saving, scalable, and reusable. When people discover how easy (and powerful) these methods can be, they become inspired and unstoppable. I wonder why these (text-processing and data management) techniques are not generally taught in science graduate programs. Since all research involves data, and an increasing amount of it, why don't programs that train researchers include data management content? We are in the midst of a data revolution and the quantity of data is growing exponentially. Who will have the skills to analyze this much information? Everyone talks about "The Cloud" and "Big Data". But what are we doing to prepare students for these developments? Ultimately, we need to address this serious gap, and I hope our experiment this quarter will help lay some groundwork. So, my "deadline" is: The sooner the better!