Advice for typological database construction - grambank/grambank GitHub Wiki

Based on our experience with Grambank, here are some principles for the construction of typological databases that we recommend other projects consider. Below is also some general project management advice that may be relevant outside of linguistics or even outside academia.

Graphic by Michell O'Reilly to illustrate Grambank data collection workflow.

Advice when making a cross-linguistic database

  • Consider the specific research questions you want to address before designing features.
  • Consider how many and which observations (languages, cultures etc.) you need to include to produce high-quality research. Useful starting points include samples such as the Standard Cross-Cultural Sample for cultures, the WALS 100- and 200-language samples, Glottolog's data on available descriptions, and available language phylogenies.
  • Run a pilot phase of the questionnaire before taking on the full sample of observations you want to cover. In the pilot phase, take a smaller sample, go through all of the features and revisit any definitions or documentation that need changing or tightening. The pilot phase should preferably include an inter-coder reliability test, i.e. the same observations coded independently by different people.
  • Write short, medium-length and long descriptions for each feature: short = the feature question itself, medium = the summary in the wiki article, long = the entire wiki article, including the coding procedure. The descriptions should be practically oriented and continuously updated. See our Grambank feature wiki articles for a suggested format.
  • Specify valid values for each feature clearly
  • When you can, aggregate data later rather than earlier. For example, use a set of several binary features instead of a single feature asking what the "major" strategy is. Do not reduce information too early.
  • Consider coding form as well as pattern, e.g. when you ask "Is there a past tense suffix?", also provide a field to type in what the actual suffix looks like.
  • Consider whether you can record confidence in the coding, or the proportion of variation, as an extra variable for each data point (see the APiCS section on relative importance).
  • Avoid features or feature values that only capture "other" options. Negatively defined categories are dangerous: what they contain depends on how all the other values are defined, which makes them hard to interpret and compare.
  • Assign each feature a unique ID. It does not need to be meaningful or follow a neat sequential order with other features, but it must be unique and stable.
  • The order of features and data points should be irrelevant. No description or data point comment should contain "as above", since users may reorder or subset the data.
  • If the data is linguistic or cultural, consider releasing it as a CLDF dataset (possibly in addition to other formats). You don't need to curate your data in CLDF from the start.
  • Be consistent. Features should be coded the exact same way in every language regardless of family or region.
  • For each feature, nominate an expert who has the final say on its definition and application.
  • Code each observation (society/culture/informant/language/dialect/doculect) in a separate sheet with at least these columns: Feature_ID, Value, Comment, Source.
  • Encourage comments. See our comment guidelines for an example.
  • Keep track of who did what by assigning everyone a unique abbreviation and using it explicitly in sheet rows or filenames.
  • Keep a BibTeX file with all cited references. You can also use hh.bib if that is useful.
  • For each data point, specify the source used (including page numbers).
  • Avoid citing personal correspondence when possible; it is better if other people can easily verify the source.
  • Run basic sanity checks on every submission of new coding (see our automatic checks).
  • Outline a clear workflow for who does what in terms of coding, quality control, project management and curation, potentially including an end point after which the dataset will no longer be curated.
  • Have regular meetings for everyone involved in data collection where people can discuss difficulties in coding
    • don't discuss problems in person or over email as soon as they occur; make dedicated time for them in everyone's schedule
    • keep one shared agenda document that everyone can edit and add their issues to
    • when describing a coding issue, be explicit about exactly what the problem is and what information you already have
    • keep meeting notes as well as an agenda
    • make the person keeping the notes and the person chairing the meeting two different people
    • for meetings, focus on problem-solving and stick to concrete issues (not hypotheticals)
    • see our coding problem-solving workflow for more
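For the inter-coder reliability test in the pilot phase, agreement between two coders can be summarized with a statistic such as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is one possible way to compute it, not Grambank's own procedure. It assumes each coder's work is a dict mapping a hypothetical (language, feature ID) key to a value string; the IDs and data are made up for illustration.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders who coded the same items independently.

    codes_a, codes_b: dicts mapping an item key, e.g. a hypothetical
    (language, feature_id) tuple, to the value that coder assigned.
    Only items coded by both coders are compared.
    """
    shared = set(codes_a) & set(codes_b)
    n = len(shared)
    if n == 0:
        raise ValueError("no items were coded by both coders")
    # Proportion of items on which the two coders gave the same value.
    observed = sum(codes_a[k] == codes_b[k] for k in shared) / n
    # Agreement expected by chance, from each coder's own value frequencies.
    freq_a = Counter(codes_a[k] for k in shared)
    freq_b = Counter(codes_b[k] for k in shared)
    expected = sum(freq_a[v] * freq_b[v] for v in freq_a) / (n * n)
    if expected == 1:
        # Both coders used one and the same value for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical pilot data: two coders, one language, four features.
coder_a = {("lang1", "F01"): "1", ("lang1", "F02"): "0",
           ("lang1", "F03"): "1", ("lang1", "F04"): "0"}
coder_b = {("lang1", "F01"): "1", ("lang1", "F02"): "0",
           ("lang1", "F03"): "0", ("lang1", "F04"): "0"}
print(cohens_kappa(coder_a, coder_b))  # 0.5
```

Here values such as "?" would be treated as ordinary categories; whether unknowns should count towards agreement is a design decision each project has to make.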
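To illustrate aggregating later rather than earlier: if past tense prefixes and suffixes are coded as two separate binary features, a categorical variable can still be derived afterwards, whereas a single "major strategy" feature could never be split back into its parts. The feature IDs PAST_PREFIX and PAST_SUFFIX below are hypothetical, and values are assumed to be "1", "0" or "?".

```python
def affix_position(codes):
    """Derive a categorical variable from two binary features.

    codes: dict mapping hypothetical feature IDs to "1", "0" or "?".
    Returns None when either feature is missing or unknown.
    """
    prefix = codes.get("PAST_PREFIX")
    suffix = codes.get("PAST_SUFFIX")
    if prefix not in ("0", "1") or suffix not in ("0", "1"):
        return None  # do not guess from missing or uncertain data
    return {
        ("1", "1"): "both",  # lost if only the "major" strategy had been coded
        ("1", "0"): "prefix only",
        ("0", "1"): "suffix only",
        ("0", "0"): "neither",
    }[(prefix, suffix)]


print(affix_position({"PAST_PREFIX": "1", "PAST_SUFFIX": "1"}))  # both
print(affix_position({"PAST_PREFIX": "0", "PAST_SUFFIX": "?"}))  # None
```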
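The basic sanity checks on each submission can start very small. The following sketch is not Grambank's actual check suite. It assumes each coding sheet is a tab-separated file with the Feature_ID, Value, Comment and Source columns described above, and that the project keeps a mapping from each feature ID to its set of valid values; the IDs, values and source citations in the example are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ("Feature_ID", "Value", "Comment", "Source")


def check_sheet(text, valid_values, delimiter="\t"):
    """Return a list of human-readable problems found in one coding sheet.

    text: the sheet contents as a string.
    valid_values: dict mapping each feature ID to its set of allowed values.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]
    problems = []
    seen = set()
    # Line 1 is the header; assumes no field contains a line break.
    for line_no, row in enumerate(reader, start=2):
        fid = (row["Feature_ID"] or "").strip()
        value = (row["Value"] or "").strip()
        if fid not in valid_values:
            problems.append(f"line {line_no}: unknown Feature_ID {fid!r}")
            continue
        if fid in seen:
            problems.append(f"line {line_no}: duplicate Feature_ID {fid!r}")
        seen.add(fid)
        # An empty Value is treated as "not yet coded" and skipped.
        if value and value not in valid_values[fid]:
            problems.append(f"line {line_no}: invalid value {value!r} for {fid}")
        if value and not (row["Source"] or "").strip():
            problems.append(f"line {line_no}: value given without a Source")
    return problems


# Hypothetical feature definitions and sheet.
valid_values = {"F01": {"0", "1", "?"}, "F02": {"0", "1", "?"}}
sheet = "\n".join([
    "Feature_ID\tValue\tComment\tSource",
    "F01\t1\t\tSmith 2001:12",
    "F02\t2\t\tSmith 2001:40",
    "F01\t0\t\t",
])
for problem in check_sheet(sheet, valid_values):
    print(problem)
# line 3: invalid value '2' for F02
# line 4: duplicate Feature_ID 'F01'
# line 4: value given without a Source
```

Running checks like these automatically whenever new coding is submitted catches typos in IDs and values before they spread into the rest of the dataset.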


General project management

The Grambank project has been ongoing for many years and has involved a large number of people, many of whom are not in the same country, let alone the same city. The work is complex and constantly generates decision points. This has required developing remote-network and project management practices that ensure quality work and happy workers. The project management style has some similarities with the scrum approach in software development, Extreme Programming (XP) and meeting techniques from Swedish non-profit leisure associations. Here are some more general points that may be transferable to other projects facing similar challenges, be they in academia, industry or elsewhere.

  • There is no such thing as a flat hierarchy. It is better to be explicit and clear about who has the power to make which decisions than to hide behind the false pretense that power is distributed equally. This makes it easier to move forward, and it also makes it easier to locate where responsibility lies later on. Leaders can and should take input, but in the end, everyone should know where authority lies. It may be awkward and embarrassing to be explicit about power, but it is 100% worth it. Draw a tree diagram if you need to.
  • Reject suggestions without being dismissive. Not all ideas are good or feasible, and anyone can make a suggestion that is bad or otherwise unsuitable. The group, and in particular the person in charge, needs to be able to say "no" without embarrassing the person who made the suggestion. Acknowledging the good parts and explaining why the idea is not feasible, rather than inherently bad, is often the best way. It is also better to offer alternatives than to only shoot down ideas. Group members who feel dismissed will be less happy, produce worse work and contribute less to discussions, which is bad for the group's goals.
  • Move forward explicitly, also through tough decisions. It is safer to weigh in on less important things, so groups can find themselves spending a disproportionate amount of time on trivial matters (Oxford commas, color schemes etc.) while avoiding heavier agenda items. This is known as the "Law of Triviality" or the bike-shed effect. The group leader needs to identify when this is happening and find a way past it. For example, trivial matters could be relegated to text-based communication before the meeting, while more important topics are discussed in precious face-to-face meetings.
  • Encourage discussion, within reason. It is good if project members feel comfortable suggesting ideas, giving criticism etc. However, avoid discussions that solve hypothetical problems that have not occurred yet (see also "You Ain't Gonna Need It (YAGNI)" in XP). Group discussions should have an end point that results in a decision on a specific, real matter. The leader should make clear when that point is reached and exactly what the decision is (e.g. "Regarding model fit selection, we are going with WAIC. Next item.").
  • It's probably not life-and-death. It is important to be professional and take things seriously, but don't go overboard. If someone makes a mistake, something goes wrong or something is urgent, take action, but make sure you're not stressing everyone out unnecessarily. Your project is probably not determining life-and-death scenarios, so scale the emotional response accordingly and focus on the next practical steps towards solving the problem.
  • Be encouraging. When working together, it's easy to focus only on negative criticism because you want to "fix it" and move on. You may take for granted that people know you think they're fundamentally doing a good job. Don't neglect to give positive feedback and reassurance too, or you will create a tense atmosphere. Failure to make people feel secure can lead to imposter syndrome, which is destructive for the individual and the group.
  • Aggregate decisions into dedicated time slots. It is generally not acceptable to interrupt someone else's work directly when you have an idea or a problem (unless you have already mutually established that this is okay, of course). Keep a running agenda/list that everyone can add to, and set up either regular meetings or another way of calling meetings when everyone has the mental space. That way, all group members can move forward, safe in the knowledge that their issue will be addressed, while everyone's focus time is respected and people can mentally prepare for the meeting. In general, don't pounce on people with things that take considerable cognitive effort.
    • Advanced level: label agenda items as "information dissemination" or "decision". These are generally the only two kinds of items you should have in a meeting. If a decision is made, write explicitly who is responsible for taking action and reporting back.
  • Keep track. When decisions are made, keep a written record of them. If an issue needs a follow-up, keep it on the agenda for the next meeting. Using online collaborative documents for agendas makes this easier. Do not rely on human memory alone: it is fallible and cannot be edited or consulted collaboratively.
  • Follow up. Keep a list of all ongoing goals and go through them regularly, even if that means the person in charge of a goal only says "nothing has happened since last time" or "we have decided to drop this". Again, do not rely on human memory alone.
  • Acknowledge failure and build trust and mutual respect. We learn from failures more than from success. All group members should be made to feel comfortable sharing mistakes and problems. If you are part of the project, that means you are good enough and have the group's trust. Your leader should inspire this trust that you are good enough. Failure to do this can lead to imposter syndrome, which is destructive for the individual and the group.
  • Terminate explicitly rather than letting things slip through the cracks. If something isn't working out, be explicit about putting an end to it. Don't let (sub-)projects fall into the limbo state of "Are we still doing that or did we decide to drop it?". Not everything was meant to be; say goodbye and let go. Do not assume that people will understand implicitly that something should be dropped; humans do not have telepathy (yet).
  • Be kind and be flexible. Be nice to people and be willing to compromise; not everything will go exactly your way. You do not need to die on every hill, so let people do things their own way sometimes. Rein in your need for control.
  • Telepathy is not real (yet). Your colleagues are not mind-readers, so you will need to be explicit about many things that you may believe are obvious. Conversely, people may tell you things you think are obvious, and you may find that patronizing. It may not be meant that way; the person may simply be taking extra care to be explicit. State clearly what you know, what you assume and the conclusions you draw.
  • Social norms are not universal. Norms of politeness, directness etc. vary between people and cultures. Be mindful of the cultural diversity in your group and strive to minimize unnecessary friction caused by people having different "baselines" for politeness, anger etc. Some people prefer more indirect communication and may perceive direct requests as rude and aggressive. Each group member should take mental stock of the variation in the group and aim to adapt their communication to reduce friction.
  • Plan in levels. Some plans are for things that need to be finished soon, others are less urgent. Make clear the expected time period for each. Just because something isn't urgent does not mean it doesn't deserve a plan, even if it's just a preliminary sketch. Making long-term plans explicit allows everyone in the project to plan their work and non-work life accordingly, instead of only working against urgent deadlines. Focussing exclusively on urgent matters is often very stressful and can harm both mental health and work quality.
  • E-mails are tricky. Everyone has their own way of dealing with e-mails and e-mails will probably feature a lot in your project management. At the very start of your project, have a frank and explicit conversation about how e-mails will feature in the group work. Possible guidelines are:
    • Always use informative headings with distinct keywords, this helps everyone find things.
    • Don't expect replies within unreasonable time limits (consider people's time zones, care responsibilities, vacation etc.).
    • Be mindful of who receives the message when you reply-all.
    • Avoid long threads. When possible, use short in-line responses, and call a meeting for anything that is generating paragraph-length e-mail threads.
    • Avoid using e-mail for document coordination; in particular, do not e-mail files back and forth. Find a shared collaborative solution (Google Docs, GitHub, Dropbox, Nextcloud etc.) and set up a folder structure there with the files you are working on together.
  • Centralise work and information. Reduce the number of platforms you use (Google, GitHub, Tableau, Slack etc.) and the number of structures within them (folders, repos, tables etc.). The fewer places people have to look for information, the less cognitively taxing the work is. If you do have many, keep one document that lists them (e.g. our GitHub repos wiki article). Where possible, help people use fewer tools. For example, can the group calendar be made viewable in Google Calendar, Apple's iCal etc.? If so, group members are less likely to miss events. Keep It Simple, Silly.
  • Don't Repeat Yourself (DRY). Don't store the same information in several places: keep each piece of information in one place and, if necessary, link to it from elsewhere. It is hard to keep multiple locations in sync; discrepancies are likely to arise and cause frustration and unnecessary work. This is a principle from software development, but it is relevant outside programming too.

Written by Hedvig Skirgård.