Background of the Grambank questionnaire - grambank/grambank GitHub Wiki

Introduction

Grambank is a database of morphosyntactic features of the languages of the world. This article gives an overview of the project and the history of the questionnaire.

Organization

The Grambank project began in 2015 as a joint project between departments at two Max Planck Institutes (MPI): the Language and Cognition department (L&C) of the MPI for Psycholinguistics in Nijmegen, the Netherlands (led by Stephen C. Levinson), and the Department of Linguistic and Cultural Evolution (DLCE) at the MPI for the Science of Human History (MPI-SHH) in Jena, Germany (led by Russell Gray). This collaboration took place within the larger international research consortium Glottobank, which also involves the ARC Centre of Excellence for the Dynamics of Language in Canberra, Australia, and the University of Auckland, New Zealand. When DLCE moved from MPI-SHH to the MPI for Evolutionary Anthropology (MPI-EVA) in Leipzig in 2020, the database moved with it. The Australian National University, the ARC Centre of Excellence for the Dynamics of Language, the University of Kiel, Uppsala University and the School of Oriental and African Studies also take part in the organization of Grambank.

Grambank is a part of Glottobank, a research consortium that involves work on complementary databases of lexical data, paradigms, numerals and sound patterns in the world's languages.

The project is led by Harald Hammarström, Simon Greenhill, Russell Gray, Hedvig Skirgård and Robert Forkel. Many scholars helped create the questionnaire, and the data are collected by a large global team of enthusiastic and talented student assistants. The coders are organized into nodes:

Node	City	Institution	Leaders	Members
Nijmegen (inactive)	Nijmegen, NL	MPI-Nijmegen	Harald Hammarström	Hedvig Skirgård, Hilário de Sousa, Hugo de Vos, Jakob Lesage, Jesse Peacock and Suzanne van der Meer
Worldwide (inactive)	N/A		Harald Hammarström	Robert Borges, Luise Dorenbusch and Tessa Yuditha
SOAS/ELDP	London, UK & Berlin, Germany	SOAS & ELDP	Jay Latarche and Jeremy Collins	Karolin Abbas, Giulia Barbos, Ella Dorn, Hannah Gibson, Jemima Goodall, Samuel Griggs, Andrew Harvey, Rebekah Hayes, Biu Huntington-Rainey, Aarifah Khoodoruth, Jay Latarche, Tânia Martins, Celia Mata German, India Pearey, Amna Raja, Sydney Rey, Julia Rizaew, Frederick Schmidt and Maisie Yong
ANU (inactive)	Canberra, Australia	ARC CoEDL & ANU	Hedvig Skirgård	Yustinus Ghanggo Ate, Eri Kashima, Saliha Muradoglu, Naomi Peck, Daniel Prestipino, Rhiannon Schembri, Henry Wu and Stephanie Yam
Uppsala (inactive)	Uppsala, Sweden	Dept of Linguistics & Philology, Uppsala University	Harald Hammarström	Giada Falcone, Katerina Koti, Richard Kowali and Nora Lindvall
Kiel	Kiel, Germany	Institute for Scandinavian Studies, Frisian and General Linguistics, Department of General Linguistics, Christian-Albrechts-Universität zu Kiel	Alena Witzlack-Makarevich & Tobias Weber	Nancy Bakker, Anina Bolls, Hans-Philipp Göbel, Leonard Heer, Nataliia Hübler, Jessica Ivani, Marilen Johns, Erika Just, Carolina Kipf, Janina Klingenberg, Nikita König, Mandy Lorenzen, Johanna Nickel, Cheryl Akinyi Oluoch, Jana Peter, Stephanie Petit, Sören Pieper, Linda Raabe, Eloisa Ruppert, Jill Sammet, Judith Voss, Jana Winkler and Tim Witte
Leipzig	Leipzig, Germany	DLCE, MPI-EVA	Hedvig Skirgård	Daniel Auer, Sinoël Dohlen, Victoria Gruner, Roberto Herrera, Michael Müller, Janis Reimringer, Kim Salmon, John Elliott and Jingting Ye
International	N/A	DLCE, MPI-EVA	Tobias Weber & Hedvig Skirgård	Hoju Cha, Marvin Martiny, Grace Ephraums and Manuel Rüdisühli
Colorado	Boulder, Colorado, USA	Department of Linguistics, University of Colorado Boulder	Hannah Haynie	Elizabeth Goodrich

For a full list of coders, please go here.

Acknowledgements of contributions from scholars

We would like to thank the many language experts, linguists and speakers who have enriched our dataset by sharing their expertise and knowledge of particular languages with us. These are:

Niina Aasmäe, Alfredo Acosta Blanco, Yvonne Agbetsoamedo, Cynthia Allen, Sunkulp Ananthanarayan, Victoria Apel, I Wayan Arka, Amadu Sajoh Bah, Danielle Barth, Rasmus Bernander, Rogier Blokland, Jeremy Bradley, Mitchell Browen, Yihan Chen, Jiaoyi Chen, Bernard Comrie, Denis Creissels, Mervi de Heer, Rebecca Defina, Cephas Delalorm, Anne Marie Diagne, Rebecca Dixon, Christian Döhler, Mark Donohue, Marie-France Duhamel, Ebikudo Ebitare, Niklas Edenmyr, Nicholas J. Enfield, Nicholas Evans, Gisbert Fanselow, Anne-Marie Fehn, Simeon Floyd, Ulla-Maija Forsberg, Alexandre François, Paul Geraghty, Nikolett F. Gulyás, Roy Stephen Hagman, Hyun-Jong Hahm, Arja Hamari, Abbie Hantgan, Andrew Harvey, Torgny Hedström, Heinike Heinsoo, Caroline Hendy, Sulev Iva, Peggy Jacob, Ivan Kapitonov, Olle Kejonen, Maria Khachaturyan, Myjolynne Kim, Jinyoung Kim, Jacqueline van Kleef, Sjaak van Kleef, Gerson Klumpp, Elizaveta Kushnir, Olga Kuznetsova, Rosés Labrada, Kate Lynn Lindsey, Florian Lionnet, Constance Kutsch Lojenga, Carlos M. López Lacayo, Adela López Vargas, Hannah Lutzenberger, Antonio Magaña Macías, Andrej L. Malchukov, Alexandra Marley, Orkhan Mehraliev, Chenxi Meng, Amina Mettouchi, Alexis Michaud, Daria Mishchenko, Mirjan Möller, Zarina Molochieva, Steve Morelli, Marten Mous, Åshild Naess, David Nash, Tatiana Nikitina, Rainer Oetzel, Ratih Oktarini, Bruno Olsson, David Osgarby, Sofia Oskolskaya, Sarah Parkinson, Becky Paterson, Andrew Pawley, Bron Peddington-Webb, John Peterson, Netra Prasad Paudyal, James Lee Pratchett, Saskia van Putten, Tihomir Rangelov, Luis Miguel Rojas Berscia, Nicholas Rolle, Paulette Roulon-Doko, Alan Rumsey, Eva Saar, Sophie Salffner, Alexandr Savelyev, Jonathan Schlossberg, Stefan Schnell, Dineke Schokkin, Guillaume Segerer, Frank Seidel, Gunter Senft, Jeff Siegel, Jane Simpson, Yannick Staschull, Lana Grelyn Takau, Angela Terrill, Jacqueline Thomas, Bill Thurston, Yvonne Treis, Laura Trokhymenko, Martine Vanhove, Hein van der Voort, Valentin Vydrin, Alexandra Vydrina, Mary Walworth, Joshua Wilbur, Vera Wilhelmsen, Solace Yankson and Raoul Zamponi.

We would also like to extend our gratitude to the speakers and signers who have given of their time and energy to collaborate with linguists to make the descriptive works we rely on possible.

We would also like to thank J. A. Brindle and Martine Vanhove for helping to translate the feature questions into French during the NTS phase of the project.

The Grambank dataset also contains data imported from two related databases: South American Indigenous Language Structures (SAILS) and the Hunter-Gatherer database (HG). In total, 2,017 datapoints were imported from Swintha Danielsen's part of SAILS and 8,086 datapoints from HG. Imported datapoints are marked as such in the comment field of the dataset. The SAILS data were straightforward to import because Danielsen had used the same underlying questionnaire as Grambank. Importing the HG dataset took more effort: Judith Voss, Thiago Chacon, Claire Bowern, Harald Hammarström and Hedvig Skirgård oversaw the mapping of HG features to Grambank features and the import of the data.

Besides all the scholars and speakers who generously contributed to our project, we also acknowledge that we benefit from the time and energy that communities worldwide have donated to grammar writers, as well as from the work of the authors of the descriptions themselves. Describing a language is a very difficult task, and we are grateful to the authors and to the communities for sharing their knowledge and labour with us.

The aim of Grambank

Grambank aims to provide a large amount of typological data on languages, as recorded in grammars and grammar sketches, that can be used to investigate deep language prehistory, geographical and historical grammatical patterns, language universals and the functional interaction of grammatical features.

The design of Grambank

Features are designed in terms of language-independent and cross-linguistically comparable concepts (cf. Haspelmath 2010). The features conform to basic database design principles to the extent that this is reconcilable with the aims of coverage, codability and reusability of legacy data.

Data sources for Grambank

The primary sources for Grambank data are grammatical descriptions. At least 4,702 languages have at least a grammar sketch (Hammarström et al. 2020). We also reach out to language experts who are willing to volunteer their time and energy to the project. We are well aware that the same phenomenon in a single language can receive different analyses. We consult as many of the available sources as possible in order to record the most suitable answer to each question given our current state of knowledge. Languages differ in the depth and quality of the analyses available for them, which may in some cases affect the quality of our data. We are aware of this and strive to minimize the effect as much as possible.

Questionnaire history

The Grambank questionnaire consists of a set of 195 features covering a wide range of grammatical topics. Most Grambank features are adapted from previous surveys, which are treated in more detail below:

The Nijmegen Typological Survey (cf. Skirgård et al. 2014), which in turn was based on:
  the Sahul questionnaire (Reesink et al. 2009), and
  the Pioneers questionnaires (Dunn et al. 2005, 2008).
The World Atlas of Language Structures (Dryer & Haspelmath 2013).

Sources: Pioneers, Sahul and NTS

Grambank largely consists of features adapted from other surveys.

The first survey was designed in 2005 by Terrill, Dunn, Levinson, Lindström and Reesink for the EUROCORES OMLL project Pioneers of Island Melanesia (Dunn et al. 2005). Its features made up what is sometimes called the "Pioneers Questionnaire". They were selected to investigate common characteristics of various Austronesian and Papuan lineages. The set was revised for a second version of the questionnaire (Dunn et al. 2008). Later, in Reesink et al. (2009), the questionnaire was overhauled and the database expanded to include Australian and Andamanese languages. The new questionnaire was called the Sahul questionnaire (cf. Reesink & Dunn 2012: 40).

The following people advised during the creation of the Pioneers and Sahul questionnaires: Sjef Barbiers, Mily Crevels, Nick Evans, Rob Goedemans, Eva Lindström, Pieter Muysken, Gunter Senft, Leon Stassen and Hein van der Voort.

Later, plans were made to expand the Sahul questionnaire, both by adding new features and by widening the scope of the database to include African languages. The resulting project, the Nijmegen Typological Survey, was the direct predecessor of Grambank, which now has a global scope and additionally aims to include only logically independent grammatical features.

The following summaries give an overview of the different questionnaires, their associated publications, their numbers of features and the languages included.

Pioneers1

Features: 121
Languages: 21
Designers: Angela Terrill, Michael Dunn, Stephen Levinson, Eva Lindström, and Ger Reesink
Publication: Dunn, Michael, Terrill, Angela, Reesink, Ger, Foley, Robert A., & Levinson, Stephen C. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science, 309(5743), 2072-2075.
Focus: Oceanic and Papuan languages.

Pioneers2

Features: 115
Languages: 38
Designers: Michael Dunn, Stephen C. Levinson, Eva Lindström, Ger Reesink, and Angela Terrill
Publication: Dunn, Michael, Levinson, Stephen C., Lindström, Eva, Reesink, Ger, & Terrill, Angela. 2008. Structural phylogeny in historical linguistics: Methodological explorations applied in Island Melanesia. Language, 84(4), 710-759.
Focus: Oceanic and Papuan languages.

Sahul

Features: 160
Languages: 121
Designers: Ger Reesink and Michael Dunn
Publication: Reesink, Ger, Singer, Ruth, & Dunn, Michael. 2009. Explaining the linguistic diversity of Sahul using population models. PLoS Biology, 7(11), e1000241.
Focus: Australian (both Pama-Nyungan and non-Pama-Nyungan), Oceanic, other Western Austronesian, Papuan (both Trans-New Guinea and non-Trans-New Guinea), Andamanese.

NTS (Nijmegen Typological Survey)

Features: 270
Languages: 310
Designers: Ger Reesink, Hedvig Skirgård, Michael Dunn, Harald Hammarström, Jeremy Collins, Suzanne van der Meer, Mark Dingemanse
Publication: None, so far. See Skirgård et al. (2014).
Focus: Legacy material from the Sahul and Pioneers questionnaires, plus African languages.

Grambank

Features: 195
Languages: currently 2,071. Aim: 3,500.
Designers: Harald Hammarström, Jeremy Collins, Hannah Haynie, Stephen Levinson, Nicholas Evans, Hedvig Skirgård, Martin Haspelmath, Michael Dunn
Publication: None, so far
Focus: Global.

Inherited features in Grambank

Grambank consists of

118 features from the Sahul questionnaire
42 features from the Nijmegen Typological Survey that are not in the Sahul questionnaire
4 features from the Pioneers questionnaire that do not occur in the Sahul questionnaire, viz. the numeral system questions
34 new features that do not occur in any of the source questionnaires. These include questions on interrogative pronouns and on evidentiality, as well as more detailed questions on negation and verbal derivation

We have strived to keep the meaning of the legacy features unchanged in Grambank. Some features have been reinterpreted to facilitate the coding of phenomena that were not relevant to the languages covered by the Sahul and Pioneers questionnaires.

References

Dryer, Matthew S. & Martin Haspelmath (eds.). 2013. The World Atlas of Language Structures Online. Leipzig: Max Planck Institute for Evolutionary Anthropology.

Dunn, Michael, Angela Terrill, Ger Reesink, Robert A. Foley & Stephen C. Levinson. 2005. Structural phylogenetics and the reconstruction of ancient language history. Science 309(5743). 2072–2075.

Dunn, Michael, Stephen C. Levinson, Eva Lindström, Ger Reesink & Angela Terrill. 2008. Structural phylogeny in historical linguistics: Methodological explorations applied in Island Melanesia. Language 84(4). 710–759.

Hammarström, Harald, Robert Forkel, Martin Haspelmath & Sebastian Bank. 2020. Glottolog 4.3. Jena: Max Planck Institute for the Science of Human History.

Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language 86(3). 663–687.

Skirgård, Hedvig, Suzanne van der Meer & Harald Hammarström. 2014. The Nijmegen Typological Survey (NTS). 44th Colloquium on African Languages and Linguistics. Leiden, the Netherlands.

Reesink, Ger & Michael Dunn. 2012. Systematic typological comparison as a tool for investigating language history. In Nicholas Evans & Marian Klamer (eds.), Melanesian Languages on the Edge of Asia: Challenges for the 21st Century (Language Documentation & Conservation Special Publication No. 5), 34–71.

Reesink, Ger, Ruth Singer & Michael Dunn. 2009. Explaining the linguistic diversity of Sahul using population models. PLoS Biology 7(11). e1000241.