Dataset review criteria - acm-toce/documentation GitHub Wiki

Grounding in Prior Work

The submission is grounded in relevant prior work.

Papers should demonstrate the long-term value of their data by citing relevant prior work and explicitly showing how it relates to the paper’s dataset. This also includes citing work that is repeatedly ignored, including work by scholars of color and other scholars whose identities are marginalized. Such works are often systematically ignored and dismissed, so papers need to take extra care not to overlook these discoveries, theories, and insights. After reading the paper, you should feel more informed about prior literature and how that literature is related to the paper’s contributions. Such coverage of related work might come before a work’s contributions, or it might come after (e.g., connecting a new theory derived from observations to prior work). Don't worry about exactly where it is placed; the important thing is that the work is done in the context of the most relevant prior discoveries. Identify related work the authors might have missed and include it in your review. Missing a paper that is relevant, but would not dramatically change the paper, is not sufficient grounds for rejecting a paper. Focus on missing prior work that would significantly impact the utility of the data.

Do not critique work for...

Missing 1 or 2 peripherally related papers. Just note them, helping the authors to broaden their citations.
Not citing your own work, unless it really is objectively highly relevant.
Not having a related work section. Sometimes a dedicated section is appropriate, sometimes it is not. Sometimes prior work is better addressed at the end of a paper, not at the beginning.

Do critique work for...

Listing papers without meaningfully addressing their relevance to the data presented in the paper. Lists and summaries of work are not sufficient; there should be interpretation and synthesis of prior work with regard to the data’s utility and implications.

Soundness

The data submission soundly documents the content and context of the dataset.

The paper should thoroughly describe the contents of the dataset. This should include any data types included, as well as the structure and format of the dataset. Soundness also includes thoughtful and meaningful descriptions of the dataset's context, including subject demographics and data collection methods. Authors should transparently discuss any limitations or biases within the participant group, and address the dataset’s generalizability.

The submission must also include a supporting data statement, following the provided data statement schema. This statement should include descriptions of the dataset, the data collection process, participant demographics and contexts, and how to access the dataset. Both the data paper and the data statement should describe the dataset in enough detail to ensure that any researcher can easily understand and use it for their work.

Soundness can also include positionality; for example, a dataset might seek to address a problem in a community, but if none of the authors are part of that community, and the data does not engage or partner with that community, the dataset’s application in that community may not be sound.

Do not critique a paper for...

Not describing every detail. There are an infinite number of details to include; the goal is to include the most significant ones for general comprehension and use of the dataset.
Being primarily qualitative data. If you disagree with this, you should probably not be reviewing the paper.
Overlapping with existing published datasets. Overlapping datasets allow for better validation of methods in multiple contexts.
Being focused on a particular location, geography, demographic, or identity. Learning is dependent on context and all contexts and identities matter.
Not doing more; if the demonstrated claims are sufficiently publishable, then we should publish them (e.g., don't say “I would publish this if it had also demonstrated knowledge transfer”).
Lacking full generalizability, as long as the authors acknowledge the dataset’s limitations and discuss the specific contexts in which it is applicable.

Do critique a paper for...

Omitting details that would prevent use of the dataset for further works.
Overlooking research published in communities outside of computing education research; just because a method hasn’t been used in computing education literature doesn't mean that it isn’t standard somewhere else. The field draws upon methods from many communities. Look for evidence that the method is used elsewhere.
Using dehumanizing language or terminology to refer to people's identities and communities.
Not clearly identifying the methods used in a study, unless small samples might identify participants.
Not providing information about who was studied.
Not describing analysis procedures.
Not thoroughly addressing threats to validity.
Not addressing intersectionality in analyses involving identity, either methodologically or in limitations.
Not addressing research ethics (see McGill et al.).
Not engaging researcher reflexivity (see McGill et al.).

Significance

The data submission has implications for advancing knowledge in one or more computing education topics.

The paper must justify the motivation for a new dataset. It should describe the dataset’s utility within the scope of computing education research and, where applicable, demonstrate the dataset’s advantages and distinctions compared to existing data or methodologies. It is up to the authors to convince you that the data will advance our knowledge in some way, whether it’s as incremental as confirming uncertain prior work or as substantial as adding a significant new idea.

Also, there should be someone who might find the data interesting and useful. It does not have to be interesting to you, and you do not have to be 100% confident that an audience exists. A possible audience is sufficient for publication, as reviewers and the editorial board likely do not perfectly reflect the broader audience of readers (especially future readers). In particular, reflect on who stands to benefit from this work, who might be harmed by it, and the history of whom prior knowledge has and has not served.

Analysis and interpretation of the data are not required. Although some analysis may be necessary to demonstrate data quality or validate the data’s utility, there should not be extensive interpretation of the data. If the paper includes considerable analysis, it would be better suited as a research submission.

Do not critique a submission...

Because a comparable dataset has already been published. Multiple related data sources both reinforce discoveries through greater generalizability and allow for comparative studies.
For containing a small population; just because a group is minoritized, marginalized, or small in number does not mean they are insignificant.
For not being generalizable enough; generalizability takes time, and data publication supports that process.
For advancing knowledge about a phenomenon you personally don’t like (e.g., “I hate object-oriented languages, this work doesn’t matter”).

Do critique a submission for...

Not discussing the implications and utility of the data for future research.

Clarity

The submission's writing is clear and concise.

Papers need to be clear and concise, both to be comprehensible to diverse audiences and to ensure the community is not overburdened by verbosity. We recognize that not all authors are fluent English writers; however, if the paper requires significant editing to be comprehensible to fluent English readers, or it is unnecessarily verbose, it is not yet ready for publication.

Do not critique a paper for...

Having easily fixed spelling and grammar issues. Authors can fix these in revision.
Merely being too short. Some papers don't need a lot of space to clearly convey their discoveries. (Of course, a short paper might be critiqued on other grounds, such as omitting methodological details.)
Merely being too long. Some papers (especially qualitative work) need more words to convey their discoveries. They shouldn't be penalized for their choice of method. (Of course, a long paper might be critiqued on other grounds, such as being too verbose.)