Reviewing Criteria

Below we discuss the criteria that all reviewers and Associate Editors should use to evaluate submissions and why we use those criteria. For a paper to be accepted, it should generally meet all four criteria well; meeting any one criterion poorly is grounds for rejection. Many of these criteria and guidelines overlap with Heckman et al. 2022, which offers helpful recommendations and reviewing rubrics, as well as with Brooke C. Coley, Denise R. Simmons, and Susan M. Lord's Dissolving the margins: LEANING INto an antiracist review process.

Grounding in Prior Work

The submission is grounded in relevant prior work.

Papers should both cite relevant prior work and explicitly show how it relates to the paper’s research questions.

This includes any available theory, which can be an informative guide in shaping a research question. If there’s theory that’s relevant to a submission, the submission should discuss it. That said, not all types of research will have relevant theory to discuss.

This also includes citing work that is frequently overlooked, including work by scholars of color and other scholars whose identities are marginalized. Such work is often systematically ignored or dismissed, so papers need to give extra attention to engaging with these discoveries, theories, and insights.

After reading the paper, you should feel more informed about prior literature and how that literature relates to the paper’s contributions. Such coverage of related work might come before a work’s contributions, or it might come after (e.g., connecting a new theory derived from observations to prior work). Don't worry about exactly where it is placed; the important thing is that the work is done in the context of the most relevant prior discoveries.

Identify related work the authors might have missed and include it in your review. Missing a relevant paper that would not dramatically change the submission is not sufficient grounds for rejection. Focus on missing prior work that would significantly alter the research questions, analysis, or interpretation of results.

Do not critique work for...

  • Missing 1 or 2 peripherally related papers. Just note them, helping the authors to broaden their citations.
  • Not citing your own work, unless it really is objectively highly relevant.
  • Not having a related work section. Sometimes a dedicated section is appropriate, sometimes it is not. Sometimes prior work is better addressed at the end of a paper, not at the beginning.
  • Making discoveries inconsistent with theory. The point of empirical work is to test and refine theories, not to conform to them.
  • Not building upon theory when no sufficient theory is available.
  • Not using the same interpretation of a theory as you; many theories have competing interpretations and distinct facets that can be seen from multiple perspectives.

Do critique work for...

  • Listing papers without meaningfully addressing their relevance to the paper’s questions or innovations. Lists and summaries of work are not sufficient; there should be interpretation and synthesis of prior work in relation to the research question posed.

Soundness

The submission’s methods and/or innovations soundly address its research questions.

Obviously, this isn't possible if the submission does not have a research question, or the research question isn't clear. But even if it does, the paper should answer the questions it poses, and it should do so with rigor (broadly construed). This is the single most important difference between research papers and other kinds of knowledge sharing in computing education (e.g., experience reports), and the source of certainty researchers can offer.

Soundness includes thoughtful and meaningful use of concepts. For example, many submissions will claim contributions to diversity, equity, and inclusion, but simply use keywords, and not engage with any of the deep scholarship on the topic. This is not "sound" in the same way that mentioning a statistical technique and then not using it correctly is not sound.

Soundness can also include positionality; for example, a work might seek to address a problem in a community, but if none of the authors are part of that community, and the work does not engage or partner with that community, claims about "solving" a problem in that community may not be sound.

Note that soundness is relative to claims. For example, if a paper claims to have provided evidence of causality, but its methods did not do that, that would be grounds for critique. But if a paper only claimed to have found a correlation, and that correlation is a notable discovery that future work could explain, critiquing it for not demonstrating causality would be inappropriate.

A key part of assessing soundness is having the necessary details about methods and/or how an innovation was constructed. You should be able to understand most of the key details about how the authors conducted their work or made their invention possible. This is key for replication and meta-analysis of studies that come from positivist or post-positivist epistemologies. For interpretivist works, it is also key for what Checkland and Holwell called “recoverability” (See Tracy et al. 2010 for a detailed overview of evaluating qualitative work). Focus your critiques on omissions of research process or innovation details that would significantly alter your judgement of the paper’s validity.

Do not critique a paper for...

  • Not describing every detail. There are an infinite number of details to include; the goal is to include the most significant ones for replication, recoverability, meta-analysis, and comprehension.
  • Not including a method section when one isn't appropriate. Papers that contribute new theories, novel arguments, or new innovations do not need them, as they are not necessarily empirical.
  • Using qualitative methods. If you disagree with this, you should probably not be reviewing the paper.
  • Failing to follow quantitative methods standards for qualitative methods (e.g., critiquing a case study for a “small N” makes no sense; that is the point of a case study).
  • A lack of a statistically significant difference if the study demonstrates sufficient power to detect one; a lack of difference can be a discovery too.
  • Not doing more; if the demonstrated claims are sufficiently publishable, then we should publish them (e.g., Don't say “I would publish this if it had also demonstrated knowledge transfer”).
  • Discovering something inconsistent with your experience (e.g., no inexpert, anecdotal judgements such as “I don’t know much about this but I played with it once and it didn’t work”).
  • Expecting generalizability from interpretive work (e.g., requiring demographic information about participants, setting an arbitrary number of observations, reporting inter-rater reliability or code counts, requesting participant IDs for every quote). All of these requests incorrectly apply a positivist frame to interpretive work by expecting evidence that the data presented are representative of a population. (See Soden et al. for details on why these are inappropriate.)

Do critique a paper for:

  • Omitting details that would support replication or meta-analysis for positivist or post-positivist works, and recoverability for interpretivist works using qualitative methods.
  • Overlooking research published in communities outside of computing education research; just because a method hasn’t been used in computing education literature doesn't mean that it isn’t standard somewhere else. The field draws upon methods from many communities. Look for evidence that the method is used elsewhere.
  • Using dehumanizing language or terminology to refer to people's identities and communities.
  • Not clearly identifying the methods used in a study, unless small samples might identify participants.
  • Not providing information about who was studied.
  • Not offering a clear research question, when appropriate.
  • Not describing analysis procedures.
  • Not thoroughly addressing threats to validity.
  • Not addressing intersectionality in analyses involving identity, either methodologically or in limitations.
  • Not addressing research ethics (see McGill et al.)
  • Not engaging researcher reflexivity (see McGill et al.)

Significance

The submission advances knowledge of one or more computing education phenomena.

A submission can meet the previous criteria and still fail to advance what we know about the phenomena (e.g., a paper well-situated in prior work with sound methods can discover something we already knew with certainty). It is up to the authors to convince you that the discoveries advance our knowledge in some way, whether the advance is as incremental as confirming uncertain prior work or as substantial as adding a significant new idea.

Also, there should be someone who might find the discovery interesting. It does not have to be interesting to you, and you do not have to be 100% confident that an audience exists. A possible audience is sufficient for publication, as reviewers and the editorial board likely do not perfectly reflect the broader audience of readers (especially future readers). In particular, reflect on who stands to benefit from this work, who might be harmed by it, as well as on the history of whom prior knowledge has served and not served.

Part of articulating how a submission advances our understanding is offering interpretations of the significance of a paper’s discoveries. If it makes significant advances, but does not explain what those advances are and why they matter, the paper is not ready for publication.

Do not critique a submission...

  • Because a single prior work has already been published on the topic. Discoveries accumulate over many papers, not just one.
  • For “only” being a replication. Replications are important.
  • For examining a small population; just because a group is minoritized, marginalized, or small in number does not mean they are insignificant.
  • That contributes a new idea but does not yet have everything figured out. Such insight can require multiple papers.
  • For not being generalizable enough; generalizability takes time, and some types of qualitative work don’t intend to be generalizable.
  • For advancing knowledge about a phenomenon you personally don’t like (e.g., “I hate object-oriented languages, this work doesn’t matter”).

Do critique a submission for...

  • Not summarizing its discoveries.
  • Not discussing the significance of its discoveries.
  • Interpreting its discoveries in ways that go beyond its evidence and/or arguments.

Clarity

The submission's writing is clear and concise.

Papers need to be clear and concise, both to be comprehensible to diverse audiences and to ensure the community is not overburdened by verbosity. We recognize that not all authors are fluent English writers; however, if the paper requires significant editing to be comprehensible to fluent English readers, or is unnecessarily verbose, it is not yet ready for publication.

Do not critique a paper for...

  • Having easily fixed spelling and grammar issues. They can just fix them in revisions.
  • Merely being too short. Some papers don't need a lot of space to clearly convey their discoveries. (Of course, a short paper might be critiqued on other grounds, such as omitting methodological details).
  • Merely being too long. Some papers (especially qualitative work) need more words to convey their discoveries. They shouldn't be penalized for their choice of method. (Of course, a long paper might be critiqued on other grounds, such as being too verbose).

Recommendations

Based on the criteria above, reviewers and Associate Editors select one of four recommendations: Accept, Accept with Revisions, Revise and Resubmit, or Reject. You can see the precise meaning of each of these in our Associate Editor guidelines.