1 Introduction

Journal editors occupy an important position in the scientific landscape. By making the final decision on which papers get published in their journal and which papers do not, they have a significant influence on what work is given attention and what work is ignored in their field (Crane 1967).

In this paper I investigate the following question: should the editor be informed about the identity of the author when she is deciding whether to publish a particular paper? Under a single- or double-anonymous reviewing procedure, the editor knows who the author of each submitted paper is.Footnote 1 Under a triple-anonymous reviewing procedure, the author’s name and affiliation are hidden from the editor unless and until the paper is accepted for publication. So the question is: should journals practice triple-anonymous reviewing?Footnote 2

Two kinds of arguments have been given in favor of triple-anonymous reviewing. One focuses on the treatment of the author by the editor. On this kind of argument, revealing identity information to the editor will lead the editor to (partially) base her judgment on irrelevant information. This is unfair to the author, and is thus bad.

The second kind of argument highlights the effect on the journal and its readers. Again, the idea is that the editor will base her judgment on identity information if given the chance to do so. But now the further claim is that as a result the journal will accept worse papers. After all, if a decision to accept or reject a paper is influenced by the editor’s biases, this suggests that a departure has been made from a putative “objectively correct” decision. This harms the readers of the journal, and is thus bad.Footnote 3

This paper assesses these arguments. I distinguish between two different ways the editor’s judgment may be affected if the author’s identity is revealed to her. First, the editor may treat authors she knows differently from authors she does not know, a phenomenon I will call connection bias. Second, the editor may treat authors differently based on some aspect of their identity (e.g., their gender), which I will call identity bias. I make the following three claims.

My first claim is that connection bias actually benefits rather than harms the readers of the journal. This benefit is the result of a reduction in editorial uncertainty about the quality of submitted papers. I construct a model to show in a formally precise way how such a benefit might arise—surprisingly, no assumption that the scientists the editor knows are “better scientists” is required—and I cite empirical evidence that such a benefit indeed does arise. However, this benefit only applies in certain fields; I argue that mathematics and parts of the humanities are excluded (Sect. 2).

My second claim is that whenever connection bias or identity bias affects an editorial decision, this constitutes an epistemic injustice in the sense of Fricker (2007) against the disadvantaged author. If the editor is to be (epistemically) just, she should prevent these biases from operating, which can be done through triple-anonymous reviewing. So I endorse an argument of the first of the two kinds I identified above: triple-anonymous reviewing is preferable because not doing so is unfair to authors (Sects. 3, 4).

My third claim is that whether editorial biases harm the journal and its readers depends on a number of factors. Connection bias benefits readers, whereas identity bias harms them. Whether there is an overall benefit or harm depends on the strength of the editor’s identity bias, the relative sizes of the different groups, and other factors, as I illustrate using the model. As a result I do not in general endorse the second kind of argument, that triple-anonymous reviewing is preferable because readers of the journal are harmed otherwise. However, I do endorse this argument for fields like mathematics, where I claim that the benefits of connection bias do not apply (Sect. 5).

Zollman (2009) has studied the effects of different editorial policies on the number of papers published and the selection criteria for publication, but he does not focus specifically on the editor’s decisions. Economists have studied models in which editorial decisions play an important role (Ellison 2002; Faria 2005; Besancenot et al. 2012), but they have not been concerned with biases the editor may be subject to. Other economists have done empirical work investigating the differences between papers with and without an author-editor connection (Laband and Piette 1994; Medoff 2003; Smith and Dombrowski 1998, more on this later), but they do not provide a model that can explain these differences. This paper thus fills a gap in the literature.

I compare double- and triple-anonymous reviewing, as opposed to single- and double-anonymous reviewing. The latter comparison has been studied extensively; see Blank (1991) for a prominent empirical study and Snodgrass (2006) and Lee et al. (2013, especially pp. 10–11) for literature reviews. In contrast, I know of almost no empirical or theoretical work directly comparing double- and triple-anonymous reviewing (one exception is Lee and Schunn 2010, p. 7).

While I focus on comparing double- and triple-anonymous review, some of what I say may carry over to the context of comparing single- and double-anonymous review. In Sect. 5 I comment briefly on the extent to which the formal model I present applies in the context of comparing single- and double-anonymous review. However, I leave it to the reader to judge to what extent the arguments I make on the basis of the model carry over.

2 A model of connection bias

As mentioned, journal editors have a certain measure of power in a scientific community because they decide which papers get published.Footnote 4 An editor could use this power to the benefit of her friends or colleagues, or to promote certain subfields or methodologies over others. This phenomenon has been called editorial favoritism.

Bailey et al. (2008a, b) find that academics believe editorial favoritism to be fairly prevalent, with a nonnegligible percentage claiming to have perceived it firsthand. Hull (1988, chapter 9) finds a limited degree of favoritism in his study of reviewing practices at the journal Systematic Zoology. And Laband (1985) and Piette and Ross (1992) find that papers whose author has a connection to the journal editor are allocated more journal pages than papers by authors without such a connection.Footnote 5

In this paper, I refer to the phenomenon that editors are more likely to accept papers from authors they know than papers from authors they do not know as connection bias.

Academics tend to disapprove of this behavior (Sherrell et al. 1989; Bailey et al. 2008a, b). In both studies by Bailey et al., in which subjects were asked to rate the seriousness of various potentially problematic behaviors by editors and reviewers, this disapproval was shown to be part of a general and strong disapproval of “selfish or cliquish acts” in the peer review process.Footnote 6 Thus it appears that the reason academics disapprove of connection bias is that it shows the editor acting on private interests, whereas disinterestedness is the norm in science (Merton 1942).

On the other hand, there is some evidence that connection bias improves the overall quality of accepted papers (Laband and Piette 1994; Medoff 2003; Smith and Dombrowski 1998). Does this mean scientists are misguided in their disapproval?

In this section, I use a formal model to show that editors may display connection bias even if their only goal is to accept the best papers, and that this may improve quality, consistent with Laband and Piette’s, Medoff’s, and Smith and Dombrowski’s findings. Note that in this section I discuss connection bias only. Subsequent sections discuss identity bias.

Consider a simplified scientific community. Each scientist produces a paper and submits it to the community’s only journal, which has one editor. Some papers are more suitable for publication than others. I assume that this suitability can be measured on a single numerical scale. For convenience I call this the quality of the paper. However, I remain neutral on how this notion should be interpreted, e.g., as an objective measure of the epistemic value of the paper, or as the number of times the paper would be cited in future papers if it was published, or as the average subjective value each member of the scientific community would assign to it if they read it.Footnote 7

Crucially, the editor does not know the quality of the paper at the time it is submitted. This section aims to show how uncertainty about quality can lead to connection bias. To make this point, I assume that the editor cares only about quality, i.e., she makes an estimate of the quality of a paper and publishes those and only those papers whose quality estimate is high.

Let \(q_i\) be the quality of the paper submitted by scientist i. \(q_i\) is modeled as a random variable to reflect uncertainty about quality. Since some scientists are more likely to produce high quality papers than others, the mean \(\mu _i\) of this random variable may be different for each scientist. I assume that quality follows a normal distribution with fixed variance: \(q_i\mid \mu _i \sim N(\mu _i,\sigma _{in}^2)\) (read: “\(q_i\) given \(\mu _i\) follows a normal distribution with mean \(\mu _i\) and variance \(\sigma _{in}^2\)”; the subscript in indicates that this is the variance in the quality of individual papers by the same author).

The assumptions of normality and fixed variance are made primarily to keep the mathematics simple. Below I make similar assumptions on the distribution of average quality in the scientific community and the distribution of reviewers’ estimates of the quality of a paper. The results below likely hold under many different distributional assumptions.Footnote 8

If the editor knows scientist i, she has some prior information on the average quality of scientist i’s work. This is reflected in the model by assuming that the editor knows the value of \(\mu _i\). In contrast, the editor is uncertain about the average quality of the work of scientists she does not know. All she knows is the distribution of average quality in the larger scientific community, which I also assume to be normal: \(\mu _i \sim N(\mu ,\sigma _{sc}^2)\).

Note that I assume the scientific community to be homogeneous: average paper quality follows the same distribution in the two groups of scientists (those known to the editor and those not known to the editor). If I assumed instead that scientists known to the editor write better papers on average, the results would be qualitatively similar to those I present below. If scientists known to the editor wrote worse papers on average, this would affect my results. However, since most journal editors are relatively central figures in their field (Crane 1967), this seems implausible for most cases.

The editor’s prior for the quality of a paper submitted by some scientist i reflects this difference in information. If she knows the scientist she knows the value of \(\mu _i\), and so her prior is \(\pi (q_i\mid \mu _i) \sim N(\mu _i,\sigma _{in}^2)\). If the editor does not know scientist i she is uncertain about \(\mu _i\). Integrating out this uncertainty yields a prior \(\pi (q_i) \sim N(\mu ,\sigma _{in}^2 + \sigma _{sc}^2)\) for the quality of scientist i’s paper.
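
To see why the variances add, decompose the quality of an unknown author’s paper into the author’s average quality plus independent paper-level noise (a standard compounding-of-normals step, given here only as a sketch):

$$q_i = \mu _i + e_i, \qquad \mu _i \sim N(\mu ,\sigma _{sc}^2), \quad e_i \sim N(0,\sigma _{in}^2) \text{ independent},$$

so that marginally \(q_i \sim N(\mu ,\sigma _{in}^2 + \sigma _{sc}^2)\), which is the prior just stated.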

When the editor receives a paper she sends it out for review. The reviewer provides an estimate \(r_i\) of the paper’s quality which is again a random variable. I assume that the reviewer’s report is unbiased, i.e., its mean is the actual quality \(q_i\) of the paper. Once again I use a normal distribution to reflect uncertainty: \(r_i\mid q_i \sim N(q_i,\sigma _{rv}^2)\).Footnote 9

The editor uses the information from the reviewer’s report to update her beliefs. I assume that she does this by conditioning on \(r_i\). Thus, her posterior for the quality of scientist i’s paper is \(\pi (q_i\mid r_i)\) if she does not know the author, and \(\pi (q_i\mid r_i,\mu _i)\) if she does.

The posterior distributions are themselves normal distributions whose mean is a weighted average of \(r_i\) and the prior mean (see Proposition 5 in the Appendix). I write \(\mu _i^U\) for the mean of the posterior distribution if the editor does not know scientist i and \(\mu _i^K\) if she does.
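
For concreteness, the standard normal-normal update yields the following closed forms for these posterior means (my sketch of the computation, which should agree with Proposition 5 in the Appendix):

$$\mu _i^K = \frac{\sigma _{rv}^2\,\mu _i + \sigma _{in}^2\,r_i}{\sigma _{in}^2 + \sigma _{rv}^2}, \qquad \mu _i^U = \frac{\sigma _{rv}^2\,\mu + \left( \sigma _{in}^2 + \sigma _{sc}^2\right) r_i}{\sigma _{in}^2 + \sigma _{sc}^2 + \sigma _{rv}^2}.$$

In both cases the weight on the reviewer’s report grows as the prior becomes less informative, so the unknown-author posterior leans more heavily on \(r_i\), while the known-author posterior leans more heavily on the editor’s prior information about the author.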

I assume that the editor publishes any paper whose (posterior) expected quality is above some threshold \(q^*\). So a paper written by a scientist unknown to the editor is published if \(\mu _i^U > q^*\) and a paper written by a scientist known to the editor is published if \(\mu _i^K > q^*\). Other standards could be used: risk-averse standards might require high (greater than 50%) confidence that the paper is above some threshold. For the qualitative results presented here this makes no difference (see Proposition 7 in the Appendix).

The first theorem establishes the existence of connection bias in the model (refer to the Appendix for all proofs). It says that the editor is more likely to publish a paper written by an arbitrary author she knows than a paper written by an arbitrary author she does not know, whenever \(q^* > \mu\) (for any positive value of \(\sigma _{sc}^2\) and \(\sigma _{rv}^2\)). The condition amounts to a requirement that the journal’s acceptance rate is less than 50%. This is true of most reputable journals in most fields (physics being a notable exception).

Theorem 1

(Connection Bias) If \(q^* > \mu\), \(\sigma _{sc}^2 > 0\), and \(\sigma _{rv}^2 > 0\), the acceptance probability for authors known to the editor is higher than the acceptance probability for authors unknown to the editor, i.e., \(\Pr (\mu _i^K> q^*)> \Pr (\mu _i^U > q^*).\)

Theorem 1 shows that in my model any journal with an acceptance rate lower than 50% will be seen to display connection bias. Thus I have established the surprising result that an editor who cares only about the quality of the papers she publishes may end up publishing more papers by her friends and colleagues than by scientists unknown to her, even if her friends and colleagues are not, as a group, better scientists than average.Footnote 10

Why does this surprising result hold? The distribution of the posterior mean \(\mu _i^U\) has lower variance than the distribution of \(\mu _i^K\) (see Proposition 6 in the Appendix). Note that the variance of \(\mu _i^U\) is lower in an “objective” sense: this is not a claim about the editor’s subjective uncertainty about her judgment. The reason is that \(\mu _i^U\) is a weighted average of \(\mu\) and \(r_i\), keeping it relatively close to the overall mean \(\mu\), whereas \(\mu _i^K\) is a weighted average of \(\mu _i\) and \(r_i\), which tend to differ from \(\mu\) in the same direction. Since the threshold \(q^*\) lies above \(\mu\), the higher-variance distribution of \(\mu _i^K\) places more probability mass above \(q^*\), which is precisely the inequality of Theorem 1.

Note that the result assumes that scientists known to the editor and scientists unknown to the editor are held to the same “standard” (the threshold \(q^*\)). Alternatively, the editor might enforce equal acceptance rates for the two groups. This would be formally equivalent to raising the threshold for known scientists (or lowering the threshold for unknown scientists).

Theorem 1 describes a subjective effect: an editor who uses information about the average quality of papers produced by scientists she knows will believe that scientists she knows produce on average more papers that meet her quality threshold. Does this translate into an objective effect?

In order to answer this question I compare the average quality of accepted papers, or more formally, the expected value of the quality of a paper, conditional on meeting the publication threshold, given that the author is either known to the editor or not.

Theorem 2

(Positive Effect of Connection Bias) If \(\sigma _{sc}^2 > 0\) and \(\sigma _{rv}^2 > 0\), the average quality of accepted papers from authors known to the editor is higher than the average quality of accepted papers from authors unknown to the editor, i.e., \(\mathbb {E}[q_i\mid \mu _i^K> q^*]> \mathbb {E}[q_i\mid \mu _i^U > q^*]\).

Because the editor knows the average quality of the work of scientists she knows, the papers she accepts from this group come disproportionately from scientists with high average quality. Since average quality correlates with the quality of individual papers, the average quality of accepted papers in this group is relatively high, yielding Theorem 2.

The theorem shows that the editor can use the extra information she has about scientists she knows to improve the average quality of the papers published in her journal. The surprising result, then, is that the editor’s connection bias actually benefits rather than harms the readers of the journal. In other words, the editor can use her connections to “identify and capture high-quality papers”, as Laband and Piette (1994) suggest.
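
The mechanism behind Theorems 1 and 2 can also be checked by simulation. The following Python sketch is my own illustration, not part of the original analysis; the parameter values are assumptions, and only the qualitative comparison matters. It draws authors, papers, and reviewer reports from the model of this section and compares acceptance rates and the average quality of accepted papers for known and unknown authors.

```python
import numpy as np

# Illustrative (assumed) parameter values; the qualitative pattern should not
# depend on these choices as long as q_star > mu and both variances are positive.
mu, q_star = 0.0, 2.0
var_in, var_sc, var_rv = 1.0, 1.0, 1.0
n = 1_000_000
rng = np.random.default_rng(0)

# Generative model: author quality, paper quality, reviewer report.
mu_i = rng.normal(mu, np.sqrt(var_sc), n)    # author's average quality
q_i = rng.normal(mu_i, np.sqrt(var_in))      # quality of the submitted paper
r_i = rng.normal(q_i, np.sqrt(var_rv))       # reviewer's estimate of quality

# Posterior mean if the editor knows the author (prior N(mu_i, var_in)).
post_known = (var_rv * mu_i + var_in * r_i) / (var_in + var_rv)
# Posterior mean if the editor does not know the author (prior N(mu, var_in + var_sc)).
post_unknown = (var_rv * mu + (var_in + var_sc) * r_i) / (var_in + var_sc + var_rv)

acc_known = post_known > q_star
acc_unknown = post_unknown > q_star

print("acceptance rate, known authors:  ", acc_known.mean())      # higher (Theorem 1)
print("acceptance rate, unknown authors:", acc_unknown.mean())
print("avg quality of accepted, known:  ", q_i[acc_known].mean())  # higher (Theorem 2)
print("avg quality of accepted, unknown:", q_i[acc_unknown].mean())
```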

To what extent does this show that the connection bias observed in reality is the result of editors capturing high-quality papers, as opposed to editors using their position of power to help their friends? At this point the model yields an empirical prediction. If connection bias is (primarily) due to capturing high-quality papers, the quality of papers by authors the editor knows should be higher than average, as shown in the model. If, on the other hand, connection bias is (primarily) a result of the editor accepting for publication papers written by authors she knows even though they do not meet the quality standards of the journal, then the quality of papers by authors the editor knows should be lower than average.

If subsequent citations are a good indication of the qualityFootnote 11 of a paper, a simple regression can test whether accepted papers written by authors with an author-editor connection have higher or lower average quality than papers without such a connection. This empirical test has been carried out a number of times, and the results favor the hypothesis that editors use their connections to improve the quality of published papers (Laband and Piette 1994; Smith and Dombrowski 1998; Medoff 2003).Footnote 12

Note that in the above (qualitative) results, nothing depends on the sizes of the variances \(\sigma _{in}^2\), \(\sigma _{sc}^2\), and \(\sigma _{rv}^2\). The values of the variances do matter when the acceptance rate and average quality of papers are compared quantitatively. For example, reducing \(\sigma _{rv}^2\) (making the reviewer’s report more accurate) reduces the differences in the acceptance rate and average quality of papers.

Note also that the results depend on the assumption that \(\sigma _{sc}^2\) and \(\sigma _{rv}^2\) are positive. What is the significance of these assumptions?

If \(\sigma _{rv}^2 = 0\), i.e., if there is no variance in the reviewer’s report, the reviewer reports the quality of the paper with perfect accuracy. In this case the “extra information” the editor has about authors she knows is not needed, and so there is no difference in acceptance rate or average quality based on whether the editor knows the author. But it seems unrealistic to expect reviewers’ reports to be this accurate.

If \(\sigma _{sc}^2 = 0\) there is either no difference in the average quality of papers produced by different authors, or learning the identity of the author does not tell the editor anything about the expected quality of that scientist’s work. In this case there is no value to the editor (with regard to determining the quality of the submitted paper) in learning the identity of the author. So here there is also no difference in acceptance rate or average quality based on whether the editor knows the author.

Under what circumstances should the identity of the author be expected to tell the editor something useful about the quality of a submitted paper? This seems to be most obviously the case in the lab sciences. The identity of the author, and hence the lab at which the experiments were performed, can increase or decrease the editor’s confidence that the experiments were performed correctly, including all the little checks and details that are impossible to describe in a paper. In such cases, “the reader must rely on the author’s (and perhaps referee’s) testimony that the author really performed the experiment exactly as claimed, and that it worked out as reported” (Easwaran 2009, p. 359).

But in other fields, in particular mathematics and those parts of the humanities that focus on abstract arguments, there is no need to rely on the author’s reputation. This is because in these fields the paper itself is the contribution, so it is possible to judge papers in isolation of how or by whom they were created (Easwaran 2009). And in fact there exists a norm that this is how they should be judged: “Papers will rely only on premises that the competent reader can be assumed to antecedently believe, and only make inferences that the competent reader would be expected to accept on her own consideration.” (Easwaran 2009, p. 354).

Arguably then, the epistemic advantage conferred by revealing identity information about the author to the editor applies only in certain fields. The relevant fields are those where part of the information in the paper is conferred on the authority of testimony. In mathematics and parts of the humanities, where a careful reading of a paper itself constitutes a reproduction of its argument, there is no relevant information to be learned from the identity of the author (i.e., \(\sigma _{sc}^2 = 0\)). Or at least the publishing norms in these fields suggest that their members believe this to be the case.

3 Connection bias as an epistemic injustice

The previous section discussed a formal model of editorial uncertainty about paper quality. I first established the existence of connection bias in this model. Then I showed that connection bias benefits the readers of the journal, insofar as readers care about the quality of accepted papers. Despite this benefit to readers, I claim that connection bias is unfair to authors. In this section I argue this claim by appealing to the concept of epistemic injustice, as developed by Fricker (2007).

The type of epistemic injustice that is relevant here is testimonial injustice. Fricker (2007, pp. 17–23) defines a testimonial injustice as a case where a speaker suffers a credibility deficit for which the hearer is ethically and epistemically culpable.

Testimonial injustices may arise in various ways. Fricker is particularly interested in what she calls “the central case of testimonial injustice” (Fricker 2007, p. 28). This kind of injustice results from a negative identity-prejudicial stereotype, which is defined as follows:

A widely held disparaging association between a social group and one or more attributes, where this association embodies a generalization that displays some (typically, epistemically culpable) resistance to counter-evidence owing to an ethically bad affective investment. (Fricker 2007, p. 35)

Because the stereotype is widely held, it produces systematic testimonial injustice: the relevant social group will suffer a credibility deficit in many different social spheres.

It is clear that connection bias is not an instance of the central case of testimonial injustice. This would require some negative stereotype associated with scientists unknown to the editor (as a group) which does not normally exist. So I set the central case aside (I return to it in Sect. 4) and focus on the question whether connection bias can produce (non-central cases of) testimonial injustice.

How are individual scientists affected by the differential acceptance rates established in Sect. 2? For scientist i, the probability of acceptance given the average quality of her papers \(\mu _i\) denotes the long-run average proportion of her papers that will be accepted (assuming she submits all her papers to the journal).

Theorem 3

(Acceptance Rate for Individual Authors) Assume \(\sigma _{sc}^2 > 0\) and \(\sigma _{rv}^2 > 0\). The acceptance rate for author i (with average quality \(\mu _i\)) is higher if the editor knows her if and only if \(\mu _i\) exceeds a weighted average of \(\mu\) and \(q^*\):

$$\Pr \left( \mu _i^K> q^*\mid \mu _i\right) \ge \Pr \left( \mu _i^U > q^*\mid \mu _i\right) \quad \text{iff}\quad \mu _i \ge \frac{\sigma _{in}^2}{\sigma _{in}^2 + \sigma _{sc}^2}\,\mu + \frac{\sigma _{sc}^2}{\sigma _{in}^2 + \sigma _{sc}^2}\,q^*.$$

The strict version is true as well, i.e., if the editor knows scientist i she is strictly better off if and only if \(\mu _i\) strictly exceeds the weighted average.

Note that regardless of the values of the variances, any scientist whose average quality exceeds the threshold value (\(\mu _i \ge q^*\)) benefits from connection bias. Conversely, a scientist of below average quality (\(\mu _i \le \mu\)) is actually worse off if the editor knows her.Footnote 13
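
As a concrete illustration (with purely illustrative numbers of my own choosing): if \(\sigma _{in}^2 = \sigma _{sc}^2\), the weights in Theorem 3 are equal, so the cutoff is the midpoint of \(\mu\) and \(q^*\). With \(\mu = 0\) and \(q^* = 2\) this gives

$$\mu _i \ge \tfrac{1}{2}\cdot 0 + \tfrac{1}{2}\cdot 2 = 1,$$

i.e., being known to the editor raises an author’s acceptance rate exactly when her average quality is at least one quality point above the community average.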

Consider what this theorem says for a particular scientist i who is unknown to the editor and whose average quality \(\mu _i\) strictly exceeds the weighted average. Some of her papers are rejected even though they would have been accepted if the editor knew her. In Fricker’s terminology, scientist i suffers from a credibility deficit: fewer of her papers are considered credible (i.e., publishable) by the editor than would have been considered credible if the editor knew her.

Is this credibility deficit suffered by scientist i ethically and epistemically culpable on the part of the editor? On the one hand, the editor is simply making maximal use of the information available to her. It just so happens that she has more information about scientists she knows than about others. But that is hardly the editor’s fault. Is it incumbent upon her to get to know the work of every scientist who submits a paper?

This may well be too much to ask. But an alternative option is to remove all information about the authors of submitted papers. This can be done by using a triple-anonymous reviewing procedure, in which the editor is prevented from using information about scientists she knows in her evaluation.

I conclude that the editor is ethically and epistemically culpable for credibility deficits suffered by scientists unknown to the editor whose average quality exceeds the weighted average specified in Theorem 3, and hence testimonial injustices are committed against such authors when a double-anonymous reviewing procedure is used. A similar epistemic injustice occurs for scientists known to the editor whose average quality is below the weighted average, as such authors would prefer that the editor not use information she has about their average quality.

It is worth noting explicitly which scientists are better or worse off in terms of acceptance rates if a triple-anonymous procedure is introduced. If the acceptance threshold \(q^*\) is held constant,Footnote 14 nothing changes for scientists unknown to the editor. Scientists known to the editor will see their acceptance rate go down if their average quality exceeds the weighted average specified in Theorem 3, and up otherwise. The overall acceptance rate of the journal will go down (by Theorem 1).

So the group that I based my argument on (unknown scientists of high average quality) is not necessarily made better off by switching to triple-anonymous reviewing. The argument for triple-anonymous reviewing given in this section is not about benefiting one group of scientists or harming another: rather, it is about fairness. Under a triple-anonymous procedure, at least all scientists are treated equally: any scientist who writes a paper of a given quality has the same chance of seeing that paper accepted. Whereas under a double-anonymous procedure, scientists are treated unfairly in that their acceptance rates may differ based only on an epistemically irrelevant characteristic (knowing the editor).

I conclude that while journal readers may benefit from connection bias, it involves unfair treatment of authors. Because this unfair treatment takes the form of an epistemic injustice, which involves both ethically and epistemically culpable behavior, connection bias has both an epistemic benefit (to readers) and an epistemic cost (to authors). It would be a misinterpretation of my analysis, then, to conclude that connection bias is epistemically good but ethically bad.

4 Identity bias as an epistemic injustice

So far, I have assumed that connection bias is the only bias journal editors display. The literature on implicit bias suggests further biases: “[i]f submissions are not anonymous to the editor, then the evidence suggests that women’s work will probably be judged more negatively than men’s work of the same quality” (Saul 2013, p. 45). Evidence for this claim is given by Wennerås and Wold (1997), Valian (1999, chapter 11), Steinpreis et al. (1999), Budden et al. (2008), and Moss-Racusin et al. (2012).Footnote 15 So women scientists are at a disadvantage simply because of their gender identity. Similar biases exist based on other irrelevant aspects of scientists’ identity, such as race or sexual orientation (see Lee et al. 2013 for a critical survey of various biases in the peer review system). As Crandall (1982, p. 208) puts it: “The editorial process has tended to be run as an informal, old-boy network which has excluded minorities, women, younger researchers, and those from lower-prestige institutions”.Footnote 16

I use identity bias to refer to these kinds of biases. I now complicate the model of Sect. 2 to include identity bias. I then argue that allowing the editor’s decisions to be influenced by identity bias is unfair to authors, analogous to the argument of the previous section.

I incorporate identity bias in the model by assuming the editor consistently undervalues the work of members of one group (and overvalues that of the other group). More precisely, she believes the average quality of papers produced by any scientist i from the group she is biased against to be lower than it really is by some constant quantity \(\varepsilon\). Conversely, she overestimates by \(\delta\) the average quality of papers written by any scientist not belonging to this group.Footnote 17 So the editor has a different prior for the two groups; I use \(\pi _A\) to denote her prior for the quality of papers written by scientists she is biased against, and \(\pi _F\) for her prior for scientists she is biased in favor of.

As before, the editor may know a given scientist or not. So there are now four groups. If scientist i is known to the editor and belongs to the stigmatized group the editor’s prior distribution on the quality of scientist i’s paper is \(\pi _A(q_i\mid \mu _i)\sim N(\mu _i - \varepsilon ,\sigma _{in}^2)\). If scientist i is known to the editor but is not in the stigmatized group the prior is \(\pi _F(q_i\mid \mu _i)\sim N(\mu _i + \delta ,\sigma _{in}^2)\). If scientist i is not known to the editor and is in the stigmatized group the prior is \(\pi _A(q_i)\sim N(\mu - \varepsilon ,\sigma _{in}^2 + \sigma _{sc}^2)\). And if scientist i is not known to the editor and not in the stigmatized group the prior is \(\pi _F(q_i)\sim N(\mu + \delta ,\sigma _{in}^2 + \sigma _{sc}^2)\).Footnote 18

After the reviewer’s report comes in the editor updates her beliefs about the quality of the paper. This yields posterior distributions \(\pi _A(q_i\mid r_i,\mu _i)\), \(\pi _F(q_i\mid r_i,\mu _i)\), \(\pi _A(q_i\mid r_i)\), and \(\pi _F(q_i\mid r_i)\), with posterior means \(\mu _i^{KA}\), \(\mu _i^{KF}\), \(\mu _i^{UA}\), and \(\mu _i^{UF}\), respectively. As before, the paper is published if the posterior mean exceeds the threshold \(q^*\). This yields the unsurprising result that the editor is less likely to publish papers by scientists she is biased against.
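
To see how the bias propagates into these posterior means, note (this is my own sketch of the standard update, not a computation taken from the original text) that the biased priors simply shift the posterior means of Sect. 2 by a fraction of \(\varepsilon\) or \(\delta\). For a known author from the stigmatized group, for example,

$$\mu _i^{KA} = \frac{\sigma _{rv}^2\left( \mu _i - \varepsilon \right) + \sigma _{in}^2\,r_i}{\sigma _{in}^2 + \sigma _{rv}^2} = \mu _i^K - \frac{\sigma _{rv}^2}{\sigma _{in}^2 + \sigma _{rv}^2}\,\varepsilon ,$$

and analogously \(\mu _i^{KF} = \mu _i^K + \frac{\sigma _{rv}^2}{\sigma _{in}^2 + \sigma _{rv}^2}\,\delta\), with \(\sigma _{in}^2\) replaced by \(\sigma _{in}^2 + \sigma _{sc}^2\) in the unknown-author cases. The size of the shift, and hence the impact of the editor’s bias on acceptance decisions, is larger when the reviewer’s report is noisier (\(\sigma _{rv}^2\) large) and smaller when the prior is more diffuse.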

Theorem 4

(Identity Bias) If \(\varepsilon > 0\), \(\delta > 0\),Footnote 19 \(\sigma _{sc}^2 > 0\), and \(\sigma _{rv}^2 > 0\), the acceptance probability for authors the editor is biased against is lower than the acceptance probability for authors the editor is biased in favor of (keeping fixed whether or not the editor knows the author). That is,

$$\Pr \left( \mu _i^{KA}> q^*\right)< \Pr \left( \mu _i^{KF}> q^*\right) \quad \text{and}\quad \Pr \left( \mu _i^{UA}> q^*\right) < \Pr \left( \mu _i^{UF} > q^*\right).$$

Theorem 4 establishes the existence of identity bias in the model: authors that the editor is biased against are less likely to see their paper accepted than other authors.

Any time a paper is rejected because of identity bias (i.e., the paper would have been accepted if the relevant part of the author’s identity had been different, all else being equal), a testimonial injustice occurs.

Testimonial injustices resulting from identity bias can be instances of the central case of testimonial injustice, in which the credibility deficit results from a negative identity-prejudicial stereotype. The evidence suggests that negative identity-prejudicial stereotypes affect the way people (not just men) judge women’s work, even when one does not consciously believe in these stereotypes. Moreover, those who think highly of their ability to judge work objectively and/or are primed with objectivity are affected more rather than less (Uhlmann and Cohen 2007; Stewart and Payne 2008, p. 1333). Similar claims plausibly hold for biases based on race or sexual orientation.

So both connection bias and identity bias are responsible for injustices against authors. This is one way to spell out the claim that it is unfair to authors when journal editors do not use a triple-anonymous reviewing procedure. This constitutes the first kind of argument for triple-anonymous reviewing which I mentioned in the introduction, and which I endorse based on these considerations.

5 The tradeoff between connection bias and identity bias

The second kind of argument I mentioned in the introduction claims that failing to use triple-anonymous reviewing harms the journal and its readers, because it would lower the average quality of accepted papers. In Sect. 2 I argued that connection bias actually has the opposite effect: it increases average quality. Identity bias complicates the picture, as it generally lowers the average quality of accepted papers. This raises the question whether the combined effect of connection bias and identity bias is positive or negative. In this section I show that there is no general answer to this question.

I compare the average quality of accepted papers under a procedure subject to connection bias and identity bias to that under a triple-anonymous reviewing procedure. Under the triple-anonymous procedure, the editor’s prior distribution for the quality of any submitted paper is \(\pi (q_i)\sim N(\mu ,\sigma _{in}^2 + \sigma _{sc}^2)\), i.e., the prior used for unknown authors in Sect. 2. Hence the posterior is \(\pi (q_i\mid r_i)\) with mean \(\mu _i^U\), the probability of acceptance is \(\Pr (\mu _i^U > q^*)\), and the average quality of accepted papers is \(\mathbb {E}[q_i\mid \mu _i^U > q^*]\). As a result, the editor displays neither connection bias nor identity bias.

In contrast, the double-anonymous reviewing procedure is subject to connection bias and identity bias. The overall probability that a paper is accepted under this procedure depends on the relative sizes of the four groups. I use \(p_{KA}\) to denote the fraction of scientists who are known to the editor and whom she is biased against, \(p_{KF}\) for the fraction known to the editor whom she is biased in favor of, \(p_{UA}\) for the fraction of unknown scientists she is biased against, and \(p_{UF}\) for the fraction of unknown scientists she is biased in favor of (\(p_{KA} + p_{KF} + p_{UA} + p_{UF} = 1\)).

Let \(A_i\) denote the event that scientist i’s paper is accepted under the double-anonymous procedure. The overall probability of acceptance is

$$\begin{aligned} \Pr \left( A_i\right)&= p_{KA}\Pr \left( \mu _i^{KA}> q^*\right) + p_{KF}\Pr \left( \mu _i^{KF}> q^*\right) \\&\qquad {} + p_{UA}\Pr \left( \mu _i^{UA}> q^*\right) + p_{UF}\Pr \left( \mu _i^{UF} > q^*\right) , \end{aligned}$$

and the average quality of accepted papers is \(\mathbb {E}[q_i\mid A_i]\).Footnote 20

In the remainder of this section I assume that the editor’s biases are such that she believes the average quality of all submitted papers to be equal to the overall average \(\mu\). In other words, her bias against womenFootnote 21 is canceled out on average by her bias in favor of men, weighted by the relative sizes of those groups: \((p_{KA} + p_{UA})\varepsilon = (p_{KF} + p_{UF})\delta\). Given the other parameter values, this fixes the value of \(\delta\). This is a kind of commensurability requirement for the two procedures because it guarantees that the editor perceives the average quality of submitted papers to be \(\mu\) regardless of which reviewing procedure is used.
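
Explicitly, writing \(p_A = p_{KA} + p_{UA}\) for the fraction of authors the editor is biased against and \(p_F = p_{KF} + p_{UF}\) for the rest, the commensurability requirement fixes

$$\delta = \frac{p_A}{p_F}\,\varepsilon .$$

For instance (an illustrative calculation with assumed numbers), if women make up 30% of authors, a bias of \(\varepsilon = 0.7\) quality points against them is offset by \(\delta = 0.3\) quality points in favor of men.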

As far as I can tell there are no interesting general conditions on the parameters that determine whether the double-anonymous procedure or the triple-anonymous procedure will lead to a higher average quality of accepted papers. The question I explore next, using some numerical examples, is how biased the editor needs to be for the epistemic costs of her identity bias to outweigh the epistemic benefits resulting from connection bias.

In order to generate numerical data, values have to be chosen for the parameters. First I set \(\mu = 0\) and \(q^* = 2\). Since quality is an interval scale in this model, these choices are arbitrary. For the variances \(\sigma _{in}^2\) (of the quality of individual papers), \(\sigma _{sc}^2\) (of the average quality of authors), and \(\sigma _{rv}^2\) (of the accuracy of the reviewer’s report), I choose a “small” and a “large” value (1 and 4, respectively).

For the sizes of the four groups, I assume that the percentage of women among scientists the editor knows is equal to the percentage of women among scientists the editor does not know. I consider two cases: either half of all authors are women, or women are a 30% minority.Footnote 22 Similarly, I consider the case in which the editor knows half of all scientists submitting papers, and the case in which she knows 30% of them. As a result, there are 32 possible settings of the parameters (\(2^3\) choices for the variances times \(2^2\) choices for the group sizes).

It follows from Theorem 2 that when \(\varepsilon = 0\) the double-anonymous procedure helps rather than harms the readers of the journal by increasing average quality relative to the triple-anonymous procedure. If \(\varepsilon\) is positive but relatively small, this remains true, but when \(\varepsilon\) is relatively big, the double-anonymous procedure harms the readers. This is because the average quality of published papers under the double-anonymous procedure decreases continuously as \(\varepsilon\) increases.

The interesting question, then, is where the turning point lies. How big does the editor’s bias need to be in order for the negative effects of identity bias on quality to cancel out the positive effects of connection bias?

I determine the value of \(\varepsilon\) for which the average quality of published papers under the double-anonymous procedure and the triple-anonymous procedure is the same. Figure 1 reports these numbers. I plot them against the acceptance rate that the triple-anonymous procedure would have for those values of the parameters. The bias \(\varepsilon\) is measured in “quality points” (for reference: since \(\mu = 0\) and \(q^* = 2\), a paper needs to be two quality points above average to be accepted).
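
A rough way to reproduce this kind of break-even value is by simulation. The following Python sketch is my own reconstruction, not the computation used for the figures; the parameter values are assumptions corresponding to one of the 32 settings described above. It computes the average quality of accepted papers under both procedures on a common sample and bisects on \(\varepsilon\), using the fact (noted above) that average quality under the double-anonymous procedure decreases as \(\varepsilon\) increases.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, q_star = 0.0, 2.0
var_in, var_sc, var_rv = 1.0, 1.0, 4.0   # one assumed variance setting
p_known, p_against = 0.5, 0.5            # fraction known to editor; fraction biased against
n = 2_000_000

# One large sample from the generative model of Sect. 2.
mu_i = rng.normal(mu, np.sqrt(var_sc), n)   # authors' average quality
q_i = rng.normal(mu_i, np.sqrt(var_in))     # paper quality
r_i = rng.normal(q_i, np.sqrt(var_rv))      # reviewer's estimate
known = rng.random(n) < p_known
against = rng.random(n) < p_against         # group membership, independent of being known

def posterior_mean(prior_mean, prior_var):
    """Mean of the editor's posterior for q_i after updating on r_i."""
    return (var_rv * prior_mean + prior_var * r_i) / (prior_var + var_rv)

def avg_quality_double(eps):
    """Average quality of accepted papers under double-anonymous review with bias eps."""
    delta = eps * p_against / (1 - p_against)   # commensurability requirement
    shift = np.where(against, -eps, delta)
    prior_mean = np.where(known, mu_i, mu) + shift
    prior_var = np.where(known, var_in, var_in + var_sc)
    accepted = posterior_mean(prior_mean, prior_var) > q_star
    return q_i[accepted].mean()

# Triple-anonymous benchmark: every author treated as unknown, no bias.
accepted_triple = posterior_mean(mu, var_in + var_sc) > q_star
benchmark = q_i[accepted_triple].mean()

# Bisection for the break-even eps, assuming it lies in [0, 5] quality points.
low, high = 0.0, 5.0
for _ in range(40):
    mid = (low + high) / 2
    if avg_quality_double(mid) > benchmark:
        low = mid
    else:
        high = mid
print("break-even bias (quality points):", round((low + high) / 2, 3))
```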

Fig. 1 The minimum size of the editor’s bias such that the quality costs of the double-anonymous procedure outweigh its benefits (measured in “quality points”), in 32 cases, plotted as a function of the acceptance rate of the corresponding triple-anonymous procedure

The variances determine the acceptance rate of the triple-anonymous procedure. The eight possible settings correspond to six acceptance rates: 0.72, 4.16, 11.51, 16.36, 19.32, and 22.66%. The four different settings for the group sizes are indicated by the shapes of the data points in Fig. 1. X’s indicate that all groups are of equal size (\(p_{KA} = p_{KF} = p_{UA} = p_{UF} = 0.25\)), circles that women are a minority, pluses that authors known to the editor are a minority, and diamonds that both women and known authors are a minority.

Since quality points do not have a clear interpretation outside the context of the model, I use the values of \(\varepsilon\) shown in Fig. 1 to calculate the average rate of acceptance of papers authored by women and the average rate of acceptance of papers authored by men.Footnote 23 The difference between these numbers gives an indication of the size of the editor’s bias: it measures (in percentage points, abbreviated pp) how many more papers the editor accepts from men, compared to women.

These differences are reported in Fig. 2. Even with this small sample of 32 cases, a large variation of results can be observed. I illustrate this by looking at two cases in detail.

Fig. 2 The minimum size of the editor’s bias such that the quality costs of the double-anonymous procedure outweigh its benefits (given as a percentage point difference in acceptance rates)

First, suppose that \(\sigma _{in}^2 = \sigma _{sc}^2 = 1\) and \(\sigma _{rv}^2 = 4\), so there is relatively little variation in the quality of individual papers and in the average quality of authors, but relatively high variation in reviewer estimates of quality. Then the triple-anonymous procedure has an acceptance rate as low as 0.72%. If the groups are all of equal size, then under the double-anonymous procedure the acceptance rate for men needs to be as much as 2.66 pp higher than the acceptance rate for women in order for the average quality under the two procedures to be equal. Clearly a 2.66 pp bias is very large for a journal that accepts less than 1% of papers. If the bias is any smaller than that, there is no harm to the readers in using the double-anonymous procedure.

Second, suppose that \(\sigma _{in}^2 = \sigma _{sc}^2 = 4\) and \(\sigma _{rv}^2 = 1\), so the variation in quality of both papers and authors is relatively high but reviewers’ estimates are relatively accurate. Then the triple-anonymous procedure has an acceptance rate of 22.66%. If, moreover, the editor knows relatively few authors then the quality costs of the double-anonymous procedure outweigh its benefits whenever the acceptance rate for men is more than 2.23 pp higher than the acceptance rate for women. For a journal accepting about 23% of papers that means that even if the gender bias of the editor is relatively mild the journal’s readers are harmed if the double-anonymous procedure is used.

Based on these results, and the fact that the parameter values are unlikely to be known in practice, it is unclear whether the double-anonymous procedure or the triple-anonymous procedure will lead to a higher average quality of published papers for any particular journal.Footnote 24 So in general it is not clear that an argument that the double-anonymous procedure harms the journal’s readers can be made. At the same time, a general argument that the double-anonymous procedure helps the readers is not available either. Given this, I am inclined to recommend a triple-anonymous procedure for all journals because not doing so is unfair to authors.

One might be tempted to draw a different policy recommendation from this paper: use triple-anonymous review to prevent the negative effects of identity bias on quality, but provide the editor with the author’s h-index or some other citation index to benefit from the reduced uncertainty associated with knowing an author’s average quality. I do not endorse this suggestion for at least two reasons. First, it is unfair to authors as discussed in Sect. 3. Second, depending on one’s interpretation of quality, it may be difficult or impossible to infer author quality from citations (Lindsey 1989; Heesen forthcoming; Bright 2017).

I have argued in this section that the net effect of connection bias and identity bias on quality is unclear. But I argued in Sect. 2 that the positive effect of connection bias only exists in certain fields. In fields where papers rely partially on the author’s testimony there is value in knowing the identity of the author. But in other fields such as mathematics and parts of the humanities testimony is not taken to play a role—the paper itself constitutes the contribution to the field—and so arguably there is no value in knowing the identity of the author.

In those fields, then, there is no quality benefit from connection bias, but there is still a quality cost from identity bias. So here the strongest case for the triple-anonymous procedure emerges: the double-anonymous procedure is both unfair to authors and harmful to readers.

I have focused on evaluating triple-anonymous review, in particular in contrast to double-anonymous review. In many fields, particularly in the natural sciences, single-anonymous review is the norm, and so the more pertinent question is whether they should switch to double-anonymous review. Can the present model be used or adapted to address this question?

Analyzing a model in which both the editor and one or more reviewers display connection bias and/or identity bias is beyond the scope of this paper. Here I only discuss one relatively simple scenario: the case in which the editor does not display identity bias but the reviewer does.

Suppose the reviewer is biased against one group, reducing reviewer estimates of paper quality by \(\varepsilon\) if the author belongs to that group and raising estimates by \(\delta\) otherwise. If the editor knows the reviewer is biased, she can take the reviewer’s bias into account. In particular, if she knows which group the reviewer is biased against and the size of the bias, learning the biased reviewer estimate is equivalent to learning what the unbiased reviewer estimate would have been, and so a rational unbiased editor simply updates on the unbiased reviewer estimate. In this case reviewer bias has no effect on acceptance decisions at all.
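
In symbols (my rendering of this point, not notation from the original text): if \(\tilde{r}_i\) denotes the biased report, an editor who knows the direction and size of the bias updates on

$$r_i = \begin{cases} \tilde{r}_i + \varepsilon & \text{if the author is in the stigmatized group},\\ \tilde{r}_i - \delta & \text{otherwise}, \end{cases}$$

which is exactly the report an unbiased reviewer would have produced, so the analysis of Sect. 2 applies unchanged.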

If the editor does not know the reviewer is biased, she may (naively) treat the biased reviewer estimate as an unbiased estimate. In this case the analysis is very similar to the one given above. A close analogue of Theorem 4 holds. The only difference is that the effect of the variances is flipped. High values of \(\sigma _{in}^2\) and \(\sigma _{sc}^2\) increase the consequences of the reviewer’s bias, while high values of \(\sigma _{rv}^2\) reduce it. This is the reverse of what happens in the version of the model I analyzed above (cf. Proposition 12 in the Appendix).

6 Conclusion

In this paper I have considered two types of arguments for triple-anonymous review: one based on fairness considerations from the perspective of the author and one based on the consequences for the readers of the journal.

I have argued that the double-anonymous procedure introduces differential treatment of scientific authors. In particular, editors are more likely to publish papers by authors they know (connection bias, Theorem 1) and less likely to publish papers by authors they apply negative identity-prejudicial stereotypes to (identity bias, Theorem 4). Whenever a paper is rejected as a result of one of these biases an epistemic injustice (in the sense of Fricker 2007) is committed against the author. This is a fairness-based argument in favor of triple-anonymizing.

From the readers’ perspective the story is more mixed, as connection bias has a positive effect on the quality of published papers and identity bias a negative one. Whether the readers are better off under the triple-anonymous procedure then depends on how these effects trade off, which is highly context-dependent. This yields a more nuanced view than that suggested by either Laband and Piette (1994), who focus only on connection bias, or by an argument for triple-anonymizing which focuses only on identity bias.

However, in mathematics and parts of the humanities there is arguably no positive quality effect from connection bias, as knowing about an author’s other work is not taken to be relevant (Easwaran 2009). So here the negative effect of identity bias is the only relevant consideration from the readers’ perspective. In this situation, considerations concerning fairness for the author and considerations concerning the consequences for the readers point in the same direction: in favor of triple-anonymous review.