Last week Ravi Mill and I had a paper accepted to Consciousness and Cognition. It was accepted after 49 weeks with the journal in four rounds of review. The editorial decisions on the paper were: Reject, Revise and Resubmit, Accept with Minor Revisions and finally, Accept. What makes this decision history somewhat remarkable is that it was initially rejected from the journal it was eventually published in.

This blog post won’t give as much information on that initial rejection as I wanted it to – I sought permission from the journal to publish all correspondence from the reviewer and the editor, which was denied. Below you find only my response to the editorial decision. As context, the manuscript was rejected on the basis of one review, in which it was suggested that we had adopted  some unconventional and even nefarious practices in gathering and analysing our data. These  suggestions didn’t sit well with me, so I sent the following email to the Editor via the Elsevier Editorial System.

 

Dear Prof XXXX,

 

Thank you for your recent consideration of our manuscript, ‘”Old?” or “New?”: The test question provokes a goal-directed bias in memory decision-making’ (Ms. No. XXXXX-XX-XXX), for publication in Consciousness & Cognition. We were, of course, disappointed that you chose to reject the manuscript.

 

Having read the justification given for rejection, we respectfully wish to respond to the decision letter. Whilst we do not believe that this response will prompt reconsideration of the manuscript (your decision letter was clear) we believe it is important to respond for two reasons. First, to reassure you that we have not engaged in any form of data manipulation, and second, to state our concerns that the editorial team at Consciousness & Cognition appear to view it as acceptable that reviewers base their recommendations on poorly substantiated inferences they have made about the motivations of authors to engage in scientific misconduct.

 

As you highlighted in your decision letter, Reviewer 1 raised “substantial concerns about the manner in which [we] carried out the statistical analysis of [our] data”. The primary concern centred on our decision to exclude participants whose d’ was below a threshold of 0.1. In this reply we hope to demonstrate to you that use of such an exclusion criterion is not only standard practice, but it is indeed a desirable analysis step which should give readers more, not less, confidence in the analyses and their interpretation. We will then take the opportunity to demonstrate, if only to you, that our data behave exactly as one would expect them to under varying relaxations of the enforced exclusion criterion.

 

Recognition memory studies often require that participants exceed a performance (sensitivity; d’) threshold before they are included in the study. This is typically carried out when the studies themselves treat a non-sensitivity parameter as their primary dependent variable (as in our rejected paper), as a means of excluding participants that were unmotivated or disengaged from the task. Below are listed a small selection of studies published in the past 2 years which have used sensitivity-based exclusion criteria, along with the number of participants excluded and the thresholds used:

 

Craig, Berman, Jonides & Lustig (2013) – Memory & Cognition
Word recognition
Expt2 – 10/54, Expt3 – 5/54
80% accuracy

 

Davidenko, N. & Flusberg, S.J. (2012) – Cognition
Face recognition
Expt1a – 1/56, Expt1b –1/26, Expt2b –3/26
chance (50%) accuracy

 

Gaspelin, Ruthruff, & Pashler, (2013). Memory & Cognition
Word recognition
Expt1 – 3/46, Expt2 – 1/49, Expt3 – 3/54
70% accuracy

 

Johnson & Halpern (2012) – Memory & Cognition
Song recogntion
Expt1 – 1/20, Expt2 – 3/25
70% accuracy

 

Rummel, Kuhlmann, & Touron (2013) – Consciousness & Cognition
Word classification (prospective memory task)
Expt1 – 6/145
“prospective memory failure”

 

Sheridan & Reingold (2011) – Consciousness & Cognition
Word recognition
Expt1 – 5/64
“difficulty following instructions”

 

Shedden, Milliken, Watters & Monteiro (2013) – Consciousness & Cognition
Letter recognition
Expt4 – 5/28
85% accuracy

 

You will note that there is tremendous variation in the thresholds used, but that it is certainly not “unusual” as claimed by Reviewer 1, not even for papers published in Consciousness and Cognition. Of course, it would not be good scientific practice to accept the status quo uncritically, and we must therefore explain why the employed sensitivity-based exclusion criterion was appropriate for our study. The reasoning is that we were investigating an effect associated with higher order modulation of memory processes. If we had included participants with d’s below 0.1 (an overall accuracy rate of approximately 52% where chance responding is 50%) then it is reasonable to assume that these participants were not making memory decisions based on memory processes, but were at best contributing noise to the data (e.g. via random responding), and at worst systematically confounding the data. To demonstrate the latter point, one excluded participant from Experiment 1 had a d’ of -3.7 (overall accuracy rate of 13%), which was most likely due to responding using the opposite keys to those which they were instructed to use, substituting “new” responses for “old” responses. As we were investigating the effects of the test question on the proportion of “new” and “old” responses, it is quite conceivable that if they had displayed the same biases as our included participants did overall, they would have reduced our ability to detect a real effect. Including participants who did not meet our inclusion criteria, instead of helping us to find effects that hold across all participants, would have systematically damaged the integrity of our findings, leading to reduced estimates of effect size caused by ignoring the potential for influence of confounding variables.

 

If we had been given the opportunity to respond to Reviewer 1’s critique via the normal channels, we would have also corrected Reviewer 1’s inaccurate reading of our exclusion rates. As we stated very clearly in the manuscript, our sensitivity-based exclusion rates were 5%, 6% and 17%, not 25%, 6% and 17%. Reviewer 1 has conflated Experiment 1’s exclusions based on native language with exclusion based on sensitivity. As an aside, we justify exclusion based on language once again as a standard exclusion criterion in word memory experiments to ensure equivalent levels of word comprehension across participants. This is of particular importance when conducting online experiments which allow anyone across the world to participate. In coding the experiment, we thought it far more effective to allow all visitors to participate after first indicating their first language, with a view to excluding non-native speakers’ data from the study after they had taken part. We wanted all participants to have the opportunity to take part in the study (and receive feedback on their memory performance – a primary motivator for participation according to anecdotal accounts gleaned from social media) and to minimise any misreporting of first language which would add noise to the data without recourse for its removal.

 

We would next have responded to Reviewer 1’s claims that our conclusions are not generalisable based on the subset of analysed data by stating that Reviewer 1 is indeed partially correct. Our conclusions would not have been found had we included participants who systematically confounded the data (as discussed above) – as Reviewer 1 is at pains to point out, the effect is small. Nonetheless as demonstrated by our replication of the findings over three experiment and the following reanalyses, our findings are robust enough to withstand inclusion of some additional noise, within reason. To illustrate, we re-analysed the data under two new inclusion thresholds. The first, a d’ <= 0 threshold, equivalent to chance responding (Inclusion 1), and the second a full inclusion in which all participants were analysed (Inclusion 2). For the sake of brevity we list here the results as they relate to our primary manipulation, the effects of question on criterion placement.

 

EXPERIMENT 1
Original – old emphasis c > new emphasis c, t(89) = 2.141, p = .035, d = 0.23.

 

Inclusion 1:
old emphasis c > new emphasis c, t(90) = 2.32, p = .023, d = 0.24.
Inclusion 2:
no difference between old and new emphasis c, t(94) = 1.66, p = .099, d = 0.17

 

EXPERIMENT 2
Original:
Main effect of LOP, F(1,28) = 23.66, p = .001, ηp2 = .458, shallow > deep.
Main effect of emphasis, F(1,28) = 6.65, p = .015, ηp2 = .192, old? > new?.
No LOP x emphasis interaction, F(1,28) = 3.13, p = .088, ηp2 = .101.
Shallow LOP sig old > new, t(28) = 3.05, p = .005, d = 0.62.
Deep LOP no difference, t(28) = .70, p = .487, d = 0.13.

 

Inclusion 1:
No change in excluded participants – results identical.
Inclusion 2:
Main effect of LOP, F(1,30) = 20.84, p = .001, ηp2 = .410, shallow > deep.
Main effect of emphasis, F(1,30) = 8.73, p = .006, ηp2 = .225, old? > new?.
Sig LOP x emphasis interaction, F(1,30) = 4.28, p = .047, ηp2 = .125.
Shallow LOP sig old > new, t(30) = 3.50, p = .001, d = 0.64.
Deep LOP no difference, t(30) = .76, p = .454, d = 0.14.

 

EXPERIMENT 3
Original:
No main effect of response, F(1,28) = 3.73, p = .064, ηp2 = .117
No main effect of question, F < 1.
Significant question x response interaction, F(1,28) = 8.50, p = .007, ηp2 = .233.
“Yes” response format, old > new, t(28) = 2.41, p = .023, d = 0.45.
“No” response format, new > old, t(28) = 2.77, p = .010, d = 0.52.

 

Inclusion 1:
No change in excluded participants – results identical.
Inclusion 2:
No main effect of response, F <1.
No main effect of question, F < 1.
No question x response interaction, F(1,34) = 3.07, p = .089, ηp2 = .083.
“Yes” response format, old > new, t(34) = 1.33, p = .19, d = 0.23.
“No” response format, new > old, t(34) = 1.84, p = .07, d = 0.32.

 

To summarise, including participants who responded anywhere above chance had no untoward effects on the results of our inferential statistics and therefore our interpretation cannot be called into question by the results of Inclusion 1. Inclusion 2 on the other hand had much more deleterious effects on the patterns of results reported in Experiments 1 and 3. This is exactly what one would expect given the example we described previously where the inclusion of a participant responding systematically below chance would elevate type II error. In this respect, our reanalysis in response to Reviewer 1’s comments does not weaken our interpretation of the findings.

 

As a final point, we wish to express our concerns about the nature of criticism made by Reviewer 1 and accepted by you as appropriate within peer-review for Consciousness & Cognition. Reviewer 1 states that we the authors “must accept the consequences of data that might disagree with their hypotheses”. This strongly suggests that we have not done so in the original manuscript and have therefore committed scientific misconduct or entered a grey-area verging on misconduct. We deny this allegation in the strongest possible terms and are confident we have demonstrated that this is absolutely not the approach we have taken through the evidence presented in this response. Indeed if Reviewer 1 wishes to make these allegations, they would do well to provide evidence beyond the thinly veiled remarks in their review. If they wish to do this, we volunteer full access to our data for them to conduct any tests to validate their claims, e.g. those carried out in Simonsohn (2013) in which a number of cases of academic misconduct and fraud are exposed through statistical methods. We, and colleagues we have spoken to about this decision, found it worrying that you chose to make your editorial decision on the strength of this unsubstantiated allegation and believe that at the very least we should have been given the opportunity to respond to the review, as we have done here, via official channels.

 

We thank you for your time.

 

Sincerely,

 

Akira O’Connor & Ravi Mill

 

References
Craig. K. S., Berman, M.G., Jonides, J. & Lustig, C (2013) Escaping the recent past: Which stimulus dimensions influence proactive interference? Memory & Cognition 41, 650-670.
Davidenko, N. & Flusberg, S.J. (2012) Environmental inversion effects in face perception. Cognition 123(2), 442-447.
Gaspelin, N., Ruthruff, E. & Pashler, H. (2013) Divided attention: An undesirable difficulty in memory retention. Memory & Cognition 41, 978-988.
Johnson, S. K. & Halpern, A. R. (2012) Semantic priming of familiar songs. Memory & Cognition 40, 579-593.
Rummel, J., Kuhlmann, B.G. & Touron, D. R. (2013) Performance predictions affect attentional processes of event-based prospective memory. Consciousness and Cognition, 22 (3), 729-741.
Shedden, J. M., Milliken, B., Watters, S. & Monteiro, S (2013) Event-related potentials as brain correlates of item specific proportion congruent effects. Consciousness & Cognition, 22 (4), 1442-1455.
Sheridan, H. & Reingold, E. M. (2011) Recognition memory performance as a function of reported subjective awareness. Consciousness & Cognition 20 (4), 1363-1375.
Simonsohn, U. (2013) Just Post It: The Lesson From Two Cases of Fabricated Data Detected by Statistics Alone. Psychological Science, 24(10), 1875-1888.

 

The Editor’s very reasonable response was to recommend we resubmit the manuscript, which we did. The manuscript was then sent out for review to two new reviewers, and the process began again, this time with a happier ending.

My recommendations for drafting unsolicited responses are:

  • Allow the dust to settle (this is key to Jim Grange’s tips on Dealing with Rejection too). We see injustice everywhere in the first 24 hours following rejection. Give yourself time to calm down and later, revisit the rejection with a more forensic eye. If the reviews or editorial letter warrant a response, they will still warrant it in a few days, by which time you will be better able to pick the points you should focus on.
  • Be polite. (I skate on thin ice in a couple of passages in the letter above, but overall I think I was OK).
  • Support your counterarguments with evidence. I think our letter did this well. If you need to do some more analyses to achieve this, why not? It will at least reassure you that the reviewer’s points aren’t supported by your data.
  • Don’t expect anything to come of your letter. At the very least, it will have helped you manage some of your frustration.

2 thoughts on “Objecting to an Editorial Decision

  1. I will look at the paper later. Here are some impressions based on this blog.

    1. Mentioning the example of subject who reversed the keys is a bit of a straw-man argument. Rev1 would have presumably nothing against exclusion of subjects who misunderstood the instruction (or subjects who were not native speakers). What he was concerned about were the subjects who fullfilled the above criteria yet were excluded because of d<0.1. Above, you should have shown how the inclusion/exclusion of this latter group affects your results.

    2. Also the argument based on status quo is pretty weak. To be sure, lot of sloppy research is being published. But you are not doing research because you wish to copycat the published research. Rather you wish to find out how human mind works. Improving your data analysis will help you in this pursuit.

    3. Is there a way to improve your analysis? Sure. Your threshold of d=0.1 assumes that subject with d=0.099 doesn't show the effect at all while a subject with d=0.11 does show the full effect (as does say subject with d=0.3). This assumption is implausible. Rather you should have included the information about d in you data analysis, such that the evidential impact of the data from each subject is weighted based on d. If done correctly the inclusion of subject with low d would not change your mean estimate of ES.

    I will look at the paper later and think about some good analysis.

    4. To reject the manuscript (based solely on the questionable exclusion criterion) was an extreme reaction. Reviewer 1 should have helped you to improve it.

    5. "we volunteer full access to our data for them to conduct any tests to validate their claims" there is no method that reliably detects misconduct based on the data. The methods of Simonsohn and co. worked in the case of Stapel and co. because the latter folks were extraordinarily stupid when they faked data. Your challenge to the reviewer is thus empty.

    Anyway, you should have published your data, because these may be of interest to other researchers. It's pretty sad that researchers think of data sharing as a last emergency measure to which they resort only when their reputation is at stake…

    Reply
  2. Thanks for sharing your experiences and how you dealt with the rejection letter. This is extremely valuable. I wish more researchers shared this process.

    Reply

Leave a reply

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> 

required


*