Should we cite preprints?

By Leah Cannon posted 05-18-2017 23:07


Earlier this week, paleontologist and open science advocate Jon Tennant published a blog post discussing whether scientists should cite preprints in their scientific articles. This discussion has been kicking around Twitter for a while, and Jon's article got a big response, with over 60 comments both for and against citing preprints. He generously agreed to let me re-post his article here on LSN.

Should we cite preprints?

The citation of preprints is something of great interest to me, both as the founder of PaleorXiv, a preprint server for Palaeontology, and because arguing against it seems to run counter to the direction in which academia is generally heading (e.g., the NIH is now allowing preprints to be cited in grant applications).

Nonetheless, a lot of interesting and controversial discussion points were raised about this issue, and I think they are worth bringing together coherently, so that we can all learn from them constructively. It’s worth noting in advance that how you perceive the traditional peer review model is an important factor in these discussions, but too complex to address in this post alone. This post is also probably not comprehensive, although I’ve tried to present a decent overview, and definitely not 100% ‘correct’. If anything is missing, please do comment below, especially where I am wrong about something.

What we have to remember though ‘in advance’ is the main reason why preprints exist: They are intended to get the results of research acknowledged and make them rapidly available to others in preliminary form to stimulate discussion and suggestions for revision before/during peer review and publication.

  • Issue: We shouldn’t cite work that hasn’t been peer reviewed

Citing work that hasn’t been peer reviewed has traditionally been seen as a big academic no-no, and is definitely the primary problem that gets continuously raised. However, we need to remember that many journals have historically been more than happy to accept ‘personal communication’, conference abstracts, various other ‘grey literature’, and, as someone alluded to on Twitter, personal phone calls as citations, so why should preprints be treated any differently?

If we look at academic re-use statistics, 4 out of the top 5 most cited physics and maths journals are subsections of the arXiv (thanks, Brian Nosek, for this). In Economics, the top cited journal is the NBER Working Papers preprint archive. Both figures are according to data from Google Scholar. What this tells us is that research disciplines with a well-embedded preprint culture see massive research value in preprints, even transcending traditional journals in many cases, and that this does not seem to be causing any major issues in these fields.

[Figure: The differences in preprint ‘standing’ between disciplines (AOP = Advanced Online Publication)]

A really key point is that citing work without reading it is a form of bad academic practice: responsibility for the citation lies with the citer. Reading a paper as a researcher without evaluating it is also bad academic practice. Evaluating papers is basically a form of peer review, therefore a citation is a sign that we have critically reviewed a paper and decided its value; or at least it should be. After all, evaluating the quality of scientific work is the job of scientists, and should not simply stop because something has a preprint stamp. As Casey Greene said, citations do not “absolve one of the ability to think critically.” We should have enough confidence in ourselves to be able to make these judgement calls, but perhaps just be slightly more wary when it comes to preprints. This is particularly the case for research which we heavily draw upon. It is our collective responsibility to carefully evaluate whether a citation supports a particular point – if we cannot do that, we do not deserve the title of scholar. Now this doesn’t alleviate all potential biases, but if that is our argument, we can’t claim that traditional closed peer review does either, as it is always the same peers doing the review, or at least too random to draw a line between them.

If a preprint is so bad that it cannot be cited, then we don’t have to cite it. Furthermore, anyone evaluating a preprint who comes to that conclusion and does not leave a comment on the preprint explaining how they reached it is doing the scholarly community, and the public in general, a disservice. The vast majority of preprint services offer comment sections for evaluation, which alleviates a great deal of the issues regarding ‘bad science’ being published. Of course, not everyone uses these functions yet, but we can expect that they will as preprint adoption and open peer evaluation increase.

The inverse is also true: just because a paper has gone through traditional peer review does not necessarily mean it meets higher standards or is 100% true. If we believe that published papers are immune from the same problems as preprints, we undermine our own ability to conduct research properly.

  • Issue: Non-specialists might read papers that have ‘unverified’ information in and then mis-use it.

Now, this is an interesting and valid concern, and one which is discussed briefly here in the ASAPbio FAQ.

What we also have to remember is that some disciplines have huge preprint sharing cultures, including Physics/Maths (1.2 million, arXiv), Economics (804,000, RePEc), and Applied Social Sciences (732,000, SSRN), and so far they seem to have managed the outflow of information very well. Some mitigation methods exist already, such as a simple screening process to block pseudoscience and the spread of misinformation.

In fledgling fields where preprint use is accelerating, such as the Life Sciences, as far as I am aware this has not led to any notable difference in the proliferation of ‘fake news’ or bad science. Of course, this doesn’t mean it couldn’t happen, just that it hasn’t yet. Ignoring the history of other fields is generally bad practice and just leads to less-informed discussions about these potential issues. We should draw on the positive experiences from Physics, Social Sciences, and Economics, and make sure that the ways they combat misinformation are also in place on any new preprint servers for other fields.

Even still, this is a problem not just for preprints, but the entire scholarly literature – research is out there, and people will use it in different ways, and often incorrectly. Think autism and vaccines, for example. This happens whether research or any other content is ‘peer reviewed’ or not. We should see this more as an opportunity to engage people with the research process and scholarly communication, rather than belittling people for being non-specialists. Bad actors are going to be bad actors no matter what we tell them or what is published or peer reviewed – the last 12 months of ‘fake news’ and ‘alternative facts’, as well as decades of ‘climate change denial’ and ‘anti-evolutionists’, are testament to that.

In terms of science communication and journalism/reporting, these would benefit greatly from having standards whereby they wait until final versions have been published, just in case. Indeed, most journalists are savvy and respectable enough to recognise these differences. If you want to report on preprints as a communicator, pay attention to any discussion (perhaps via the Altmetric service to track online conversations), and make it explicit that the research is preliminary. Journalists frequently do this already when reporting on things like conference talks and proceedings, and the standard should not be any different for preprints.

  • Issue: Citing preprints under-values formal peer review.

While there are always exceptions, peer review generally benefits both authors and articles. By citing a preprint, you are not de-valuing this process. You are simply using your own judgement, while an article is undergoing formal peer review, to decide whether or not to cite an article and the context of that citation. They are not in conflict; they are complementary.

  • Issue: Citing preprints lowers the standards of scholarship.

A big issue here is that preprints can often change compared to their final published form. For example, there might be changes in taxonomy, additional experiments that need to be run, or new analyses to add, all of which can change the discussions and conclusions.

These are all important things to consider, and will vary depending on research community practices. However, one thing which will greatly ease this is simply to have a big ‘Not Peer Reviewed’ stamp on preprints, as most do, which should act as a sort of nudge to be more cautious. No one should rampantly re-use published research in any form without adequate consideration and evaluation anyway (see above), but having this stamp makes it easier to slow things down if needed and to know when extra care and evaluation are needed. Something that would also make this much easier for us all is to allow data to be shared alongside preprints, as well as code and other materials, so that results can be rapidly verified, or not, by the research community.

We should also take note that science changes through time, and conclusions alter as new evidence is gathered. The very nature of how we conduct research means that previously published information can be, and often is, over-turned by new results, which is not very different from information changing through article versions. The major difference here though is that preprint version control happens in the open, which is invariably advantageous to all parties.

One of Matt’s points was that while preprints themselves were fine, it was just their citation that was bad practice. This creates a logical conflict (to me at least), as if the information contained within was fine, then why not cite it appropriately where it is re-used? If it were of such bad quality for re-use, don’t cite it, and indicate why. As Ethan White said recently, “There are good preprints and bad preprints, good reports and bad reports, good data and bad data, good software and bad software, and good papers and bad papers.” As before, it is up to the research community and their professional judgments to decide whether or not to cite any research object.

A rule of thumb, for now, that might help with this is if an author, and the reviewers of their article, think it is appropriate for them to cite a preprint then they should be allowed to do so as they would any other article.

  • Issue: Preprint availability creates conflicting articles

Preprints can change. Preprints published on the arXiv, bioRxiv, or with the Center for Open Science all have version control that allows preprints to either be updated with new versions or linked through to final published versions. These are usually clearly labeled and can even be cited as separate versions. Using DOIs combined with revision dates makes this a lot easier, as do simple author/title matching algorithms. Furthermore, preprints are largely on Google Scholar now too, and thankfully this is smart enough to merge records where matches (or ‘conflicts’) are found.

Instead of dwelling on this simple technical non-issue, we should be recognising the immense value that having early versions of articles out can bring (in the vast majority of cases). This is especially true, for example, for younger researchers, who want to escape the often unbearably long publication times of journals and demonstrate their research to potential employers, or in fast-moving and highly competitive research fields, where establishing discovery priority can be extremely important.

  • Issue: What happens to preprints that never get published?

Well, papers never get published for a multitude of reasons. Also, a lot of bad science gets published, and there is no definitive boundary between ‘unpublished and wrong’ and ‘published and correct’, as we might like to think. Instead, it’s more of a huge continuum that varies massively between publishing cultures and disciplines.

If a preprint never gets published, it can still be useful in getting information out there and retaining the priority of an idea. However, if you find an article that has been published as a preprint for a long time but never formally published, this might be an indicator to be extra careful. Here, checking article re-use and commentary is essential prior to any personal re-use. As before, a simple exercise of judgment solves a lot, for yourself and for non-specialists.

However, as Lenny Teytelman pointed out, in the last 20 years nearly 100% of articles published in High Energy Physics are also published as preprints in the HEP section of the arXiv, which suggests that this might be a relatively minor issue, at least in this field.


A somewhat relevant tangent

One thing which keeps popping up in Twitter discussions like the ones that inspired this post, and which I have to call out here, is the constant “Well my experience is x, y, and z, and therefore there is not an issue..” in one form or another, in particular from those in a position of massive privilege. I’m getting sick and tired of this lack of empathy and use of ‘anecdata’ as if it were anything meaningful. It’s counter-productive, non-scientific, and completely undermines people who have different experiences or come from completely disparate walks of life. Twitter also makes these exchanges intolerably toneless, and they often seem unnecessarily aggressive. Let’s keep things civil, professional, constructive, and where possible informed by real evidence and data – you know, like peer review should be. Sometimes, people also don’t need to know your opinion, and that’s totally okay.

Wrapping it up

At the end of all this, we have to remember that no system is perfect, especially in scholarly publishing. What we should all be doing is making evidence-informed evaluations of processes in order to decide what is best for all members of our community, especially those who are under-represented or marginalised. We have to listen to a diversity of voices and experiences, and use these to make sure that the processes and tools we advocate for are grounded in the principles of fairness and equity, not reinforcement of privilege or the status quo. This means we have to look at the costs versus benefits, and where we don’t have data, make decisions to either gather those data, or proceed in a way where risk is minimised and potential gain is maximised.

In terms of solutions to all of this, I think there are several simple points to finish off:

  • If you’re going to cite a preprint, make it clear that it’s a preprint in the reference list, and in the main text if possible.
  • If you’re going to publish a preprint, make it clear that it’s a preprint (most servers already do this).
  • Preprint servers need version control. The vast majority do already.
  • Community commenting on preprints is essential, especially to combat potential mis-information.
  • Preprints complement, not undermine, the existence of traditional peer review and journals.
  • Preprints are gaining increasing high-level support globally. Just like with Open Access and Open Data, this stuff is happening, so best to engage with it constructively in a manner that benefits your community.
  • Exercise judgement when it comes to publishing, citing, and re-using preprints.
  • If preprint citation is happening anyway, rather than fighting against it, let’s spend our collective effort on working to support and improve the process. Find what works in other fields, and apply that to ourselves.

Ultimately though, the answer is YES.


*Kudos to Matt, who engaged with these discussions with civility despite a clash of opinions. Matt is a colleague, whose inter-disciplinary research includes palaeontology, and he was very helpful as an Editor at PLOS ONE for me recently in helping to support my requests to open up my review report, which I greatly appreciate.

You can read the original article and comments here.

To hear more about issues affecting scientists, join Life Science Network.