'Text overlap' clutters scientific papers, arXiv analysis finds

22 December 2014

Computer text analysis of a huge database of scientific papers shows a large amount of "text overlap," where authors reuse text from previous papers, both their own and those of others, not always with attribution. This is not necessarily good or bad, Cornell researchers say.

"Our first goal was to characterize the accepted practice, not to be judgmental," said Paul Ginsparg, professor of physics and information science and founder of the online arXiv collection of scientific papers, now maintained by Cornell University Library. The analysis was conducted on thousands of papers in the arXiv. Ginsparg and Cornell graduate student Daniel Citron reported their study in the Dec. 8 online edition of the Proceedings of the National Academy of Sciences.

"While it is technically plagiarism, which more generally is stealing of ideas," Ginsparg said, "it's a benign form in the sense that most of it cites the source (at least somewhere in the article), and many authors have rationales for the practice." Many readers find the reuse of text "an annoyance and a distraction," he added, and some worry that it wastes space online and in print journals.