Adopting large language models (LLMs) for writing helps scientists boost their productivity, new research finds. But the smooth writing of an LLM may also obscure shoddy science.
Faster writing, mixed results
In human-written scientific papers, poor writing correlates with lower-quality research, says study leader Mathijs De Vaan, an associate professor at the University of California, Berkeley’s Haas School of Business. But when LLMs take over the writing, this relationship reverses: polished, complex prose often accompanies work that is never accepted for publication.
“That’s creating a really important and challenging problem for science evaluators,” De Vaan says, especially because scientific journals are already reporting an increase in the number of papers submitted as scientists lean on LLMs to produce text in the blink of an eye. “In addition to having to evaluate more articles,” he says, “you’re now also evaluating [lower-quality] articles that look really good on the outside.”
Comparing LLM writing to human writing
De Vaan and his colleagues took advantage of the now-common practice of posting preprints of scientific papers online before peer review and publication to understand how the advent of generative artificial intelligence is affecting scientific publishing. They looked at papers posted on the preprint server arXiv before November 2022, prior to the release of tools such as ChatGPT. They asked an LLM to rewrite the abstracts of these papers, creating pairs of abstracts: one known to be human-written and one known to be AI-generated. They then used AI to compare the word distributions of each pair, generating a value they called “alpha” that indicates whether a given text is statistically more likely to have been written by a human or by an LLM.
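The idea of scoring a text by comparing word distributions can be illustrated with a minimal sketch. Everything here is an illustrative assumption, not the authors’ actual method: the toy corpora, the function names, and the simple smoothed unigram model all stand in for the study’s far more sophisticated approach.

```python
from collections import Counter
import math

def word_log_probs(corpus, vocab):
    """Smoothed log-probability of each vocabulary word in a corpus (toy unigram model)."""
    counts = Counter(w for text in corpus for w in text.lower().split())
    total = sum(counts.values()) + len(vocab)  # Laplace (add-one) smoothing
    return {w: math.log((counts[w] + 1) / total) for w in vocab}

def alpha_score(text, human_lp, llm_lp):
    """Log-likelihood ratio: positive values mean the wording looks more LLM-like."""
    words = text.lower().split()
    return sum(llm_lp[w] - human_lp[w] for w in words if w in llm_lp)

# Toy corpora standing in for the paired human-written and LLM-rewritten abstracts
human_corpus = ["we measured the reaction rate at several temperatures",
                "results were noisy but the trend was clear"]
llm_corpus = ["we comprehensively delve into the multifaceted dynamics",
              "furthermore the findings underscore a pivotal paradigm"]

vocab = set(w for t in human_corpus + llm_corpus for w in t.lower().split())
human_lp = word_log_probs(human_corpus, vocab)
llm_lp = word_log_probs(llm_corpus, vocab)

# A positive score suggests the text's word choices are more typical of the LLM corpus
print(alpha_score("the findings underscore a pivotal trend", human_lp, llm_lp))
```

In this sketch, a text drawn from LLM-style wording scores higher than one drawn from human-style wording, which is the basic signal an alpha-like detector exploits.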
Using this alpha scale, the researchers then examined papers posted on arXiv from November 2022 through 2024. AI-generated writing grew from essentially zero to around 30% of papers over that period, according to De Vaan, and is likely higher now.
The researchers found that authors who adopted LLMs to speed their writing did see a large productivity increase, publishing subsequent papers between 30% and 80% faster than those who did not use the technology. The improvement was particularly stark for authors who were likely non-native English speakers, based on their names and countries of origin.
“The largest productivity effects are found among researchers in Asian countries,” De Vaan says. “Which is not surprising, because that is where you find most of the non-native English speakers.”
This could represent a leveling of the playing field for researchers who are disadvantaged by English being the primary language of science.
The researchers next tried to evaluate publication quality, first by looking at the complexity of a paper’s language and then by determining whether the paper was eventually accepted and published by a peer-reviewed journal. They found that before LLMs, the papers with the most complex language were also the most likely to make it to publication, indicating that scientific peers found the work worthy of note. After the advent of LLMs, however, complexity no longer correlated with the likelihood of publication.
“The LLM can, in a very complex way, describe the science so that it’s making something not as great look really good,” De Vaan says. That could be an extra barrier for editors and reviewers, he believes.
Broader citations, new concerns
Scientists are also using LLMs to search for other research to cite in their studies. De Vaan and his colleagues found that LLM-assisted papers had more diverse citations than non-LLM-assisted papers, with a notable increase in the number of books being cited. “Our findings are suggesting that we are searching more broadly,” he says.
But the researchers couldn’t evaluate whether those citations were appropriate or rigorous. AI tools are known to hallucinate citations: one recent study in the Journal of Empirical Legal Studies found that even specialized AI literature-search tools such as Lexis+AI hallucinate between 17% and 33% of the time, either returning false information or claiming that a real source supported a proposition it did not. In the current study, De Vaan and his team evaluated only citations that they could confirm were real, but they are conducting a separate project on hallucinated citations. Investigating AI’s impact on citation quality is an important next step, De Vaan adds.
Another goal is to figure out if LLMs themselves can be used to solve the problems created by LLMs related to publication quality and citation relevance. “The next step is to see if there is a way to use LLMs effectively to evaluate the better science and the more rigorous articles,” he says.
Kusumegi, K., et al., “Scientific Production in the Era of Large Language Models,” Science, doi: 10.1126/science.adw3000 (Dec. 18, 2025).
This article originally appeared in the Update column in the February 2026 issue of CEP. Members have access online to complete issues, including a vast, searchable archive of back-issues found at www.aiche.org/cep.