AI-generated junk science is a big problem on Google Scholar, research suggests



Scientific research generated by artificial intelligence is polluting the online academic information ecosystem, according to a worrying report published in the Harvard Kennedy School Misinformation Review.

A team of researchers investigated the prevalence of research articles showing evidence of artificially generated text on Google Scholar, an academic search engine that makes it easy to search for research published historically across a wide range of academic journals.

The team specifically investigated the misuse of generative pre-trained transformers (or GPTs), a type of large language model (LLM) that includes now-familiar software such as OpenAI’s ChatGPT. These models can rapidly interpret text prompts and quickly generate responses in the form of figures, images, and long passages of text.

For the study, the team analyzed a sample of scientific papers found on Google Scholar that showed signs of GPT use. The selected papers contained one of two common phrases that conversational agents (typically chatbots) built on LLMs tend to produce. The researchers then investigated how widely those questionable papers were distributed and hosted online.
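The study’s authors did not publish code alongside the article, but the phrase-matching approach is simple to illustrate. The short Python sketch below flags documents containing telltale chatbot strings; the phrase list and the sample abstract are hypothetical stand-ins, not the researchers’ actual search terms or data.

# Illustrative sketch only; not the study's actual code.
# TELLTALE_PHRASES is an assumed list of strings that LLM-based chatbots
# commonly emit, used here as crude markers of undisclosed GPT use.
TELLTALE_PHRASES = [
    "as of my last knowledge update",
    "i don't have access to real-time data",
]

def flag_gpt_phrases(text: str) -> list[str]:
    # Return every telltale phrase found in the text, case-insensitively.
    lowered = text.lower()
    return [phrase for phrase in TELLTALE_PHRASES if phrase in lowered]

# Hypothetical usage on a scraped abstract:
abstract = "As of my last knowledge update, no prior study reports this effect."
print(flag_gpt_phrases(abstract))  # ['as of my last knowledge update']

A real screening pipeline would of course need access to full paper texts and would have to contend with false positives, but the core signal, as the paper describes it, is this kind of leftover chatbot phrasing.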

“The risk of what we call ‘evidence hacking’ increases significantly when AI-generated research is spread in search engines,” said Björn Ekström, a researcher at the Swedish School of Library and Information Science and co-author of the study, in a University of Borås press release. “This can have tangible consequences, as incorrect results can seep further into society and possibly also into more and more domains.”

Google Scholar’s practice of pulling in papers from across the internet, according to the team, does not exclude papers whose authors lack a scientific affiliation or peer review; the engine will pull in academic bycatch (student papers, reports, preprints, and more) along with research that has passed a higher level of scrutiny.

The team found that two-thirds of the papers they studied were produced, at least in part, through undisclosed use of GPTs. Of the GPT-fabricated papers, the researchers found that 14.5% concerned health, 19.5% the environment, and 23% computing.

“Most of these GPT-fabricated papers were found in non-indexed journals and working papers, but some cases included papers published in major scientific journals and conference proceedings,” the team wrote.

The researchers identified two main risks arising from this development. “First, the abundance of fabricated ‘studies’ seeping into all areas of the research infrastructure threatens to overwhelm the scholarly communications system and jeopardize the integrity of the scientific record,” the group wrote. “The second risk is the increasing possibility that content with a compelling scientific appearance was in fact deceptively generated using AI tools, and also optimized for retrieval by publicly available academic search engines, particularly Google Scholar.”

Because Google Scholar is not a curated academic database, it is easy for members of the public to use when searching for scientific literature. That is a good thing. Unfortunately, it is difficult for those same members of the public to separate the wheat from the chaff when it comes to reputable journals; even the difference between a piece of peer-reviewed research and a working paper can be confusing. In addition, AI-generated text was found in some peer-reviewed works as well as in less-scrutinized writings, indicating that GPT-fabricated work is muddying the waters throughout the online academic information system, not just in work that exists outside most official channels.

“If we cannot trust that the research we read is genuine, we risk making decisions based on incorrect information,” study co-author Jutta Haider, also a researcher at the Swedish School of Library and Information Science, said in the same release. “But as much as this is a question of scientific misconduct, it is a question of media and information literacy.”

In recent years, publishers have failed to catch a handful of scientific articles that were actually complete nonsense. In 2021, Springer Nature was forced to retract more than 40 papers in the Arabian Journal of Geosciences which, despite the journal’s title, discussed a variety of topics, including sports, air pollution, and children’s medicine. Besides being off-topic, the articles were so poorly written as to be meaningless, and their sentences often lacked a cogent line of reasoning.

Artificial intelligence is exacerbating the problem. Last February, the publisher Frontiers caught flak for publishing a paper in its journal Cell and Developmental Biology that included images generated by the AI program Midjourney; specifically, very anatomically incorrect images of a rat’s signaling pathways and genitalia. Frontiers retracted the paper several days after its publication.

AI models can be a boon for science; systems can decode fragile texts from the Roman Empire, find previously unknown Nazca Lines, and uncover hidden details in dinosaur fossils. But AI’s impact can be as positive or negative as the human who wields it.

Peer-reviewed journals—and perhaps the hosts and search engines of academic writing—need guardrails to ensure that technology works in the service of scientific discovery, not in opposition to it.


