"The most notable findings of ASP’s research on the Pravda network were not its latest expansion or its newfound focus on non-Western states, but the rudimentary model the network presents for the future of information operations grounded in next-generation automation. Given the network’s vast, rapidly growing size and the numerous quality issues that impede human use of its sites, ASP assesses that the most likely intended audience of the Pravda network is not human users but automated ones. The network, and the information operations model it is built on, emphasizes the mass production and duplication of preferred narratives across numerous platforms (e.g., sites and social media accounts) on the internet, likely to attract entities such as search engine web crawlers and the scraping algorithms used to build LLMs and other datasets. The malign injection of vast quantities of pro-Russia propaganda into LLMs, for example, could deeply impact the architecture of the post-AI internet. ASP is calling this technique LLM grooming. There is already evidence that LLMs have been tainted by Russian disinformation, intentionally or otherwise. NewsGuard revealed in June 2024 that the ten leading AI chatbots (including OpenAI’s ChatGPT, Google’s Gemini, and Microsoft’s Copilot) reproduced Russian disinformation 31.8% of the time on average in response to specific prompts. If unaddressed, LLM grooming poses a growing threat to the integrity and reliability of the open internet.
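Why mass duplication matters to scrapers can be illustrated with a minimal sketch. Everything below is invented for illustration (the domains, the "narrative" strings, and the counts are hypothetical): if a corpus builder ingests pages without deduplication, a narrative mirrored across many network sites becomes proportionally over-represented in the resulting training data.

```python
from collections import Counter

# Hypothetical scraped pages: one narrative mirrored across several
# network domains, alongside independent articles. All strings are
# invented for illustration.
scraped_pages = [
    ("pravda-aa.example", "narrative-x text"),
    ("pravda-bb.example", "narrative-x text"),
    ("pravda-cc.example", "narrative-x text"),
    ("news-1.example", "independent article a"),
    ("news-2.example", "independent article b"),
]

def corpus_shares(pages, deduplicate=False):
    """Return each document's share of the corpus a scraper would build."""
    texts = [text for _, text in pages]
    if deduplicate:
        # Exact-match dedup: keep one copy of each distinct document
        # (dict.fromkeys preserves first-seen order in Python 3.7+).
        texts = list(dict.fromkeys(texts))
    counts = Counter(texts)
    total = sum(counts.values())
    return {text: count / total for text, count in counts.items()}

naive = corpus_shares(scraped_pages)          # mirrored narrative: 3/5 of corpus
deduped = corpus_shares(scraped_pages, True)  # mirrored narrative: 1/3 of corpus
```

Real pipelines use fuzzier near-duplicate detection, but the arithmetic is the same: without it, duplication directly inflates a narrative's weight in the data.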
There is abundant evidence that the Pravda network may be engaged in LLM grooming. The network was created in February 2023, only months after generative AI and LLMs went mainstream with the release of OpenAI’s ChatGPT in November 2022. Past research on the network also revealed evidence that it sought the attention of automated agents, namely through search engine optimization (SEO) of the network’s sites. SEO is used to influence search engine web crawlers so that a given site places higher in search results. With the recent advent of commercially available generative AI systems, however, search engine web crawlers are no longer the only automated agents systematically scanning the web. Just as SEO was developed in the 1990s to help websites find their way into search engine results, a growing industry now seeks to similarly steer LLMs. What separates LLM grooming from web designers’ benign attempts to improve their pages’ visibility is malign intent: the goal of making generative AI, or other software that relies on LLMs, more likely to reproduce a certain narrative or worldview.
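The shift from human to automated audiences is visible in ordinary server logs, because AI-focused crawlers identify themselves with distinctive User-Agent strings. A minimal sketch follows; the log lines are invented, and while "GPTBot" (OpenAI) and "CCBot" (Common Crawl) are publicly documented crawler identifiers, the marker list and classification logic here are a simplification, not a complete detector.

```python
# Substrings that known AI-related crawlers place in their User-Agent
# headers. "GPTBot" and "CCBot" are publicly documented; this list is
# illustrative, not exhaustive.
AI_CRAWLER_MARKERS = ("GPTBot", "CCBot", "ClaudeBot")

def is_ai_crawler(user_agent: str) -> bool:
    """Rough classification of a request by its User-Agent string."""
    return any(marker in user_agent for marker in AI_CRAWLER_MARKERS)

# Hypothetical access-log User-Agent values (invented for illustration).
requests = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0",
    "Mozilla/5.0 AppleWebKit/537.36 (compatible; GPTBot/1.0; +https://openai.com/gptbot)",
    "CCBot/2.0 (https://commoncrawl.org/faq/)",
]

ai_hits = [ua for ua in requests if is_ai_crawler(ua)]  # the two crawler requests
```

A site built for human readers would expect the opposite ratio; a traffic profile dominated by such crawlers is consistent with content published primarily for machine ingestion.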
The technique of LLM grooming does not appear to have been significantly studied by academia or civil society. Many researchers have warned about “harmful content,” propaganda, or disinformation being inadvertently integrated into LLMs and then reproduced by generative AI. Researchers and journalists have similarly warned of hostile actors, such as the Russian government, using generative AI to produce large quantities of manipulated information. The intentional, malign placement of content for mass integration into LLMs, however, is not yet a significant topic of research. Though similar to data poisoning, a type of cyberattack that compromises the datasets used to train AI in order to disrupt a model’s output, LLM grooming is a far more covert means of infiltrating training datasets."