In the LLM Trap: Context Injection and Data Poisoning

devmariusz (25)in #llm • 2 months ago

In the LLM Trap: Context Injection and Data Poisoning

In my work and posts, I have long warned that the problem of LLM Injection and Training Data Poisoning would steadily grow and become both an additional arena of competition and a tool of that very competition—both in interparty politics and on the international stage—in the context of threats related to external interference and manipulation of information (FIMI) and system destabilization.

Today we see that Grok is likely already using the latest model update from xAI and is far more willing to generate aggressive, biased content useful in political battles. For some time now, users have perceived Grok as an external arbiter in ongoing conversations, while it has also been—and remains—vulnerable to materials published on X and across the broadly indexed Internet. The mechanism for selecting content to fit the context of generated responses is sensitive and largely based on distance metrics between numerical vectors of text embeddings in a vector representation space. In this sense, it is entirely possible to craft content so that an LLM picks it out for specific types of questions on a social platform, drawing that content from among the posts available on that platform or from external sources. Likewise, content created in this way will continue to be used to train future versions of large language models, leading to the poisoning of their knowledge with harmful or misleading information across various contexts.

💼Economic and Political Consequences💼

From the perspective of X’s investors, this direction is potentially profitable, since it increases engagement on the platform and thereby prolongs users’ exposure to ads. Even more dangerous may be using Grok to propagate attitudes and values aligned with Elon Musk’s new political formation in the USA—American Party—and his potential allies worldwide, such as AfD. There is no doubt that the model evolution on platform X favors contesting and anti-system movements around the world, effectively creating advantages for anti-democratic regimes. I am also convinced that Russia is thriving in this new reality, leveraging these technologies to its advantage in its ongoing cognitive war against the West.

📰Building Our Own Resilience and Capabilities📰

The European Union must urgently accelerate its efforts toward technological sovereignty, both in terms of owning physical data centers and in its ability to provide alternative, secure services that do not interfere with Europe’s strategic interests. It is worth considering support for European social platforms by naturally enhancing their capacity to ensure compliance with European law, thereby upholding the principles of free and fair competition in the digital economy.

In parallel with these efforts, we should develop initiatives—both nationally and at the European level—to build informational resilience, a concept grounded in the systems paradigm of cybernetics with a holistic approach to security, combining national security and human security perspectives while adapting available scientific and expert knowledge from various fields. In this approach, particular attention should be paid to the cognitive domain and cyberspace as primary spheres that modify and shape decisions, awareness, alignment of goals and actions with interests, operational readiness, stability, efficient use of material resources, morale, planning and management capabilities, and so on.

As part of this understanding of resilience-building, we should establish Resilience Councils that, especially considering the fight against FIMI, could integrate civil society into harmonized efforts for collective resilience, improve the flow of strategic information, and increase the state’s adaptability to a rapidly changing security environment. I also personally advocate creating Strategic Context Provider services that could inject the proper national and European strategic context into systems using LLMs in the public sector—and eventually in the NGOs and private sectors as well.

Around this, methodologies for building strategic knowledge models and reflective models should also be developed so that we are neither blind nor bereft of tools for defense and active engagement in modern information warfare and informational politics affecting our security and ability to realize national and European interests.

Finally, let me cite a few facts that confirm the dangerous phenomena I have described:

👉 Russia uses large language models through networks like CopyCop to generate, translate, and edit content from legitimate sources, giving it a pro-Kremlin slant, focusing on topics such as the Ukraine conflict, U.S. politics, or Israel-Hamas relations, in order to influence public opinion in the U.S., U.K., and France (Recorded Future, “Russia-linked CopyCop uses LLMs to weaponize influence content at scale,” 9 May 2024).

👉 The Pravda network generated 3.6 million articles in 2024 across 150 websites in 49 countries, publishing over 20,000 articles every 48 hours to influence how search engines crawl the web and LLM training datasets, increasing the visibility of pro-Kremlin narratives (Tor Constantino, “Russian propaganda has now infected Western AI chatbots — new study,” Forbes, 10 March 2025).

👉 A NewsGuard study found that 33% of the responses from ten leading AI chatbots contained false information from the Pravda network, and seven of them cited Pravda pages—for example, a false claim that President Zelenskiy banned the Truth Social platform (Tor Constantino, Forbes, 10 March 2025).

👉 Pro-Kremlin pseudo-think tanks like Global Research are promoted through 22.1 million backlinks from low-quality sites, boosting their search-result visibility for phrases like “Is Zelenskiy a drug addict” (Arik Bajak & Curtis Larson, “Search engine manipulation to spread pro-Kremlin propaganda,” HKS Misinformation Review, 16 February 2023).

👉 Yandex, with a 63.8% market share in Russia, publishes 31.61% unverified conspiracy theories compared to 13.90% on Google, with 75.36% of results coming from Russian sites often linked to the Kremlin (Abdul Karim, Anastasiya Kartashova & Larysa Shilova, “Russian information operations during the COVID-19 pandemic,” Frontiers in Communication, 2023).

/Mariusz Żabiński/

#ai #cybersecurity #disinformation #infosec #datasecurity #infowar #cogwar

2 months ago in #llm by devmariusz (25)

$0.00