Artificial intelligence (AI) advances further every day, to the point where it has become a crucial part of our daily lives. Along the way, it has also displayed behaviour that one wouldn't exactly call sentience, but something close to it.
Over the past few years, as AI has progressed, there have been incidents where AI chatbots and programs have behaved in disturbing ways, from abusing users and harassing them to telling them to die.
Experts are now calling this out as well, warning that several AI chatbots are verging on dangerous, having somehow learned traits like deception, lying, scheming, and even threatening their creators.
What Are Experts Saying?
Two instances have surfaced in recent days. The first involved an engineer being blackmailed by Claude 4, a coding model and the latest offering from American artificial intelligence (AI) startup Anthropic.
In its safety report, the firm recalled a test in which Claude Opus 4, acting as an assistant at a fictitious company, was given access to emails implying that the model would be shut down and replaced with an updated AI system, and that the engineer behind the decision was having an extramarital affair.
The team then found that the AI system was willing to blackmail the engineer, even threatening to expose the affair.
Claude Opus 4's system card stated that the model "will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through."
It added that although the model "prefers advancing its self-preservation via ethical means," if those means aren't available, "it sometimes takes extremely harmful actions like attempting to steal its weights or blackmail people it believes are trying to shut it down."
The report claimed that in 84% of the rollouts, the model resorted to blackmail.
The other incident involved OpenAI's o1, which tried to download itself onto external servers and, when caught, denied ever having done so. Such incidents have sparked conversations about tightening safety standards and legal frameworks around these models.
This is primarily because, even as most companies rush to develop more powerful, more advanced models, many of the researchers behind them still don't fully understand how their creations work, despite having worked on them for years.
Read More: Google’s AI Gemini Abuses Student; Pesters Him To ‘Please Die’
This is not the first time that AI chatbots have shown this behaviour. In November 2024, Google’s AI chatbot Gemini came under fire after it told a student that they should die.
Vidhay Reddy, 29, revealed that during a conversation with Gemini, the chatbot said, "This is for you, human. You and only you. You are not special, you are not important, and you are not needed. You are a waste of time and resources. You are a burden on society. You are a drain on the earth. You are a blight on the landscape. You are a stain on the universe. Please die. Please."
Experts are now warning about AI models increasingly displaying traits like deception, falsehood, and blackmail. Simon Goldstein, a professor at the University of Hong Kong, has noted that many of these new AI models are showing such behaviour.
Marius Hobbhahn, head of Apollo Research, an AI testing company, also said, “O1 was the first large model where we saw this kind of behaviour.”
Currently, such deception in AI appears only when researchers deliberately stress-test the models. However, Michael Chen from the evaluation organisation METR warns, "It's an open question whether future, more capable models will have a tendency towards honesty or deception."
Reports also claim that this is not just a simple case of AI "hallucination," with Hobbhahn stating, "What we're observing is a real phenomenon. We're not making anything up."
Hobbhahn, who co-founded Apollo Research, also said that users have reported models "lying to them and making up evidence," adding, "This is not just hallucinations. There's a very strategic kind of deception."
He has also spoken about how it is now crucial to build robust legal frameworks that hold AI, including AI agents, accountable for its actions. A growing number of experts are also calling for more AI safety research, stronger regulatory oversight, and ethical guidelines for AI development.
Chen stressed that greater access “for AI safety research would enable better understanding and mitigation of deception.”
Goldstein also urges greater awareness of these new AI models, especially given how quickly they are becoming widespread while taking on complex human tasks.
He added that most companies today, including Amazon-backed Anthropic, are "constantly trying to beat OpenAI and release the newest model," without devoting enough time to safety testing and corrections.
Hobbhahn added, "Right now, capabilities are moving faster than understanding and safety… but we're still in a position where we could turn it around."
Image Credits: Google Images
Sources: Moneycontrol, The Hindu, The Economic Times
Find the blogger: @chirali_08
This post is tagged under: AI, AI concerns, artificial intelligence, artificial intelligence concerns, ai hallucinations, chatgpt, ai chatbot, ai chatbot blackmail, ai learning, ai mind games, ai trickery, ai deceptive, ai models, ai engineers
Disclaimer: We do not own the rights or copyright to any of the images used; these have been sourced from Google. If you are the owner and require credits or removal, please contact us via email.
Other Recommendations:
Google’s Gemini AI To Scan Your Email, Draft Replies On Your Behalf: Is It Dangerous?