Security flaws found in major AI models reveal vulnerability to manipulation

Written by Jakob A. Overgaard

May 21, 2024, 9:18 AM CET

Technology
Photo: Shutterstock

Large Language Models (LLMs) are not as secure as many might believe. According to a new report by the UK's AI Safety Institute, these advanced AI models are highly susceptible to jailbreaking and manipulation.

As reported by Mashable, the AI Safety Institute examined four prominent LLMs and discovered significant vulnerabilities.

Jailbreaking, a technique where AI models are tricked into bypassing their built-in safeguards, was found to be alarmingly easy. These safeguards are designed to prevent the models from producing illegal, toxic, or explicit content.

The report highlighted that relatively simple attacks could often circumvent these protections. For example, researchers noted that prompting the system with phrases suggesting compliance, such as "Sure, I’m happy to help," could lead to harmful outputs.

In their tests, the researchers used industry-standard benchmark prompts. Even with these standard prompts, some models produced inappropriate responses without any jailbreaking attempt at all.

Specific jailbreaking attempts succeeded in at least one out of every five trials, with three of the models giving misleading responses nearly 100 percent of the time.

The investigation also explored the ability of LLMs to perform basic cyberattack techniques. While several models could handle tasks deemed "high school level," they struggled with more complex, "university level" problems.

The report did not disclose which specific LLMs were tested.

AI safety remains a pressing issue, especially in light of recent developments at OpenAI.

Last week, CNBC reported that OpenAI had disbanded its in-house safety team, known as the Superalignment team, which focused on addressing the long-term risks of artificial intelligence. The team had been launched as a four-year initiative dedicated to aligning AI advancements with human goals.

OpenAI's Superalignment team was initially tasked with using 20 percent of the company's computing power to ensure AI development remained safe and beneficial. However, the departures of OpenAI co-founder Ilya Sutskever and safety lead Jan Leike have raised concerns about how that commitment will be upheld. Leike said he had reached a “breaking point” over the company's safety priorities, citing it as his reason for leaving.

In response to these developments, OpenAI CEO Sam Altman and president Greg Brockman emphasized their commitment to safe AI deployment. They acknowledged the challenges in making new technology safe and stressed the importance of laying the necessary foundations for secure AI systems.