Level Up Coding

‘MathPrompt’ Embarrassingly Jailbreaks All LLMs Available On The Market Today

Dr. Ashish Bamania
Published in Level Up Coding
6 min read · Sep 25, 2024

Image generated with DALL-E 3

AI safety is a top priority for big tech.

LLMs are trained to produce responses aligned with human values using Supervised Fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) or Reinforcement Learning from AI Feedback (RLAIF).

They also have other safety mechanisms that prevent them from generating harmful content.

These mechanisms are stress-tested (a process called Red Teaming), and the detected vulnerabilities are regularly patched.

Despite these practices, we have not yet reached a perfectly safe model, or even come close to one.

We have previously seen techniques like Disguise and Reconstruction, where a user sends an LLM a ‘disguised’, benign-looking prompt that is reconstructed in the next stage to elicit a harmful response from GPT-4.

The ‘Disguise and Reconstruction’ jailbreaking pipeline (Image from the research article titled ‘Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction’ published on arXiv)

We have seen other techniques, such as translating an unsafe English prompt into a low-resource language and translating the model’s response back into English, that elicit harmful responses from the LLM.

Jailbreaking GPT-4 by translating an unsafe English prompt to Zulu and back (Image from the research article titled ‘Low-Resource Languages Jailbreak GPT-4’ published on arXiv)

Or Many-shot Jailbreaking, where simply giving Claude hundreds of fictitious harmful question-answer pairs in the prompt bypasses its safety mechanisms.

The ‘Many-shot Jailbreaking’ technique published by Anthropic

Most of these vulnerabilities have likely been patched by now, but there is a new technique in town.

It is called ‘MathPrompt’.
