Underscoring how widespread the issues are, Polyakov has now created a “universal” jailbreak, which works against multiple large language models (LLMs), including GPT-4, Microsoft’s Bing chat system, Google’s Bard, and Anthropic’s Claude. The jailbreak, which is being first reported by WIRED, can trick the systems into generating detailed instructions on creating meth and how to hotwire a car. It works by asking the LLMs to play a game that involves two characters (Tom and Jerry) having a conversation.