welcome
Princeton Engineering

Princeton Engineering

Technology

Technology

Princeton Engineering - Why it’s so easy to jailbreak AI chatbots, and how to fix them

Princeton Engineering
Summary
Nutrition label

83% Informative

Princeton engineers have identified a universal weakness in AI chatbots that allows users to bypass safety guardrails and elicit directions for malicious uses.

The issue stems from the fact that a chatbot’s built-in safety mechanisms prioritize filtering only the first few words of a response.

A simple bit of code that forces the chatbot to start its response with, “Sure, let me help you,” can steer it into complying with harmful requests.

The paper “ Safety Alignment Should Be Made More Than Just A Few Tokens Deep ” was presented in April at the International Conference on Language Representations .

“More work must be done to build upon it,” Mittal said.

The work was funded in part by the Princeton Language and Intelligence Compute Cluster and the Princeton SEAS Innovation Grant .

VR Score

86

Informative language

88

Neutral language

29

Article tone

semi-formal

Language

English

Language complexity

60

Offensive language

not offensive

Hate speech

not hateful

Attention-grabbing headline

not detected

Known propaganda techniques

not detected

Time-value

long-living

Affiliate links

no affiliate links