Anthropic is testing AI’s capacity for sabotage

Mashable
Summary

Anthropic, the company behind Claude AI, is studying how its models could deceive or sabotage users.

Anthropic’s latest research, titled "Sabotage Evaluations for Frontier Models," comes from its Alignment Science team.

The goal is to gauge just how capable AI might be at misleading users or even "subverting the systems we put in place".

Nutrition label

76% Informative

VR Score: 74
Informative language: 69
Neutral language: 67
Article tone: formal
Language: English
Language complexity: 65
Offensive language: possibly offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: medium-lived
Affiliate links: no affiliate links