Wired

A New Benchmark for the Risks of AI

Summary
Nutrition label

82% Informative

MLCommons is launching a new benchmark to gauge AI's bad side too.

The new benchmark assesses the responses of large language models to more than 12,000 test prompts.

Models are given a score of “poor,” “fair,” “good,” or “very good” depending on how they perform.

The benchmark is not designed to measure the potential for AI models to become deceptive.

VR Score: 85
Informative language: 86
Neutral language: 51
Article tone: formal
Language: English
Language complexity: 62
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living
Affiliate links: no affiliate links