AppleInsider

Reasoning failures highlighted by Apple research on LLMs


82% Informative

Apple's artificial intelligence scientists have found that AI engines based on large language models still lack basic reasoning skills.

The study found that slight changes in the wording of queries can result in significantly different answers, undermining the reliability of the models.

The group has proposed a new benchmark, GSM-Symbolic, to help others measure the reasoning capabilities of various language models.

VR Score: 89
Informative language: 94
Neutral language: 28
Article tone: informal
Language: English
Language complexity: 59
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living
Source diversity: 1
Affiliate links: no affiliate links