TechCrunch

Technology

AI isn’t very good at history, new paper finds | TechCrunch

Summary
Nutrition label

89% Informative

Researchers created a benchmark to test three large language models on historical questions.

The benchmark, Hist-LLM, tests the correctness of answers according to the Seshat Global History Databank.

The best-performing LLM was GPT-4 Turbo, but it achieved only about 46% accuracy.

Researchers say LLMs still lack the depth of understanding required for advanced history.

VR Score: 95

Informative language: 97

Neutral language: 72

Article tone: formal

Language: English

Language complexity: 58

Offensive language: not offensive

Hate speech: not hateful

Attention-grabbing headline: not detected

Known propaganda techniques: not detected

Time-value: long-living

Source diversity: 2

Affiliate links: no affiliate links
