Ars Technica

Australian government trial finds AI is much worse than humans at summarizing

Summary
Nutrition label

89% Informative

The Australian Securities and Investments Commission (ASIC) evaluated large language models' ability to quickly summarize lengthy documents for easier human consumption.

Summaries produced by the Llama2-70B model were judged significantly worse than those provided by humans.

The study has a number of limitations that make it hard to generalize about the summarizing capabilities of present-day state-of-the-art models.

Larger models with bigger context windows and better embedding strategies may have more success, the authors write, because "finding references in larger documents is a notoriously hard task for LLMs." Despite the results, ASIC says it still believes "there are opportunities for Gen AI as the technology continues to advance... Technology is advancing in this area and it is likely that future models will improve performance and accuracy of results."

VR Score: 95
Informative language: 99
Neutral language: 35
Article tone: formal
Language: English
Language complexity: 70
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living
Affiliate links: no affiliate links