Gizmodo

Harvard Makes 1 Million Books Available to Train AI Models

Summary

Harvard University launches a dataset containing nearly one million public domain books.

The books, scanned as part of the Google Books project, are old enough that their copyright protection has expired.

Publishers including the Wall Street Journal and the New York Times have sued AI companies for ingesting their data without permission.

The Institutional Data Initiative’s dataset could help AI companies train their initial models without running into legal trouble.