ACL Anthology - ACL Anthology

Super donors and super recipients: Studying cross-lingual transfer between high-resource and low-resource languages

Summary
Nutrition label

77% Informative

Despite the increasing popularity of multilingualism within the NLP community, numerous languages continue to be underrepresented due to the lack of available resources.

Our findings surprisingly reveal that the optimal language pairs with improved performance do not necessarily align with direct linguistic motivations, with subtoken overlap playing a more crucial role.
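
As a rough illustration of what "subtoken overlap" between two languages can mean, the sketch below computes a Jaccard similarity over sets of subtokens drawn from two text samples. The source does not state how overlap is measured, so everything here is an assumption: character trigrams stand in for a real subword tokenizer, Jaccard is only one possible measure, and the function names are hypothetical.

```python
def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams (a toy stand-in for subtokens)."""
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def subtoken_set(text, n=3):
    """Collect the set of toy subtokens found in a text sample."""
    tokens = set()
    for word in text.lower().split():
        tokens.update(char_ngrams(word, n))
    return tokens


def jaccard_overlap(text_a, text_b, n=3):
    """Jaccard similarity between the subtoken sets of two text samples.

    Assumption: Jaccard is used purely for illustration; the summarized
    paper may define overlap differently.
    """
    a, b = subtoken_set(text_a, n), subtoken_set(text_b, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

With a real subword tokenizer, `subtoken_set` would be swapped for the tokenizer's output on a corpus sample from each language; the overlap computation itself would stay the same.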

Specific languages tend to be almost universally beneficial for pretraining (super donors), while others benefit from pretraining with almost any language (super recipients).
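
The donor/recipient idea can be sketched as a simple pass over a table of per-pair gains. This is a minimal sketch under stated assumptions, not the method of the summarized paper: the `gains` layout (donor language mapped to per-recipient gain), the "positive gain means the pair improved" rule, the 0.9 cutoff for "almost any", and the language codes in the usage note are all hypothetical.

```python
def find_super_languages(gains, threshold=0.9):
    """Return (super_donors, super_recipients) from a table of per-pair gains.

    gains[donor][recipient] is a hypothetical score change for the recipient
    language after continued pretraining on the donor language; a value
    above zero is counted as an improvement. Missing pairs count as no gain.
    """
    donors = sorted(gains)
    recipients = sorted({r for row in gains.values() for r in row})
    if not donors or not recipients:
        return [], []

    def helped(donor, recipient):
        return gains[donor].get(recipient, 0.0) > 0

    # A super donor improves at least `threshold` of all recipients.
    super_donors = [
        d for d in donors
        if sum(helped(d, r) for r in recipients) / len(recipients) >= threshold
    ]
    # A super recipient is improved by at least `threshold` of all donors.
    super_recipients = [
        r for r in recipients
        if sum(helped(d, r) for d in donors) / len(donors) >= threshold
    ]
    return super_donors, super_recipients
```

For example, with the made-up table `{"hr_a": {"lr_x": 0.5, "lr_y": 0.2}, "hr_b": {"lr_x": 0.1, "lr_y": -0.3}}`, `hr_a` improves both recipients and `lr_x` is improved by both donors, so they are the only languages that clear the cutoff.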

The Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024) is being held in Bangkok, Thailand.

The work addresses this gap by studying cross-lingual transfer between 158 high-resource (HR) languages and 31 low-resource (LR) languages. Across 158 × 31 HR-LR language pairs, we investigate how continued pretraining on different HR languages affects the pretrained model.

VR Score: 88
Informative language: 97
Neutral language: 13
Article tone: formal
Language: English
Language complexity: 70
Offensive language: not offensive
Hate speech: not hateful
Attention-grabbing headline: not detected
Known propaganda techniques: not detected
Time-value: long-living
External references: no external sources
Source diversity: no sources
Affiliate links: no affiliate links