Semantic Error Prediction: Estimating Word Production Complexity

Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning 13:209-225 (2024)

Abstract

Estimating word complexity is a well-established task in computer-assisted language learning. So far, however, complexity estimation has been largely limited to comprehension. This neglects words that are easy to comprehend, but hard to produce. We introduce semantic error prediction (SEP) as a novel task that assesses the production complexity of content words. Given the corrected version of a learner-produced text, a system has to predict which content words replace tokens from the original text. We present and analyse one example of such a semantic error prediction dataset, which we generate from an error correction dataset. As neural baselines, we use BERT, RoBERTa, and LLAMA2 embeddings for SEP. We show that our models can already improve downstream applications, such as predicting essay vocabulary scores.
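
The task definition above can be illustrated with a minimal sketch of how SEP labels might be derived from one original/corrected sentence pair. This is not the authors' pipeline: the whitespace tokenisation, the `difflib` alignment, the tiny `FUNCTION_WORDS` list standing in for a real content-word test, and the function name `sep_labels` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for a content-word test; a real system would
# more plausibly rely on part-of-speech tags.
FUNCTION_WORDS = {
    "a", "an", "the", "to", "of", "in", "on", "at", "is", "are", "was",
    "were", "and", "or", "but", "i", "he", "she", "it", "we", "they",
}


def sep_labels(original: str, corrected: str) -> list[tuple[str, int]]:
    """Pair each corrected token with 1 if it is a content word that
    replaced one or more tokens of the original text, else 0."""
    src = original.lower().split()
    tgt = corrected.lower().split()
    labels = [0] * len(tgt)

    # Align the two token sequences; autojunk is disabled so that
    # frequent tokens are never ignored during matching.
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # Assumption: only substitutions count, not pure insertions.
        if tag == "replace":
            for j in range(j1, j2):
                if tgt[j] not in FUNCTION_WORDS:
                    labels[j] = 1
    return list(zip(tgt, labels))


pairs = sep_labels("I did a big mistake yesterday",
                   "I made a big mistake yesterday")
```

Here only "made" is flagged, since it is a content word that replaces "did". A function-word correction such as "in" to "at" would be left unflagged under this sketch's assumptions.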

Author's Profile

David Strohmaier
Cambridge University

Analytics

Added to PP
2024-10-19
