InstRead: Research Instruments for Text Complexity, Simplification and Readability Assessment

Project PN-IV-P2-2.1-TE-2023-2007, funded by the Romanian National Authority for Scientific Research and Innovation, UEFISCDI: “Research Instruments for Text Complexity, Simplification and Readability Assessment”.

Abstract

In this proposal, we aim to develop the first set of instruments for creating simplified Romanian texts by assessing lexical complexity and readability. Our goals are to narrow the research gap between Romanian and better-studied languages in this field and to propose new methods for these tasks inspired by recent advances in Large Language Models (LLMs). Our approach aims to 1) build and collect a corpus of lexical complexity assessments from young adult (18-25) native Romanian speakers; 2) statistically analyze the annotations, comparing different text genres and linguistic features; 3) train and evaluate deep learning algorithms that leverage LLMs and compare them with traditional methods; and 4) develop a set of tools on the project’s website that can be used to assess the lexical complexity and readability of new documents or to simplify them. The main scientific contributions of this project consist of releasing to the general public modern readability resources for Romanian that initiate the development of this field in the local context, reduce the research gap with other well-studied languages, and open new interdisciplinary collaborations for future research on text complexity.

Team

Core Team

Partners: Psychological Research and Professional Training Laboratory

Students and Research Assistants