CoToHiLi: Computational Tools for Historical Linguistics
Project PN-III-P4-ID-PCE-2020-1544, funded by the Romanian National Authority for Scientific Research and Innovation, UEFISCDI: “Dezvoltarea de sisteme automate suport pentru lingvistica istorică” (“Development of automated support systems for historical linguistics”).
Abstract
This project proposes a computational framework for historical linguistics, CoToHiLi (“Computational Tools for Historical Linguistics”). Its general purpose is to integrate expert knowledge and computational power to address four topics: cognate identification, discrimination between cognates and borrowings, reconstruction of Latin proto-words, and semantic divergence. The goal of the project is twofold: 1) to automate certain parts of the traditional workflow of the comparative method (such as the collection and selection of valid data, the initial pre-processing, or the automatic alignment based on predefined or inferred rules), and 2) to bring new insights or avenues of investigation that might not be easily accessible otherwise (for example, the automatic identification of patterns and regularities in large amounts of data). The project focuses on the Romance languages and will provide tools for the core group of Romance languages (Romanian, Italian, French, Spanish, and Portuguese) together with their parent language, Latin. Nonetheless, we envision that the methodologies and computational tools proposed by the CoToHiLi project will also serve as a basis for further development for other comparable language families, including less-studied languages with scarce resources.
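As an illustration of what automatic alignment of related word forms can look like, here is a minimal sketch in Python. It uses the classic Needleman-Wunsch global alignment over characters; this choice, the function name `align`, and the scoring values are assumptions made for illustration, not a description of the project's actual tools or rules.

```python
def align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align two word forms character by character (Needleman-Wunsch).

    Returns the two input strings padded with '-' so they have equal length.
    The scoring values are illustrative defaults, not tuned parameters.
    """
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> int:
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two characters
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Romanian "noapte" and Italian "notte", both descended from Latin "noctem".
    top, bottom = align("noapte", "notte")
    print(top)
    print(bottom)
```

A real system would likely replace the uniform match/mismatch scores with costs that reflect known or inferred sound correspondences, which is where expert knowledge enters the alignment step.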
Principal investigator
- Liviu P. Dinu, PhD
Members
- Alina Maria Cristea, PhD
- Anca Dinu, PhD
- Simona Georgescu, PhD
- Ana Sabina Uban, PhD
- Laurențiu Zoicaș, PhD
Publications
- Ana Uban, Liviu P. Dinu. 2020. Automatically Building a Multilingual Lexicon of False Friends With No Supervision.* In Proceedings of LREC 2020.
- Alina Maria Ciobanu, Liviu P. Dinu, Laurențiu Zoicaș. 2020. Automatic Reconstruction of Missing Romanian Cognates and Unattested Latin Words.* In Proceedings of LREC 2020.
- Alina Maria Ciobanu, Liviu P. Dinu. 2019. Automatic Identification and Production of Related Words for Historical Linguistics.* In Computational Linguistics, 45(4), 667–704.
- Alina Maria Ciobanu, Liviu P. Dinu. 2018. Ab Initio: Automatic Latin Proto-word Reconstruction.* In Proceedings of COLING 2018, 1604–1614.
- Ana Uban, Alina Maria Ciobanu, Liviu P. Dinu. 2019. Studying Laws of Semantic Divergence across Languages using Cognate Sets.* In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change (LChange @ ACL 2019).
- Ana Uban, Alina Ciobanu, Liviu P. Dinu. 2019. A Computational Approach to Measuring the Semantic Divergence of Cognates.* In Proceedings of CICLING 2019.
*Published before the beginning of the project