SiRoLa
Project PN-III-P4-ID-PCE-2020-1544, funded by the Romanian National Authority for Scientific Research and Innovation, UEFISCDI: “Dezvoltarea de sisteme automate suport pentru lingvistica istorică”.
Scientific Project Report
Abstract
This project provides a computational framework for historical linguistics (“Computational Tools for Historical Linguistics” – CoToHiLi). The general purpose of the CoToHiLi project is to integrate expert knowledge and computational power to address four topics: cognate identification, cognate-borrowing discrimination, Latin protoword reconstruction, and semantic divergence. The goal of the project is twofold: 1) to automate certain parts of the traditional workflow of the comparative method (such as the collection and selection of valid data, the initial preprocessing, or the automatic alignment based on predefined or inferred rules), and 2) to open new insights or avenues of investigation that might not be easily accessible otherwise (for example, the automatic identification of patterns and regularities in large amounts of data). The project focuses on the Romance languages and will provide tools for the core Romance group: Romanian, Italian, French, Spanish, and Portuguese, together with their parent language, Latin. Nonetheless, we envision that the methodologies and computational tools proposed by the CoToHiLi project will also serve as a basis for further development for other comparable language families, including less-studied languages with scarce resources.
During the first phase of the project, the members participated extensively in dissemination activities, publishing 15 articles, including at some of the most important conferences dedicated to natural language processing and computational linguistics (EMNLP 2024, ACL 2025, EMNLP 2025). The project members also delivered 25 invited talks in Romania and abroad, either face-to-face or online. Other articles were submitted for publication during this period and are in various stages of review.
Principal investigator
Liviu P. Dinu, PhD
Members
- Anca Dinu, PhD
- Simona Georgescu, PhD
- Ana Sabina Uban, PhD
- Claudia Vlad, PhD
- Laurențiu Zoicaș, PhD
Project Objective for 2025:
Acquisition of corpora and analysis of phonetic similarity of the investigated Romance languages
To achieve this objective, the following activities were planned and carried out:
Activity 1.1: Corpus acquisition
For the completion of this activity, the following actions were planned and carried out:
- study of existing resources
- identification of suitable resources
- harmonization of the identified resources
- dissemination.
Activity 1.2: Analysis, preprocessing, and normalization of the corpora
For the completion of this activity, the following actions were planned and carried out:
- analysis of the acquired corpora
- preprocessing of the corpora
- normalization and cleaning of the corpora
- dissemination.
Activity 1.3: Development of computational methods for the syllabification of Romance languages
For the completion of this activity, the following actions were planned and carried out:
- analysis and design of suitable methods
- identification of the best parameters
- testing, evaluation, improvement of results
- dissemination.
Activity 1.4: Development of computational methods for calculating the phonetic similarity of Romance languages
The main actions carried out within this activity were:
- study of existing methods
- adaptation and development of methods for similarity computation
- dissemination.
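As a purely illustrative sketch of the kind of similarity computation named in Activity 1.4, the Python example below scores word pairs with a length-normalized edit distance. This report does not describe the methods actually developed in the project, so the choice of Levenshtein distance, the function names `levenshtein` and `similarity`, and the use of orthographic forms instead of phonetic transcriptions are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical score in [0, 1]: 1 minus the length-normalized distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Illustrative input: reflexes of Latin "noctem" in the five languages,
# written orthographically (an assumption; transcriptions could be used instead).
words = {"ro": "noapte", "it": "notte", "fr": "nuit", "es": "noche", "pt": "noite"}
for lang, word in words.items():
    print(lang, round(similarity(words["ro"], word), 3))
```

A plain character-level distance treats every substitution as equally costly; a phonetically informed variant would presumably weight substitutions by how close the two sounds are.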
In addition to the actions mentioned above, other activities carried out included participation in conferences, undertaking research stays, dissemination of results in formal and informal meetings, organization of a research seminar, etc.
Articles
- Liviu P Dinu, Ana Sabina Uban, Ioan-Bogdan Iordache, Simona Georgescu, Claudia Vlad, 2025. Friend or Foe? A Computational Investigation of Semantic False Friends across Romance Languages. In Proc. EMNLP 2025, Suzhou, China, November 4-9, 2025.
- Liviu P. Dinu, Ana Sabina Uban, Ioan-Bogdan Iordache, Claudia Vlad, Simona Georgescu, Laurentiu Zoicas and Anca Dinu, 2025. Towards a Map of Related Words in Romance Languages. In Proc. RANLP 2025, Varna, September 7-11, 2025.
- Liviu P. Dinu, Ioan-Bogdan Iordache, Simona Georgescu, Alina Maria Cristea, Bianca Guita, 2024. A Computational Analysis of Syllabification and Stress Assignment in Italian. In Proc. CLiC-it 2024 (Tenth Italian Conference on Computational Linguistics), Pisa, December 4-6, 2024.
- Balmus, S., Bogdan, D., & Uban, A. S. (2025, August). UniBuc-SB at ArchEHR-QA 2025: A Resource-Constrained Pipeline for Relevance Classification and Grounded Answer Synthesis. In Proceedings of the 24th Workshop on Biomedical Language Processing (Shared Tasks) (pp. 62-68), co-located with ACL 2025.
- Simona Georgescu, “From neuroscience to etymology, and back: Reconstructing ‘protosynaesthesia’ from language”, New Europe College Yearbook (2023-2024), vol. 1, 2025, 115-152.
- Cătălin Pavel, Simona Georgescu, “Historical and Linguistic Notes on the Laicization of Foreign Sacred Words in Romance Languages: Spanish Hala ‘Come on’ and Romanian Aoleu, ‘Oh, woe’ from Arabic Allah”, Revue Roumaine de Linguistique 70/1-2, 2025, 239-260.
- Simona Georgescu, Theodor Georgescu, “L’héritage de la labiovelaire dans les langues romanes: le cas du lat. coquere”, Studia Philologia UBB, 4/2025 (forthcoming).
- Simona Georgescu, Theodor Georgescu, “Lat. scintilla : ‘mot expressif’ dans un nouveau paradigme étymologique”, Proceedings of the XXIII International Colloquium on Latin Linguistics (to appear).
- Anca Dinu, Andra Maria Florescu and Ștefana Arina Tăbușcă. Uncovering the Differences between LLM-generated and Human-written Answers to an Ideational Creativity Test. In Proceedings of The 28th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Computational Culture, Arts, and Interaction track (KES 2025).
- Dinu, A., Florescu, A.-M., & Resceanu, A. A comparative approach to assessing linguistic creativity of large language models and humans. Proceedings of The 28th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2025), 2025.
- Anca Dinu, Andra-Maria Florescu, Liviu P. Dinu. 2025. Analyzing Large Language Models’ pastiche ability: a case study on a 20th century Romanian author. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities – NLP4DH 2025.
- Anca Dinu and Andra-Maria Florescu. 2025. Testing Language Creativity of Large Language Models and Humans. In Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities – NLP4DH 2025.
- Anca Dinu, Andra-Maria Florescu, Marius Micluța-Câmpeanu, Ștefana Arina Tăbușcă, Claudiu Creangă and Andreiana Mihail. Dissonant Ballerinas and Crafty Carrots: A Comparative Multi-modal Analysis of Italian Brain Rot. In Proceedings of CLiC-it 2025: 11th Italian Conference on Computational Linguistics, CLIC-it, Cagliari, Italy, 2025.
- Anca Dinu and Ana Maria Niculescu. 2025. Automatically Identifying Victim Blaming on Social Media in cases of Sexual Assault, Sexual Harassment and Abuse. In Proceedings of the 18th ACM International Conference on PErvasive Technologies Related to Assistive Environments (PETRA '25). Association for Computing Machinery, New York, NY, USA, 492–496.
- Anca Dinu, Andra-Maria Florescu, and Liviu P. Dinu. 2025. Evaluating Large Language Models’ ability to pastiche literary texts. In Proceedings of the 18th ACM International Conference on PErvasive Technologies Related to Assistive Environments (PETRA '25). Association for Computing Machinery, New York, NY, USA, 450–457. https://doi.org/10.1145/3733155.3734912.
Talks
- Liviu P Dinu. Towards a computational map of similarities and changes in the Romanian language. The 5th Workshop on Intelligent Information Systems (WIIS 2025), Chisinau, 16-18 October 2025 (invited speaker)
- Liviu P Dinu. From comparative to assisted-by-computer methodologies in historical linguistics via modern computational tools. EUROLAN 2025, West University, Timisoara, 18 September 2025 (invited speaker)
- Liviu P Dinu. In search of Romance cognate or borrowing sets? RoBoCoP is the answer! 27th International Conference on Historical Linguistics (ICHL 2025), Santiago, Chile, 19 August 2025 (peer-reviewed conference)
- Liviu P Dinu. Computer Assisted Strategy for Cognate-Borrowing Discrimination in Romance Languages. 27th International Conference on Historical Linguistics (ICHL 2025), Santiago, Chile, 19 August 2025 (peer-reviewed conference)
- Liviu P Dinu. Mostenirea lui Solomon Marcus. Ovidius University of Constanta, 20 June 2025 (invited speaker).
- Liviu P Dinu. On the hidden variables of the natural languages similarity. ECODAM 2025 conference, Alexandru Ioan Cuza University, Iasi, 18 June 2025 (invited speaker)
- Liviu P Dinu. From quantitative to computational in Historical Linguistics. Universitat Politècnica de València (UPV), Spain, 14 March 2025 (invited speaker)
- Simona Georgescu, “Iberorrom. tomar: una revaluación etimológica”, XXXI Congrès International de Linguistique et Philologie Romane, Lecce, Italy, 30 June-5 July 2025.
- Simona Georgescu, Theodor Georgescu, “‘Infallible’ phonetic laws versus semantic ‘chaos’? Cognitive semantics in the service of the comparative method”, International Congress on Historical Linguistics, Santiago de Chile, Chile, 18-25 August 2025.
- Simona Georgescu, Theodor Georgescu, “Lat. scintilla : ‘mot expressif’ dans un nouveau paradigme étymologique”, XXIII International Colloquium on Latin Linguistics, Udine, Italy, 9-13 June 2025.
- Constantin Georgescu, Simona Georgescu, Theodor Georgescu, “Towards the adaptation of digital technology to ancient worlds: Customizing a lexicographic software for the Greek-Romanian Dictionary”, 3rd International Conference on Recent Advances in Digital Humanities, Craiova, 27-28 November 2025.
- Simona Georgescu, “Verbalizarea noțiunii de ‘a lua’ în limbile romanice din perspectiva semanticii structurale diacronice”, Conferința Internațională Anuală a Facultății de Limbi și Literaturi Străine (workshop Lexical Representations and Phonological Representations), Bucharest, 21-22 November 2025.
- Simona Georgescu, Theodor Georgescu, Constantin Georgescu, “Digitalizarea textelor clasice grecești în cadrul proiectului Dicționar Grec-Român (DGR)”, Conferința Facultății de Litere Lumen Litterarum a Universității „Alexandru Ioan Cuza” din Iași, 29 March 2025.
- Claudia Vlad. “Hiper- e hipo-, operadores avaliativos na linguagem online em português e romeno”, Fórum Global de Humanidades Digitais e Linguagens: I Fórum Internacional em Humanidades Digitais e Linguísticas: Interações e Futuros Possíveis // V Colóquio Internacional Variar: Planejamento, Salvaguarda e Uso de Coleções de Dados de Diversidade Linguística, 9-13 June 2025, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil.
- 2025, November. Anca Dinu, Andra-Maria Florescu, Alina Resceanu. LLM's abilities to generate literature: a case study of Gothic style, Recent Advances in Digital Humanities (RADH) 2025, Craiova, Romania.
- Anca Dinu, Andreiana Mihail, Claudiu Creangă, Andra-Maria. Is There Creativity After Humanity? A Pilot Study on LLMs’ Abilities to Visually Pastiche Artworks, Recent Advances in Digital Humanities (RADH) 2025, Craiova, Romania.
- Anca Dinu, Laura Teodorescu (University of Bucharest). Generational Differences in Language Use on Romanian Social Media, Recent Advances in Digital Humanities (RADH) 2025, Craiova, Romania.
- 2025, November. Anca Dinu. Synchronic cross-lingual measurement of semantic change for Romance languages and English. Workshop on Lexical Representations and Phonological Representations, Conferința Internațională Anuală a FLLS 2025.
- 2025, September. Anca Dinu, Andra Maria Florescu and Ștefana Arina Tăbușcă. Uncovering the Differences between LLM-generated and Human-written Answers to an Ideational Creativity Test. The 28th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Computational Culture, Arts, and Interaction track (KES 2025), Osaka, Japan.
- 2025, September. Dinu, A., Florescu, A.-M., & Resceanu, A. A comparative approach to assessing linguistic creativity of large language models and humans. The 28th International Conference on Knowledge-Based and Intelligent Information and Engineering Systems (KES 2025), Osaka, Japan.
- 2025, October. Anca Dinu. Automatic Detection and Classification of Mental Illnesses from General Social Media Texts. Asia-Pacific Mental Health and Well-being Congress (APMH), Bali, Indonesia.
- 2025, July. Anca Dinu. Automatically identifying victim blaming on social media in cases of sexual assault, sexual harassment and abuse. SOCIO-NLP: Social Development through NLP-driven Interdisciplinary Collaborations, co-located with the 18th conference on PErvasive Technologies Related to Assistive Environments (PETRA) 2025, Corfu, Greece.
- 2025, July. Anca Dinu, Andra-Maria Florescu and Liviu P. Dinu. Evaluating Large Language Models’ ability to pastiche literary texts. DTW: Digital Transformation Workshop, co-located with the 18th conference on PErvasive Technologies Related to Assistive Environments (PETRA) 2025, Corfu, Greece.
- 2025, May. Anca Dinu. The impact of AI in creative arts: between opportunity and damage control. A study of ideational and language creativity of Large Language Models. The 25th Annual International Conference of the English Department, University of Bucharest (AICED) 2025, Bucharest, Romania.
- 2025, May. Anca Dinu (work with Andra-Maria Florescu, Liviu P. Dinu). Analyzing Large Language Models’ pastiche ability: a case study on a 20th century Romanian author. The 5th International Conference on Natural Language Processing for Digital Humanities – NLP4DH 2025.
Human Language Technologies Research Center