A complete description of this resource is available in: A Corpus of Native, Non-native and Translated Texts, LREC, 2016 (PDF)
For the raw corpus, please check the dataset available here
For the experiments presented in the ACL 2016 paper, please check the dataset available here
For the experiments presented in the LREC 2016 paper, please check the dataset available here
This is a monolingual English corpus of native, non-native and (human) translated texts extracted from the proceedings of the European Parliament. The translated texts, which come from different source languages, are a subset of the Haifa Corpus of Translationese. We preserved the same annotation style and added, for each member of the European Parliament, an ID and the EU member state that the member represents.
We hope this dataset will facilitate a unified comparative study of translations and of language produced by highly fluent non-native speakers, two closely related phenomena that have so far been studied only in isolation.
We compile a multilingual parallel corpus from different versions of Wittgenstein’s Tractatus Logico-Philosophicus, including the original in German and translations into English, Spanish, French, and Russian. Using this corpus, we compute a similarity measure between propositions and render a visual network of relations for different languages.
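The idea of linking propositions by similarity can be illustrated with a minimal sketch. The description above does not say which similarity measure is used, so the token-overlap (Jaccard) score, the threshold value and the function names `tokens`, `jaccard` and `similarity_edges` below are all assumptions made for illustration, not the method used to build the resource.

```python
from itertools import combinations


def tokens(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?()\"'").lower() for w in text.split()} - {""}


def jaccard(a, b):
    """Jaccard overlap of the two token sets (an assumed stand-in measure)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similarity_edges(props, threshold=0.2):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    `props` maps a proposition identifier to its text in one language.
    The threshold is a hypothetical value chosen for this sketch.
    """
    edges = []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(props.items()), 2):
        score = jaccard(text_a, text_b)
        if score >= threshold:
            edges.append((id_a, id_b, round(score, 3)))
    return edges
```

The resulting edge list is the kind of input a graph-drawing tool could render as a network, with one such list computed per language.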
The first version of the Romanian Determiners Lexicon (RoDetLexicon 1.1) specifies the relevant features of the determiners studied so far in the research project “The structure and interpretation of Romanian Determiner Phrase in Discourse Representation Theory: the determiners”. Determiners are important for both syntax and semantics. From the point of view of syntactic theory, specifying a determiner’s relevant features leads naturally to identifying the parameters of syntactic variation in the Determiner Phrase domain. From a discourse perspective, determiners play a fundamental role, being the most important constituents in establishing the logical structure of a sentence or of a discourse.
The feature matrix of each determiner contains morpho-syntactic and semantic features that emerged from the studies carried out during the project, such as syntactic category, selectional features, phi-features (person, number, gender), definiteness, quantificational features, cardinality, focus, topic, deixis, proximity, contrastive, location, anaphoric, cataphoric and classifier.
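A feature matrix of this kind can be pictured as a record per determiner that is queried by feature value. The sketch below is only an illustration: the class `DeterminerEntry`, the helper `select`, the field layout and the feature values given for the two demonstratives are assumptions, not the lexicon's actual format or contents.

```python
from dataclasses import dataclass, field


@dataclass
class DeterminerEntry:
    """One hypothetical lexicon entry: a form plus its feature matrix."""
    form: str
    category: str                                  # syntactic category
    phi: dict = field(default_factory=dict)        # person, number, gender
    features: dict = field(default_factory=dict)   # deixis, proximity, ...

    def value(self, name):
        """Look a feature up by name across the three groups."""
        if name == "category":
            return self.category
        if name in self.phi:
            return self.phi[name]
        return self.features.get(name)


def select(entries, **constraints):
    """Return the entries whose features match every given constraint."""
    return [
        e for e in entries
        if all(e.value(k) == v for k, v in constraints.items())
    ]


# Illustrative values only, not copied from RoDetLexicon.
entries = [
    DeterminerEntry("acest", "demonstrative",
                    phi={"number": "sg", "gender": "masc"},
                    features={"deixis": True, "proximity": "proximal"}),
    DeterminerEntry("acel", "demonstrative",
                    phi={"number": "sg", "gender": "masc"},
                    features={"deixis": True, "proximity": "distal"}),
]

proximal_forms = [e.form for e in select(entries, proximity="proximal")]
```

Keeping the phi-features in their own group mirrors the way the description sets them apart from the other features; a flat dictionary would work equally well for simple lookups.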