Machine Translation Learning
MT Bibliography (expanding…)
(entries marked with * are blog posts, tutorials, or visual explanations)
- Prerequisites: general ML concepts, blogs, tutorials
- Statistical Machine Translation & Language Models
- Evaluation
- Data Collection, Alignment
- Neural MT with RNNs
- Neural MT with CNNs
- Tokenizers
- Neural MT with Transformers
- Neural MT with Diffusion Models
- Back to the future
- Other Courses
Prerequisites: general ML concepts, blogs, tutorials
- Linear Algebra
- Multivariate Calculus
- Probability Course
- Beautiful Visualizations - Proba and Stats
- Bayesian Statistics
- Count Bayes Blog
- Five Minutes Stats
- Expectation Maximization (EM) Foundations
- EM for Gaussian Mixture Models
- Hidden Markov Models, EM, and Viterbi
- Information Theory, Entropy, KL-Divergence
- Monte Carlo / Metropolis
- DS Handbook
Prerequisites: general ML concepts, books
- Pattern Recognition And Machine Learning - huge book of the early 2000s, excellent coverage of probability distributions, graphical models, Bayesian inference
- Statistical Foundations of ML - course syllabus, good coverage of probabilities, statistical tests, general treatment of ML methods
- Information Theory, Inference, and Learning Algorithms - huge book of the early 2000s, excellent coverage of information theory and probabilistic inference
- Math for ML - when you want to start from the very basics of algebra, geometry, calculus, probabilities
- Probabilistic Machine Learning Book 1,2,3 - excellent coverage of foundations, and more advanced topics (like diffusion models)
- Deep Learning Book - good place to understand neural networks
Statistical Machine Translation & Language Models
- *Kevin Knight’s Workbook
- *Lena Voita’s explanations on LM
- Koehn’s SMT book (SMT from scratch)
- Kneser-Ney smoothing, 1995, *tutorial
- Och’s PhD thesis, 2002
- Mathematics of SMT, 2003
- N-gram Language Models, Jurafsky, SLP
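Before diving into the readings above, it helps to see the core mechanic of an n-gram language model in code. The sketch below trains a bigram model and scores words with simple linear interpolation against the unigram distribution; this is a toy stand-in for the Kneser-Ney smoothing covered in the readings (function names are my own, not from any of the cited works):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count unigrams and bigrams from tokenized sentences (toy corpus)."""
    unigrams, bigrams = Counter(), Counter()
    for toks in sentences:
        toks = ["<s>"] + toks + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def interp_prob(w_prev, w, unigrams, bigrams, lam=0.7):
    """Interpolated probability: lam * P(w | w_prev) + (1 - lam) * P(w).
    Much simpler than Kneser-Ney, but shows the same smoothing idea:
    back off to a lower-order distribution for unseen bigrams."""
    total = sum(unigrams.values())
    p_uni = unigrams[w] / total
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni
```

Because the interpolation weights sum to one, the conditional distribution over the vocabulary still sums to one for any observed context.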
Evaluation
- BLEU, Papineni et al., 2002
- Statistical significance tests, 2004
- Statistical significance tests of models’ correlation, 2014
- chrF++, 2015
- Comparison of metrics, Fomicheva & Specia, 2018
- A Call for Clarity in Reporting BLEU Scores, 2018, sacrebleu
- Good translation wrong in context, 2019
- BERTScore, 2020
- Scientific Credibility, 2021
- COMET, more recent paper 2022
- Lab Notebook: MT Eval 1, MT Eval 2
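To make the BLEU readings concrete, here is a toy sentence-level BLEU: clipped n-gram precision (the "modified precision" of Papineni et al.) combined into a geometric mean with a brevity penalty. This is a teaching sketch with a single reference; for reportable scores use sacrebleu, as the "Call for Clarity" paper argues:

```python
from collections import Counter
import math

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision (single reference, toy version)."""
    cand = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / max(sum(cand.values()), 1)

def bleu(candidate, reference, max_n=4):
    """Toy sentence-level BLEU: geometric mean of n-gram precisions
    times a brevity penalty. Not comparable to published scores."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)
```

The clipping is the key idea: a candidate cannot gain credit by repeating a reference word more often than the reference contains it.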
Data Collection, Alignment
- *bitextor
- Parallel corpora for medium density languages, 2005, hunalign
- Word Alignment with Markov Chain Monte Carlo, 2016, efmaral
- Backtranslation, 2015
- Word Alignments Without Parallel Training Data, 2020, SimAlign
- Aligned segments from unclean parallel data, 2020
- Comparison of GIZA++ vs. Neural Word Alignment, 2020
- Massively Multilingual Sentence Embeddings, 2019
- Multilingual Sentence Embeddings, 2020
- Mining Using Distilled Sentence Representations, 2022, LASER
- MT for the next 1000 Lang, 2022
- Lab Notebook: Using LASER
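The embedding-based mining papers above share one core step: score candidate sentence pairs by similarity in a shared multilingual space and keep the confident ones. The sketch below does greedy nearest-neighbor mining with plain cosine similarity over toy vectors; real LASER-style pipelines use margin-based scoring and an approximate-nearest-neighbor index such as FAISS (the function names here are my own):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mine_pairs(src_embs, tgt_embs, threshold=0.8):
    """Greedy bitext mining sketch: for each source embedding, pick the
    closest target embedding and keep the pair if its similarity clears
    the threshold. Toy version: brute force, no margin criterion."""
    pairs = []
    for i, u in enumerate(src_embs):
        j, score = max(((j, cosine(u, v)) for j, v in enumerate(tgt_embs)),
                       key=lambda t: t[1])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs
```

The threshold trades precision against recall: raising it yields cleaner but smaller mined corpora, which is exactly the tension the "unclean parallel data" paper above addresses.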
Neural MT with RNNs
- *Seq2seq Models With Attention
- *Seq2seq Models Tutorial
- *Another tutorial
- *Different attention types
- *Tutorial on training RNNs, 2002-2013
- Learning Long-Term Dependencies with Gradient Descent is Difficult, 1994
- LSTM, 1997
- Neural Probabilistic Language Model, 2003, also here
- Seq2seq learning with NNs, 2014
- RNN Encoder-Decoder, 2014
- Seq2seq with Attention, 2015
- More Types of Attention, 2015
- Lab Tutorial: Training an RNN seq2seq, Generic LLM training using Axolotl, Unbabel models
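The attention mechanism from the 2015 seq2seq papers reduces to three steps: score the decoder query against each encoder state, normalize the scores with a softmax, and return the weighted sum of the encoder values. A dependency-free sketch of dot-product attention (one of the scoring variants in the "More Types of Attention" reading; names are my own):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(query, keys, values):
    """Dot-product attention: score each encoder state against the query,
    softmax the scores, and return the weighted sum of the values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
    return context, weights
```

In a real RNN decoder the query is the current decoder hidden state and the keys/values are the encoder hidden states; the context vector is then fed into the next decoding step.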
Neural MT with CNNs
- Language Modeling with Gated Convolutional Networks, 2016
- Convolutional Sequence to Sequence Learning, 2017
- *Tutorial with code
- Lab Tutorial: Training a CNN seq2seq
Tokenizers
- *Byte Pair Encoding
- *Tokenizers
- *Understanding Sentencepiece
- *EM, Viterbi, Unigram LM
- Byte-Pair Encoding Compression, 1994
- Byte-Pair Encoding Tokenization, 2015
- Unigram LM Tokenizer, 2018
- sentencepiece library, 2018, code
- BPE Dropout, 2020
- Lab Tutorial: sentencepiece only
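The BPE papers above rest on one loop: count adjacent symbol pairs across the word-frequency table, merge the most frequent pair, repeat. A toy version of that loop (the naive string replace is fine for this demo; production implementations like sentencepiece track symbol boundaries explicitly):

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency.
    vocab maps space-separated symbol strings to counts, e.g. {"l o w": 5}."""
    pairs = Counter()
    for word, freq in vocab.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with its concatenation."""
    merged, joined = " ".join(pair), "".join(pair)
    return {word.replace(merged, joined): freq for word, freq in vocab.items()}

def learn_bpe(vocab, num_merges):
    """Learn BPE merges in the style of the 2015 tokenization paper:
    repeatedly merge the most frequent adjacent symbol pair."""
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab
```

At apply time the learned merge list is replayed in order on new words, which is why the merges (not the final vocabulary) are the model.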
Transformers - Tutorials
- Illustrated Transformer
- Lena Voita’s Tutorial
- The Annotated Transformer
- Peter Bloem’s Tutorial
- Illustrated BERT
- Illustrated GPT-2
- Huggingface Transformers Tutorial
- Annotated GPT-2
- Transformer for people outside NLP
- E2ML School Tutorial
Transformers - Essential Readings
- Attention is all you Need, 2017
- BERT, 2019
- GPT-2, 2019
- WMT2021 Baselines and Models, 2021
- *Models in huggingface
Other Transformer Models
- GPT-3, 2020, open gpt flavors
- ELECTRA, 2019, hgfce
- RoBERTa, 2019, hgfce
- BART, 2020, hgfce
- mBART, 2020, hgfce
- *Reformer, 2020, hgfce
- T5, 2020, hgfce
- M2M-100, 2021, model, hgfce
- Lab Tutorial: T5
Transformers and Explainability
- Visualizing Attention, 2019
- Is Attention Interpretable?, 2019
- Quantifying Attention Flow in Transformers, 2020
- Transformer Interpretability Beyond Attention, 2021, code
Machine Translation Frameworks
- Marian MT, 2018
- OpenNMT, 2017
- fairseq, 2019
- JoeyNMT, 2019
- Huggingface seq2seq
Extra Readings on Machine Translation
- Gender Bias in MT, 2019
- MT Domain Robustness, 2019
- Fixed Encoder Self-Attention Patterns, 2020
- Translationese, 2020
- Character-level NMT, 2021
Recent / Interesting Research
- Synchronous Bidirectional Beam Search, 2019
- Specialized Heads Do the Heavy Lifting, 2019, code, tutorial
- Transformer Circuits, 2021
- Why Beam Search Works, 2021
- What Works Best for Zero-Shot, 2022
- Contrastive Text Generation, 2022, code
- Induction Heads, 2022
- Wide Attention vs Depth, 2022
- The 48 params of BERT, 2022
- Mixture of Experts, 2022
- The Importance of Attention, 2022
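Several entries in this section (notably "Why Beam Search Works") analyze the standard decoding algorithm itself. A minimal beam search sketch over a fixed table of per-step log-probabilities; a real decoder would condition each step's distribution on the prefix, but the bookkeeping is the same (names are my own):

```python
import math

def beam_search(step_scores, beam_size=2):
    """Toy beam search. step_scores[t] maps token -> log-probability at
    step t (context-free here for brevity). Keeps the beam_size highest
    scoring prefixes at every step and returns the final beam."""
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for scores in step_scores:
        candidates = [(seq + [tok], lp + tok_lp)
                      for seq, lp in beams
                      for tok, tok_lp in scores.items()]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams
```

With beam size 1 this reduces to greedy decoding; the papers above probe why moderate beams outperform both greedy search and exact maximum-likelihood decoding in NMT.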
Neural MT with Diffusion Models
- *Energy-based Models
- Translation with DM, 2021
- Text Generation, 2021
- DiffuSeq, 2022
Back to the future
- Shannon’s Autoregressive Language Models, 1950
- ALPAC report, 1966, summary here
- Statistical Methods and Linguistics, 1995
- The Future of MT, seen from 1985
- MT in the USSR, 1984
- Early MT in Romania
- Soviet MT overview, Gordin, 2020
- Survey of MT in USSR, 2010