Machine Translation Course, Information, Yearly Results
Academic year 2022-2023
\[ T(f \rightarrow e) = \arg \max_{e} P(e)P(f|e) \]
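In this noisy-channel formulation, decoding searches for the target sentence e that maximizes the language model score P(e) times the reverse translation model score P(f|e). A minimal sketch of that argmax over a fixed candidate list, with made-up placeholder scoring functions rather than real models:

```python
import math

def rescore_noisy_channel(candidates, lm_logprob, tm_logprob, source):
    """Return argmax_e of log P(e) + log P(f|e) over candidate translations.

    lm_logprob and tm_logprob are placeholder scoring functions; in practice
    they would be a language model and a (reverse) translation model.
    """
    best, best_score = None, -math.inf
    for e in candidates:
        score = lm_logprob(e) + tm_logprob(source, e)
        if score > best_score:
            best, best_score = e, score
    return best

# Toy usage with made-up scores (not real models):
print(rescore_noisy_channel(
    ["the house is small", "the home is little"],
    lm_logprob=lambda e: -0.1 * len(e.split()),
    tm_logprob=lambda f, e: -abs(len(f.split()) - len(e.split())),
    source="das Haus ist klein",
))
```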
Schedule of MT classes 2022
Week | Date | Topic | Materials | Presenters |
---|---|---|---|---|
01 | 7. Oct. 2022 | Teaching | slides | |
02 | 14. Oct. 2022 | Teaching | reading | |
03 | 21. Oct. 2022 | Teaching | reading, MT metrics (video) | |
04 | 28. Oct. 2022 | Teaching | reading, *Seq2seq Models Tutorial | |
05 | 4. Nov. 2022 | Teaching | reading, *Step-by-step debug | |
06 | 11. Nov. 2022 | Human and Automatic Evaluation Metrics | 🤔 main paper; additional readings: BERTScore, WMT2020 Metrics, Significance | Rebeca Oprea (engineer), Teodor Dumitrescu (author), Chiruț Veronica (reviewer) |
07 | 18. Nov. 2022 | Data Acquisition | 🤔 main paper; additional important readings: training LASER, teacher-student, references | Ahmad Wali (engineer), Daniel Sava (author), Iordăchescu Anca (reviewer) |
08 | 25. Nov. 2022 | Language Models, Translation Models, Tokenizers | 🤔 main paper; additional readings: BPE dropout, references on language models, tokenizers | Stan Flavius (author), Bazavan Cristian (engineer), Blăgescu Alex (reviewer), Stegarescu Ana (visionary) |
09 | 2. Dec. 2022 | Neural MT, Attention, Multilinguality | 🤔 main paper; additional readings: Annotated Transformer, Illustrated Transformer, Lena Voita’s Tutorial | Ranete Cristian (reviewer), Nedelcu Mihai (visionary), Ilicea Anca (author), Mărilă Mircea (engineer) |
10 | 9. Dec. 2022 | Tokenizers, Transformers, Explainability | 🤔 main paper; additional readings: Visualizing Attention, Quantifying Attention Flow in Transformers | Bleoţiu Eugen (visionary), Antal Mihaela (reviewer), Zăvelcă Miruna (engineer), Dăscălescu Dana (author) |
11 | 16. Dec. 2022 | | 🤔 main paper; additional readings: ChatGPT blog, RLHF1, RLHF2, BLOOM 📖, Galactica | Istrati Lucian 📖 (engineer), Lazăr Dorian (author), Creanga Claudiu (reviewer), Aldea Gabriela (visionary) |
12 | 23. Dec. 2022 | Projects 🌲 | | |
13 | 13. Jan. 2023 | Projects | | |
14 | 20. Jan. 2023 | Projects | | |
Roles
Each person is assigned a role (almost randomly) and must prepare the reading from the Materials column of the row where their name appears. Materials will be announced shortly. Take these roles seriously: they account for half of your grade. Since December 2 falls on a public holiday, we can postpone the presentations to Thursday, December 8.
Author
Pretend you are the main author of the papers, prepare a presentation and talk about:
- Problem definition: present the problems the authors intend to solve, the context, and why it is important to address them.
- Methodology: present the methodology, the mathematical foundations; find an explanation for why this methodology is suitable for the problems at hand.
- Experimental findings: present the main results and how they compare with previous work.
- Go into as much detail as possible; the appendices of the paper should also be covered.
Scientific reviewer
You must make a critical evaluation of the paper, not necessarily a negative one; read the guidelines and examples from NIPS
- Summary and contributions: Briefly summarize the paper and its contributions
- Strengths: Describe the strengths of the work. Typical criteria include: soundness of the claims (theoretical grounding, empirical evaluation), significance and novelty of the contribution, and relevance to the Machine Translation community.
- Weaknesses: Explain the limitations of this work along the same axes as above.
- Correctness: Are the claims and method correct? Is the empirical methodology correct?
- Clarity: Is the paper well written?
- Additional feedback, comments, suggestions for improvement and questions for the authors
- Overall score
Engineer
Implement something related to the paper, either on the same dataset or on a new one; be prepared to share the code and some empirical intuition behind the paper.
- Reproducibility: If the original authors already provide the code, try to run it on a new dataset.
- Comments: Are there enough details to reproduce the major results of this work?
- Efficiency: Measure the time it takes to run the code and provide an assessment of how suitable the approach is for running at scale.
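For the efficiency point above, a minimal wall-clock timing sketch; `translate_batch` and the dummy batches are placeholders for whatever system you actually implement:

```python
import time

def time_translation(translate_batch, batches):
    """Wall-clock a translation callable over batches of increasing size."""
    for batch in batches:
        start = time.perf_counter()
        translate_batch(batch)
        elapsed = time.perf_counter() - start
        print(f"{len(batch):5d} sentences -> {elapsed:6.2f}s "
              f"({len(batch) / elapsed:.1f} sentences/s)")

# Toy usage: a dummy "system" that just upper-cases its input.
sentences = ["o propoziție de test"] * 1000
time_translation(lambda b: [s.upper() for s in b],
                 [sentences[:10], sentences[:100], sentences])
```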
Visionary
Propose a follow-up research project or a new application; take into account previous and ongoing work, as well as ethics and the socio-economic impact:
- Relation to prior work: Is it clearly discussed how this work differs from previous contributions?
- Have the authors adequately addressed the broader impact of their work, including potential negative ethical and societal implications of their work?
- Does the submission raise potential ethical concerns? This includes methods, applications, or data that create or reinforce unfair bias or that have a primary purpose of harm or injury. If so, please explain briefly.
Attendees
Everyone must ask a question at the end of the presentations to qualify as present. Attending all the presentations earns 1 bonus point at the end.
Lab Projects
- gather some colleagues and form a team of at most 3 people
- choose an MT topic that you would like to research (see the project list on the website or propose your own)
- make sure your topic does not overlap with topics that are already in progress or have been chosen by your colleagues
- email Sergiu to announce your team and your proposal, and to discuss how to approach it
- after you obtain the approval, mark it as being in progress on the kanban list and start working on it
- prepare the project, a presentation, and a report using this template
- place everything in a digital storage space: a git repo, a drive, a file on a server, etc.; don’t send large files by email, send only URLs
- current deadlines are December 23, January 13, and January 20
MT Bibliography (expanding…)
* marks blog posts, tutorials, visual explanations
Prerequisites: general ML concepts, blogs, tutorials
- Linear Algebra
- Multivariate Calculus
- Probability Course
- Beautiful Visualizations - Proba and Stats
- Bayesian Statistics
- Count Bayes Blog
- Five Minutes Stats
- Expectation Maximization (EM) Foundations
- EM for Gaussian Mixture Models
- Hidden Markov Models, EM, and Viterbi
- Information Theory, Entropy, KL-Divergence
- Monte Carlo / Metropolis
- DS Handbook
Prerequisites: general ML concepts, books
- Pattern Recognition And Machine Learning - huge book of the early 2000s, excellent coverage of probability distributions, graphical models, Bayesian inference
- Statistical Foundations of ML - course syllabus, good coverage of probabilities, statistical tests, general treatment of ML methods
- Information Theory, Inference, and Learning Algorithms - huge book of the early 2000s, excellent coverage of information theory and probabilistic inference
- Math for ML - when you want to start from the very basics of algebra, geometry, calculus, probabilities
- Probabilistic Machine Learning Book 1,2,3 - excellent coverage of foundations, and more advanced topics (like diffusion models)
- Deep Learning Book - good place to understand neural networks
Statistical Machine Translation & Language Models
- *Kevin Knight’s Workbook
- *Lena Voita’s explanations on LM
- Koehn’s SMT book (SMT from scratch)
- Kneser-Ney smoothing, 1995, *tutorial
- Och’s PhD thesis, 2002
- Mathematics of SMT, 2003
- N-gram Language Models, Jurafsky, SLP
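To make the n-gram entries above concrete, a toy bigram language model with add-one smoothing (a minimal sketch, not the Kneser-Ney estimator from the readings):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Build an add-one smoothed bigram model from tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def prob(prev, word):
        # Laplace smoothing; Kneser-Ney (from the readings) works better in practice.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

    return prob

p = train_bigram_lm([["the", "cat", "sat"], ["the", "dog", "sat"]])
print(p("the", "cat"), p("the", "fish"))
```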
Evaluation
- BLEU, Papineni et al., 2002
- Statistical significance tests, 2004
- Statistical significance tests of models’ correlation, 2014
- chrF++, 2015
- Comparison of metrics, Fomicheva & Specia, 2018
- A Call for Clarity in Reporting BLEU Scores, 2018, sacreBLEU (usage sketch at the end of this list)
- Good translation wrong in context, 2019
- BERTScore, 2020
- Scientific Credibility, 2021
- COMET, more recent paper 2022
- Lab Notebook: MT Evaluation
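A short usage sketch for the sacreBLEU entry above; the hypothesis and reference strings are made-up examples:

```python
from sacrebleu.metrics import BLEU, CHRF  # pip install sacrebleu

hypotheses = ["the cat sat on the mat"]            # made-up system output
references = [["the cat is sitting on the mat"]]   # list of reference streams

print(BLEU().corpus_score(hypotheses, references))
print(CHRF().corpus_score(hypotheses, references))
```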
Data Collection, Alignment
- *bitextor
- Parallel corpora for medium density languages, 2005 hunalign
- Word Alignment with Markov Chain Monte Carlo, 2016, efmaral
- Backtranslation, 2015
- Word Alignments Without Parallel Training Data, 2020, SimAlign
- Aligned segments from unclean parallel data, 2020
- Comparison of GIZA++ vs. Neural Word Alignment, 2020
- Massively Multilingual Sentence Embeddings, 2019
- Multilingual Sentence Embeddings, 2020
- Bitext Mining Using Distilled Sentence Representations, 2022, LASER
- MT for the Next 1000 Languages, 2022
- Lab Notebook: Using LASER
Neural MT with RNNs
- *Seq2seq Models With Attention
- *Seq2seq Models Tutorial
- *Another tutorial
- *Different attention types
- *Tutorial on training RNNs, 2002-2013
- Learning Long-Term Dependencies with Gradient Descent is Difficult, 1994
- LSTM, 1997
- Neural Probabilistic Language Model, 2003, also here
- Seq2seq learning with NNs, 2014
- RNN Encoder-Decoder, 2014
- Seq2seq with Attention, 2015
- More Types of Attention, 2015
- Lab Tutorial: Training an RNN seq2seq
Neural MT with CNNs
- Language Modeling with Gated Convolutional Networks, 2016
- Convolutional Sequence to Sequence Learning, 2017
- *Tutorial with code
- Lab Tutorial: Training a CNN seq2seq
Tokenizers
- *Byte Pair Encoding
- *Tokenizers
- *Understanding Sentencepiece
- *EM, Viterbi, Unigram LM
- Byte-Pair Encoding Compression, 1994
- Byte-Pair Encoding Tokenization, 2015
- Unigram LM Tokenizer, 2018
- sentencepiece library, 2018, code
- BPE Dropout, 2020
- Lab Tutorial: sentencepiece only
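A minimal sentencepiece training and encoding sketch for the entries above; the corpus path, model prefix, and vocabulary size are placeholder choices:

```python
import sentencepiece as spm  # pip install sentencepiece

# Train a small BPE model on a plain-text corpus, one sentence per line.
spm.SentencePieceTrainer.train(
    input="corpus.txt",       # placeholder path to your training text
    model_prefix="bpe_demo",
    vocab_size=8000,
    model_type="bpe",         # "unigram" gives the Unigram LM tokenizer instead
)

sp = spm.SentencePieceProcessor(model_file="bpe_demo.model")
print(sp.encode("This is a test sentence.", out_type=str))
print(sp.decode(sp.encode("This is a test sentence.")))
```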
*Transformers - Tutorials
- Illustrated Transformer
- Lena Voita’s Tutorial
- The Annotated Transformer
- Peter Bloem’s Tutorial
- Illustrated BERT
- Illustrated GPT-2
- Huggingface Transformers Tutorial
- Annotated GPT-2
- Transformer for people outside NLP
- E2ML School Tutorial
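The single equation at the heart of these tutorials is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a minimal PyTorch sketch with arbitrary toy shapes:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V, the core operation of the Transformer."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy shapes: batch 1, 5 query positions, 7 key/value positions, dimension 16.
q, k, v = torch.randn(1, 5, 16), torch.randn(1, 7, 16), torch.randn(1, 7, 16)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)  # torch.Size([1, 5, 16]) torch.Size([1, 5, 7])
```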
Transformers - Essential Readings
- Attention is all you Need, 2017
- BERT, 2019
- GPT-2, 2019
- WMT2021 Baselines and Models, 2021
- *Models in huggingface
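A rough usage sketch for the huggingface models listed above; Helsinki-NLP/opus-mt-en-ro is just one example checkpoint, not necessarily the one used in the course:

```python
from transformers import pipeline  # pip install transformers sentencepiece

# Helsinki-NLP/opus-mt-en-ro is one publicly available MT checkpoint;
# any translation model from the hub can be swapped in.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-ro")
print(translator("Machine translation classes are held on Fridays.",
                 max_length=64))
```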
Other Transformer Models
- GPT-3, 2020, open gpt flavors
- ELECTRA, 2019, hgfce
- RoBERTa, 2019, hgfce
- BART, 2020, hgfce
- mBART, 2020, hgfce
- *Reformer, 2020, hgfce
- T5, 2020, hgfce, hgfce
- M2M-100, 2021, model, hgfce
- Lab Tutorial: T5
Transformers and Explainability
- Visualizing Attention, 2019
- Is Attention Interpretable?, 2019
- Quantifying Attention Flow in Transformers, 2020
- Transformer Interpretability Beyond Attention, 2021, code
Machine Translation Frameworks
- Marian MT, 2018
- OpenNMT, 2017
- fairseq, 2019
- JoeyNMT, 2019
- Huggingface seq2seq
Extra Readings on Machine Translation
- Gender Bias in MT, 2019
- MT Domain Robustness, 2019
- Fixed Encoder Self-Attention Patterns, 2020
- Translationese, 2020
- Character-level NMT, 2021
Recent / Interesting Research
- Synchronous Bidirectional Beam Search, 2019
- Specialized Heads Do the Heavy Lifting, 2019, code, tutorial
- Transformer Circuits, 2021
- Why Beam Search Works, 2021
- What Works Best for Zero-Shot, 2022
- Contrastive Text Generation, 2022, code
- Induction Heads, 2022
- Wide Attention vs Depth, 2022
- The 48 params of BERT, 2022
- Mixture of Experts, 2022
- The Importance of Attention, 2022
Neural MT with Diffusion Models
- *Energy-based Models
- Translation with DM, 2021
- Text Generation, 2021
- DiffuSeq, 2022
Back to the future
- Shannon’s Autoregressive Language Models, 1950
- ALPAC report, 1966, summary here
- Statistical Methods and Linguistics, 1995
- The Future of MT, seen from 1985
- MT in the USSR, 1984
- Soviet MT overview, Gordin, 2020
- Survey of MT in USSR, 2010