No Language Left Behind | Hacker News

hello_im_angela 24 minutes ago

We release several smaller models as well: https://github.com/facebookresearch/fairseq/tree/nllb/exampl… at 1.3B and 615M parameters. These are usable on smaller GPUs. To create these smaller models while retaining good performance, we use knowledge distillation. If you’re curious to learn more, we describe the process and results in Section 8.6 of our paper: https://research.facebook.com/publications/no-language-left-…
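For anyone unfamiliar with the idea: in knowledge distillation a small "student" model is trained to reproduce the outputs of a large "teacher" model rather than learning from the references alone. Here is a minimal sketch of one common variant, word-level distillation, written against a generic PyTorch setup. This is not the actual fairseq/NLLB training code; the function name, tensor shapes, and hyperparameters (temperature, alpha) are illustrative only, and the details of what we actually do are in Section 8.6 of the paper.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, target_ids,
                          temperature=2.0, alpha=0.5, pad_id=0):
        # Shapes (illustrative): logits are (batch, seq_len, vocab),
        # target_ids is (batch, seq_len) with pad_id for padding.

        # Soft targets: per-token KL divergence between the teacher's and
        # the student's output distributions, softened by the temperature.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="none",
        ).sum(-1)

        # Hard targets: standard cross-entropy against the reference tokens.
        hard = F.cross_entropy(
            student_logits.view(-1, student_logits.size(-1)),
            target_ids.view(-1),
            ignore_index=pad_id,
            reduction="none",
        ).view_as(target_ids)

        # Mix the two losses and average over non-padding tokens. The
        # temperature**2 factor keeps the soft-target gradients on a
        # comparable scale to the hard-target gradients.
        mask = (target_ids != pad_id).float()
        loss = alpha * (temperature ** 2) * soft + (1 - alpha) * hard
        return (loss * mask).sum() / mask.sum()

The intuition is that the teacher's full output distribution carries more signal per token than the single reference word, which is part of why a much smaller student can stay close to the teacher's translation quality.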
