“Reformer: The Efficient Transformer”, Anonymous et al 2019 {G} [handling sequences up to L=64k on 1 GPU] : MachineLearning