The student of the now ubiquitous GPT-2 does not fall short of its teacher's expectations. Obtained by distillation, DistilGPT-2 weighs 37% less and is twice as fast as its OpenAI counterpart, while keeping the same generative power. It runs smoothly on an iPhone 7. The dawn of lightweight generative transformers? 🤯
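To get a feel for how easy it is to try, here is a minimal sketch that loads the model through the transformers library under the model id distilgpt2; the prompt and generation parameters are illustrative, not from the original announcement:

```python
from transformers import pipeline

# Load DistilGPT-2 via the text-generation pipeline;
# the weights are downloaded from the model hub on first use.
generator = pipeline("text-generation", model="distilgpt2")

# Illustrative prompt and sampling settings.
outputs = generator(
    "The dawn of lightweight generative transformers",
    max_length=40,
    num_return_sequences=1,
)
print(outputs[0]["generated_text"])
```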
From the paper "DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter" by Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. The same method was applied to distill GPT-2, and a Medium blog post describes the process in detail.
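At the heart of that method is a distillation loss that trains the student to match the teacher's soft output distribution. The PyTorch sketch below shows only that temperature-scaled KL term; the function name and temperature value are illustrative, and the paper combines this term with the usual language-modeling loss and a cosine embedding loss, omitted here for brevity:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target distillation term: KL divergence between the
    temperature-softened teacher and student distributions."""
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(
        student_log_probs, teacher_probs, reduction="batchmean"
    ) * temperature ** 2
```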