[Day 76] Finishing the Retrieval-based LM talk, and learning about distillation, quantization and pruning
Hello :) Today is Day 76!

Finished sections 6 and 7 of the retrieval-based LM tutorial (ACL 2023): multilingual retrieval-based LMs, and the challenges and opportunities of retrieval-based LMs. Also covered lecture 11 of CMU 11-711 Advanced NLP: Distillation, Quantization, and Pruning.

Lecture 11: Distillation, Quantization, and Pruning

Problem: the best models for NLP tasks are massive. So how can we deploy NLP systems cheaply, efficiently, and equitably, without giving up too much performance?

Answer: model compression

Quantization - keep the model the same but reduce the number of bits used to store its parameters
Pruning - remove parts of the model while retaining performance
Distillation - train a smaller model to imitate the larger model (sketch at the end of this section)

Quantization - no parameters are changed or removed; each is just stored at lower precision, up to k bits
Amongst other methods, we can use post-training quantization (sketch below)
We can even binarize the parameters and activations (sketch below)

Pruning - a number of parameters are set to zero, the rest are unchanged (sketch below)

There is Ma...
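Here's a minimal sketch of post-training quantization: a simple affine (min-max) scheme that maps a trained float32 weight tensor to k-bit integers plus a scale and zero point. The function names and the min-max calibration are my own illustration, not the lecture's exact recipe.

```python
import numpy as np

def quantize(w, k=8):
    """Affine post-training quantization of a float tensor to k-bit ints."""
    qmin, qmax = 0, 2**k - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = np.round(qmin - w.min() / scale)
    # uint8 storage assumes k <= 8
    q = np.clip(np.round(w / scale + zero_point), qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the quantized integers."""
    return scale * (q.astype(np.float32) - zero_point)

w = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize(w, k=8)
w_hat = dequantize(q, s, z)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

The savings come from storing (and, with integer kernels, computing with) the k-bit integers instead of 32-bit floats; no parameter is changed beyond this rounding.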
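Taking this to the extreme, binarization keeps only the sign of each weight plus a single scaling factor per tensor. A minimal sketch in the style of XNOR-Net; scaling by the mean absolute value is that paper's choice, brought in here as an assumption:

```python
import numpy as np

def binarize(w):
    # Keep only the sign of each weight, rescaled by alpha = mean(|w|);
    # this alpha minimizes the L2 error between w and its binary version.
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

w = np.random.randn(4, 4).astype(np.float32)
print(binarize(w))  # every entry is +alpha or -alpha
```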
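For pruning, one standard criterion is magnitude pruning: zero out the weights with the smallest absolute values. A minimal unstructured, per-tensor sketch (the threshold-picking details are my own):

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    # Unstructured magnitude pruning: zero out roughly the `sparsity`
    # fraction of weights with the smallest absolute values.
    k = min(int(sparsity * w.size), w.size - 1)
    threshold = np.partition(np.abs(w).ravel(), k)[k]  # k-th smallest |w|
    mask = (np.abs(w) >= threshold).astype(w.dtype)
    return w * mask, mask

w = np.random.randn(4, 4).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.5)
print("kept fraction:", mask.mean())
```

The surviving weights are left exactly as they were; only the masked ones become zero.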
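And for distillation, the classic recipe (Hinton et al., 2015) trains the student on a mix of the hard labels and the teacher's temperature-softened output distribution. A minimal PyTorch sketch; the temperature and mixing weight are illustrative defaults, not values from the lecture:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # KL term: match the teacher's softened distribution (scaled by T^2
    # to keep gradient magnitudes comparable across temperatures).
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    # CE term: still fit the hard labels directly.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```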