Scaling Language Model Size In Cross-device Federated Learning | Awesome LLM Papers


Jae Hun Ro, Theresa Breiner, Lara McConnaughey, Mingqing Chen, Ananda Theertha Suresh, Shankar Kumar, Rajiv Mathews · Proceedings of the First Workshop on Federated Learning for Natural Language Processing (FL4NLP 2022) · 2022

Most studies in cross-device federated learning focus on small models because of server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M-parameter Transformer and a 20.2M-parameter Conformer. These models achieve the same or better perplexity than a similarly sized LSTM with ~10× smaller client-to-server communication cost, and 11% lower perplexity than the smaller LSTMs commonly studied in the literature.
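As a rough illustration of how two of these techniques shrink client-to-server traffic, the Python sketch below quantizes a client's model update to a few bits per value and estimates upload size when only a fraction of the parameters is trained. It is a minimal sketch under stated assumptions, not the authors' implementation: the uniform stochastic quantizer, the helper names `quantize_update`, `dequantize`, and `upload_bytes`, and the 50%-trainable / 8-bit settings are all hypothetical choices for illustration. Only the 21M parameter count comes from the abstract.

```python
import random
from typing import List, Tuple


def quantize_update(update: List[float], bits: int,
                    rng: random.Random) -> Tuple[List[int], float, float]:
    """Hypothetical helper: uniform stochastic quantization of a client update.

    Each value is mapped onto 2**bits evenly spaced levels between the update's
    min and max. Rounding up with probability equal to the fractional part makes
    the dequantized value an unbiased estimate of the original.
    Returns (codes, lo, scale); the server needs lo and scale to dequantize.
    """
    lo, hi = min(update), max(update)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for x in update:
        pos = (x - lo) / scale              # position on the [0, levels] grid
        base = int(pos)                     # floor, since pos >= 0
        code = base + (1 if rng.random() < pos - base else 0)
        codes.append(min(code, levels))     # guard against float overshoot
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Server side: map integer codes back to approximate float values."""
    return [lo + c * scale for c in codes]


def upload_bytes(num_params: int, trainable_fraction: float, bits: int) -> float:
    """Approximate per-round upload: only trainable params, `bits` bits each.

    Ignores the small (lo, scale) metadata and any encoding overhead.
    """
    return num_params * trainable_fraction * bits / 8


if __name__ == "__main__":
    rng = random.Random(0)
    delta = [0.1, -0.2, 0.05, 0.3]          # toy client update
    codes, lo, scale = quantize_update(delta, bits=8, rng=rng)
    print(codes, dequantize(codes, lo, scale))

    # 21M parameters (from the abstract); 50% trainable and 8 bits are made up.
    full = upload_bytes(21_000_000, 1.0, 32)
    reduced = upload_bytes(21_000_000, 0.5, 8)
    print(full / reduced)                   # 8.0
```

Under these illustrative settings the upload is 8× smaller than sending every parameter as a 32-bit float. The abstract's ~10× figure comes from the authors' own combination of techniques, which this sketch does not reproduce. Stochastic rounding is used so that each dequantized update is unbiased in expectation, which keeps the server's averaged update unbiased as well.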
