[LG] Optimal Embedding Learning Rate in LLMs: The Effect of Vocabulary Size
[UC Berkeley & Microsoft Research]
https://arxiv.org/abs/2506.15025