Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"

License: MIT