Abstract: Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or ...