Attention Is All You Need
A PyTorch implementation of the original encoder-decoder Transformer architecture for neural machine translation, with multi-head attention, sinusoidal positional encoding, greedy decoding, training on the OPUS Books dataset, and CER/WER/BLEU evaluation.
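
As a rough illustration of the sinusoidal positional encoding mentioned above, the sketch below builds the encoding table from the formula in the original paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). It is written in plain Python so it runs without any dependencies. The function name `sinusoidal_positional_encoding` and its parameters are hypothetical and are not taken from this repository's code, which presumably builds the same table as a PyTorch tensor.

```python
import math
from typing import List


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Return a seq_len x d_model table of sinusoidal position encodings.

    Hypothetical helper for illustration only; not this repository's API.
    Even columns hold sines, odd columns hold cosines of the same angle.
    """
    if d_model % 2 != 0:
        # Assumption of this sketch: an even model dimension, as in the paper.
        raise ValueError("d_model must be even in this sketch")

    table = []
    for pos in range(seq_len):
        row = [0.0] * d_model
        # i walks over the even column indices, so i plays the role of 2i
        # in the paper's formula and the exponent is i / d_model.
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


# Small example: 4 positions, model dimension 8.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
```

Each row of `pe` would be added to the token embedding at the same position before the first encoder or decoder layer. Position 0 always encodes to alternating 0 and 1, since sin(0) = 0 and cos(0) = 1.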
