@wujiaju
2021-11-21T07:54:37.000000Z
Word count: 3064
Views: 475
2021
PostGraduate
You can click here to get the Chinese version.
# pip install jieba
You can refer to the PyTorch tutorial to get code samples for the attention-based machine translation model; the detailed steps are as follows:
Code sample of this experiment: github
Download the English-Chinese translation dataset and unzip it as ./data/eng-cmn.txt
Read the dataset line by line and, when constructing the training pairs, keep only the first two tab-separated fields of each line (dropping the attribution information); otherwise an error will be reported.
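A minimal sketch of this reading step, assuming the dataset uses the tab-separated Anki format (English, Chinese, then an attribution column); the function name `read_pairs` is illustrative:

```python
def read_pairs(path):
    """Read tab-separated translation pairs, keeping only the first two fields."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            # The third field (attribution info) is dropped; keeping it
            # would break the (source, target) pair format downstream.
            if len(fields) >= 2:
                pairs.append((fields[0], fields[1]))
    return pairs
```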
Split the training sentences into words and build word-index tables for the Chinese and English vocabularies in the dataset.
PS: setting reverse=False constructs an English-->Chinese translator; you can also construct a Chinese-->English translator if you want.
Build a machine translation model:
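A minimal sketch of the model, following the structure of the PyTorch tutorial's EncoderRNN and AttnDecoderRNN (GRU encoder, attention over encoder outputs padded to MAX_LENGTH); dropout and other refinements are omitted for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

MAX_LENGTH = 10  # same role as the tutorial's MAX_LENGTH hyper-parameter


class EncoderRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        # One word index in, one hidden state out (step-by-step encoding).
        embedded = self.embedding(input).view(1, 1, -1)
        output, hidden = self.gru(embedded, hidden)
        return output, hidden


class AttnDecoderRNN(nn.Module):
    def __init__(self, hidden_size, output_size, max_length=MAX_LENGTH):
        super().__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.attn = nn.Linear(hidden_size * 2, max_length)
        self.attn_combine = nn.Linear(hidden_size * 2, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        # Attention weights over the encoder time steps.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))
        # Combine the embedded input word and the attended context.
        output = torch.cat((embedded[0], attn_applied[0]), 1)
        output = self.attn_combine(output).unsqueeze(0)
        output, hidden = self.gru(F.relu(output), hidden)
        output = F.log_softmax(self.out(output[0]), dim=1)
        return output, hidden, attn_weights
```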
Define the loss function and train the machine translation model.
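The tutorial trains with NLLLoss on the decoder's log-softmax output, optimized with SGD. A self-contained sketch of one training step; the linear layer and the dummy sizes stand in for the real decoder and vocabulary:

```python
import torch
import torch.nn as nn

vocab_size, hidden_size = 6, 8                   # dummy sizes for illustration
decoder_out = nn.Linear(hidden_size, vocab_size)  # stands in for the decoder
optimizer = torch.optim.SGD(decoder_out.parameters(), lr=0.01)
criterion = nn.NLLLoss()  # expects log-probabilities plus target word indices

hidden = torch.randn(1, hidden_size)
log_probs = torch.log_softmax(decoder_out(hidden), dim=1)
target = torch.tensor([3])  # gold word index for this time step

loss = criterion(log_probs, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the real loop this step is repeated over every decoder time step of every training pair, accumulating the loss before the backward pass.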
Evaluate the trained model using the BLEU score. More details can be found in nltk.
# pip install nltk
from nltk.translate.bleu_score import sentence_bleu
# each reference and the hypothesis are lists of tokens, not raw strings
bleu_score = sentence_bleu([reference1, reference2, reference3], hypothesis1)
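A concrete, runnable instance of the call above with hypothetical token lists; an exact match over all n-gram orders scores 1.0:

```python
from nltk.translate.bleu_score import sentence_bleu  # pip install nltk

# References and hypothesis are token lists (e.g. produced by jieba or split()).
reference = ["the", "cat", "is", "on", "the", "mat"]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

# sentence_bleu takes a list of references and one hypothesis.
bleu_score = sentence_bleu([reference], hypothesis)  # 1.0 for an exact match
```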
[Optional 1] You can adjust the hyper-parameters, such as MAX_LENGTH, n_iters, hidden_size, and so on.
[Optional 2] Divide the dataset into training/test splits yourself; the recommended ratio is 7:3.
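A simple sketch of the 7:3 split; the fixed seed is an assumption added so the split is reproducible across runs:

```python
import random


def split_pairs(pairs, train_ratio=0.7, seed=0):
    """Shuffle the pairs with a fixed seed, then cut at the given ratio."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]
```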
[Optional 3] You can explore and use the Transformer on your own; you can also refer to The Annotated Transformer blog and github. (PS: you need to process the English-Chinese translation dataset by yourself.)
Finish the experiment report according to the experimental results. The template of the report can be found here.