@mymy
2020-12-10T09:41:48.000000Z
machine-learning
experiments
Complete the experiment in groups.
Sample code for the speech synthesis model is available at the link; the detailed steps are as follows:
Download the LJSpeech dataset and unzip it to ./data/LJSpeech-1.1.
Extract ground-truth Mel-spectrograms from the audio.
python extract_mel_spec.py -i data/LJSpeech-1.1/wavs -o data/LJSpeech-1.1-spectrogram -n 16
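As a rough illustration of what this extraction step computes, here is a minimal NumPy sketch of a log-Mel spectrogram. It is not the code in extract_mel_spec.py. The parameter values (22,050 Hz sample rate, 1024-point FFT, hop of 256 samples, 80 Mel bands) and all function names are assumptions chosen for illustration, and the repository's script may use different settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular Mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 band edges, evenly spaced on the Mel scale
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

# Default values below are assumptions, not the repository's settings.
def log_mel_spectrogram(wav, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Return a log-Mel spectrogram of shape (n_mels, n_frames)."""
    wav = np.asarray(wav, dtype=np.float64)
    pad = n_fft // 2
    wav = np.pad(wav, (pad, pad), mode="reflect")   # center frames on samples
    n_frames = 1 + (wav.size - n_fft) // hop
    window = np.hanning(n_fft)
    # Build an index matrix so each row selects one overlapping frame
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wav[idx] * window
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ magnitude.T
    return np.log(np.clip(mel, 1e-5, None))          # clip avoids log(0)
```

Calling `log_mel_spectrogram(wav)` on a mono waveform array returns one column per frame, which is the kind of target a Tacotron2-style model is trained to predict.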
Train Tacotron2 from scratch.
# For Single GPU training
CUDA_VISIBLE_DEVICES=7 python train.py save_dir ckpt/v2 batch_size 48 val_epoch 20 audio_root data/LJSpeech-1.1/wavs mel_spectrogram_root data/LJSpeech-1.1-spectrogram
# For Multi GPU training
CUDA_VISIBLE_DEVICES=6,7 python distributed.py save_dir ckpt/v1 batch_size 48 val_epoch 20 audio_root data/LJSpeech-1.1/wavs mel_spectrogram_root data/LJSpeech-1.1-spectrogram
Inference
CUDA_VISIBLE_DEVICES=6 python inference.py checkpoint_path ckpt/v1/model_00052130
Visualize the test results and organize the experiment results into an experiment report. (The report template will be included in the example repository.)
[Optional 1] You can adjust the hyper-parameters, such as encoder_kernel_size and the learning rate.
[Optional 2] Plot the Mean Opinion Score (MOS) against the number of training iterations.
[Optional 3] Interested students can explore using WaveGlow to reconstruct the waveform from the predicted Mel-spectrogram. We recommend referring to the WaveGlow repository.
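For [Optional 2], one possible way to organize the ratings is sketched below: listeners score audio from several checkpoints on a 1 to 5 scale, and each checkpoint gets a mean score with an approximate 95% confidence interval. The function name and the ratings are hypothetical placeholders, not measured results, and the normal approximation (z = 1.96) is an assumption that is only rough for small listener panels.

```python
import math
from statistics import mean, stdev

def mos_with_ci(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mean(ratings), half_width

# Hypothetical listener ratings (1-5), keyed by training iteration.
# Replace these placeholders with the scores you actually collect.
ratings_by_iteration = {
    10000: [2, 3, 2, 3, 2],
    30000: [3, 4, 3, 4, 3],
    50000: [4, 4, 5, 4, 4],
}

curve = {it: mos_with_ci(r) for it, r in sorted(ratings_by_iteration.items())}
for it, (mos, ci) in curve.items():
    print(f"{it:>6d} iterations: MOS {mos:.2f} +/- {ci:.2f}")
```

The resulting (iteration, MOS, interval) triples can then be drawn with any plotting tool, for example as a line plot with error bars.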
Extension suggestions:
[Extension 1] Students can train a speech synthesis model for Chinese.
[Extension 2] Interested students can design their own software (a simple mobile app or web page). For example, the page has an input box and a play button: the user types some English words, clicks the button, and hears the corresponding audio.
[Extension 3] Students can also show some failure cases, such as words that are skipped or repeated.
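For [Extension 3], skipped or repeated words often show up as jumps in the attention alignment between output frames and input symbols. The sketch below is one heuristic way to flag such jumps. It assumes you can obtain the alignment as a matrix of shape (decoder steps, encoder steps); the function name and the `max_jump` threshold are hypothetical, and the example matrix is synthetic.

```python
import numpy as np

def diagnose_alignment(attention, max_jump=3):
    """Heuristically flag skips and repeats in an attention alignment.

    attention: array of shape (decoder_steps, encoder_steps).
    Returns (skipped, repeated), each a list of
    (decoder_step, from_position, to_position) tuples.
    """
    path = np.asarray(attention).argmax(axis=1)  # most-attended input per frame
    steps = np.diff(path)
    # A large forward jump suggests input symbols were skipped.
    skipped = [(int(t) + 1, int(path[t]), int(path[t + 1]))
               for t in np.flatnonzero(steps > max_jump)]
    # A large backward jump suggests part of the input was read again.
    repeated = [(int(t) + 1, int(path[t]), int(path[t + 1]))
                for t in np.flatnonzero(steps < -max_jump)]
    return skipped, repeated

# Synthetic example: attention jumps from input position 2 to 10 at frame 3.
toy = np.eye(20)[[0, 1, 2, 10, 11, 12]]
print(diagnose_alignment(toy))  # ([(3, 2, 10)], [])
```

Flagged frames are only candidates: listening to the audio around them is still needed to confirm that a word was actually dropped or repeated.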
Item | Proportion | Description |
---|---|---|
Attendance | 40% | Request leave in advance if you have a time conflict |
Code availability | 20% | Code compiles and runs successfully |
Report | 30% | Follows the report template |
Code specification | 10% | Mainly whether variable names are readable |
Any suggestions or ideas are welcome; please discuss them with the teaching assistant in the QQ group.
[1] http://fancyerii.github.io/dev287x/ssp
[2] Speech Signal Processing for Machine Learning
[3] Shen J, Pang R, Weiss R J, et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions. In ICASSP, 2018.
[4] Prenger R, Valle R, Catanzaro B. WaveGlow: A Flow-based Generative Network for Speech Synthesis. In ICASSP, 2019.