[Open-Source Project] Flow Matching for Speech Synthesis

Published: December 22, 2023

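Conditional flow matching (CFM) is a recent technique that has been shown to improve on diffusion models, and Meta's Voicebox model brought it to speech synthesis.

[Figure: Voicebox workflow diagram]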

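Matcha-TTS is described as the first open-source speech synthesis project built on conditional flow matching. It provides models pre-trained on the LJSpeech and VCTK datasets for evaluation.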

Matcha-TTS makes two main contributions, along with several other design choices:

1. We propose an improved encoder-decoder TTS architecture that uses a combination of 1D CNNs and Transformers in the decoder. This reduces memory consumption and is fast to evaluate, improving synthesis speed.

Compared with the Grad-TTS decoder, it replaces the 2D CNNs with 1D CNNs and adds Transformer blocks.

2. We train these models using optimal-transport conditional flow matching (OT-CFM), which is a new method to learn ODEs that sample from a data distribution. Compared to conventional CNFs and score-matching probability flow ODEs, OT-CFM defines simpler paths from source to target, enabling accurate synthesis in fewer steps than DPMs.

In other words, flow matching is used to speed up synthesis.
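To make the OT-CFM idea in contribution 2 more concrete, here is a minimal pure-Python sketch of the straight-line path, its regression target, and a fixed-step Euler sampler. It is an illustration under stated assumptions, not the Matcha-TTS implementation: the helper names (`ot_cfm_pair`, `cfm_loss`, `euler_sample`), the value of `SIGMA_MIN`, the toy 4-dimensional "frame", and the `oracle` stand-in for a trained network are all hypothetical.

```python
import random

SIGMA_MIN = 1e-4  # assumed small constant; this post does not give a value


def ot_cfm_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, u_t): a point on the OT-CFM path and its regression target."""
    # Path: x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    x_t = [(1.0 - (1.0 - sigma_min) * t) * a + t * b for a, b in zip(x0, x1)]
    # Target vector field: u_t = x1 - (1 - sigma_min) * x0 (independent of t)
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(predict, x0, x1, t, sigma_min=SIGMA_MIN):
    """Mean squared error between a predicted vector field and the OT-CFM target."""
    x_t, u_t = ot_cfm_pair(x0, x1, t, sigma_min)
    v = predict(x_t, t)
    return sum((p - u) ** 2 for p, u in zip(v, u_t)) / len(u_t)


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        v = vector_field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]  # noise sample (source)
x1 = [0.5, -1.0, 2.0, 0.0]                    # toy stand-in for one mel frame (target)

# Hypothetical "perfect" model for this single (x0, x1) pair:
# it returns the OT-CFM target exactly, whatever x and t are.
_, target = ot_cfm_pair(x0, x1, t=0.0)


def oracle(x, t):
    return target


few = euler_sample(oracle, x0, n_steps=2)
many = euler_sample(oracle, x0, n_steps=50)
print("loss of oracle:", cfm_loss(oracle, x0, x1, t=0.3))
print("2 steps :", [round(v, 4) for v in few])
print("50 steps:", [round(v, 4) for v in many])
```

For a fixed pair the target does not depend on t, so the path is a straight line and two Euler steps land on the same point as fifty. A trained network only approximates the field averaged over many such pairs, so a real sampler would still be expected to need several steps.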

3. Uses rotational position embeddings (RoPE), which reduce memory use.

4. Uses monotonic alignment search (MAS) for alignment.

5. Uses the snake beta activation function.
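As a rough illustration of item 5, the sketch below assumes the commonly used snake beta form, x + (1/beta) * sin^2(alpha * x), where alpha controls the frequency of the periodic term and beta its magnitude. In a real network these would be learned per channel; here they are plain floats. The function name `snake_beta`, its default values, and the ReLU comparison are assumptions for demonstration, not code from Matcha-TTS.

```python
import math


def snake_beta(x, alpha=1.0, beta=1.0, eps=1e-9):
    """Snake beta activation: x + (1 / beta) * sin^2(alpha * x).

    alpha sets the frequency of the periodic term, beta its magnitude.
    eps guards against division by zero when beta is very small.
    """
    return x + (1.0 / (beta + eps)) * math.sin(alpha * x) ** 2


def relu(x):
    """Plain ReLU, included only for side-by-side comparison."""
    return max(0.0, x)


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.3f}  snake_beta={snake_beta(x):.3f}")
```

Unlike ReLU, the function keeps a linear trend for negative inputs and adds a non-negative periodic term, which is the usual motivation given for using it with audio signals.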

Source code:

https://github.com/shivammehta25/Matcha-TTS

Demo page:

https://shivammehta25.github.io/Matcha-TTS/

Online inference:

https://huggingface.co/spaces/shivammehta25/Matcha-TTS

Chinese implementation:

https://github.com/PlayVoice/Grad-TTS-Chinese

(Grad-TTS-CFM; the other optimizations have not been integrated yet)

[Figure: model architecture]

[Figure: performance metrics]

[Figure: inference interface]

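Chinese test sentence:

时光仿佛有穿越到了从前，在你诗情画意的眼波中，在你舒适浪漫的暇思里，我如风中的思绪徜徉广阔天际，仿佛一片沾染了快乐的羽毛，在云环影绕颤动里浸润着风的呼吸，风的诗韵，那清新的耳语，那婉约的甜蜜，那恬淡的温馨，将一腔情澜染得愈发的缠绵。

(Synthesized with Grad-TTS-CFM and the BigVGAN universal vocoder; optimizations 1, 3, and 5 have not been integrated yet, and there are still obvious pronunciation errors.)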

Source: https://blog.csdn.net/weixin_48827824/article/details/129791852