I am using a microphone as the input for real-time audio. How do I extract the phoneme currently being spoken from the audio? I need it for lip-syncing 2D characters.
Basically, my approach would be to:
- Fetch the real-time audio using a microphone
- Detect the current phoneme that is being pronounced from the audio.
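"Real-time" in step 1 usually means processing the microphone stream in short, fixed-size frames instead of whole recordings. The following is a minimal stdlib-only sketch of that framing step. It assumes 16 kHz mono 16-bit PCM input and a hypothetical 20 ms frame length. The actual microphone capture is not shown: any iterable of raw byte chunks stands in for it.

```python
from typing import Iterable, Iterator

SAMPLE_RATE = 16_000   # assumed: 16 kHz mono input
BYTES_PER_SAMPLE = 2   # assumed: 16-bit signed PCM
FRAME_MS = 20          # hypothetical frame length in milliseconds


def frame_bytes(sample_rate: int = SAMPLE_RATE, frame_ms: int = FRAME_MS) -> int:
    """Number of bytes in one frame of mono 16-bit PCM."""
    return sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE


def frames(chunks: Iterable[bytes], size: int) -> Iterator[bytes]:
    """Re-slice arbitrarily sized microphone chunks into fixed-size frames.

    Leftover bytes stay buffered until the next chunk arrives, so a
    downstream phoneme detector always receives frames of equal length.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= size:
            yield bytes(buffer[:size])
            del buffer[:size]
```

In a real application, `chunks` would come from your audio library's read loop, and each frame would be passed to whatever phoneme detector you end up using.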
I have tried looking everywhere for an example or library that could solve this type of problem. Most libraries don't seem to output phonemes from audio.
There is a website that explains how machine learning was used to solve this, but it comes without any code or tutorial on how to do it: https://www.arxiv-vanity./papers/1910.08685/
There is also this cool speech recognition tool called Pocketsphinx, but I haven't been able to find an example of it doing phoneme recognition yet.
Asked May 18, 2023 at 21:20 by NectoJ (edited May 18, 2023 at 21:46)
1 Answer
Answer (score 5): The way I would approach this is to get the words from the audio using Whisper or a similar speech-to-text (STT) service (the Python SpeechRecognition library is the go-to at the moment), and then use the CMU Dict library to look up the phonemes for each word.
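The second half of that pipeline (transcript words looked up in the CMU dictionary) can be sketched roughly as follows. To keep it self-contained, `CMU_EXCERPT` is a tiny hand-copied excerpt standing in for the full CMU dictionary. The transcript string stands in for Whisper or SpeechRecognition output. Stress digits on vowels are stripped on the assumption that lip-sync only needs the phoneme identity.

```python
import re

# Tiny hand-copied excerpt of the CMU Pronouncing Dictionary (ARPAbet with
# stress digits). A stand-in for loading the full dictionary.
CMU_EXCERPT = {
    "this": ["DH", "IH1", "S"],
    "that": ["DH", "AE1", "T"],
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}


def strip_stress(phone: str) -> str:
    """Drop the trailing stress marker (0, 1 or 2) from an ARPAbet vowel."""
    return phone.rstrip("012")


def transcript_to_phonemes(transcript: str, lexicon=CMU_EXCERPT) -> list[tuple[str, list[str]]]:
    """Split STT text into words and look each one up in the lexicon.

    Words missing from the lexicon get an empty list; a real system would
    need some fallback for them (for example a letter-to-sound model).
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    return [(w, [strip_stress(p) for p in lexicon.get(w, [])]) for w in words]
```

For example, `transcript_to_phonemes("Hello, world!")` yields the phonemes `HH AH L OW` for "hello" and `W ER L D` for "world". Because the lookup happens per word, the result is only as "real-time" as the STT step that produces the words.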
The phonemes are given in the CMU dictionary's ARPAbet notation, for example DH for the ð phoneme (the voiced th sound in "this" and "that"). That is, they are not given in IPA, so you may need another layer if you need the phonemes in IPA format. If you do, consider the IPA2 library.
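For the common cases, converting ARPAbet to IPA is mostly a symbol-for-symbol substitution. The sketch below is a hand-written, deliberately partial table covering only a few symbols. It is not the IPA2 library's API. It follows the usual convention of writing unstressed AH and ER (stress digit 0) as ə and ɚ.

```python
# Partial, hand-written ARPAbet -> IPA table (only a handful of symbols).
ARPABET_TO_IPA = {
    "DH": "ð", "TH": "θ", "S": "s", "T": "t", "D": "d", "L": "l",
    "W": "w", "HH": "h", "IH": "ɪ", "AE": "æ", "OW": "oʊ",
    "AH": "ʌ", "ER": "ɝ",
}
# Unstressed variants that are conventionally written differently in IPA.
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}


def arpabet_to_ipa(phones: list[str]) -> str:
    """Convert ARPAbet phones (stress digits optional) to an IPA string.

    Raises KeyError for symbols missing from this partial table.
    """
    out = []
    for phone in phones:
        base = phone.rstrip("012")
        stress = phone[len(base):]
        if stress == "0" and base in UNSTRESSED:
            out.append(UNSTRESSED[base])
        else:
            out.append(ARPABET_TO_IPA[base])
    return "".join(out)
```

A full converter needs the complete ARPAbet inventory, which is where a dedicated library saves work.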
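Tags: javascript. Original title: javascript - Converting real-time audio to phonemes - Stack Overflow
Copyright notice: this content was contributed by users, and the views are solely the author's. To republish, contact the author and credit the source: http://it.en369.cn/questions/1745571259a2156751.html. The site only provides information storage, claims no ownership, and assumes no legal liability. Any content found to be plagiarized, infringing, or unlawful will be removed once verified.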