Machine Learning in Linux: Bark - Text-Prompted Generative Audio
One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored quite a few hugely impressive web frontends such as Easy Diffusion, InvokeAI, and Stable Diffusion web UI.
Extending this theme to audio, step forward Bark, a transformer-based text-to-audio model. The software can generate realistic multilingual speech from text, as well as other audio, including music, background noise, and simple sound effects. The model also generates nonverbal sounds such as laughing, sighing, crying, and hesitations.
Bark follows a GPT-style architecture. It is not a conventional text-to-speech model, but a fully generative text-to-audio model that can deviate in unexpected ways from any given script.
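To give a feel for what text prompting looks like in practice, here is a minimal Python sketch. It assumes the `bark` package and `scipy` are installed, and that the bracketed cue strings match those listed in the Bark README; `build_prompt`, `synthesize`, and the output filename are hypothetical names chosen for illustration. The first run downloads the model weights, and because the model is generative, the audio may not follow the prompt exactly.

```python
# Cue strings assumed from the Bark README; check the project docs before relying on them.
NONVERBAL_CUES = {
    "laugh": "[laughs]",
    "sigh": "[sighs]",
    "hesitate": "...",
}


def build_prompt(text, cue=None):
    """Append an optional nonverbal cue to a text prompt (hypothetical helper)."""
    if cue is None:
        return text
    return f"{text} {NONVERBAL_CUES[cue]}"


def synthesize(prompt, out_path="bark_out.wav"):
    """Generate audio for the prompt with Bark and write it to a WAV file.

    Imports are deferred so the prompt helper works without Bark installed.
    """
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()                  # downloads and caches the models on first use
    audio = generate_audio(prompt)    # NumPy array of audio samples
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path


if __name__ == "__main__":
    synthesize(build_prompt("Hello from Linux.", cue="laugh"))
```

Keeping the prompt construction separate from the synthesis step makes it easy to experiment with cues without loading the model each time.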