Machine Learning in Linux: Bark - Text-Prompted Generative Audio

Posted by sde on Jun 21, 2023 3:28 PM EDT
LinuxLinks.com; By LinuxLinks

One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored several impressive web frontends for it, including Easy Diffusion, InvokeAI, and Stable Diffusion web UI.

Bark extends this theme to audio. It is a transformer-based text-to-audio model that can generate realistic multilingual speech as well as other audio, including music, background noise, and simple sound effects, from text. The model can also produce nonverbal communication such as laughing, sighing, crying, and hesitations.

Bark follows a GPT-style architecture. It is not a conventional text-to-speech model but a fully generative text-to-audio model, so its output can deviate in unexpected ways from any given script.
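To make the text-prompted workflow concrete, here is a minimal Python sketch. It assumes the `bark` package from the project's repository and `scipy` are installed, and that model weights can be downloaded on first use. The helper names `build_prompt` and `synthesize` and the output filename are hypothetical, chosen for illustration. The bracketed cue syntax (for example `[laughs]`) follows the project's README as we understand it and is not guaranteed to be honoured, since the model can deviate from the script.

```python
# Sketch only: assumes `pip install git+https://github.com/suno-ai/bark.git scipy`.
# build_prompt and synthesize are hypothetical helpers, not part of Bark itself.

def build_prompt(text, cues=()):
    """Append bracketed nonverbal cues such as "[laughs]" to the prompt text."""
    tags = " ".join(f"[{cue}]" for cue in cues)
    return f"{text} {tags}".strip()


def synthesize(prompt, out_path="bark_out.wav"):
    """Generate audio for the prompt and write it to a WAV file."""
    # Imported lazily so build_prompt can be used without Bark installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()                  # downloads and caches model weights
    audio = generate_audio(prompt)    # NumPy array of audio samples
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path


# Example usage (requires Bark and a model download, so left commented out):
# synthesize(build_prompt("Hello, this was generated from text.", cues=["laughs"]))
```

Because generation is sampled rather than deterministic, running the same prompt twice may give noticeably different audio.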
