Exploring text-to-audio models for creating music from scratch

December 7, 2022

Text-to-audio models for creating music from scratch (#ASA183): an algorithm for converting text prompts into audio. Credit: Zach Evans

Type a few words into a text-to-image model and you get a completely unique, uncannily accurate picture. While such a tool is fun to use, it also opens up avenues of creative exploration and application, and gives visual artists and filmmakers a way to enhance their workflows. For musicians, sound designers, and other audio professionals, a text-to-audio model would do the same.

As part of the 183rd Meeting of the Acoustical Society of America, Zach Evans of Stability AI presented progress toward this goal in his talk, “Musical audio samples generated from joint text embeddings.”

“Text-to-image models use deep neural networks to generate original, novel images based on learned semantic correlations with text captions,” says Evans. Such models can also be used to modify user-supplied images.

A text-to-audio model can do the same, but with music as the end result. Among other applications, it could be used to create sound effects for video games or samples for music production.

But training these deep learning models is more difficult than training their image counterparts.

“One of the main challenges when training a text-to-audio model is finding a large enough dataset of text-aligned audio to train on,” says Evans. “Outside of speech data, the research datasets available for text-aligned audio tend to be much smaller than those available for text-aligned images.”

Evans and his team, including Scott Hawley of Belmont University, have had early success in generating coherent and relevant music and sound from text. They used data compression methods to generate audio with reduced training time and improved output quality.

The researchers plan to expand to larger datasets and release their model as an open-source option for other researchers, developers, and audio professionals to use and improve.

More information:
Conference: acousticalsociety.org/asa-meetings/

Provided by
Acoustical Society of America

Citation: Exploring text-to-audio models for creating music from scratch (2022, December 7) retrieved December 7, 2022 from https://techxplore.com/news/2022-12-exploring-text-to-audio-music.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.