The situation with music generators is a bit depressing: for all their capabilities, they attract far less community interest than image generators. When Stable Diffusion appeared two years ago, there was a noticeable explosion in the open-source world. New tools for working with images, new ways of generating them, new models, and algorithms optimised, accelerated, and adapted for new purposes appeared quickly and keep appearing. Users built all of this for themselves and for others, for free. For audio, nothing comparable has happened.
No thriving communities, no steady stream of improvements, almost no integration with other existing tools. Even though sound generation now relies heavily on earlier developments in image and video generation, it is evolving so slowly that it raises a question for me: does anyone really care about sound?
Obviously, pictures are easier to see as a finished product: they are easier to evaluate, easier to share, and their commercial applications are more obvious, so they are easier to sell. At least, people think so; in practice they mostly generate wild kitsch and pornography (which, as with the spread of cheap VHS, has become one of the drivers of rapid AI progress). This won't be the case with sound, of course.
Sound needs to be loved and understood. Sound should be assembled from several tracks, preferably in stereo. A sound generator needs to be trained on other high-quality sound, and most people don't have access to multitracks. And then you need to use it as a…what? Too many problems.
• • •