Post by bossanova on Nov 14, 2024 14:50:33 GMT -6
I just listened to my first AI “Jazz” album on Spotify after one got recommended to me. Well, parts of it anyway.
It’s about 50 tracks spread over 2 1/2 hours, all with generic titles like “Harlem Mist Blues,” “Velvet Nightfall,” “Rhythm in Red,” “Solitude in C Minor,” etc. The last one isn’t even in C minor! It’s in F minor! And it’s up-tempo!
There is no stylistic consistency whatsoever. One track will be smooth jazz, the next track will be something that’s supposed to sound like 1950s bebop complete with a “tape” sound, then jazz funk, and so on.
There are many weird things about it, but the strangest one to me is a high-frequency distortion that “tracks” along with each individual note on some of the lead instruments. It’s hard to describe, but it’s like the old phenomenon you could get from sampling a single note with a lot of hiss and pitching it up and down to cover an entire melodic range, only in this case it sounds like low-bitrate MP3 distortion. I’m not sure how these programs even work when it comes to generating the individual instrument parts (does it “chart” each of them from a bank of samples using something like MIDI and then generate and mix them?), but there seems to be a significant bottleneck in the quality of certain sounds.
Overall, it just feels like junk. There’s not much soloing due to the length of the tracks as much as anything else, and the few tracks I listened to relied more on unison riffing and multiple statements of the melody with a partial chorus of generic licks thrown in there.
If you hunt for the copyright at the bottom of the album entry it’s just one guy, and he’s released albums under at least two different, generic “brands”. (I somehow can’t bring myself to call it a pseudonym or a stage name.)
The good news is that I don’t think any living, breathing musicians have to worry about this eclipsing the quality of their music. The bad news is that there’s just so much of it, like the Tribbles filling up the engine room on the Enterprise.
Post by Johnkenn on Nov 14, 2024 14:58:56 GMT -6
Wonder if this dude is making money with people spinning this stuff to see what it sounds like. Shit...maybe I need to just flood the market with tens of thousands of shitty country songs.
Post by bossanova on Nov 14, 2024 16:38:06 GMT -6
Wonder if this dude is making money with people spinning this stuff to see what it sounds like. Shit...maybe I need to just flood the market with tens of thousands of shitty country songs.

I dunno... He got maybe 10 minutes of listening time out of me, and effectively burned that bridge going forward. The more I think about it, the people who have the most to lose from this are the musicians and producers on those generic "studio" jazz albums and compilations where you have competent players who are anonymous half the time. The kind of album that's meant to fill a couple hours at a restaurant or a coffee shop. I'm guessing streaming services had nearly killed those off already but this is just another nail in the coffin.
Post by Johnkenn on Nov 14, 2024 16:52:36 GMT -6
I just wonder if it's just a matter of time before it's indistinguishable.
Post by copperx on Nov 14, 2024 17:09:10 GMT -6
I just wonder if it's just a matter of time before it's indistinguishable.

The technology is there. Suno's AI is generating the audio, and that has artifacts. But it could just as easily generate MIDI, feed it to a few VSTis, and be indistinguishable. Except for the vocals.
Post by damoongo on Nov 14, 2024 17:16:29 GMT -6
The technology is there. Suno's AI is generating the audio, and that has artifacts. But it could just as easily generate MIDI, feed it to a few VSTis, and be indistinguishable. Except for the vocals.

That's not how AI works. (That's just lazy, bad, modern music production in a nutshell.) It could also just assemble loops in GarageBand. But that's not how it works. Idiots who use the service need to be able to "create" it from a one-sentence prompt.
Post by bossanova on Nov 14, 2024 17:43:13 GMT -6
The technology is there. Suno's AI is generating the audio, and that has artifacts. But it could just as easily generate MIDI, feed it to a few VSTis, and be indistinguishable. Except for the vocals.

To me it's a shift more akin to a guy with a Korg workstation and a sequencer (or MIDI studios before that) making electronic lounge music that filled the space once occupied by the composers, arrangers, and orchestra players who recorded "mood music" and Muzak albums. It's one form of sonic wallpaper (and hey, sometimes I like sonic wallpaper) replacing another without being a threat to the more ambitious forms of those genres (orchestral music, electronic music). That doesn't change the fact that those particular gigs dried up for orchestra players as a result, just as "ambient"/library jazz and electronic musicians have the most to lose from the current state of AI music. Hans Zimmer (/his 18 ghost composers at RCP), Pat Metheny, and Brad Mehldau are probably still safe.

[Sorry if that's rambling. A Paul McCartney concert and two (short) international flights over two days have me feeling like I may have picked up a bug somewhere along the way.]
Post by copperx on Nov 14, 2024 17:43:41 GMT -6
That's not how AI works. (That's just lazy, bad, modern music production in a nutshell.) It could also just assemble loops in GarageBand. But that's not how it works. Idiots who use the service need to be able to "create" it from a one-sentence prompt.
What do you mean? AI can certainly generate MIDI. That's not how Suno works, but if you needed to generate music without artifacts, it could certainly be done today.
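To make the non-AI half of that concrete, here's a rough sketch of writing a generated part out as a standard MIDI file that any VSTi host could then render to clean audio. I'm using the mido library purely as an example, and the notes are placeholders; the model's only job would be choosing them.

import mido
from mido import Message, MidiFile, MidiTrack

# Build a one-track MIDI file. A generative model would supply the
# note list; a VSTi host would render it to artifact-free audio.
mid = MidiFile()  # default resolution: 480 ticks per beat
track = MidiTrack()
mid.tracks.append(track)

track.append(Message('program_change', program=32, time=0))  # a bass patch, say

# Placeholder walking-bass fragment in C minor (MIDI note numbers).
for note in (48, 51, 53, 55):
    track.append(Message('note_on', note=note, velocity=80, time=0))
    track.append(Message('note_off', note=note, velocity=64, time=240))  # 240 ticks = an eighth note

mid.save('riff.mid')

Load riff.mid into any DAW, point the track at an upright-bass VSTi, and the audio has no generation artifacts at all. The hard part is making the note choices sound like music.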
Post by copperx on Nov 14, 2024 17:52:00 GMT -6
A quick and dirty way of understanding generative AI would be this. Imagine that you want to generate a picture of a cat.

1) Using classical machine learning from the 1970s, build a cat recognizer (a program that takes an image and tells you whether it's a cat or not). The program will not say CAT or NOT CAT; instead, it will give you a probability, like 0.7 CAT (a 70% probability that the image is a cat). To build this, you feed the recognizer millions of labeled images of cats and millions of images of non-cats, so it "learns" what a cat is by building a network of probabilities.

2) Generate an image of white noise. Feed it to the recognizer. You will get something like 0.0 CAT. Then modify the image randomly and feed it to the recognizer again. If the cat probability went up, keep the modification, and repeat this over and over until you get a high enough probability of a cat.

3) Marvel at your generated cat image. Profit.

It's obviously more complicated than that (for example, the image modifications are not random, but chosen strategically), but that's the gist of it. Generating text and generating music are a bit different, because they're both sequential, but the foundations are the same.
There is no wizard behind the curtain. But it sure seems like there is.
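If it helps, the whole loop fits in a few lines of toy Python. Assume recognizer is the trained cat-probability function from step 1; real systems choose their modifications far more cleverly than one pixel at a time, but the skeleton is the same.

import random

def generate_cat(recognizer, width=64, height=64, steps=100000, target=0.95):
    # 1) Start from pure noise (grayscale pixels, 0-255).
    image = [[random.randint(0, 255) for _ in range(width)] for _ in range(height)]
    score = recognizer(image)  # e.g., 0.0 CAT

    for _ in range(steps):
        # 2) Make a small random tweak...
        x, y = random.randrange(width), random.randrange(height)
        old = image[y][x]
        image[y][x] = random.randint(0, 255)

        new_score = recognizer(image)
        if new_score >= score:
            score = new_score   # ...keep it if "cat-ness" went up,
        else:
            image[y][x] = old   # ...otherwise revert it.

        if score >= target:
            break

    # 3) Marvel at your generated cat image. Profit.
    return image, score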
Post by bossanova on Nov 14, 2024 18:53:38 GMT -6
There is no wizard behind the curtain. But it sure seems like there is.
That last part is the thing that turns me off regarding AI for music, or for writing that requires deep knowledge. I write and present on music on a semi-regular basis. (Once upon a time I even did it five days a week, when I was teaching college classes.) At one point I tried using ChatGPT to see if it could generate the kind of musical analysis that I can do with 25 years of experience. Random example: why is "Kokomo" still played today while "Hot and Cold" from Weekend at Bernie's never became an enduring classic? Or just explain what makes "Kokomo" great. GPT can't do it. It can describe lots of surface-level facts, but that's it. It strings descriptive words together in a way that sounds correct, and may even be correct within its limited base of knowledge, but it can't give insights. That's how I feel about Suno music. I can hear that it is generating waveforms that sound vaguely correct for a genre of music, and that's all it's doing.
Post by damoongo on Nov 14, 2024 19:54:56 GMT -6
What do you mean? AI can certainly generate MIDI. That's not how Suno works, but if you needed to generate music without artifacts, it could certainly be done today.
Of course it can generate MIDI (MuseNet, etc.). But currently a human is needed to handle the sound design: choosing the VSTis, adjusting the parameters, and finalizing the mix. Automating all of this would require extremely advanced AI models trained specifically on sound production techniques, so they just skip to the easy part and generate a song that sounds like another song it's been trained on. Gen AI models don't know how to actually create the nuanced sounds they rip off using the tools musicians use.
Post by copperx on Nov 14, 2024 20:29:03 GMT -6
Of course it can generate MIDI (MuseNet, etc.). But currently a human is needed to handle the sound design: choosing the VSTis, adjusting the parameters, and finalizing the mix. Automating all of this would require extremely advanced AI models trained specifically on sound production techniques, so they just skip to the easy part and generate a song that sounds like another song it's been trained on. Gen AI models don't know how to actually create the nuanced sounds they rip off using the tools musicians use.
You don't need advanced models, you just need the data. If we had access to millions of Pro Tools sessions with plugin settings, automation, and multitrack MIDI and audio, one could train a model quite easily. The problem is the lack of data.

That's the same reason we're getting sophisticated gen AI now and not 20 years ago. There hasn't been a big breakthrough (except perhaps Google's Transformer paper, which makes everything faster); models are using 50-year-old technology. Today we have an incredible amount of information (the internet, digitized books, images, etc.) to train the models, and hardware fast enough to ingest it all in a reasonable amount of time.
Post by damoongo on Nov 14, 2024 21:05:47 GMT -6
Even if they did have access to millions of sessions, it's not just about data, but about understanding WHY producers make specific artistic choices: why they compress the kick drum a certain way, or add certain amounts of reverb in specific contexts. This "human intent" is hard to infer even with detailed datasets.

The AI could learn correlations between actions and outputs by studying vast datasets (e.g., applying reverb to vocals in certain song sections), but understanding the intention behind those actions (the emotional or creative reasoning) is much harder. Even with large datasets, an AI would struggle to consistently replicate the subtle artistic choices. Let's hope a large database never becomes available. (Is that what this Avid cloud shit is doing? Anyone read the fine print?)
Post by bossanova on Nov 14, 2024 21:14:16 GMT -6
Wouldn’t the engine have to have some basis of pitch, time, and form/structure prior to generating audio? It might not be MIDI, but to make coherent music of the kind we hear coming out of Suno, there has to be some sort of (what sounds like) on-grid composition process going on to form the basis of the songs/tunes, even if it’s just working off of existing patterns.
Post by damoongo on Nov 14, 2024 21:42:25 GMT -6
Wouldn’t the engine have to have some basis of pitch, time, and form/structure prior to generating audio? It might not be MIDI, but to make coherent music of the kind we hear coming out of Suno, there has to be some sort of (what sounds like) on-grid composition process going on to form the basis of the songs/tunes, even if it’s just working off of existing patterns.

I don't think there's any analysis (or control) of form structure with Suno and other gen AI... I don't think your prompt can say: "Make me a lame rock song about being depressed, and I want a double prechorus before the second chorus and a bridge between the second chorus and the guitar solo." Can you? (I think it's like the pictures of cats. Start with noise and keep changing the image incrementally until it is recognized as a cat.)
Post by bossanova on Nov 14, 2024 21:53:39 GMT -6
I don't think there's any analysis (or control) of form structure with Suno and other gen AI... I don't think your prompt can say: "Make me a lame rock song about being depressed, and I want a double prechorus before the second chorus and a bridge between the second chorus and the guitar solo." Can you? (I think it's like the pictures of cats. Start with noise and keep changing the image incrementally until it is recognized as a cat.)

If that's the case though, shouldn't we get more random song forms? Suno seems to have a very good idea of how to structure a folk song, a verse-chorus pop song, or a 32-bar jazz song, and it even understands some of the specifics I mentioned in my original post of a jazz composition having a head, some sort of secondary unison, an improvised chorus, etc. I don't think you can suggest forms to it, but at the same time it clearly has certain form structures that it's working off of; otherwise I suspect we would be getting more through-composed forms. It's also "smart" enough not to give you the structure of a Foo Fighters song if you ask it to generate jazz. I can't remember the name of it right now, but there's another piece of composing software that does not work off of set forms, and as a result it starts to "drift" from the starting template the longer it goes. I admit this all could be a failure of my imagination to envision the kind of thing that you're talking about, but in my experience something generating polyphonic music like this has to have underlying structural patterns at some level in order to sound like something other than noise or gibberish.
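Just to show what I mean by "underlying structural patterns," here's the kind of explicit, on-grid form template I'm imagining. To be clear, this is pure speculation on my part; Suno hasn't published anything saying it works this way, and the section names and bar counts are invented for illustration.

# Hypothetical genre-to-form templates; nothing here reflects Suno's
# actual internals.
FORM_TEMPLATES = {
    "verse_chorus_pop": ["intro", "verse", "chorus", "verse", "chorus",
                         "bridge", "chorus", "outro"],
    "32_bar_jazz": ["head_A", "head_A", "head_B", "head_A",
                    "solo_chorus", "head_A", "tag"],
    "folk": ["intro", "verse", "verse", "chorus", "verse", "outro"],
}

def sketch_form(genre, bars_per_section=8):
    """Return an on-grid outline: (section, bar count) pairs."""
    sections = FORM_TEMPLATES.get(genre, ["verse", "chorus"])
    return [(name, bars_per_section) for name in sections]

print(sketch_form("32_bar_jazz"))

Something like that, sitting underneath the audio generation, would explain why the jazz prompts keep coming back with heads and unison passages in the right places.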
Post by damoongo on Nov 14, 2024 22:29:35 GMT -6
If that's the case though, shouldn't we get more random song forms? Suno seems to have a very good idea of how to structure a folk song, a verse-chorus pop song, or a 32-bar jazz song, and it even understands some of the specifics I mentioned in my original post of a jazz composition having a head, some sort of secondary unison, an improvised chorus, etc. I don't think you can suggest forms to it, but at the same time it clearly has certain form structures that it's working off of; otherwise I suspect we would be getting more through-composed forms.

Yeah. They don't build it with building blocks. They just manipulate a "complete output" to match the songs it is ripping off in their respective genres. It really is just like the cat images. The AI doesn't start with a cat skeleton and then add skin and fur. It just generates pixels until it recognizes the output as a picture of a cat. This generates waveforms until it recognizes them as music in the genre you requested. So, inadvertently, it will stick to traditional structures in those genres, even though it doesn't know why.
Post by bossanova on Nov 14, 2024 22:35:40 GMT -6
Yeah. They don't build it with building blocks. They just manipulate a "complete output" to match the songs it is ripping off in their respective genres. It really is just like the cat images. The AI doesn't start with a cat skeleton and then add skin and fur. It just generates pixels until it recognizes the output as a picture of a cat. This generates waveforms until it recognizes them as music in the genre you requested. So, inadvertently, it will stick to traditional structures in those genres, even though it doesn't know why.

I believe it, but it makes my head hurt, especially when you add matching vocals and lyrics into that mix.
Post by damoongo on Nov 14, 2024 22:41:28 GMT -6
It makes your head hurt AND your ears hurt. Stay away! haha
Post by thehightenor on Nov 15, 2024 11:08:45 GMT -6
.... coming to an elevator near you.
Post by Johnkenn on Nov 15, 2024 12:07:55 GMT -6
Post by ericn on Nov 15, 2024 12:31:03 GMT -6
When Sonic Foundry introduced ACID, everyone was saying it was the end. Well, it wasn't. We adapt to incorporate new tools; there are very few technical revolutions, but pretty much everything we touch is a technological evolution.
Post by thehightenor on Nov 15, 2024 13:26:38 GMT -6
On a positive note (no pun intended), ChatGPT is a very sympathetic "agony aunt." You can moan to it about anything and it's just so darn positive with its replies. It's an AI Pollyanna... "a Polly-AI-na"... LOL
Post by professorplum on Nov 15, 2024 14:23:52 GMT -6
When Sonic Foundry introduced ACID, everyone was saying it was the end. Well, it wasn't. We adapt to incorporate new tools; there are very few technical revolutions, but pretty much everything we touch is a technological evolution.

The major difference is that not a single piece of technology spanning the last 100 years of music recording was able to write melodies, lyrics, and parts, sing, mix, etc., and generate an entire finished song out of thin air.
AI can. It will be getting exponentially better at it for years to come. This is not a "new tool" that we will just be incorporating. For the first time in human history, we are asking if a completed recording was created by humans or an AI.
From this point forward, all music created on planet Earth can be divided into two categories: music made by humans, and music made by AIs. That is far more impactful than just a piece of new technology.
Post by thehightenor on Nov 15, 2024 14:41:48 GMT -6
The major difference is that not a single piece of technology spanning the last 100 years of music recording was able to write melodies, lyrics, and parts, sing, mix, etc., and generate an entire finished song out of thin air. AI can.

Can it? Superficially it can. It's fake intelligence producing fake art. I prefer to separate digital drivel from actual art. Call me old-fashioned. Ironically, if you ask an AI why it produces such awful art, it will tell you the reasons why.