from Hacker News

Deepjazz: AI-generated 'jazz'

by mattdennewitz on 4/18/16, 3:36 PM with 90 comments

  • by rryan on 4/18/16, 5:00 PM

    This is really neat! But I think it's a stretch to call it AI-generated jazz music.

    As I understand it, the author has trained an LSTM on a single MIDI file -- "And Then I Knew" by Pat Metheny. The network is then asked to generate MIDI notes in sequence.

    What this network has been asked to do is to produce an output stream that is statistically similar to the single MIDI input file it has been trained on. It would be more accurate to call this an "And Then I Knew" generator. Its "cost function" -- the function the network is trying to minimize during training -- is exactly how well it reproduced the target song.

    Neural networks are "universal function approximators". It's not surprising that given a single input, a network can produce outputs that are statistically similar to it.

    A network that could compose novel MIDI jazz would look like this:

    * Train a network on a corpus of thousands to hundreds of thousands of MIDI jazz files.

    * Add significant regularization and model capacity limits to prevent the network from "memorizing" its inputs.

    * Generate music somehow -- the char-RNN approach described here is fine. There are other methods.

    You want the network to build representations that capture the patterns of jazz well enough to pastiche them, but not representations so high-level that the network is literally humming "And Then I Knew". Memorization is such a pervasive problem that any paper presenting a novel result in generative modeling pretty much must include a section presenting evidence that the model is not memorizing its inputs.

    I can hum a few classic jazz tunes from memory, but that mental process is not jazz composition -- it's reproducing something from memory. If we're going to call a model's output "AI-generated jazz", we need some way to tell the network not to hum a tune it knows and instead to compose a new tune from the principles/patterns it knows. Since we can't speak to our models and tell them to think one way and not the other, part of the trick in this field is to come up with models that can only do one thing and not the other.
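    [Editor's note: the point above can be made concrete with a toy stand-in. Below, a character-level bigram model "trained" on one string plays the role the LSTM plays for the single MIDI file; nothing here is deepjazz's actual code. Sampling from it can only rearrange the statistics of its one input -- every adjacent pair in the output already occurs in the training string.]

```python
import random
from collections import defaultdict

# Toy stand-in for a model trained on a single input: a character-level
# bigram model. The single training string plays the role of the single
# MIDI file; sampling can only remix what was memorized.

def train_bigram(text):
    """Count which symbol follows which, wrapping around at the end."""
    model = defaultdict(list)
    for a, b in zip(text, text[1:] + text[0]):
        model[a].append(b)
    return model

def generate(model, seed, length, rng):
    """Sample symbol after symbol from the learned counts."""
    out = [seed]
    for _ in range(length):
        out.append(rng.choice(model[out[-1]]))
    return "".join(out)

training_song = "doo bee doo bee doo bah"
model = train_bigram(training_song)
sample = generate(model, "d", 30, random.Random(0))
# Every adjacent pair in `sample` already occurs in `training_song`:
# the model cannot produce anything but a remix of its one input.
```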

  • by JamilD on 4/18/16, 4:26 PM

    This sounds to me like the "uncanny valley" of music. It's close to being pleasant, but it's very discordant and hard to listen to…
  • by neurobuddha on 4/18/16, 5:01 PM

    Coming from an avid Jazz listener, this is awful. Not even close.

    I don't mean this as a slight at all, but definitely raise the bar on your experiments.

  • by daviddaviddavid on 4/18/16, 5:10 PM

    One of the central features of jazz (or any music) is rhythm. In swing-based jazz, including bebop, the upbeats -- 2 and 4 -- are emphasized. It's the opposite of rock. The Metheny track here has a typical rock beat, so it's a very odd target.

    Also, unless I missed something, the clips just play the network's attempt at duplicating the "head" of the track, not the soloing.

    As a jazz musician I find this cool but I also feel safe that it won't be stealing gigs from me anytime soon.

  • by devin on 4/18/16, 4:14 PM

    As a card-carrying jazz nerd, I am impressed. If there were more dynamics, some of these soundcloud examples would sound significantly better.

    ETA: The default midi sound font doesn't do it any favors, either. I have some software instruments I could throw at this that would make it sound a whole lot better.

  • by brandonmenc on 4/18/16, 4:06 PM

    Anyone interested in algorithmic jazz should check out Al Biles:

    http://igm.rit.edu/~jabics/

  • by newobj on 4/18/16, 4:27 PM

    The best part is that the resultant "jazz" sounds more like vaporwave[1].

    [1] https://www.youtube.com/watch?v=PdpP0mXOlWM

  • by alexc05 on 4/18/16, 4:48 PM

    That's funny, I was just researching this last week.

    I stumbled across some music generators. A downloadable one: http://duion.com/link/cgmusic-computer-generated-music

    And http://www.abundant-music.com/

    Both are "procedurally generated music" so I'm not sure where that falls in the AI spectrum.

    I found the quality interesting and saw some potential, but at least in these cases there were issues with the quality of the MIDI instruments, and the song structure was very "same-y".

    Anyway, looking forward to poking around in the DeepJazz code.

  • by mpdehaan2 on 4/18/16, 5:17 PM

    Always good to see more computer music projects.

    I recently started one -- and need to do more work on it -- that tries to do things in a bit more of an object-oriented way, modeling more music theory concepts (like scales) as objects: not so much analyzing existing files as building the primitives you might need for a sequencer (and eventually some generative stuff).

    If people are interested check out:

    https://github.com/mpdehaan/camp (in the README, there is mailing list info).

    The next thing for me is to make an ASCII sequencer so it's a program that can also be used by people who can't code, and then I'll get back more into the generative parts.

  • by shams93 on 4/18/16, 6:16 PM

    George Lewis wrote a realtime improv AI in Forth back in the 90s. It used MIDI, so the sounds were General MIDI-era, but the interplay between his human trombone playing and the machine listening and responding on the fly was amazing given the limitations of the machines at the time. To be AI jazz, it has to be able to jam with humans or other machines. https://en.wikipedia.org/wiki/George_Lewis_(trombonist)
  • by ARothfusz on 4/18/16, 6:00 PM

    I'd be more impressed if they had trained it on Pat Metheny and then given it "Mary Had a Little Lamb" and said "jazz this up"
  • by trsohmers on 4/18/16, 5:21 PM

    Serious question: who holds the copyright on generated works? The author of the program? The person who ran it? Do you have to give any sort of authorship credit to those who created the works in the mined data set? Copyright law in the 21st century is just getting more and more complicated...
  • by twic on 4/18/16, 7:07 PM

    There's an enjoyable summary of some other efforts in neural network music synthesis here:

    https://highnoongmt.wordpress.com/2015/08/11/deep-learning-f...

    The same author's Endless Traditional Music Session supplies all the Irish session music you could ever need, by mechanical means:

    http://www.eecs.qmul.ac.uk/~sturm/research/RNNIrishTrad/inde...

  • by phatbyte on 4/18/16, 10:24 PM

    Awesome work, and this is quite interesting -- something worth exploring in more depth than a hackathon can provide.

    Having said that, and as a Jazz fan, the generated music is horrible. Keep feeding it more jazz tunes :P

  • by gluelogic on 4/18/16, 8:41 PM

    One thing that comes to mind: to me, it sounds like all of the notes' velocities are equal. It would sound a lot more natural if volume differences were incorporated.
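    [Editor's note: a minimal sketch of the "humanizing" pass suggested above -- accent the on-beat notes and add a little random jitter to an otherwise flat velocity stream. Function and field names are illustrative, not deepjazz's API.]

```python
import random

# Hypothetical humanizing pass: notes arrive with one flat velocity;
# we accent downbeats and add small random variation, clamped to the
# valid MIDI velocity range (1-127).

def humanize(notes, base=80, accent=20, jitter=6, rng=None):
    """notes: list of (pitch, beat) pairs; returns (pitch, velocity) pairs."""
    rng = rng or random.Random(0)
    out = []
    for pitch, beat in notes:
        vel = base + (accent if beat % 1.0 == 0.0 else 0)  # accent downbeats
        vel += rng.randint(-jitter, jitter)                # human-like wobble
        out.append((pitch, max(1, min(127, vel))))         # clamp to MIDI range
    return out

melody = [(60, 0.0), (62, 0.5), (64, 1.0), (65, 1.5), (67, 2.0)]
played = humanize(melody)
# On-beat notes now land around velocity 100, off-beat ones around 80.
```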
  • by granttimmerman on 4/18/16, 6:40 PM

    I built a very similar project for classical music using Theano and MusicXML for a Sound Capstone Project at UW.

    Blogpost + music: https://medium.com/@granttimmerman/algo-rhythm-music-composi...

    GitHub: https://github.com/grant/algo-rhythm

  • by desireco42 on 4/18/16, 8:44 PM

    I respect the criticism of people who love and listen to jazz quite a bit.

    As someone whose taste in jazz is maybe not as sophisticated, this sounds good enough to me. It could certainly pass as elevator music.

    On the other hand, it would be more valuable if more than a single file were used for seeding. As it stands, the output is a listenable theme, but it will always have the style of its seed.

    I intend to play with it and see if I can get more interesting melodies.

  • by imaginenore on 4/18/16, 7:19 PM

    It's rendered with some really shitty sounding instruments. Run it through Ableton Live at least. Or even better, a specialized piano engine.
  • by pjdorrell on 4/18/16, 10:26 PM

    When human composers attempt to compose original music, they have immediate access to their own subjective judgement of the quality of the music.

    Until such time as we discover an algorithm that replicates human taste in music, any AI-based approach to composing music will fail because it will not have any feedback about the quality of the music.

  • by return0 on 4/18/16, 6:12 PM

    It sounds like, within a few epochs, it captured some rhythmicity. The notes still sound random, but overall it's promising. This is only a hackathon project; I'm pretty sure we'll see more elaborate networks in the future that make acceptable jazz. It's gonna be a bit more difficult for other kinds of music, I guess.
  • by I_HALF_CATS on 4/18/16, 5:51 PM

    Can someone explain to me the difference between this and the computer-generated music of David Cope from the early 1990s? https://youtu.be/yFImmDsNGdE?t=44s

    It seems like the word 'AI' is getting thrown around.

  • by jbmorgado on 4/18/16, 9:41 PM

    An improvement that should be quite straightforward, and take you no more than a couple of hours, is to use sampled sounds to render the playback.

    It would massively improve the quality of the output and make it sound more "human", IMO.

    You can use the samples from www.freesound.org for instance.

  • by ryanmarsh on 4/18/16, 5:34 PM

    Was expecting to hear some Blue Note, got frantic muzak. Humans are safe... for now it seems.
  • by squeaky-clean on 4/18/16, 8:19 PM

    Even if it is a very limited model and the tracks get boring quickly like everyone is saying, this is still extremely cool. I really need to buy a new GPU that I can run Theano on.
  • by KON_Air on 4/18/16, 7:52 PM

    Knowing next to nothing about musical terms I couldn't figure out the workflow of the AI. Does it generate note after note trying to follow the learned "structure"?
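    [Editor's note: roughly, yes. In the char-RNN style setup deepjazz uses, the model scores candidate next notes given everything generated so far, the scores become a probability distribution, one note is sampled, and it is fed back in. A sketch of that loop, where `fake_model` is a toy stand-in for the trained network, not the real one:]

```python
import math
import random

# Sketch of an autoregressive note-after-note generation loop.
# `fake_model` stands in for a trained network: given the notes so far,
# it returns an unnormalized score for each candidate next note.

NOTES = ["C4", "E4", "G4", "B4"]

def fake_model(history):
    # Toy scorer: mildly prefer stepping to the next note in the list.
    last = NOTES.index(history[-1])
    return [3.0 if i == (last + 1) % len(NOTES) else 1.0
            for i in range(len(NOTES))]

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def generate(seed, length, rng):
    history = [seed]
    for _ in range(length):
        probs = softmax(fake_model(history))
        history.append(rng.choices(NOTES, weights=probs)[0])  # sample, feed back
    return history

tune = generate("C4", 16, random.Random(42))
```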
  • by sengork on 4/19/16, 12:17 AM

    This reminds me of AWK Music http://kmkeen.com/awk-music/
  • by fiatjaf on 4/18/16, 8:51 PM

    I like this because I don't like jazz.
  • by DonHopkins on 4/19/16, 12:49 AM

    Hook it up to a speech synthesizer, to make Deep Scat!

    I played around with looping different speech synthesizers back into different speech recognizers, kind of like audio or video feedback, but with chaotic noise injected: quirks of the synthesizer, the voice, speech speed and pitch, and the audio environment around the microphone (you could talk over it to interfere with the words it was speaking and lay down new words in the loop). All of that worked against the lawful pattern matching and error correction behavior of the speech recognizer and the HMM language model it was trained with.

    It was a lot like beat poetry, in that it tended to rhyme and have the same number of syllables and use plausible sounding sequences of words that didn't actually make any sense, like Sarah Palin.

    You can start it out with a sensible sentence, and it will play the telephone game, distorting it again and again. If you slow down the speech rate, words split into more words or syllables; if you speed it up, words collapse into fewer. Or you can tune the speech rate to maintain the same number of syllables. It's analogous to zooming the video camera in and out with video feedback.

    It would wander aimlessly around poetic landscapes, sometimes falling into strange attractors in the speech recognizer's hidden Markov model and repeating itself with little or no variation.

    At any time you can join in with your own voice and add words during the pause at the end of the loop, or talk over its voice, much the way you can hold things in front of the camera during video feedback to mix them in.

    Different speech recognizers are better at recognizing different vocabularies, and therefore like to babble about different topics, depending on the data they were trained on -- which we could guess by attempting to psychoanalyze their incoherent babbling.

    IBM's ViaVoice was apparently trained on a lot of newspaper articles about the Watergate hearings, as it was quite paranoid, but business like, as if it were dictating a memo, and would start chanting and fixating on phrases like "congressional investigation," and "burglary and wiretapping," and "convicted of conspiracy".

    Microsoft's speech recognizer had obviously been trained on newspaper articles about the Clinton Lewinsky scandal, since it was quite obsessed with repeatedly chanting about blow jobs (just like the news of the time), and whenever you mentioned Clinton this or Clinton that, it would rapidly converge on Clinton Lewinsky, Clinton presidency, Clinton impeachment, etc.

    What I'd love to have would be a speech recognizer that returns a pitch envelope and timing that you could apply back on the synthesized words, then it could sing to you!

  • by aaronlevin on 4/18/16, 4:25 PM

    If you're interested in making deep-jazz more discoverable, consider applying to our Search team! :)

    https://soundcloud.com/jobs/2016-02-19-search-engineer-berli...

  • by SubiculumCode on 4/18/16, 3:57 PM

    Sorry. Not impressed.