by huan9huan on 12/5/16, 3:36 AM with 28 comments
by savanaly on 12/5/16, 4:53 PM
The actual thing they're reporting is:
'“You need to be able to say from 3 seconds and 50 milliseconds to 78 milliseconds, this instrument is playing an A. But that’s impractical or impossible for even an expert musician to track with that degree of accuracy.”
The UW research team overcame that challenge by applying a technique called dynamic time warping — which aligns similar content happening at different speeds — to classical music performances. This allowed them to synch a real performance, such as Beethoven’s ‘Serioso’ string quartet, to a synthesized version of the same piece that already contained the desired musical notations and scoring in digital form.
Time warping and mapping that digital scoring back onto the original performance yields the precise timing and details of individual notes that make it easier for machine learning algorithms to learn from musical data.'
It also mentions that they attempted to apply existing deep learning algorithms designed for speech recognition to their new dataset, hoping to accomplish tasks such as predicting a single missing note from a long string of notes. It does not say whether this worked.
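For intuition, here is a minimal sketch in Python of the dynamic time warping step the article describes. This is not the UW team's code; the per-frame features (e.g. pitch or chroma values) and the plain textbook DTW recurrence are assumptions for illustration:

```python
import numpy as np

def dtw_path(perf, synth):
    """Align two 1-D feature sequences (e.g. per-frame pitch estimates of
    a real performance and of a synthesized rendition) with classic
    dynamic time warping; return the optimal warping path as
    (perf_frame, synth_frame) index pairs."""
    n, m = len(perf), len(synth)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(perf[i - 1] - synth[j - 1])       # local distance
            cost[i, j] = d + min(cost[i - 1, j - 1],  # both advance
                                 cost[i - 1, j],      # performance runs slow
                                 cost[i, j - 1])      # performance runs fast
    # Backtrack from the end to recover the frame-to-frame alignment.
    path, i, j = [(n - 1, m - 1)], n, m
    while i > 1 or j > 1:
        _, i, j = min((cost[i - 1, j - 1], i - 1, j - 1),
                      (cost[i - 1, j],     i - 1, j),
                      (cost[i, j - 1],     i,     j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]
```

Because the synthesized version already carries exact note onsets and offsets in digital form, each (perf_frame, synth_frame) pair on the path lets you transfer those labels back onto the corresponding moment of the real recording.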
by ktRolster on 12/5/16, 5:43 PM
by haberman on 12/5/16, 4:47 PM
So to me this seems more directly applicable to transcription (i.e. taking audio and turning it into sheet music) or synthesis (taking sheet music and turning it into audio of a human-sounding performance) than to composition or finishing unfinished works by famous composers. The output of the compositional process is generally sheet music, not audio, so it seems to make more sense for composition problems to be trained on, and learned in, the sheet music domain.
I'm not a machine learning researcher though! This is just my impression as a musician.
by haberman on 12/5/16, 10:50 PM
http://www.markheadrick.com/midi/absmfaq.txt
In section 1.4 they very emphatically state that "with current technology, IT CAN'T BE DONE."
They conclude: "Think of it this way: If you don't mind spending more than the US national debt on computer equipment and waiting a few years for the job to complete, you can have a system that MIGHT accurately convert the digital waveform data of a 5 minute song into a small, compact MIDI file.
Otherwise, you can blow a couple of thousand dollars hiring a professional band of studio musicians and engineers who can probably give you what you want in about one day."
It is humorous for its emphaticness, but also educational as a picture of how we've historically thought about this problem.
by pierrec on 12/5/16, 5:56 PM
As a composer, the coolest potential I see here is training a model to create realistic mockups from MIDI compositions. For that purpose, though, it would be better to start with a fully monophonic/solo-instrument dataset, which would simplify the learning. Also, MIDI data is not entirely sufficient: annotations on dynamics and playing technique would be necessary to make a good mockup tool, since this is the kind of information one might even give to human performers.
Anyways, it would be tough for such a tool to catch up with current state-of-the-art, sample-based mockup tools, which are already baffling in their realism, although they usually require a lot of work to get good results. But one can always dream of a "Stokowski" or "Karajan" neural network that interprets your MIDI composition with emotion and sensibility!
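To make that concrete, here is a hypothetical per-note record such a training set might carry; the field names and defaults are illustrative, not any existing format:

```python
from dataclasses import dataclass

@dataclass
class AnnotatedNote:
    # What standard MIDI already encodes
    pitch: int        # MIDI note number, e.g. 69 == A4
    onset: float      # seconds from the start of the piece
    duration: float   # seconds
    velocity: int     # 0-127; only a crude proxy for dynamics
    # Extra annotations MIDI alone doesn't capture well (hypothetical)
    dynamic: str = "mf"           # score marking: pp, p, mf, f, ff, ...
    technique: str = "arco"       # e.g. "pizzicato", "con sordino"
    articulation: str = "legato"  # e.g. "staccato", "marcato"
```

A solo-instrument dataset annotated at this level would give a model roughly the same cues one writes into a part for human performers.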
by mrcactu5 on 12/5/16, 5:48 PM
Another problem is that once you have the music, there's a tremendous amount of "interpretation" that a musician does. The notes may each read 1/8, but a musician might add or subtract a 1/64 as he/she feels is good.
Other times the change is more mathematical: a triplet written 1/8+1/8+1/8 might actually have to be read as 1/12+1/12+1/12 = 1/4, but that is much easier to fit into a computer.
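A quick numeric sketch of both timing effects, assuming a tempo of 120 BPM for illustration (so a quarter note lasts 0.5 s):

```python
import random

BEAT = 0.5  # seconds per quarter note at an assumed 120 BPM

# Three plain eighth notes: each is 1/8 of a whole note, i.e. half a beat.
straight = [BEAT / 2] * 3              # 0.25 s each, 0.75 s total

# The same three notes marked as a triplet must fit into one beat, so
# each is effectively a 1/12-note: 1/12 + 1/12 + 1/12 = 1/4 of a whole.
triplet = [BEAT / 3] * 3               # ~0.167 s each, exactly one beat
assert abs(sum(triplet) - BEAT) < 1e-12

# Expressive timing: each duration nudged by up to a 1/64-note
# (a whole note spans 4 * BEAT, so 1/64 of it is BEAT / 16).
rubato = [d + random.uniform(-BEAT / 16, BEAT / 16) for d in straight]
```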
I have said nothing of dynamics (loud/quiet) or articulation (staccato, slurring, etc.).
Scores are available on IMSLP and other sources, but are computer files available as well?
by gattilorenz on 12/6/16, 8:46 AM
http://www.global-supercomputing.com/people/kemal.ebcioglu/p...
Unfortunately I can't seem to find the samples now, but to my (untrained) ear they sounded as Bach-like as the real thing.
by Gaussian on 12/5/16, 10:39 PM
by lalos on 12/5/16, 5:59 PM