by iheredia on 5/12/17, 1:28 PM with 29 comments
by 6stringmerc on 5/12/17, 5:24 PM
Taylor Swift, Britney Spears, Kelly Clarkson, NSYNC, Bieber, Katy Perry, Demi Lovato... Max Martin's fingerprints are all over the hits. He has a defined style, too: balanced lines. It's brilliant. So if you want to study the composition, it's not about the performers; you have to go to the actual composer(s).
Just saying: this is a sound technique and approach, but it looks at the data set to the exclusion of pertinent considerations. Revised, it would make for an interesting story.
by gwern on 5/12/17, 4:56 PM
by geluso on 5/12/17, 7:45 PM
by IgorPartola on 5/12/17, 6:12 PM
by stuffedBelly on 5/12/17, 8:34 PM
by woliveirajr on 5/12/17, 5:46 PM
There is a theory called Kolmogorov complexity [0]. It says that something is as complex as the amount of information you need to describe it (formally, the length of the shortest program that outputs it). In your case, lyrics are as complex as the number of symbols (letters? words? bytes?) you need to represent them.
One good way to approximate it is what you did: compress it. (Kolmogorov complexity itself is uncomputable, so a compressor only gives an upper bound.) If you use the same compression method for all the lyrics, the simpler, more repetitive ones are the ones whose size shrinks the most. In that case the choice of compression method is largely irrelevant: had you used Bzip, PPMD, etc., the results would probably be similar.
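A minimal Python sketch of the compression-ratio idea, assuming the standard-library `zlib`, `bz2` and `lzma` modules as stand-ins for the compressors mentioned (PPMd is not in the standard library). The two sample strings are made up for illustration, not real lyrics.

```python
import bz2
import lzma
import zlib

# Each entry maps a name to a function that compresses raw bytes.
COMPRESSORS = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bz2": lambda data: bz2.compress(data, 9),
    "lzma": lambda data: lzma.compress(data),
}

def compression_ratio(text: str, method: str = "zlib") -> float:
    """Compressed size / original size. Lower means more redundant (simpler)."""
    data = text.encode("utf-8")
    return len(COMPRESSORS[method](data)) / len(data)

# Hypothetical sample texts, not real lyrics.
REPETITIVE = "baby baby baby oh\n" * 20
VARIED = ("Every verse here tells a different story: a train leaving Oslo, "
          "a letter never sent, and a kitchen light still burning at 3 a.m.")

if __name__ == "__main__":
    for name in COMPRESSORS:
        print(f"{name:5s} repetitive={compression_ratio(REPETITIVE, name):.3f} "
              f"varied={compression_ratio(VARIED, name):.3f}")
```

On very short texts, fixed container overhead (headers, checksums) inflates every ratio, so it is safer to compare lyrics of similar length; the ranking, rather than the absolute numbers, is what tends to agree across compressors.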
If you want to extend your research, you might, as 6stringmerc said, consider that the composer matters more than the performing artist.
For that, you can use the Normalized Compression Distance (NCD) [1], which measures how similar two lyrics are. Basically, you compress the two lyrics together. When they are similar, the compressor reuses patterns from the first to compress the second, so a similar pair compresses better together than an unrelated pair does.
By doing that, you could even identify the composer of a song, i.e., attribute authorship of the lyrics, since each person tends to have a consistent writing style [2].
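A short Python sketch of NCD, using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is compressed length. Using `zlib` as the compressor is an assumption here; any real compressor can play the role of C, and the example verses are invented.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """C(x): length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Close to 0 when one text largely predicts the other, close to 1 when
    compressing them together saves almost nothing.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical verses for illustration.
VERSE_A = "the city lights are calling me home tonight, the city lights are calling me home"
VERSE_B = "the city lights are calling you home tonight, the city lights are calling you home"
UNRELATED = "quantum chromodynamics describes how quarks and gluons interact via the strong force"

if __name__ == "__main__":
    print(f"A vs B:         {ncd(VERSE_A, VERSE_B):.3f}")
    print(f"A vs unrelated: {ncd(VERSE_A, UNRELATED):.3f}")
```

Because real compressors are imperfect, NCD of a text with itself is small but not exactly 0, and values can slightly exceed 1.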
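A toy sketch of compression-based authorship attribution, assuming a simple nearest-neighbour rule: pick the candidate whose known samples have the lowest mean NCD to the unknown text. This is not necessarily the procedure used in [2]; the writer names (`writer_a`, `writer_b`) and all sample lines are hypothetical.

```python
import zlib
from typing import Dict, List

def compressed_size(data: bytes) -> int:
    """C(x): zlib-compressed length in bytes."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    return (compressed_size(bx + by) - min(cx, cy)) / max(cx, cy)

def attribute(unknown: str, corpus: Dict[str, List[str]]) -> str:
    """Return the candidate whose known samples are, on average, closest to `unknown`."""
    def mean_distance(author: str) -> float:
        samples = corpus[author]
        return sum(ncd(unknown, s) for s in samples) / len(samples)
    return min(corpus, key=mean_distance)

# Hypothetical candidates and invented sample lines, purely for illustration.
CORPUS = {
    "writer_a": [
        "neon hearts and midnight trains, neon hearts and midnight rain",
        "midnight trains keep rolling on, neon hearts are never gone",
    ],
    "writer_b": [
        "grandpa's tractor in the field, cornbread cooling on the sill",
        "down the gravel road we go, tractor humming soft and slow",
    ],
}
UNKNOWN = "neon hearts on midnight trains, rolling on through midnight rain"

if __name__ == "__main__":
    print(attribute(UNKNOWN, CORPUS))
```

Real attribution needs many samples per candidate and longer texts. zlib only looks back 32 KB for matches, so for concatenations longer than that, a compressor with a larger window such as lzma is a better fit.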
[0] https://en.wikipedia.org/wiki/Kolmogorov_complexity
[1] https://en.wikipedia.org/wiki/Normalized_compression_distanc...
[2] https://link.springer.com/chapter/10.1007%2F978-3-642-34475-...
by marzell on 5/12/17, 5:00 PM
by twiss on 5/12/17, 5:31 PM
by rayuela on 5/12/17, 5:15 PM
by sn9 on 5/12/17, 9:02 PM
by ashark on 5/12/17, 9:41 PM
by cttet on 5/15/17, 5:22 AM
by marzell on 5/12/17, 10:42 PM