from Hacker News

Lessons learned from doing the one billion row challenge

by anthony88 on 2/23/24, 1:01 PM with 11 comments

  • by sampo on 2/26/24, 7:34 PM

    > My implementation ... All the results are incorrect. The station names should be sorted alphabetically but the last station showing is İzmir and it should be Zürich.

    It is easy to forget that place names can contain non-ASCII characters. As this is a speed contest, I wonder how slow the default library implementation for ordering Unicode strings alphabetically is in Java?

    Edit: Apparently there is no universal way to order words alphabetically; it depends on the (human) language in question.

    For example, İzmir is in Turkey, and in Turkish alphabetical ordering the dotted capital İ comes after the dotless capital I. In Turkish, Ö comes right after O, whereas in Swedish the special letters Å, Ä and Ö are at the very end of the alphabet.

    How are you supposed to deal with this in this contest? Are you somehow supposed to know that Özalp is a town in Turkey, and thus comes after O in alphabetical ordering, but Örebro is a town in Sweden and should be sorted at the very end of the alphabet, after Z, Å and Ä?
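The locale dependence described above can be tried out with `java.text.Collator` from the standard library. What follows is a minimal sketch, not the contest's reference behaviour: the class name `CollationSketch`, its helper methods, and the sample station list are hypothetical. It contrasts plain `String` ordering (by UTF-16 code unit, which is what puts İzmir after Zürich) with collators requested for Turkish and Swedish. The exact locale-specific orderings depend on the collation data shipped with the JDK in use, so no output is claimed for those two calls.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class CollationSketch {

    // Sorts with String's natural ordering, which compares UTF-16 code units.
    // Under this ordering 'Z' (U+005A) sorts before 'İ' (U+0130).
    static List<String> sortByCodeUnits(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts with the collation rules the JDK provides for the given locale.
    // The result depends on the locale data available in the running JDK.
    static List<String> sortForLocale(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        Collator collator = Collator.getInstance(locale);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of source file encoding.
        List<String> stations = List.of(
                "Z\u00fcrich",  // Zürich
                "\u0130zmir",   // İzmir
                "\u00d6rebro",  // Örebro
                "\u00d6zalp",   // Özalp
                "Oslo");

        System.out.println("code units: " + sortByCodeUnits(stations));
        System.out.println("Turkish:    " + sortForLocale(stations, Locale.forLanguageTag("tr")));
        System.out.println("Swedish:    " + sortForLocale(stations, Locale.forLanguageTag("sv")));
    }
}
```

A single collator applies one language's rules to the whole list, so it cannot place Özalp by Turkish rules and Örebro by Swedish rules at the same time, which is exactly the dilemma raised in the question.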

  • by kubb on 2/26/24, 7:28 PM

    JVM arguments as an optimization technique give me that winter melancholy.

  • by NicoJuicy on 2/26/24, 7:19 PM

    This is weird. I've read about the 1brc, and the first thing I remember is that you need to establish a baseline result on your own PC against the reference metric, and then normalize your results to that benchmark.

    This post doesn't seem to take that into account.
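One possible reading of the normalization mentioned above is to scale a local timing by the ratio of the baseline's reference time to the baseline's time on your own machine. The sketch below assumes that interpretation and assumes performance scales linearly between machines, which is a simplification. The class name `BaselineNormalization`, the method `normalize`, and all numbers are hypothetical and do not come from the challenge.

```java
public class BaselineNormalization {

    // Rescales a timing measured locally onto the reference machine's scale,
    // using the same baseline program timed on both machines.
    static double normalize(double localSeconds,
                            double localBaselineSeconds,
                            double referenceBaselineSeconds) {
        if (localBaselineSeconds <= 0) {
            throw new IllegalArgumentException("local baseline time must be positive");
        }
        return localSeconds * (referenceBaselineSeconds / localBaselineSeconds);
    }

    public static void main(String[] args) {
        // Hypothetical numbers, for illustration only.
        double localBaseline = 200.0;      // baseline run on your PC, in seconds
        double referenceBaseline = 100.0;  // same baseline on the reference machine
        double mySolution = 10.0;          // your solution on your PC, in seconds

        double normalized = normalize(mySolution, localBaseline, referenceBaseline);
        System.out.println("normalized seconds: " + normalized);
    }
}
```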

  • by netcraft on 2/26/24, 7:23 PM

    Is anyone doing the challenge on platforms other than Java?