from Hacker News

Show HN: Python can make 3M+ WebSocket keys per second

by cprogrammer1994 on 7/3/23, 7:06 AM with 70 comments

  • by throwaway2037 on 7/3/23, 8:27 AM

    The hardest part of debugging Python is "hitting the wall" when you come to a native library (compiled C code). And Python has achieved a lot of speed-ups from Py2 to Py3 by adding more compiled C code. This is a real blocker for understanding the foundation library.

    On the other hand, in Java, with a few exceptions around Swing (native painting for GUIs), almost everything is written in pure Java, so you can debug all the way down if need be. It is a huge help for understanding the foundation library and all of its edge cases (normal for any huge library). Modern Java debuggers, like the one in IntelliJ, go so far as to decompile JARs and let you set breakpoints in, and step into, the decompiled code. It is mind blowing when you are trying to debug a library whose source code you don't have (random quant lib, ancient auth lib, etc.).

  • by phoe-krk on 7/3/23, 11:56 AM

    > Show HN: Python can make 3M+ WebSocket keys per second

    > This article is about optimizing a tiny bit of Python code by replacing it with its C++ counterpart.

    So it's C++ rather than Python.

  • by jeroenhd on 7/3/23, 10:36 AM

    This has nothing to do with websockets and much more to do with hashing and Python-to-native calls. It's benchmarking the generation of base64(sha1(something)) in Python, which I suppose is what "websocket keys" means here.
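
    For reference, a minimal sketch of that computation as RFC 6455 defines it for the Sec-WebSocket-Accept header. The function name accept_key is made up for illustration, and this is not necessarily the code the article benchmarks:

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's Sec-WebSocket-Key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(key: str) -> str:
    """Return base64(sha1(key + GUID)) as an ASCII string."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, section 1.3.
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```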

    I'm not sure why the author implemented SHA1 and a base64 digest thereof manually rather than including a small library, but perhaps that was part of the challenge.

    Python can generate a whole lot more keys per second if you enable SIMD, multithreading, or even GPU support. In fact, Ryzen, 11th-gen-and-later Intel, and ARMv8-A CPUs have dedicated SHA-1 instructions that should significantly boost performance here. Together with something like https://github.com/WojciechMula/base64-avx512 I bet you could increase the performance by an order of magnitude if raw CPU speed were really a concern.

    I suppose three million keys per second ought to be enough for any websocket server, especially for a relatively simple implementation of the code.

  • by Someone on 7/3/23, 7:26 AM

    The title should be “optimization-demo” (original title) or “Replacing parts of Python programs by C++ can be easy and profitable”.

    They replace Python code that makes 5 calls into native code with code that makes 1 call that in turn makes those 5 calls, and get a speed-up from 869k calls per second to 3.15M calls per second, so a snarky title could even be “Python-to-native calls are slow”.

    They could even measure it by adding a C++ version of that

      def magic_accept(key: str) -> str:
        return 's3pPLMBiTxaQ9kYGzzhZRbK+xOo='
    
    code and benchmarking that.
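
    A rough Python-side analogue of that measurement, as a sketch only: it times a pure-Python implementation against the do-nothing stub, so the stub's rate is an upper bound set by call overhead alone. The names real_accept, stub_accept and calls_per_second are hypothetical, the numbers depend on the machine, and the lambda wrapper adds a little overhead of its own to both measurements.

```python
import base64
import hashlib
import timeit

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def real_accept(key: str) -> str:
    # Several separate calls into native code: encode, sha1, digest,
    # b64encode, decode.
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def stub_accept(key: str) -> str:
    # Does no work, so its cost is almost entirely call overhead.
    return "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="


def calls_per_second(func, n: int = 200_000) -> float:
    key = "dGhlIHNhbXBsZSBub25jZQ=="
    seconds = timeit.timeit(lambda: func(key), number=n)
    return n / seconds


if __name__ == "__main__":
    for func in (real_accept, stub_accept):
        print(f"{func.__name__}: {calls_per_second(func):,.0f} calls/s")
```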

  • by k__ on 7/3/23, 10:17 AM

    "...by replacing it with its C++..."

    Nice try!

  • by grodes on 7/3/23, 9:51 AM

    > Python can make 3M+ WebSocket keys per second

    C++ 85.1%

  • by tyingq on 7/3/23, 12:32 PM

    I suspect part of the speedup comes from avoiding the base64 implementation bundled with Python. Third-party extensions[1] claim fairly large improvements.

    [1] https://github.com/mayeut/pybase64

    This particular one also includes b64encode_as_string, which would also reduce some work/copying.
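
    A sketch of how that could look, assuming pybase64 is installed (pip install pybase64). It falls back to the standard library otherwise. The function name accept_key is hypothetical, and whether this is actually faster would need measuring:

```python
import base64
import hashlib

try:
    import pybase64  # third-party, optional
except ImportError:
    pybase64 = None

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def accept_key(key: str) -> str:
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    if pybase64 is not None:
        # Returns str directly, skipping the intermediate bytes object
        # and the separate .decode() call.
        return pybase64.b64encode_as_string(digest)
    return base64.b64encode(digest).decode("ascii")
```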

  • by akx on 7/3/23, 1:27 PM

    While the article clearly says this is a toy/example and so on, one of the nice points of the Python version is that it doesn't e.g.

    - segfault the interpreter if you pass in something that's not a string

    - read bogus memory if the length of the string is < 24
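
    Spelled out in Python, the two checks a native version would need before touching the buffer might look like the sketch below. The name checked_accept is hypothetical, and the exact-length rule assumes the key is the base64 encoding of a 16-byte nonce (24 characters), as RFC 6455 specifies:

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def checked_accept(key: object) -> str:
    # Reject non-strings instead of reinterpreting arbitrary objects.
    if not isinstance(key, str):
        raise TypeError(f"key must be str, not {type(key).__name__}")
    # Reject anything that is not exactly 24 characters long.
    if len(key) != 24:
        raise ValueError(f"key must be 24 characters, got {len(key)}")
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

    In a C or C++ extension the same checks would have to be made on the incoming object before its bytes are read.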

  • by raverbashing on 7/3/23, 12:54 PM

    OK, now try the Python version with PyPy and see how it goes.

  • by joshxyz on 7/3/23, 1:35 PM

    the tragedy though is websocket tls is what will actually slow us down