from Hacker News

The case for a modern language

by bshanks on 1/22/22, 5:58 AM with 196 comments

  • by prirun on 1/22/22, 2:27 PM

    PL/I, created in 1964, had strings. Real strings, where the compiler knows the length even when it gets passed around and is declared char(*) var in the receiving function. You can't have buffer overflows because the compiler and runtime know every string's current length and allocated length.

    This isn't a particularly hard problem. C just took a shitty shortcut to fake strings using byte arrays and the world glommed onto it. Now we're stuck with a crappy "standard" that people should have scoffed at when it first showed its ugly face.
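
    For readers who have not used PL/I, the idea of a string that carries both its current length and its allocated length can be roughly sketched in C as below. This is only an illustration: the `bstr` type and the `bstr_append` function are hypothetical names, not part of PL/I or of any C library, and the sketch assumes `len <= cap` always holds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded string: length and capacity travel with the data. */
typedef struct {
    size_t len;   /* bytes currently in use */
    size_t cap;   /* bytes allocated in data */
    char *data;   /* not NUL-terminated; len says where the text ends */
} bstr;

/* Append n bytes; refuse instead of writing past the allocation. */
bool bstr_append(bstr *s, const char *src, size_t n) {
    if (n > s->cap - s->len) return false;   /* would overflow the buffer */
    memcpy(s->data + s->len, src, n);
    s->len += n;
    return true;
}
```

    With an 8-byte buffer, appending five bytes succeeds and a second five-byte append is refused rather than overflowing.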

  • by ErikCorry on 1/22/22, 8:47 AM

    In practice it's _worse_ than that because you probably don't want a "long", you probably want a particular size like a 64 bit integer. So you have to add ifdefs to call either strtol or strtoll depending on the size of "long" and "long long".
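
    One way to sidestep the ifdefs, sketched here under the assumption that C99's `<inttypes.h>` and `int64_t` are available, is to parse with `strtoimax` and then range-check against `INT64_MIN` and `INT64_MAX`. The helper name `parse_i64` is made up for illustration.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: parse a base-10 int64_t without caring how wide
   long or long long happen to be on this platform. */
bool parse_i64(const char *s, int64_t *out) {
    char *end;
    errno = 0;
    intmax_t v = strtoimax(s, &end, 10);   /* intmax_t is at least 64 bits */
    if (end == s || *end != '\0') return false;        /* no digits, or trailing junk */
    if (errno == ERANGE) return false;                 /* did not fit in intmax_t */
    if (v < INT64_MIN || v > INT64_MAX) return false;  /* fits intmax_t, not int64_t */
    *out = (int64_t)v;
    return true;
}
```

    Like strtol, this still accepts leading whitespace and an optional sign.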

    And if you are using base 16 then strtol will allow an optional "0x" prefix. So if you didn't want that you have to check for it manually.

    Strtol also accepts leading whitespace so if you didn't want that you have to test manually for it.

    Don't pass a zero base thinking it means base ten. This works almost all the time but misinterprets a leading zero to mean octal.
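
    The three gotchas above are easy to reproduce. The following is a small sketch; `has_leading_space`, `has_hex_prefix` and `parse_with_base` are hypothetical helper names, and the pre-checks are only one possible way to do the manual tests.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical pre-check: strtol would silently skip this whitespace. */
bool has_leading_space(const char *s) {
    return isspace((unsigned char)s[0]) != 0;
}

/* Hypothetical pre-check: strtol in base 16 would accept this prefix. */
bool has_hex_prefix(const char *s) {
    if (*s == '+' || *s == '-') s++;   /* a sign may come before the prefix */
    return s[0] == '0' && (s[1] == 'x' || s[1] == 'X');
}

/* Thin wrapper so the effect of the base argument is easy to compare. */
long parse_with_base(const char *s, int base) {
    return strtol(s, NULL, base);
}
```

    Here `parse_with_base("010", 0)` gives 8 while `parse_with_base("010", 10)` gives 10, `parse_with_base("0x1f", 16)` gives 31, and `parse_with_base("   42", 10)` gives 42.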

    Good luck!

  • by WalterBright on 1/22/22, 8:35 AM

    The #1 problem with C is buffer overflows. The solution is pretty simple:

    https://www.digitalmars.com/articles/C-biggest-mistake.html

    and does not break existing code.

  • by dottedmag on 1/22/22, 7:41 AM

    Seriously, the biggest gripe about C is the design of the standard library?

    Not the pervasive undefined behaviour and compilers that become more aggressive every release about breaking previously-working code?

    Not the reams of code that assume sizes of integers and signedness of char?

    Not the wild build process that makes it awfully hard to actually build anything that has any dependencies whatsoever?

    strtol. Damn, what a nuisance!

  • by stncls on 1/22/22, 6:23 PM

    From the article:

      char *forty_two_bee = "42b";
      char *end;
      errno = 0; // remember errno?
    
      long i = strtol(forty_two_bee, &end, 10);
    
    > This will return 0

    No, this will return 42. strtol() parses greedily until a character cannot be parsed, but then it returns the conversion of what it did parse.
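
    This is easy to check. A minimal sketch (the function name `parse_prefix` is just for illustration) that returns what strtol parsed and reports the first character it could not consume:

```c
#include <stdlib.h>

/* Returns the value strtol parsed and stores, in *rest, the first
   character that was not part of the number. */
long parse_prefix(const char *s, char *rest) {
    char *end;
    long v = strtol(s, &end, 10);
    *rest = *end;
    return v;
}
```

    `parse_prefix("42b", &rest)` returns 42 and leaves `'b'` in `rest`.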

    I guess the fact the author got this wrong... kind of proves their point that strtol()'s API is not great?

    On the other hand, while the article purports to criticize a language, it then proceeds to only cover its standard library. Sure, C's stdlib is old-fashioned, but there are many things in C that are much worse than its standard library! (And I say that as someone who still likes the language.)

  • by dcposch on 1/22/22, 9:30 PM

    Author mentions four increasingly obscure C replacements (first I've heard of Odin) without mentioning that the creators of the original C and Unix went on to make Go.

    Go does not have manual memory management. Despite (actually because of) that, it captures the spirit and design goal of the original C beautifully. It's a minimalist systems programming language.

    One of the amazing things about Go is the standard library-- the thing he complains about with C. The Go standard library is incredibly readable. It's night and day from C/C++ where opening glibc/STL etc is an assault on the senses.

  • by skywhopper on 1/22/22, 1:01 PM

    What a weird post. The examples from Rust and Zig don’t fail gracefully, so they can’t be considered complete. Panicking on bad user input is bad code, too. And the main complaint seems to be that the C stdlib could be improved. But where it has been improved, the author complains that it’s really just doing the ugly stuff under the hood. What does the author think the Rust stdlib function is doing exactly?

  • by tragomaskhalos on 1/22/22, 2:42 PM

    Either I'm going mad - in which case please set me straight - or the Rust example doesn't even compile: had to remove the odd-looking borrows on the method calls, and replace the type annotation in the final 'if let' with a turbofish on the call.

  • by gumby on 1/22/22, 5:04 PM

    > It exists because it became part of the POSIX standard way back when a pdp7 was an advanced computer…

    The PDP-7 was long obsolete by the time the POSIX effort started. By then the most common Unix host was a VAX (32 bits), though it, or Unix-alikes, ran on a variety of 16 and 32 bit machines, hence a desire for standardization.

  • by creativemonkeys on 1/22/22, 11:00 AM

    One of C's design principles is to be fast at the cost of safety, just like a Formula 1 car. It will let you make fast mistakes.

    You drove a Corolla in college, then got a job and drove a cool BMW for several years and now you think you're hot shit, so you hop in an F1 car and not only does it take forever to learn how to drive it, it has to be driven on a special track and the gearbox is different, what a nuisance!

    "If only we could add 4 doors, automatic transmission, snow tires, and a trunk to put our stuff in, people won't keep getting into accidents with this car", you say. Right, but then it becomes a BMW. If you want real speed, you need to first go slow and master the car because otherwise you'll crash and burn.

    C is messy because real world hardware is very messy. You can't push bytes through the hardware at its speed limit without getting your hands dirty, and we all come out into the real world wearing "class Dog extends Animal" white gloves.

    To use C effectively, you should not be coding in C in your mind. You should be thinking in assembly, but your fingers should be typing C code. It's not safe, but if you want to reach 230MPH and accelerate to 60MPH in 2.6 seconds, you better know exactly what you're doing when you hop behind the wheel of that car. It's not for the weak.

  • by alkonaut on 1/22/22, 3:13 PM

    Wait, is this article saying that there is no good/obvious/standard function that parses a string to a number and has the two obvious outputs of such a function (the number, and a bool or error code)?

    Even a person in the 60s would realize that that’s the API for conversion from a string to a number (or any conversion that might fail)! What happened? Why do these functions even exist?

  • by kazinator on 1/22/22, 7:48 AM

    This isn't the usual way this is coded:

      char *one = "one";
      char *end;
      errno = 0; // remember errno?
      long i = strtol(one, &end, 10);
      if (errno != 0) {
          perror("Error parsing integer from string: ");
      } else if (i == 0 && end == one) {
          fprintf(stderr, "Error: invalid input: %s\n", one);
      } else if (i == 0 && *end != '\0') {
          f__kMeGently(with_a_chainsaw); 
      }
    
    It's actually like this:

      errno = 0;
    
      long i = strtol(input, &end, 10);
    
      if (end == input) {
        // no digits were found
      } else if (*end != 0 && no_ignore_trailing_junk) {
        // unwanted trailing junk
      } else if ((i == LONG_MIN || i == LONG_MAX) && errno != 0) {
        // overflow case
      } else {
        // good!
      }
    
    errno only needs to be checked in the LONG_MIN or LONG_MAX case. These cases are ambiguous: LONG_MIN and LONG_MAX are valid values of type long, and they are used for reporting an underflow or overflow. Therefore errno is reset to zero first. Otherwise what if errno contains a nonzero value, and LONG_MAX happens to be a valid, non-overflowing value out of the function?
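
    Wrapped into a function, the sequence above might look like the following sketch. The `parse_status` enum and the `parse_long` name are invented here; the sketch always rejects trailing junk and tests for `ERANGE` specifically rather than for any nonzero errno.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

typedef enum {
    PARSE_OK,
    PARSE_NO_DIGITS,
    PARSE_TRAILING_JUNK,
    PARSE_RANGE
} parse_status;

/* Hypothetical wrapper: base-10 only, stores the value in *out on success. */
parse_status parse_long(const char *input, long *out) {
    char *end;
    errno = 0;   /* so a stale errno cannot make a valid LONG_MAX look like overflow */
    long i = strtol(input, &end, 10);
    if (end == input) return PARSE_NO_DIGITS;
    if (*end != '\0') return PARSE_TRAILING_JUNK;
    if ((i == LONG_MIN || i == LONG_MAX) && errno == ERANGE) return PARSE_RANGE;
    *out = i;
    return PARSE_OK;
}
```

    Because errno is cleared first, a string that spells out exactly LONG_MAX is reported as PARSE_OK even if an earlier call left errno set.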

    Anyway, you cannot get away from handling these cases no matter how you implement integer scanning; they are inherent to the problem.

    It's not strtol's fault that the string could be empty, or that it could have a valid number followed by junk.

    Overflows stem from the use of a fixed-width integer. But even if you use bignums, and parse them from a stream (e.g. network), you may need to set a cutoff: what if a malicious user feeds you an endless stream of digits?

    The bit with errno is a bit silly, given that the function has enough parameters that it could have been dispensed with. We could write a function which is invoked exactly like strtoul, but which, in the overflow case, sets the *end pointer to NULL:

      // no assignment to errno before strtol
    
      long i = my_strtoul(input, &end, 10);
    
      if (end == 0) {
        // underflow or overflow, indicated by LONG_MIN or LONG_MAX value
      } else if (end == input) {
        // no digits were found
      } else if (*end != 0 && no_ignore_trailing_junk) {
        // unwanted trailing junk, but i is good
      } else {
        // no trailing junk, value in i
      }
    
    errno is a pig; under multiple threads, it has to access a thread-local value. E.g.:

      #define errno (*__thread_specific_errno_location())
    
    The designer of strtoul didn't do this likely because of the overriding requirement that the end pointer is advanced past whatever the function was able to recognize as a number, no matter what. This lets the programmer write a tokenizer which can diagnose the overflow error, and then keep going with the next token.

  • by PaulDavisThe1st on 1/22/22, 8:09 PM

    strto*() is the wrong API to use if you care about errors.

      char* forty_two = "42";
      int i;
      if (sscanf (forty_two, "%d", &i) != 1) {
          /* error */
      }
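
    If trailing junk matters, one commonly used variation (a sketch, with a made-up function name `scan_int`) adds `%n` to record how many characters were consumed. Note that `%d` still skips leading whitespace, and the C standard leaves the behaviour undefined when the value does not fit in an int, so this does not cover overflow.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true only if the whole string is one integer.
   On a false return, *out may still hold a parsed prefix. */
bool scan_int(const char *s, int *out) {
    int consumed = 0;
    /* %n stores the number of characters read so far and is not
       counted in sscanf's return value. */
    if (sscanf(s, "%d%n", out, &consumed) != 1) return false;
    return s[consumed] == '\0';   /* reject input such as "42b" */
}
```

    With this check, "42" is accepted while "42b", "abc" and the empty string are all rejected.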
    
    Sometimes, there's more than one way to skin a cat, and one of them is more suited to the task at hand.

  • by AtlasBarfed on 1/24/22, 12:56 AM

    I thought Zig didn't have unicode strings?

    https://www.reddit.com/r/Zig/comments/9q3or3/how_to_deal_wit...

    If that's true, Zig is NOT a modern language. Modern languages use international strings, and are unicode aware with a good unicode aware string library.

    For crap's sake, the code example for comparing modern languages USES A STRING. The fact it is not unicode doesn't matter.

  • by EVa5I7bHFq9mnYK on 1/22/22, 7:32 PM

    That case has existed for 40 years now, yet C still stands. Guess it's the power of the network effect.

  • by futharkshill on 1/22/22, 10:40 AM

    If a user wants to parse integers etc. from a string, sscanf and its family are often used. It is a neatly simple function. This article seems to invent a problem rather than describe an organic one.

  • by treeshateorcs on 1/23/22, 6:14 PM

    the rss feed is broken on that site, it outputs relative links (as opposed to absolute links)

  • by gengiskush on 1/22/22, 12:27 PM

    How about leaving the old stuff you want to "replace" alone? People are using it.