from Hacker News

Writing a minimal Lua implementation with a virtual machine from scratch in Rust

by finite_jest on 1/16/22, 1:54 AM with 29 comments

  • by jstimpfle on 1/16/22, 10:11 AM

    Since tokens are the leaves of the AST, there are a lot of them, and they can take up a lot of space. To save memory, it is a good idea to store only a file location instead of a full token. Whenever token information is needed, just lex again from that file location to get the full token. This only works for languages whose lexical syntax is context-free, of course (I'm not entirely sure "context-free" is the right term here, but you get what I mean).

    Storing row/column in file location data is wasteful - just a file offset should be enough. Whenever the row/column coordinates are needed (normally only in user messages) they can be quickly recomputed.

    In effect, parsed tokens can be stored as just an offset - a 4 or 8 byte integer.
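
    A minimal sketch of the idea above, assuming a toy lexer where a token is either a run of alphanumerics/underscores or a single other character. The names `TokenRef`, `line_col`, and `relex` are hypothetical and not taken from the article; columns are counted in bytes for simplicity.

```rust
/// A token as stored in the AST: only a byte offset into the source text.
/// (Hypothetical type for illustration; 4 bytes per token.)
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct TokenRef(pub u32);

/// Recompute the 1-based (line, column) for a byte offset by scanning the
/// source up to that offset. Panics if `offset` is past the end of `src`.
pub fn line_col(src: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut col = 1;
    for b in src.as_bytes()[..offset].iter() {
        if *b == b'\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    (line, col)
}

/// Re-lex the token that starts at the stored offset and return its text.
/// Toy rule: a word is a run of alphanumerics or '_'; anything else is a
/// one-character token. A real lexer would plug in its own rules here.
pub fn relex(src: &str, tok: TokenRef) -> &str {
    let rest = &src[tok.0 as usize..];
    let len = match rest.chars().next() {
        None => 0,
        Some(c) if c.is_alphanumeric() || c == '_' => rest
            .find(|ch: char| !(ch.is_alphanumeric() || ch == '_'))
            .unwrap_or(rest.len()),
        Some(c) => c.len_utf8(),
    };
    &rest[..len]
}
```

    With the source `"local x = 1\nprint(x)\n"`, `relex(src, TokenRef(12))` yields `"print"` and `line_col(src, 12)` yields `(2, 1)`, so neither the token text nor the row/column has to be stored in the tree.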

  • by da39a3ee on 1/16/22, 5:11 AM

    The article looks great and I’m looking forward to reading it; this comment is not a criticism of the article.

    This API is the only bad thing about Rust!

      .expect("Could not read file")
    
    It’s so unfortunate to have an API that reads

      .expect("thing we don’t expect")
    
    I think we should all just forget it’s there and use

      .unwrap_or_else(|| panic!("thing we don’t expect"))

  • by duped on 1/16/22, 9:51 AM

    Working on tokenization and parsing, there have been two "lights clicking on" moments that I think every dev working on a PL implementation should have:

    - Tokens are the leaves of your syntax trees

    - File locations are relative, not absolute

    It's easier to build a parser that doesn't buy into these things, but it's way harder to build tooling and good error messaging if you don't.

  • by eatonphil on 1/16/22, 6:37 AM

    Hey folks, just saw this; author here. Happy to answer questions!

  • by xvilka on 1/16/22, 4:29 AM

    There's also Luster[1].

    [1] https://github.com/kyren/luster

  • by cgoto89798 on 1/16/22, 5:04 AM

    Does Rust have computed goto, which really helps interpreter speed?

    It basically means you can do something like "goto opcode_table[*(++ip)];"

    GCC offers it as a non-standard extension to C.

      https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
    
    FORTRAN has had it since 1957. But Pascal and C purged "evil computed GOTO" and only offered non-computed goto. Then Java etc. purged non-computed goto.
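
    For comparison, stable Rust has no `goto` at all, computed or otherwise. A common substitute is a `match` on the opcode inside a loop, which the compiler may or may not lower to a jump table. Below is a rough sketch of that central-dispatch shape for a tiny stack machine; the `Op` enum and `run` function are hypothetical and not taken from the article.

```rust
/// Hypothetical opcodes for a tiny stack machine (illustration only).
#[derive(Clone, Copy)]
pub enum Op {
    Push(i64),
    Add,
    Mul,
    Halt,
}

/// Execute `code` and return the top of the stack when `Halt` is reached.
/// Returns `None` on stack underflow or if execution runs off the end.
pub fn run(code: &[Op]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    let mut ip = 0;
    loop {
        // Central dispatch: every instruction comes back through this one
        // `match`, whereas computed goto jumps from handler to handler.
        let op = *code.get(ip)?;
        ip += 1;
        match op {
            Op::Push(v) => stack.push(v),
            Op::Add => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a + b);
            }
            Op::Mul => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a * b);
            }
            Op::Halt => return stack.pop(),
        }
    }
}
```

    Running `[Push(2), Push(3), Add, Push(4), Mul, Halt]` through `run` gives `Some(20)`. Whether this loop performs as well as computed goto depends on the optimizer and the target, so it is worth measuring rather than assuming.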
  • by debdut on 1/16/22, 5:17 AM

    Thanks for sharing! A great learning