from Hacker News

Hashing Modulo Theories

by philzook on 5/17/24, 8:51 PM with 3 comments

  • by yarg on 5/19/24, 5:14 AM

    (What the article calls) Canonisation is an interesting issue - it seems very hard to get right and to do so efficiently.

    I'm trying to wrap my head around a the issue in a regex parser that I've knocked up.

    I'm currently ending up (expectedly) with multiple nodes representing equivalent languages; I want to strip these out when I use code generation to convert the constructed network into a switch based static automaton.

    The most robust way to see whether two languages are the same is to xor them and test for interminability - but this means comparing each pair of nodes throughout the network, and I'd rather avoid the n^2 scaling if there's another option.

    That option is to generate a canonical expression for the language that the machine represents, somewhat difficult but far more efficient when it comes to detecting collisions.

    https://en.wikipedia.org/wiki/Canonization

    https://en.wikipedia.org/wiki/Canonicalization