from Hacker News

RegExpBuilder – Create regular expressions using chained methods

by jrullmann on 2/11/15, 2:59 PM with 54 comments

by draegtun on 2/11/15, 6:21 PM
Thought this might be of interest; here's how the examples provided would look in Rebol:
```
    digits: digit: charset "0123456789"

    rule: [
        thru "$"
        some digits
        "."
        digit
        digit
    ]

    parse "$10.00" rule    ;; true


    pattern: [
        some "p"
        2 "q" any "q"
    ]

    new-rule: [
        2 pattern
    ]

    parse "pqqpqq" new-rule    ;; true
```
Rebol doesn't have regular expressions; instead it comes with a parse dialect, which is a TDPL: http://en.wikipedia.org/wiki/Top-down_parsing_language
Some parse refs: http://en.wikibooks.org/wiki/REBOL_Programming/Language_Feat... | http://www.rebol.net/wiki/Parse_Project | http://www.rebol.com/r3/docs/concepts/parsing-summary.html
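For comparison, here is a rough Python `re` translation of the two Rebol rules above. It is a sketch, assuming (as with Rebol's `parse`) that a rule must consume the whole input, hence `re.fullmatch`. `thru "$"` is only approximated by a lazy `.*?\$`; Rebol's parse is PEG-like and does not backtrack the way a regex engine can, although for these inputs the results agree.

```python
import re

# thru "$"  some digits  "."  digit digit
# "thru" skips everything up to and including the first "$".
price = re.compile(r".*?\$\d+\.\d\d")

# pattern:  some "p"  2 "q"  any "q"   ->  p+ q{2,}
# new-rule: 2 pattern                  ->  (?:p+q{2,}){2}
new_rule = re.compile(r"(?:p+q{2,}){2}")

print(bool(price.fullmatch("$10.00")))     # True
print(bool(new_rule.fullmatch("pqqpqq")))  # True
```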
by tragomaskhalos on 2/11/15, 4:21 PM
There have been many efforts similar to this in many languages, but most of us seem happy to stick to the more succinct canonical form, supplemented with /x and # comments when things get too hairy.
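A small illustration of the /x style, using Python's `re.VERBOSE` flag (Python's counterpart of Perl's /x) on the `$10.00` price pattern from the Rebol example above:

```python
import re

# re.VERBOSE ignores unescaped whitespace in the pattern and treats
# "#" as the start of a comment running to the end of the line.
price = re.compile(r"""
    \$        # literal dollar sign
    \d+       # one or more digits
    \.        # literal decimal point
    \d{2}     # exactly two digits
""", re.VERBOSE)

print(bool(price.fullmatch("$10.00")))  # True
print(bool(price.fullmatch("$10.0")))   # False
```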
by marktangotango on 2/11/15, 4:52 PM
Generally, I find that if one's regexes are so complex that one needs visualizers or other aids in writing them, one doesn't have a regex problem, but a parsing problem. The method of parsing by recursive descent can often lead to much more understandable (if more verbose) "pattern matching".
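To make the contrast concrete, here is a minimal recursive-descent recognizer for arbitrarily nested, comma-separated lists of numbers such as `(1,(2,3),())`, something a classic regular expression cannot match to unbounded depth. The grammar and function names are made up for illustration: one small function per grammar rule, each advancing a shared position.

```python
def parse_list(text: str) -> bool:
    """Return True if text is a nested list like "(1,(2,3),())"."""
    pos = 0

    def peek() -> str:
        # Current character, or "" at end of input.
        return text[pos] if pos < len(text) else ""

    def expect(ch: str) -> None:
        nonlocal pos
        if peek() != ch:
            raise SyntaxError(f"expected {ch!r} at {pos}")
        pos += 1

    def number() -> None:            # number := digit+
        nonlocal pos
        start = pos
        while "0" <= peek() <= "9":
            pos += 1
        if pos == start:
            raise SyntaxError(f"expected digit at {pos}")

    def item() -> None:              # item := number | list
        if peek() == "(":
            lst()
        else:
            number()

    def lst() -> None:               # list := "(" [item ("," item)*] ")"
        expect("(")
        if peek() != ")":
            item()
            while peek() == ",":
                expect(",")
                item()
        expect(")")

    try:
        lst()
    except SyntaxError:
        return False
    return pos == len(text)          # reject trailing input


print(parse_list("(1,(2,3),())"))  # True
print(parse_list("(1,(2)"))        # False
```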
by UnoriginalGuy on 2/11/15, 5:02 PM
Looks like LINQ (from .NET/C#). Pretty sexy way to write Regular Expressions if you ask me.
I've "learned" regular expressions multiple times but it just never sticks; I have no idea why. It certainly doesn't help that there are several different incompatible syntaxes (so what I remember and think "should" work doesn't).
I'd prefer to write regexes in this style; however, I would pay attention to performance (not that regular expressions are high performance, but I wouldn't want to see a large performance loss either).
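On the performance point: a chained builder can, in principle, just assemble an ordinary pattern string, so once the result is compiled the matching cost is that of the underlying regex engine, and the chaining only costs something at construction time. A minimal sketch of that idea in Python follows; the `Builder` class and its method names are hypothetical and are not RegExpBuilder's actual API.

```python
from __future__ import annotations

import re


class Builder:
    """Hypothetical chained regex builder; each method appends a fragment."""

    def __init__(self) -> None:
        self._parts: list[str] = []

    def literal(self, text: str) -> Builder:
        self._parts.append(re.escape(text))   # escape regex metacharacters
        return self

    def digits(self, min_count: int = 1, max_count: int | None = None) -> Builder:
        if max_count is None:
            self._parts.append(rf"\d{{{min_count},}}")
        else:
            self._parts.append(rf"\d{{{min_count},{max_count}}}")
        return self

    def pattern(self) -> str:
        return "".join(self._parts)

    def compile(self) -> re.Pattern[str]:
        # Compile once; the result is a plain re.Pattern, matched at
        # ordinary regex speed with no builder overhead.
        return re.compile(self.pattern())


price = Builder().literal("$").digits(1).literal(".").digits(2, 2).compile()
print(price.pattern)                    # \$\d{1,}\.\d{2,2}
print(bool(price.fullmatch("$10.00")))  # True
```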
by chris-at on 2/11/15, 3:12 PM
Thanks, this is a lot better than writing this (even if the formatting worked here):
```
(?xi)
\b
(                           # Capture 1: entire matched URL
  (?:
    [a-z][\w-]+:                # URL protocol and colon
    (?:
      /{1,3}                        # 1-3 slashes
      |                             #   or
      [a-z0-9%]                     # Single letter or digit or '%'
                                    # (Trying not to match e.g. "URI::Escape")
    )
    |                           #   or
    www\d{0,3}[.]               # "www.", "www1.", "www2." … "www999."
    |                           #   or
    [a-z0-9.\-]+[.][a-z]{2,4}/  # looks like domain name followed by a slash
  )
  (?:                           # One or more:
    [^\s()<>]+                      # Run of non-space, non-()<>
    |                               #   or
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
  )+
  (?:                           # End with:
    \(([^\s()<>]+|(\([^\s()<>]+\)))*\)  # balanced parens, up to 2 levels
    |                                   #   or
    [^\s`!()\[\]{};:'".,<>?«»“”‘’]        # not a space or one of these punct chars
  )
)
```
by jluxenberg on 2/11/15, 6:02 PM
S-expressions are a natural fit for construction of regular expressions, see http://community.schemewiki.org/?scheme-faq-programming#H-1w...
e.g.
```
  (: (or (in ("az")) (in ("AZ"))) 
    (* (uncase (in ("az09")))))
```
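The same tree-shaped idea can be mimicked without s-expressions by using nested tuples. Below is a toy Python compiler for a handful of operators mirroring the example above (`:` for sequence, `or`, `*`, `in`, `uncase`). The tuple encoding, the operator subset, and the reading of `"az09"` as the ranges a-z and 0-9 are all assumptions made for illustration; this is not an implementation of any particular Scheme library.

```python
import re


def ranges_of(node):
    """Character ranges of an ("in", ...) or ("uncase", ...) node."""
    tag, *args = node
    if tag == "in":
        # Assumption: strings are read two characters at a time as lo/hi
        # ranges, so "az09" means a-z plus 0-9.
        return [(s[i], s[i + 1]) for s in args for i in range(0, len(s), 2)]
    if tag == "uncase":
        inner = ranges_of(args[0])
        candidates = (inner
                      + [(lo.upper(), hi.upper()) for lo, hi in inner]
                      + [(lo.lower(), hi.lower()) for lo, hi in inner])
        out = []
        for pair in candidates:              # keep order, drop duplicates
            if pair not in out:
                out.append(pair)
        return out
    raise ValueError(f"not a character-set node: {tag!r}")


def compile_sre(node) -> str:
    """Compile a nested-tuple regex tree into a Python regex string."""
    if isinstance(node, str):                # literal text
        return re.escape(node)
    tag, *args = node
    if tag == ":":                           # sequence
        return "".join(compile_sre(a) for a in args)
    if tag == "or":                          # alternation
        return "(?:" + "|".join(compile_sre(a) for a in args) + ")"
    if tag == "*":                           # zero or more
        return "(?:" + compile_sre(args[0]) + ")*"
    if tag in ("in", "uncase"):              # character class
        body = "".join(f"{re.escape(lo)}-{re.escape(hi)}"
                       for lo, hi in ranges_of(node))
        return "[" + body + "]"
    raise ValueError(f"unknown operator: {tag!r}")


# (: (or (in ("az")) (in ("AZ"))) (* (uncase (in ("az09")))))
ident = (":", ("or", ("in", "az"), ("in", "AZ")),
              ("*", ("uncase", ("in", "az09"))))

pattern = compile_sre(ident)
print(pattern)                               # (?:[a-z]|[A-Z])(?:[a-z0-9A-Z])*
print(bool(re.fullmatch(pattern, "x1Y2")))   # True
```

The `uncase` handling is deliberately naive (it just adds upper- and lower-case copies of each range), which is enough for this example but not for arbitrary ranges.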
by jgalt212 on 2/11/15, 5:18 PM
Definitely a debuggable way to write regexes. Whenever I have to maintain a hairy regex, I like to plot the regex as a railroad diagram.
These web based tools can do it:
https://www.debuggex.com/
http://jex.im/regulex/
by dkarapetyan on 2/11/15, 4:56 PM
Generalize just a little bit and you've got parser combinators.
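To illustrate the generalization, here is a bare-bones parser-combinator sketch in Python that recognizes the same `$10.00` price format as the Rebol rule above. All names (`lit`, `char_if`, `seq`, `many1`, `full_match`) are made up for this example. Each parser takes the input and a position and returns the new position on success or `None` on failure; a real combinator library would also return parsed values, but the composition works the same way as the chained methods.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the exact string s."""
    def p(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return p


def char_if(pred: Callable[[str], bool]) -> Parser:
    """Match one character satisfying pred."""
    def p(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and pred(text[pos]) else None
    return p


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any fails."""
    def p(text: str, pos: int) -> Optional[int]:
        for q in parsers:
            nxt = q(text, pos)
            if nxt is None:
                return None
            pos = nxt
        return pos
    return p


def many1(q: Parser) -> Parser:
    """One or more repetitions, consumed greedily with no backtracking."""
    def p(text: str, pos: int) -> Optional[int]:
        nxt = q(text, pos)
        if nxt is None:
            return None
        while nxt is not None and nxt != pos:  # stop on failure or empty match
            pos = nxt
            nxt = q(text, pos)
        return pos
    return p


def full_match(parser: Parser, text: str) -> bool:
    return parser(text, 0) == len(text)


digit = char_if(lambda c: "0" <= c <= "9")
price = seq(lit("$"), many1(digit), lit("."), digit, digit)

print(full_match(price, "$10.00"))  # True
print(full_match(price, "$10.0"))   # False
```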
by zzzcpan on 2/11/15, 10:55 PM
Regexps exist to avoid cumbersome code like this, to make it less error-prone. Makes me sad to see so many upvotes.
I get that some people have a hard time understanding regexps with all the backtracking and greediness. Yes, the syntax is a bit complicated. Maybe a simplified, predictable default mode could help. But there is no problem with a DSL being used as an abstraction. In fact, we need more DSLs, for everything!
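A short Python illustration of the greediness mentioned here: a greedy quantifier takes as much as it can and then backtracks only as far as needed for the rest of the pattern to match, while the lazy form `+?` takes as little as possible.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ runs to the end of the string, then backtracks to the last ">".
print(re.search(r"<.+>", html).group())   # <b>bold</b>

# Lazy: .+? stops at the first ">" that lets the match succeed.
print(re.search(r"<.+?>", html).group())  # <b>
```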
by psychometry on 2/11/15, 5:19 PM
Now you have three problems.

by kazinator on 2/11/15, 6:20 PM

Yes, regexes can have other syntactic representations, like:

    (compound "$" (1+ :digit) "." :digit :digit)

Run:

    $ txr -p "(regex-compile '(compound \"$\" (1+ :digit) \".\" :digit :digit))"
    #/$\d+\.\d\d/

by epicureanideal on 2/11/15, 7:45 PM
Nice work! I don't know if it'll be ideal for all use cases, but it does add some readability.
by otakucode on 2/11/15, 10:56 PM
Now do an example where you create a regex to parse the IMDB movies.list data file!
by gcao on 2/11/15, 4:17 PM
Great work! This is very intriguing!