from Hacker News

Show HN: Regex Cheatsheet

by geongeorgek on 1/31/20, 10:32 AM with 129 comments

by robert_tweed on 1/31/20, 4:05 PM
OK, these kinds of regex tools get posted quite often. I get it, regex is very confusing at first. And some of these use-cases result in rather complex expressions nobody should be forced to write from scratch (you are still remembering to write unit tests for them though, right?)
But as someone who actually knows [some flavours of] regex fairly well, what I would really like is a reference that covers all the subtle differences between the various regex engines, along with community-managed documentation (perhaps wiki pages) of which applications & API versions use which flavour of regex.
For example, the other day I wanted to run a find on my NAS. I needed to use a regex, but the Busybox version of find doesn't support the `-iregex` option, so all expressions are case-sensitive. With some googling, I was able to find out that the default regex type is Emacs, but I wasn't able to find either a good reference for exactly what Emacs regex does and doesn't support, or any information about how to set the "i" flag. In the end I had to manually convert every character into a class (like [aA] for "a") which was tedious, but quicker than trying to find a better solution or resorting to grep.
A related, annoyingly common pattern is that the documentation for `find` states that `-regex` specifies a regex, but it does not state which flavour of regex. The documentation for certain versions of `find`, which support alternative engines, notes that the default is Emacs. From this I was able to infer (perhaps wrongly) that the Busybox `find` uses Emacs-flavoured regex, but ultimately I still had to resort to some trial-and-error. This problem is all too common in API documentation.
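The manual "[aA] for a" conversion described above could be scripted. This is only a rough sketch in Python: the helper name `per_char_classes` and the sample filename are made up for illustration, and the set of characters that need a backslash in Emacs-style regex is an assumption, not something taken from the Busybox documentation.

```python
# Characters ASSUMED to need a backslash in Emacs-style regex when meant literally.
EMACS_SPECIAL = set(".*+?[^$\\")

def per_char_classes(literal: str) -> str:
    """Turn a literal string into a pattern that matches it in any letter case."""
    parts = []
    for ch in literal:
        if ch.isascii() and ch.isalpha():
            # "a" becomes "[aA]", emulating a case-insensitive flag by hand
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        elif ch in EMACS_SPECIAL:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)

pattern = per_char_classes("photo.jpg")  # hypothetical filename
print(pattern)
```

GNU find matches `-regex` against the whole path, so a fragment like this would normally be prefixed with `.*/`; whether Busybox behaves identically is an assumption worth testing.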
by crispyambulance on 1/31/20, 1:57 PM
I use regex a lot but deliberately keep it simple.
One thing that confounded me often was positive and negative look-arounds. I always got the expressions mixed up, until I just put the expressions into a table like this...
```
              look-behind  |  look-ahead
    ------------------------------------
    positive    (?<=a)b    |    a(?=b)
    ------------------------------------
    negative    (?<!a)b    |    a(?!b)
```
It's not hard, but for whatever reason my brain had trouble remembering the usage because every time I looked it up, each of those expressions was nested in a paragraph of explanation, and I could not see the simple intuitive pattern.
Putting it into a simple visualization helps a lot.
Now, if I can find a similar mnemonic for backreferences !?
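For anyone who wants to poke at the four cells of the table, here is a small Python sketch (the sample strings are invented). The last lines show one backreference, where `\1` means "the same text that group 1 just captured". Python's `re` is only one flavour, so other engines may differ, especially on look-behind.

```python
import re

text = "ab cb ac"  # invented sample: positions a0 b1 _2 c3 b4 _5 a6 c7

def starts(pattern: str) -> list:
    """Start offsets of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

pos_behind = starts(r"(?<=a)b")  # "b" preceded by "a"
neg_behind = starts(r"(?<!a)b")  # "b" NOT preceded by "a"
pos_ahead = starts(r"a(?=b)")    # "a" followed by "b"
neg_ahead = starts(r"a(?!b)")    # "a" NOT followed by "b"
print(pos_behind, neg_behind, pos_ahead, neg_ahead)

# Backreference: \1 repeats whatever group 1 matched, e.g. a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine")
print(doubled.group(0))
```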
by darau1 on 1/31/20, 2:27 PM
Nobody pointed it out, but there's also https://regexr.com/
It's how I learned regex years ago, and I still use it today to test/build more complex patterns.
by __tk__ on 1/31/20, 1:26 PM
I'm loving the graphs, which for the first time in years are giving me an idea of what an expression is actually doing. That's because the visualization is kept in a form that is easy to understand with a programming background but can also be translated back to the expression itself in a straightforward manner.
by geongeorgek on 1/31/20, 10:34 AM
I used to spend hours trying to craft the perfect expression for my scraping projects not realizing that I don't really know regex.
This tool is a cheat sheet that also explains the commonly used expressions so that you understand it.
- There is a visual representation of the regular expression (thanks to regexper)
- The application shows matching strings which you can play around with
- Expressions can be edited and these are instantly validated
by StavrosK on 1/31/20, 1:31 PM
I love regex and have no trouble reading them, but still love this tool, great job. I especially like the railroad diagrams, for those cases where I brainfarted on a regex and it's doing something other than what I intended. Thanks for this.
by lfglopes on 1/31/20, 2:54 PM
I used to use this site http://txt2re.com which is now off the grid, at least since yesterday. :(
Unlike most regex helpers, in this one you would start with the text you want to filter/parse and then it would suggest you possible extractions.
Do you know any alternatives?
by rubyn00bie on 1/31/20, 9:17 PM
Nice work on this!
Something subtle that I quite loved: the email regex is, IMHO, close to perfect: \S+@\S+\.\S+
Because the "perfect" one is just absurd, and no one realizes it's going to be so fucking absurd until they start getting support cases and then go read something like this: https://stackoverflow.com/a/201378/931209
> If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective.
by philshem on 1/31/20, 2:14 PM
I have a secret hobby of answering python + regex questions on stackoverflow with pure python.
by vzidex on 1/31/20, 3:16 PM
Very cool! The site that worked best for me to learn regex was https://regexcrossword.com/ - after solving my way through all of them (I got really hooked when I discovered the site) I found I was alright at regex.
by adambowles on 1/31/20, 3:47 PM
>/h.llo/ the '.' matches any one character other than a new line character... matches 'hello', 'hallo' but not 'h llo'
in the cheatsheet is false. (https://regexr.com/4tc48)
`.` matches any character except line breaks, and that includes whitespace such as a space.
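A quick way to check this claim, sketched with Python's `re` (the linked regexr example uses JavaScript, but the default behaviour of `.` is the same on this point):

```python
import re

pattern = re.compile(r"h.llo")

# By default "." matches any single character except a newline, space included.
results = {s: bool(pattern.fullmatch(s)) for s in ["hello", "hallo", "h llo", "h\nllo"]}
print(results)

# With the DOTALL flag, "." matches the newline as well.
with_dotall = bool(re.fullmatch(r"h.llo", "h\nllo", flags=re.DOTALL))
print(with_dotall)
```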

by dana321 on 1/31/20, 4:22 PM

One thing I've always missed from the Perl programming language is the regex operators.
You could do:
```
    my $var = 'foo foo bar and more bar foo!!!';
    if ($var =~ /(foo|bar)/g) {  # does the variable contain foo or bar?
        print "foo! $1 removing foo..\n";
        # remove our value..
        $var =~ s/$1//g;
    }
```
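For comparison, a rough Python equivalent of the same steps. It is only a sketch of how the match-then-substitute flow looks without dedicated operators; the variable names are chosen to mirror the Perl.

```python
import re

var = "foo foo bar and more bar foo!!!"

# Does the variable contain foo or bar? The first hit lands in group 1.
m = re.search(r"(foo|bar)", var)
if m:
    print(f"foo! {m.group(1)} removing foo..")
    # Remove every occurrence of the captured value, escaped so it is treated literally.
    var = re.sub(re.escape(m.group(1)), "", var)

print(repr(var))
```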

by asicsp on 1/31/20, 3:03 PM
neat site! clicking an example opens up a playground with live update and explanation and railroad diagrams, similar to sites like regex101[1] and regulex[2]
one suggestion would be to mention clearly which tool/language is being used, since regex has no unified standard. based on the "Cheatsheet adapted" message at the bottom, I think it is for JavaScript. I wrote a book on js regexp last year, and I have a cheatsheet post too [3]
[1] https://regex101.com/
[2] https://jex.im/regulex
[3] https://learnbyexample.github.io/cheatsheet/javascript/javas...
by Glench on 1/31/20, 1:39 PM
Plug for Verbal Expressions (no affiliation), which has an alternate way of compiling more human-readable regexes for a dozen languages: http://verbalexpressions.github.io/
by mimixco on 1/31/20, 1:44 PM
This is awesome! Thank you! I hate regex, too, but I love your inline railroad diagramming tool.
by superasn on 1/31/20, 5:31 PM
Regexes are quite simple and useful but my only issue is with those recursive things. Like how do you match balanced brackets? I have a regex (pcre) copy-pasted for it but for the life of me I don't get it, or maybe I nod my head but instantly ununderstand it. I wish there was a simple to understand doc that teaches me how I can match something like:
```
    "(this is inside a bracket (and this is nested or (double nested)))
```
P.S. I know token parsing is better for these things but still I just want to learn the other thing too.
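One way to get a feel for the recursive pattern is to spell out by hand what it does. A common PCRE idiom for balanced parentheses is `\((?:[^()]+|(?R))*\)`, which may or may not be the one that was copy-pasted. Python's built-in `re` module has no `(?R)` recursion, so the sketch below mirrors each piece of that pattern with a small recursive function instead; `match_balanced` is a made-up name.

```python
# PCRE pattern being mirrored (not valid in Python's built-in re module):
#   \( (?: [^()]+ | (?R) )* \)

def match_balanced(text: str, start: int = 0) -> int:
    """Return the index just past the balanced group that opens at text[start],
    or -1 if no balanced group starts there."""
    if start >= len(text) or text[start] != "(":   # \(   an opening bracket
        return -1
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == ")":                              # \)   closes this level
            return i + 1
        if ch == "(":                              # (?R) the whole pattern again
            end = match_balanced(text, i)
            if end == -1:
                return -1
            i = end
        else:                                      # [^()]+ any non-bracket text
            i += 1
    return -1                                      # ran out of input while still open

sample = "(this is inside a bracket (and this is nested or (double nested)))"
end = match_balanced(sample)
print(end == len(sample))
```

The key idea is that `(?R)` just means "the whole pattern again", so every nested opening bracket starts a fresh copy of the same matcher.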
by xxsaculxx on 1/31/20, 3:16 PM
Nice tool! I personally use https://regex101.com/ as I like the explanations and quick reference.
by sylvanaar on 1/31/20, 10:02 PM
Nothing will ever beat RegexBuddy when it comes to Regex tools. It is an entire IDE just for regex, and has been my not-so-secret weapon for a decade or more.
by kitd on 1/31/20, 1:50 PM
This is really cool!
2 points:
1. it fiddled with my back button which is a bit annoying
2. a better email sample is
```
    ^[^@]+@[^@]+\.[^@]+$ 
```
which removes the problem of two at signs (@).
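To see the difference, here is a small Python comparison of this pattern against the looser `\S+@\S+\.\S+` mentioned elsewhere in the thread. The sample addresses are invented, and neither pattern is a real validator; both are rough filters at best.

```python
import re

LOOSE = re.compile(r"\S+@\S+\.\S+")            # any non-space text around @ and .
ONE_AT = re.compile(r"^[^@]+@[^@]+\.[^@]+$")   # same shape, but only one @ allowed

candidates = ["user+tag@example.com", "a@b@example.com", "no-at-sign.example.com"]
verdicts = {
    c: (bool(LOOSE.fullmatch(c)), bool(ONE_AT.fullmatch(c)))
    for c in candidates
}
for address, (loose_ok, one_at_ok) in verdicts.items():
    print(address, loose_ok, one_at_ok)
```

Note that `[^@]+` still accepts spaces, so the second pattern trades one kind of looseness for another.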
by dan_hawkins on 1/31/20, 2:40 PM
Is there a bug? In regexp for IPv4: https://ihateregex.io/expr/ip expression ends with {3} but the diagram states "2 times" in lower right - shouldn't it say "3 times"?
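On the regex side, `{3}` applies only to the repeated ".octet" group, which comes after a first octet that is matched once on its own. The Python sketch below uses an illustrative IPv4 pattern (an assumption, not necessarily the site's exact expression) to show that shape. Possibly the diagram counts loop-backs rather than total passes, which would explain the "2 times" label, but that is a guess.

```python
import re

# One octet, 0-255 (illustrative; not copied from the site).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# First octet once, then ".octet" exactly three times: four octets in total.
IPV4 = re.compile(rf"^{OCTET}(?:\.{OCTET}){{3}}$")

checks = {s: bool(IPV4.match(s)) for s in ["192.168.0.1", "256.1.1.1", "1.2.3", "1.2.3.4.5"]}
print(checks)
```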
by KenanSulayman on 1/31/20, 3:09 PM
I don't understand why the GitHub repository lists regexper as the source of the visual graph code but the frame only shows iHateRegex as watermark?
If the only thing that is embedded in that frame was taken entirely from a different project, that project should at least be mentioned in the frame.
by hyperpape on 1/31/20, 4:47 PM
Really nice idea.
I found that you can see your own regex with railroad diagram by going to one of the prepopulated examples and editing it. However, it wasn't clear to me that's the intended use of the tool. It's either a little side-effect, or not super-discoverable.
by mNovak on 1/31/20, 7:39 PM
I always refer back to http://rexegg.com/. Not a tool as such, but a good reference if you know how it works and just need to refresh on syntax.
by kazinator on 1/31/20, 9:23 PM
There is no way I would just plop that IPv6 regex into any serious program. :)
by Diti on 2/1/20, 11:58 AM
For the love of god, PLEASE DON’T USE REGEX TO VALIDATE EMAIL. The RegEx of this website ignores plus-addressing, for example. All you need to do to validate email is send a verification email.
by axegon on 1/31/20, 2:52 PM
This is awesome but.... I don't hate regex. Matter of fact, I love regex.
by Amarok on 1/31/20, 2:24 PM
^[a-z0-9_-]{3,15}$
The username reference doesn't match 16 characters as claimed
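Easy to confirm with a few lengths, sketched here in Python: `{3,15}` is an inclusive range, so 15 characters is the longest match.

```python
import re

USERNAME = re.compile(r"^[a-z0-9_-]{3,15}$")

# Try usernames of 2, 3, 15 and 16 characters.
by_length = {n: bool(USERNAME.match("a" * n)) for n in (2, 3, 15, 16)}
print(by_length)
```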
by chenster on 1/31/20, 11:09 PM
For email-specific regular expressions, it's all covered on https://emailregex.com
by binarysneaker on 1/31/20, 4:41 PM
These regexs are garbage. Others have suggested better sites for learning how to construct regexs, and stackoverflow has plenty of great examples.
by olalonde on 1/31/20, 4:23 PM
Thumbs up for the relatable domain name.
by esaym on 1/31/20, 6:16 PM
Either I'm a regex wizard and don't know it, or perhaps I think I know something but know nothing at all but I've never complained about using regex expressions. I use them all the time without thought. Never quite figured out the need for a cheatsheet either, your language of choice should have a good documentation page for any specific supported syntax.
by hamid_ra on 2/1/20, 9:47 PM
love the idea! I would crowdsource it so people can add their regex and vote on other people's regexes!
by ape4 on 1/31/20, 2:36 PM
The IPv6 regex is surprisingly complicated.
by samat on 1/31/20, 10:09 PM
This is very neat, thank you!
by blauditore on 1/31/20, 2:41 PM
Would be nice to have a regex for parsing HTML...
grabs popcorn
by shawnyou on 2/1/20, 9:02 AM
Good tool