by geongeorgek on 1/31/20, 10:32 AM with 129 comments
by robert_tweed on 1/31/20, 4:05 PM
But as someone who actually knows [some flavours of] regex fairly well, what I would really like is a reference that covers all the subtle differences between the various regex engines, along with community-managed documentation (perhaps wiki pages) of which applications & API versions use which flavour of regex.
For example, the other day I wanted to run a find on my NAS. I needed to use a regex, but the Busybox version of find doesn't support the iregex option, so all expressions are case-sensitive. With some googling, I was able to find out that the default regex type is Emacs, but I wasn't able to find either a good reference for exactly what Emacs regex does and doesn't support, or any information about how to set the "i" flag. In the end I had to manually convert every character into a class (like [aA] for "a") which was tedious, but quicker than trying to find a better solution or resorting to grep.
A related, annoyingly common pattern is that the documentation for `find` states that `-regex` specifies a regex, but it does not state which flavour of regex. The documentation for certain versions of `find`, which support alternative engines, notes that the default is Emacs. From this I was able to infer (perhaps wrongly) that the Busybox `find` uses Emacs-flavoured regex, but ultimately I still had to resort to some trial-and-error. This problem is all too common in API documentation.
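The manual [aA] conversion described above can be automated. A minimal sketch in Python, assuming the input is a literal string rather than an existing regex; the helper name `case_insensitive_classes` is made up for illustration, and the escaping of non-letters follows Python's `re.escape`, which may not match what an Emacs-flavoured engine expects:

```python
import re
import string

def case_insensitive_classes(literal: str) -> str:
    """Turn each ASCII letter of a literal into a two-character class."""
    parts = []
    for ch in literal:
        if ch in string.ascii_letters:
            # "a" becomes "[aA]", "J" becomes "[jJ]"
            parts.append(f"[{ch.lower()}{ch.upper()}]")
        else:
            # everything else is matched literally (Python escaping rules)
            parts.append(re.escape(ch))
    return "".join(parts)

pattern = case_insensitive_classes("photo.jpg")
print(pattern)  # [pP][hH][oO][tT][oO]\.[jJ][pP][gG]
```

The result still has to be checked by hand against the target engine, since special characters are not escaped the same way in every flavour.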
by crispyambulance on 1/31/20, 1:57 PM
One thing that confounded me often was positive and negative look-arounds. I always got the expressions mixed up, until I just put the expressions into a table like this...
         | look-behind | look-ahead
------------------------------------
positive | (?<=a)b     | a(?=b)
------------------------------------
negative | (?<!a)b     | a(?!b)
It's not hard, but for whatever reason my brain had trouble remembering the usage because every time I looked it up, each of those expressions was nested in a paragraph of explanation, and I could not see the simple intuitive pattern. Putting it into a simple visualization helps a lot.
Now, if I can find a similar mnemonic for backreferences !?
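The four look-around forms from the table can be tried side by side. A small sketch using Python's `re` module (the sample string is made up; other engines place different limits on look-behind, so treat this as illustrative only):

```python
import re

text = "ab cb ac"  # indexes: a=0 b=1 ' '=2 c=3 b=4 ' '=5 a=6 c=7

def starts(pattern: str) -> list[int]:
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

pos_behind = starts(r"(?<=a)b")  # "b" preceded by "a"
neg_behind = starts(r"(?<!a)b")  # "b" NOT preceded by "a"
pos_ahead = starts(r"a(?=b)")    # "a" followed by "b"
neg_ahead = starts(r"a(?!b)")    # "a" NOT followed by "b"

print(pos_behind, neg_behind, pos_ahead, neg_ahead)  # [1] [4] [0] [6]
```

In every case only the character outside the parentheses is part of the match; the look-around itself consumes nothing.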
by darau1 on 1/31/20, 2:27 PM
It's how I learned regex years ago, and I still use it today to test/build more complex patterns.
by __tk__ on 1/31/20, 1:26 PM
by geongeorgek on 1/31/20, 10:34 AM
This tool is a cheat sheet that also explains the commonly used expressions so that you understand them.
- There is a visual representation of the regular expression (thanks to regexpr)
- The application shows matching strings which you can play around with
- Expressions can be edited and these are instantly validated
by StavrosK on 1/31/20, 1:31 PM
by lfglopes on 1/31/20, 2:54 PM
Unlike most regex helpers, in this one you would start with the text you want to filter/parse, and it would then suggest possible extractions.
Do you know any alternatives?
by rubyn00bie on 1/31/20, 9:17 PM
Something subtle, but I quite loved that the email regex is, IMHO, close to perfect: \S+@\S+\.\S+
Because the "perfect" one is just absurd, and no one realizes it's going to be so fucking absurd until they start getting support cases and then go read something like this: https://stackoverflow.com/a/201378/931209
> If you want to get fancy and pedantic, implement a complete state engine. A regular expression can only act as a rudimentary filter. The problem with regular expressions is that telling someone that their perfectly valid e-mail address is invalid (a false positive) because your regular expression can't handle it is just rude and impolite from the user's perspective.
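To see how the permissive pattern behaves as a rudimentary filter, here is a rough sketch in Python. The sample addresses are invented, and `re.fullmatch` is an assumption about how the pattern would be applied, since the cheatsheet does not say whether it is anchored:

```python
import re

# The permissive pattern quoted above: something, "@", something, ".", something
EMAIL = re.compile(r"\S+@\S+\.\S+")

def looks_like_email(candidate: str) -> bool:
    """Rough filter only; it does not prove the address is deliverable."""
    return EMAIL.fullmatch(candidate) is not None

samples = ["user@example.com", "a@@b.c", "user@localhost", "user name@example.com"]
results = {s: looks_like_email(s) for s in samples}
print(results)
```

It accepts `user@example.com` and also `a@@b.c`, and rejects `user@localhost` (no dot after the `@`) and anything containing whitespace, so it errs on the side of letting odd addresses through rather than turning valid ones away.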
by philshem on 1/31/20, 2:14 PM
by vzidex on 1/31/20, 3:16 PM
by adambowles on 1/31/20, 3:47 PM
in the cheatsheet is false. (https://regexr.com/4tc48)
`.` can match any character except linebreaks (including whitespace)
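This is easy to check. A short sketch in Python's `re` module rather than the JavaScript engine the cheatsheet appears to target; in Python `.` excludes only `\n` by default, whereas JavaScript also excludes `\r`, `\u2028` and `\u2029`:

```python
import re

dot = re.compile(r".")
dot_all = re.compile(r".", re.DOTALL)  # DOTALL lets "." match a newline too

matches_space = dot.fullmatch(" ") is not None
matches_tab = dot.fullmatch("\t") is not None
matches_newline = dot.fullmatch("\n") is not None
matches_newline_dotall = dot_all.fullmatch("\n") is not None

print(matches_space, matches_tab, matches_newline, matches_newline_dotall)
# True True False True
```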
by dana321 on 1/31/20, 4:22 PM
You could do:
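my $var = 'foo foo bar and more bar foo!!!';
if ($var =~ /(foo|bar)/) {    # does the variable contain foo or bar?
    my $found = $1;           # keep the capture before the next match resets it
    print "found $found, removing it..\n";
    # remove every occurrence of the matched value (quoted so it is taken literally)
    $var =~ s/\Q$found\E//g;
}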
by asicsp on 1/31/20, 3:03 PM
One suggestion would be to mention clearly which tool/language is being used, since regex has no unified standard. Based on the "Cheatsheet adapted" message at the bottom, I think it is for JavaScript. I wrote a book on JS regexp last year, and I have a post with a cheatsheet too [3]
[3] https://learnbyexample.github.io/cheatsheet/javascript/javas...
by Glench on 1/31/20, 1:39 PM
by mimixco on 1/31/20, 1:44 PM
by superasn on 1/31/20, 5:31 PM
"(this is inside a bracket (and this is nested or (double nested)))
P.S. I know token parsing is better for these things but still I just want to learn the other thing too.
by xxsaculxx on 1/31/20, 3:16 PM
by sylvanaar on 1/31/20, 10:02 PM
by kitd on 1/31/20, 1:50 PM
2 points:
1. it fiddled with my back button which is a bit annoying
2. a better email sample is
^[^@]+@[^@]+\.[^@]+$
which removes the problem of two @ signs.
by dan_hawkins on 1/31/20, 2:40 PM
by KenanSulayman on 1/31/20, 3:09 PM
If the only thing that is embedded in that frame was taken entirely from a different project, that project should at least be mentioned in the frame.
by hyperpape on 1/31/20, 4:47 PM
I found that you can see your own regex as a railroad diagram by going to one of the prepopulated examples and editing it. However, it wasn't clear to me that's the intended use of the tool. It's either a little side-effect, or not super-discoverable.
by mNovak on 1/31/20, 7:39 PM
by kazinator on 1/31/20, 9:23 PM
by Diti on 2/1/20, 11:58 AM
by axegon on 1/31/20, 2:52 PM
by Amarok on 1/31/20, 2:24 PM
The username reference doesn't match 16 characters as claimed
by chenster on 1/31/20, 11:09 PM
by binarysneaker on 1/31/20, 4:41 PM
by olalonde on 1/31/20, 4:23 PM
by esaym on 1/31/20, 6:16 PM
by hamid_ra on 2/1/20, 9:47 PM
by ape4 on 1/31/20, 2:36 PM
by samat on 1/31/20, 10:09 PM
by blauditore on 1/31/20, 2:41 PM
grabs popcorn
by shawnyou on 2/1/20, 9:02 AM