from Hacker News

A regular expression to check for prime numbers

by Moyamo on 2/12/15, 4:29 PM with 67 comments

  • by btilly on 2/12/15, 10:18 PM

    This regular expression is due to Abigail of Perl fame. Who is still very active, see http://search.cpan.org/~abigail/ for more.

    However while cute, it is very slow. It tries every possible factorization as a pattern match. When it succeeds, on a string of length n that means that n times it tries to match a string of length n against a specific pattern. This is O(n^2). Try it on primes like 35509, 195341, 526049 and 1030793 and you can observe the slowdown. (Some regular expression engines may employ tricks that speed this up over the naive analysis, but none will be super speedy about it.)

    Usual arithmetic is much more efficient. :-)

  • by anyfoo on 2/12/15, 10:19 PM

    For some this will be obvious, but for others it may still be interesting:

    If this was a "true" regular expression, in the sense that it describes a regular language (http://en.wikipedia.org/wiki/Regular_language), then checking whether the number is prime or not, by checking if the expression matches, would run in linear time. This would be worth a few nobel prizes at least.

    Unfortunately, it is an "extended" regular expression, i.e. not regular at all in the strictest sense, and (as the article says) requires backtracking.

  • by xyzzyz on 2/13/15, 10:34 AM

    It has been posted before [2], so I'll repost my comment from back then [3]:

    It's cool, but it's regexp, not regular expression -- regular expressions cannot test for primeness. The proof is a simple application of pumping lemma[1]: we want to show that language { a^p: p prime } is not regular. Suppose it is. By pumping lemma we have numbers N, n >= 1, such that for every k >= 0, we have a^(N+nk) in our language. But that means that for every k, N+nk is prime, and it fails for k = N, since N+nN = N(1+n) is not a prime, since both N and 1+n are greater than 1.

    [1] - http://en.wikipedia.org/wiki/Pumping_lemma_for_regular_langu... [2] - https://news.ycombinator.com/item?id=3391547 [3] - https://news.ycombinator.com/item?id=3391572

  • by elzr on 2/12/15, 11:08 PM

    This is indeed beautiful. But the author limits itself to a low-level unpacking of the regex and running a few tests cases, all the while remaining mystified as to why this works.

    A high-level description helps makes obvious what's happening:

      To check the primality of a number N with string matching,     
        repeat a string (say, "1") N times
        and declare N prime:
          UNLESS the repetitions match the string 0 or 1 times
          OR the repetitions can be EVENLY matched into groups of MORE than one string.
  • by andrewflnr on 2/12/15, 10:05 PM

    Uses back-references, so it doesn't describe a regular language in the strict sense. Still, it's pretty cool to use them to trick a string-processing engine to do integer factoring.
  • by tom-lord on 2/13/15, 10:30 AM

    I'm currently working on a ruby gem to generate examples of given regular expression. For example:

        /this|is|awesome/.examples # => ["this", "is", "awesome"]
    
    So, I tried it out on this "prime number validator" regex:

        /1?|(11+?)\1+/.examples
          #=> ["", "1", "1111", "111111", "11111111"]
        /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 100).uniq.sort.map(&:length).first(10)
          #=> [0, 1, 4, 6, 8, 9, 10, 12, 14, 15]
    
        require 'prime'
        Array(1..100) - /1?|(11+?)\1+/.examples(max_repeater_variance: 50, max_group_results: 3000).uniq.sort.map(&:length) == Prime.first(25)
          #=> true
    
    Horribly inefficient, but pretty cool! Here's my gem if case anyone's interested: https://github.com/tom-lord/regexp-examples
  • by charlesacknin on 2/13/15, 2:01 AM

    Nice (abuse of regexp) !

    AFAICT the way the string is going to be parsed is equivalent to this:

      def isprime(n):
        if n == 0 or n == 1:
          return False
        i = 2
        while i <= n:
          if (n - i) % i == 0:
            return False
          i += 1
        return True
    
    A parser implementation may optimize and do "while (n - i) >= 1" instead, which cuts in half the number of iterations.
  • by fxn on 2/13/15, 7:17 AM

    I explored this technique applied to a few other problems just for fun, integer square root, coprimes, continued fractions, modulo, etc.

    https://github.com/fxn/math-with-regexps/blob/master/one-lin...

  • by nsajko on 2/12/15, 10:50 PM

    It uses a string of '1' as a unary numeral representation of a number and then checks it for divisibility with 2,3,4... using brute force.
  • by logicallee on 2/12/15, 10:36 PM

    less sexy when you write

      /^(11+?)\1+$/ is a prime test on whether a unary string  > 2 is composite
    
    then people have a real chance of figuring it out. (go ahead and try, if you haven't read the article.)

    As it stands that 1 x shift is super confusing. who would have thought 1 would become a literal! we wouldn't even write it that way in english.

  • by bantic on 2/16/15, 11:31 PM

    The thing that made this click for me was realizing the "1" in the regex was not special. The regex is just looping through integers (starting with 2) and checking if the target number is divisible by any of them. It can use any digit if the target string uses the same digit. Ruby example using 5 instead of 1:

    def is_prime(n); str = "5"*n; str !~ /^5?$|^(55+?)\1+$/; end

    The regex essentially says, for integer n, "does 2 ones fit evenly into n ones", then "does 3 ones fit into n ones", etc., which is the same as "does 2 fit into n", "does 3 fit into n", etc.

  • by avinash on 2/13/15, 7:04 AM

    Thanks for sharing. This is a post I wrote way back in 2007 and, even though, as many have pointed out, the algorithm is very inefficient, it is still a good illustration of how powerful regular expressions are.

    When I am explaining the basics of regular expressions to people, I always refer them to this post to make them realize that regular expressions are not only glorified search terms. In general, they are amazed :-)

  • by ekimekim on 2/13/15, 12:18 PM

    An aspect of the syntax is confusing me here. I would have assumed that the expression 'x+?' would be parsed as '((x)+)?', ie. 'x*'. However, it seems the parser special-cases this as (x)(+?) instead, and interprets that as a non-greedy operator. Isn't this horribly inconsistent?
  • by _lce0 on 2/12/15, 10:00 PM

    nice trick!

    here a JS implementation

        function isPrime(n) {
          var re = /^1?$|^(11+?)\1+$/;
          var s = [];
          while (n -- > 0)
            s.push(1);
          return !re.test(s.join(''));
        }
  • by Amorymeltzer on 2/12/15, 10:23 PM

    We all learned this way back with the Regex Game: http://regex.alf.nu/6
  • by TheLoneWolfling on 2/13/15, 5:10 AM

    Is there a way to write a (non-regular, of course) regular expression that checks for prime numbers in a non-unary base?
  • by geertj on 2/12/15, 11:02 PM

    This one has been going around for a while. I'd be curious to see a base-2 or base-10 variant of it.
  • by okey on 2/12/15, 11:38 PM

    Evidently beauty is in the eye of the beholder.