| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: simple regex pattern sought |
| Date | 2012-05-26 17:13 +0200 |
| Message-ID | <a2ca88Ft90U1@mid.individual.net> |
| References | <e9vvr7p7l8l5kem31v5a37apdlubrqjq5e@4ax.com> <dc4ca9b0-9aa9-4fe1-bbc9-2d3a28250a9d@googlegroups.com> <a2aeesF2s0U1@mid.individual.net> <6sl1s7dpqhg4l0gfa5duva3j8m9rf9opr5@4ax.com> <jpqr04$94d$1@dont-email.me> |
On 26.05.2012 16:57, markspace wrote:
> On 5/26/2012 6:19 AM, Roedy Green wrote:
>
>> exercisePattern( Pattern.compile(
>> "\"(?:\\\\.|[^\\\"])*\"|'(?:\\\\.|[^\\'])*'" ) ); // works, accepts empty strings
>> // (?: ) is a non-capturing group. This is Robert Klemme's contribution.
>> // I don't understand how it works.
>
>
> Ah, OK, so here's my contribution to your excellent SSCCE. First, this
> pattern is basically the same as mine. It uses alternation (the vertical
> bar |) to match a string delimited by either ' or "
>
> Here's his regex string without the extra escapes for Java:
>
> "(?:\\.|[^\"])*"|'(?:\\.|[^\'])*'
> ^^^^^^^^^^^^^^^^
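A quick way to confirm that unescaped form. This is a small sketch that is not part of the original exchange, and the class name `RoedyPatternDemo` is made up for illustration. `Pattern.pattern()` returns the regex source after Java has already processed the string-literal escapes, so it shows what the regex engine actually sees.

```java
import java.util.regex.Pattern;

public class RoedyPatternDemo {
    // Roedy's pattern, copied verbatim in its Java-escaped form.
    static final Pattern QUOTED = Pattern.compile(
            "\"(?:\\\\.|[^\\\"])*\"|'(?:\\\\.|[^\\'])*'");

    public static void main(String[] args) {
        // pattern() returns the regex with one level of (Java) escaping removed:
        System.out.println(QUOTED.pattern());
        // "(?:\\.|[^\"])*"|'(?:\\.|[^\'])*'

        // A plain double-quoted string matches the first alternative.
        System.out.println(QUOTED.matcher("\"x\"").matches()); // true
    }
}
```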
>
> Let's look at just the first half for a moment, without the (?:\\. part.
>
> "[^\"]*"
> ^^^^^^^^
> 12     3
> Example for the first part:
> 1. " string starts with double quote
> 2. [^\"]* any number of characters other than "
> 3. " ends with double quote
>
> Same for the second half of the string.
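The three-part core above can be tried on its own. The following is a minimal sketch that assumes whole-string matching with `Matcher.matches()`. `SimpleQuotedDemo` and `isQuoted` are hypothetical names. In the Java literal, `\"` is only Java's escape for a quote, so the engine sees `"[^"]*"`, which is equivalent to `"[^\"]*"`.

```java
import java.util.regex.Pattern;

public class SimpleQuotedDemo {
    // Regex seen by the engine: "[^"]*"  (1) quote, (2) non-quotes, (3) quote
    static final Pattern DOUBLE = Pattern.compile("\"[^\"]*\"");
    // The same idea for the single-quoted half.
    static final Pattern SINGLE = Pattern.compile("'[^']*'");

    // True if the whole input is one quoted string (no escape handling).
    static boolean isQuoted(String s) {
        return DOUBLE.matcher(s).matches() || SINGLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isQuoted("\"hello\""));   // true
        System.out.println(isQuoted("'hello'"));     // true
        System.out.println(isQuoted("\"hel\"lo\"")); // false: extra " inside
    }
}
```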
>
> Notice he's using * instead of +, which is why his pattern also matches
> empty strings, with nothing between the quotes.
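To see the * versus + difference concretely, here is a minimal sketch (the class name is hypothetical) that applies both forms to the two-character text `""`:

```java
import java.util.regex.Pattern;

public class StarVsPlusDemo {
    static final Pattern STAR = Pattern.compile("\"[^\"]*\""); // zero or more
    static final Pattern PLUS = Pattern.compile("\"[^\"]+\""); // one or more

    public static void main(String[] args) {
        String empty = "\"\""; // the two characters ""
        System.out.println(STAR.matcher(empty).matches());   // true
        System.out.println(PLUS.matcher(empty).matches());   // false
        System.out.println(PLUS.matcher("\"a\"").matches()); // true
    }
}
```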
>
> The other part didn't appear in your problem statement, but many quoting
> conventions (Java and JavaScript string literals, for example) allow
> escaping characters with a backslash. E.g., 'Bob\'s your uncle.' So his
> inclusion is very reasonable.
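The following small sketch shows why the escape alternative matters, using the `'Bob\'s your uncle.'` example. `NAIVE`, `ESCAPED`, and `firstMatch` are illustrative names. As an assumption of this sketch, `ESCAPED` excludes the backslash from its character class (`[^\\']`), so a backslash can only be consumed as part of an escape pair.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeAwareDemo {
    // No escape handling: the string ends at the first ' after the opening one.
    static final Pattern NAIVE = Pattern.compile("'[^']*'");
    // Regex: '(?:\\.|[^\\'])*'  An escape pair (\\.) consumes a backslash plus
    // the next character, so \' does not end the string.
    static final Pattern ESCAPED = Pattern.compile("'(?:\\\\.|[^\\\\'])*'");

    // Returns the first match in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "'Bob\\'s your uncle.'"; // the text 'Bob\'s your uncle.'
        System.out.println(firstMatch(NAIVE, text));   // 'Bob\'
        System.out.println(firstMatch(ESCAPED, text)); // 'Bob\'s your uncle.'
    }
}
```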
>
> So Robert adds (\\.|[^\"])* to the first part, which breaks down as:
>
> (\\.|[^\"])*
> 12 345     6
>
> 1. Start a group
> 2. A backslash. It needs to be escaped for regex, hence \\.
> 3. . is regex for "any character" (except line terminators, unless
> DOTALL is set). 2 and 3 together mean "match \ followed by any character"
> 4. OR (alternation again)
> 5. A negated character class (the ^). It's meant to match anything except
> \ or ", but [^\"] only excludes ", because \" is just an escaped quote. I
> think this is a mistake: the \ needs to be escaped too, i.e. [^\\"].
Oh, right, thanks for finding that!
> 6. zero or more.
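The missing backslash only makes a difference when the engine backtracks. Here is a minimal sketch (the class name is hypothetical) run on an unterminated input, the text `"a\"`, where the only quote after the `a` is escaped:

```java
import java.util.regex.Pattern;

public class CharClassBugDemo {
    // As posted: [^\"] only excludes ", so a lone backslash also matches it.
    static final Pattern AS_POSTED = Pattern.compile("\"(?:\\\\.|[^\\\"])*\"");
    // Fixed: [^\\"] excludes both the backslash and the quote.
    static final Pattern FIXED = Pattern.compile("\"(?:\\\\.|[^\\\\\"])*\"");

    public static void main(String[] args) {
        // The text "a\" : opening quote, a, then an escaped quote and no
        // real closing quote.
        String unterminated = "\"a\\\"";
        // After \\. fails to find a closing quote, backtracking lets AS_POSTED
        // take the backslash via the class and use the escaped quote to close.
        System.out.println(AS_POSTED.matcher(unterminated).find()); // true
        System.out.println(FIXED.matcher(unterminated).find());     // false
    }
}
```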
>
> Then after that mess, he does the obvious thing and makes it a
> non-capturing group, (?:...), so the regex does a little less work.
>
> "(?:\\.|[^\"])*"
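One way to see what the non-capturing group changes is `Matcher.groupCount()`, which reports how many capturing groups a pattern defines. The non-capturing form matches exactly the same text but records no group. The following small sketch (names are illustrative, and it uses the corrected character class) compares the two spellings:

```java
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Number of capturing groups defined by the regex.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // "(\\.|[^\\"])*"   capturing group
        System.out.println(groupCount("\"(\\\\.|[^\\\\\"])*\""));   // 1
        // "(?:\\.|[^\\"])*" non-capturing group
        System.out.println(groupCount("\"(?:\\\\.|[^\\\\\"])*\"")); // 0
    }
}
```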
>
> Phew! Next, he adds one more alternation and does the same for a
> '-delimited string.
>
> |'(?:\\.|[^\'])*'
>
> Same thing, just ' instead of ".
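Here is a minimal sketch that puts both halves together in a finder returning every quoted string in a line. It assumes the corrected classes `[^\\"]` and `[^\\']` rather than the ones in Roedy's test. `QuotedStringFinder` and `findAll` are made-up names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedStringFinder {
    // Regex: "(?:\\.|[^\\"])*"|'(?:\\.|[^\\'])*'
    // Each half: opening quote, then escape pairs or characters that are
    // neither a backslash nor that half's quote, then the closing quote.
    static final Pattern QUOTED = Pattern.compile(
            "\"(?:\\\\.|[^\\\\\"])*\"|'(?:\\\\.|[^\\\\'])*'");

    // Every match, left to right.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("say \"hi\" and 'Bob\\'s your uncle.'"));
        // ["hi", 'Bob\'s your uncle.']
    }
}
```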
>
> Finally, I think this could be simplified slightly with Lew's
> back-reference idea.
>
> (['"])(?:\\.|[^\1\\])*
>
> (Untested.) This allows empty strings between delimiters; use + instead
> of * to allow only non-empty strings between the quotes.
Interesting approach, but it doesn't work: Java does not accept a
back-reference inside a character class. Simple test with
Pattern.compile("(.)[a\\1]"):
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 6
(.)[a\1]
      ^
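Lew's idea can still be expressed in Java if the back-reference is moved out of the character class. A common workaround, which is a substitution and not something proposed in the thread, is a negative lookahead `(?!\1)` in front of `[^\\]`, followed by a closing `\1`. The following is a minimal sketch under that assumption. The class and helper names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class BackrefWorkaround {
    // Regex: (['"])(?:\\.|(?!\1)[^\\])*\1
    //   (['"])       group 1 captures the opening delimiter
    //   \\.          an escape pair: backslash plus any character
    //   (?!\1)[^\\]  one character that is neither the captured delimiter
    //                nor a backslash (lookahead instead of [^\1])
    //   \1           the same delimiter must close the string
    static final Pattern QUOTED =
            Pattern.compile("(['\"])(?:\\\\.|(?!\\1)[^\\\\])*\\1");

    // True if the regex compiles, false on PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // The first quoted string in input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(compiles("(.)[a\\1]")); // false
        System.out.println(firstMatch("say 'it\\'s \"fine\"' ok"));
        // 'it\'s "fine"'
    }
}
```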
> My executive summary:
>
> Regex is a great rapid development tool, except when it isn't. Do you
> realize your problem is simple, and that you could have hand-coded a
> parser for it much more quickly than all these news post exchanges took?
Maybe, maybe not.
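For comparison, here is a rough sketch of a hand-coded scanner. It works under simple assumptions: a backslash always escapes the next character, and unterminated strings are skipped. It is not a drop-in replacement for Roedy's code. `QuoteScanner` and `scan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteScanner {
    // Scans left to right. At a ' or ", looks for the same delimiter,
    // treating a backslash as escaping the character after it. If no
    // closing delimiter is found, the opening quote is skipped and
    // scanning resumes at the next character.
    static List<String> scan(String input) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char open = input.charAt(i);
            if (open != '"' && open != '\'') {
                i++;
                continue;
            }
            int j = i + 1;
            boolean closed = false;
            while (j < input.length()) {
                char c = input.charAt(j);
                if (c == '\\') {
                    j += 2; // skip the backslash and the character it escapes
                } else if (c == open) {
                    closed = true;
                    break;
                } else {
                    j++;
                }
            }
            if (closed) {
                result.add(input.substring(i, j + 1));
                i = j + 1;
            } else {
                i++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(scan("say \"hi\" and 'Bob\\'s your uncle.'"));
        // ["hi", 'Bob\'s your uncle.']
    }
}
```

The trade-off is the usual one: the loop is longer than the regex, but each rule (escape, delimiter, unterminated input) is an explicit branch rather than a consequence of backtracking.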
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/