
Re: regex puzzle

From "Peter J. Holzer" <hjp-usenet2@hjp.at>
Newsgroups comp.lang.java.help
Subject Re: regex puzzle
Date 2012-11-01 13:56 +0100
Organization LUGA
Message-ID <slrnk94sc9.5vl.hjp-usenet2@hrunkner.hjp.at>
References (2 earlier) <pgiv881g37e73fek318423bvrmtncgto4e@4ax.com> <slrnk8vmks.svs.hjp-usenet2@hrunkner.hjp.at> <k6pg3f$bgs$1@dont-email.me> <slrnk91pqj.v21.hjp-usenet2@hrunkner.hjp.at> <k6rqea$ps4$1@dont-email.me>



On 2012-10-31 18:25, markspace <-@> wrote:
> On 10/31/2012 1:54 AM, Peter J. Holzer wrote:
>> On 2012-10-30 21:16, markspace <-@> wrote:
>>> On 10/30/2012 6:48 AM, Peter J. Holzer wrote:
>>>> Java Regexps seem to be Perl-compatible, so
>>>>
>>>> s.replaceAll("&quot;(\\S*?)&quot;", "<span class=\"quoted\">$1</span>");
>>>>
>>>
>>> I don't think this will work, in the general case.  What about input like:
>>>
>>> Hi&quot;I'm-a&quot;-dash-seperated-&quot;string.&quot;
>>>
>>> You'll end up with one replacement,
>>
>> No, that should be two replacements: The *? operator is non-greedy, so
>> \\S*? matches the shortest possible sequence of non-space characters.
>
>
> OK, what about the obverse?  What if the quoted string contains 
> whitespace?

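Then it won't match. /\S/ matches any non-whitespace character (it's the
opposite of /\s/, which matches any whitespace character). If you want to
match any character, use /./ (which by default still excludes line
terminators; in Java, Pattern.DOTALL lifts that restriction).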
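
To make the difference concrete, here is a minimal sketch in Java. The
class and method names (QuoteDemo, wrapNonSpace, wrapAnyChar) are made
up for illustration, and the second and third inputs are invented; only
the first input and the \S*? pattern come from this thread.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // \S*? : shortest possible run of non-whitespace characters
    private static final Pattern NON_SPACE = Pattern.compile("&quot;(\\S*?)&quot;");
    // .*? : shortest possible run of any characters except line terminators
    private static final Pattern ANY_CHAR = Pattern.compile("&quot;(.*?)&quot;");
    private static final String REPLACEMENT = "<span class=\"quoted\">$1</span>";

    static String wrapNonSpace(String s) {
        return NON_SPACE.matcher(s).replaceAll(REPLACEMENT);
    }

    static String wrapAnyChar(String s) {
        return ANY_CHAR.matcher(s).replaceAll(REPLACEMENT);
    }

    public static void main(String[] args) {
        // Non-greedy matching gives two separate replacements:
        // Hi<span class="quoted">I'm-a</span>-dash-seperated-<span class="quoted">string.</span>
        System.out.println(wrapNonSpace("Hi&quot;I'm-a&quot;-dash-seperated-&quot;string.&quot;"));

        // Whitespace inside the quotes: \S*? cannot cross the space, so nothing changes.
        // say &quot;hello world&quot; now
        System.out.println(wrapNonSpace("say &quot;hello world&quot; now"));

        // .*? does cross the space:
        // say <span class="quoted">hello world</span> now
        System.out.println(wrapAnyChar("say &quot;hello world&quot; now"));
    }
}
```

Whether the /./ variant is what you want depends on the input: it will
happily pair up two &quot; markers that are far apart on the same line.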


> Not trying to bug you, but regex is tricky, and I don't often see it as 
> an ideal solution, so I'm trying to learn its corner cases.

As Jamie Zawinski once quipped:

    Some people, when confronted with a problem, think "I know, I'll use
    regular expressions." Now they have two problems. 

Regular expressions are a tool. There are situations where they are the
right tool and situations where they aren't. When processing HTML, they
are usually the wrong tool. I already gave one example where a simple
regexp like this fails. The problem is that unless the HTML is tightly
controlled (in this case: no &quot;...&quot; sequences in attribute
values) you have to build a complete HTML lexer into the regexp. This is
possible, but cumbersome[1]. But for a one-time job that may not be a
problem: If you have to convert 100 files, a regexp which converts 95 of
them correctly and mangles 5 of them may be a better solution than a
program which handles them all correctly, but takes longer to write than
correcting the 5 mangled files manually.
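
A minimal sketch of the kind of mangling meant here, assuming a
&quot;...&quot; pair inside an attribute value. The input line and the
names (AttributeMangling, wrap) are hypothetical and only serve to show
the failure mode; this is not necessarily the example given earlier in
the thread.

```java
import java.util.regex.Pattern;

class AttributeMangling {
    // Same simple pattern as before: it knows nothing about HTML structure.
    private static final Pattern QUOTED = Pattern.compile("&quot;(\\S*?)&quot;");

    static String wrap(String html) {
        return QUOTED.matcher(html).replaceAll("<span class=\"quoted\">$1</span>");
    }

    public static void main(String[] args) {
        // The &quot; pair sits inside the alt attribute, not in the text.
        String in = "<img alt=\"&quot;logo&quot;\" src=\"logo.png\">";
        // The replacement lands inside the attribute value and breaks the tag:
        // <img alt="<span class="quoted">logo</span>" src="logo.png">
        System.out.println(wrap(in));
    }
}
```

Avoiding this requires the regexp to know whether it is currently
inside a tag, which is exactly the lexer work mentioned above.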

	hp

[1] I demonstrated this some time ago in the German Perl newsgroup. 
    It wasn't even as bad as I expected, but then Perl makes it easy to
    write readable regexps.


-- 
   _  | Peter J. Holzer    | Curse of electronic word processing:
|_|_) | Sysadmin WSR       | You keep polishing away on his text until
| |   | hjp@hjp.at         | the sentence parts of the sentence no longer
__/   | http://www.hjp.at/ | fits together. -- Ralph Babel
