

Newsgroup: comp.lang.java.help

Regexp help

Started by: mike <mikaelpetterson@hotmail.com>
First post: 2012-08-07 07:03 -0700
Last post: 2012-08-08 18:43 -0700
Articles: 8 (4 participants)



Contents

  Regexp help mike <mikaelpetterson@hotmail.com> - 2012-08-07 07:03 -0700
    Re: Regexp help Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-08-07 14:38 -0700
      Re: Regexp help mike <mikaelpetterson@hotmail.com> - 2012-08-07 22:14 -0700
        Re: Regexp help Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-08-08 09:11 -0700
          Re: Regexp help markspace <-@.> - 2012-08-08 10:16 -0700
            Re: Regexp help Gene Wirchenko <genew@ocis.net> - 2012-08-08 12:13 -0700
              Re: Regexp help markspace <-@.> - 2012-08-08 14:15 -0700
                Re: Regexp help Gene Wirchenko <genew@ocis.net> - 2012-08-08 18:43 -0700

#1962 — Regexp help

From: mike <mikaelpetterson@hotmail.com>
Date: 2012-08-07 07:03 -0700
Subject: Regexp help
Message-ID: <3e0aa0ea-bdb5-4cd5-bdae-4fcd31c70a7f@googlegroups.com>

Hi,

I have the following:

Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]

Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]

I have the following regexp:

Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]

The last grouping (6) is:

Input 1: \main\2)

Input 2: /main/2

How can I avoid getting the parenthesis included in group(6) for input 1?

br,

//mike



#1963

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-08-07 14:38 -0700
Message-ID: <jVfUr.51896$ls7.32117@newsfe11.iad>
In reply to: #1962

On 8/7/12 7:03 AM, mike wrote:
> Hi,
>
> I have the following:
>
> Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
>
> Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
>
> I have the following regexp:
>
> Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
>
> The last grouping (6) is:
>
> Input 1: \main\2)
>
> Input 2: /main/2
>
> How can I avoid getting the parenthesis included in group(6) for input 1?

How about adding a \) after your group 6?

or replacing your .* with [^)]*

Either of those should solve the problem. Each has different trade-offs 
depending on your typical input.
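
A minimal sketch for trying both suggestions against the two inputs from the original post. The class name MergeRegexVariants and the helper base() are made up for illustration; the regex prefix is copied from the post. Under these assumptions, each variant handles only one of the two input shapes, which is the trade-off to weigh:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MergeRegexVariants {
    // Regex from the original post, up to (but not including) group 6.
    static final String PREFIX =
        "Needs\\s+Merge\\s+\"(.*)\"\\s+\\[(\\(automatic\\)\\s)?to\\s+(.*)"
        + "\\s+from\\s+(.*)\\s+\\(?base\\s{1}(also\\s{1})?";

    // Suggestion 1: require a literal ")" after group 6.
    static final Pattern WITH_PAREN = Pattern.compile(PREFIX + "(.*)\\)\\]");

    // Suggestion 2: group 6 may not contain ")".
    static final Pattern NO_PAREN_CLASS = Pattern.compile(PREFIX + "([^)]*)\\]");

    // Returns group 6, or null when the line does not match at all.
    static String base(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group(6) : null;
    }

    public static void main(String[] args) {
        String in1 = "Needs Merge \".\" [(automatic) to \\main\\2 "
            + "from \\main\\rel1\\2 (base also \\main\\2)]";
        String in2 = "Needs Merge \"./update\" [to /main/4 "
            + "from /main/bugfix/1 base /main/2]";

        System.out.println(base(WITH_PAREN, in1));     // prints \main\2
        System.out.println(base(WITH_PAREN, in2));     // prints null (no ")" to match)
        System.out.println(base(NO_PAREN_CLASS, in1)); // prints null (")" sits before "]")
        System.out.println(base(NO_PAREN_CLASS, in2)); // prints /main/2
    }
}
```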



#1964

From: mike <mikaelpetterson@hotmail.com>
Date: 2012-08-07 22:14 -0700
Message-ID: <e8fd1c92-6f54-48ee-8cc9-ec5a10d03ce5@googlegroups.com>
In reply to: #1963

On Tuesday, August 7, 2012 11:38:21 PM UTC+2, Daniel Pitts wrote:
> On 8/7/12 7:03 AM, mike wrote:
> > Hi,
> >
> > I have the following:
> >
> > Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
> >
> > Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
> >
> > I have the following regexp:
> >
> > Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
> >
> > The last grouping (6) is:
> >
> > Input 1: \main\2)
> >
> > Input 2: /main/2
> >
> > How can I avoid getting the parenthesis included in group(6) for input 1?
>
> How about adding a \) after your group 6?

I tried it and it works for input #1 but not for #2.

> or replacing your .* with [^)]*

I tried it and it works for input #2 but not for input #1.

What I need is a regexp that works for both inputs.

> Either of those should solve the problem. Each has different trade-offs
> depending on your typical input.



#1966

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-08-08 09:11 -0700
Message-ID: <HcwUr.685$CE7.526@newsfe12.iad>
In reply to: #1964

On 8/7/12 10:14 PM, mike wrote:
> On Tuesday, August 7, 2012 11:38:21 PM UTC+2, Daniel Pitts wrote:
>> On 8/7/12 7:03 AM, mike wrote:
>>> Hi,
>>>
>>> I have the following:
>>>
>>> Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
>>>
>>> Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
>>>
>>> I have the following regexp:
>>>
>>> Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
>>>
>>> The last grouping (6) is:
>>>
>>> Input 1: \main\2)
>>>
>>> Input 2: /main/2
>>>
>>> How can I avoid getting the parenthesis included in group(6) for input 1?
>>
>> How about adding a \) after your group 6?
>
> I tried it and it works for input #1 but not for #2.
>
>> or replacing your .* with [^)]*
>
> I tried it and it works for input #2 but not for input #1.
>
> What I need is a regexp that works for both inputs.
>
>> Either of those should solve the problem. Each has different trade-offs
>> depending on your typical input.

Well, you could also add a \)?\].  Or, you could have two regexes.  Or 
you could combine the regexes.
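
A sketch of one way to combine the suggestions into a single pattern, assuming the two inputs from the original post are representative. With the greedy (.*) left in place, an added \)? can match the empty string and group 6 keeps the parenthesis, so this sketch pairs \)?\] with the [^)]* character class. MergeLine and parse() are hypothetical names, not part of anyone's posted code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MergeLine {
    // Regex from the original post with the tail changed to ([^)]*)\)?\]
    static final Pattern COMBINED = Pattern.compile(
        "Needs\\s+Merge\\s+\"(.*)\"\\s+\\[(\\(automatic\\)\\s)?to\\s+(.*)"
        + "\\s+from\\s+(.*)\\s+\\(?base\\s{1}(also\\s{1})?([^)]*)\\)?\\]");

    // Returns {to, from, base}, or null when the line does not match.
    static String[] parse(String line) {
        Matcher m = COMBINED.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(3), m.group(4), m.group(6) };
    }

    public static void main(String[] args) {
        String in1 = "Needs Merge \".\" [(automatic) to \\main\\2 "
            + "from \\main\\rel1\\2 (base also \\main\\2)]";
        String in2 = "Needs Merge \"./update\" [to /main/4 "
            + "from /main/bugfix/1 base /main/2]";

        System.out.println(String.join(", ", parse(in1)));
        // prints: \main\2, \main\rel1\2, \main\2
        System.out.println(String.join(", ", parse(in2)));
        // prints: /main/4, /main/bugfix/1, /main/2
    }
}
```

A branch name that itself contains ")" would be cut short by this pattern, so it only fits input where that cannot happen.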



#1967

From: markspace <-@.>
Date: 2012-08-08 10:16 -0700
Message-ID: <jvu6tp$gec$1@dont-email.me>
In reply to: #1966

On 8/8/2012 9:11 AM, Daniel Pitts wrote:
> On 8/7/12 10:14 PM, mike wrote:
>> On Tuesday, August 7, 2012 11:38:21 PM UTC+2, Daniel Pitts wrote:
>>> On 8/7/12 7:03 AM, mike wrote:

>>>> How can I avoid getting the parenthesis included in group(6) for
>>>> input 1?

>>> How about adding a \) after your group 6?

>>> or replacing your .* with [^)]*
>>
>> I tried it and it works for input #2 but not for input #1.

>> What I need is a regexp that works for both inputs.

> Well, you could also add a \)?\].  Or, you could have two regexes.  Or
> you could combine the regexes.


There's a quote that sometimes goes around the internet, something like 
this: "Sometimes a programmer who is confronted with a problem solves it 
with regex.  Now the programmer has two problems."

Regex is not the sharpest tool in the programmer's toolbox.  It was 
intended to be used once, on a command line like with the grep tool, and 
then tossed away.

Take the time to write a parser to do what you want.  It'll probably be 
faster both ways:  you could write a parser faster than trying to figure 
out what regex you need to use, and regex is interpreted, whereas 
parsers can be optimized directly to machine code.  Win-win, and that's 
before taking into account the trouble maintenance programmers will have 
deciphering this regex.




#1968

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-08-08 12:13 -0700
Message-ID: <99e528d5rjt3a0ahl6bpgih7ih1gv68k26@4ax.com>
In reply to: #1967

On Wed, 08 Aug 2012 10:16:37 -0700, markspace <-@.> wrote:

[snip]

>There's a quote that sometimes goes around the internet, something like 
>this: "Sometimes a programmer who is confronted with a problem solves it 
>with regex.  Now the programmer has two problems."

Some people, when confronted with a problem, think "I know, I'll use
regular expressions."  Now they have two problems.

>Regex is not the sharpest tool in the programmer's toolbox.  It was 
>intended to be used once, on a command line like with the grep tool, and 
>then tossed away.

     Yes.

     No.

>Take the time to write a parser to do what you want.  It'll probably be 
>faster both ways:  you could write a parser faster than trying to figure 
>out what regex you need to use, and regex is interpreted, whereas 
>parsers can be optimized directly to machine code.  Win-win, and that's 
>before taking into account the trouble maintenance programmers will have 
>deciphering this regex.

     A bit of devil's advocate: I like regexes, but simple regexes.  I
have some JavaScript code where I could have used one giant regex to
process, but I instead chose to use simple regexes and a bit of
processing.  It is a lot more readable than a custom parser.  The
giant regex would have been a ball of mud though.

     Regexes do not do so well on error handling.  Either the input
matches the regex or not, but you might want to know why it does not
match.  For those cases, I write a finite state machine.
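
One possible reading of "simple regexes and a bit of processing", sketched in Java rather than JavaScript and applied to the merge lines from this thread. It is not a finite state machine; it only shows how splitting the work into small steps lets each step say what it failed to find. StepwiseMergeParser, parse() and branchAfter() are hypothetical names, and the patterns assume branch names contain no whitespace or parentheses:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StepwiseMergeParser {
    // Step 1: pull out the quoted path and everything between [ and ].
    private static final Pattern LINE =
        Pattern.compile("Needs\\s+Merge\\s+\"([^\"]*)\"\\s+\\[(.*)\\]");

    // Returns {path, to, from, base}; throws with a message naming the failed step.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a 'Needs Merge' line: " + line);
        }
        String body = m.group(2);
        return new String[] {
            m.group(1),
            branchAfter(body, "to"),
            branchAfter(body, "from"),
            branchAfter(body, "base"),
        };
    }

    // Step 2: one small regex per keyword. An optional "also" is skipped and
    // parentheses are never part of the captured branch name.
    private static String branchAfter(String body, String keyword) {
        Matcher m = Pattern
            .compile("\\b" + keyword + "\\s+(?:also\\s+)?([^\\s()]+)")
            .matcher(body);
        if (!m.find()) {
            throw new IllegalArgumentException(
                "missing '" + keyword + "' branch in: " + body);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String in1 = "Needs Merge \".\" [(automatic) to \\main\\2 "
            + "from \\main\\rel1\\2 (base also \\main\\2)]";
        System.out.println(String.join(", ", parse(in1)));
        // prints: ., \main\2, \main\rel1\2, \main\2
    }
}
```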

Sincerely,

Gene Wirchenko



#1969

From: markspace <-@.>
Date: 2012-08-08 14:15 -0700
Message-ID: <jvukt8$btq$1@dont-email.me>
In reply to: #1968

On 8/8/2012 12:13 PM, Gene Wirchenko wrote:
>
>       A bit of devil's advocate: I like regexes, but simple regexes.  I
> have some JavaScript code where I could have used one giant regex to
> process, but I instead chose to use simple regexes and a bit of
> processing.  It is a lot more readable than a custom parser.  The
> giant regex would have been a ball of mud though.


This is fair.  (If I had a parser generator handy, I'd use that.)  I 
agree simple regexes aren't terrible.  But there's a tendency to make the 
simple regex "just a little more complicated to get over this one 
problem."  It's easy to snowball.


A simple test: two hours to hand code a parser, sans parser generator. 
Vs. 24 hours or more for the OP to figure out his regex.

I wasn't actually sure of the OP's requirements, so I just captured the 
three branch labels into three separate strings.  Do with them as you will.

run:
To: \main\2, From: \main\rel1\2, Base: \main\2)]
To: /main/4, From: /main/bugfix/1, Base: /main/2]
To: , From: , Base:
BUILD SUCCESSFUL (total time: 0 seconds)


<code>

package quicktest;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
  * A parser for SCC (source code control) output.
  *
  * @author Brenden
  */
public class SccParser {

    // Each literal must stay on one line; a string literal that is wrapped
    // across lines does not compile.
    static String[] testVectors = {
        " Needs Merge \".\" [(automatic) to \\main\\2 from \\main\\rel1\\2 (base also \\main\\2)]",
        " Needs Merge \"./update\" [to /main/4 from /main/bugfix/1 base /main/2]",
        " Every programmer Needs to believe in something; I believe I Needs another drink.",
    };

    /**
     *
     * @param args
     */
    public static void main(String[] args) throws Exception {
       parse(new StringReader(testVectors[0]));
       parse(new StringReader(testVectors[1]));
       parse(new StringReader(testVectors[2]));
    }

    public static void parse(Reader reader) throws IOException {
       findMerge(reader);
    }

    // package-private
    static void findMerge(Reader reader) throws IOException {
       findString(reader, "Needs Merge");
       skipWhiteSpace(reader);
       findString(reader, "\""); // skip quoted string
       findString(reader, "\"");

       findEitherOrChar(reader, '/', '\\');
       reader.reset();                        // back one character
       String to = whiteSpaceToken( reader );
       findEitherOrChar(reader, '/', '\\');
       reader.reset();
       String from = whiteSpaceToken( reader );
       findEitherOrChar(reader, '/', '\\');
       reader.reset();
       String base = whiteSpaceToken( reader );
       System.out.println("To: "+to+", From: "+from+", Base: "+base );
    }

    // package-private
    static void findString(Reader reader, String str) throws IOException {
       if( str.length() == 0 ) return;
       outerLoop:
       for (int c; (c = reader.read()) != -1;) {
          if (c != str.charAt(0))
             continue;
          reader.mark(str.length());
          for (int i = 1, len = str.length(); i < len; i++) {
             if ((c = reader.read()) != str.charAt(i)) {
                reader.reset();
                continue outerLoop;
             }
          }
          return;
       }
    }

// etc. Remainder left as an exercise for readers trying to get us to
// do their homework/internship project.

}
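
The post calls three helpers it does not show: skipWhiteSpace, findEitherOrChar and whiteSpaceToken. The following is a guess at what they might look like, not the original author's code. It assumes a Reader that supports mark() and reset(), such as StringReader, and the class SccParserHelpers and its branches() driver are made up so the sketch runs on its own:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper bodies; only the method names come from the post.
final class SccParserHelpers {

    // Consumes whitespace and stops just before the first non-whitespace char.
    static void skipWhiteSpace(Reader reader) throws IOException {
        int c;
        do {
            reader.mark(1);
            c = reader.read();
        } while (c != -1 && Character.isWhitespace(c));
        if (c != -1) {
            reader.reset();
        }
    }

    // Reads until a or b (or end of input). The mark is set before every
    // read, so the caller's reset() steps back onto the matching character.
    static void findEitherOrChar(Reader reader, char a, char b) throws IOException {
        int c;
        do {
            reader.mark(1);
            c = reader.read();
        } while (c != -1 && c != a && c != b);
    }

    // Collects characters up to the next whitespace or end of input.
    static String whiteSpaceToken(Reader reader) throws IOException {
        StringBuilder token = new StringBuilder();
        for (int c; (c = reader.read()) != -1 && !Character.isWhitespace(c);) {
            token.append((char) c);
        }
        return token.toString();
    }

    // Driver mirroring the three find/reset/token steps in findMerge.
    static String[] branches(String text) {
        try (Reader reader = new StringReader(text)) {
            String[] out = new String[3];
            for (int i = 0; i < out.length; i++) {
                findEitherOrChar(reader, '/', '\\');
                reader.reset(); // back one character
                out[i] = whiteSpaceToken(reader);
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "[(automatic) to \\main\\2 from \\main\\rel1\\2 (base also \\main\\2)]";
        System.out.println(String.join(", ", branches(text)));
        // prints: \main\2, \main\rel1\2, \main\2)]
    }
}
```

As in the run output shown in the post, a whitespace-delimited token keeps the trailing ")]" on the base branch, so some trimming would still be needed.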



#1970

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-08-08 18:43 -0700
Message-ID: <b856281lbf51gqfq3eh1sdkqbmlq13tmhu@4ax.com>
In reply to: #1969

On Wed, 08 Aug 2012 14:15:16 -0700, markspace <-@.> wrote:

>On 8/8/2012 12:13 PM, Gene Wirchenko wrote:
>>
>>       A bit of devil's advocate: I like regexes, but simple regexes.  I
>> have some JavaScript code where I could have used one giant regex to
>> process, but I instead chose to use simple regexes and a bit of
>> processing.  It is a lot more readable than a custom parser.  The
>> giant regex would have been a ball of mud though.

>This is fair.  (If I had a parser generator handy, I'd use that.)  I 
>agree simple regexes aren't terrible.  But there's a tendency to make the 
>simple regex "just a little more complicated to get over this one 
>problem."  It's easy to snowball.

      Well, I do not have a problem with telling someone that his code
is getting overly complex.  This is so even when that someone is me. I
think of the maintenance programmer.  That is usually me for my code.

>A simple test: two hours to hand code a parser, sans parser generator. 
>Vs. 24 hours or more for the OP to figure out his regex.

     Maybe.  I do not use regexes enough that I have the bits all
memorised so I have to look up things.  It does not take that long
once one understands the concept though.

[snip]

Sincerely,

Gene Wirchenko




