| Started by | mike <mikaelpetterson@hotmail.com> |
|---|---|
| First post | 2012-08-07 07:03 -0700 |
| Last post | 2012-08-08 18:43 -0700 |
| Articles | 8 — 4 participants |
Regexp help mike <mikaelpetterson@hotmail.com> - 2012-08-07 07:03 -0700
Re: Regexp help Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-08-07 14:38 -0700
Re: Regexp help mike <mikaelpetterson@hotmail.com> - 2012-08-07 22:14 -0700
Re: Regexp help Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-08-08 09:11 -0700
Re: Regexp help markspace <-@.> - 2012-08-08 10:16 -0700
Re: Regexp help Gene Wirchenko <genew@ocis.net> - 2012-08-08 12:13 -0700
Re: Regexp help markspace <-@.> - 2012-08-08 14:15 -0700
Re: Regexp help Gene Wirchenko <genew@ocis.net> - 2012-08-08 18:43 -0700
| From | mike <mikaelpetterson@hotmail.com> |
|---|---|
| Date | 2012-08-07 07:03 -0700 |
| Subject | Regexp help |
| Message-ID | <3e0aa0ea-bdb5-4cd5-bdae-4fcd31c70a7f@googlegroups.com> |
Hi,
I have the following:
Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
I have the following regexp:
Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
The last grouping (6) is:
Input 1: \main\2)
Input 2: /main/2
How can I keep the parenthesis from being included in group(6) for input 1?
br,
//mike
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-08-07 14:38 -0700 |
| Message-ID | <jVfUr.51896$ls7.32117@newsfe11.iad> |
| In reply to | #1962 |
On 8/7/12 7:03 AM, mike wrote:
> Hi,
>
> I have the following:
>
> Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
>
> Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
>
> I have the following regexp:
>
> Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
>
> The last grouping (6) is:
>
> Input 1: \main\2)
>
> Input 2: /main/2
>
> How can I avoid get the parenthesis to be included in group(6) for input 1?
How about adding a \) after your group 6?
or replacing your .* with [^)]*
Either of those should solve the problem. Each has different trade-offs
depending on your typical input.
| From | mike <mikaelpetterson@hotmail.com> |
|---|---|
| Date | 2012-08-07 22:14 -0700 |
| Message-ID | <e8fd1c92-6f54-48ee-8cc9-ec5a10d03ce5@googlegroups.com> |
| In reply to | #1963 |
On Tuesday, August 7, 2012 11:38:21 PM UTC+2, Daniel Pitts wrote:
> On 8/7/12 7:03 AM, mike wrote:
> > Hi,
> >
> > I have the following:
> >
> > Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
> >
> > Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
> >
> > I have the following regexp:
> >
> > Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
> >
> > The last grouping (6) is:
> >
> > Input 1: \main\2)
> >
> > Input 2: /main/2
> >
> > How can I avoid get the parenthesis to be included in group(6) for input 1?
>
> How about adding a \) after your group 6?
I tried it and it works for input #1 but not for #2.
>
> or replacing your .* with [^)]*
I tried it and it works for input #2 but not for input #1.
What I need is a regexp that works for both input.
>
> Either of those should solve the problem. Each has different trade-offs
> depending on your typical input.
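The results reported above can be reproduced with a small Java check. This is only a sketch, not code from the thread: the class name `VariantCheck` and the helper `group6` are made up for illustration, and it assumes that "replacing your .* with [^)]*" refers to the last `.*` in the pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the thread.
final class VariantCheck {

    // The regex from the original post up to (not including) the last group.
    // The redundant \" escapes of the original are written as plain quotes.
    static final String PREFIX =
            "Needs\\s+Merge\\s+\"(.*)\"\\s+\\[(\\(automatic\\)\\s)?to\\s+(.*)"
            + "\\s+from\\s+(.*)\\s+\\(?base\\s{1}(also\\s{1})?";

    static final String INPUT_1 =
            "Needs Merge \".\" [(automatic) to \\main\\2 from \\main\\rel1\\2 (base also \\main\\2)]";
    static final String INPUT_2 =
            "Needs Merge \"./update\" [to /main/4 from /main/bugfix/1 base /main/2]";

    /** Returns group 6 when PREFIX + tail matches the whole input, or null when it does not. */
    static String group6(String tail, String input) {
        Matcher m = Pattern.compile(PREFIX + tail).matcher(input);
        return m.matches() ? m.group(6) : null;
    }

    public static void main(String[] args) {
        String[] tails = {
            "(.*)\\]",     // original ending
            "(.*)\\)\\]",  // suggestion 1: a mandatory \) after group 6
            "([^)]*)\\]",  // suggestion 2: [^)]* instead of .*
        };
        for (String tail : tails) {
            System.out.println(tail
                    + "  input 1: " + group6(tail, INPUT_1)
                    + "  input 2: " + group6(tail, INPUT_2));
        }
    }
}
```

A `null` in the output means the line did not match at all. The mandatory `\)` fails on input 2 because that line contains no closing parenthesis, and `[^)]*` fails on input 1 because the group can never step over the `)` that sits in front of the final `]`.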
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-08-08 09:11 -0700 |
| Message-ID | <HcwUr.685$CE7.526@newsfe12.iad> |
| In reply to | #1964 |
On 8/7/12 10:14 PM, mike wrote:
> On Tuesday, August 7, 2012 11:38:21 PM UTC+2, Daniel Pitts wrote:
>> On 8/7/12 7:03 AM, mike wrote:
>>> Hi,
>>>
>>> I have the following:
>>>
>>> Input 1: Needs Merge "." [(automatic) to \main\2 from \main\rel1\2 (base also \main\2)]
>>>
>>> Input 2: Needs Merge "./update" [to /main/4 from /main/bugfix/1 base /main/2]
>>>
>>> I have the following regexp:
>>>
>>> Needs\s+Merge\s+\"(.*)\"\s+\[(\(automatic\)\s)?to\s+(.*)\s+from\s+(.*)\s+\(?base\s{1}(also\s{1})?(.*)\]
>>>
>>> The last grouping (6) is:
>>>
>>> Input 1: \main\2)
>>>
>>> Input 2: /main/2
>>>
>>> How can I avoid get the parenthesis to be included in group(6) for input 1?
>>
>> How about adding a \) after your group 6?
>
> I tried it and it works for input #1 but not for #2.
>
>> or replacing your .* with [^)]*
>
> I tried it and it works for input #2 but not for input #1.
>
> What I need is a regexp that works for both input.
>
>> Either of those should solve the problem. Each has different trade-offs
>> depending on your typical input.
>
Well, you could also add a \)?\]. Or, you could have two regexes. Or
you could combine the regexes.
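One way to read "combine" is to use the optional `\)?` together with a group that cannot swallow the parenthesis. The sketch below is an interpretation, not code from the thread; `MergeLineRegex`, `GREEDY`, `COMBINED` and `base` are illustrative names. It also shows why `\)?\]` on its own is not enough while group 6 is still a greedy `(.*)`: the group takes the `)` first and the optional `\)?` then matches nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not from the thread.
final class MergeLineRegex {

    // The regex from the original post up to (not including) the last group.
    private static final String PREFIX =
            "Needs\\s+Merge\\s+\"(.*)\"\\s+\\[(\\(automatic\\)\\s)?to\\s+(.*)"
            + "\\s+from\\s+(.*)\\s+\\(?base\\s{1}(also\\s{1})?";

    // Optional \) added, but group 6 is still the greedy (.*).
    static final Pattern GREEDY = Pattern.compile(PREFIX + "(.*)\\)?\\]");

    // Optional \) plus a group 6 that excludes ')' and ']'.
    static final Pattern COMBINED = Pattern.compile(PREFIX + "([^)\\]]*)\\)?\\]");

    static final String INPUT_1 =
            "Needs Merge \".\" [(automatic) to \\main\\2 from \\main\\rel1\\2 (base also \\main\\2)]";
    static final String INPUT_2 =
            "Needs Merge \"./update\" [to /main/4 from /main/bugfix/1 base /main/2]";

    /** Returns group 6 (the base branch) or null if the line does not match. */
    static String base(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.matches() ? m.group(6) : null;
    }

    public static void main(String[] args) {
        System.out.println("greedy,   input 1: " + base(GREEDY, INPUT_1));
        System.out.println("greedy,   input 2: " + base(GREEDY, INPUT_2));
        System.out.println("combined, input 1: " + base(COMBINED, INPUT_1));
        System.out.println("combined, input 2: " + base(COMBINED, INPUT_2));
    }
}
```

With `COMBINED`, both inputs match and group 6 comes back without the parenthesis. A reluctant `(.*?)` in place of `([^)\]]*)` would be another option under the same assumptions.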
| From | markspace <-@.> |
|---|---|
| Date | 2012-08-08 10:16 -0700 |
| Message-ID | <jvu6tp$gec$1@dont-email.me> |
| In reply to | #1966 |
On 8/8/2012 9:11 AM, Daniel Pitts wrote:
> On 8/7/12 10:14 PM, mike wrote:
>> On Tuesday, August 7, 2012 11:38:21 PM UTC+2, Daniel Pitts wrote:
>>> On 8/7/12 7:03 AM, mike wrote:
>>>> How can I avoid get the parenthesis to be included in group(6) for
>>>> input 1?
>>> How about adding a \) after your group 6?
>>> or replacing your .* with [^)]*
>>
>> I tried it and it works for input #2 but not for input #1.
>> What I need is a regexp that works for both input.
> Well, you could also add a \)?\]. Or, you could have two regexs. Or
> you could combine the regexs.
There's a quote that sometimes goes around the internet, something like
this: "Sometimes a programmer who is confronted with a problem solves it
with regex. Now the programmer has two problems."
Regex is not the sharpest tool in the programmers toolbox. It was
intended to be used once, on a command line like with the grep tool, and
then tossed away.
Take the time to write a parser to do what you want. It'll probably be
faster both ways: you could write a parser faster than trying to figure
out what regex you need to use, and regex is interpreted, whereas
parsers can be optimized directly to machine code. Win-win, and that's
before taking into account the trouble maintenance programmers will have
deciphering this regex.
| From | Gene Wirchenko <genew@ocis.net> |
|---|---|
| Date | 2012-08-08 12:13 -0700 |
| Message-ID | <99e528d5rjt3a0ahl6bpgih7ih1gv68k26@4ax.com> |
| In reply to | #1967 |
On Wed, 08 Aug 2012 10:16:37 -0700, markspace <-@.> wrote:
[snip]
>There's a quote that sometimes goes around the internet, something like
>this: "Sometimes a programmer who is confronted with a problem solves it
>with regex. Now the programmer has two problems."
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.
>Regex is not the sharpest tool in the programmers toolbox. It was
>intended to be used once, on a command line like with the grep tool, and
>then tossed away.
Yes.
No.
>Take the time to write a parser to do what you want. It'll probably be
>faster both ways: you could write a parser faster than trying to figure
>out what regex you need to use, and regex is interpreted, whereas
>parsers can be optimized directly to machine code. Win-win, and that's
>before taking into account the trouble maintenance programmers will have
>deciphering this regex.
A bit of devil's advocate: I like regexes, but simple regexes. I
have some JavaScript code where I could have used one giant regex to
process, but I instead chose to use simple regexes and a bit of
processing. It is a lot more readable than a custom parser. The
giant regex would have been a ball of mud though.
Regexes do not do so well on error handling. Either the input
matches the regex or not, but you might want to know why it does not
match. For those cases, I write a finite state machine.
Sincerely,
Gene Wirchenko
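The "simple regexes and a bit of processing" idea, together with the point about knowing why a line did not match, could look roughly like this in Java. It is a sketch under assumptions, not anyone's code from the thread: `StepwiseMergeParser`, the four small patterns and the error messages are all invented for illustration, and branch names are assumed to contain no whitespace.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: several small regexes instead of one large one.
final class StepwiseMergeParser {

    // Step 1: the overall shape: Needs Merge "file" [details]
    private static final Pattern HEADER =
            Pattern.compile("^\\s*Needs\\s+Merge\\s+\"([^\"]*)\"\\s+\\[(.*)\\]\\s*$");

    // Step 2: one small pattern per field, applied to the bracketed part only.
    private static final Pattern TO = Pattern.compile("\\bto\\s+(\\S+)");
    private static final Pattern FROM = Pattern.compile("\\bfrom\\s+(\\S+)");
    private static final Pattern BASE = Pattern.compile("\\bbase\\s+(?:also\\s+)?([^\\s)]+)");

    /** Returns {file, to, from, base}; throws with a specific message on bad input. */
    static String[] parse(String line) {
        Matcher header = HEADER.matcher(line);
        if (!header.matches()) {
            throw new IllegalArgumentException("not a 'Needs Merge \"file\" [...]' line: " + line);
        }
        String details = header.group(2);
        return new String[] {
            header.group(1),
            field(TO, details, "to"),
            field(FROM, details, "from"),
            field(BASE, details, "base"),
        };
    }

    private static String field(Pattern pattern, String details, String name) {
        Matcher m = pattern.matcher(details);
        if (!m.find()) {
            throw new IllegalArgumentException("missing '" + name + "' branch in: " + details);
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        String line = "Needs Merge \".\" [(automatic) to \\main\\2 from \\main\\rel1\\2 (base also \\main\\2)]";
        System.out.println(String.join(" | ", parse(line)));
    }
}
```

Each pattern stays short enough to read at a glance, and a failure names the step that failed instead of reporting only "no match".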
| From | markspace <-@.> |
|---|---|
| Date | 2012-08-08 14:15 -0700 |
| Message-ID | <jvukt8$btq$1@dont-email.me> |
| In reply to | #1968 |
On 8/8/2012 12:13 PM, Gene Wirchenko wrote:
>
> A bit of devil's advocate: I like regexes, but simple regexes. I
> have some JavaScript code where I could have used one giant regex to
> process, but I instead chose to use simple regexes and a bit of
> processing. It is a lot more readable than a custom parser. The
> giant regex would have been a ball of mud though.
This is fair. (If I had a parser generator handy, I'd use that.) I
agree simple regex aren't terrible. But there's a tendency to make the
simple regex "just a little more complicated to get over this one
problem." It's easy to snowball.
A simple test: two hours to hand code a parser, sans parser generator.
Vs. 24 hours or more for the OP to figure out his regex.
I wasn't actually sure of the OP's requirements, so I just captured the
three branch labels into three separate strings. Do with them as you will.
run:
To: \main\2, From: \main\rel1\2, Base: \main\2)]
To: /main/4, From: /main/bugfix/1, Base: /main/2]
To: , From: , Base:
BUILD SUCCESSFUL (total time: 0 seconds)
<code>
package quicktest;

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * A parser for SCC (source code control) output.
 *
 * @author Brenden
 */
public class SccParser {

    static String[] testVectors = {
        " Needs Merge \".\" [(automatic) to \\main\\2 from \\main\\rel1\\2 (base also \\main\\2)]",
        " Needs Merge \"./update\" [to /main/4 from /main/bugfix/1 base /main/2]",
        " Every programmer Needs to believe in something; I believe I Needs another drink.",
    };

    /**
     * Runs the parser over each test vector.
     *
     * @param args unused
     */
    public static void main(String[] args) throws Exception {
        parse(new StringReader(testVectors[0]));
        parse(new StringReader(testVectors[1]));
        parse(new StringReader(testVectors[2]));
    }

    public static void parse(Reader reader) throws IOException {
        findMerge(reader);
    }

    // package-private
    static void findMerge(Reader reader) throws IOException {
        findString(reader, "Needs Merge");
        skipWhiteSpace(reader);
        findString(reader, "\""); // skip quoted string
        findString(reader, "\"");
        findEitherOrChar(reader, '/', '\\');
        reader.reset(); // back one character
        String to = whiteSpaceToken(reader);
        findEitherOrChar(reader, '/', '\\');
        reader.reset();
        String from = whiteSpaceToken(reader);
        findEitherOrChar(reader, '/', '\\');
        reader.reset();
        String base = whiteSpaceToken(reader);
        System.out.println("To: " + to + ", From: " + from + ", Base: " + base);
    }

    // package-private
    // Advances the reader to just past the next occurrence of str (or to end of input).
    static void findString(Reader reader, String str) throws IOException {
        if (str.length() == 0) return;
        outerLoop:
        for (int c; (c = reader.read()) != -1;) {
            if (c != str.charAt(0))
                continue;
            // Remember the position after the first character so a failed
            // match can resume scanning from there.
            reader.mark(str.length());
            for (int i = 1, len = str.length(); i < len; i++) {
                if ((c = reader.read()) != str.charAt(i)) {
                    reader.reset();
                    continue outerLoop;
                }
            }
            return;
        }
    }

    // etc. Remainder left as an exercise for readers trying to get us to
    // do their homework/internship project.
}
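For readers who want to run something, here is one guess at the three helpers the post leaves out. None of this is markspace's code: the signatures are inferred from the call sites in `findMerge`, and marking before every read in `findEitherOrChar` is an assumption made so that the caller's `reader.reset(); // back one character` steps back onto the character that was found. The class `SccParserHelpers` and the demo method `branchLabels` are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-ins for the helpers omitted from the post.
final class SccParserHelpers {

    /** Consumes whitespace and stops just before the first non-whitespace character. */
    static void skipWhiteSpace(Reader reader) throws IOException {
        for (;;) {
            reader.mark(1);
            int c = reader.read();
            if (c == -1) return;
            if (!Character.isWhitespace(c)) {
                reader.reset(); // un-read the non-whitespace character
                return;
            }
        }
    }

    /**
     * Reads until a or b is found. The mark is set before every read, so a
     * reset() by the caller puts the found character back.
     */
    static boolean findEitherOrChar(Reader reader, char a, char b) throws IOException {
        for (;;) {
            reader.mark(1);
            int c = reader.read();
            if (c == -1) return false;
            if (c == a || c == b) return true;
        }
    }

    /** Reads characters up to the next whitespace or end of input. */
    static String whiteSpaceToken(Reader reader) throws IOException {
        StringBuilder token = new StringBuilder();
        for (int c; (c = reader.read()) != -1 && !Character.isWhitespace(c);) {
            token.append((char) c);
        }
        return token.toString();
    }

    /** Demo only: extracts three path-like tokens the way findMerge does. */
    static String[] branchLabels(String text) {
        String[] labels = new String[3];
        try (Reader reader = new StringReader(text)) {
            for (int i = 0; i < labels.length; i++) {
                findEitherOrChar(reader, '/', '\\');
                reader.reset(); // back one character
                labels[i] = whiteSpaceToken(reader);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] labels = branchLabels("[to /main/4 from /main/bugfix/1 base /main/2]");
        System.out.println("To: " + labels[0] + ", From: " + labels[1] + ", Base: " + labels[2]);
    }
}
```

As in the posted output, the base token still carries the trailing `]` (or `)]`), so a real version would need to trim those characters.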
| From | Gene Wirchenko <genew@ocis.net> |
|---|---|
| Date | 2012-08-08 18:43 -0700 |
| Message-ID | <b856281lbf51gqfq3eh1sdkqbmlq13tmhu@4ax.com> |
| In reply to | #1969 |
On Wed, 08 Aug 2012 14:15:16 -0700, markspace <-@.> wrote:
>On 8/8/2012 12:13 PM, Gene Wirchenko wrote:
>>
>> A bit of devil's advocate: I like regexes, but simple regexes. I
>> have some JavaScript code where I could have used one giant regex to
>> process, but I instead chose to use simple regexes and a bit of
>> processing. It is a lot more readable than a custom parser. The
>> giant regex would have been a ball of mud though.
>This is fair. (If I had a parser generator handy, I'd use that.) I
>agree simple regex aren't terrible. But there's a tendency to make the
>simple regex "just a little more complicated to get over this one
>problem." It's easy to snowball.
Well, I do not have a problem with telling someone that his code
is getting overly complex. This is so even when that someone is me. I
think of the maintenance programmer. That is usually me for my code.
>A simple test: two hours to hand code a parser, sans parser generator.
>Vs. 24 hours or more for the OP to figure out his regex.
Maybe. I do not use regexes enough that I have the bits all
memorised so I have to look up things. It does not take that long
once one understands the concept though.
[snip]
Sincerely,
Gene Wirchenko