| Started by | laredotornado <laredotornado@zipmail.com> |
|---|---|
| First post | 2012-03-26 11:54 -0700 |
| Last post | 2012-03-28 07:51 +0200 |
| Articles | 20 on this page of 50 — 13 participants |
Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-26 11:54 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 12:22 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-26 22:01 +0200
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:46 -0400
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:01 +0200
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:18 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:21 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:38 +0200
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:24 -0700
Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 13:49 -0700
Re: Keeping the split token in a Java regular expression laredotornado@gmail.com - 2012-03-26 14:21 -0700
Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 15:02 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 15:56 -0700
Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 16:02 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:33 -0700
Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 01:17 +0000
Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 21:57 +0000
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-26 18:26 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 19:07 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 20:40 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 09:10 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-27 11:09 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 13:32 -0700
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:29 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 16:22 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 18:20 -0700
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 21:31 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:41 +0200
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:28 -0700
Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 16:26 -0700
Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:36 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:27 +0200
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:28 +0200
Re: Keeping the split token in a Java regular expression "John B. Matthews" <nospam@nospam.invalid> - 2012-03-26 20:49 -0400
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:58 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-26 21:14 -0700
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:21 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 15:20 -0700
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 18:48 -0400
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 17:07 -0700
Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 21:49 -0300
Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 20:56 -0400
Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 22:01 -0300
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 08:15 -0600
Re: Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-27 07:58 -0700
Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 09:21 -0600
Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 09:43 -0700
Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:51 +0200
| From | laredotornado <laredotornado@zipmail.com> |
|---|---|
| Date | 2012-03-26 11:54 -0700 |
| Subject | Keeping the split token in a Java regular expression |
| Message-ID | <48d35bc3-a391-4ccf-a222-dac64775a2f2@oq7g2000pbb.googlegroups.com> |
Hi,
I'm using Java 6. I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results. What I have are Strings like
Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i . Hopefully I got that right. In
the above example, I would like the results to be
Fri 7:30 PM
Sat 2 PM
Sun 2:30 PM
But with String.split, the split token is not kept within the
results. How would I write a Java parsing expression to do what I
want?
Thanks, - Dave
| From | Lew <lewbloch@gmail.com> |
|---|---|
| Date | 2012-03-26 12:22 -0700 |
| Message-ID | <33095746.178.1332789765559.JavaMail.geo-discussion-forums@pbcto7> |
| In reply to | #13190 |
laredotornado wrote:
> I'm using Java 6. I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results. What I have are Strings like
>
> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i . Hopefully I got that right. In
> the above example, I would like the results to be
>
> Fri 7:30 PM
> Sat 2 PM
> Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results. How would I write a Java parsing expression to do what I
> want?
Based on what you've shown it looks like you could split on the comma and trim the resulting strings.
--
Lew
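A minimal sketch of the split-and-trim suggestion, assuming no entry contains a comma of its own. The class and method names (`CommaSplit`, `splitOnComma`) are illustrative, not from the thread.

```java
import java.util.ArrayList;
import java.util.List;

class CommaSplit {
    // Splits on every comma and trims each piece.
    // Assumption: a comma never appears inside an entry.
    static List<String> splitOnComma(String input) {
        List<String> result = new ArrayList<String>();
        for (String piece : input.split(",")) {
            result.add(piece.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String entry : splitOnComma("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println(entry);
        }
    }
}
```

This only holds for input shaped like the sample; an entry such as "Sat 1, 3, and 5 PM" would be cut at every comma.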
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-03-26 22:01 +0200 |
| Message-ID | <9tc099Fh7cU1@mid.individual.net> |
| In reply to | #13193 |
On 03/26/2012 09:22 PM, Lew wrote:
> laredotornado wrote:
>> I'm using Java 6. I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results. What I have are Strings like
>>
>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>> the above example, I would like the results to be
>>
>> Fri 7:30 PM
>> Sat 2 PM
>> Sun 2:30 PM
>>
>> But with String.split, the split token is not kept within the
>> results. How would I write a Java parsing expression to do what I
>> want?
>
> Based on what you've shown it looks like you could split on the comma and trim the resulting strings.
And one wouldn't even need a regular expression for that.
http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
Kind regards
robert
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-03-26 21:46 -0400 |
| Message-ID | <4f711c11$0$287$14726298@news.sunsite.dk> |
| In reply to | #13196 |
On 3/26/2012 4:01 PM, Robert Klemme wrote:
> On 03/26/2012 09:22 PM, Lew wrote:
>> laredotornado wrote:
>>> I'm using Java 6. I want to split a Java string on a regular
>>> expression, but I would like to keep part of the string used to split
>>> in the results. What I have are Strings like
>>>
>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>
>>> What I would like to do is split the expression wherever I have an
>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>> the above example, I would like the results to be
>>>
>>> Fri 7:30 PM
>>> Sat 2 PM
>>> Sun 2:30 PM
>>>
>>> But with String.split, the split token is not kept within the
>>> results. How would I write a Java parsing expression to do what I
>>> want?
>>
>> Based on what you've shown it looks like you could split on the comma
>> and trim the resulting strings.
>
> And one wouldn't even need a regular expression for that.
> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
StringTokenizer is somewhat obsoleted by String split.
So even for a pure literal expression, using split is common.
Arne
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-03-27 23:01 +0200 |
| Message-ID | <9teo5cF63vU1@mid.individual.net> |
| In reply to | #13214 |
On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>> On 03/26/2012 09:22 PM, Lew wrote:
>>> Based on what you've shown it looks like you could split on the comma
>>> and trim the resulting strings.
>>
>> And one wouldn't even need a regular expression for that.
>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>
> StringTokenizer is somewhat obsoleted by String split.
I find regular expressions are quite a bit of overhead for splitting at commas only. (Now we know that the OP has more demanding requirements so regexp is probably the tool of choice.)
Hmm... I don't like those methods in class String that much which use a String with a regular expression which is then parsed on every invocation of the method. That might be good for one off usage but for everything else I prefer solutions which at least use a Pattern constant to avoid parsing overhead per call. Even if it wasn't for runtime overhead of parsing I like to have the constant which can have its own JavaDoc explaining what's going on plus I can reuse it and quickly find all places of usage etc.
Kind regards
robert
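A minimal sketch of the Pattern-constant style described in this post, assuming the separator is a comma plus optional whitespace. `ShowTimes`, `SEPARATOR` and `entries` are made-up names for illustration; `Pattern.compile` and `Pattern.split` are the standard `java.util.regex` calls.

```java
import java.util.regex.Pattern;

class ShowTimes {
    /** Separator between entries: a comma followed by optional whitespace. */
    private static final Pattern SEPARATOR = Pattern.compile(",\\s*");

    // Reuses the precompiled pattern instead of re-parsing a regex string per call.
    static String[] entries(String input) {
        return SEPARATOR.split(input);
    }

    public static void main(String[] args) {
        for (String entry : entries("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println(entry);
        }
    }
}
```

The constant gives the regex a single documented home; whether it is measurably faster than `String.split` in a given program is something only profiling would show.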
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2012-03-27 17:18 -0400 |
| Message-ID | <4f722e96$0$290$14726298@news.sunsite.dk> |
| In reply to | #13233 |
On 3/27/2012 5:01 PM, Robert Klemme wrote:
> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>> On 03/26/2012 09:22 PM, Lew wrote:
>
>>>> Based on what you've shown it looks like you could split on the comma
>>>> and trim the resulting strings.
>>>
>>> And one wouldn't even need a regular expression for that.
>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>
>> StringTokenizer is somewhat obsoleted by String split.
>
> I find regular expressions are quite a bit of overhead for splitting at
> commas only. (Now we know that the OP has more demanding requirements so
> regexp is probably the tool of choice.)
>
> Hmm... I don't like those methods in class String that much which use a
> String with a regular expression which is then parsed on every
> invocation of the method. That might be good for one off usage but for
> everything else I prefer solutions which at least use a Pattern constant
> to avoid parsing overhead per call. Even if it wasn't for runtime
> overhead of parsing I like to have the constant which can have its own
> JavaDoc explaining what's going on plus I can reuse it and quickly find
> all places of usage etc.
Split is the way you do it.
To cut down on overhead a non-regex split should be added.
Arne
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-03-27 14:21 -0700 |
| Message-ID | <Sbqcr.45778$IQ1.1030@newsfe18.iad> |
| In reply to | #13233 |
On 3/27/12 2:01 PM, Robert Klemme wrote:
> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>> On 03/26/2012 09:22 PM, Lew wrote:
>
>>>> Based on what you've shown it looks like you could split on the comma
>>>> and trim the resulting strings.
>>>
>>> And one wouldn't even need a regular expression for that.
>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>
>> StringTokenizer is somewhat obsoleted by String split.
>
> I find regular expressions are quite a bit of overhead for splitting at
> commas only. (Now we know that the OP has more demanding requirements so
> regexp is probably the tool of choice.)
>
> Hmm... I don't like those methods in class String that much which use a
> String with a regular expression which is then parsed on every
> invocation of the method. That might be good for one off usage but for
> everything else I prefer solutions which at least use a Pattern constant
> to avoid parsing overhead per call.
Premature optimization. Regex parsing inside an inner loop *might* add
unacceptable overhead, however that should be determined via profiling.
> Even if it wasn't for runtime
> overhead of parsing I like to have the constant which can have its own
> JavaDoc explaining what's going on plus I can reuse it and quickly find
> all places of usage etc.
That's a better reason to factor it out.
My personal philosophy for this kind of thing:
Correct first, easy second, fast third.
If it's not correct, it doesn't matter.
If it's not easy, it's likely not correct, at least not for long.
If it's not fast, it should be "easy" to make it fast as long as it's
already correct and easy :-)
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2012-03-28 07:38 +0200 |
| Message-ID | <9tfme5F5ooU1@mid.individual.net> |
| In reply to | #13236 |
On 03/27/2012 11:21 PM, Daniel Pitts wrote:
> On 3/27/12 2:01 PM, Robert Klemme wrote:
>> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>>> On 03/26/2012 09:22 PM, Lew wrote:
>>
>>>>> Based on what you've shown it looks like you could split on the comma
>>>>> and trim the resulting strings.
>>>>
>>>> And one wouldn't even need a regular expression for that.
>>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>>
>>> StringTokenizer is somewhat obsoleted by String split.
>>
>> I find regular expressions are quite a bit of overhead for splitting at
>> commas only. (Now we know that the OP has more demanding requirements so
>> regexp is probably the tool of choice.)
>>
>> Hmm... I don't like those methods in class String that much which use a
>> String with a regular expression which is then parsed on every
>> invocation of the method. That might be good for one off usage but for
>> everything else I prefer solutions which at least use a Pattern constant
>> to avoid parsing overhead per call.
> Premature optimization. Regex parsing inside an inner loop *might* add
> unacceptable overhead, however that should be determined via profiling.
That's not the only reason, because:
>> Even if it wasn't for runtime
>> overhead of parsing I like to have the constant which can have its own
>> JavaDoc explaining what's going on plus I can reuse it and quickly find
>> all places of usage etc.
> That's a better reason to factor it out.
I forgot to add another point: regular expressions tend to grow large which makes methods which contain such a regexp string constant harder to read.
And then of course there is another difference: with the Pattern in a static variable you'll notice earlier (at class load time) if the pattern is ill formatted as opposed to using ad hoc compilation which comes to haunt you later on every method invocation.
> My personal philosophy for this kind of thing:
> Correct first, easy second, fast third.
+1
Kind regards
robert
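To illustrate the point about ill-formed patterns: `Pattern.compile` throws `PatternSyntaxException` for a malformed regex. If that call sits in a `static final` field initializer, the failure would surface as an `ExceptionInInitializerError` when the class is first initialized, rather than at some later method call. The sketch below only demonstrates the exception itself; `BadPatternDemo` and `compiles` are hypothetical names.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class BadPatternDemo {
    // Returns true if the regex compiles, false if it is malformed.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("(am|pm),?")); // well-formed
        System.out.println(compiles("(am|pm,?"));  // unclosed group
    }
}
```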
| From | Daniel Pitts <newsgroup.nospam@virtualinfinity.net> |
|---|---|
| Date | 2012-03-28 10:24 -0700 |
| Message-ID | <_OHcr.10645$Ce4.1406@newsfe21.iad> |
| In reply to | #13253 |
On 3/27/12 10:38 PM, Robert Klemme wrote:
> On 03/27/2012 11:21 PM, Daniel Pitts wrote:
>> On 3/27/12 2:01 PM, Robert Klemme wrote:
>>> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>>>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>>>> On 03/26/2012 09:22 PM, Lew wrote:
>>>
>>>>>> Based on what you've shown it looks like you could split on the comma
>>>>>> and trim the resulting strings.
>>>>>
>>>>> And one wouldn't even need a regular expression for that.
>>>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>>>>
>>>>
>>>> StringTokenizer is somewhat obsoleted by String split.
>>>
>>> I find regular expressions are quite a bit of overhead for splitting at
>>> commas only. (Now we know that the OP has more demanding requirements so
>>> regexp is probably the tool of choice.)
>>>
>>> Hmm... I don't like those methods in class String that much which use a
>>> String with a regular expression which is then parsed on every
>>> invocation of the method. That might be good for one off usage but for
>>> everything else I prefer solutions which at least use a Pattern constant
>>> to avoid parsing overhead per call.
>> Premature optimization. Regex parsing inside an inner loop *might* add
>> unacceptable overhead, however that should be determined via profiling.
>
> That's not the only reason, because:
>
>>> Even if it wasn't for runtime
>>> overhead of parsing I like to have the constant which can have its own
>>> JavaDoc explaining what's going on plus I can reuse it and quickly find
>>> all places of usage etc.
>> That's a better reason to factor it out.
>
> I forgot to add another point: regular expressions tend to grow large
> which makes methods which contain such a regexp string constant harder
> to read.
Right, I did concede that there are other great reasons to factor it out.
Performance isn't the first one I would pick ;-)
>
> And then of course there is another difference: with the Pattern in a
> static variable you'll notice earlier (at class load time) if the
> pattern is ill formatted as opposed to using ad hoc compilation which
> comes to haunt you later on every method invocation.
Actually, I know even earlier. I know at edit time, as my IDE will highlight bad regex inside methods which take regex ;-)
Even so, it should be found at Unit Test time (which, granted, will be around the same time whether it's per method or per class-load).
Just a thought.
| From | markspace <-@.> |
|---|---|
| Date | 2012-03-26 13:49 -0700 |
| Message-ID | <jkqkov$839$1@dont-email.me> |
| In reply to | #13190 |
On 3/26/2012 11:54 AM, laredotornado wrote:
> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results. How would I write a Java parsing expression to do what I
> want?
What Lew said.
String[] dates = dateString.split( ", +" );
for( String date : dates ) {
    String temp = date.trim().toUpperCase();
    if( temp.endsWith( "PM" ) ) {
        System.out.println( "Good afternoon." );
    } else if( temp.endsWith( "AM" ) ) {
        System.out.println( "Good morning." );
    } else {
        System.out.println( "Good whatever." );
    }
}
| From | laredotornado@gmail.com |
|---|---|
| Date | 2012-03-26 14:21 -0700 |
| Message-ID | <9569964.403.1332796867513.JavaMail.geo-discussion-forums@ynne2> |
| In reply to | #13190 |
On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
> Hi,
>
> I'm using Java 6. I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results. What I have are Strings like
>
> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i . Hopefully I got that right. In
> the above example, I would like the results to be
>
> Fri 7:30 PM
> Sat 2 PM
> Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results. How would I write a Java parsing expression to do what I
> want?
>
> Thanks, - Dave
Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
Fri 8 PM
Sat 1, 3, and 5 PM
Your continued help is appreciated, - Dave
| From | markspace <-@.> |
|---|---|
| Date | 2012-03-26 15:02 -0700 |
| Message-ID | <jkqp1b$2qu$1@dont-email.me> |
| In reply to | #13200 |
On 3/26/2012 2:21 PM, laredotornado@gmail.com wrote:
> Hi, I don't want to split on the comma because there could be a case
> where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this
> case, I want the result to be a String array containing
>
> Fri 8 PM Sat 1, 3, and 5 PM
You might be able to do this with clever use of regex look-around:
http://www.regular-expressions.info/lookaround.html
Maybe something like "(?<=M),". Definitely take some time to test that carefully though.
Otherwise, you'll have to write your own parser (which wouldn't be hard).
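A minimal sketch of the look-behind idea, under the assumption that an entry always ends in AM or PM and that the separator is a comma directly after it. The pattern below is a slightly extended variant of the "(?<=M)," suggestion (it also accepts lower case and swallows the whitespace after the comma); `LookbehindSplit` and `AFTER_AM_PM` are illustrative names.

```java
import java.util.regex.Pattern;

class LookbehindSplit {
    // Matches a comma (plus trailing whitespace) only when "am" or "pm" precedes it.
    // The look-behind is zero-width, so AM/PM stays in the resulting piece.
    private static final Pattern AFTER_AM_PM =
            Pattern.compile("(?<=[ap]m),\\s*", Pattern.CASE_INSENSITIVE);

    static String[] split(String input) {
        return AFTER_AM_PM.split(input);
    }

    public static void main(String[] args) {
        for (String entry : split("Fri 8 PM, Sat 1, 3, and 5 PM")) {
            System.out.println(entry);
        }
        // prints:
        // Fri 8 PM
        // Sat 1, 3, and 5 PM
    }
}
```

Because only the comma and whitespace are consumed, nothing has to be re-appended afterwards, and the commas inside "1, 3, and 5" are left alone since they follow digits.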
| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Date | 2012-03-26 15:56 -0700 |
| Message-ID | <jkqs7e$jek$1@dont-email.me> |
| In reply to | #13200 |
On 3/26/2012 2:21 PM, laredotornado@gmail.com wrote:
> On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
>> Hi,
>>
>> I'm using Java 6. I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results. What I have are Strings like
>>
>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>> the above example, I would like the results to be
>>
>> Fri 7:30 PM
>> Sat 2 PM
>> Sun 2:30 PM
>>
>> But with String.split, the split token is not kept within the
>> results. How would I write a Java parsing expression to do what I
>> want?
>>
>> Thanks, - Dave
>
> Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
>
> Fri 8 PM
> Sat 1, 3, and 5 PM
>
> Your continued help is appreciated, - Dave
public class test {
    public static void main(String[] args) {
        String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
        String token = "PM, |PM";
        String[] strs = str.split(token);
        for (String s : strs)
            System.out.println(s+"PM");
    }
}
C:\Documents and Settings\Knute Johnson>java test
Fri 7:30 PM
Fri 8 PM
Sat 1, 3, and 5 PM
If you wanted to get AMs too, you could do a first pass for the PMs and
then do it again for the AMs.
--
Knute Johnson
| From | markspace <-@.> |
|---|---|
| Date | 2012-03-26 16:02 -0700 |
| Message-ID | <jkqsi1$m3l$1@dont-email.me> |
| In reply to | #13203 |
On 3/26/2012 3:56 PM, Knute Johnson wrote:
> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
...
> System.out.println(s+"PM");
^^
What does this print if the "str" string ends with AM instead of PM? I
don't think this actually works....
| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Date | 2012-03-26 17:33 -0700 |
| Message-ID | <jkr1tf$iql$1@dont-email.me> |
| In reply to | #13204 |
On 3/26/2012 4:02 PM, markspace wrote:
> On 3/26/2012 3:56 PM, Knute Johnson wrote:
>
>> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
> ...
>> System.out.println(s+"PM");
> ^^
>
> What does this print if the "str" string ends with AM instead of PM? I
> don't think this actually works....
>
It won't. He'll have to make a two-pass system if he's going to split on two different tokens. I think I said that.
--
Knute Johnson
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Date | 2012-03-27 01:17 +0000 |
| Message-ID | <jkr4f6$sf1$1@localhost.localdomain> |
| In reply to | #13208 |
On Mon, 26 Mar 2012 17:33:51 -0700, Knute Johnson wrote:
> On 3/26/2012 4:02 PM, markspace wrote:
>> On 3/26/2012 3:56 PM, Knute Johnson wrote:
>>
>>> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
>> ...
>>> System.out.println(s+"PM");
>> ^^
>>
>> What does this print if the "str" string ends with AM instead of PM? I
>> don't think this actually works....
>>
>>
> It won't. He'll have to make a two-pass system if he's going to split
> on two different tokens. I think I said that
Then you'd need something like the following, semi-pseudo-coded as:
String[] slist = in.split("PM, +|PM");
for (int i = 0; i < slist.length; i++)
    slist[i] = slist[i].trim() + " PM";
ArrayList<String> alist = new ArrayList<String>();
for (String s : slist) {
    String[] sp = s.split("AM, +|AM");
    for (int j = 0; j < sp.length - 1; j++)
        alist.add(sp[j].trim() + " AM");
    alist.add(sp[sp.length - 1].trim()); // last piece is assumed to end with PM
}
...but it's ugly. I think it can be done in one pass using a regex with
capture groups along the lines of
"(.*)([AP]M ,|[AP]M)"
If I got that right, each time expression that the OP needs to split
out is represented by a pair of adjacent capture groups, so just a
single pass along the array of capture groups concatenating adjacent
pairs and applying trim() to each concatenated pair should do the
trick.
It's rather late here, so I'll leave this as an exercise for anybody
who feels keen. If nobody has touched it by mid morning tomorrow I may
see if it works.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
| From | Martin Gregorie <martin@address-in-sig.invalid> |
|---|---|
| Date | 2012-03-27 21:57 +0000 |
| Message-ID | <jktd4e$kef$1@localhost.localdomain> |
| In reply to | #13211 |
On Tue, 27 Mar 2012 01:17:26 +0000, Martin Gregorie wrote:
> Its rather late here, so I'll leave this as an exercise for anybody
> who feels keen. If nobody has touched it by mid morning tomorrow I
> may see if it works.
>
I put together the following this morning. Hopefully it's enough of an SSCE
to pass muster.
As promised, I first implemented a two-pass splitter (the 'classico'
method): it's ugly all right, even though it does the trick.
Then I swiped Stefan's code (the 'patternista' method), tweaked it
slightly and used it to drive both his and my regexes. The only other
change it needs is to parameterise Matcher.group() because Stefan's regex
treats the whole pattern as a capture group while mine only uses the
first capture group in the pattern which lets it discard the comma
separators. This was one of my design aims: to output the exact same
strings as the classico() method does.
==========================================================================
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitter
{
    public static ArrayList<String> classico(String in)
    {
        String[] sList = in.split("PM, +|PM");
        for (int i=0; i<sList.length; i++)
            sList[i] = sList[i].trim() + " PM";

        ArrayList<String> aList = new ArrayList<String>();
        for (String s : sList)
        {
            String sp[] = s.split("AM, +|AM");
            for (int j=0; j < sp.length - 1; j++)
                aList.add(sp[j].trim() + " AM");

            aList.add(sp[sp.length - 1]); // The last element is
                                          // always ended with PM
        }
        return aList;
    }

    public static ArrayList<String> patternista(String p, int g, String in)
    {
        Pattern pattern = Pattern.compile(p, Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(in);
        ArrayList<String> aList = new ArrayList<String>();
        while(matcher.find())
        {
            String s = matcher.group(g);
            aList.add(s.trim());
        }
        return aList;
    }

    public static void showResult(String source,
                                  String method,
                                  ArrayList<String> s)
    {
        System.out.println(String.format("\n'%s' ==> '%s'",
                                         source,
                                         method));
        for (int i = 0; i < s.size(); i++)
            System.out.println(String.format("%2d: %s", i, s.get(i)));
    }

    public static void main(String[] args)
    {
        String SOURCE = "Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM";
        String martin = "(.*?[AP]M),?";
        String stefan = ".*?(?:am|pm),?";
        ArrayList<String> s;

        s = classico(SOURCE);
        showResult(SOURCE, "classico", s);
        s = patternista(martin, 1, SOURCE);
        showResult(SOURCE, martin, s);
        s = patternista(stefan, 0, SOURCE);
        showResult(SOURCE, stefan, s);
    }
}
==========================================================================
'Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM' ==> 'classico'
0: Fri 7:30 PM
1: Sat 1, 3 and 5 AM
2: Sun 2:30 PM
'Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM' ==> '(.*?[AP]M),?'
0: Fri 7:30 PM
1: Sat 1, 3 and 5 AM
2: Sun 2:30 PM
'Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM' ==> '.*?(?:am|pm),?'
0: Fri 7:30 PM,
1: Sat 1, 3 and 5 AM,
2: Sun 2:30 PM
==========================================================================
As you can see, once I'd swapped greedy matches for non-greedy in my regex
(the second test run), both regexes do the job and to my mind use much more
elegant code than the two-pass classico approach, but of course ymmv.
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
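One case the test string above does not exercise is input whose last entry ends in AM. A two-pass split that re-appends " PM" to every piece would give that last piece a stray PM, whereas the find()-based approach does not care which suffix comes last. A minimal sketch, reusing the "(.*?[AP]M),?" regex from the post; `FindEntries` and `entries` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindEntries {
    // Lazily matches everything up to and including the next AM/PM,
    // then optionally consumes the comma that follows it.
    private static final Pattern ENTRY =
            Pattern.compile("(.*?[AP]M),?", Pattern.CASE_INSENSITIVE);

    static List<String> entries(String input) {
        List<String> result = new ArrayList<String>();
        Matcher matcher = ENTRY.matcher(input);
        while (matcher.find()) {
            result.add(matcher.group(1).trim()); // group 1 excludes the comma
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(entries("Fri 7 PM, Sat 5 AM"));
        // prints: [Fri 7 PM, Sat 5 AM]
    }
}
```

Note that this matches the letters "am" or "pm" anywhere, so it assumes they never occur inside another word of the input.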
| From | Gene Wirchenko <genew@ocis.net> |
|---|---|
| Date | 2012-03-26 18:26 -0700 |
| Message-ID | <sf52n798nn2g3a5lg5te3vj7b4403iut65@4ax.com> |
| In reply to | #13200 |
On Mon, 26 Mar 2012 14:21:07 -0700 (PDT), laredotornado@gmail.com
wrote:
>On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
>> Hi,
>>
>> I'm using Java 6. I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results. What I have are Strings like
>>
>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>> the above example, I would like the results to be
>>
>> Fri 7:30 PM
>> Sat 2 PM
>> Sun 2:30 PM
>>
>> But with String.split, the split token is not kept within the
>> results. How would I write a Java parsing expression to do what I
>> want?
>>
>> Thanks, - Dave
>
>Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
>
>Fri 8 PM
>Sat 1, 3, and 5 PM
>
>Your continued help is appreciated, - Dave
What about "Sun 9, 11 AM, and 1 PM"? Or "Sun 9 and 11 AM, and 1
and 3 PM"?
I think you had better be quite sure of all of the variants. For
that matter, people often omit the comma before "and" which would give
"Sun 9, 11 AM and 1 PM" for my first example. Such people have
probably not seen
http://www.outsidethebeltway.com/oxford-comma-cartoon/
or other such references.
Sincerely,
Gene Wirchenko
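To make the concern concrete, here is a small sketch that feeds these variants to a split on "comma preceded by AM/PM". The regex and the names `VariantCheck` and `split` are assumptions for illustration, not something the original poster committed to.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class VariantCheck {
    // Splits at a comma that directly follows "am" or "pm" (any case).
    private static final Pattern AFTER_AM_PM =
            Pattern.compile("(?<=[ap]m),\\s*", Pattern.CASE_INSENSITIVE);

    static String[] split(String input) {
        return AFTER_AM_PM.split(input);
    }

    public static void main(String[] args) {
        String[] variants = {
            "Sun 9, 11 AM, and 1 PM",
            "Sun 9 and 11 AM, and 1 and 3 PM",
            "Sun 9, 11 AM and 1 PM"
        };
        for (String variant : variants) {
            System.out.println(Arrays.toString(split(variant)));
        }
        // prints:
        // [Sun 9, 11 AM, and 1 PM]   (two pieces: "Sun 9, 11 AM" and "and 1 PM")
        // [Sun 9 and 11 AM, and 1 and 3 PM]   (two pieces, the second starts with "and")
        // [Sun 9, 11 AM and 1 PM]   (one piece, no comma after AM)
    }
}
```

The first two inputs are cut into a piece that begins with "and" and has no day, while the third stays whole, so the same schedule comes out differently depending on punctuation.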
| From | Lew <lewbloch@gmail.com> |
|---|---|
| Date | 2012-03-26 19:07 -0700 |
| Message-ID | <17975015.387.1332814029736.JavaMail.geo-discussion-forums@pbtd1> |
| In reply to | #13213 |
Gene Wirchenko wrote:
> What about "Sun 9, 11 AM, and 1 PM"?
> Or "Sun 9 and 11 AM, and 1 and 3 PM"?
>
> I think you had better be quite sure of all of the variants. For
> that matter, people often omit the comma before "and" which would give
> "Sun 9, 11 AM and 1 PM" for my first example. Such people have
> probably not seen
> http://www.outsidethebeltway.com/oxford-comma-cartoon/
> or other such references.
The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.
Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.
It does require a careful, methodical approach.
--
Lew
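As one way to picture "a precise statement of the rules" plus a quality check, the sketch below validates each piece against a deliberately strict, made-up grammar: a three-letter day, an hour from 1 to 12, optional minutes, then AM or PM. The grammar and the names `EntryValidator` and `isValidEntry` are assumptions for illustration; the real rules would have to come from whoever owns the data.

```java
import java.util.regex.Pattern;

class EntryValidator {
    // Assumed grammar: day abbreviation, hour 1-12, optional :minutes, then AM or PM.
    private static final Pattern ENTRY = Pattern.compile(
            "(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\\s+(1[0-2]|[1-9])(:[0-5]\\d)?\\s*[AP]M",
            Pattern.CASE_INSENSITIVE);

    // True only if the whole (trimmed) entry fits the assumed grammar.
    static boolean isValidEntry(String entry) {
        return ENTRY.matcher(entry.trim()).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "Fri 7:30 PM", "Sat 2 pm", "and 1 PM", "Sat 13 PM" };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValidEntry(sample));
        }
    }
}
```

This grammar would reject "Sat 1, 3, and 5 PM", which the original poster wants to accept, so it is a starting point for writing the rules down rather than a finished answer. What to do with a rejected piece (skip it, report it, abort) is a separate decision.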
| From | Knute Johnson <nospam@knutejohnson.com> |
|---|---|
| Date | 2012-03-26 20:40 -0700 |
| Message-ID | <jkrcr8$4pg$1@dont-email.me> |
| In reply to | #13217 |
On 3/26/2012 7:07 PM, Lew wrote:
> Gene Wirchenko wrote:
>> What about "Sun 9, 11 AM, and 1 PM"?
>> Or "Sun 9 and 11 AM, and 1 and 3 PM"?
>>
>> I think you had better be quite sure of all of the variants. For
>> that matter, people often omit the comma before "and" which would give
>> "Sun 9, 11 AM and 1 PM" for my first example. Such people have
>> probably not seen
>> http://www.outsidethebeltway.com/oxford-comma-cartoon/
>> or other such references.
>
> The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.
>
> Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.
>
> It does require a careful, methodical approach.
>
You've been awfully poetic lately Lew.
--
Knute Johnson