
Keeping the split token in a Java regular expression

Started by	laredotornado <laredotornado@zipmail.com>
First post	2012-03-26 11:54 -0700
Last post	2012-03-28 07:51 +0200
Articles	20 on this page of 50 — 13 participants


  Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-26 11:54 -0700
    Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 12:22 -0700
      Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-26 22:01 +0200
        Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:46 -0400
          Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:01 +0200
            Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:18 -0400
            Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:21 -0700
              Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:38 +0200
                Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:24 -0700
    Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 13:49 -0700
    Re: Keeping the split token in a Java regular expression laredotornado@gmail.com - 2012-03-26 14:21 -0700
      Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 15:02 -0700
      Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 15:56 -0700
        Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 16:02 -0700
          Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:33 -0700
            Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 01:17 +0000
              Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 21:57 +0000
      Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-26 18:26 -0700
        Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 19:07 -0700
          Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 20:40 -0700
            Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 09:10 -0700
              Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-27 11:09 -0700
                Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 13:32 -0700
                  Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:29 -0700
                    Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 16:22 -0700
                      Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 18:20 -0700
                        Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
                          Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 21:31 -0700
                            Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:41 +0200
                              Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:28 -0700
    Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 16:26 -0700
      Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:36 -0700
      Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:27 +0200
        Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:28 +0200
    Re: Keeping the split token in a Java regular expression "John B. Matthews" <nospam@nospam.invalid> - 2012-03-26 20:49 -0400
    Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:58 -0400
      Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-26 21:14 -0700
        Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:21 -0400
          Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 15:20 -0700
            Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 18:48 -0400
              Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 17:07 -0700
            Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 21:49 -0300
              Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 20:56 -0400
                Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 22:01 -0300
                  Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
    Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 08:15 -0600
      Re: Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-27 07:58 -0700
        Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 09:21 -0600
          Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 09:43 -0700
            Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:51 +0200


#13190 — Keeping the split token in a Java regular expression

From	laredotornado <laredotornado@zipmail.com>
Date	2012-03-26 11:54 -0700
Subject	Keeping the split token in a Java regular expression
Message-ID	<48d35bc3-a391-4ccf-a222-dac64775a2f2@oq7g2000pbb.googlegroups.com>

Hi,

I'm using Java 6.  I want to split a Java string on a regular
expression, but I would like to keep part of the string used to split
in the results.  What I have are Strings like

    Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM

What I would like to do is split the expression wherever I have an
expression matching /(am|pm),?/i .  Hopefully I got that right.  In
the above example, I would like the results to be

    Fri 7:30 PM
    Sat 2 PM
    Sun 2:30 PM

But with String.split, the split token is not kept within the
results.  How would I write a Java parsing expression to do what I
want?

Thanks, - Dave


#13193

From	Lew <lewbloch@gmail.com>
Date	2012-03-26 12:22 -0700
Message-ID	<33095746.178.1332789765559.JavaMail.geo-discussion-forums@pbcto7>
In reply to	#13190

laredotornado wrote:
> I'm using Java 6.  I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results.  What I have are Strings like
> 
>     Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
> 
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
> the above example, I would like the results to be
> 
>     Fri 7:30 PM
>     Sat 2 PM
>     Sun 2:30 PM
> 
> But with String.split, the split token is not kept within the
> results.  How would I write a Java parsing expression to do what I
> want?

Based on what you've shown it looks like you could split on the comma and trim the resulting strings.
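
A minimal sketch of that suggestion, assuming every comma in the input separates two entries (which holds for the sample string in the question). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CommaSplitSketch {
    /** Splits on every comma and trims the whitespace around each piece. */
    static List<String> splitAndTrim(String input) {
        List<String> result = new ArrayList<String>();
        for (String piece : input.split(",")) {
            result.add(piece.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints the three entries, one per line.
        for (String s : splitAndTrim("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println(s);
        }
    }
}
```

Because nothing is consumed except the comma, the AM/PM marker stays in each result.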

-- 
Lew


#13196

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-03-26 22:01 +0200
Message-ID	<9tc099Fh7cU1@mid.individual.net>
In reply to	#13193

On 03/26/2012 09:22 PM, Lew wrote:
> laredotornado wrote:
>> I'm using Java 6.  I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results.  What I have are Strings like
>>
>>      Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
>> the above example, I would like the results to be
>>
>>      Fri 7:30 PM
>>      Sat 2 PM
>>      Sun 2:30 PM
>>
>> But with String.split, the split token is not kept within the
>> results.  How would I write a Java parsing expression to do what I
>> want?
>
> Based on what you've shown it looks like you could split on the comma and trim the resulting strings.

And one wouldn't even need a regular expression for that.
http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
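
For illustration, a rough sketch of the StringTokenizer route, again assuming the comma is the only separator. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class TokenizerSketch {
    /** Breaks the input at commas without any regular expression. */
    static List<String> tokenize(String input) {
        List<String> result = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(input, ",");
        while (st.hasMoreTokens()) {
            // Tokens keep their surrounding blanks, so trim each one.
            result.add(st.nextToken().trim());
        }
        return result;
    }
}
```

Note that StringTokenizer skips empty tokens, so "a,,b" yields two tokens rather than three.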

Kind regards

	robert


#13214

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-03-26 21:46 -0400
Message-ID	<4f711c11$0$287$14726298@news.sunsite.dk>
In reply to	#13196

On 3/26/2012 4:01 PM, Robert Klemme wrote:
> On 03/26/2012 09:22 PM, Lew wrote:
>> laredotornado wrote:
>>> I'm using Java 6. I want to split a Java string on a regular
>>> expression, but I would like to keep part of the string used to split
>>> in the results. What I have are Strings like
>>>
>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>
>>> What I would like to do is split the expression wherever I have an
>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>> the above example, I would like the results to be
>>>
>>> Fri 7:30 PM
>>> Sat 2 PM
>>> Sun 2:30 PM
>>>
>>> But with String.split, the split token is not kept within the
>>> results. How would I write a Java parsing expression to do what I
>>> want?
>>
>> Based on what you've shown it looks like you could split on the comma
>> and trim the resulting strings.
>
> And one wouldn't even need a regular expression for that.
> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html

StringTokenizer is somewhat obsoleted by String split.

So even for a pure literal expression, using split is
common.

Arne


#13233

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-03-27 23:01 +0200
Message-ID	<9teo5cF63vU1@mid.individual.net>
In reply to	#13214

On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>> On 03/26/2012 09:22 PM, Lew wrote:

>>> Based on what you've shown it looks like you could split on the comma
>>> and trim the resulting strings.
>>
>> And one wouldn't even need a regular expression for that.
>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>
> StringTokenizer is somewhat obsoleted by String split.

I find regular expressions are quite a bit of overhead for splitting at 
commas only.  (Now we know that the OP has more demanding requirements 
so regexp is probably the tool of choice.)

Hmm...  I am not fond of the methods in class String that take a 
regular expression as a String, which is then compiled on every 
invocation of the method.  That might be fine for one-off usage, but for 
everything else I prefer solutions which at least use a Pattern constant 
to avoid the compilation overhead per call.  Even leaving the runtime 
overhead aside, I like to have a constant which can have its own 
JavaDoc explaining what's going on, plus I can reuse it and quickly find 
all places of usage etc.
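
A small sketch of the Pattern-constant style described above. The class name, constant name and the choice of regex are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

final class SeparatorPatterns {
    /**
     * Matches a comma together with any whitespace around it.
     * Compiled once, when the class is initialized.
     */
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    /** Splits the input at commas, reusing the precompiled pattern. */
    static String[] splitOnComma(String input) {
        return COMMA.split(input.trim());
    }
}
```

The constant gives the regex a name and a place for documentation, and every use of it can be found by searching for references to COMMA.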

Kind regards

	robert


#13234

From	Arne Vajhøj <arne@vajhoej.dk>
Date	2012-03-27 17:18 -0400
Message-ID	<4f722e96$0$290$14726298@news.sunsite.dk>
In reply to	#13233

On 3/27/2012 5:01 PM, Robert Klemme wrote:
> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>> On 03/26/2012 09:22 PM, Lew wrote:
>
>>>> Based on what you've shown it looks like you could split on the comma
>>>> and trim the resulting strings.
>>>
>>> And one wouldn't even need a regular expression for that.
>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>
>> StringTokenizer is somewhat obsoleted by String split.
>
> I find regular expressions are quite a bit of overhead for splitting at
> commas only. (Now we know that the OP has more demanding requirements so
> regexp is probably the tool of choice.)
>
> Hmm... I don't like those methods in class String that much which use a
> String with a regular expression which is then parsed on every
> invocation of the method. That might be good for one off usage but for
> everything else I prefer solutions which at least use a Pattern constant
> to avoid parsing overhead per call. Even if it wasn't for runtime
> overhead of parsing I like to have the constant which can have its own
> JavaDoc explaining what's going on plus I can reuse it and quickly find
> all places of usage etc.

Split is the way you do it.

To cut down on overhead a non-regex split should be added.
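
One way such a non-regex split could look, sketched with indexOf. The helper splitLiteral is hypothetical; it is not a JDK method.

```java
import java.util.ArrayList;
import java.util.List;

final class LiteralSplitSketch {
    /** Splits input at every occurrence of the literal separator. */
    static List<String> splitLiteral(String input, String separator) {
        if (separator.length() == 0) {
            // An empty separator would never advance the search position.
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<String>();
        int start = 0;
        int idx;
        while ((idx = input.indexOf(separator, start)) >= 0) {
            parts.add(input.substring(start, idx));
            start = idx + separator.length();
        }
        parts.add(input.substring(start)); // remainder after the last separator
        return parts;
    }
}
```

Unlike StringTokenizer, this sketch keeps empty pieces between adjacent separators.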

Arne


#13236

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-03-27 14:21 -0700
Message-ID	<Sbqcr.45778$IQ1.1030@newsfe18.iad>
In reply to	#13233

On 3/27/12 2:01 PM, Robert Klemme wrote:
> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>> On 03/26/2012 09:22 PM, Lew wrote:
>
>>>> Based on what you've shown it looks like you could split on the comma
>>>> and trim the resulting strings.
>>>
>>> And one wouldn't even need a regular expression for that.
>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>
>> StringTokenizer is somewhat obsoleted by String split.
>
> I find regular expressions are quite a bit of overhead for splitting at
> commas only. (Now we know that the OP has more demanding requirements so
> regexp is probably the tool of choice.)
>
> Hmm... I don't like those methods in class String that much which use a
> String with a regular expression which is then parsed on every
> invocation of the method. That might be good for one off usage but for
> everything else I prefer solutions which at least use a Pattern constant
> to avoid parsing overhead per call.
Premature optimization. Regex parsing inside an inner loop *might* add 
unacceptable overhead; however, that should be determined via profiling.
> Even if it wasn't for runtime
> overhead of parsing I like to have the constant which can have its own
> JavaDoc explaining what's going on plus I can reuse it and quickly find
> all places of usage etc.
That's a better reason to factor it out.

My personal philosophy for this kind of thing:
   Correct first, easy second, fast third.

    If it's not correct, it doesn't matter.
    If it's not easy, it's likely not correct, at least not for long.
    If it's not fast, it should be "easy" to make it fast as long as it's 
already correct and easy :-)


#13253

From	Robert Klemme <shortcutter@googlemail.com>
Date	2012-03-28 07:38 +0200
Message-ID	<9tfme5F5ooU1@mid.individual.net>
In reply to	#13236

On 03/27/2012 11:21 PM, Daniel Pitts wrote:
> On 3/27/12 2:01 PM, Robert Klemme wrote:
>> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>>> On 03/26/2012 09:22 PM, Lew wrote:
>>
>>>>> Based on what you've shown it looks like you could split on the comma
>>>>> and trim the resulting strings.
>>>>
>>>> And one wouldn't even need a regular expression for that.
>>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>>
>>> StringTokenizer is somewhat obsoleted by String split.
>>
>> I find regular expressions are quite a bit of overhead for splitting at
>> commas only. (Now we know that the OP has more demanding requirements so
>> regexp is probably the tool of choice.)
>>
>> Hmm... I don't like those methods in class String that much which use a
>> String with a regular expression which is then parsed on every
>> invocation of the method. That might be good for one off usage but for
>> everything else I prefer solutions which at least use a Pattern constant
>> to avoid parsing overhead per call.
> Premature optimization. Regex parsing inside an inner loop *might* add
> unacceptable overhead, however that should be determined via profiling.

That's not the only reason, because:

>> Even if it wasn't for runtime
>> overhead of parsing I like to have the constant which can have its own
>> JavaDoc explaining what's going on plus I can reuse it and quickly find
>> all places of usage etc.
> That's a better reason to factor it out.

I forgot to add another point: regular expressions tend to grow large 
which makes methods which contain such a regexp string constant harder 
to read.

And then of course there is another difference: with the Pattern in a 
static variable you'll notice earlier (at class load time) if the 
pattern is malformed, as opposed to ad hoc compilation, which comes 
back to haunt you later on every method invocation.
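
To illustrate the late failure: String.split compiles its argument when the call runs, so a malformed regex only throws PatternSyntaxException at that point. A static final Pattern would instead fail when its class is initialized, which surfaces as an ExceptionInInitializerError. The sketch below shows only the ad hoc case; the class and method names are hypothetical.

```java
import java.util.regex.PatternSyntaxException;

final class LateFailureSketch {
    /** Tries an ad hoc split and reports whether the regex compiled. */
    static String describe(String regex) {
        try {
            "Fri 8 PM".split(regex); // the regex is compiled here, on each call
            return "ok";
        } catch (PatternSyntaxException e) {
            return "bad pattern: " + e.getDescription();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("(am|pm),?")); // well-formed
        System.out.println(describe("(am|pm"));    // unclosed group
    }
}
```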

> My personal philosophy for this kind of thing:
> Correct first, easy second, fast third.

+1

Kind regards

	robert


#13256

From	Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date	2012-03-28 10:24 -0700
Message-ID	<_OHcr.10645$Ce4.1406@newsfe21.iad>
In reply to	#13253

On 3/27/12 10:38 PM, Robert Klemme wrote:
> On 03/27/2012 11:21 PM, Daniel Pitts wrote:
>> On 3/27/12 2:01 PM, Robert Klemme wrote:
>>> On 03/27/2012 03:46 AM, Arne Vajhøj wrote:
>>>> On 3/26/2012 4:01 PM, Robert Klemme wrote:
>>>>> On 03/26/2012 09:22 PM, Lew wrote:
>>>
>>>>>> Based on what you've shown it looks like you could split on the comma
>>>>>> and trim the resulting strings.
>>>>>
>>>>> And one wouldn't even need a regular expression for that.
>>>>> http://docs.oracle.com/javase/6/docs/api/java/util/StringTokenizer.html
>>>>>
>>>>
>>>> StringTokenizer is somewhat obsoleted by String split.
>>>
>>> I find regular expressions are quite a bit of overhead for splitting at
>>> commas only. (Now we know that the OP has more demanding requirements so
>>> regexp is probably the tool of choice.)
>>>
>>> Hmm... I don't like those methods in class String that much which use a
>>> String with a regular expression which is then parsed on every
>>> invocation of the method. That might be good for one off usage but for
>>> everything else I prefer solutions which at least use a Pattern constant
>>> to avoid parsing overhead per call.
>> Premature optimization. Regex parsing inside an inner loop *might* add
>> unacceptable overhead, however that should be determined via profiling.
>
> That's not the only reason, because:
>
>>> Even if it wasn't for runtime
>>> overhead of parsing I like to have the constant which can have its own
>>> JavaDoc explaining what's going on plus I can reuse it and quickly find
>>> all places of usage etc.
>> That's a better reason to factor it out.
>
> I forgot to add another point: regular expressions tend to grow large
> which makes methods which contain such a regexp string constant harder
> to read.
Right, I did concede that there are other great reasons to factor it 
out. Performance isn't the first one I would pick ;-)

>
> And then of course there is another difference: with the Pattern in a
> static variable you'll notice earlier (at class load time) if the
> pattern is ill formatted as opposed to using ad hoc compilation which
> comes to haunt you later on every method invocation.
Actually, I know even earlier. I know at edit time, as my IDE will 
highlight bad regex inside methods which take regex ;-)

Even so, it should be found at Unit Test time (which, granted, will be 
around the same time whether it's per method or per class-load).

Just a thought.


#13199

From	markspace <-@.>
Date	2012-03-26 13:49 -0700
Message-ID	<jkqkov$839$1@dont-email.me>
In reply to	#13190

On 3/26/2012 11:54 AM, laredotornado wrote:

>      Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results.  How would I write a Java parsing expression to do what I
> want?


What Lew said.

   String[] dates = dateString.split( ", +" );

   for( String date : dates ) {

     // Normalize case so "pm" and "PM" are treated alike.
     String temp = date.trim().toUpperCase();

     if( temp.endsWith( "PM" ) ) {
       System.out.println( "Good afternoon." );
     } else if( temp.endsWith( "AM" ) ) {
       System.out.println( "Good morning." );
     } else {
       System.out.println( "Good whatever." );
     }

   }


#13200

From	laredotornado@gmail.com
Date	2012-03-26 14:21 -0700
Message-ID	<9569964.403.1332796867513.JavaMail.geo-discussion-forums@ynne2>
In reply to	#13190

On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
> Hi,
> 
> I'm using Java 6.  I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results.  What I have are Strings like
> 
>     Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
> 
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
> the above example, I would like the results to be
> 
>     Fri 7:30 PM
>     Sat 2 PM
>     Sun 2:30 PM
> 
> But with String.split, the split token is not kept within the
> results.  How would I write a Java parsing expression to do what I
> want?
> 
> Thanks, - Dave

Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing

Fri 8 PM
Sat 1, 3, and 5 PM

Your continued help is appreciated, - Dave


#13202

From	markspace <-@.>
Date	2012-03-26 15:02 -0700
Message-ID	<jkqp1b$2qu$1@dont-email.me>
In reply to	#13200

On 3/26/2012 2:21 PM, laredotornado@gmail.com wrote:

> Hi, I don't want to split on the comma because there could be a case
> where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this
> case, I want the result to be a String array containing
>
> Fri 8 PM
> Sat 1, 3, and 5 PM

You might be able to do this with clever use of regex look-around:

http://www.regular-expressions.info/lookaround.html

Maybe something like "(?<=M),".  Definitely take some time to test that 
carefully though.

Otherwise, you'll have to write your own parser (which wouldn't be hard).
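
A possible look-behind version, offered as a sketch rather than a tested solution for all inputs. It assumes entries are separated by a comma that directly follows AM or PM (in any case); the regex is a variant of the one suggested above, and the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

final class LookbehindSplitSketch {
    /**
     * A comma (plus trailing whitespace) that is immediately preceded by
     * "am" or "pm". The look-behind is zero-width, so AM/PM is not consumed.
     */
    static final Pattern AFTER_AM_PM =
            Pattern.compile("(?<=[ap]m),\\s*", Pattern.CASE_INSENSITIVE);

    static String[] split(String input) {
        return AFTER_AM_PM.split(input);
    }

    public static void main(String[] args) {
        for (String s : split("Fri 8 PM, Sat 1, 3, and 5 PM")) {
            System.out.println(s);
        }
        // prints:
        // Fri 8 PM
        // Sat 1, 3, and 5 PM
    }
}
```

The commas after "1" and "3" are left alone because they are not preceded by AM or PM.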


#13203

From	Knute Johnson <nospam@knutejohnson.com>
Date	2012-03-26 15:56 -0700
Message-ID	<jkqs7e$jek$1@dont-email.me>
In reply to	#13200

On 3/26/2012 2:21 PM, laredotornado@gmail.com wrote:
> On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
>> Hi,
>>
>> I'm using Java 6.  I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results.  What I have are Strings like
>>
>>      Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
>> the above example, I would like the results to be
>>
>>      Fri 7:30 PM
>>      Sat 2 PM
>>      Sun 2:30 PM
>>
>> But with String.split, the split token is not kept within the
>> results.  How would I write a Java parsing expression to do what I
>> want?
>>
>> Thanks, - Dave
>
> Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
>
> Fri 8 PM
> Sat 1, 3, and 5 PM
>
> Your continued help is appreciated, - Dave

public class test {
     public static void main(String[] args) {
         String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
         String token = "PM, |PM";

         String[] strs = str.split(token);
         for (String s : strs)
             System.out.println(s+"PM");

     }
}

C:\Documents and Settings\Knute Johnson>java test
Fri 7:30 PM
Fri 8 PM
Sat 1, 3, and 5 PM

If you wanted to get AMs too, you could do a first pass for the PMs and 
then do it again for the AMs.

-- 

Knute Johnson


#13204

From	markspace <-@.>
Date	2012-03-26 16:02 -0700
Message-ID	<jkqsi1$m3l$1@dont-email.me>
In reply to	#13203

On 3/26/2012 3:56 PM, Knute Johnson wrote:

> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
...
> System.out.println(s+"PM");
                         ^^

What does this print if the "str" string ends with AM instead of PM?  I 
don't think this actually works....


#13208

From	Knute Johnson <nospam@knutejohnson.com>
Date	2012-03-26 17:33 -0700
Message-ID	<jkr1tf$iql$1@dont-email.me>
In reply to	#13204

On 3/26/2012 4:02 PM, markspace wrote:
> On 3/26/2012 3:56 PM, Knute Johnson wrote:
>
>> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
> ...
>> System.out.println(s+"PM");
> ^^
>
> What does this print if the "str" string ends with AM instead of PM? I
> don't think this actually works....
>

It won't.  He'll have to make a two-pass system if he's going to split 
on two different tokens.  I think I said that.

-- 

Knute Johnson


#13211

From	Martin Gregorie <martin@address-in-sig.invalid>
Date	2012-03-27 01:17 +0000
Message-ID	<jkr4f6$sf1$1@localhost.localdomain>
In reply to	#13208

On Mon, 26 Mar 2012 17:33:51 -0700, Knute Johnson wrote:

> On 3/26/2012 4:02 PM, markspace wrote:
>> On 3/26/2012 3:56 PM, Knute Johnson wrote:
>>
>>> String str = "Fri 7:30 PM, Fri 8 PM, Sat 1, 3, and 5 PM";
>> ...
>>> System.out.println(s+"PM");
>> ^^
>>
>> What does this print if the "str" string ends with AM instead of PM? I
>> don't think this actually works....
>>
>>
> It won't.  He'll have to make a two-pass system if he's going to split
> on two different tokens.  I think I said that

Then you'd need something like the following, semi-pseudo-coded as:

   String[] slist = in.split("PM, +|PM");
   for (int i = 0; i < slist.length; i++)
      slist[i] = slist[i].trim() + " PM";

   ArrayList<String> alist = new ArrayList<String>();
   for (String s : slist) {
      String[] sp = s.split("AM, +|AM");
      for (int j = 0; j < sp.length - 1; j++)
         alist.add(sp[j].trim() + " AM");
      alist.add(sp[sp.length - 1]);   // last piece already ends with PM
   }

   ...but it's ugly. I think it can be done in one pass using a regex with
   capture groups along the lines of

     "(.*)([AP]M ,|[AP]M)"

   If I got that right, each time expression that the OP needs to split
   out is represented by a pair of adjacent capture groups, so just a
   single pass along the array of capture groups concatenating adjacent
   pairs and applying trim() to each concatenated pair should do the
   trick.

   It's rather late here, so I'll leave this as an exercise for anybody
   who feels keen. If nobody has touched it by mid morning tomorrow I may
   see if it works.

-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


#13239

From	Martin Gregorie <martin@address-in-sig.invalid>
Date	2012-03-27 21:57 +0000
Message-ID	<jktd4e$kef$1@localhost.localdomain>
In reply to	#13211

On Tue, 27 Mar 2012 01:17:26 +0000, Martin Gregorie wrote:

>    It's rather late here, so I'll leave this as an exercise for anybody
>    who feels keen. If nobody has touched it by mid morning tomorrow I
>    may see if it works.
>
I put together the following this morning. Hopefully it's enough of an SSCCE 
to pass muster. 

As promised, I first implemented a two-pass splitter (the 'classico' 
method): it's ugly all right, even though it does the trick.

Then I swiped Stefan's code (the 'patternista' method), tweaked it 
slightly and used it to drive both his and my regexes. The only other 
change it needs is to parameterise Matcher.group(), because Stefan's regex 
uses the whole match (group 0) while mine only uses the first capture 
group in the pattern, which lets it discard the comma 
separators. This was one of my design aims: to output the exact same 
strings as the classico() method does.

==========================================================================
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitter
{
   public static ArrayList<String> classico(String in)
   {
      String[] sList = in.split("PM, +|PM");
      for (int i=0; i<sList.length; i++)
         sList[i] = sList[i].trim() + " PM";

      ArrayList<String> aList = new ArrayList<String>();
      for (String s : sList)
      {
         String sp[] = s.split("AM, +|AM");
         for (int j=0; j < sp.length - 1; j++)
            aList.add(sp[j].trim() + " AM");

         aList.add(sp[sp.length - 1]);  // The last element 
                                        // always ends with PM
      }

      return aList;
   }

   public static ArrayList<String> patternista(String p, int g, String in)
   {
      Pattern pattern = Pattern.compile(p, Pattern.CASE_INSENSITIVE);
      Matcher matcher = pattern.matcher(in);
      ArrayList<String> aList = new ArrayList<String>();
      while(matcher.find())
      {
         String s = matcher.group(g);
         aList.add(s.trim());
      }

      return aList;
   }

   public static void showResult(String source,
                                 String method,
                                 ArrayList<String> s)
   {
      System.out.println(String.format("\n'%s' ==> '%s'", 
                                       source, 
                                       method));
      for (int i = 0; i < s.size(); i++)
         System.out.println(String.format("%2d: %s", i, s.get(i)));
   }

   public static void main(String[] args)
   {
      String SOURCE = "Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM";
      String martin = "(.*?[AP]M),?";
      String stefan = ".*?(?:am|pm),?";
      
      ArrayList<String> s;
      s = classico(SOURCE);
      showResult(SOURCE, "classico", s);
      s = patternista(martin, 1, SOURCE);
      showResult(SOURCE, martin, s);
      s = patternista(stefan, 0, SOURCE);
      showResult(SOURCE, stefan, s);
   }
}
==========================================================================
'Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM' ==> 'classico'
 0: Fri 7:30 PM
 1: Sat 1, 3 and 5 AM
 2: Sun 2:30 PM

'Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM' ==> '(.*?[AP]M),?'
 0: Fri 7:30 PM
 1: Sat 1, 3 and 5 AM
 2: Sun 2:30 PM

'Fri 7:30 PM, Sat 1, 3 and 5 AM, Sun 2:30 PM' ==> '.*?(?:am|pm),?'
 0: Fri 7:30 PM,
 1: Sat 1, 3 and 5 AM,
 2: Sun 2:30 PM
==========================================================================

As you can see, once I'd swapped greedy matches for non-greedy in my regex 
(the second test run), both regexes do the job and to my mind use much more 
elegant code than the two-pass classico approach, but of course ymmv.  


-- 
martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |


#13213

From	Gene Wirchenko <genew@ocis.net>
Date	2012-03-26 18:26 -0700
Message-ID	<sf52n798nn2g3a5lg5te3vj7b4403iut65@4ax.com>
In reply to	#13200

On Mon, 26 Mar 2012 14:21:07 -0700 (PDT), laredotornado@gmail.com
wrote:

>On Monday, March 26, 2012 1:54:40 PM UTC-5, laredotornado wrote:
>> Hi,
>> 
>> I'm using Java 6.  I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results.  What I have are Strings like
>> 
>>     Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>> 
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
>> the above example, I would like the results to be
>> 
>>     Fri 7:30 PM
>>     Sat 2 PM
>>     Sun 2:30 PM
>> 
>> But with String.split, the split token is not kept within the
>> results.  How would I write a Java parsing expression to do what I
>> want?
>> 
>> Thanks, - Dave
>
>Hi, I don't want to split on the comma because there could be a case where the given String is "Fri 8 PM, Sat 1, 3, and 5 PM" and in this case, I want the result to be a String array containing
>
>Fri 8 PM
>Sat 1, 3, and 5 PM
>
>Your continued help is appreciated, - Dave

     What about "Sun 9, 11 AM, and 1 PM"?  Or "Sun 9 and 11 AM, and 1
and 3 PM"?

     I think you had better be quite sure of all of the variants.  For
that matter, people often omit the comma before "and" which would give
"Sun 9, 11 AM and 1 PM" for my first example.  Such people have
probably not seen
          http://www.outsidethebeltway.com/oxford-comma-cartoon/
or other such references.
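
To make the concern concrete, here is a rough sketch that feeds those
variants to the "(.*?[AP]M),?" regex posted earlier in the thread and flags
any piece that does not start with a day name. The class VariantProbe, its
methods, and the "every entry starts with a three-letter day" rule are
assumptions made for this illustration only; the real rules are exactly
what still has to be pinned down.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical probe class for trying out input variants.
class VariantProbe {

    // The reluctant regex from earlier in the thread.
    private static final Pattern PIECE =
        Pattern.compile("(.*?[AP]M),?", Pattern.CASE_INSENSITIVE);

    // Assumed rule: a complete entry begins with a three-letter day name.
    private static final Pattern DAY_PREFIX =
        Pattern.compile("(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\\b", Pattern.CASE_INSENSITIVE);

    // Split the input into pieces that each end in AM or PM.
    static List<String> pieces(String input) {
        List<String> out = new ArrayList<String>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    // True if the piece begins with a day name under the assumed rule.
    static boolean startsWithDay(String piece) {
        return DAY_PREFIX.matcher(piece).lookingAt();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Sun 9, 11 AM, and 1 PM",
            "Sun 9 and 11 AM, and 1 and 3 PM",
            "Sun 9, 11 AM and 1 PM"
        };
        for (String input : inputs) {
            System.out.println("'" + input + "'");
            for (String piece : pieces(input)) {
                String flag = startsWithDay(piece) ? "ok      " : "NO DAY  ";
                System.out.println("  " + flag + piece);
            }
        }
    }
}
```

With each of the three inputs the second piece comes back without a day
name ("and 1 PM" or "and 1 and 3 PM"), so splitting after every AM/PM is
not enough once one day carries both morning and afternoon times.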

Sincerely,

Gene Wirchenko


#13217

From	Lew <lewbloch@gmail.com>
Date	2012-03-26 19:07 -0700
Message-ID	<17975015.387.1332814029736.JavaMail.geo-discussion-forums@pbtd1>
In reply to	#13213

Gene Wirchenko wrote:
>     What about "Sun 9, 11 AM, and 1 PM"?  
> Or "Sun 9 and 11 AM, and 1 and 3 PM"?
> 
>      I think you had better be quite sure of all of the variants.  For
> that matter, people often omit the comma before "and" which would give
> "Sun 9, 11 AM and 1 PM" for my first example.  Such people have
> probably not seen
>           http://www.outsidethebeltway.com/oxford-comma-cartoon/
> or other such references.

The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.

Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.

It does require a careful, methodical approach.
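
As one possible shape for such a statement, here is a sketch that writes
the rules down as a small grammar and rejects anything that does not fit.
The grammar itself is an assumption invented for this example (an entry is
a three-letter day, one or more times joined by ",", "and" or ", and", then
AM or PM), as are the class and method names; the original poster would
have to supply the real rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical strict parser; the grammar below is an assumed example.
class ShowtimeParser {

    // time      := 1-2 digits, optionally ":" and 2 digits
    private static final String TIME = "\\d{1,2}(?::\\d{2})?";
    // separator := ", and " | ", " | " and "
    private static final String SEP = "(?:,\\s*and\\s+|,\\s*|\\s+and\\s+)";
    // entry     := day time (separator time)* AM|PM
    private static final String ENTRY =
        "(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\\s+" + TIME
        + "(?:" + SEP + TIME + ")*\\s*[AP]M";

    // One entry (group 1) followed by a comma or the end of the input.
    private static final Pattern NEXT_ENTRY =
        Pattern.compile("(" + ENTRY + ")(?:,\\s*|$)", Pattern.CASE_INSENSITIVE);

    // Returns the entries in order, or throws if any part of the input
    // does not fit the assumed grammar.
    static List<String> parse(String input) {
        List<String> entries = new ArrayList<String>();
        Matcher m = NEXT_ENTRY.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unparseable text at index " + pos + ": \""
                    + input.substring(pos) + "\"");
            }
            entries.add(m.group(1));
            pos = m.end();
        }
        return entries;
    }

    public static void main(String[] args) {
        System.out.println(parse("Fri 8 PM, Sat 1, 3, and 5 PM"));
        try {
            parse("Sun 9, 11 AM, and 1 PM");
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Throwing on the first piece that does not fit is only one choice for the
"fails quality checks" case; logging and skipping the bad entry would be
another, depending on where the data comes from.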

-- 
Lew


#13219

From	Knute Johnson <nospam@knutejohnson.com>
Date	2012-03-26 20:40 -0700
Message-ID	<jkrcr8$4pg$1@dont-email.me>
In reply to	#13217

On 3/26/2012 7:07 PM, Lew wrote:
> Gene Wirchenko wrote:
>>      What about "Sun 9, 11 AM, and 1 PM"?
>> Or "Sun 9 and 11 AM, and 1 and 3 PM"?
>>
>>       I think you had better be quite sure of all of the variants.  For
>> that matter, people often omit the comma before "and" which would give
>> "Sun 9, 11 AM and 1 PM" for my first example.  Such people have
>> probably not seen
>>            http://www.outsidethebeltway.com/oxford-comma-cartoon/
>> or other such references.
>
> The point is that you need a precise, perhaps formal statement of the exact rules to parse the input, and what to do when the input format fails quality checks.
>
> Parsing is a Dark Art in programming - not really the hardest of them, but worthy of close attention.
>
> It does require a careful, methodical approach.
>

You've been awfully poetic lately, Lew.

-- 

Knute Johnson
