

Groups > comp.lang.java.programmer > #13190 > unrolled thread

Keeping the split token in a Java regular expression

Started by: laredotornado <laredotornado@zipmail.com>
First post: 2012-03-26 11:54 -0700
Last post: 2012-03-28 07:51 +0200
Articles 20 on this page of 50 — 13 participants



Contents

  Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-26 11:54 -0700
    Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 12:22 -0700
      Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-26 22:01 +0200
        Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:46 -0400
          Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:01 +0200
            Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:18 -0400
            Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:21 -0700
              Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:38 +0200
                Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:24 -0700
    Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 13:49 -0700
    Re: Keeping the split token in a Java regular expression laredotornado@gmail.com - 2012-03-26 14:21 -0700
      Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 15:02 -0700
      Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 15:56 -0700
        Re: Keeping the split token in a Java regular expression markspace <-@.> - 2012-03-26 16:02 -0700
          Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:33 -0700
            Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 01:17 +0000
              Re: Keeping the split token in a Java regular expression Martin Gregorie <martin@address-in-sig.invalid> - 2012-03-27 21:57 +0000
      Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-26 18:26 -0700
        Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 19:07 -0700
          Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 20:40 -0700
            Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 09:10 -0700
              Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-27 11:09 -0700
                Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 13:32 -0700
                  Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 14:29 -0700
                    Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 16:22 -0700
                      Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 18:20 -0700
                        Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
                          Re: Keeping the split token in a Java regular expression Gene Wirchenko <genew@ocis.net> - 2012-03-27 21:31 -0700
                            Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:41 +0200
                              Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-28 10:28 -0700
    Re: Keeping the split token in a Java regular expression Lew <lewbloch@gmail.com> - 2012-03-26 16:26 -0700
      Re: Keeping the split token in a Java regular expression Knute Johnson <nospam@knutejohnson.com> - 2012-03-26 17:36 -0700
      Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-27 23:27 +0200
        Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:28 +0200
    Re: Keeping the split token in a Java regular expression "John B. Matthews" <nospam@nospam.invalid> - 2012-03-26 20:49 -0400
    Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-26 21:58 -0400
      Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-26 21:14 -0700
        Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 17:21 -0400
          Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 15:20 -0700
            Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 18:48 -0400
              Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 17:07 -0700
            Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 21:49 -0300
              Re: Keeping the split token in a Java regular expression Arne Vajhøj <arne@vajhoej.dk> - 2012-03-27 20:56 -0400
                Re: Keeping the split token in a Java regular expression Arved Sandstrom <asandstrom3minus1@eastlink.ca> - 2012-03-27 22:01 -0300
                  Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 18:27 -0700
    Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 08:15 -0600
      Re: Keeping the split token in a Java regular expression laredotornado <laredotornado@zipmail.com> - 2012-03-27 07:58 -0700
        Re: Keeping the split token in a Java regular expression Jim Janney <jjanney@shell.xmission.com> - 2012-03-27 09:21 -0600
          Re: Keeping the split token in a Java regular expression Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-03-27 09:43 -0700
            Re: Keeping the split token in a Java regular expression Robert Klemme <shortcutter@googlemail.com> - 2012-03-28 07:51 +0200

Page 2 of 3 — ← Prev page 1 [2] 3  Next page →


#13228

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-03-27 09:10 -0700
Message-ID: <pfp3n79hbjpqt1d7h3cbtbf8gggms38ud8@4ax.com>
In reply to: #13219
On Mon, 26 Mar 2012 20:40:24 -0700, Knute Johnson
<nospam@knutejohnson.com> wrote:

>On 3/26/2012 7:07 PM, Lew wrote:
>> Gene Wirchenko wrote:
>>>      What about "Sun 9, 11 AM, and 1 PM"?
>>> Or "Sun 9 and 11 AM, and 1 and 3 PM"?
>>>
>>>       I think you had better be quite sure of all of the variants.  For
>>> that matter, people often omit the comma before "and" which would give
>>> "Sun 9, 11 AM and 1 PM" for my first example.  Such people have
>>> probably not seen
>>>            http://www.outsidethebeltway.com/oxford-comma-cartoon/
>>> or other such references.
>>
>> The point is that you need a precise, perhaps formal statement of the
>> exact rules to parse the input, and what to do when the input format
>> fails quality checks.
>>
>> Parsing is a Dark Art in programming - not really the hardest of them,
>> but worthy of close attention.
>>
>> It does require a careful, methodical approach.

>You've been awfully poetic lately Lew.

     I prefer the "new" Lew.  He has dropped the antagonism that I
often saw, and it has made his posts much more readable and useful.

Sincerely,

Gene Wirchenko



#13230

From: Lew <lewbloch@gmail.com>
Date: 2012-03-27 11:09 -0700
Message-ID: <16745393.406.1332871796181.JavaMail.geo-discussion-forums@pbij6>
In reply to: #13228
Gene Wirchenko wrote:
>      I prefer the "new" Lew.  He has dropped the antagonism that I
> often saw, and it has made his posts much more readable and useful.

I give your preference all the consideration that it is due.

-- 
Lew



#13232

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-03-27 13:32 -0700
Message-ID: <9j84n7htui6ahhd7fd6e1rudl1cnuatfjr@4ax.com>
In reply to: #13230
On Tue, 27 Mar 2012 11:09:56 -0700 (PDT), Lew <lewbloch@gmail.com>
wrote:

>Gene Wirchenko wrote:
>>      I prefer the "new" Lew.  He has dropped the antagonism that I
>> often saw, and it has made his posts much more readable and useful.
>
>I give your preference all the consideration that it is due.

     As manners are a social lubricant and a fairly inexpensive one,
that would be quite a lot.  Thank you.  If you did not mean that,
consider meaning that.  You are quite knowledgeable, and without an
antagonistic curve, your posts are very good indeed.  This same
statement applies to many people posting on USENET.

     Call my preference the USENET Manners Project if you want.
Disagreeing is one thing; being disagreeable is quite another.
          http://xkcd.com/386/
is a good joke but a poor reality.

     I look forward to your next politely informative post, Lew.  Your
recent one clarifying a sentence of yours was very nice indeed.

Sincerely,

Gene Wirchenko



#13238

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-27 14:29 -0700
Message-ID: <2jqcr.27255$QC3.7246@newsfe16.iad>
In reply to: #13232
On 3/27/12 1:32 PM, Gene Wirchenko wrote:
> On Tue, 27 Mar 2012 11:09:56 -0700 (PDT), Lew<lewbloch@gmail.com>
> wrote:
>
>> Gene Wirchenko wrote:
>>>       I prefer the "new" Lew.  He has dropped the antagonism that I
>>> often saw, and it has made his posts much more readable and useful.
>>
>> I give your preference all the consideration that it is due.
>
>       As manners are a social lubricant and a fairly inexpensive one,
> that would be quite a lot.  Thank you.  If you did not mean that,
> consider meaning that.  You are quite knowledgeable, and without an
> antagonistic curve, your posts are very good indeed.  This same
> statement applies to many people posting on USENET.
At the same time, it is one's personal loss to ignore something because 
of who said it or how it was said.  Part of the problem is the jadedness 
that some of the old-timers on this group have, due to certain 
trolls-who-shall-not-be-named. Lew is a very analytical and structured 
person; arguing facts logically, with references, is more likely to 
persuade him than talking about feelings.  I'm very much the same way, 
though I have tried to include my understanding of psychology in my 
responses.

>       Call my preference the USENET Manners Project if you want.
> Disagreeing is one thing; being disagreeable is quite another.
>            http://xkcd.com/386/
> is a good joke but a poor reality.
>
>       I look forward to your next politely informative post, Lew.  Your
> recent one clarifying a sentence of yours was very nice indeed.

I just want to point out that while your intentions *may* be good, the 
tone of your message comes off just as smug as what you're attempting to 
decry. I'm not trying to stir up a flame war, but I'm hoping that you 
can see the other side of this as well.  Lew has been a long-time 
contributor to the Java newsgroups, and I have never found any of his 
posts personally distasteful in any way. This is the internet, and some 
slight thickness of skin is expected.

So, please, stop baiting each other, and keep these messages on topic.



#13242

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-03-27 16:22 -0700
Message-ID: <dpi4n7tdmt9pe7l9me3heubt1g8ii3v0nm@4ax.com>
In reply to: #13238
On Tue, 27 Mar 2012 14:29:33 -0700, Daniel Pitts
<newsgroup.nospam@virtualinfinity.net> wrote:

[snip]

>At the same time, it is one's personal loss to ignore something because 
>of who said it or how it was said.  Part of the problem is the jadedness 

     One must balance the loss of missing something with the loss of
spending time trying to uncurve a response.

[snip]

>I just want to point out that while your intentions *may* be good, the 
>tone of your message comes off just as smug as what you're attempting to 
>decry. I'm not trying to stir up a flame war, but I'm hoping that you 
>can see the other side of this as well.  Lew has been a long-time 
>contributor to the Java newsgroups, and I have never found any of his 
>posts personally distasteful in any way. This is the internet, and some 
>slight thickness of skin is expected.

     "slight".  And that does mean that being rude is good.

>So, please, stop baiting each other, and keep these messages on topic.

     I am not baiting him.  I like the polite Lew.  There is no reason
why people can not be polite on USENET.  They just have to decide to
do so.

Sincerely,

Gene Wirchenko



#13247

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-03-27 18:20 -0700
Message-ID: <0qp4n7pohramm6lrbvhvc08k0v5cj5lg7e@4ax.com>
In reply to: #13242
On Tue, 27 Mar 2012 16:22:29 -0700, Gene Wirchenko <genew@ocis.net>
wrote:

>On Tue, 27 Mar 2012 14:29:33 -0700, Daniel Pitts
><newsgroup.nospam@virtualinfinity.net> wrote:
>
>[snip]
>
>>At the same time, it is one's personal loss to ignore something because 
>>of who said it or how it was said.  Part of the problem is the jadedness 
>
>     One must balance the loss of missing something with the loss of
>spending time trying to uncurve a response.
>
>[snip]
>
>>I just want to point out that while your intentions *may* be good, the 
>>tone of your message comes off just as smug as what you're attempting to 
>>decry. I'm not trying to stir up a flame war, but I'm hoping that you 
>>can see the other side of this as well.  Lew has been a long-time 
>>contributor to the Java newsgroups, and I have never found any of his 
>>posts personally distasteful in any way. This is the internet, and some 
>>slight thickness of skin is expected.
>
>     "slight".  And that does mean that being rude is good.
                              ^
     I missed a "not" here.

>>So, please, stop baiting each other, and keep these messages on topic.
>
>     I am not baiting him.  I like the polite Lew.  There is no reason
>why people can not be polite on USENET.  They just have to decide to
>do so.

Sincerely,

Gene Wirchenko



#13249

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-27 18:27 -0700
Message-ID: <yOtcr.6782$V94.4319@newsfe19.iad>
In reply to: #13247
On 3/27/12 6:20 PM, Gene Wirchenko wrote:
> On Tue, 27 Mar 2012 16:22:29 -0700, Gene Wirchenko<genew@ocis.net>
> wrote:
>>      "slight".  And that does mean that being rude is good.
>                                ^
>       I missed a "not" here.
I had wondered ;-)



#13251

From: Gene Wirchenko <genew@ocis.net>
Date: 2012-03-27 21:31 -0700
Message-ID: <vv45n7dtnoq0k3u696eeed3sbrobig31nv@4ax.com>
In reply to: #13249
On Tue, 27 Mar 2012 18:27:58 -0700, Daniel Pitts
<newsgroup.nospam@virtualinfinity.net> wrote:

>On 3/27/12 6:20 PM, Gene Wirchenko wrote:
>> On Tue, 27 Mar 2012 16:22:29 -0700, Gene Wirchenko<genew@ocis.net>
>> wrote:
>>>      "slight".  And that does mean that being rude is good.
>>                                ^
>>       I missed a "not" here.
>I had wondered ;-)

     I have noted over the years, that if there is one word that
people will miss in posts, it is "not".

Sincerely,

Gene Wirchenko



#13254

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-03-28 07:41 +0200
Message-ID: <9tfmk0F5ooU2@mid.individual.net>
In reply to: #13251
On 03/28/2012 06:31 AM, Gene Wirchenko wrote:
> On Tue, 27 Mar 2012 18:27:58 -0700, Daniel Pitts
> <newsgroup.nospam@virtualinfinity.net>  wrote:
>
>> On 3/27/12 6:20 PM, Gene Wirchenko wrote:
>>> On Tue, 27 Mar 2012 16:22:29 -0700, Gene Wirchenko<genew@ocis.net>
>>> wrote:
>>>>       "slight".  And that does mean that being rude is good.
>>>                                 ^
>>>        I missed a "not" here.
>> I had wondered ;-)
>
>       I have noted over the years, that if there is one word that
> people will miss in posts, it is "not".

I don't remember the details but I once heard that people cannot 
remember "not" - seems to be a psychological thing or a "feature" of the 
mind.  You kind of focus on the main message and then you forget to 
store the negation as well.

Kind regards

	robert



#13257

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-28 10:28 -0700
Message-ID: <DSHcr.14713$532.10656@newsfe14.iad>
In reply to: #13254
On 3/27/12 10:41 PM, Robert Klemme wrote:
> On 03/28/2012 06:31 AM, Gene Wirchenko wrote:
>> On Tue, 27 Mar 2012 18:27:58 -0700, Daniel Pitts
>> <newsgroup.nospam@virtualinfinity.net> wrote:
>>
>>> On 3/27/12 6:20 PM, Gene Wirchenko wrote:
>>>> On Tue, 27 Mar 2012 16:22:29 -0700, Gene Wirchenko<genew@ocis.net>
>>>> wrote:
>>>>> "slight". And that does mean that being rude is good.
>>>> ^
>>>> I missed a "not" here.
>>> I had wondered ;-)
>>
>> I have noted over the years, that if there is one word that
>> people will miss in posts, it is "not".
>
> I don't remember the details but I once heard that people cannot
> remember "not" - seems to be a psychological thing or a "feature" of the
> mind. You kind of focus on the main message and then you forget to store
> the negation as well.
I wonder if this is really a true phenomenon, or even if it is frequent 
enough to be worth contorting your point to avoid negating the text of it.

If there is any chance that your point will be pulled out of context, 
(such as with dubious reporters), then you may want to choose your words 
in such a way that the "not" isn't elided.

However, in day-to-day conversation, I think some concepts are so 
much easier to convey as what they are not, instead of what they are.



#13206

From: Lew <lewbloch@gmail.com>
Date: 2012-03-26 16:26 -0700
Message-ID: <21500379.296.1332804401740.JavaMail.geo-discussion-forums@pbbpk10>
In reply to: #13190
Stefan Ram wrote:
> laredotornado writes:
>>What I would like to do is split the expression wherever I have an
> 
> public class Main
> {
>   public static void split
>   ( final java.lang.String text )
>   { java.util.regex.Pattern pattern = 
>     java.util.regex.Pattern.compile
>     ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
>     java.util.regex.Matcher matcher = pattern.matcher( text );
>     while( matcher.find() )
>     java.lang.System.out.println( matcher.group( 0 )); }
> 
>   public static void main( final java.lang.String[] args )
>   { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}

This excellent (except for layout) example deserves to be archived.

-- 
Lew



#13209

From: Knute Johnson <nospam@knutejohnson.com>
Date: 2012-03-26 17:36 -0700
Message-ID: <jkr21q$iql$2@dont-email.me>
In reply to: #13206
On 3/26/2012 4:26 PM, Lew wrote:
> Stefan Ram wrote:
>> laredotornado writes:
>>> What I would like to do is split the expression wherever I have an
>>
>> public class Main
>> {
>>    public static void split
>>    ( final java.lang.String text )
>>    { java.util.regex.Pattern pattern =
>>      java.util.regex.Pattern.compile
>>      ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
>>      java.util.regex.Matcher matcher = pattern.matcher( text );
>>      while( matcher.find() )
>>      java.lang.System.out.println( matcher.group( 0 )); }
>>
>>    public static void main( final java.lang.String[] args )
>>    { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
>
> This excellent (except for layout) example deserves to be archived.
>

I like that too.  I tried it but I didn't get this.

-- 

Knute Johnson



#13237

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-03-27 23:27 +0200
Message-ID: <9tepmvFhjvU1@mid.individual.net>
In reply to: #13206
On 03/27/2012 01:26 AM, Lew wrote:
> Stefan Ram wrote:
>> laredotornado writes:
>>> What I would like to do is split the expression wherever I have an
>>
>> public class Main
>> {
>>    public static void split
>>    ( final java.lang.String text )
>>    { java.util.regex.Pattern pattern =
>>      java.util.regex.Pattern.compile
>>      ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
>>      java.util.regex.Matcher matcher = pattern.matcher( text );
>>      while( matcher.find() )
>>      java.lang.System.out.println( matcher.group( 0 )); }
>>
>>    public static void main( final java.lang.String[] args )
>>    { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
>
> This excellent (except for layout) example deserves to be archived.

What do you find excellent about this?  I find it has some deficiencies:

  - the separator is included in the match (which goes against the 
requirements despite the thread subject)
  - spaces after a separator comma are included in the next token as 
leading text
  - the method really does more than splitting (namely printing), so the 
name does not reflect what's going on
  - the Pattern is compiled on _every_ invocation of the method
  - the method is unnecessarily restricted; argument type CharSequence is 
sufficient

Test output for
"Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM"
"Fri 8 PM, Sat 1, 3, and 5 PM"

Fri 7:30 PM,
  Sat 2 PM,
  Sun 2:30 PM
---
Fri 8 PM,
  Sat 1, 3, and 5 PM
---

I would change that to

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
	private static final Pattern SPLIT_PATTERN = Pattern.compile(
			"(\\S.*?(?:am|pm))(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

	public static void splitPrint(final CharSequence text) {
		for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
			System.out.println(m.group(1));
		}
	}

	public static List<String> split(final CharSequence text) {
		final List<String> result = new ArrayList<String>();

		for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
			result.add(m.group(1));
		}

		return result;
	}

	public static void main(final java.lang.String[] args) {
		splitPrint("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM");
		System.out.println("---");
		splitPrint("Fri 8 PM, Sat 1, 3, and 5 PM");
		System.out.println("---");
	}
}

I might even sneak a "\\s*" in between "pm)" and "(?:," to even catch 
cases where there are spaces before the separator.

Kind regards

	robert



#13252

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-03-28 07:28 +0200
Message-ID: <9tflrdF259U1@mid.individual.net>
In reply to: #13237
On 03/27/2012 11:27 PM, Robert Klemme wrote:
> On 03/27/2012 01:26 AM, Lew wrote:
>> Stefan Ram wrote:
>>> laredotornado writes:
>>>> What I would like to do is split the expression wherever I have an
>>>
>>> public class Main
...
>>
>> This excellent (except for layout) example deserves to be archived.
>
> What do you find excellent about this? I find it has some deficiencies:
>
> - the separator is included in the match (which goes against the
> requirements despite the thread subject)
> - spaces after a separator comma are included in the next token as
> leading text
> - the method really does more than splitting (namely printing), so the
> name does not reflect what's going on
> - the Pattern is compiled on _every_ invocation of the method
> - the method is unnecessarily restricted; argument type CharSequence is
> sufficient
>
> Test output for
> "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM"
> "Fri 8 PM, Sat 1, 3, and 5 PM"
>
> Fri 7:30 PM,
> Sat 2 PM,
> Sun 2:30 PM
> ---
> Fri 8 PM,
> Sat 1, 3, and 5 PM
> ---
>
> I would change that to

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
     private static final Pattern SPLIT_PATTERN = Pattern.compile(
             "(\\S.*?[ap]m)(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

     public static void splitPrint(final CharSequence text) {
         for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
             System.out.println(m.group(1));
         }
     }

     public static List<String> split(final CharSequence text) {
         final List<String> result = new ArrayList<String>();

         for (final Matcher m = SPLIT_PATTERN.matcher(text); m.find();) {
             result.add(m.group(1));
         }

         return result;
     }

     public static void main(final java.lang.String[] args) {
         splitPrint("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM");
         System.out.println("---");
         splitPrint("Fri 8 PM, Sat 1, 3, and 5 PM");
         System.out.println("---");
     }
}

I had overlooked a fairly obvious improvement with regard to am/pm parsing.

> I might even sneak a "\\s*" in between "pm)" and "(?:," to even catch
> cases where there are spaces before the separator.

Kind regards

	robert
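
The whitespace tweak quoted above can be sketched roughly as follows. This is only an illustration, assuming the extra "\\s*" goes directly after the capturing group; the class name SplitKeepSketch and the irregularly spaced sample input are made up here and are not from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitKeepSketch {
    // Same shape as the posted pattern, plus "\\s*" after the group so that
    // blanks in front of the separator comma are consumed but not captured.
    private static final Pattern SPLIT_PATTERN = Pattern.compile(
            "(\\S.*?[ap]m)\\s*(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    static List<String> split(final CharSequence text) {
        final List<String> result = new ArrayList<String>();
        final Matcher m = SPLIT_PATTERN.matcher(text);
        while (m.find()) {
            result.add(m.group(1)); // group 1 excludes the separator
        }
        return result;
    }

    public static void main(final String[] args) {
        // Hypothetical input with a blank before one comma and none after another.
        for (final String token : split("Fri 7:30 PM , Sat 2 PM,Sun 2:30 PM")) {
            System.out.println(token);
        }
    }
}
```

With this input the three tokens come out as "Fri 7:30 PM", "Sat 2 PM" and "Sun 2:30 PM". The lazy ".*?" still stops at the first "am" or "pm" letter pair it meets, so a token containing a word such as "Sam" would be cut short; the sketch does not try to address that.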



#13210

From: "John B. Matthews" <nospam@nospam.invalid>
Date: 2012-03-26 20:49 -0400
Message-ID: <nospam-884B21.20492426032012@news.aioe.org>
In reply to: #13190
In article 
<48d35bc3-a391-4ccf-a222-dac64775a2f2@oq7g2000pbb.googlegroups.com>,
 laredotornado <laredotornado@zipmail.com> wrote:

> I'm using Java 6.  I want to split a Java string on a regular 
> expression, but I would like to keep part of the string used to split 
> in the results.  What I have are Strings like
> 
>     Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
> 
> What I would like to do is split the expression wherever I have an 
> expression matching /(am|pm),?/i .  Hopefully I got that right.  In 
> the above example, I would like the results to be
> 
>     Fri 7:30 PM
>     Sat 2 PM
>     Sun 2:30 PM
> 
> But with String.split, the split token is not kept within the 
> results.  How would I write a Java parsing expression to do what I 
> want?

Instead of split, why not parse and format?

-- 
John B. Matthews
trashgod at gmail dot com
<http://sites.google.com/site/drjohnbmatthews>
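
One possible reading of "parse and format" is sketched below; the post itself gives no code, so everything here is an assumption: the grammar (three-letter day, hour, optional minutes, AM or PM), the normalized output layout, and the names ParseAndFormatSketch and parseAndFormat are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParseAndFormatSketch {
    // Assumed grammar: day name, hour, optional ":minutes", then AM or PM.
    private static final Pattern SHOWTIME = Pattern.compile(
            "\\b(Mon|Tue|Wed|Thu|Fri|Sat|Sun)\\s+(\\d{1,2})(?::(\\d{2}))?\\s*([ap]m)\\b",
            Pattern.CASE_INSENSITIVE);

    static List<String> parseAndFormat(final CharSequence text) {
        final List<String> result = new ArrayList<String>();
        final Matcher m = SHOWTIME.matcher(text);
        while (m.find()) {
            final String day = m.group(1);
            final int hour = Integer.parseInt(m.group(2));
            // Group 3 is null when the minutes were omitted, as in "Sat 2 PM".
            final int minute = (m.group(3) == null) ? 0 : Integer.parseInt(m.group(3));
            final String meridiem = m.group(4).toUpperCase(Locale.ROOT);
            // Re-emit every entry in one normalized layout.
            result.add(String.format(Locale.ROOT, "%s %d:%02d %s", day, hour, minute, meridiem));
        }
        return result;
    }

    public static void main(final String[] args) {
        for (final String entry : parseAndFormat("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println(entry);
        }
    }
}
```

For the sample string this yields "Fri 7:30 PM", "Sat 2:00 PM" and "Sun 2:30 PM". Input that does not fit the assumed grammar, such as "Sat 1, 3, and 5 PM", is silently skipped by this sketch; a real parser would need to report or handle it.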



#13215

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-03-26 21:58 -0400
Message-ID: <4f711ee1$0$294$14726298@news.sunsite.dk>
In reply to: #13190
On 3/26/2012 2:54 PM, laredotornado wrote:
> I'm using Java 6.  I want to split a Java string on a regular
> expression, but I would like to keep part of the string used to split
> in the results.  What I have are Strings like
>
>      Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>
> What I would like to do is split the expression wherever I have an
> expression matching /(am|pm),?/i .  Hopefully I got that right.  In
> the above example, I would like the results to be
>
>      Fri 7:30 PM
>      Sat 2 PM
>      Sun 2:30 PM
>
> But with String.split, the split token is not kept within the
> results.  How would I write a Java parsing expression to do what I
> want?

A hackish solution:

String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");

Arne
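
To see what the one-liner above actually returns, here is a small sketch that runs it next to a lookbehind-based split. The lookbehind variant is not from the thread; it is an alternative added here for comparison, and the class and method names are made up.

```java
import java.util.Arrays;

final class SplitHackSketch {
    // The posted hack: duplicate each AM/PM around a marker "X",
    // then split on the marker plus the second copy.
    static String[] hack(final String s) {
        return s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
    }

    // Assumed alternative: split on the comma (and following blanks) only
    // where it is preceded by AM or PM, so the AM/PM stays in the token.
    static String[] lookbehind(final String s) {
        return s.split("(?<=[AP]M),\\s*");
    }

    public static void main(final String[] args) {
        final String s = "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM";
        System.out.println(Arrays.toString(hack(s)));
        System.out.println(Arrays.toString(lookbehind(s)));
    }
}
```

The hack keeps "PM" in every element, but the second and third elements start with ", " because the comma is never part of the split expression. The lookbehind variant consumes the comma and the blanks after it. Both are case-sensitive as written.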



#13220

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-26 21:14 -0700
Message-ID: <K8bcr.41913$%P4.6823@newsfe05.iad>
In reply to: #13215
On 3/26/12 6:58 PM, Arne Vajhøj wrote:
> On 3/26/2012 2:54 PM, laredotornado wrote:
>> I'm using Java 6. I want to split a Java string on a regular
>> expression, but I would like to keep part of the string used to split
>> in the results. What I have are Strings like
>>
>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>
>> What I would like to do is split the expression wherever I have an
>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>> the above example, I would like the results to be
>>
>> Fri 7:30 PM
>> Sat 2 PM
>> Sun 2:30 PM
>>
>> But with String.split, the split token is not kept within the
>> results. How would I write a Java parsing expression to do what I
>> want?
>
> A hackish solution:
>
> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>
> Arne
>
Nice. As far as hackish, using "split" for this purpose at all is 
hackish. Stefan Ram had the right algorithm (though strange formatting)

Stefan Ram wrote:
> public class Main
> {
>   public static void split
>   ( final java.lang.String text )
>   { java.util.regex.Pattern pattern =
>     java.util.regex.Pattern.compile
>     ( ".*?(?:am|pm),?", java.util.regex.Pattern.CASE_INSENSITIVE );
>     java.util.regex.Matcher matcher = pattern.matcher( text );
>     while( matcher.find() )
>     java.lang.System.out.println( matcher.group( 0 )); }
>
>   public static void main( final java.lang.String[] args )
>   { split( "Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM" ); }}
>



#13235

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-03-27 17:21 -0400
Message-ID: <4f722f41$0$290$14726298@news.sunsite.dk>
In reply to: #13220
On 3/27/2012 12:14 AM, Daniel Pitts wrote:
> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>> I'm using Java 6. I want to split a Java string on a regular
>>> expression, but I would like to keep part of the string used to split
>>> in the results. What I have are Strings like
>>>
>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>
>>> What I would like to do is split the expression wherever I have an
>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>> the above example, I would like the results to be
>>>
>>> Fri 7:30 PM
>>> Sat 2 PM
>>> Sun 2:30 PM
>>>
>>> But with String.split, the split token is not kept within the
>>> results. How would I write a Java parsing expression to do what I
>>> want?
>>
>> A hackish solution:
>>
>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
 >
> Nice. As far as hackish, using "split" for this purpose at all is
> hackish.

That type of split is the typical way in most modern languages
(though usually in a non regex flavor).

Arne



#13240

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-03-27 15:20 -0700
Message-ID: <y2rcr.42984$%P4.35732@newsfe05.iad>
In reply to: #13235
On 3/27/12 2:21 PM, Arne Vajhøj wrote:
> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>> I'm using Java 6. I want to split a Java string on a regular
>>>> expression, but I would like to keep part of the string used to split
>>>> in the results. What I have are Strings like
>>>>
>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>
>>>> What I would like to do is split the expression wherever I have an
>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>> the above example, I would like the results to be
>>>>
>>>> Fri 7:30 PM
>>>> Sat 2 PM
>>>> Sun 2:30 PM
>>>>
>>>> But with String.split, the split token is not kept within the
>>>> results. How would I write a Java parsing expression to do what I
>>>> want?
>>>
>>> A hackish solution:
>>>
>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>  >
>> Nice. As far as hackish, using "split" for this purpose at all is
>> hackish.
>
> That type of split is the typical way in most modern languages
> (though usually in a non regex flavor).
For functional languages, yes, but those modern languages don't 
necessarily return an array.  Ideally they would return an "iterable" of 
some sort.

And in any case, this particular problem is not a "split" kind of 
problem, but a "parse" kind of problem. So, split for this is hackish.
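
As an illustration of the "iterable" idea in Java, the tokens could be produced on demand from a Matcher instead of being collected into an array. This is a sketch only: the class name LazyTokens is invented, and the pattern is borrowed from Robert Klemme's post elsewhere in the thread rather than from this one.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LazyTokens implements Iterable<String> {
    private static final Pattern TOKEN = Pattern.compile(
            "(\\S.*?[ap]m)(?:,\\s*)?", Pattern.CASE_INSENSITIVE);

    private final CharSequence text;

    LazyTokens(final CharSequence text) {
        this.text = text;
    }

    public Iterator<String> iterator() {
        final Matcher m = TOKEN.matcher(text);
        return new Iterator<String>() {
            // Look one token ahead so hasNext() can answer without side effects.
            private String pending = advance();

            private String advance() {
                return m.find() ? m.group(1) : null;
            }

            public boolean hasNext() {
                return pending != null;
            }

            public String next() {
                if (pending == null) {
                    throw new NoSuchElementException();
                }
                final String current = pending;
                pending = advance(); // the next match is searched only now
                return current;
            }

            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }

    public static void main(final String[] args) {
        for (final String token : new LazyTokens("Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM")) {
            System.out.println(token);
        }
    }
}
```

Each call to iterator() starts a fresh Matcher, so the same LazyTokens object can be iterated more than once.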



#13241

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-03-27 18:48 -0400
Message-ID: <4f7243cf$0$289$14726298@news.sunsite.dk>
In reply to: #13240
On 3/27/2012 6:20 PM, Daniel Pitts wrote:
> On 3/27/12 2:21 PM, Arne Vajhøj wrote:
>> On 3/27/2012 12:14 AM, Daniel Pitts wrote:
>>> On 3/26/12 6:58 PM, Arne Vajhøj wrote:
>>>> On 3/26/2012 2:54 PM, laredotornado wrote:
>>>>> I'm using Java 6. I want to split a Java string on a regular
>>>>> expression, but I would like to keep part of the string used to split
>>>>> in the results. What I have are Strings like
>>>>>
>>>>> Fri 7:30 PM, Sat 2 PM, Sun 2:30 PM
>>>>>
>>>>> What I would like to do is split the expression wherever I have an
>>>>> expression matching /(am|pm),?/i . Hopefully I got that right. In
>>>>> the above example, I would like the results to be
>>>>>
>>>>> Fri 7:30 PM
>>>>> Sat 2 PM
>>>>> Sun 2:30 PM
>>>>>
>>>>> But with String.split, the split token is not kept within the
>>>>> results. How would I write a Java parsing expression to do what I
>>>>> want?
>>>>
>>>> A hackish solution:
>>>>
>>>> String[] p = s.replaceAll("[AP]M", "$0X$0").split("X[AP]M");
>> >
>>> Nice. As far as hackish, using "split" for this purpose at all is
>>> hackish.
>>
>> That type of split is the typical way in most modern languages
>> (though usually in a non regex flavor).
> For functional languages, yes, but those modern languages don't
> necessarily return an array. Ideally they would return an "iterable" of
> some sort.

.NET String.Split returns string[] (non-regex)
.NET Regex.Split returns string[] (regex)
PHP split returns array (regex)
PHP explode returns array (non-regex)
PHP preg_split returns array (regex)

> And in any case, this particular problem is not a "split" kind of
> problem, but a "parse" kind of problem. So, split for this is hackish.

I think it would be rather common in practice.

Arne


