

Newsgroup: comp.lang.python, thread #94692

Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter?

Started by: Victor Hooi <victorhooi@gmail.com>
First post: 2015-07-28 06:55 -0700
Last post: 2015-07-28 21:28 -0400
Articles: 7, participants: 6



Contents

  Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? Victor Hooi <victorhooi@gmail.com> - 2015-07-28 06:55 -0700
    Re: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? m <mvoicem@gmail.com> - 2015-07-28 15:59 +0200
      Re: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? Victor Hooi <victorhooi@gmail.com> - 2015-07-28 07:09 -0700
        Re: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? MRAB <python@mrabarnett.plus.com> - 2015-07-28 15:30 +0100
    Re: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? Chris Angelico <rosuav@gmail.com> - 2015-07-29 10:08 +1000
      Re: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? Rustom Mody <rustompmody@gmail.com> - 2015-07-28 19:41 -0700
    Re: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter? Joel Goldstick <joel.goldstick@gmail.com> - 2015-07-28 21:28 -0400

#94692: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter?

From: Victor Hooi <victorhooi@gmail.com>
Date: 2015-07-28 06:55 -0700
Subject: Split on multiple delimiters, and also treat consecutive delimiters as a single delimiter?
Message-ID: <fed7bab5-db18-45c3-9ba2-4b7fbfa80602@googlegroups.com>

I have a line that looks like this:

    14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26

I'd like to split this line on multiple separators - in this case, consecutive whitespace, as well as the pipe symbol (|).

If I run .split() on the line, it will split on consecutive whitespace:

In [17]: f.split()
Out[17]:
['14',
 '*0',
 '330',
 '*0',
 '760',
 '411|0',
 '0',
 '770g',
 '1544g',
 '117g',
 '1414',
 'computedshopcartdb:103.5%',
 '0',
 '30|0',
 '0|1',
 '19m',
 '97m',
 '1538',
 'ComputedCartRS',
 'PRI',
 '09:40:26']
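A minimal sketch of why the plain `.split()` call collapses the spacing: with no argument, `str.split()` treats any run of whitespace as one separator and discards leading and trailing whitespace, whereas an explicit `' '` separator is matched one space at a time. The sample string is a shortened, hypothetical excerpt of the line above.

```python
# Shortened excerpt in the same shape as the line above (hypothetical sample)
line = "    14     *0   411|0       0"

# No argument: runs of whitespace collapse and the leading spaces are ignored
collapsed = line.split()
print(collapsed)  # ['14', '*0', '411|0', '0']

# Explicit ' ': each single space is a separator, so runs yield empty strings
literal = line.split(' ')
print(literal[:5])  # ['', '', '', '', '14']
```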

If I try to run .split(' |'), however, I get:

In [18]: f.split(' |')
Out[18]: ['    14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26']
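The `.split(' |')` result follows from how `str.split` treats its argument: the whole string `' |'` is one literal separator (a space immediately followed by a pipe), not a set of alternatives. That pair never occurs in the line, so nothing is split. A small sketch with illustrative strings:

```python
# Shortened excerpt of the line (hypothetical sample)
line = "    14     *0   411|0       0"

# ' |' means "a space followed by a pipe"; that pair does not appear in line
unchanged = line.split(' |')
print(unchanged == [line])  # True

# The split happens only where that exact two-character pair occurs
pair = "a |b| c".split(' |')
print(pair)  # ['a', 'b| c']
```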

I know the re module also has a split function; unfortunately, it does not collapse consecutive whitespace:

In [19]: re.split(' |', f)
Out[19]:
['',
 '',
 '',
 '',
 '14',
 '',
 '',
 '',
 '',
 '*0',
 '',
 '',
 '',
 '330',
 '',
 '',
 '',
 '',
 '*0',
 '',
 '',
 '',
 '',
 '760',
 '',
 '',
 '411|0',
 '',
 '',
 '',
 '',
 '',
 '',
 '0',
 '',
 '',
 '770g',
 '',
 '1544g',
 '',
 '',
 '117g',
 '',
 '',
 '1414',
 'computedshopcartdb:103.5%',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '',
 '0',
 '',
 '',
 '',
 '',
 '',
 '30|0',
 '',
 '',
 '',
 '',
 '0|1',
 '',
 '',
 '',
 '19m',
 '',
 '',
 '',
 '97m',
 '',
 '1538',
 'ComputedCartRS',
 '',
 'PRI',
 '',
 '',
 '09:40:26']

Is there an easy way to split on multiple characters, and also treat consecutive delimiters as a single delimiter?
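A note on the `re.split(' |', f)` attempt: in a regular expression `|` is the alternation operator, so `' |'` means "a single space or the empty string" and never matches the pipe character itself, while every single space is still its own split point. Escaping the pipe fixes the first problem but not the second, as this sketch on an illustrative string shows. The `Out[19]` transcript also reflects Python versions before 3.7; since 3.7, `re.split` additionally splits on empty matches, so the same call gives a different result today.

```python
import re

sample = "a  b|c"  # illustrative: two spaces, then a pipe

# r' |\|' = "a space OR a literal pipe"; the backslash stops '|' acting as alternation
escaped = re.split(r' |\|', sample)
print(escaped)  # ['a', '', 'b', 'c']: the pipe now splits, but the double space still leaves ''
```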



#94693

From: m <mvoicem@gmail.com>
Date: 2015-07-28 15:59 +0200
Message-ID: <55b78aa3$0$2206$65785112@news.neostrada.pl>
In reply to: #94692

On 28.07.2015 at 15:55, Victor Hooi wrote:
> I know the regex library also has a split, unfortunately, that does not collapse consecutive whitespace:
> 
> In [19]: re.split(' |', f)

Try ' *\|'

p. m.



#94696

From: Victor Hooi <victorhooi@gmail.com>
Date: 2015-07-28 07:09 -0700
Message-ID: <d0e1a44a-6619-4cb6-937a-ce962cc10094@googlegroups.com>
In reply to: #94693

On Tuesday, 28 July 2015 23:59:11 UTC+10, m wrote:
> On 28.07.2015 at 15:55, Victor Hooi wrote:
> > I know the regex library also has a split, unfortunately, that does not collapse consecutive whitespace:
> > 
> > In [19]: re.split(' |', f)
> 
> Try ' *\|'
> 
> p. m.

Hmm, that seems to be getting closer (it returns a four-element list):

In [23]: re.split(' *\|', f)
Out[23]:
['    14     *0    330     *0     760   411',
 '0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30',
 '0     0',
 '1    19m    97m  1538 ComputedCartRS  PRI   09:40:26']
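Why `' *\|'` gives exactly four pieces: the pattern means "zero or more spaces followed by a literal pipe", so every match must contain a pipe and ordinary whitespace is never a split point. The line has three pipes, hence four pieces. A sketch on a shortened, hypothetical excerpt; the raw string `r' *\|'` is used because a plain `'\|'` literal may trigger an invalid-escape-sequence warning on newer Python versions.

```python
import re

# Shortened excerpt containing three pipes (hypothetical sample)
line = "    14     *0   411|0       0   30|0     0|1    19m"

# Each match is optional spaces plus one pipe, so only the pipes are split points
pieces = re.split(r' *\|', line)
print(len(pieces))        # 4
print(repr(pieces[0]))    # '    14     *0   411' (whitespace inside each piece is untouched)
```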



#94697

From: MRAB <python@mrabarnett.plus.com>
Date: 2015-07-28 15:30 +0100
Message-ID: <mailman.1049.1438093834.3674.python-list@python.org>
In reply to: #94696

On 2015-07-28 15:09, Victor Hooi wrote:
> On Tuesday, 28 July 2015 23:59:11 UTC+10, m wrote:
>> On 28.07.2015 at 15:55, Victor Hooi wrote:
>> > I know the regex library also has a split, unfortunately, that does not collapse consecutive whitespace:
>> >
>> > In [19]: re.split(' |', f)
>>
>> Try ' *\|'
>>
>> p. m.
>
> Hmm, that seems to be getting closer (it returns a four-element list):
>
> In [23]: re.split(' *\|', f)
> Out[23]:
> ['    14     *0    330     *0     760   411',
>   '0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30',
>   '0     0',
>   '1    19m    97m  1538 ComputedCartRS  PRI   09:40:26']
>
Try '[ |]+'.
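A sketch of MRAB's pattern on a shortened, hypothetical excerpt: `[ |]+` is a character class (inside the brackets `|` is literal) repeated one or more times, so any run of spaces and pipes counts as a single separator. One caveat worth checking: Victor's line starts with spaces, so splitting it unstripped produces an empty first field; calling `.strip()` first avoids that. The `split_on` helper is a hypothetical generalization to arbitrary delimiter characters, not something proposed in the thread.

```python
import re

# Shortened excerpt with leading spaces, like the original line (hypothetical sample)
line = "    14     *0   411|0       0   30|0     0|1    19m"

# Without stripping, the leading spaces produce an empty first field
raw = re.split(r'[ |]+', line)
print(raw[0] == '')  # True

# Strip first; r'[\s|]+' would be needed if the data could also contain tabs
fields = re.split(r'[ |]+', line.strip())
print(fields)  # ['14', '*0', '411', '0', '0', '30', '0', '0', '1', '19m']


def split_on(text, delimiters):
    """Hypothetical helper: split text on any run of the given delimiter characters."""
    # re.escape keeps characters such as '|', ']', '^' or '-' literal inside the class
    pattern = '[' + re.escape(delimiters) + ']+'
    # Filtering empty pieces also covers delimiters at either end of text
    return [piece for piece in re.split(pattern, text) if piece]


print(split_on(line, ' |') == fields)  # True
print(split_on("a-b]]c", "-]"))        # ['a', 'b', 'c']
```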



#94713

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-07-29 10:08 +1000
Message-ID: <mailman.1055.1438132534.3674.python-list@python.org>
In reply to: #94692

On Tue, Jul 28, 2015 at 11:55 PM, Victor Hooi <victorhooi@gmail.com> wrote:
> I have a line that looks like this:
>
>     14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26
>
> I'd like to split this line on multiple separators - in this case, consecutive whitespace, as well as the pipe symbol (|).

Correct me if I'm misanalyzing this, but it sounds to me like a simple
transform-then-split would do the job:

f.replace("|"," ").split()

Turn those pipe characters into spaces, then split on whitespace. Or,
reading it differently: Declare that pipe is another form of
whitespace, then split on whitespace. Python lets you declare anything
you like, same as mathematics does :)

ChrisA
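Chris's transform-then-split approach, sketched on a shortened, hypothetical excerpt. Because the no-argument `split()` also drops leading and trailing whitespace, no empty fields appear. If several extra delimiter characters were needed, `str.maketrans` and `str.translate` can map them all to spaces in one pass; the extra `,` and `;` delimiters below are illustrative assumptions, not part of Victor's data.

```python
# Shortened excerpt of the original line (hypothetical sample)
line = "    14     *0   411|0       0   30|0     0|1    19m"

# Turn every pipe into a space, then let split() collapse runs of whitespace
fields = line.replace("|", " ").split()
print(fields)  # ['14', '*0', '411', '0', '0', '30', '0', '0', '1', '19m']

# Same idea for several delimiters at once: map each one to a space
to_space = str.maketrans({"|": " ", ",": " ", ";": " "})
multi = "a,b;;c|d".translate(to_space).split()
print(multi)  # ['a', 'b', 'c', 'd']
```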



#94716

From: Rustom Mody <rustompmody@gmail.com>
Date: 2015-07-28 19:41 -0700
Message-ID: <153c2ad1-a5c7-4ac0-86a4-0e62cd2a92cd@googlegroups.com>
In reply to: #94713

On Wednesday, July 29, 2015 at 6:45:45 AM UTC+5:30, Chris Angelico wrote:
> On Tue, Jul 28, 2015 at 11:55 PM, Victor Hooi wrote:
> > I have a line that looks like this:
> >
> >     14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26
> >
> > I'd like to split this line on multiple separators - in this case, consecutive whitespace, as well as the pipe symbol (|).
> 
> Correct me if I'm misanalyzing this, but it sounds to me like a simple
> transform-then-split would do the job:
> 
> f.replace("|"," ").split()
> 
> Turn those pipe characters into spaces, then split on whitespace. Or,
> reading it differently: Declare that pipe is another form of
> whitespace, then split on whitespace. Python lets you declare anything
> you like, same as mathematics does :)

I don't see how anything can beat MRAB's in declarativeness, neatness and succinctness:


>>> from re import split
>>> s = "14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26"
>>> split('[ |]+', s)
['14', '*0', '330', '*0', '760', '411', '0', '0', '770g', '1544g', '117g', '1414', 'computedshopcartdb:103.5%', '0', '30', '0', '0', '1', '19m', '97m', '1538', 'ComputedCartRS', 'PRI', '09:40:26']

And if you don't mind two steps, here is another that looks longer but is more straightforward (IMHO, of course):
>>> [z for y in s.split() for z in y.split('|')]
['14', '*0', '330', '*0', '760', '411', '0', '0', '770g', '1544g', '117g', '1414', 'computedshopcartdb:103.5%', '0', '30', '0', '0', '1', '19m', '97m', '1538', 'ComputedCartRS', 'PRI', '09:40:26']
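Rustom's nested comprehension splits on whitespace first and then on pipes inside each token. One behavioural difference, shown here on illustrative strings: when a pipe sits next to a space, the whitespace token keeps that pipe at its edge and yields an empty field, whereas MRAB's regex and Chris's replace-then-split both treat the run as a single separator. In Victor's line every pipe sits between two non-space characters, so all three approaches agree on it.

```python
import re

# A pipe between two non-space characters: all approaches agree
s = "411|0 0|1"
nested = [z for y in s.split() for z in y.split('|')]
print(nested)  # ['411', '0', '0', '1']

# A pipe next to a space (illustrative input): 'a|' is one token, which splits into ['a', '']
t = "a| b"
nested_t = [z for y in t.split() for z in y.split('|')]
regex_t = re.split(r'[ |]+', t)
replace_t = t.replace('|', ' ').split()
print(nested_t)   # ['a', '', 'b']
print(regex_t)    # ['a', 'b']
print(replace_t)  # ['a', 'b']
```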



#94715

From: Joel Goldstick <joel.goldstick@gmail.com>
Date: 2015-07-28 21:28 -0400
Message-ID: <mailman.1056.1438137388.3674.python-list@python.org>
In reply to: #94692

+1 Chris

On Tue, Jul 28, 2015 at 8:08 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Jul 28, 2015 at 11:55 PM, Victor Hooi <victorhooi@gmail.com> wrote:
>> I have a line that looks like this:
>>
>>     14     *0    330     *0     760   411|0       0   770g  1544g   117g   1414 computedshopcartdb:103.5%          0      30|0     0|1    19m    97m  1538 ComputedCartRS  PRI   09:40:26
>>
>> I'd like to split this line on multiple separators - in this case, consecutive whitespace, as well as the pipe symbol (|).
>
> Correct me if I'm misanalyzing this, but it sounds to me like a simple
> transform-then-split would do the job:
>
> f.replace("|"," ").split()
>
> Turn those pipe characters into spaces, then split on whitespace. Or,
> reading it differently: Declare that pipe is another form of
> whitespace, then split on whitespace. Python lets you declare anything
> you like, same as mathematics does :)
>
> ChrisA



-- 
Joel Goldstick
http://joelgoldstick.com




