| Started by | Nick Mellor <thebalancepro@gmail.com> |
|---|---|
| First post | 2012-12-04 05:57 -0800 |
| Last post | 2012-12-06 13:29 -0800 |
| Articles | 20 on this page of 38 — 12 participants |
Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-04 05:57 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-04 14:23 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-04 06:47 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-04 15:17 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-04 15:31 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-04 07:24 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-04 22:08 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-04 07:24 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-04 18:26 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Alexander Blinne <news@blinne.net> - 2012-12-04 18:18 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile DJC <djc@news.invalid> - 2012-12-04 18:28 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Alexander Blinne <news@blinne.net> - 2012-12-04 19:48 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-04 12:37 -0700
Re: Good use for itertools.dropwhile and itertools.takewhile Alexander Blinne <news@blinne.net> - 2012-12-04 21:33 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-04 21:13 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile MRAB <python@mrabarnett.plus.com> - 2012-12-04 20:17 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Terry Reedy <tjreedy@udel.edu> - 2012-12-04 15:44 -0500
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-04 17:17 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Chris Angelico <rosuav@gmail.com> - 2012-12-06 00:45 +1100
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-05 14:34 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-05 08:33 -0700
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-05 16:11 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-12-05 15:32 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Ian Kelly <ian.g.kelly@gmail.com> - 2012-12-05 09:16 -0700
Re: Good use for itertools.dropwhile and itertools.takewhile MRAB <python@mrabarnett.plus.com> - 2012-12-05 17:57 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-04 17:17 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-05 13:29 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-05 09:04 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile MRAB <python@mrabarnett.plus.com> - 2012-12-05 17:57 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-05 18:16 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Nick Mellor <thebalancepro@gmail.com> - 2012-12-05 11:01 -0800
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-05 20:13 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-05 22:36 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Neil Cerutti <neilc@norwich.edu> - 2012-12-06 13:06 +0000
Re: Good use for itertools.dropwhile and itertools.takewhile Vlastimil Brom <vlastimil.brom@gmail.com> - 2012-12-06 15:12 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Alexander Blinne <news@blinne.net> - 2012-12-06 14:40 +0100
Re: Good use for itertools.dropwhile and itertools.takewhile Terry Reedy <tjreedy@udel.edu> - 2012-12-04 17:21 -0500
Re: Good use for itertools.dropwhile and itertools.takewhile Paul Rubin <no.email@nospam.invalid> - 2012-12-06 13:29 -0800
| From | Nick Mellor <thebalancepro@gmail.com> |
|---|---|
| Date | 2012-12-04 05:57 -0800 |
| Subject | Good use for itertools.dropwhile and itertools.takewhile |
| Message-ID | <b80f3ab3-ef81-4806-86db-efd5800d4bb3@googlegroups.com> |
Hi,
I came across itertools.dropwhile only today, then shortly afterwards found Raymond Hettinger wondering, in 2007, whether to drop [sic] dropwhile and takewhile from the itertools module.
Fate of itertools.dropwhile() and itertools.takewhile() - Python
bytes.com
http://bit.ly/Vi2PqP
Almost nobody else of the 18 respondents seemed to be using them.
And then 2 hours later, a use case came along. I think. Anyone have any better solutions?
I have a file full of things like this:
"CAPSICUM RED fresh from Queensland"
Product names (all caps, at start of string) and descriptions (mixed case, to end of string) all muddled up in the same field. And I need to split them into two fields. Note that if the text had said:
"CAPSICUM RED fresh from QLD"
I would want QLD in the description, not shunted forwards and put in the product name. So (uncontrived) list comprehensions and regex's are out.
I want to split the above into:
("CAPSICUM RED", "fresh from QLD")
Enter dropwhile and takewhile. 6 lines later:
from itertools import takewhile, dropwhile
def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)
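For what it's worth, here is a small sketch of how the function above behaves on the awkward inputs (no description, runs of spaces, a capitalised first word in the description). The sample strings are made up for illustration and are not from the real file.

```python
from itertools import takewhile, dropwhile

def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    # words is a list, so takewhile and dropwhile each iterate it independently
    product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)

# Illustrative inputs only (assumed, not taken from the original data)
samples = [
    "CAPSICUM RED fresh from QLD",
    "CAPSICUM RED",                           # no description at all
    "CAPSICUM   RED  Fresh from Queensland",  # extra spaces, capitalised first word
]
results = [split_product_itertools(s) for s in samples]
for pair in results:
    print(pair)
```

Note that s.split() collapses runs of white space, so the original spacing is not preserved in either field.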
When I tried to refactor this code to use while or for loops, I couldn't find any way that felt shorter or more pythonic:
(9 lines: using for)
def split_product_1(s):
    words = s.split()
    product = []
    for word in words:
        if word == word.upper():
            product.append(word)
        else:
            break
    return " ".join(product), " ".join(words[len(product):])
(12 lines: using while)
def split_product_2(s):
    words = s.split()
    i = 0
    product = []
    while 1:
        word = words[i]
        if word == word.upper():
            product.append(word)
            i += 1
        else:
            break
    return " ".join(product), " ".join(words[i:])
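One caveat on the while version: as written, words[i] runs off the end of the list when every word is upper case, that is, when there is no description. A guarded variant might look like the sketch below; the name split_product_2_guarded is invented here for illustration.

```python
def split_product_2_guarded(s):
    words = s.split()
    i = 0
    # Stop at the first word that is not all caps, or at the end of the list
    while i < len(words) and words[i] == words[i].upper():
        i += 1
    return " ".join(words[:i]), " ".join(words[i:])
```

Slicing with words[:i] also removes the need for the separate product list.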
Any thoughts?
Nick
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2012-12-04 14:23 +0000 |
| Message-ID | <ai6fb6Fk3vkU7@mid.individual.net> |
| In reply to | #34226 |
On 2012-12-04, Nick Mellor <thebalancepro@gmail.com> wrote:
> I have a file full of things like this:
>
> "CAPSICUM RED fresh from Queensland"
>
> Product names (all caps, at start of string) and descriptions
> (mixed case, to end of string) all muddled up in the same
> field. And I need to split them into two fields. Note that if
> the text had said:
>
> "CAPSICUM RED fresh from QLD"
>
> I would want QLD in the description, not shunted forwards and
> put in the product name. So (uncontrived) list comprehensions
> and regex's are out.
>
> I want to split the above into:
>
> ("CAPSICUM RED", "fresh from QLD")
>
> Enter dropwhile and takewhile. 6 lines later:
>
> from itertools import takewhile, dropwhile
> def split_product_itertools(s):
>     words = s.split()
>     allcaps = lambda word: word == word.upper()
>     product, description = takewhile(allcaps, words), dropwhile(allcaps, words)
>     return " ".join(product), " ".join(description)
>
> When I tried to refactor this code to use while or for loops, I
> couldn't find any way that felt shorter or more pythonic:
I'm really tempted to import re, and that means takewhile and
dropwhile need to stay. ;)
But seriously, this is a quick implementation of my first thought.
import string
description = s.lstrip(string.ascii_uppercase + ' ')
product = s[:-len(description)-1]
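To make the behaviour concrete: str.lstrip takes a set of characters rather than a prefix, so it keeps stripping for as long as it sees capitals or spaces. A small sketch, where the wrapper name split_lstrip and the sample strings are only for illustration:

```python
import string

def split_lstrip(s):
    # lstrip removes any leading run of the given characters
    description = s.lstrip(string.ascii_uppercase + ' ')
    product = s[:-len(description)-1]
    return product, description

lower_first = split_lstrip("CAPSICUM RED fresh from Queensland")
upper_first = split_lstrip("CAPSICUM RED Fresh from Queensland")
print(lower_first)
print(upper_first)  # the leading "F" of "Fresh" is stripped as well
```

So the two-liner is only safe when the description starts with a lower-case letter.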
--
Neil Cerutti
| From | Nick Mellor <thebalancepro@gmail.com> |
|---|---|
| Date | 2012-12-04 06:47 -0800 |
| Message-ID | <2152911e-50a0-42aa-8956-5eb96803c7c1@googlegroups.com> |
| In reply to | #34228 |
Hi Neil,
Nice! But fails if the first word of the description starts with a capital letter.
Nick
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2012-12-04 15:17 +0000 |
| Message-ID | <ai6ifcFlm5qU1@mid.individual.net> |
| In reply to | #34232 |
On 2012-12-04, Nick Mellor <thebalancepro@gmail.com> wrote:
> Hi Neil,
>
> Nice! But fails if the first word of the description starts
> with a capital letter.
Darn edge cases.
--
Neil Cerutti
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
|---|---|
| Date | 2012-12-04 15:31 +0100 |
| Message-ID | <mailman.461.1354631511.29569.python-list@python.org> |
| In reply to | #34226 |
Hi,
the regex approach doesn't actually seem to be very complex, given the
mentioned specification, e.g.
>>> import re
>>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED fresh from QLD\nCAPSICUM RED fresh from Queensland")
[('CAPSICUM RED', 'fresh from QLD'), ('CAPSICUM RED', 'fresh from Queensland')]
>>>
(It might be necessary to account for some punctuation, whitespace etc. too.)
hth,
vbr
| From | Nick Mellor <thebalancepro@gmail.com> |
|---|---|
| Date | 2012-12-04 07:24 -0800 |
| Message-ID | <d06616b9-20f8-4390-ac28-1ad0e49ee018@googlegroups.com> |
| In reply to | #34229 |
I love the way you guys can write a line of code that does the same as 20 of mine :)
I can turn up the heat on your regex by feeding it a null description or multiple white space (both in the original file.) I'm sure you'd adjust, but at the cost of a more complex regex.
Meanwhile takewhile and dropwhile are behaving themselves impeccably but my while loop has fallen over.
Best,
Nick
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
|---|---|
| Date | 2012-12-04 22:08 +0100 |
| Message-ID | <mailman.477.1354655318.29569.python-list@python.org> |
| In reply to | #34235 |
2012/12/4 Nick Mellor <thebalancepro@gmail.com>:
> I love the way you guys can write a line of code that does the same as 20 of mine :)
> I can turn up the heat on your regex by feeding it a null description or multiple white space (both in the original file.) I'm sure you'd adjust, but at the cost of a more complex regex.
> Meanwhile takewhile and dropwhile are behaving themselves impeccably but my while loop has fallen over.
>
> Best,
> Nick
>> [...]
> --
Hi,
well, for what it's worth, both cases could be addressed quite
easily, with little added complexity - e.g.: make the description part
optional, allow multiple whitespace and enforce word boundary after
the product name in order to get rid of the trailing whitespace in it:
>>> re.findall(r"(?m)^([A-Z\s]+\b)(?:\s+(.*))?$", "CAPSICUM RED fresh from QLD\nCAPSICUM RED fresh from Queensland\nCAPSICUM RED")
[('CAPSICUM RED', 'fresh from QLD'), ('CAPSICUM RED', 'fresh from Queensland'), ('CAPSICUM RED', '')]
>>>
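If it helps, the same pattern can be wrapped in a function that handles one string at a time. This is only a sketch: the names PRODUCT_RE and split_product_re are invented here, and note that a match object returns None for an optional group that did not take part, where findall gives an empty string, hence the `or ""`.

```python
import re

# Same pattern as above, without (?m) since one line is matched at a time
PRODUCT_RE = re.compile(r"^([A-Z\s]+\b)(?:\s+(.*))?$")

def split_product_re(s):
    m = PRODUCT_RE.match(s)
    if m is None:
        # e.g. the string does not start with an upper-case word
        return "", s
    return m.group(1), m.group(2) or ""
```

Unlike the str.split() based versions, this keeps any repeated spaces inside the product name and the description as they were.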
However, it's certainly preferable to use a solution you are more
comfortable with, e.g. the itertools one...
regards,
vbr
| From | Nick Mellor <thebalancepro@gmail.com> |
|---|---|
| Date | 2012-12-04 07:24 -0800 |
| Message-ID | <mailman.465.1354635214.29569.python-list@python.org> |
| In reply to | #34229 |
I love the way you guys can write a line of code that does the same as 20 of mine :)
I can turn up the heat on your regex by feeding it a null description or multiple white space (both in the original file.) I'm sure you'd adjust, but at the cost of a more complex regex.
Meanwhile takewhile and dropwhile are behaving themselves impeccably but my while loop has fallen over.
Best,
Nick
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2012-12-04 18:26 +0000 |
| Message-ID | <ai6thvFo9u6U1@mid.individual.net> |
| In reply to | #34237 |
On 2012-12-04, Nick Mellor <thebalancepro@gmail.com> wrote:
> I love the way you guys can write a line of code that does the
> same as 20 of mine :)
>
> I can turn up the heat on your regex by feeding it a null
> description or multiple white space (both in the original
> file.) I'm sure you'd adjust, but at the cost of a more complex
> regex.
A re.split should be able to handle this without too much hassle.
The simplicity of my two-line version will evaporate pretty
quickly to compensate for edge cases.
Here's one that can handle one of the edge cases you mention, but
it's hardly any shorter than what you had, and it doesn't
preserve non-standard white space, like double spaces.
def prod_desc(s):
    """split s into product name and product description. Product
    name is a series of one or more capitalized words followed
    by white space. Everything after the trailing white space is
    the product description.
    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']
    """
    prod = []
    desc = []
    target = prod
    for word in s.split():
        if target is prod and not word.isupper():
            target = desc
        target.append(word)
    return [' '.join(prod), ' '.join(desc)]
When str methods fail I'll usually write my own parser before
turning to re. The following is no longer nice looking at all.
def prod_desc(s):
    """split s into product name and product description. Product
    name is a series of one or more capitalized words followed
    by white space. Everything after the trailing white space is
    the product description.
    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']
    >>> prod_desc("MR. JONESEY Saskatchewan's finest")
    ['MR. JONESEY', "Saskatchewan's finest"]
    """
    i = 0
    while not s[i].islower():
        i += 1
    i -= 1
    while not s[i].isspace():
        i -= 1
    start_desc = i+1
    while s[i].isspace():
        i -= 1
    end_prod = i+1
    return [s[:end_prod], s[start_desc:]]
--
Neil Cerutti
| From | Alexander Blinne <news@blinne.net> |
|---|---|
| Date | 2012-12-04 18:18 +0100 |
| Message-ID | <50be3049$0$9517$9b4e6d93@newsspool1.arcor-online.net> |
| In reply to | #34226 |
Another neat solution with a little help from
http://stackoverflow.com/questions/1701211/python-return-the-index-of-the-first-element-of-a-list-which-makes-a-passed-fun
>>> def split_product(p):
...     w = p.split(" ")
...     j = (i for i,v in enumerate(w) if v.upper() != v).next()
...     return " ".join(w[:j]), " ".join(w[j:])
Greetings
| From | DJC <djc@news.invalid> |
|---|---|
| Date | 2012-12-04 18:28 +0000 |
| Message-ID | <k9lfd0$evp$1@dont-email.me> |
| In reply to | #34240 |
On 04/12/12 17:18, Alexander Blinne wrote:
> Another neat solution with a little help from
>
> http://stackoverflow.com/questions/1701211/python-return-the-index-of-the-first-element-of-a-list-which-makes-a-passed-fun
>
> >>> def split_product(p):
> ...     w = p.split(" ")
> ...     j = (i for i,v in enumerate(w) if v.upper() != v).next()
> ...     return " ".join(w[:j]), " ".join(w[j:])
>
Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> w1 = "CAPSICUM RED Fresh from Queensland"
>>> w1.split()
['CAPSICUM', 'RED', 'Fresh', 'from', 'Queensland']
>>> w = w1.split()
>>> (i for i,v in enumerate(w) if v.upper() != v)
<generator object <genexpr> at 0x18b1910>
>>> (i for i,v in enumerate(w) if v.upper() != v).next()
2
Python 3.2.3 (default, Oct 19 2012, 19:53:16)
>>> (i for i,v in enumerate(w) if v.upper() != v).next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'generator' object has no attribute 'next'
| From | Alexander Blinne <news@blinne.net> |
|---|---|
| Date | 2012-12-04 19:48 +0100 |
| Message-ID | <50be4566$0$9507$9b4e6d93@newsspool1.arcor-online.net> |
| In reply to | #34244 |
Am 04.12.2012 19:28, schrieb DJC:
> >>> (i for i,v in enumerate(w) if v.upper() != v).next()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'generator' object has no attribute 'next'
Yeah, I saw this problem right after I sent the posting. It is now
supposed to read like this:
>>> def split_product(p):
...     w = p.split(" ")
...     j = next(i for i,v in enumerate(w) if v.upper() != v)
...     return " ".join(w[:j]), " ".join(w[j:])
Greetings
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2012-12-04 12:37 -0700 |
| Message-ID | <mailman.470.1354649891.29569.python-list@python.org> |
| In reply to | #34245 |
On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne <news@blinne.net> wrote:
> Am 04.12.2012 19:28, schrieb DJC:
> > >>> (i for i,v in enumerate(w) if v.upper() != v).next()
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > AttributeError: 'generator' object has no attribute 'next'
>
> Yeah, i saw this problem right after i sent the posting. It now is
> supposed to read like this
>
> >>> def split_product(p):
> ...     w = p.split(" ")
> ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
> ...     return " ".join(w[:j]), " ".join(w[j:])
>
It still fails if the product description is empty.
>>> split_product("CAPSICUM RED")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in split_product
StopIteration
I'm not meaning to pick on you; some of the other solutions in this thread
also fail in that case.
>>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED")
[('CAPSICUM', 'RED')]
>>> prod_desc("CAPSICUM RED") # the second version from Neil's post
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 14, in prod_desc
IndexError: string index out of range
| From | Alexander Blinne <news@blinne.net> |
|---|---|
| Date | 2012-12-04 21:33 +0100 |
| Message-ID | <50be5e30$0$9512$9b4e6d93@newsspool1.arcor-online.net> |
| In reply to | #34248 |
Am 04.12.2012 20:37, schrieb Ian Kelly:
> >>> def split_product(p):
> ...     w = p.split(" ")
> ...     j = next(i for i,v in enumerate(w) if v.upper() != v)
> ...     return " ".join(w[:j]), " ".join(w[j:])
>
>
> It still fails if the product description is empty.
That's true... let's see, next() takes a default value in case the
iterator is empty and then we could use some special value and test for
it. But I think it would be more elegant to just handle the exception
ourselves, so:
>>> def split_product(p):
...     w = p.split(" ")
...     try:
...         j = next(i for i,v in enumerate(w) if v.upper() != v)
...     except StopIteration:
...         return p, ''
...     return " ".join(w[:j]), " ".join(w[j:])
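For comparison, the default-value variant mentioned above might look like the following sketch (split_product_default is a made-up name). Using len(w) as the default means the slices already do the right thing, so no separate sentinel test is needed:

```python
def split_product_default(p):
    w = p.split(" ")
    # If every word is upper case, fall back to len(w) so that w[j:] is empty
    j = next((i for i, v in enumerate(w) if v.upper() != v), len(w))
    return " ".join(w[:j]), " ".join(w[j:])
```

The generator expression needs its own parentheses here because next() is being called with a second argument.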
> I'm not meaning to pick on you; some of the other solutions in this
> thread also fail in that case.
It's ok, keeping an eye out for edge cases is always a good idea :)
Greetings
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2012-12-04 21:13 +0000 |
| Message-ID | <50be675c$0$29994$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #34248 |
Ian,
For the sanity of those of us reading this via Usenet using the Pan
newsreader, could you please turn off HTML emailing? It's very
distracting.
Thanks,
Steven
On Tue, 04 Dec 2012 12:37:38 -0700, Ian Kelly wrote:
[...]
> <div class="gmail_quote">On Tue,
> Dec 4, 2012 at 11:48 AM, Alexander Blinne <span dir="ltr"><<a
> href="mailto:news@blinne.net"
> target="_blank">news@blinne.net</a>></span> wrote:<br><blockquote
> class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc
> solid;padding-left:1ex">
>
> Am 04.12.2012 19:28, schrieb DJC:<br> <div class="im">>>>>
> (i for i,v in enumerate(w) if v.upper() != v).next()<br>
> Traceback
> (most recent call last):<br>
> File "<stdin>", line
> 1, in <module><br>
> AttributeError: 'generator' object
> has no attribute 'next'<br> <br>
> </div>Yeah, i saw this problem right after i sent the posting. It now
> is<br> supposed to read like this<br>
> <div class="im"><br>
> >>> def split_product(p):<br> ... w = p.split("
> ")<br> </div>... j = next(i for i,v in enumerate(w) if
> v.upper() != v)<br> <div class="im">... return "
> ".join(w[:j]), "
> ".join(w[j:])<br></div></blockquote></div><br>It still fails if the
> product description is empty.<br><br>>>>
> split_product("CAPSICUM RED")<br>
>
> Traceback (most recent call last):<br> File "<stdin>",
> line 1, in <module><br> File "<stdin>", line 3,
> in split_product<br>StopIteration<br><br>I'm not meaning to pick on
> you; some of the other solutions in this thread also fail in that
> case.<br>
>
> <br>>>> re.findall(r"(?m)^([A-Z\s]+) (.+)$",
> "CAPSICUM RED")<br>[('CAPSICUM',
> 'RED')]<br><br>>>> prod_desc("CAPSICUM RED")
> # the second version from Neil's post<br>
>
> Traceback (most recent call last):<br> File "<stdin>",
> line 1, in <module><br> File "<stdin>", line 14,
> in prod_desc<br>IndexError: string index out of range<br><br>
--
Steven
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-12-04 20:17 +0000 |
| Message-ID | <mailman.473.1354652248.29569.python-list@python.org> |
| In reply to | #34245 |
On 2012-12-04 19:37, Ian Kelly wrote:
> On Tue, Dec 4, 2012 at 11:48 AM, Alexander Blinne <news@blinne.net
> <mailto:news@blinne.net>> wrote:
>
> Am 04.12.2012 19:28, schrieb DJC:
> > >>> (i for i,v in enumerate(w) if v.upper() != v).next()
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in <module>
> > AttributeError: 'generator' object has no attribute 'next'
>
> Yeah, i saw this problem right after i sent the posting. It now is
> supposed to read like this
>
> >>> def split_product(p):
> ... w = p.split(" ")
> ... j = next(i for i,v in enumerate(w) if v.upper() != v)
> ... return " ".join(w[:j]), " ".join(w[j:])
>
>
> It still fails if the product description is empty.
>
> >>> split_product("CAPSICUM RED")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 3, in split_product
> StopIteration
>
> I'm not meaning to pick on you; some of the other solutions in this
> thread also fail in that case.
>
> >>> re.findall(r"(?m)^([A-Z\s]+) (.+)$", "CAPSICUM RED")
> [('CAPSICUM', 'RED')]
>
That's easily fixed:
>>> re.findall(r"(?m)^([A-Z\s]+)(?: (.*))?$", "CAPSICUM RED")
[('CAPSICUM RED', '')]
> >>> prod_desc("CAPSICUM RED") # the second version from Neil's post
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 14, in prod_desc
> IndexError: string index out of range
>
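A minimal sketch for checking the revised pattern against more than one input. The pattern is copied from the reply above; the wrapper name `split_product_regex` and the extra test strings are assumptions added for illustration. It suggests the optional description group handles the empty-description case, while names containing punctuation (such as "MR.") produce no match at all because the character class only admits A-Z and whitespace.

```python
import re

# Pattern quoted from the reply above; the description group is optional.
PATTERN = r"(?m)^([A-Z\s]+)(?: (.*))?$"

def split_product_regex(s):
    """Hypothetical wrapper: return findall's list of (product, description) tuples."""
    return re.findall(PATTERN, s)

# Empty description: the whole string lands in the product group.
print(split_product_regex("CAPSICUM RED"))
# Mixed-case description: the greedy class backtracks to the last space
# before the first lowercase word, so "QLD" stays in the description.
print(split_product_regex("CAPSICUM RED fresh from QLD"))
# Punctuation in the product name is outside [A-Z\s], so nothing matches.
print(split_product_regex("MR. JONESEY Saskatchewan's finest"))
```

If punctuation or digits can occur in product names, the character class would need widening; that is a separate design decision not covered by the thread so far.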
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2012-12-04 15:44 -0500 |
| Message-ID | <mailman.474.1354653865.29569.python-list@python.org> |
| In reply to | #34226 |
On 12/4/2012 8:57 AM, Nick Mellor wrote:
> I have a file full of things like this:
>
> "CAPSICUM RED fresh from Queensland"
>
> Product names (all caps, at start of string) and descriptions (mixed
> case, to end of string) all muddled up in the same field. And I need
> to split them into two fields. Note that if the text had said:
>
> "CAPSICUM RED fresh from QLD"
>
> I would want QLD in the description, not shunted forwards and put in
> the product name. So (uncontrived) list comprehensions and regex's
> are out.
>
> I want to split the above into:
>
> ("CAPSICUM RED", "fresh from QLD")
>
> Enter dropwhile and takewhile. 6 lines later:
>
> from itertools import takewhile, dropwhile
> def split_product_itertools(s):
>     words = s.split()
>     allcaps = lambda word: word == word.upper()
>     product, description =\
>         takewhile(allcaps, words), dropwhile(allcaps, words)
>     return " ".join(product), " ".join(description)
If the original string has no excess whitespace, description is what
remains of s after product prefix is omitted. (Py 3 code)
from itertools import takewhile
def allcaps(word): return word == word.upper()
def split_product_itertools(s):
    product = ' '.join(takewhile(allcaps, s.split()))
    return product, s[len(product)+1:]
print(split_product_itertools("CAPSICUM RED fresh from QLD"))
>>>
('CAPSICUM RED', 'fresh from QLD')
Without that assumption, the same idea applies to the split list.
def split_product_itertools(s):
    words = s.split()
    product = list(takewhile(allcaps, words))
    return ' '.join(product), ' '.join(words[len(product):])
--
Terry Jan Reedy
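A short sketch comparing the two versions above on inputs that break the "no excess whitespace" assumption. The function names `split_by_slice` and `split_by_words` are hypothetical labels for the first and second version respectively, and the test strings are made up for illustration.

```python
from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_by_slice(s):
    # First version: slice the original string after the product prefix.
    product = ' '.join(takewhile(allcaps, s.split()))
    return product, s[len(product)+1:]

def split_by_words(s):
    # Second version: work on the split word list throughout.
    words = s.split()
    product = list(takewhile(allcaps, words))
    return ' '.join(product), ' '.join(words[len(product):])

doubled = "CAPSICUM RED  fresh from QLD"   # two spaces after RED
print(split_by_slice(doubled))   # description keeps a leading space
print(split_by_words(doubled))   # whitespace is normalised

no_product = "no product name?"
print(split_by_slice(no_product))  # the +1 skips the first character
print(split_by_words(no_product))  # description is returned intact
```

The slicing version is only safe when every string starts with a product name followed by exactly one space; the word-list version trades away the original spacing for robustness.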
| From | Nick Mellor <thebalancepro@gmail.com> |
|---|---|
| Date | 2012-12-04 17:17 -0800 |
| Message-ID | <05bca175-2077-4fb8-917e-baee1a43a47d@googlegroups.com> |
| In reply to | #34252 |
Hi Terry,
For my money, and especially in your versions, despite several expert solutions using other features, itertools has it. It seems to me to need less nutting out than the other approaches. It's short, robust, has a minimum of symbols, uses simple expressions and is not overly clever. If we could just get used to using takewhile.
takewhile mines for gold at the start of a sequence, dropwhile drops the dross at the start of a sequence.
Thanks all for your interest and your help,
Best,
Nick
Terry's implementations:
> from itertools import takewhile
> def allcaps(word): return word == word.upper()
>
> def split_product_itertools(s):
>     product = ' '.join(takewhile(allcaps, s.split()))
>     return product, s[len(product)+1:]
>
> print(split_product_itertools("CAPSICUM RED fresh from QLD"))
> >>>
> ('CAPSICUM RED', 'fresh from QLD')
>
> [if there could be surplus whitespace], the same idea applies to the split list.
>
> def split_product_itertools(s):
>     words = s.split()
>     product = list(takewhile(allcaps, words))
>     return ' '.join(product), ' '.join(words[len(product):])
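The "gold at the start, dross at the start" description can be seen directly in a small sketch. The sample string is the one from the original question; `allcaps` follows the definition used earlier in the thread. Both functions look only at the leading run, which is why a later all-caps word such as "QLD" is left alone.

```python
from itertools import dropwhile, takewhile

def allcaps(word):
    return word == word.upper()

words = "CAPSICUM RED fresh from QLD".split()

# takewhile yields items until the predicate first fails, then stops.
print(list(takewhile(allcaps, words)))   # ['CAPSICUM', 'RED']

# dropwhile skips items until the predicate first fails, then yields
# everything that follows without testing it again.
print(list(dropwhile(allcaps, words)))   # ['fresh', 'from', 'QLD']
```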
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2012-12-06 00:45 +1100 |
| Message-ID | <mailman.490.1354715109.29569.python-list@python.org> |
| In reply to | #34266 |
On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor <thebalancepro@gmail.com> wrote:
>
> takewhile mines for gold at the start of a sequence, dropwhile drops the dross at the start of a sequence.
When you're using both over the same sequence and with the same
condition, it seems odd that you need to iterate over it twice.
Perhaps a partitioning iterator would be cleaner - something like
this:
def partitionwhile(predicate, iterable):
    iterable = iter(iterable)
    while True:
        val = next(iterable)
        if not predicate(val): break
        yield val
    raise StopIteration # Signal the end of Phase 1
    for val in iterable: yield val # or just "yield from iterable", I think
Only the cold hard boot of reality just stomped out the spark of an
idea. Once StopIteration has been raised, that's it, there's no
"resuming" the iterator. Is there a way around that? Is there a clean
way to say "Done for now, but next time you ask, there'll be more"?
I tested it on Python 3.2 (yeah, time I upgraded, I know).
ChrisA
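One possible way around the "no resuming" problem is to hand back two iterators that share the same underlying source, instead of trying to restart a single one. The sketch below is an assumption-laden illustration, not something proposed in the thread: it reuses the name `partitionwhile` with a different return shape, and it requires the caller to exhaust the first iterator before touching the second. It also keeps the first item that fails the predicate, which the generator above would discard. As a side note, on Python 3.7 and later a `StopIteration` raised inside a generator body is converted to `RuntimeError` (PEP 479), so the original attempt fails differently there.

```python
from itertools import chain

def partitionwhile(predicate, iterable):
    """Return (head, tail) iterators over one shared source.

    head yields the leading items that satisfy predicate; tail yields
    the rest. Assumption: head is fully consumed before tail is used.
    """
    it = iter(iterable)
    transition = []  # holds the first item that failed the predicate

    def head():
        for val in it:
            if predicate(val):
                yield val
            else:
                transition.append(val)
                return

    # chain reads `transition` lazily, after head has filled it.
    return head(), chain(transition, it)

def allcaps(word):
    return word == word.upper()

head, tail = partitionwhile(allcaps, "CAPSICUM RED fresh from QLD".split())
print(" ".join(head), "|", " ".join(tail))
```

The source is traversed once, at the cost of an ordering constraint on the caller; consuming `tail` first would yield every item and leave `head` empty.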
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Date | 2012-12-05 14:34 +0000 |
| Message-ID | <ai94btF9hoaU1@mid.individual.net> |
| In reply to | #34280 |
On 2012-12-05, Chris Angelico <rosuav@gmail.com> wrote:
> On Wed, Dec 5, 2012 at 12:17 PM, Nick Mellor <thebalancepro@gmail.com> wrote:
>>
>> takewhile mines for gold at the start of a sequence, dropwhile
>> drops the dross at the start of a sequence.
>
> When you're using both over the same sequence and with the same
> condition, it seems odd that you need to iterate over it twice.
> Perhaps a partitioning iterator would be cleaner - something
> like this:
>
> def partitionwhile(predicate, iterable):
>     iterable = iter(iterable)
>     while True:
>         val = next(iterable)
>         if not predicate(val): break
>         yield val
>     raise StopIteration # Signal the end of Phase 1
>     for val in iterable: yield val # or just "yield from iterable", I think
>
> Only the cold hard boot of reality just stomped out the spark
> of an idea. Once StopIteration has been raised, that's it,
> there's no "resuming" the iterator. Is there a way around that?
> Is there a clean way to say "Done for now, but next time you
> ask, there'll be more"?
>
> I tested it on Python 3.2 (yeah, time I upgraded, I know).
Well, shoot! Then this is a job for groupby, not takewhile.
import itertools
def prod_desc(s):
    """split s into product name and product description.
    >>> prod_desc("CAR FIFTY TWO Chrysler LeBaron.")
    ['CAR FIFTY TWO', 'Chrysler LeBaron.']
    >>> prod_desc("MR. JONESEY Saskatchewan's finest")
    ['MR. JONESEY', "Saskatchewan's finest"]
    >>> prod_desc("no product name?")
    ['', 'no product name?']
    >>> prod_desc("NO DESCRIPTION")
    ['NO DESCRIPTION', '']
    """
    prod = ''
    desc = ''
    # Group consecutive words by whether they contain a lowercase letter.
    for k, g in itertools.groupby(s.split(),
            key=lambda w: any(c.islower() for c in w)):
        a = ' '.join(g)
        if k:
            desc = a
        else:
            prod = a
    return [prod, desc]
This has no way to preserve odd white space which could break
evil product name differences.
--
Neil Cerutti
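One case the doctests above do not cover is the original poster's "QLD" example: groupby produces a third, all-caps group for a trailing "QLD", and the loop lets it overwrite the product name. The sketch below reproduces that and shows one possible adjustment that treats only the first group as the product. The names `has_lower` and `prod_desc_leading_run` are hypothetical, introduced here for illustration.

```python
import itertools

def has_lower(word):
    return any(c.islower() for c in word)

def prod_desc(s):
    # Same logic as the version posted above, for comparison.
    prod = ''
    desc = ''
    for k, g in itertools.groupby(s.split(), key=has_lower):
        a = ' '.join(g)
        if k:
            desc = a
        else:
            prod = a
    return [prod, desc]

def prod_desc_leading_run(s):
    """Variant: only a leading all-caps group counts as the product."""
    words = s.split()
    groups = itertools.groupby(words, key=has_lower)
    first = next(groups, None)
    if first is None or first[0]:
        # Empty input, or the string starts with a mixed-case word.
        return ['', ' '.join(words)]
    n = len(list(first[1]))
    return [' '.join(words[:n]), ' '.join(words[n:])]

print(prod_desc("CAPSICUM RED fresh from QLD"))
print(prod_desc_leading_run("CAPSICUM RED fresh from QLD"))
```

With the variant, later all-caps words stay in the description; like the original, it still normalises whitespace because it works on the split word list.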