| Path | csiph.com!usenet.pasdenom.info!news.albasani.net!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <vlastimil.brom@gmail.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| MIME-Version | 1.0 |
| In-Reply-To | <945048d8-961e-4894-89fc-3b7fd9b7965b@googlegroups.com> |
| References | <b80f3ab3-ef81-4806-86db-efd5800d4bb3@googlegroups.com> <mailman.474.1354653865.29569.python-list@python.org> <mailman.484.1354670286.29569.python-list@python.org> <ai90h9F8mm8U4@mid.individual.net> <26781aa9-b4a2-4308-8db2-5a150da2128f@googlegroups.com> <ai9hb4FclvgU1@mid.individual.net> <945048d8-961e-4894-89fc-3b7fd9b7965b@googlegroups.com> |
| Date | Wed, 5 Dec 2012 22:36:36 +0100 |
| Subject | Re: Good use for itertools.dropwhile and itertools.takewhile |
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
| To | python-list@python.org |
| Content-Type | text/plain; charset=ISO-8859-1 |
| X-Mailman-Approved-At | Thu, 06 Dec 2012 09:22:12 +0100 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.548.1354782133.29569.python-list@python.org> (permalink) |
| Lines | 177 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1354782133 news.xs4all.nl 6918 [2001:888:2000:d::a6]:50181 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:34368 |
2012/12/5 Nick Mellor <thebalancepro@gmail.com>:
> Neil,
>
> Further down the data, found another edge case:
>
> "Spring ONION from QLD"
>
> Following the spec, the whole line should be description (description starts at the first word that is not all caps). This case breaks the latest groupby.
>
> N
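The rule in the quoted spec can also be written as a small dedicated function using itertools.takewhile, the tool named in the thread subject. Under that rule, the product name is the run of leading all-caps words, and the description starts at the first word that is not all caps. This is a minimal sketch, not code from the thread. The name `split_product` is hypothetical. It assumes words are separated by whitespace, so runs of whitespace are collapsed to single spaces, and it assumes a trailing comma on the name should be dropped.

```python
from itertools import takewhile


def split_product(line):
    """Hypothetical helper: split a product line into (name, description).

    The name is the longest run of leading words that are all caps;
    everything after it is the description.
    """
    words = line.split()
    # str.isupper() is True when a word has at least one cased character and
    # no lower-case ones, so punctuation is ignored ("PARSNIP," counts as
    # all caps) while words like "Qld", "350g" or "I'll" do not.
    name_words = list(takewhile(str.isupper, words))
    # Drop a comma glued to the last name word, e.g. "PARSNIP,".
    name = " ".join(name_words).rstrip(",")
    description = " ".join(words[len(name_words):])
    return name, description


if __name__ == "__main__":
    for line in ["KIWI FRUIT Qld certified organic", "Spring ONION from QLD"]:
        print(split_product(line))
```

Unlike the regex later in this message, this version drops the comma after PARSNIP instead of leaving it at the start of the description.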
Hi,
Just for completeness: it can (likely) be done with a regex (given
the current specification), but if the data are even more complex and
varying, tools like pyparsing or dedicated parsing functions might
be more appropriate.
hth,
vbr:
>>> import re
>>> test_product_data = """BEANS hand picked
... BEETROOT certified organic
... BOK CHOY (bunch)
... BROCCOLI Mornington Peninsula
... BRUSSEL SPROUTS
... CABBAGE green
... CABBAGE Red
... CAPSICUM RED
... CARROTS
... CARROTS loose
... CARROTS juicing, certified organic
... CARROTS Trentham, large seconds, certified organic
... CARROTS Trentham, firsts, certified organic
... CAULIFLOWER
... CELERY Mornington Peninsula IPM grower
... CELERY Mornington Peninsula IPM grower
... CUCUMBER
... EGGPLANT
... FENNEL
... GARLIC (from Argentina)
... GINGER fresh uncured
... KALE (bunch)
... KOHL RABI certified organic
... LEEKS
... LETTUCE iceberg
... MUSHROOM cup or flat
... MUSHROOM Swiss brown
... ONION brown
... ONION red
... ONION spring (bunch)
... PARSNIP, certified organic
... POTATOES certified organic
... POTATOES Sebago
... POTATOES Desiree
... POTATOES Bullarto chemical free
... POTATOES Dutch Cream
... POTATOES Nicola
... POTATOES Pontiac
... POTATOES Otway Red
... POTATOES teardrop
... PUMPKIN certified organic
... SCHALLOTS brown
... SNOW PEAS
... SPINACH I'll try to get certified organic (bunch)
... SWEET POTATO gold certified organic
... SWEET POTATO red small
... SWEDE certified organic
... TOMATOES Qld
... TURMERIC fresh certified organic
... ZUCCHINI
... APPLES Harcourt Pink Lady, Fuji, Granny Smith
... APPLES Harcourt 2 kg bags, Pink Lady or Fuji (bag)
... AVOCADOS
... AVOCADOS certified organic, seconds
... BANANAS Qld, organic
... GRAPEFRUIT
... GRAPES crimson seedless
... KIWI FRUIT Qld certified organic
... LEMONS
... LIMES
... MANDARINS
... ORANGES Navel
... PEARS Beurre Bosc Harcourt new season
... PEARS Packham, Harcourt new season
... SULTANAS 350g pre-packed bags
... EGGS Melita free range, Barker's Creek
... BASIL (bunch)
... CORIANDER (bunch)
... DILL (bunch)
... MINT (bunch)
... PARSLEY (bunch)
... Spring ONION from QLD"""
>>>
>>> len(test_product_data.splitlines())
72
>>>
>>> for prod_item in re.findall(r"(?m)(?=^.+$)^ *(?:([A-Z ]+\b(?<! )(?=[\s,]|$)))?(?: *(.*))?$", test_product_data): print(prod_item)
...
('BEANS', 'hand picked')
('BEETROOT', 'certified organic')
('BOK CHOY', '(bunch)')
('BROCCOLI', 'Mornington Peninsula')
('BRUSSEL SPROUTS', '')
('CABBAGE', 'green')
('CABBAGE', 'Red')
('CAPSICUM RED', '')
('CARROTS', '')
('CARROTS', 'loose')
('CARROTS', 'juicing, certified organic')
('CARROTS', 'Trentham, large seconds, certified organic')
('CARROTS', 'Trentham, firsts, certified organic')
('CAULIFLOWER', '')
('CELERY', 'Mornington Peninsula IPM grower')
('CELERY', 'Mornington Peninsula IPM grower')
('CUCUMBER', '')
('EGGPLANT', '')
('FENNEL', '')
('GARLIC', '(from Argentina)')
('GINGER', 'fresh uncured')
('KALE', '(bunch)')
('KOHL RABI', 'certified organic')
('LEEKS', '')
('LETTUCE', 'iceberg')
('MUSHROOM', 'cup or flat')
('MUSHROOM', 'Swiss brown')
('ONION', 'brown')
('ONION', 'red')
('ONION', 'spring (bunch)')
('PARSNIP', ', certified organic')
('POTATOES', 'certified organic')
('POTATOES', 'Sebago')
('POTATOES', 'Desiree')
('POTATOES', 'Bullarto chemical free')
('POTATOES', 'Dutch Cream')
('POTATOES', 'Nicola')
('POTATOES', 'Pontiac')
('POTATOES', 'Otway Red')
('POTATOES', 'teardrop')
('PUMPKIN', 'certified organic')
('SCHALLOTS', 'brown')
('SNOW PEAS', '')
('SPINACH', "I'll try to get certified organic (bunch)")
('SWEET POTATO', 'gold certified organic')
('SWEET POTATO', 'red small')
('SWEDE', 'certified organic')
('TOMATOES', 'Qld')
('TURMERIC', 'fresh certified organic')
('ZUCCHINI', '')
('APPLES', 'Harcourt Pink Lady, Fuji, Granny Smith')
('APPLES', 'Harcourt 2 kg bags, Pink Lady or Fuji (bag)')
('AVOCADOS', '')
('AVOCADOS', 'certified organic, seconds')
('BANANAS', 'Qld, organic')
('GRAPEFRUIT', '')
('GRAPES', 'crimson seedless')
('KIWI FRUIT', 'Qld certified organic')
('LEMONS', '')
('LIMES', '')
('MANDARINS', '')
('ORANGES', 'Navel')
('PEARS', 'Beurre Bosc Harcourt new season')
('PEARS', 'Packham, Harcourt new season')
('SULTANAS', '350g pre-packed bags')
('EGGS', "Melita free range, Barker's Creek")
('BASIL', '(bunch)')
('CORIANDER', '(bunch)')
('DILL', '(bunch)')
('MINT', '(bunch)')
('PARSLEY', '(bunch)')
('', 'Spring ONION from QLD')
>>> len(re.findall(r"(?m)(?=^.+$)^ *(?:([A-Z ]+\b(?<! )(?=[\s,]|$)))?(?: *(.*))?$", test_product_data))
72
>>>
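On Python 3, the same pattern can be spelled out with `re.VERBOSE` so that each part carries a comment. The sketch below restates the regex from the session above without changing its meaning. `PRODUCT_RE` and `parse_products` are hypothetical names, and the sample text uses only a few lines from the test data.

```python
import re

# The pattern from the session above, laid out with re.VERBOSE.
# Literal spaces are written as "\ " because verbose mode ignores
# unescaped whitespace outside character classes.
PRODUCT_RE = re.compile(r"""
    (?=^.+$)            # only start a match on a non-empty line
    ^\ *                # optional leading spaces
    (?:
        (               # group 1: the all-caps product name
            [A-Z\ ]+\b  #   capitals and spaces, ending at a word boundary
            (?<!\ )     #   ... but not ending with a space
            (?=[\s,]|$) #   followed by whitespace, a comma or end of line
        )
    )?                  # the name is optional ('Spring ONION from QLD')
    (?:\ *(.*))?        # group 2: the rest of the line is the description
    $
""", re.MULTILINE | re.VERBOSE)


def parse_products(text):
    """Return (name, description) tuples, one per non-empty line.

    findall() reports an unmatched name group as '', which is how the
    'Spring ONION from QLD' line ends up with an empty name.
    """
    return PRODUCT_RE.findall(text)


if __name__ == "__main__":
    sample = "BEANS hand picked\nPARSNIP, certified organic\nSpring ONION from QLD"
    for item in parse_products(sample):
        print(item)
```

Laying the pattern out this way makes it easier to see why PARSNIP keeps a leading comma in its description. The lookahead after the name accepts a comma, but group 2 skips only spaces before capturing the rest of the line.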