
Re: Noob Parsing question

Date Wed, 18 Feb 2015 15:54:08 +1100
Subject Re: Noob Parsing question
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.18803.1424235256.18130.python-list@python.org> (permalink)


On Wed, Feb 18, 2015 at 3:35 PM,  <kai.peters@gmail.com> wrote:
>> > Given
>> >
>> > data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
>> >
>> > How can I efficiently get dictionaries for each of the data blocks framed by <> ?
>> >
>> > Thanks for any help
>>
>> The question here is: What _can't_ happen? For instance, what happens
>> if Fred's name contains a greater-than symbol, or a caret?
>>
>> If those absolutely cannot happen, your parser can be fairly
>> straight-forward. Just put together some basic splitting (maybe a
>> regex), and then split on the caret inside that. Otherwise, you may
>> need a more stateful parser.
>>
>> ChrisA
>
> The data string is guaranteed to be clean - no such irregularities occur.

Okay!

(Side point: You've stripped off all citations, here, so it's not
clear who said what. My shorthand signature isn't as useful as the
full line identifying date, time, and person. It's polite to keep
those lines, at least for the first level of quoting.)

What you want can be done with a regular expression. (Yes, yes, I
know; now you have two problems.)

>>> import re
>>> data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
>>> re.findall("<.*?>",data)
['<a=14^b=Fred^c=45.22^>', '<a=22^b=Joe^>', '<a=17^c=3.20^>', '<a=72^b=Soup^>']

From there, you can crack open the different pieces:

>>> for piece in re.findall("<.*?>",data):
...     d = {}
...     for elem in piece[1:-2].split("^"):
...         key, value = elem.split("=",1)
...         d[key] = value
...     print(d)
...
{'c': '45.22', 'b': 'Fred', 'a': '14'}
{'b': 'Joe', 'a': '22'}
{'c': '3.20', 'a': '17'}
{'b': 'Soup', 'a': '72'}

If you need some of those to be integers or floats, you'll need to do
some post-processing, but this gets the data out reliably as strings.
It depends on none of the special characters "=^<>" appearing inside
the elements, and on every block ending with "^>" as in your sample
(that's what the [1:-2] slice strips off); other than that, it should
be safe.
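
The post-processing could look something like the sketch below. The
helper names convert() and parse_blocks() are made up for illustration,
and the "try int, then float, else leave it as a string" rule is only
an assumption about what you want; adjust it to whatever your fields
really mean.

```python
import re

def convert(value):
    """Try int, then float; fall back to the original string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def parse_blocks(data):
    """Return one dict per <...> block, converting numeric-looking values."""
    result = []
    for piece in re.findall("<.*?>", data):
        d = {}
        # piece[1:-2] drops the leading "<" and the trailing "^>"
        for elem in piece[1:-2].split("^"):
            key, value = elem.split("=", 1)
            d[key] = convert(value)
        result.append(d)
    return result

data = '{[<a=14^b=Fred^c=45.22^><a=22^b=Joe^><a=17^c=3.20^>][<a=72^b=Soup^>]}'
blocks = parse_blocks(data)
for d in blocks:
    print(d)
```

One caveat with guessing types from the text: float() also accepts
strings such as "nan" and "inf", so a name like "Nan" would come back
as a float. If you know which keys are numeric, converting by key
(say, int for "a" and float for "c") would be the safer choice.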

ChrisA
