


How to form a dict out of a string by doing regex ?

Started by: Satyajit Sarangi <writetosatyajit@gmail.com>
First post: 2011-06-15 07:42 -0700
Last post: 2011-06-15 14:58 -0400
Articles: 3 (3 participants)



Contents

  How to form a dict out of a string by doing regex ? Satyajit Sarangi <writetosatyajit@gmail.com> - 2011-06-15 07:42 -0700
    Re: How to form a dict out of a string by doing regex ? Mel <mwilson@the-wire.com> - 2011-06-15 11:36 -0400
    Re: How to form a dict out of a string by doing regex ? Terry Reedy <tjreedy@udel.edu> - 2011-06-15 14:58 -0400

#7701 — How to form a dict out of a string by doing regex ?

From: Satyajit Sarangi <writetosatyajit@gmail.com>
Date: 2011-06-15 07:42 -0700
Subject: How to form a dict out of a string by doing regex ?
Message-ID: <0cfd4592-75fd-48ec-884b-122b0e094078@j13g2000pro.googlegroups.com>

data = "GEOMETRYCOLLECTION (POINT (-8.9648437500000000
-4.1308593750000000), POINT (2.0214843750000000 -2.6367187500000000),
POINT (-1.4062500000000000 -11.1621093750000000), POINT
(-11.9531250000000000,-10.8984375000000000), POLYGON
((-21.6210937500000000 1.8457031250000000,2.4609375000000000
2.1972656250000000, -18.9843750000000000 -3.6914062500000000,
-22.6757812500000000 -3.3398437500000000, -22.1484375000000000
-2.6367187500000000, -21.6210937500000000
1.8457031250000000)),LINESTRING (-11.9531250000000000
11.3378906250000000, 7.7343750000000000 11.5136718750000000,
12.3046875000000000 2.5488281250000000, 12.2167968750000000
1.6699218750000000, 14.5019531250000000 3.9550781250000000))"

This is my string.
How do I traverse it and form 3 dicts, one each for Point, Polygon and
Linestring, containing the coordinates?



#7705

From: Mel <mwilson@the-wire.com>
Date: 2011-06-15 11:36 -0400
Message-ID: <itajhj$7rk$1@speranza.aioe.org>
In reply to: #7701

Satyajit Sarangi wrote:

> 
> 
> data = "GEOMETRYCOLLECTION (POINT (-8.9648437500000000
> -4.1308593750000000), POINT (2.0214843750000000 -2.6367187500000000),
> POINT (-1.4062500000000000 -11.1621093750000000), POINT
> (-11.9531250000000000,-10.8984375000000000), POLYGON
> ((-21.6210937500000000 1.8457031250000000,2.4609375000000000
> 2.1972656250000000, -18.9843750000000000 -3.6914062500000000,
> -22.6757812500000000 -3.3398437500000000, -22.1484375000000000
> -2.6367187500000000, -21.6210937500000000
> 1.8457031250000000)),LINESTRING (-11.9531250000000000
> 11.3378906250000000, 7.7343750000000000 11.5136718750000000,
> 12.3046875000000000 2.5488281250000000, 12.2167968750000000
> 1.6699218750000000, 14.5019531250000000 3.9550781250000000))"
> 
> This is my string .
> How do I traverse through it and form 3 dicts of Point , Polygon and
> Linestring containing the co-ordinates ?

Except for those space-separated number pairs, it could be a job for some 
well-crafted classes (e.g. `class GEOMETRYCOLLECTION ...`, `class POINT 
...`) and eval.

My approach would be to use a loop with regexes to recognize the leading 
element and pick out its arguments, then use the string split and strip 
methods beyond that point.  Like (untested):

import re

# Leading keyword, its bracketed arguments, an optional separating comma,
# then the rest.  re.DOTALL lets '.' cross line breaks in the data.
recognizer = re.compile(
    r'\s*(POINT|POLYGON|LINESTRING)\s*\(+(.*?)\)+\s*,?(.*)', re.DOTALL)
# regex is not good with nested brackets,
# so kill off outer nested brackets..
s1 = 'GEOMETRYCOLLECTION ('
if data.startswith(s1):
    data = data[len(s1):-1]

while data:
    match = recognizer.match(data)
    if not match:
        break  # nothing usable in data
    ## now the matched groups will be:
    ## 1: the keyword
    ## 2: the arguments inside the smallest bracketed sequence
    ## 3: the rest of data
    ##  so use str.split and str.strip to pull out the individual arguments,
    ## and lastly
    data = match.group(3)

This is all from memory.  I might have got some details wrong in recognizer.

	Mel.
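For readers who want something runnable, here is a minimal sketch of the loop described above. It is an illustration under assumptions, not Mel's tested code. The sample string is an abbreviated stand-in with the same shape as the data in the thread. The names `parse_collection`, `RECOGNIZER` and `NUMBER` are hypothetical. It pairs up the numbers with a second regex rather than `str.split`, because one POINT in the posted data separates its two numbers with a comma instead of a space.

```python
import re

# Assumption: abbreviated sample, joined into a single string
# (the posted data was wrapped across several lines).
data = (
    "GEOMETRYCOLLECTION (POINT (-8.96484375 -4.130859375), "
    "POINT (-11.953125,-10.8984375), "
    "POLYGON ((-21.62109375 1.845703125,2.4609375 2.197265625, "
    "-21.62109375 1.845703125)),"
    "LINESTRING (-11.953125 11.337890625, 7.734375 11.513671875))"
)

# Keyword, bracketed arguments, optional comma, then the remainder.
RECOGNIZER = re.compile(
    r"\s*(POINT|POLYGON|LINESTRING)\s*\(+(.*?)\)+\s*,?(.*)", re.DOTALL
)
# A plain decimal number with an optional sign.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def parse_collection(text):
    """Return a dict mapping each keyword to a list of coordinate lists."""
    prefix = "GEOMETRYCOLLECTION ("
    if text.startswith(prefix):
        # Drop the outer wrapper and its closing bracket.
        text = text[len(prefix):-1]
    shapes = {"POINT": [], "POLYGON": [], "LINESTRING": []}
    while text.strip():
        match = RECOGNIZER.match(text)
        if not match:
            break  # nothing usable left
        keyword, args, text = match.groups()
        # Pair the numbers up as (x, y); works for "x y" and "x,y" alike.
        numbers = [float(n) for n in NUMBER.findall(args)]
        shapes[keyword].append(list(zip(numbers[0::2], numbers[1::2])))
    return shapes


shapes = parse_collection(data)
```

The lazy `(.*?)` stops at the first closing bracket, so this sketch would not cope with a POLYGON that has more than one ring. The format looks like Well-Known Text (WKT), so if the real input can be more varied than the sample, a dedicated WKT parser is probably the safer route.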



#7712

From: Terry Reedy <tjreedy@udel.edu>
Date: 2011-06-15 14:58 -0400
Message-ID: <mailman.5.1308164304.1164.python-list@python.org>
In reply to: #7701

On 6/15/2011 10:42 AM, Satyajit Sarangi wrote:
>
>
> data = "GEOMETRYCOLLECTION (POINT (-8.9648437500000000
> -4.1308593750000000), POINT (2.0214843750000000 -2.6367187500000000),
> POINT (-1.4062500000000000 -11.1621093750000000), POINT
> (-11.9531250000000000,-10.8984375000000000), POLYGON
> ((-21.6210937500000000 1.8457031250000000,2.4609375000000000
> 2.1972656250000000, -18.9843750000000000 -3.6914062500000000,
> -22.6757812500000000 -3.3398437500000000, -22.1484375000000000
> -2.6367187500000000, -21.6210937500000000
> 1.8457031250000000)),LINESTRING (-11.9531250000000000
> 11.3378906250000000, 7.7343750000000000 11.5136718750000000,
> 12.3046875000000000 2.5488281250000000, 12.2167968750000000
> 1.6699218750000000, 14.5019531250000000 3.9550781250000000))"
>
> This is my string .

Is this what you are given by an unchangeable external source, or can you 
get something a bit better? One object per line would make the problem 
pretty simple, with no regex required.
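To illustrate how simple the one-object-per-line case could be, here is a rough sketch. The input layout is purely hypothetical (a keyword, then comma-separated "x y" pairs), since the thread never shows such a format, and `parse_lines` is an invented name.

```python
# Hypothetical input: one geometry per line, "x y" pairs separated by commas.
lines = """\
POINT -8.96484375 -4.130859375
POINT 2.021484375 -2.63671875
LINESTRING -11.953125 11.337890625, 7.734375 11.513671875
"""


def parse_lines(text):
    """Group coordinate lists by the keyword that starts each line."""
    shapes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split off the keyword; everything after the first space is data.
        keyword, _, rest = line.partition(" ")
        coords = [
            tuple(float(n) for n in pair.split())
            for pair in rest.split(",")
        ]
        shapes.setdefault(keyword, []).append(coords)
    return shapes


parsed = parse_lines(lines)
```

Only `str.splitlines`, `str.partition` and `str.split` are needed here, which is the point of asking for a friendlier input format.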

> How do I traverse through it and form 3 dicts of Point , Polygon and
> Linestring containing the co-ordinates ?

Dicts map keys to values. I do not see any key values above. It looks 
like you really want three sets.
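A small sketch of the difference, with made-up variable names: sets only accept hashable members, so each shape is assumed to be stored as a tuple of (x, y) tuples. A dict becomes useful only once there is a natural key, such as the geometry type.

```python
# Three sets, one per geometry type. Members must be hashable,
# so each shape is a tuple of (x, y) tuples rather than a list.
points = set()
polygons = set()
linestrings = set()

points.add(((-8.96484375, -4.130859375),))
points.add(((-8.96484375, -4.130859375),))  # duplicate, silently ignored
linestrings.add(((-11.953125, 11.337890625), (7.734375, 11.513671875)))

# A dict fits once there is a key to map from, for example the type name.
by_type = {"POINT": points, "POLYGON": polygons, "LINESTRING": linestrings}
```

Sets discard duplicates and do not preserve order. If either matters, lists would be the more fitting container.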


-- 
Terry Jan Reedy


