comp.lang.python, thread #67163

Extracting parts of string between anchor points

Started by	Jignesh Sutar <jsutar@gmail.com>
First post	2014-02-27 20:07 +0000
Last post	2014-02-28 01:25 +0000
Articles	3 — 2 participants


  Extracting parts of string between anchor points Jignesh Sutar <jsutar@gmail.com> - 2014-02-27 20:07 +0000
    Re: Extracting parts of string between anchor points Denis McMahon <denismfmcmahon@gmail.com> - 2014-02-28 00:55 +0000
      Re: Extracting parts of string between anchor points Denis McMahon <denismfmcmahon@gmail.com> - 2014-02-28 01:25 +0000

#67163 — Extracting parts of string between anchor points

From	Jignesh Sutar <jsutar@gmail.com>
Date	2014-02-27 20:07 +0000
Subject	Extracting parts of string between anchor points
Message-ID	<mailman.7437.1393531705.18130.python-list@python.org>


I've kind of got this working, but my code is very ugly. I'm sure it's a
regular expression I need to achieve this more cleanly, but I'm not very
familiar with using regex, particularly for retaining part of the string
that is being searched/matched for.
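For what it's worth, the regex feature that keeps part of the matched text is the capture group: whatever a parenthesised group matches is available afterwards through `match.group()`. Below is a minimal sketch, assuming part A ends at the first ":" that is not part of "-:-", part C follows "-:-", and both delimiters are optional. The pattern and the name `split_parts` are illustrative, not from the original post, and `fullmatch` needs Python 3.4 or later.

```python
import re

# Illustrative pattern (an assumption, not from the original post):
#   (?P<a>[^:]*)(?<!-):   optional part A plus ":"; the lookbehind rejects a ":"
#                         preceded by "-", so the ":" inside "-:-" is not taken as delimiter 1
#   (?P<b>.*?)            part B, matched as short as possible
#   -:-(?P<c>.*)          optional "-:-" followed by part C
PATTERN = re.compile(r"(?:(?P<a>[^:]*)(?<!-):)?(?P<b>.*?)(?:-:-(?P<c>.*))?", re.DOTALL)


def split_parts(s):
    """Return (A, B, C) with surrounding spaces stripped; None marks an absent part."""
    m = PATTERN.fullmatch(s)  # every part is optional, so this always matches
    return tuple(None if g is None else g.strip() for g in m.group("a", "b", "c"))


for s in ["Test1A", "Test2A: Test2B", "Test3A: Test3B -:- Test3C", ""]:
    print(repr(s), split_parts(s))
```

This gives `('Test3A', 'Test3B', 'Test3C')` for the third example and `(None, 'Test1A', None)` for the first, so the Out values can then be assembled from the parts rather than from index arithmetic.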

The notes and code below demonstrate what I am trying to achieve. Any help
much appreciated.

Examples = ["Test1A",
            "Test2A: Test2B",
            "Test3A: Test3B -:- Test3C", ""]

# Out1 is just itself, unless it is empty
# Out2 is everything left of ":" (including ":", i.e. part A) and right of "-:-" (excluding "-:-", i.e. part C)
    # If text doesn't contain "-:-" then return text itself as it is
# Out3 is everything right of "-:-" (excluding "-:-", i.e. part C)
    # If text doesn't contain "-:-" but does contain ":" then return part B only
    # If it doesn't contain ":" then return itself (unless it is empty, then "None")

for i, s in enumerate(Examples, start=1):
    Out1 = s if len(s) > 0 else "Empty"
    # Part A up to and including ":", followed by everything after "-:-"
    Out2 = s[:s.find(":")+1] + s[s.find("-:-")+3:] if s.find("-:-") > 0 else s.strip() if len(s) else "Empty"
    Out3 = s[s.find("-:-")+3:].strip() if s.find("-:-") > 0 else s[s.find(":")+1:].strip() if s.find(":") > 0 and len(s) != s.find(":")+1 else s if len(s) else "Empty"
    print("Item%(i)s <%(s)s>  Out1 = %(Out1)s" % locals())
    print("Item%(i)s <%(s)s>  Out2 = %(Out2)s" % locals())
    print("Item%(i)s <%(s)s>  Out3 = %(Out3)s" % locals())


Output:

Item1 <Test1A>  Out1 = Test1A
Item1 <Test1A>  Out2 = Test1A
Item1 <Test1A>  Out3 = Test1A
Item2 <Test2A: Test2B>  Out1 = Test2A: Test2B
Item2 <Test2A: Test2B>  Out2 = Test2A: Test2B
Item2 <Test2A: Test2B>  Out3 = Test2B #INCORRECT - Should be "Test2A: Test2B"
Item3 <Test3A: Test3B -:- Test3C>  Out1 = Test3A: Test3B -:- Test3C
Item3 <Test3A: Test3B -:- Test3C>  Out2 = Test3A: Test3C
Item3 <Test3A: Test3B -:- Test3C>  Out3 = Test3C
Item4 <>  Out1 = Empty
Item4 <>  Out2 = Empty
Item4 <>  Out3 = Empty
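For comparison, the rules in the comments can be written without regex and without index arithmetic using `str.partition`, which splits on the first occurrence of a separator and returns `(before, separator, after)`, with an empty separator when it is not found. This is a sketch of the rules as written in the comments (so Item2's Out3 comes out as part B, as in the printed output), not a line-for-line translation of the one-liners; `outputs` is a hypothetical helper name.

```python
def outputs(s):
    """Return (Out1, Out2, Out3) for s, following the rules in the comments.

    Hypothetical helper; "Empty" is used for an empty input, as in the loop.
    """
    if not s:
        return ("Empty", "Empty", "Empty")
    head, sep2, part_c = s.partition("-:-")  # sep2 is "" when "-:-" is absent
    if sep2:
        # Part A keeps its ":" (when present); part C is whatever follows "-:-".
        part_a, sep1, _ = head.partition(":")
        out2 = (part_a.strip() + sep1 + " " + part_c.strip()).strip()
        return (s, out2, part_c.strip())
    _, sep1, part_b = s.partition(":")
    # With ":" present and something after it, Out3 is part B; otherwise s itself.
    out3 = part_b.strip() if sep1 and part_b else s
    return (s, s.strip(), out3)


examples = ["Test1A", "Test2A: Test2B", "Test3A: Test3B -:- Test3C", ""]
for i, s in enumerate(examples, start=1):
    for n, out in enumerate(outputs(s), start=1):
        print("Item%d <%s>  Out%d = %s" % (i, s, n, out))
```

Run on the four examples, this prints the same twelve lines as the listing above, minus the #INCORRECT note.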


#67180

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2014-02-28 00:55 +0000
Message-ID	<leomp5$7ev$1@dont-email.me>
In reply to	#67163

On Thu, 27 Feb 2014 20:07:56 +0000, Jignesh Sutar wrote:

> I've kind of got this working but my code is very ugly. I'm sure it's
> regular expression I need to achieve this more but not very familiar
> with use regex, particularly retaining part of the string that is being
> searched/matched for.
> 
> Notes and code below to demonstrate what I am trying to achieve. Any
> help, much appreciated.

It seems you have a string which may be split into between 1 and 3 
substrings by the presence of up to 2 delimiters, and that if both 
delimiters are present, they are in a specified order.

You have several possible cases which, broadly speaking, break down into 
4 groups:

(a) no delimiters present
(b) delimiter 1 present
(c) delimiter 2 present
(d) both delimiters present
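As a rough illustration of telling these four groups apart, one can test for each delimiter separately. One catch: delimiter 2 ("-:-") contains delimiter 1 (":"), so a plain `":" in s` check would report delimiter 1 whenever delimiter 2 is present. A sketch, assuming the delimiters from the original post; `classify` is a hypothetical name, and the order of the delimiters is not checked.

```python
def classify(s, delim1=":", delim2="-:-"):
    """Return "a", "b", "c" or "d" for the four groups (delimiter order not checked)."""
    has2 = delim2 in s
    # Drop every delimiter 2 first so the ":" inside "-:-" is not counted as delimiter 1.
    has1 = delim1 in s.replace(delim2, "")
    if has1 and has2:
        return "d"
    if has2:
        return "c"
    if has1:
        return "b"
    return "a"


for s in ["Test1A", "Test2A: Test2B", "Foo -:- Bar", "Test3A: Test3B -:- Test3C"]:
    print(repr(s), classify(s))
```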

It is important when coding for such scenarios to consider the possible 
cases that are not specified, as well as the ones that are.

For example, consider the string:

"<delim1><delim2>"

where you have both delims, in sequence, but no other data elements.
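One way to keep such unspecified cases in view is a small table of inputs and expected results that is checked whenever the code changes. A minimal sketch, using a hypothetical `str.partition`-based splitter `split3` (not the code Denis links to) that returns empty strings for absent parts; the expected tuples are choices made for this sketch, and the `":-:-"` row is the `<delim1><delim2>` case.

```python
def split3(s):
    """Split s into (A, B, C) around ":" and "-:-"; absent parts become ""."""
    head, _, part_c = s.partition("-:-")  # splits on the first "-:-" only
    part_a, sep1, part_b = head.partition(":")
    if not sep1:
        # No delimiter 1: everything before "-:-" is treated as part B.
        part_a, part_b = "", head
    return (part_a.strip(), part_b.strip(), part_c.strip())


# (input, expected (A, B, C)) pairs; the expectations are assumptions of this sketch.
CASES = [
    ("", ("", "", "")),
    ("A", ("", "A", "")),
    ("A: B", ("A", "B", "")),
    ("B -:- C", ("", "B", "C")),
    ("A: B -:- C", ("A", "B", "C")),
    (":", ("", "", "")),
    ("-:-", ("", "", "")),
    (":-:-", ("", "", "")),  # <delim1><delim2>: both delimiters, no data
]

for text, expected in CASES:
    result = split3(text)
    print("%-14s -> %s" % (repr(text), result))
    assert result == expected, (text, result)
```

With empty strings as placeholders, "-:-" and ":-:-" give the same result, so a caller cannot tell which delimiters were present; returning None for absent parts, as optional regex groups do, is one way to keep those cases apart.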

I believe there are at least 17 possible combinations, and maybe another 
8 if you allow for the delims being out of sequence.

The code in the file at the url below processes 17 different cases. It 
may help, or it may confuse.

http://www.sined.co.uk/tmp/strparse.py.txt

-- 
Denis McMahon, denismfmcmahon@gmail.com


#67181

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2014-02-28 01:25 +0000
Message-ID	<leoohs$7ev$2@dont-email.me>
In reply to	#67180

On Fri, 28 Feb 2014 00:55:01 +0000, Denis McMahon wrote:

> The code in the file at the url below processes 17 different cases. It
> may help, or it may confuse.

> http://www.sined.co.uk/tmp/strparse.py.txt

I added some more cases to it, and then realised that the code could 
actually be simplified quite a lot. So now it has been.

-- 
Denis McMahon, denismfmcmahon@gmail.com

