From: Jignesh Sutar
Date: Thu, 27 Feb 2014 20:07:56 +0000
Subject: Extracting parts of string between anchor points
Cc: python-list@python.org
Newsgroups: comp.lang.python

I've kind of got this working, but my code is very ugly. I'm sure a regular expression is what I need to achieve this more cleanly, but I'm not very familiar with using regex, particularly with retaining part of the string that is being searched/matched for.

Notes and code below to demonstrate what I am trying to achieve. Any help much appreciated.

Examples=["Test1A",
          "Test2A: Test2B",
          "Test3A: Test3B -:- Test3C", ""]

# Out1 is just itself unless it is empty
# Out2 is everything left of ":" (including ":" i.e. part A) and right of "-:-" (excluding "-:-" i.e. part C)
    # If text doesn't contain "-:-" then return text itself as it is
# Out3 is everything right of "-:-" (excluding "-:-" i.e. part C)
    # If text doesn't contain "-:-" but does contain ":" then return part B only
    # If it doesn't contain ":" then return itself (unless it is empty, then "None")

for i,s in enumerate(Examples,start=1):
    Out1=s if len(s)>0 else "Empty"
    Out2=s[:s.find(":")+3] + s[s.find("-:-")+5:] if s.find("-:-")>0 else s.strip() if len(s) else "Empty"
    Out3=s[s.find("-:-")+4:] if s.find("-:-")>0 else s[s.find(":")+1:].strip() if s.find(":")>0 and len(s)!=s.find(":")+1 else s if len(s) else "Empty"
    print "Item%(i)s <%(s)s>  Out1 = %(Out1)s" % locals()
    print "Item%(i)s <%(s)s>  Out2 = %(Out2)s" % locals()
    print "Item%(i)s <%(s)s>  Out3 = %(Out3)s" % locals()

Output:

Item1 <Test1A>  Out1 = Test1A
Item1 <Test1A>  Out2 = Test1A
Item1 <Test1A>  Out3 = Test1A
Item2 <Test2A: Test2B>  Out1 = Test2A: Test2B
Item2 <Test2A: Test2B>  Out2 = Test2A: Test2B
Item2 <Test2A: Test2B>  Out3 = Test2B #INCORRECT - Should be "Test2A: Test2B"
Item3 <Test3A: Test3B -:- Test3C>  Out1 = Test3A: Test3B -:- Test3C
Item3 <Test3A: Test3B -:- Test3C>  Out2 = Test3A: Test3C
Item3 <Test3A: Test3B -:- Test3C>  Out3 = Test3C
Item4 <>  Out1 = Empty
Item4 <>  Out2 = Empty
Item4 <>  Out3 = Empty
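
One possible way to express the rules from the notes with a regular expression is sketched below. It is only a sketch under stated assumptions, not a definitive answer. It is written for Python 3, whereas the post uses Python 2 print statements. The names PARTS and outputs are made up for illustration. Out3 follows the written notes (part B only) rather than the "#INCORRECT" remark in the posted output, since the two disagree. The post does not say what Out2 should be when "-:-" appears without a preceding ":", so the fallback used there is a guess. The named groups are what retain the pieces of the string that were matched.

```python
import re

# PARTS and outputs() are hypothetical names, not taken from the post.
PARTS = re.compile(
    r"""
    (?P<a>(?:(?!-:-)[^:])*:)?    # part A: up to and including the first ":" (optional)
    \s*
    (?P<b>.*?)                   # part B: the middle, matched as short as possible
    (?:\s*-:-\s*(?P<c>.*))?      # part C: whatever follows "-:-" (optional)
    """,
    re.VERBOSE | re.DOTALL,
)


def outputs(text):
    """Return (Out1, Out2, Out3) for one input string."""
    if not text:
        return "Empty", "Empty", "Empty"

    # Every group is optional or may be empty, so fullmatch always succeeds.
    a, b, c = PARTS.fullmatch(text).group("a", "b", "c")

    if c is not None:
        # "-:-" is present: Out2 joins part A and part C, Out3 is part C alone.
        # Assumption: with no part A, Out2 falls back to part C.
        out2 = f"{a} {c}" if a else c
        out3 = c
    else:
        # No "-:-": Out2 is the text itself.
        out2 = text
        # Assumption: follow the written notes (part B only). To get the whole
        # text, as the "#INCORRECT" remark suggests, use `out3 = text` instead.
        out3 = b if a and b else text

    return text, out2, out3


if __name__ == "__main__":
    examples = ["Test1A", "Test2A: Test2B", "Test3A: Test3B -:- Test3C", ""]
    for i, s in enumerate(examples, start=1):
        for n, out in enumerate(outputs(s), start=1):
            print(f"Item{i} <{s}>  Out{n} = {out}")
```

With the four example strings, outputs() returns ("Test1A", "Test1A", "Test1A"), ("Test2A: Test2B", "Test2A: Test2B", "Test2B"), ("Test3A: Test3B -:- Test3C", "Test3A: Test3C", "Test3C") and ("Empty", "Empty", "Empty"). The negative lookahead (?!-:-) in part A keeps the colon inside "-:-" from being mistaken for the ":" anchor.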