Re:Re: how to right the regular expression ?

Path	csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!nntp.club.cc.cmu.edu!195.208.113.1.MISMATCH!goblin3!goblin2!goblin.stu.neva.ru!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Return-Path	<mailtomanage@163.com>
X-Original-To	python-list@python.org
Delivered-To	python-list@mail.python.org
X-Spam-Status	OK 0.002
X-Originating-IP	[1.89.180.69]
Date	Fri, 15 Feb 2013 08:32:14 +0800 (CST)
From	python <mailtomanage@163.com>
To	python-list@python.org
Subject	Re:Re: how to right the regular expression ?
X-Priority	3
X-Mailer	Coremail Webmail Server Version SP_ntes V3.5 build 20130124(21453.5226.5222) Copyright (c) 2002-2013 www.mailtech.cn 163com
In-Reply-To	<511D062E.4080101@mrabarnett.plus.com>
References	<227f6014.405c.13cd90da3d4.Coremail.mailtomanage@163.com> <511D062E.4080101@mrabarnett.plus.com>
X-CM-CTRLDATA	ZzFHxGZvb3Rlcl9odG09ODEyMTo4MQ==
Content-Type	multipart/alternative; boundary="----=_Part_3811_1277455604.1360888334389"
MIME-Version	1.0
X-CM-TRANSID	k8GowABnbgoOgh1R1GhYAA--.38908W
X-CM-SenderInfo	hpdlz3xrpd0tljh6il2tof0z/1tbisQ7EPFD+JflF9gADsF
X-Coremail-Antispam	1U5529EdanIXcx71UUUUU7vcSsGvfC2KfnxnUU==
X-BeenThere	python-list@python.org
X-Mailman-Version	2.1.15
Precedence	list
List-Id	General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe	<http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive	<http://mail.python.org/pipermail/python-list/>
List-Post	<mailto:python-list@python.org>
List-Help	<mailto:python-list-request@python.org?subject=help>
List-Subscribe	<http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups	comp.lang.python
Message-ID	<mailman.1791.1360889265.2939.python-list@python.org> (permalink)
Lines	254
NNTP-Posting-Host	2001:888:2000:d::a6
X-Trace	1360889265 news.xs4all.nl 6925 [2001:888:2000:d::a6]:42892
X-Complaints-To	abuse@xs4all.nl
Xref	csiph.com comp.lang.python:38898



The regex pat = r'([a-z].+?\s)(.+)(?:(\(.+\)))?' does not work at all: group 3 is always None.


>>> rfile.close()
>>> import re
>>> rfile=open("tv.txt","r")
>>> pat1 = r'([a-z].+?\s)(.+)((\(.+\)))?'
>>> for  line in  rfile.readlines():
...     Match=re.match(pat1,line)
...     print "1group is ",Match.group(1),"2group is ",Match.group(2),"3group is ",Match.group(3)
... 
1group is  http://202.177.192.119/radio5  2group is  香港电台第五台(可于Totem/VLC/MPlayer播放) 3group is  None
1group is  http://202.177.192.119/radio35  2group is  香港电台第五台(DAB版，可于Totem/VLC/MPlayer播放) 3group is  None
1group is  http://202.177.192.119/radiopth  2group is  香港电台普通话台(可于Totem/VLC/MPlayer播放) 3group is  None
1group is  http://202.177.192.119/radio31  2group is  香港电台普通话台(DAB版，可于Totem/VLC/MPlayer播放) 3group is  None
1group is  octoshape:rthk.ch1  2group is  香港电台第一台(粤) 3group is  None
1group is  octoshape:rthk.ch2  2group is  香港电台第二台(粤) 3group is  None
1group is  octoshape:rthk.ch6  2group is  香港电台普通话台 3group is  None
1group is  octoshape:rthk.ch3  2group is  香港电台第三台(英) 3group is  None
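Why group 3 stays None: (.+) is greedy, so it first runs to the end of the line (. does not match the newline) and swallows the parenthesised text. Because the trailing group is optional, the match already succeeds at that point, so the engine never backtracks to hand the parentheses to group 3. The same holds for MRAB's pattern; pat1 differs only in that its outer wrapper is capturing, which adds a fourth group. A minimal Python 3 check (the session above is Python 2; the two sample lines are copied from tv.txt, with the trailing newline that readlines() keeps):

```python
import re

# pat1 from the session above: a greedy (.+) followed by an optional group
pat1 = r'([a-z].+?\s)(.+)((\(.+\)))?'

# two lines from tv.txt; the trailing '\n' mimics what readlines() returns
lines = [
    "http://202.177.192.119/radio5 香港电台第五台(可于Totem/VLC/MPlayer播放)\n",
    "octoshape:rthk.ch6 香港电台普通话台\n",
]

results = []
for line in lines:
    m = re.match(pat1, line)
    # (.+) has already consumed "(...)", so the optional group matches nothing
    results.append((m.group(1), m.group(2), m.group(3)))
    print(results[-1])
```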

>>> rfile.close()
>>> import re
>>> rfile=open("tv.txt","r")
>>> pat2 = r'([a-z].+?\s)(.+?)((\(.+\)))?'
>>> for  line in  rfile.readlines():
...     Match=re.match(pat1,line)
...     print "1group is ",Match.group(1),"2group is ",Match.group(2),"3group is ",Match.group(3)
... 
1group is  http://202.177.192.119/radio5  2group is  香港电台第五台(可于Totem/VLC/MPlayer播放) 3group is  None
1group is  http://202.177.192.119/radio35  2group is  香港电台第五台(DAB版，可于Totem/VLC/MPlayer播放) 3group is  None
1group is  http://202.177.192.119/radiopth  2group is  香港电台普通话台(可于Totem/VLC/MPlayer播放) 3group is  None
1group is  http://202.177.192.119/radio31  2group is  香港电台普通话台(DAB版，可于Totem/VLC/MPlayer播放) 3group is  None
1group is  octoshape:rthk.ch1  2group is  香港电台第一台(粤) 3group is  None
1group is  octoshape:rthk.ch2  2group is  香港电台第二台(粤) 3group is  None
1group is  octoshape:rthk.ch6  2group is  香港电台普通话台 3group is  None
1group is  octoshape:rthk.ch3  2group is  香港电台第三台(英) 3group is  None
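Note that the second session defines pat2 but still passes pat1 to re.match, which is why its output is identical to the first. Had pat2 actually been used, the lazy (.+?) would stop after a single character (a single byte under Python 2 with byte strings): nothing after it forces the match to reach the end of the line, so the optional group just matches zero times. A Python 3 sketch on one line from tv.txt:

```python
import re

# pat2 from the second session: lazy (.+?) followed by an optional group
pat2 = r'([a-z].+?\s)(.+?)((\(.+\)))?'

line = "octoshape:rthk.ch1 香港电台第一台(粤)\n"
m = re.match(pat2, line)
# (.+?) takes the minimum of one character; "(" does not follow it,
# so the optional group is skipped and re.match stops there
print(repr(m.group(2)), m.group(3))
```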

On 2013-02-14 23:43:42, MRAB <python@mrabarnett.plus.com> wrote:
>On 2013-02-14 14:13, python wrote:
>> My tv.txt is:
>> http://202.177.192.119/radio5 香港电台第五台(可于Totem/VLC/MPlayer播放)
>> http://202.177.192.119/radio35 香港电台第五台(DAB版，可于Totem/VLC/MPlayer播放)
>> http://202.177.192.119/radiopth 香港电台普通话台(可于Totem/VLC/MPlayer播放)
>> http://202.177.192.119/radio31 香港电台普通话台(DAB版，可于Totem/VLC/MPlayer播放)
>> octoshape:rthk.ch1 香港电台第一台(粤)
>> octoshape:rthk.ch2 香港电台第二台(粤)
>> octoshape:rthk.ch6 香港电台普通话台
>> octoshape:rthk.ch3 香港电台第三台(英)
>>
>> What I want to get as the result is:
>> 1group is  http://202.177.192.119/radio5  2group is  香港电台第五台  3group is  (可于Totem/VLC/MPlayer播放)
>> 1group is  http://202.177.192.119/radio35  2group is  香港电台第五台  3group is  (DAB版，可于Totem/VLC/MPlayer播放)
>> 1group is  http://202.177.192.119/radiopth  2group is  香港电台普通话台  3group is  (可于Totem/VLC/MPlayer播放)
>> 1group is  http://202.177.192.119/radio31  2group is  香港电台普通话台  3group is  (DAB版，可于Totem/VLC/MPlayer播放)
>> 1group is  octoshape:rthk.ch1  2group is  香港电台第一台 3group is  (粤)
>> 1group is  octoshape:rthk.ch2  2group is  香港电台第二台 3group is  (粤)
>> 1group is  octoshape:rthk.ch6  2group is  香港电台普通话台 3group is  None
>> 1group is  octoshape:rthk.ch3  2group is  香港电台第三台 3group is  (英)
>>
>> Here is my code:
>> # -*- coding: utf-8 -*-
>> import re
>> rfile=open("tv.txt","r")
>> pat=r'([a-z].+?\s)(.+)(\(.+\))'
>> for  line in  rfile.readlines():
>>      Match=re.match(pat,line)
>>      print "1group is ",Match.group(1),"2group is ",Match.group(2),"3group is ",Match.group(3)
>> rfile.close()
>>
>> The output is:
>> 1group is  http://202.177.192.119/radio5  2group is  香港电台第五台 3group is  (可于Totem/VLC/MPlayer播放)
>> 1group is  http://202.177.192.119/radio35  2group is  香港电台第五台 3group is  (DAB版，可于Totem/VLC/MPlayer播放)
>> 1group is  http://202.177.192.119/radiopth  2group is  香港电台普通话台 3group is  (可于Totem/VLC/MPlayer播放)
>> 1group is  http://202.177.192.119/radio31  2group is  香港电台普通话台 3group is  (DAB版，可于Totem/VLC/MPlayer播放)
>> 1group is  octoshape:rthk.ch1  2group is  香港电台第一台 3group is  (粤)
>> 1group is  octoshape:rthk.ch2  2group is  香港电台第二台 3group is  (粤)
>> 1group is
>> Traceback (most recent call last):
>>    File "tv.py", line 7, in <module>
>>      print "1group is ",Match.group(1),"2group is ",Match.group(2),"3group is ",Match.group(3)
>> AttributeError: 'NoneType' object has no attribute 'group'
>>
>> How do I revise my code to get this output?
>>
>The problem is that the regex makes '(\(.+\))' mandatory, but the 7th
>line (octoshape:rthk.ch6) doesn't match it.
>
>You can make it optional by wrapping it in a non-capturing group
>(?:...), like this:
>
>pat = r'([a-z].+?\s)(.+)(?:(\(.+\)))?'
>
>Also, it's highly recommended that you use raw string literals
>(r'...') when writing regex patterns and replacements.
>
>-- 
>http://mail.python.org/mailman/listinfo/python-list
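One way to get the requested output, sketched in Python 3 (the thread uses Python 2): make the middle group lazy and anchor the pattern with $, so the engine has to keep extending group 2 until the optional "(...)" group plus end of line can match. Checking for None before touching the match object also avoids the AttributeError for any line that does not match. The helper name parse_line and the UTF-8 encoding of tv.txt are assumptions for illustration, not part of the original code.

```python
import re

# group 1: the URL (no whitespace); group 2: the name, matched lazily;
# group 3: an optional "(...)" that, because of $, must end the line
PAT = re.compile(r'(\S+)\s+(.+?)(\(.+\))?$')

def parse_line(line):
    """Return (url, name, note), or None if the line does not match."""
    m = PAT.match(line.rstrip())  # rstrip() drops '\n' (and any '\r')
    if m is None:
        return None
    return m.groups()

# sample lines from tv.txt; a real script would iterate over
# open("tv.txt", encoding="utf-8") instead (UTF-8 is an assumption)
sample = [
    "http://202.177.192.119/radio35 香港电台第五台(DAB版，可于Totem/VLC/MPlayer播放)\n",
    "octoshape:rthk.ch2 香港电台第二台(粤)\n",
    "octoshape:rthk.ch6 香港电台普通话台\n",
    "\n",  # a blank line would also make re.match return None
]

rows = [parse_line(line) for line in sample]
for parts in rows:
    if parts is None:
        continue  # skip non-matching lines instead of calling .group() on None
    url, name, note = parts
    print("1group is", url, "2group is", name, "3group is", note)
```

If station names never contain "(", a negated character class does the same job without the $ anchor, applied to the stripped line: r'(\S+)\s+([^(]+)(\(.*\))?'.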

