From: Peter Otten <__peter__@web.de>
Newsgroups: comp.lang.python
Subject: Re: string encoding regex problem
Date: Sat, 23 Aug 2014 23:13:07 +0200

Philipp Kraus wrote:

> I have created a short script:
>
> ---------
> #!/usr/bin/env python
>
> import re, urllib2
>
>
> def URLReader(url) :
>     f = urllib2.urlopen(url)
>     data = f.read()
>     f.close()
>     return data
>
>
> print re.match( "\.*\<\/small\>",
> URLReader("http://sourceforge.net/projects/boost/") )
> ---------
>
> Within the data the string "boost_1_56_0.tar.gz" should
> be matched, but I always get a None result from re.match; re.search
> also returns None.

>>> help(re.match)
Help on function match in module re:

match(pattern, string, flags=0)
    Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found.

As the string doesn't start with your regex, re.match() is clearly wrong,
but re.search() works for me:

>>> import re, urllib2
>>>
>>>
>>> def URLReader(url) :
...     f = urllib2.urlopen(url)
...     data = f.read()
...     f.close()
...     return data
...
>>> data = URLReader("http://sourceforge.net/projects/boost/")
>>> re.search("\.*\<\/small\>", data)
<_sre.SRE_Match object at 0x7f282dd58718>
>>> _.group()
'boost_1_56_pdf.7z'

> I have tested the regex under http://regex101.com/ with the HTML code
> and on the page the regex is matched.
>
> Can you help me please to fix the problem, I don't understand that the
> match returns None
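
To make the re.match() / re.search() difference concrete without hitting the network, here is a minimal sketch. The HTML snippet and the pattern are made up for illustration (assumptions, not the real SourceForge page or the pattern from the script above); only the anchoring behaviour is the point.

```python
import re

# Hypothetical stand-in for the downloaded page; the real HTML is not
# reproduced in this thread.
html = "<html><body><div><small>boost_1_56_0.tar.gz</small></div></body></html>"

# Assumed pattern, for illustration only.
pattern = r"<small>(.*?)</small>"

# re.match() only tries the pattern at position 0 of the string.
anchored = re.match(pattern, html)

# re.search() scans the string and returns the first match anywhere in it.
anywhere = re.search(pattern, html)

print(anchored)           # None, because the string starts with "<html>"
print(anywhere.group(1))  # boost_1_56_0.tar.gz
```

The single-argument print() calls should behave the same under Python 2 and Python 3. If re.search() still returns None on the real data, the page content probably differs from what was pasted into regex101, so it is worth checking the downloaded string itself.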