Date: Sun, 29 May 2011 11:16:35 -0500
From: Andrew Berg <bahamutzero8825@gmail.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.15) Gecko/20110303 Thunderbird/3.1.9 ThunderBrowse/3.3.5
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: Weird problem matching with REs
References: <mailman.2220.1306669538.9059.python-list@python.org>	<4de2459b$0$29996$c3e8da3$5496439d@news.astraweb.com>	<mailman.2222.1306676482.9059.python-list@python.org>	<4de255a8$0$29996$c3e8da3$5496439d@news.astraweb.com>	<mailman.2223.1306679725.9059.python-list@python.org> <1b8d81c1-ab87-4059-ad55-9f4a39331e7d@u26g2000vby.googlegroups.com>
In-Reply-To: <1b8d81c1-ab87-4059-ad55-9f4a39331e7d@u26g2000vby.googlegroups.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.2226.1306685804.9059.python-list@python.org>
Lines: 24
NNTP-Posting-Host: 82.94.164.166
Xref: x330-a1.tempe.blueboxinc.net comp.lang.python:6549

On 2011.05.29 10:48 AM, John S wrote:
> Dots don't match end-of-line-for-your-current-OS is how I think of
> it.
IMO, the docs should say the dot matches any character except a line
feed ('\n'), since that is more accurate.
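
A quick check of that claim, as a minimal sketch against Python 3's re module (the variable names are only for illustration):

```python
import re

dot = re.compile(r'.')

# By default the dot matches a carriage return...
cr_match = dot.match('\r')
# ...but not a line feed.
lf_match = dot.match('\n')

# With the re.DOTALL flag, the dot matches a line feed as well.
dotall_match = re.compile(r'.', re.DOTALL).match('\n')

print(cr_match is not None)      # True
print(lf_match is not None)      # False
print(dotall_match is not None)  # True
```

So on input with '\r\n' line endings, the dot consumes the '\r' and stops only at the '\n'.
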
> True, malformed
> HTML can throw you off, but they can also throw a parser off.
That was part of my point. html.parser.HTMLParser from the standard
library will definitely not work on x264.nl's broken HTML, and fixing it
requires lxml (I'm working with Python 3; I've looked into
BeautifulSoup, and it does not work with Python 3 at all). Admittedly,
fixing x264.nl's HTML only requires one or two lines of code, but really
nasty HTML might require quite a bit of work.
> In your case, and because x264 might change their HTML, I suggest the
> following code, which works great on my system. YMMV. I changed your
> newline matches to use \s and put some capturing parentheses around
> the date, so you could grab it.
I've been meaning to learn how to use parenthesis groups.
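
For reference, a rough sketch of what the suggestion amounts to. The HTML snippet and the date format here are made up for illustration and are not x264.nl's actual markup:

```python
import re

# Hypothetical fragment with Windows-style line endings between tags.
html = '<li>\r\n  <a href="x264.exe">build</a>\r\n  <span>2011-05-29</span>\r\n</li>'

# \s+ absorbs any mix of spaces, '\r' and '\n' between the tags.
# The parentheses form a capturing group around the date.
pattern = re.compile(r'</a>\s+<span>(\d{4}-\d{2}-\d{2})</span>')

match = pattern.search(html)
# group(1) returns the text matched by the first pair of parentheses.
date = match.group(1) if match else None
print(date)  # 2011-05-29
```

Using \s instead of a literal newline means the pattern should keep working if the site changes its line endings or indentation.
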
> Also, be sure to
> use a raw string when composing REs, so you don't run into backslash
> issues.
How would I do that when grabbing strings from a config file (via the
configparser module)? Or rather, if I have a predefined variable
containing a string, how do I change it into a raw string?
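
A small experiment on this point, offered as a sketch rather than a definitive answer. As far as I can tell, the r'' prefix only affects how a string literal in source code is parsed, so a value read through configparser should already contain its backslashes as written. The section and option names below are hypothetical, and the raw triple-quoted literal merely stands in for a config file on disk:

```python
import configparser
import re

# Stand-in for the contents of a config file (hypothetical names).
config_text = r"""
[patterns]
date = (\d{4})-(\d{2})-(\d{2})
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

# The value comes back with its backslashes intact; no conversion step.
pattern_text = parser['patterns']['date']

# A raw literal and an escaped literal are the same string object type
# with the same contents; "raw" is not a property of the string itself.
same = (r'\d' == '\\d')

match = re.search(pattern_text, 'built on 2011-05-29')
print(pattern_text)
print(same)
print(match.groups())
```

One caveat: configparser's default interpolation treats '%' specially, so a pattern containing a percent sign would need '%%' in the file or a parser created with interpolation=None.
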