Path: csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed1.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Sender: Kushal Kumaran <kushal.kumaran@gmail.com>
From: Kushal Kumaran <kushal.kumaran+python@gmail.com>
To: python-list@python.org
Subject: Re: In defence of 80-char lines
In-Reply-To: <roy-D6F29A.08394604042013@news.panix.com>
References: <515cd919$0$29966$c3e8da3$5496439d@news.astraweb.com> <mailman.96.1365077619.3114.python-list@python.org> <roy-D6F29A.08394604042013@news.panix.com>
User-Agent: Notmuch/0.13.2 (http://notmuchmail.org) Emacs/24.1.1 (x86_64-pc-linux-gnu)
Date: Thu, 04 Apr 2013 23:04:07 +0530
MIME-Version: 1.0
Content-Type: text/plain
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.111.1365096868.3114.python-list@python.org>
Lines: 63
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:42770

Roy Smith <roy@panix.com> writes:

> In article <mailman.96.1365077619.3114.python-list@python.org>,
>  Jason Swails <jason.swails@gmail.com> wrote:
>
>> The only time I regularly break my rule is for regular expressions (at some
>> point I may embrace re.X to allow me to break those up, too).
>
> re.X is a pretty cool tool for making huge regexes readable.  But, it 
> turns out that python's auto-continuation and string literal 
> concatenation rules are enough to let you get much the same effect.  
> Here's a regex we use to parse haproxy log files. This would be utter 
> line noise all run together. This way, it's almost readable :-)
>
> pattern = re.compile(r'haproxy\[(?P<pid>\d+)]: '
>                      r'(?P<client_ip>(\d{1,3}\.){3}\d{1,3}):'
>                      r'(?P<client_port>\d{1,5}) '
>                      r'\[(?P<accept_date>\d{2}/\w{3}/\d{4}(:\d{2}){3}\.\d{3})] '
>                      r'(?P<frontend_name>\S+) '
>                      r'(?P<backend_name>\S+)/'
>                      r'(?P<server_name>\S+) '
>                      r'(?P<Tq>(-1|\d+))/'
>                      r'(?P<Tw>(-1|\d+))/'
>                      r'(?P<Tc>(-1|\d+))/'
>                      r'(?P<Tr>(-1|\d+))/'
>                      r'(?P<Tt>\+?\d+) '
>                      r'(?P<status_code>\d{3}) '
>                      r'(?P<bytes_read>\d+) '
>                      r'(?P<captured_request_cookie>\S+) '
>                      r'(?P<captured_response_cookie>\S+) '
>                      r'(?P<termination_state>[\w-]{4}) '
>                      r'(?P<actconn>\d+)/'
>                      r'(?P<feconn>\d+)/'
>                      r'(?P<beconn>\d+)/'
>                      r'(?P<srv_conn>\d+)/'
>                      r'(?P<retries>\d+) '
>                      r'(?P<srv_queue>\d+)/'
>                      r'(?P<backend_queue>\d+) '
>                      r'(\{(?P<request_id>.*?)\} )?'
>                      r'(\{(?P<captured_request_headers>.*?)\} )?'
>                      r'(\{(?P<captured_response_headers>.*?)\} )?'
>                      r'"(?P<http_request>.+)"'
>                      )
>
> And, for those of you who go running in the other direction every time 
> regex is suggested as a solution, I challenge you to come up with easier 
> to read (or write) code for parsing a line like this (probably 
> hopelessly mangled by the time you read it):
>
> 2013-04-03T00:00:00+00:00 localhost haproxy[5199]: 10.159.19.244:57291 
> [02/Apr/2013:23:59:59.811] app-nodes next-song-nodes/web8.songza.com 
> 0/0/3/214/219 200 593 sessionid=NWiX5KGOdvg6dSaA 
> sessionid=NWiX5KGOdvg6dSaA ---- 249/249/149/14/0 0/0 
> {4C0ABFA9-515B6DEF-933229} "POST 
> /api/1/station/892337/song/16024201/notify-play HTTP/1.0"

Is using csv.DictReader with delimiter=' ' not sufficient for this?  I
did not actually read the regular expression in its entirety.
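
A rough sketch of the csv.DictReader idea, tried against the sample
line quoted above (rejoined onto one line). The column names in FIELDS
are made up for illustration, one per space-separated token; they are
not taken from haproxy's documentation. The sketch relies on the csv
module's default quotechar ('"') to keep the quoted HTTP request
together as a single field.

```python
import csv
import io

# Sample log line from the quoted message, rejoined onto a single line.
LINE = (
    '2013-04-03T00:00:00+00:00 localhost haproxy[5199]: '
    '10.159.19.244:57291 [02/Apr/2013:23:59:59.811] app-nodes '
    'next-song-nodes/web8.songza.com 0/0/3/214/219 200 593 '
    'sessionid=NWiX5KGOdvg6dSaA sessionid=NWiX5KGOdvg6dSaA ---- '
    '249/249/149/14/0 0/0 {4C0ABFA9-515B6DEF-933229} '
    '"POST /api/1/station/892337/song/16024201/notify-play HTTP/1.0"'
)

# Hypothetical column names (an assumption), one per space-separated token.
FIELDS = [
    'timestamp', 'host', 'process', 'client', 'accept_date',
    'frontend_name', 'backend_server', 'timers', 'status_code',
    'bytes_read', 'captured_request_cookie', 'captured_response_cookie',
    'termination_state', 'conns', 'queues', 'request_id', 'http_request',
]


def parse(line):
    """Split one log line on spaces and label the tokens with FIELDS."""
    reader = csv.DictReader(io.StringIO(line), fieldnames=FIELDS,
                            delimiter=' ')
    return next(reader)


row = parse(LINE)

# Composite tokens still need a second pass to reach the same
# granularity as the regex's named groups.
client_ip, client_port = row['client'].rsplit(':', 1)
timers = row['timers'].split('/')

print(row['status_code'], client_ip, client_port, timers)
print(row['http_request'])
```

This gets the tokens out, but it is not a drop-in replacement for the
regex: fields such as the client address, the timers and the
backend/server pair need further splitting, and the three brace groups
that the regex treats as optional would shift the columns whenever one
of them is absent or, presumably, contains a space.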

-- 
regards,
kushal