Re: Reading \n unescaped from a file

Date	Thu, 3 Sep 2015 08:10:28 +1000
Subject	Re: Reading \n unescaped from a file
From	Chris Angelico <rosuav@gmail.com>
Newsgroups	comp.lang.python
Message-ID	<mailman.40.1441231836.8327.python-list@python.org>


On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhills@medimorphosis.com.au> wrote:
> My mapping file contents look like this:
>
> \r = \\n
> â€œ = &quot;

Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
likely to have to cope with a whole pile of other mojibake if that
happens :(

You have my sympathy.
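
To illustrate the diagnosis, here is a minimal sketch of how that kind of mojibake arises, assuming the original character was a left double quotation mark (U+201C) that was saved as UTF-8 and then decoded as code page 1252. The variable names are purely illustrative.

```python
# Assumption: the intended character was U+201C (left double quotation mark).
original = "\u201c"

# Stored on disk as UTF-8, this is three bytes: E2 80 9C.
utf8_bytes = original.encode("utf-8")

# Decoding those bytes as cp1252 yields three unrelated characters.
garbled = utf8_bytes.decode("cp1252")
print(repr(garbled))

# Reversing the mistake recovers the original in this particular case.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

A round trip like this only works when every byte involved maps to a character in cp1252, so it should not be relied on as a general repair.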

> &lt; = <
> &gt; = >
> &#039; = &apos;
> &#070; = F
> &#111; = o
> &#102; = f
> &#101; = e
> &#079; = O
>
> This all works "as advertised" except for the '\r' => '\\n' replacement.
> Debugging the code, I see that my '\r' character is "escaped" to '\\r' and
> the '\\n' to '\\\\n' when they are read in from the file.

Technically, what's happening is that your "\r" is literally a
backslash followed by the letter r; the transformation of backslash
sequences into single characters is part of Python source code
parsing. (Incidentally, why do you want to change a carriage return
into backslash-n? Seems odd.)
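
A small sketch of what that means in practice, using io.StringIO as a stand-in for the real mapping file (the splitting on "=" is an assumption about how the file is parsed, not the original poster's code):

```python
import io

# Simulated mapping file; the raw string keeps the backslashes exactly as
# they would appear on disk.
fake_file = io.StringIO(r"\r = \\n" + "\n")
line = fake_file.readline().rstrip("\n")

key, value = [part.strip() for part in line.split("=")]

# repr() doubles each backslash for display, which is why a debugger
# shows what looks like extra escaping.
print(len(key), repr(key))      # two characters: backslash, then "r"
print(len(value), repr(value))  # three characters: two backslashes, then "n"

# "\r" written in source code is a single carriage-return character,
# so it does not match the two-character text read from the file.
print(key == "\r")
```

Nothing is being escaped on the way in; the file simply never contained a carriage return, and the debugger is displaying the repr() of the two-character string.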

Probably the easiest solution would be a simple and naive replace(),
looking for some very specific strings and ignoring everything else.
Easy to do, but potentially confusing down the track if someone tries
something fancy :)

line = line.split('#')[0].strip()  # trim any trailing comments
# repeat this for as many backslash escapes as you want to handle
line = line.replace(r"\r", "\r")

Be aware that this, while simple, is NOT capable of handling escaped
backslashes. In Python source, "\\r" means the same as r"\r" (a
backslash followed by the letter r), but this parser would turn \\r in
the file into the equivalent of "\\\r" (a backslash followed by a
carriage return). But it might be sufficient for you.
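
If escaped backslashes do matter, one possible alternative is a single left-to-right pass with a regular expression, so that a doubled backslash is consumed before the character after it is examined. This is only a sketch: the ESCAPES table and the unescape() helper are hypothetical names, and the set of supported escapes is an assumption.

```python
import re

# Hypothetical escape table; extend it with whatever sequences the
# mapping file needs to support.
ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}

def unescape(text):
    """Expand backslash escapes in one left-to-right pass."""
    def expand(match):
        char = match.group(1)
        # Unknown escapes are left untouched, backslash included.
        return ESCAPES.get(char, match.group(0))
    # Each match consumes the backslash and the character after it, so
    # the second half of an escaped backslash can never start a new escape.
    return re.sub(r"\\(.)", expand, text)

print(repr(unescape(r"\r")))   # a carriage return
print(repr(unescape(r"\\r")))  # a backslash followed by the letter r
```

Because each match swallows two characters at once, the order in which escapes are handled no longer matters, which is the weakness of chaining replace() calls.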

ChrisA
