In-Reply-To: <55E65909.2080507@medimorphosis.com.au>
References: <55E65909.2080507@medimorphosis.com.au>
Date: Thu, 3 Sep 2015 08:10:28 +1000
Subject: Re: Reading \n unescaped from a file
From: Chris Angelico <rosuav@gmail.com>
Cc: "python-list@python.org" <python-list@python.org>
Newsgroups: comp.lang.python
Message-ID: <mailman.40.1441231836.8327.python-list@python.org>

On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhills@medimorphosis.com.au> wrote:
> My mapping file contents look like this:
>
> \r = \\n
> â€œ = &quot;

Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
likely to have to cope with a whole pile of other mojibake if that
happens :(

You have my sympathy.
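
For anyone wondering where those three characters come from, here is a minimal sketch of the mix-up, assuming the original character was a left curly quote (U+201C) saved as UTF-8 and then read back as code page 1252. The variable names are only for illustration.

```python
# U+201C LEFT DOUBLE QUOTATION MARK, written as an escape so this
# snippet does not depend on the encoding of its own source file.
original = "\u201c"

# Encode as UTF-8 (three bytes: E2 80 9C), then wrongly decode those
# bytes as code page 1252, which turns each byte into its own character.
mojibake = original.encode("utf-8").decode("cp1252")

# Undoing the mistake: re-encode as cp1252, decode as UTF-8.
repaired = mojibake.encode("cp1252").decode("utf-8")

print(repr(mojibake))
print(repaired == original)
```

Not every character round-trips this cleanly. Python's cp1252 codec leaves a few byte values undefined, so some mojibake cannot be reversed this way.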

> &lt; = <
> &gt; = >
> &#039; = &apos;
> &#070; = F
> &#111; = o
> &#102; = f
> &#101; = e
> &#079; = O
>
> This all works "as advertised" except for the '\r' => '\\n' replacement.
> Debugging the code, I see that my '\r' character is "escaped" to '\\r' and
> the '\\n' to '\\\\n' when they are read in from the file.

Technically, what's happening is that your "\r" is literally a
backslash followed by the letter r; the transformation of backslash
sequences into single characters is part of Python source code
parsing. (Incidentally, why do you want to change a carriage return
into backslash-n? Seems odd.)
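
To make that concrete, here is a small sketch that uses an in-memory stand-in for the mapping file (io.StringIO, purely an assumption so the example runs without a real file). The reprs it prints are the same '\\r' and '\\\\n' you saw in the debugger: nothing has been escaped, that is just how Python displays a literal backslash.

```python
import io

# Stand-in for a mapping file whose only line is:  \r = \\n
# The raw string keeps the backslashes exactly as they would sit on disk.
fake_file = io.StringIO(r"\r = \\n" + "\n")

line = fake_file.readline().strip()
key, value = [part.strip() for part in line.split("=")]

# The key read from the file is two characters: a backslash and an 'r'.
print(len(key), repr(key))
# The value is three characters: two backslashes and an 'n'.
print(len(value), repr(value))
# A carriage return written in Python source is a single character.
print(len("\r"), repr("\r"))
```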

Probably the easiest solution would be a simple and naive replace(),
looking for some very specific strings and ignoring everything else.
Easy to do, but potentially confusing down the track if someone tries
something fancy :)

line = line.split('#')[0].strip()  # trim any trailing comments
# Repeat this for as many backslash escapes as you want to handle:
line = line.replace(r"\r", "\r")

Be aware that this, while simple, is NOT capable of handling escaped
backslashes. In Python, "\\r" comes out the same as r"\r", but with
this parser, it would come out the same as "\\\r". But it might be
sufficient for you.
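
If escaped backslashes do turn out to matter, one option is to handle every escape in a single left-to-right pass, so that a doubled backslash is consumed before the character after it is looked at. This is only a sketch: the ESCAPES table and the unescape() name are made up for illustration, and it covers just the handful of escapes listed. It also leaves out the comment stripping, since splitting on '#' would truncate entries such as &#039; in your mapping file.

```python
import re

# Hypothetical table of supported escapes; extend it as needed.
ESCAPES = {"r": "\r", "n": "\n", "t": "\t", "\\": "\\"}


def unescape(text):
    """Expand backslash escapes in one left-to-right pass."""
    def expand(match):
        char = match.group(1)
        # Unknown escapes are left exactly as written, backslash included.
        return ESCAPES.get(char, match.group(0))

    # Each match consumes the backslash AND the character after it, so the
    # second backslash of a doubled pair never starts a new escape.
    return re.sub(r"\\(.)", expand, text)


naive = r"\\r".replace(r"\r", "\r")   # backslash followed by carriage return
careful = unescape(r"\\r")            # backslash followed by the letter r
print(repr(naive), repr(careful))
```

With this, a file entry of \r becomes a carriage return, while \\r becomes a literal backslash followed by r, matching what Python source would give you.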

ChrisA