
Re: Reading \n unescaped from a file

Started by: Chris Angelico <rosuav@gmail.com>
First post: 2015-09-03 08:10 +1000
Last post: 2015-09-03 08:10 +1000
Articles 1 — 1 participant


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#95899 — Re: Reading \n unescaped from a file

From: Chris Angelico <rosuav@gmail.com>
Date: 2015-09-03 08:10 +1000
Subject: Re: Reading \n unescaped from a file
Message-ID: <mailman.40.1441231836.8327.python-list@python.org>

On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhills@medimorphosis.com.au> wrote:
> My mapping file contents look like this:
>
> \r = \\n
> “ = &quot;

Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
likely to have to cope with a whole pile of other mojibake if that
happens :(

You have my sympathy.
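
For anyone following along, here is a minimal sketch of how that kind of mojibake arises. It assumes (the thread does not show the raw bytes) that the mapping file was saved as UTF-8 and then decoded as code page 1252; the variable names are purely illustrative.

```python
# Assumption: the file holds a UTF-8 encoded left double quotation mark.
original = "\u201c"
utf8_bytes = original.encode("utf-8")   # three bytes: E2 80 9C

# Decoding those bytes with the wrong codec gives three unrelated characters.
garbled = utf8_bytes.decode("cp1252")
print(ascii(garbled))

# While no bytes have been lost, the damage can be reversed.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

If this is what is going on, passing encoding="utf-8" to open() when reading the mapping file should avoid the problem at the source.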

> &lt; = <
> &gt; = >
> &#039; = &apos;
> &#070; = F
> &#111; = o
> &#102; = f
> &#101; = e
> &#079; = O
>
> This all works "as advertised" except for the '\r' => '\\n' replacement.
> Debugging the code, I see that my '\r' character is "escaped" to '\\r' and
> the '\\n' to '\\\\n' when they are read in from the file.

Technically, what's happening is that your "\r" is literally a
backslash followed by the letter r; the transformation of backslash
sequences into single characters is part of Python source code
parsing. (Incidentally, why do you want to change a carriage return
into backslash-n? Seems odd.)
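
A small sketch of that distinction, using io.StringIO as a stand-in for the real mapping file (the name fake_file and the splitting on "=" are assumptions for illustration, not the original poster's code):

```python
import io

# The text on disk is: backslash, r, space, =, space, backslash, backslash, n
fake_file = io.StringIO(r"\r = \\n" + "\n")

line = fake_file.readline().rstrip("\n")
key, value = [part.strip() for part in line.split("=")]

# repr() doubles each backslash for display; the string itself is unchanged.
print(len(key), repr(key))      # 2 '\\r'
print(len(value), repr(value))  # 3 '\\\\n'

print(key == "\r")    # False: not a carriage return
print(key == r"\r")   # True: a backslash followed by the letter r
```

So the doubled backslashes seen in the debugger are just repr() at work; nothing is being escaped on the way in from the file.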

Probably the easiest solution would be a simple and naive replace(),
looking for some very specific strings and ignoring everything else.
Easy to do, but potentially confusing down the track if someone tries
something fancy :)

line = line.split('#')[0].strip()  # trim any trailing comments
# repeat this for as many backslash escapes as you want to handle
line = line.replace(r"\r", "\r")

Be aware that this, while simple, is NOT capable of handling escaped
backslashes. In Python source, "\\r" is the same as r"\r" (a backslash
followed by the letter r), but this parser would turn the file text \\r
into a backslash followed by a carriage return, the same as "\\\r". But
it might be sufficient for you.
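
If escaped backslashes do turn out to matter, one possible alternative (not the replace() approach above, but a single-pass regex substitution) is sketched here. The ESCAPES table and both function names are hypothetical, and leaving unknown escapes untouched is a design guess rather than anything the thread specifies.

```python
import re

# Hypothetical escape table; extend it with whatever the file needs.
ESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "t": "\t"}

def naive_unescape(text):
    """The replace() approach: one call per escape, no notion of escaped backslashes."""
    return text.replace(r"\r", "\r").replace(r"\n", "\n")

def unescape(text):
    """Scan left to right once, so a doubled backslash is consumed as a unit."""
    def expand(match):
        # Unknown escapes are returned unchanged rather than guessed at.
        return ESCAPES.get(match.group(1), match.group(0))
    return re.sub(r"\\(.)", expand, text)

file_text = r"\\r"  # three characters in the file: backslash, backslash, r
print(repr(naive_unescape(file_text)))  # backslash + carriage return
print(repr(unescape(file_text)))        # backslash + letter r
```

With this sketch, a mapping line of \r = \\n would yield a carriage return as the key and a literal backslash-n as the value.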

ChrisA
