Re: Reading \n unescaped from a file

Date Thu, 3 Sep 2015 08:10:28 +1000
Subject Re: Reading \n unescaped from a file
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python

On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills <rhills@medimorphosis.com.au> wrote:
> My mapping file contents look like this:
>
> \r = \\n
> “ = &quot;

Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're
likely to have to cope with a whole pile of other mojibake if that
happens :(

You have my sympathy.

> &lt; = <
> &gt; = >
> &#039; = &apos;
> &#070; = F
> &#111; = o
> &#102; = f
> &#101; = e
> &#079; = O
>
> This all works "as advertised" except for the '\r' => '\\n' replacement.
> Debugging the code, I see that my '\r' character is "escaped" to '\\r' and
> the '\\n' to '\\\\n' when they are read in from the file.

Technically, what's happening is that your "\r" is literally a
backslash followed by the letter r; the transformation of backslash
sequences into single characters is part of Python source code
parsing. (Incidentally, why do you want to change a carriage return
into backslash-n? Seems odd.)
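
To illustrate the point about source parsing, here is a minimal sketch. It uses an in-memory io.StringIO as a stand-in for the real mapping file (an assumption for the sake of a self-contained example); the sample line is the first one from the quoted file contents. The doubled backslashes seen in the debugger are just how repr() displays a single literal backslash.

```python
import io

# Stand-in for the mapping file; the raw string holds exactly the
# characters that would be on disk: backslash, r, " = ", backslash,
# backslash, n.
fake_file = io.StringIO(r"\r = \\n" + "\n")

line = fake_file.readline().rstrip("\n")
key, value = [part.strip() for part in line.split("=", 1)]

# The key is two characters (a backslash and the letter r), not a
# carriage return. Nothing was escaped on reading; repr() merely shows
# each literal backslash doubled.
print(len(key), repr(key))
print(len(value), repr(value))
print(key == "\r")
```
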

Probably the easiest solution would be a simple and naive replace(),
looking for some very specific strings and ignoring everything else.
Easy to do, but potentially confusing down the track if someone tries
something fancy :)

line = line.split('#', 1)[0].strip()  # trim any trailing comments
# (careful: this also cuts keys such as &#039; short at the '#')
# Repeat the next line for as many backslash escapes as you want to handle.
line = line.replace(r"\r", "\r")

Be aware that this, while simple, is NOT capable of handling escaped
backslashes. In Python, "\\r" comes out the same as r"\r", but with
this parser, it would come out the same as "\\\r". But it might be
sufficient for you.
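
The escaped-backslash limitation can be seen in a short sketch. The names naive_unescape, unescape and ESCAPES are hypothetical, and the single-pass regex version is just one possible alternative, not something proposed in the thread. Scanning left to right means an escaped backslash consumes both of its characters, so the letter after it is left alone.

```python
import re


def naive_unescape(text):
    # The replace() approach: swap specific two-character sequences.
    return text.replace(r"\r", "\r").replace(r"\n", "\n")


# Hypothetical table of supported escapes; extend as needed.
ESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "t": "\t"}


def unescape(text):
    # One left-to-right pass: each backslash is paired with the
    # character that follows it, so "\\" is handled before the
    # following letter is ever considered.
    def replace(match):
        # Unknown escapes are left untouched.
        return ESCAPES.get(match.group(1), match.group(0))

    return re.sub(r"\\(.)", replace, text)


escaped = "\\\\r"  # file content: backslash, backslash, r

print(repr(naive_unescape(escaped)))  # backslash followed by a carriage return
print(repr(unescape(escaped)))        # backslash followed by the letter r
```

Whether the extra machinery is worth it depends on whether the mapping file will ever need a literal backslash.
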

ChrisA
