From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Reading \n unescaped from a file
Date: Thu, 3 Sep 2015 08:10:28 +1000

On Wed, Sep 2, 2015 at 12:03 PM, Rob Hills wrote:
> My mapping file contents look like this:
>
> \r = \\n
> â€œ = "

Oh, lovely. Code page 1252 when you're expecting UTF-8. Sadly, you're likely to have to cope with a whole pile of other mojibake if that happens :( You have my sympathy.

> < = <
> > = >
> ' = '
> F = F
> o = o
> f = f
> e = e
> O = O
>
> This all works "as advertised" except for the '\r' => '\\n' replacement.
> Debugging the code, I see that my '\r' character is "escaped" to '\\r' and
> the '\\n' to '\\\\n' when they are read in from the file.

Technically, what's happening is that your "\r" is literally a backslash followed by the letter r; the transformation of backslash sequences into single characters is part of Python source code parsing.
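
To see the difference between file text and a source-code literal, here is a minimal sketch. It assumes the mapping line is split on " = " and uses io.StringIO as a stand-in for the real file; both are guesses for illustration, not your actual code.

```python
import io

# Stand-in for the mapping file. The file text on this line is:  \r = \\n
fake_file = io.StringIO("\\r = \\\\n\n")

line = fake_file.readline().rstrip("\n")
key, _, value = line.partition(" = ")  # assumed separator, for illustration only

# The key read from the file is two characters: a backslash and the letter r.
key_len = len(key)
key_is_backslash_r = key == "\\r"
key_is_carriage_return = key == "\r"

# repr() doubles each backslash for display, which is why a debugger
# shows '\\r' and '\\\\n' even though nothing was added to the strings.
key_repr = repr(key)
value_repr = repr(value)
```

So nothing is being escaped on the way in; the doubled backslashes are just how repr() displays a real backslash.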

(Incidentally, why do you want to change a carriage return into backslash-n? Seems odd.)

Probably the easiest solution would be a simple and naive replace(), looking for some very specific strings and ignoring everything else. Easy to do, but potentially confusing down the track if someone tries something fancy :)

line = line.split('#')[:1][0].strip() # trim any trailing comments
line = line.replace(r"\r", "\r") # repeat this for as many backslash escapes as you want to handle

Be aware that this, while simple, is NOT capable of handling escaped backslashes. In Python, "\\r" comes out the same as r"\r", but with this parser, it would come out the same as "\\\r". But it might be sufficient for you.

ChrisA
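
If escaped backslashes ever do matter, one possible way around the limitation is to expand every escape in a single left-to-right pass, so that a doubled backslash is consumed before the character after it is looked at. This is only a sketch: the ESCAPES table and the unescape() name are made up for illustration, and leaving unknown escapes untouched is an arbitrary choice.

```python
import re

# Hypothetical escape table; extend it with whatever the mapping file needs.
ESCAPES = {"\\": "\\", "r": "\r", "n": "\n", "t": "\t"}

def unescape(text):
    """Expand backslash escapes in one left-to-right pass."""
    def expand(match):
        char = match.group(1)
        # Unknown escapes are returned unchanged (a choice made for this sketch).
        return ESCAPES.get(char, match.group(0))
    # Each match consumes the backslash AND the character after it, so the
    # second backslash of a doubled pair can never start a new escape.
    return re.sub(r"\\(.)", expand, text)
```

With this, a backslash followed by r in the file becomes a carriage return, while two backslashes followed by n become a single backslash followed by n, which looks like what the first line of the mapping file is after.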