From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Unrecognized backslash escapes in string literals
Date: Mon, 23 Feb 2015 13:55:48 +1100
Cc: "python-list@python.org"
In-Reply-To: <85vbit73jd.fsf@benfinney.id.au>
References: <85vbit73jd.fsf@benfinney.id.au>

On Mon, Feb 23, 2015 at 1:41 PM, Ben Finney wrote:
> Chris Angelico writes:
>
>> Why is it that Python interprets them this way, and doesn't even give
>> a warning?
>
> Because the interpretation of those literals is unambiguous and correct.

And it also implies that never, in the entire infinite future of
Python development, will any additional escapes be invented - because
then it'd be ambiguous (in versions up to X, "\s" means "\\s", and
after that, "\s" means something else).

> It's unfortunate that MS Windows inherited the incompatible “backslash
> is a path separator”, long after backslash was already established in
> many programming languages as the escape character.

I agree, the fault is primarily with Windows. But I've seen similar
issues when people use /-\| for box drawing and framing and such;
Windows paths are by far the most common case of this, but not the
only one.

>> Is there a way to enable such warnings/errors?
>
> A warning or error for a correctly formatted literal with an unambiguous
> meaning would be an un-Pythonic thing to have.
> ...
> This has the advantage that it's the same escape character used for text
> string literals in virtually every other programming language, so you're
> not needing to learn anything unusual.

And yet the treatment of the edge case differs. In C, for instance,
you get a compiler warning, and then the backslash is removed and
you're left with just the other character.

The trouble isn't that people need to learn that backslashes are
special in Python string literals. The trouble is that, especially
when file names are frequently being written with uppercase first
letters, it's very easy to have code that just so happens to work,
without being reliable. To spend some time working with paths like
these:

fn = "C:\Foo\Bar\Asdf.ext"

and then to find that each of these fails, but in a different way:

path = "C:\Foo\Bar\"; fn = path + "Asdf.ext"
fn = "c:\foo\bar\asdf.ext"
fn = "c:\users\myname\blah"

would surely count as surprising.
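
To see the different failure modes side by side, here is a minimal sketch, assuming Python 3. The helper name run_snippet is made up for illustration. Each snippet is held in a raw string so that it reaches the compiler exactly as typed, and warnings are silenced because newer Python 3 releases do warn about unrecognised escapes.

```python
import warnings


def run_snippet(source):
    """Compile and run one snippet; return (status, value bound to fn or error text).

    run_snippet is a hypothetical helper, not part of any library.
    """
    namespace = {}
    with warnings.catch_warnings():
        # Newer Python 3 releases warn about unrecognised escapes such as "\F".
        warnings.simplefilter("ignore")
        try:
            exec(compile(source, "<snippet>", "exec"), namespace)
        except SyntaxError as exc:
            return "SyntaxError", exc.msg
    return "ok", namespace["fn"]


# Raw strings keep every backslash, so the compiler sees the text as typed.
snippets = [
    r'fn = "C:\Foo\Bar\Asdf.ext"',                    # works, but only by luck
    r'path = "C:\Foo\Bar\"; fn = path + "Asdf.ext"',  # \" swallows the closing quote
    r'fn = "c:\foo\bar\asdf.ext"',                    # \f, \b, \a are real escapes
    r'fn = "c:\users\myname\blah"',                   # \u starts a Unicode escape
]

for source in snippets:
    status, detail = run_snippet(source)
    print(status, repr(detail))
```

Under these assumptions the first snippet yields the intended path, the second and fourth fail to compile, and the third compiles quietly but produces a string containing form feed, backspace and bell characters instead of backslashes.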
The last of those literals, "c:\users\myname\blah", is the most
surprising: it works fine in Python 2 sans unicode_literals, and then
blows up in Python 3 - because, contrary to the "no additional
escapes" assumption, Unicode strings introduced new escapes, which
means that "\u0123" has a different meaning in byte strings and
Unicode strings. In fact, that's an exception to the usual rule of
"upper case is safe", and it's one that *will* trip people up, thanks
to the "C:\Users" directory on a modern Windows system. What's the
betting people will blame the failure on Python 3 and/or Unicode,
rather than on the sloppy use of escapes and the poor choice of path
separator on a popular platform?

ChrisA
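
For reference, a minimal sketch of the byte-string versus Unicode-string difference, assuming Python 3, where a b"..." literal stands in for the default string type of Python 2. The literals are evaluated from raw strings so the backslashes arrive untouched, and warnings are silenced because "\u" in a bytes literal counts as an unrecognised escape.

```python
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    as_bytes = eval(r'b"\u0123"')  # bytes literal: \u is not an escape here
    as_text = eval(r'"\u0123"')    # text literal: \u0123 is one code point

print(len(as_bytes), as_bytes)          # six bytes: backslash, u, 0, 1, 2, 3
print(len(as_text), hex(ord(as_text)))  # one character, code point 0x123

# Upper case does not help in a text literal: \U expects eight hex digits.
try:
    eval(r'"C:\Users"')
    users_result = "ok"
except SyntaxError:
    users_result = "SyntaxError"
print(users_result)
```

The same six characters of source therefore give six bytes in one kind of literal and a single character in the other, and the "C:\Users" literal is rejected outright as a text string.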