From: random832@fastmail.us
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: Unicode Chars in Windows Path
Date: Thu, 03 Apr 2014 09:57:51 -0400
In-Reply-To: <87fvluss86.fsf@elektro.pacujo.net>
References: <533cc967$0$2909$c3e8da3$76491128@news.astraweb.com> <87fvluss86.fsf@elektro.pacujo.net>

On Thu, Apr 3, 2014, at 5:00, Marko Rauhamaa wrote:
> In fact, proper dealing with punctuation in pathnames is one of the main
> reasons to migrate to Python from bash.
> Even if it is often possible to
> write bash scripts that handle arbitrary pathnames correctly, few script
> writers are pedantic enough to do it properly. For example, newlines in
> filenames are bound to confuse 99.9% of bash scripts.

Incidentally, these rules mean there are different norms about how
command line arguments are parsed on Windows. Since * and ? are not
allowed in filenames, you don't have to care whether they were quoted.
An argument [in a position where a list of filenames is expected] with *
or ? in it _always_ gets globbed, so "C:\dir with spaces\*.txt" can be
used. This is part of the reason the program is responsible for globbing
rather than the shell - because only the program knows if it expects a
list of filenames in that position vs a text string for some other
purpose.

This is unfortunate, because it means that most Python programs do not
handle filename patterns at all (expecting the shell to do it for them).
It would be nice if there were a cross-platform way to do this.

Native Windows wildcards are also weird in a number of ways not emulated
by the glob module. Most of these are not expected by users, but some
users may expect, for example, *.htm to match files ending in .html; *.*
to also match files with no dot in them; and *. to match _only_ files
with no dot in them. The latter two are guaranteed by the Windows API;
the first is merely common due to default shortname settings. Also,
native Windows wildcards do not support [character classes].
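
As a rough illustration of the program doing its own globbing, here is a
minimal sketch using only the glob module. The name expand_args is
hypothetical (it is not an existing API), and keeping an unmatched
pattern as a literal argument is my assumption, not a rule from anywhere.

```python
import glob


def expand_args(args):
    """Expand * and ? in each argument; pass everything else through.

    expand_args is a hypothetical helper, not part of the standard library.
    """
    expanded = []
    for arg in args:
        if "*" in arg or "?" in arg:
            # "[[]" makes "[" literal, so brackets are not character classes.
            matches = sorted(glob.glob(arg.replace("[", "[[]")))
            # Assumption: an unmatched pattern is kept as a literal argument.
            expanded.extend(matches or [arg])
        else:
            expanded.append(arg)
    return expanded
```

A script would call it as expand_args(sys.argv[1:]). One caveat: under a
POSIX shell the arguments have usually been expanded already, so a real
file whose name contains a literal * or ? would get globbed a second
time. That is exactly the ambiguity that cannot arise on Windows.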
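
The *.* and *. behaviour can be approximated on top of fnmatch. This is
only a sketch: win_match is a hypothetical name, and the "implicit
trailing dot" trick is my simplification, not the actual algorithm
Windows uses. It does not attempt the shortname-based *.htm case at all.

```python
import fnmatch


def win_match(name, pattern):
    """Approximate two native Windows wildcard quirks (hypothetical helper)."""
    # Windows matching is case-insensitive and has no [character classes].
    name = name.lower()
    pattern = pattern.lower().replace("[", "[[]")
    candidates = [name]
    if "." not in name:
        # Assumption: treat a dotless name as if it ended in an implicit ".".
        candidates.append(name + ".")
    return any(fnmatch.fnmatchcase(c, pattern) for c in candidates)
```

With this, win_match("README", "*.*") and win_match("README", "*.") are
both true, while win_match("a.txt", "*.") is false, which is the
behaviour described above for the two guaranteed cases.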