Date: Sun, 12 Feb 2012 05:11:30 -0600
From: Andrew Berg
Newsgroups: comp.lang.python
Subject: Re: Python usage numbers

On 2/12/2012 3:12 AM, Steven D'Aprano wrote:
> NTFS by default uses the UTF-16 encoding, which means the actual bytes
> written to disk are \x1d\x040\x04\xe5\x042\x04 (possibly with a leading
> byte-order mark \xff\xfe).
That's what I meant. Those bytes will be interpreted consistently across
all locales.

> Windows has two separate APIs, one for "wide" characters, the other for
> single bytes. Depending on which one you use, the directory will appear
> to be called Наӥв or 0å2.
Yes, and AFAIK, the wide API is the default. The other one only exists
to support programs that don't support the wide API (generally, such
programs were intended to be used on older platforms that lack that API).

> But in any case, we're not talking about the file name encoding. We're
> talking about the contents of files.
Okay then. As I stated, this has nothing to do with the OS since
programs are free to interpret bytes any way they like.

-- 
CPython 3.2.2 | Windows NT 6.1.7601.17640
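
For what it's worth, the point about bytes versus interpretation is easy to check in Python 3. This is only a minimal sketch: it uses the directory name quoted above, and latin-1 is an assumed stand-in for whatever single-byte code page the non-wide API would actually apply on a given system.

```python
import codecs

# Directory name from the quoted text, written with escapes so the
# example does not depend on the source file's encoding: Наӥв
name = "\u041d\u0430\u04e5\u0432"

# UTF-16 little-endian without a byte-order mark.
raw = name.encode("utf-16-le")
print(raw)

# The optional leading byte-order mark mentioned in the quote.
print(codecs.BOM_UTF16_LE)

# The same bytes, interpreted two different ways.
as_utf16 = raw.decode("utf-16-le")   # recovers the original name
as_latin1 = raw.decode("latin-1")    # assumption: stand-in single-byte code page

# Drop control characters to see what a single-byte view would display.
printable = "".join(ch for ch in as_latin1 if ch.isprintable())
print(as_utf16, printable)
```

The bytes never change between the two decodes; only the program's choice of decoding does.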