From: eryk sun
Newsgroups: comp.lang.python
Subject: Re: cannot open file with non-ASCII filename
Date: Mon, 14 Dec 2015 12:45:31 -0600

On Mon, Dec 14, 2015 at 10:24 AM, Ulli Horlacher wrote:
> With Python 2.7.11 on Windows 7 my users cannot open/read files with
> non-ASCII filenames.
[...]
> c = msvcrt.getch()

This isn't an issue with Python per se, and the same problem exists in
Python 3, using either getch or getwch. Microsoft's getwch function
isn't designed to handle the variety of ways the console host
(conhost.exe) encodes Unicode keyboard events. Their implementation
calls ReadConsoleInput and looks for a KEY_EVENT. If bKeyDown is set,
it grabs the UnicodeChar field. In an ideal world it would be that
simple.

However, the console also supports the alt+numpad sequences that allow
entering characters by code. So the input event sequence, for example,
could be

    +VK_MENU, +VK_NUMPAD7, -VK_NUMPAD7, +VK_NUMPAD6, -VK_NUMPAD6, -VK_MENU

which is an "L" (code 76). (Denoting "+" as key down and "-" as key
up.) The numpad code may just be the closest approximation in the
system locale's codepage (ANSI). That doesn't matter, because the
actual Unicode codepoint is set in the last event's UnicodeChar field.
That last event is a key-up event, though, so a reader that only
accepts events with bKeyDown set never sees the character.

Try using the pyreadline module. IIRC, it does a better job decoding
the events from ReadConsoleInput.
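
To make the difference concrete, here is a minimal pure-Python sketch of the two decoding strategies. It does not call the Windows API. KeyEvent is a hypothetical stand-in for the KEY_EVENT_RECORD fields mentioned above, naive_decode and alt_aware_decode are made-up names, and the assumption that the intermediate numpad events carry a NUL UnicodeChar is mine, not something taken from Microsoft's source.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for the KEY_EVENT_RECORD fields
# (bKeyDown, wVirtualKeyCode, UnicodeChar) returned by ReadConsoleInput.
KeyEvent = namedtuple("KeyEvent", "key_down vk unicode_char")

# Windows virtual-key codes used in the example sequence.
VK_MENU = 0x12      # Alt
VK_NUMPAD6 = 0x66
VK_NUMPAD7 = 0x67


def naive_decode(events):
    """Collect characters from key-down events only (the approach
    the post attributes to getwch)."""
    return "".join(e.unicode_char for e in events
                   if e.key_down and e.unicode_char != "\0")


def alt_aware_decode(events):
    """Also accept the character carried by the Alt key-up event
    that ends an alt+numpad sequence."""
    chars = []
    for e in events:
        if e.unicode_char == "\0":
            continue  # no character attached to this event
        if e.key_down or e.vk == VK_MENU:
            chars.append(e.unicode_char)
    return "".join(chars)


# The sequence from the post: alt+76, which yields "L".
# Assumption: only the final event carries a non-NUL UnicodeChar.
alt_76 = [
    KeyEvent(True, VK_MENU, "\0"),
    KeyEvent(True, VK_NUMPAD7, "\0"),
    KeyEvent(False, VK_NUMPAD7, "\0"),
    KeyEvent(True, VK_NUMPAD6, "\0"),
    KeyEvent(False, VK_NUMPAD6, "\0"),
    KeyEvent(False, VK_MENU, "L"),
]

if __name__ == "__main__":
    print(repr(naive_decode(alt_76)))      # character is missed
    print(repr(alt_aware_decode(alt_76)))  # character is recovered
```

Ordinary key-up events are still ignored by alt_aware_decode, so a normally typed key is not reported twice. A real implementation would have to read the events with ReadConsoleInput (for example via ctypes), which this sketch leaves out.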