cannot open file with non-ASCII filename

Started by: Ulli Horlacher <framstag@rus.uni-stuttgart.de>
First post: 2015-12-14 16:24 +0000
Last post: 2015-12-18 01:37 -0500
Articles: 19 (8 participants)

Contents

  cannot open file with non-ASCII filename Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-12-14 16:24 +0000
    Re: cannot open file with non-ASCII filename Terry Reedy <tjreedy@udel.edu> - 2015-12-14 13:34 -0500
      Re: cannot open file with non-ASCII filename wxjmfauth@gmail.com - 2015-12-14 11:07 -0800
    Re: cannot open file with non-ASCII filename eryk sun <eryksun@gmail.com> - 2015-12-14 12:45 -0600
    Re: cannot open file with non-ASCII filename Laura Creighton <lac@openend.se> - 2015-12-14 19:51 +0100
      Re: cannot open file with non-ASCII filename Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-12-14 22:11 +0000
        Re: cannot open file with non-ASCII filename Thomas 'PointedEars' Lahn <PointedEars@web.de> - 2015-12-14 23:41 +0100
          Re: cannot open file with non-ASCII filename Laura Creighton <lac@openend.se> - 2015-12-15 01:07 +0100
          Re: cannot open file with non-ASCII filename eryk sun <eryksun@gmail.com> - 2015-12-14 21:20 -0600
    Re: cannot open file with non-ASCII filename eryk sun <eryksun@gmail.com> - 2015-12-14 17:55 -0600
    Re: cannot open file with non-ASCII filename Laura Creighton <lac@openend.se> - 2015-12-15 01:13 +0100
      Re: cannot open file with non-ASCII filename Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-12-15 08:26 +0000
        Re: cannot open file with non-ASCII filename Laura Creighton <lac@openend.se> - 2015-12-15 15:09 +0100
        Re: cannot open file with non-ASCII filename eryk sun <eryksun@gmail.com> - 2015-12-15 09:34 -0600
          Re: cannot open file with non-ASCII filename Ulli Horlacher <framstag@rus.uni-stuttgart.de> - 2015-12-15 17:04 +0000
            Re: cannot open file with non-ASCII filename eryk sun <eryksun@gmail.com> - 2015-12-16 21:39 -0600
            Re: cannot open file with non-ASCII filename smap <askme.first@thankyouverymuch.invalid> - 2015-12-18 21:15 +0000
    cannot open file with non-ASCII filename bearmingo <bearmingo@gmail.com> - 2015-12-17 21:12 -0800
      Re: cannot open file with non-ASCII filename Terry Reedy <tjreedy@udel.edu> - 2015-12-18 01:37 -0500

#100414 — cannot open file with non-ASCII filename

From: Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date: 2015-12-14 16:24 +0000
Subject: cannot open file with non-ASCII filename
Message-ID: <n4mqgm$v1d$1@news2.informatik.uni-stuttgart.de>
With Python 2.7.11 on Windows 7 my users cannot open/read files with
non-ASCII filenames. They use the Windows explorer to drag&drop files into
a console window running the Python program.
os.path.exists() does not detect such a file and an open() fails, too.

My code:


  print("\nDrag&drop files or directories into this window.")
  system('explorer "%s"' % HOME)
  file = get_paste()
  if not(os.path.exists(file)): die('"%s" does not exist' % file)


def get_paste():
  import msvcrt
  while True:
    c = msvcrt.getch()
    if c == '\t': return ''
    if c == '\003' or c == '\004': return None
    if not (c == '\n' or c == '\r'): break
  paste = c
  while msvcrt.kbhit():
    c = msvcrt.getch()
    if c == '\n' or c == '\r': break
    paste += c
  if match(r'\s',paste): paste = subst('^"(.+)"$',r'\1',paste)
  return paste


-- 
Ullrich Horlacher              Server und Virtualisierung
Rechenzentrum IZUS/TIK         E-Mail: horlacher@tik.uni-stuttgart.de
Universitaet Stuttgart         Tel:    ++49-711-68565868
Allmandring 30a                Fax:    ++49-711-682357
70550 Stuttgart (Germany)      WWW:    http://www.tik.uni-stuttgart.de/



#100417

From: Terry Reedy <tjreedy@udel.edu>
Date: 2015-12-14 13:34 -0500
Message-ID: <mailman.12.1450118127.14916.python-list@python.org>
In reply to: #100414
On 12/14/2015 11:24 AM, Ulli Horlacher wrote:
> With Python 2.7.11 on Windows 7 my users cannot open/read files with
> non-ASCII filenames.

Right.  They should either restrict themselves to ascii (or possibly 
latin-1) filenames or use current 3.x.  This is one of the (known) 
unicode problems fixed in 3.x by making unicode the core text class, 
replacing the implementation of unicode, and performing further work 
with the new implementation.

-- 
Terry Jan Reedy



#100422

From: wxjmfauth@gmail.com
Date: 2015-12-14 11:07 -0800
Message-ID: <f5a442d3-a17e-4f67-b866-5d77605c4462@googlegroups.com>
In reply to: #100417
Le lundi 14 décembre 2015 19:35:49 UTC+1, Terry Reedy a écrit :
> On 12/14/2015 11:24 AM, Ulli Horlacher wrote:
> > With Python 2.7.11 on Windows 7 my users cannot open/read files with
> > non-ASCII filenames.
> 
> Right.  They should either restrict themselves to ascii (or possibly 
> latin-1) filenames or use current 3.x.  This is one of the (known) 
> unicode problems fixed in 3.x by making unicode the core text class, 
> replacing the implementation of unicode, and performing further work 
> with the new implementation.
> 
> -- 
> Terry Jan Reedy

Sorry, but no.

>>> sys.version
'2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]'
>>> with open(r'd:\éüÄñoe.txt', 'r') as f:
...     r = f.read()
...
>>> print r, len(r)
éabcéoe EURO z
9
>>>




#100419

From: eryk sun <eryksun@gmail.com>
Date: 2015-12-14 12:45 -0600
Message-ID: <mailman.14.1450118773.14916.python-list@python.org>
In reply to: #100414
On Mon, Dec 14, 2015 at 10:24 AM, Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:
> With Python 2.7.11 on Windows 7 my users cannot open/read files with
> non-ASCII filenames.
[...]
>     c = msvcrt.getch()

This isn't an issue with Python per se, and the same problem exists in
Python 3, using either getch or getwch. Microsoft's getwch function
isn't designed to handle the variety of ways the console host
(conhost.exe) encodes Unicode keyboard events. Their implementation
calls ReadConsoleInput and looks for a KEY_EVENT. If bKeyDown is set
it grabs the UnicodeChar field.

In an ideal world it would be that simple. However, the console
literally supports the alt+numpad sequences that allow entering
characters by code. So the input event sequence, for example, could be
+VK_MENU, +VK_NUMPAD7, -VK_NUMPAD7, +VK_NUMPAD6, -VK_NUMPAD6,
-VK_MENU, which is an "L". (Denoting "+" as key down and "-" as key
up.) This may just be the closest approximation in the system locale's
codepage (ANSI). That doesn't matter because the actual Unicode
codepoint is set in the last event's UnicodeChar field.

Try using the pyreadline module. IIRC, it does a better job decoding
the events from ReadConsoleInput.
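
The rule described above can be sketched without touching the console API at all. This is a minimal illustration, not Microsoft's or pyreadline's implementation. It assumes each key event has already been read (for example with ReadConsoleInputW) and reduced to a hypothetical `(key_down, virtual_key, unicode_char)` tuple, and it assumes the numpad events inside an alt+numpad sequence carry an empty character. The function name `text_from_key_events` is made up for this example.

```python
VK_MENU = 0x12      # virtual-key code of the Alt key
VK_NUMPAD6 = 0x66
VK_NUMPAD7 = 0x67


def text_from_key_events(events):
    """Collect the characters carried by a sequence of key events.

    events: iterable of (key_down, virtual_key, unicode_char) tuples
    (a hypothetical, simplified view of KEY_EVENT_RECORD).
    """
    chars = []
    for key_down, virtual_key, unicode_char in events:
        if not unicode_char or unicode_char == u'\x00':
            continue  # no character attached to this event
        # Ordinary keys deliver their character on key down. An alt+numpad
        # sequence delivers it on the final key up of VK_MENU instead.
        if key_down or virtual_key == VK_MENU:
            chars.append(unicode_char)
    return u''.join(chars)


# The alt+76 sequence from the post: only the last event carries a character.
alt_numpad_L = [
    (True, VK_MENU, u''),
    (True, VK_NUMPAD7, u''), (False, VK_NUMPAD7, u''),
    (True, VK_NUMPAD6, u''), (False, VK_NUMPAD6, u''),
    (False, VK_MENU, u'L'),
]
print(text_from_key_events(alt_numpad_L))
```

A reader that only looks at key-down events, as getwch is described as doing, never sees the character attached to the final key-up event.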



#100420

From: Laura Creighton <lac@openend.se>
Date: 2015-12-14 19:51 +0100
Message-ID: <mailman.15.1450119109.14916.python-list@python.org>
In reply to: #100414
In a message of Mon, 14 Dec 2015 13:34:56 -0500, Terry Reedy writes:
>On 12/14/2015 11:24 AM, Ulli Horlacher wrote:
>> With Python 2.7.11 on Windows 7 my users cannot open/read files with
>> non-ASCII filenames.
>
>Right.  They should either restrict themselves to ascii (or possibly 
>latin-1) filenames or use current 3.x.  This is one of the (known) 
>unicode problems fixed in 3.x by making unicode the core text class, 
>replacing the implementation of unicode, and performing further work 
>with the new implementation.
>
>-- 
>Terry Jan Reedy
>

Given that Ulli is in Germany, latin-1 is likely to work fine for him.  And
you do it like this:

# -*- coding: latin-1 -*-
from Tkinter import *
root = Tk()
s = 'Välkommen till Göteborg'  # Welcome to Gothenburg (where I live)
u = unicode(s, 'iso8859-1')
Label(root, text=u).pack()

root.mainloop()

Laura




#100431

From: Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date: 2015-12-14 22:11 +0000
Message-ID: <n4nepp$4nb$1@news2.informatik.uni-stuttgart.de>
In reply to: #100420
Laura Creighton <lac@openend.se> wrote:

> Given that Ulli is in Germany, latin-1 is likely to work fine for him. 

For me, but not for my users. We have people from about 100 nations at our
university. 


> And you do it like this:
> 
> # -*- coding: latin-1 -*-
> from Tkinter import *
> root = Tk()
> s = 'Välkommen till Göteborg'  # Welcome to Gothenburg (where I live)
> u = unicode(s, 'iso8859-1')
> Label(root, text=u).pack()

The problem is the input of these filenames.





#100432

From: Thomas 'PointedEars' Lahn <PointedEars@web.de>
Date: 2015-12-14 23:41 +0100
Message-ID: <4412672.gUIyRlH8Kf@PointedEars.de>
In reply to: #100431
Ulli Horlacher wrote:

> Laura Creighton <lac@openend.se> wrote:
>> Given that Ulli is in Germany, latin-1 is likely to work fine for him.
> 
> For me, but not for my users. We have people from about 100 nations at our
> university.
> […]
> The problem is the input of these filenames.

Why do you have to use msvcrt?

I would use curses for user input, but:

,-<https://docs.python.org/2/howto/curses.html?highlight=user%20input>
,-<https://docs.python.org/3.2/howto/curses.html?highlight=user%20input>
| 
| No one has made a Windows port of the curses module. On a Windows 
| platform, try the Console module written by Fredrik Lundh. The Console 
| module provides cursor-addressable text output, plus full support for 
| mouse and keyboard input, and is available from 
| http://effbot.org/zone/console-index.htm.

So you should try that instead.

-- 
PointedEars

Twitter: @PointedEars2
Please do not cc me. / Bitte keine Kopien per E-Mail.



#100441

From: Laura Creighton <lac@openend.se>
Date: 2015-12-15 01:07 +0100
Message-ID: <mailman.7.1450138083.22044.python-list@python.org>
In reply to: #100432
In a message of Mon, 14 Dec 2015 23:41:21 +0100, "Thomas 'PointedEars' Lahn" writes:

>Why do you have to use msvcrt?
>
>I would use curses for user input, but:
>
>,-<https://docs.python.org/2/howto/curses.html?highlight=user%20input>
>,-<https://docs.python.org/3.2/howto/curses.html?highlight=user%20input>
>| 
>| No one has made a Windows port of the curses module. On a Windows 
>| platform, try the Console module written by Fredrik Lundh. The Console 
>| module provides cursor-addressable text output, plus full support for 
>| mouse and keyboard input, and is available from 
>| http://effbot.org/zone/console-index.htm.
>
>So you should try that instead.

If going for curses, I'd try this instead:
http://pdcurses.sourceforge.net/

Laura



#100446

From: eryk sun <eryksun@gmail.com>
Date: 2015-12-14 21:20 -0600
Message-ID: <mailman.10.1450149661.22044.python-list@python.org>
In reply to: #100432
On Mon, Dec 14, 2015 at 6:07 PM, Laura Creighton <lac@openend.se> wrote:
> In a message of Mon, 14 Dec 2015 23:41:21 +0100, "Thomas 'PointedEars' Lahn" writes:
>
>>Why do you have to use msvcrt?
>>
>>I would use curses for user input, but:
>>
>>,-<https://docs.python.org/2/howto/curses.html?highlight=user%20input>
>>,-<https://docs.python.org/3.2/howto/curses.html?highlight=user%20input>
>>|
>>| No one has made a Windows port of the curses module. On a Windows
>>| platform, try the Console module written by Fredrik Lundh. The Console
>>| module provides cursor-addressable text output, plus full support for
>>| mouse and keyboard input, and is available from
>>| http://effbot.org/zone/console-index.htm.
>>
>>So you should try that instead.
>
> If going for curses, I'd try this instead:
> http://pdcurses.sourceforge.net/

Christoph Gohlke has an extension module based on PDCurses [1]. The
good news for Python 3 users is that it uses the [W]ide-character
console API, such as ReadConsoleInputW. Also, its _get_key_count [2]
function is designed to support the alt numpad event sequences that
the system creates for the input filepath when dragging a file into
the console. In my limited testing, dragging filepaths from Explorer
worked without a hitch using a random Latin-1 name "¨°¸ÀÈÐØàèðø" and a
Latin Extended-B name "ƠƨưƸǀLjǐǘǠǨǰǸ".

Unfortunately the Python 2.7 version is linked against the [A]NSI API,
which maps each Unicode character to either the closest matching
character in the console's codepage or "?". Moreover the PDCurses code
has a bug in narrow builds in that it returns the UnicodeChar from the
KEY_EVENT_RECORD [3] instead of the AsciiChar (the name is a
misnomer). In this case the high byte is junk. You can mask it out
using a bitwise & with 0xFF.

That said, IIRC, the OP wants to avoid using any frameworks such as
curses or a GUI toolkit.

[1]: http://www.lfd.uci.edu/~gohlke/pythonlibs/#curses
[2]: https://github.com/wmcbrine/PDCurses/blob/PDCurses_3_4/win32/pdckbd.c#L259
[3]: https://msdn.microsoft.com/en-us/library/ms684166
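
The masking step mentioned above might look like the following. This is only a sketch under assumptions: `key` stands for the integer a narrow (ANSI) build hands back for a character key, and `codepage` is the name of the console's input code page (for example 'cp850' or 'cp1252'), which the caller has to supply. The helper name `char_from_narrow_key` is hypothetical, and it should only be applied to values known to be characters rather than function-key codes.

```python
def char_from_narrow_key(key, codepage):
    """Drop the junk high byte and decode the remaining byte.

    key:      integer whose low byte is the AsciiChar value
    codepage: Python codec name of the console code page (assumed known)
    """
    low_byte = key & 0xFF                      # mask out the junk high byte
    return bytes(bytearray([low_byte])).decode(codepage)


# 0xE4 is "ä" in cp1252; the 0x12 in the high byte is arbitrary junk.
print(repr(char_from_narrow_key(0x12E4, 'cp1252')))
```

Masking cannot bring back characters outside the code page, since the ANSI API has already replaced those with the closest match or "?".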



#100440

From: eryk sun <eryksun@gmail.com>
Date: 2015-12-14 17:55 -0600
Message-ID: <mailman.6.1450137352.22044.python-list@python.org>
In reply to: #100414
On Mon, Dec 14, 2015 at 4:17 PM, Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:
>
> ImportError: No module named pyreadline
>
> Is it a python 3.x module?
>
> I am limited to Python 2.7

pyreadline is available for 2.7-3.5 on PyPI. Anyway, I tried it to no
avail. When dropping a file path into the console it ignores the
alt-numpad sequences that get queued for non-ASCII characters, just
like msvcrt.getwch. If you decide to roll your own getwch via ctypes or
PyWin32, I suggest starting a new topic on the ctypes list or Windows
list.
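
For anyone taking the ctypes route, a starting point could look roughly like this. It is an untested-on-Windows sketch, not a drop-in getwch replacement: the structure layouts follow the documented KEY_EVENT_RECORD and INPUT_RECORD, but the union is simplified to the key-event member plus padding, and the function name `read_key_event` is made up for this example. Deciding which events carry a usable character is left to the caller.

```python
import ctypes

KEY_EVENT = 0x0001        # INPUT_RECORD.EventType value for keyboard input
STD_INPUT_HANDLE = -10


class KEY_EVENT_RECORD(ctypes.Structure):
    _fields_ = [
        ('bKeyDown', ctypes.c_int32),
        ('wRepeatCount', ctypes.c_uint16),
        ('wVirtualKeyCode', ctypes.c_uint16),
        ('wVirtualScanCode', ctypes.c_uint16),
        ('UnicodeChar', ctypes.c_uint16),   # the uChar union, read as a UTF-16 code unit
        ('dwControlKeyState', ctypes.c_uint32),
    ]


class EVENT_UNION(ctypes.Union):
    _fields_ = [
        ('KeyEvent', KEY_EVENT_RECORD),
        ('other', ctypes.c_byte * 16),      # room for the mouse, resize, menu, and focus records
    ]


class INPUT_RECORD(ctypes.Structure):
    _fields_ = [
        ('EventType', ctypes.c_uint16),
        ('Event', EVENT_UNION),
    ]


def read_key_event():
    """Windows only: block until the next keyboard event and return
    (key_down, virtual_key, utf16_code_unit)."""
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.GetStdHandle.restype = ctypes.c_void_p
    kernel32.ReadConsoleInputW.argtypes = [
        ctypes.c_void_p, ctypes.POINTER(INPUT_RECORD),
        ctypes.c_uint32, ctypes.POINTER(ctypes.c_uint32)]
    handle = kernel32.GetStdHandle(STD_INPUT_HANDLE)
    record = INPUT_RECORD()
    count = ctypes.c_uint32()
    while True:
        ok = kernel32.ReadConsoleInputW(
            handle, ctypes.byref(record), 1, ctypes.byref(count))
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
        if record.EventType == KEY_EVENT:
            key = record.Event.KeyEvent
            return bool(key.bKeyDown), key.wVirtualKeyCode, key.UnicodeChar
```

Unlike getwch, this returns key-up events as well, so the caller can see the character attached to the end of an alt+numpad sequence.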



#100442

From: Laura Creighton <lac@openend.se>
Date: 2015-12-15 01:13 +0100
Message-ID: <mailman.8.1450138408.22044.python-list@python.org>
In reply to: #100414
In a message of Mon, 14 Dec 2015 17:55:04 -0600, eryk sun writes:
>On Mon, Dec 14, 2015 at 4:17 PM, Ulli Horlacher
><framstag@rus.uni-stuttgart.de> wrote:
>>
>> ImportError: No module named pyreadline
>>
>> Is it a python 3.x module?
>>
>> I am limited to Python 2.7
>
>pyreadline is available for 2.7-3.5 on PyPI. Anyway, I tried it to no
>avail. When dropping a file path into the console it ignores the
>alt-numpad sequences that get queued for non-ASCII characters, just
>like msvcrt.getwch. If you decide to roll your own getwch via ctypes or
>PyWin32, I suggest starting a new topic on the ctypes list or Windows
>list.

PyPy wrote its own pyreadline.
You can get it here. https://bitbucket.org/pypy/pyrepl
And see if it works any better.

Laura



#100449

From: Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date: 2015-12-15 08:26 +0000
Message-ID: <n4oirt$eir$1@news2.informatik.uni-stuttgart.de>
In reply to: #100442
Laura Creighton <lac@openend.se> wrote:

> PyPy wrote its own pyreadline.
> You can get it here. https://bitbucket.org/pypy/pyrepl

As far as I can see, it has no getkey function.
My users do not hit ENTER after drag&drop or copy&paste files.
I need an input function with a timeout.
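
One way to read pasted input without waiting for ENTER is to poll and stop once the input has been idle for a short time. The sketch below only illustrates that idea. `read_burst` and its parameters are hypothetical names, and the two callables are injected so the logic can be tried anywhere. On Windows one would pass msvcrt.getwch and msvcrt.kbhit, with the caveat raised elsewhere in this thread that those do not decode the alt+numpad sequences.

```python
import time


def read_burst(getch, kbhit, idle_timeout=0.2, poll_interval=0.01,
               clock=time.time, sleep=time.sleep):
    """Block for the first character, then keep reading until a line end
    arrives or no input has been seen for idle_timeout seconds."""
    chars = [getch()]                        # blocking read of the first character
    deadline = clock() + idle_timeout
    while clock() < deadline:
        if kbhit():
            c = getch()
            if c in (u'\r', u'\n'):
                break                        # an explicit line end finishes the burst
            chars.append(c)
            deadline = clock() + idle_timeout    # input arrived: restart the idle timer
        else:
            sleep(poll_interval)
    return u''.join(chars)


# Stand-ins for msvcrt.getwch / msvcrt.kbhit so the sketch runs anywhere.
pending = list(u'C:\\temp\\x.txt')
print(read_burst(lambda: pending.pop(0), lambda: bool(pending),
                 idle_timeout=0.05))
```

The idle timeout is a trade-off: too short and a slow paste is cut off, too long and the user waits after every drop.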





#100458

From: Laura Creighton <lac@openend.se>
Date: 2015-12-15 15:09 +0100
Message-ID: <mailman.21.1450188573.22044.python-list@python.org>
In reply to: #100449
In a message of Tue, 15 Dec 2015 08:26:37 +0000, Ulli Horlacher writes:
>Laura Creighton <lac@openend.se> wrote:
>
>> PyPy wrote its own pyreadline.
>> You can get it here. https://bitbucket.org/pypy/pyrepl
>
>As far as I can see, it has no getkey function.
>My users do not hit ENTER after drag&drop or copy&paste files.
>I need an input function with a timeout.

Right, then this isn't going to work.  Sorry about that.

Laura



#100464

From: eryk sun <eryksun@gmail.com>
Date: 2015-12-15 09:34 -0600
Message-ID: <mailman.25.1450193701.22044.python-list@python.org>
In reply to: #100449
On Tue, Dec 15, 2015 at 2:26 AM, Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:
> Laura Creighton <lac@openend.se> wrote:
>
>> PyPy wrote its own pyreadline.
>> You can get it here. https://bitbucket.org/pypy/pyrepl
>
> As far as I can see, it has no getkey function.
> My users do not hit ENTER after drag&drop or copy&paste files.
> I need an input function with a timeout.

pyreadline looked promising for its extensive ctypes implementation of
the Windows console API [1], wrapped by high-level methods such as
peek, getchar, and getkeypress. It turns out it ignores the event
sequences you need for alt+numpad input (used when a file is dragged
into the console). You'd have to modify its console and keysyms
modules to make it work. It would be a useful enhancement, so probably
your patches would be accepted upstream.

AFAICT, pyrepl has no Windows support. Check the TODO [2]:

> + port to windows

[1]: https://github.com/pyreadline/pyreadline/blob/master/pyreadline/console/console.py
[2]: https://bitbucket.org/pypy/pyrepl/src/62f2256014af7b74b97c00827f1a7789e00dd814/TODO?at=v0.8.4



#100474

From: Ulli Horlacher <framstag@rus.uni-stuttgart.de>
Date: 2015-12-15 17:04 +0000
Message-ID: <n4ph7n$mc7$1@news2.informatik.uni-stuttgart.de>
In reply to: #100464
eryk sun <eryksun@gmail.com> wrote:

> pyreadline looked promising for its extensive ctypes implementation of
> the Windows console API [1], wrapped by high-level methods such as
> peek, getchar, and getkeypress. It turns out it ignores the event
> sequences you need for alt+numpad input (used when a file is dragged
> into the console). You'd have to modify its console and keysyms
> modules to make it work. It would be a useful enhancement, so probably
> your patches would be accepted upstream.

Ehhh... I started Python programming some weeks ago and I know nearly
nothing about Windows. I am a UNIX and VMS guy :-)

I am far away from delivering patches for Windows system programming.




#100556

From: eryk sun <eryksun@gmail.com>
Date: 2015-12-16 21:39 -0600
Message-ID: <mailman.31.1450323631.30845.python-list@python.org>
In reply to: #100474
On Tue, Dec 15, 2015 at 11:04 AM, Ulli Horlacher
<framstag@rus.uni-stuttgart.de> wrote:
>
> Ehhh... I started Python programming some weeks ago and I know nearly
> nothing about Windows. I am a UNIX and VMS guy :-)

You should feel right at home, then. The Windows NT kernel was
designed and implemented by a team of former DEC engineers led by
David Cutler, who was one of the principal architects of VMS. There's
an old joke that W[indows] NT is VMS + 1. Actually, you'd probably
only notice a slight resemblance if you were coding a driver [1].
Microsoft discourages using the native NT API in user mode.

Windows client DLLs such as kernel32.dll usually implement an API
function in one of three ways, or in combination:

    using the native runtime library and loader functions
    (Rtl* & Ldr* in ntdll.dll)

    calling system services such as

        Nt* public APIs (ntdll.dll => ntoskrnl.exe)
        NtUser* & NtGdi* private APIs
        (user32.dll, gdi32.dll => win32k.sys)

    using a local procedure call (via ALPC or a driver) to a
    subsystem process such as

        csrss.exe    - Windows client/server runtime
        conhost.exe  - console host
        services.exe - service control manager
        lsass.exe    - local security authority
        smss.exe     - session manager

But this is all an implementation detail. The API could be implemented
in a totally different way in a totally different environment, such as
running WINE on Linux.

[1]: http://windowsitpro.com/windows-client/windows-nt-and-vms-rest-story



#100604

From: smap <askme.first@thankyouverymuch.invalid>
Date: 2015-12-18 21:15 +0000
Message-ID: <qH_cy.70540$qj6.50232@fx44.am4>
In reply to: #100474
On Tue, 15 Dec 2015 17:04:55 +0000, Ulli Horlacher wrote:

> I am a UNIX

If I were you, and I had a choice, I would stay with it.

Windoze is a bloody joke. A troll designed it and is probably laughing 
all the way to the bank. I wish there was a way to go back in time and 
sneakily roll a condom on Mr. Gates senior's Johnson while he was 
servicing the Mrs. That's how much I really HATE that so-called "OS" :(



#100578

From: bearmingo <bearmingo@gmail.com>
Date: 2015-12-17 21:12 -0800
Message-ID: <4797ae88-3341-49a9-b046-de2a31d6ad40@googlegroups.com>
In reply to: #100414
Usually I put 
#!-*-coding=utf-8-*-
at the top of each .py file.
Opening files on the local system then works fine.



#100580

From: Terry Reedy <tjreedy@udel.edu>
Date: 2015-12-18 01:37 -0500
Message-ID: <mailman.44.1450420807.30845.python-list@python.org>
In reply to: #100578
On 12/18/2015 12:12 AM, bearmingo wrote:
> Usually I put
> #!-*-coding=utf-8-*-
> at the top of each .py file.
> Opening files on the local system then works fine.

That declaration only applies to the content of the file, not its name 
on the filesystem.
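
A small experiment can make the point concrete. The sketch below compiles the same source bytes under two different coding declarations and shows that only the decoded string literal changes. The helper name `literal_under` is made up for this example, and the two encodings are arbitrary choices.

```python
def literal_under(encoding):
    """Compile identical source bytes under a given coding declaration
    and return the value of the string literal they contain."""
    # The byte 0xE4 sits inside the literal; the declaration decides
    # which character that byte stands for.
    source = (b"# -*- coding: " + encoding.encode('ascii') + b" -*-\n"
              b"text = u'\xe4'\n")
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['text']


print(repr(literal_under('latin-1')))   # byte 0xE4 read as Latin-1
print(repr(literal_under('cp1251')))    # same byte read as Windows Cyrillic
```

Nothing in this mechanism looks at filenames. A name typed or dropped at runtime arrives through the console or the OS API and has to be decoded there.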


-- 
Terry Jan Reedy


