Re: Making safe file names

Date Tue, 07 May 2013 21:13:11 -0400
From Dave Angel <davea@davea.name>
To python-list@python.org
Subject Re: Making safe file names
Newsgroups comp.lang.python
Message-ID <mailman.1431.1367975609.3114.python-list@python.org> (permalink)


On 05/07/2013 08:51 PM, Andrew Berg wrote:
> On 2013.05.07 19:14, Dave Angel wrote:
>> You also need to decide how to handle Unicode characters, since they're
>> different for different OS.  In Windows on NTFS, filenames are in
>> Unicode, while on Unix, filenames are bytes.  So on one of those, you
>> will be encoding/decoding if your code is to be mostly portable.
> Characters outside whatever sys.getfilesystemencoding() returns won't be allowed. If the user's locale settings don't support Unicode, my
> program will be far from the only one to have issues with it. Any problem reports that arise from a user moving between legacy encodings
> will generally be ignored. I haven't yet decided how I will handle artist names with characters outside UTF-8,

There aren't any characters "outside UTF-8".  But a character is not "in
UTF-8"; it can be encoded by UTF-8.

> but inside UTF-16/32 (UTF-16

Nor outside UTF-16 or UTF-32.
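
To make that concrete, here is a minimal Python 3 sketch. The sample
characters and the helper name round_trips are only illustrations, not
anything from the original program. Any code point other than a lone
surrogate can be encoded by all three UTF encodings; only the resulting
bytes differ.

```python
# Sample characters chosen only for illustration.
samples = ["A", "\u00e9", "\u20ac", "\U0001F3B5"]  # ASCII, Latin-1 range, BMP, beyond the BMP

def round_trips(ch):
    """Return True if ch survives encode/decode in all three UTF encodings."""
    return all(ch.encode(enc).decode(enc) == ch
               for enc in ("utf-8", "utf-16-le", "utf-32-le"))

for ch in samples:
    # Print the code point and the encoded sizes rather than the character
    # itself, so the output does not depend on the terminal's encoding.
    print("U+%04X" % ord(ch),
          len(ch.encode("utf-8")),
          len(ch.encode("utf-16-le")),
          len(ch.encode("utf-32-le")),
          round_trips(ch))
```
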

> is just fine on Windows/NTFS, but on Unix(-ish) systems, many use UTF-8 in their locale settings).
>> Don't forget that ls and rm may not use the same encoding you're using.
>> So you may not consider it adequate to make the names legal, but you
>> may also want them easily typeable in the shell.
> I don't understand. I have no intention of changing Unicode characters.

So you're comfortable typing arbitrary characters?  What about all the
characters that have identical displays in your font? What about viewing 
0x07 in the terminal window?  Or 0x04?
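
A rough sketch of both problems, assuming Python 3 and the standard
unicodedata module. The helper name has_control_chars and the sample
file names are hypothetical.

```python
import unicodedata

latin = "A"          # U+0041 LATIN CAPITAL LETTER A
cyrillic = "\u0410"  # U+0410 CYRILLIC CAPITAL LETTER A, looks the same in many fonts

# The two strings display alike but are different code points.
print(latin == cyrillic)
print(unicodedata.name(latin))
print(unicodedata.name(cyrillic))

def has_control_chars(name):
    """Return True if name contains any character in Unicode category Cc (control)."""
    return any(unicodedata.category(ch) == "Cc" for ch in name)

print(has_control_chars("song\x07.mp3"))  # 0x07 is BEL
print(has_control_chars("song.mp3"))
```
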

>
>
> This is not a Unicode issue since (modern) file systems will happily accept it. The issue is that certain characters (which are ASCII) are
> not allowed on some file systems:
>   \ / : * ? " < > | @ and the NUL character
> The first 9 are not allowed on NTFS, the @ is not allowed on ext3cow, and NUL and / are not allowed on pretty much any file system. Locale
> settings and encodings aside, these 11 characters will need to be escaped.
>

As soon as you have a small, finite list of invalid characters, writing 
an escape system is pretty easy.
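
For instance, a minimal sketch in Python 3, assuming the 11 characters
listed above and a percent-style scheme. The choice of "%" as the escape
character and the names escape_name and unescape_name are my own
assumptions, not anything from your program. The escape character itself
has to be escaped too, or the mapping is not reversible.

```python
import re

# The 11 characters listed above, plus the escape character itself so that
# escaping stays reversible. Using "%" as the escape character is an assumption.
ESCAPE = "%"
INVALID = '\\/:*?"<>|@\0' + ESCAPE

def escape_name(name):
    """Replace each invalid character with % followed by its two-digit hex code."""
    return "".join("%%%02X" % ord(ch) if ch in INVALID else ch for ch in name)

def unescape_name(escaped):
    """Reverse escape_name() by decoding every %XX sequence."""
    return re.sub(r"%([0-9A-F]{2})",
                  lambda m: chr(int(m.group(1), 16)),
                  escaped)

original = "AC/DC: What?"
safe = escape_name(original)
print(safe)
print(unescape_name(safe) == original)
```

This only covers the characters in the list; it does nothing about name
length or any other restriction a particular file system may impose.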


-- 
DaveA
