From: Tim Chase
Date: Tue, 31 May 2011 21:17:50 -0500
To: Python
Subject: Sanitizing filename strings across platforms
Newsgroups: comp.lang.python

Scenario: a file name from a potentially untrusted source may contain
odd characters and needs to be sanitized for the underlying OS. On
*nix, this generally just means "don't use '/' or \x00 in your
string", while on Win32 there are a host of verboten characters and
file names. Then there's also checking the abspath/normpath of the
resulting name to make sure it's still in the intended folder.

I've read through [1] and have started to glom together various bits
from that thread. My current course of action is something like

  import string

  # device names that Win32 reserves, compared case-insensitively
  SACRED_WIN32_FNAMES = set(
      ['CON', 'PRN', 'CLOCK$', 'AUX', 'NUL'] +
      ['LPT%i' % i for i in range(32)] +
      ['COM%i' % i for i in range(32)]
  )

  def sanitize_filename(fname):
      # keep only characters from a conservative whitelist
      sane = set(string.ascii_letters + string.digits + '-_.[]{}()$')
      results = ''.join(c for c in fname if c in sane)
      # might have to check sans-extension
      if results.upper() in SACRED_WIN32_FNAMES:
          results = "_" + results
      return results

but if somebody already has war-hardened code they'd be willing to
share, I'd appreciate any thoughts.

Thanks,

-tkc

[1] http://stackoverflow.com/questions/295135/turn-a-string-into-a-valid-filename-in-python
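
For the abspath/normpath check mentioned in the scenario, a minimal sketch might look like the following. The names is_within_folder and base_dir are made up for illustration and do not come from the linked thread. It assumes Python 3.5 or later for os.path.commonpath, and it does not resolve symlinks, so treat it as a starting point rather than hardened code.

```python
import os.path


def is_within_folder(base_dir, fname):
    """Return True if base_dir/fname resolves to a path strictly inside base_dir.

    Hypothetical helper: the name and signature are assumptions made
    for this sketch.
    """
    base = os.path.abspath(base_dir)
    # abspath() also normalizes the path, collapsing any '..' components.
    # If fname is itself absolute, join() discards base entirely.
    candidate = os.path.abspath(os.path.join(base, fname))
    if candidate == base:
        # '' or '.' would name the folder itself, not a file in it
        return False
    try:
        # commonpath() compares whole path components, unlike a plain
        # startswith() test, which would accept a sibling folder such
        # as '/tmp/uploads_evil' for a base of '/tmp/uploads'.
        return os.path.commonpath([base, candidate]) == base
    except ValueError:
        # raised on Windows when the two paths are on different drives
        return False
```

Something like is_within_folder(upload_dir, sanitize_filename(name)) could then gate the final open() call, with upload_dir standing in for whatever folder the files are meant to land in.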