From: Zero Piraeus
To: python-list@python.org
Newsgroups: comp.lang.python
Date: Sat, 19 Oct 2013 11:14:30 -0300
Subject: Re: Looking for UNICODE to ASCII Conversioni Example Code

:

On Sat, Oct 19, 2013 at 09:19:12AM +0000, Steven D'Aprano wrote:
> Make no mistake, this sort of simple-minded stripping of accents and
> diacritics is an extremely ham-fisted thing to do.

I used to live on a street called Calle Colón, so I'm aware of the
dangers of stripping diacritics:

  https://es.wikipedia.org/wiki/Colón
  https://es.wikipedia.org/wiki/Colon

... although in that particular case, there's a degree of poetic justice
in confusing Cristóbal Colón / Christopher Columbus with the back end of
a digestive tract:

  http://theoatmeal.com/comics/columbus_day

Joking aside, there is a legitimate use for asciifying text in this way:
creating unambiguous identifiers. For example, a miscreant may create
the username 'míguel' in order to pose as another user 'miguel', relying
on other users' inattentiveness. Asciifying is one way of reducing the
risk of that.

 -[]z.

-- 
Zero Piraeus: in ictu oculi
http://etiol.net/pubkey.asc
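
P.S. For the username case above, here is a minimal sketch of what such a
check might look like, assuming plain NFKD decomposition is good enough.
The names asciify, username_key and is_taken are hypothetical, made up for
illustration. Note the limits: characters with no decomposition (such as
'ø') are simply dropped, and lookalikes from other scripts are not caught
at all.

```python
import unicodedata


def asciify(text):
    """Strip diacritics: decompose to NFKD, then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def username_key(name):
    """Hypothetical comparison key: asciified and case-folded."""
    return asciify(name).casefold()


def is_taken(candidate, existing):
    """True if candidate collides with an existing name after asciifying."""
    keys = {username_key(n) for n in existing}
    return username_key(candidate) in keys


if __name__ == "__main__":
    print(asciify("míguel"))
    print(is_taken("míguel", ["miguel", "ana"]))
```

The key is only used for comparison at registration time; the name as the
user typed it can still be stored and displayed unchanged.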