From: Michael Torrie
Date: Mon, 20 Aug 2012 22:18:38 -0600
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: New internal string format in 3.3

On 08/20/2012 07:17 AM, Roy Smith wrote:
> In article ,
> Michael Torrie wrote:
>
>> Python generally tries to follow Unicode
>> encoding rules to the letter. Thus if a piece of text cannot be
>> represented in the character set of the terminal, then Python will
>> properly err out. Other languages you have tried likely fudge it
>> somehow.
>
> And if you want the "fudge it somehow" behavior (which is often very
> useful!), there's always http://pypi.python.org/pypi/Unidecode/

Sweet tip, thanks! I often want to process text that has smart quotes,
em dashes, etc., and convert them to plain old ASCII quotes, dashes,
ticks, etc. This will do that for me without resorting to a bunch of
regexes. Bravo.
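
For anyone who wants to see the two behaviors side by side, here is a rough sketch using only the standard library. It is not how Unidecode works internally; the mapping table and the function names (to_plain_ascii, strict_ascii) are made up for illustration and cover only a handful of characters.

```python
# Illustrative table only: a few common typographic characters.
# The names below are hypothetical, not part of any library.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_plain_ascii(text):
    """The 'fudge it somehow' behavior: map known punctuation to ASCII,
    then replace anything still outside ASCII with '?'."""
    translated = text.translate(PUNCTUATION_TO_ASCII)
    return translated.encode("ascii", errors="replace").decode("ascii")


def strict_ascii(text):
    """The strict behavior: raise UnicodeEncodeError if the text
    cannot be represented in the target character set."""
    return text.encode("ascii")


sample = "\u201cSmart\u201d quotes \u2014 and it\u2019s done\u2026"
print(to_plain_ascii(sample))

try:
    strict_ascii(sample)
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)
```

If Unidecode is installed, its unidecode() function should cover far more characters than a hand-written table like this one, so the sketch is mostly useful when an extra dependency is not an option.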