From: Michael Torrie
Date: Wed, 30 Oct 2013 11:54:05 -0600
To: python-list@python.org
Newsgroups: comp.lang.python
Subject: Re: trying to strip out non ascii.. or rather convert non ascii

On 10/30/2013 10:08 AM, wxjmfauth@gmail.com wrote:
> My comment had nothing to do with Python, it was a
> general comment. A diacritical mark just makes a letter
> a different letter; a "ï" and a "i" are "as
> different" as a "a" from a "z". A diacritical mark
> is more than a simple ornamentation.

That's nice, but you didn't actually read what Ned said (or the OP). The
OP doesn't care that "ï" and "i" are as different as "a" and "z". For
the purposes of his search he wants them treated as the same letter. A
fuzzy search treats them all the same. For example, a search for
"Godel, Escher, Bach" should find "Gödel, Escher, Bach" just fine, even
though "o" and "ö" are different characters.

And lo and behold, Google actually does this! Try it. It's nice for
those of us who want to find something and whose US keyboards don't
have the right marks.

https://www.google.ca/search?q=godel+escher+bach

After all this nonsense, that's what the original poster is looking for
(I think... can't be sure since it's been so many days now). Seems to me
a Python module does this quite nicely:

https://pypi.python.org/pypi/Unidecode
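If installing a third-party package isn't an option, here is a rough, stdlib-only sketch of the same idea. It is not how Unidecode works. It assumes Python 3, and the helper names fold_accents and fuzzy_contains are made up for illustration. The approach decomposes each string with unicodedata.normalize("NFKD", ...), drops the combining marks, and then compares case-folded text, so "godel" matches "Gödel".

```python
import unicodedata


def fold_accents(text):
    """Strip diacritics (hypothetical helper, stdlib-only approximation).

    NFKD splits e.g. "ö" into "o" + U+0308 COMBINING DIAERESIS; the
    combining mark is then dropped. Characters that have no decomposition
    (e.g. "ø", "ß") pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fuzzy_contains(haystack, needle):
    """Accent- and case-insensitive substring test (hypothetical helper)."""
    return fold_accents(needle).casefold() in fold_accents(haystack).casefold()


if __name__ == "__main__":
    print(fold_accents("Gödel, Escher, Bach"))             # Godel, Escher, Bach
    print(fuzzy_contains("Gödel, Escher, Bach", "godel"))  # True
```

Unidecode goes further than this sketch. It transliterates characters that don't decompose, such as "ø", and non-Latin scripts into ASCII approximations.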