Date: Wed, 30 Oct 2013 11:54:05 -0600
From: Michael Torrie <torriem@gmail.com>
To: python-list@python.org
Subject: Re: trying to strip out non ascii.. or rather convert non ascii
Newsgroups: comp.lang.python

On 10/30/2013 10:08 AM, wxjmfauth@gmail.com wrote:
> My comment had nothing to do with Python, it was a
> general comment. A diacritical mark just makes a letter
> a different letter; an "ï" and an "i" are "as
> different" as an "a" from a "z". A diacritical mark
> is more than a simple ornamentation.

That's nice, but you didn't actually read what Ned said (or the OP).
The OP doesn't care that "ï" and "i" are as different as "a" and "z".
For the purposes of his search he wants them treated as the same
letter.  A fuzzy search treats them all the same.  For example, a
search for "Godel, Escher, Bach" should find "Gödel, Escher, Bach" just
fine, even though "o" and "ö" are different characters.  And lo and
behold, Google actually does this!  Try it.  It's nice for those of us
who want to find something when our US keyboards don't have the right marks.

https://www.google.ca/search?q=godel+escher+bach
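Accent-insensitive matching of this sort can be sketched with nothing but the standard library's unicodedata module. This is a minimal sketch, not what Google does and not Unidecode; fold_accents and fuzzy_contains are hypothetical names. It assumes the accented letters have a Unicode decomposition, which covers "ö" and "ï" but not letters such as "ø" or "ł".

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Return a case-folded copy of text with combining marks removed."""
    # NFKD splits a precomposed letter such as "ö" into the base letter "o"
    # followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # unicodedata.combining() returns a nonzero class for combining marks,
    # so this keeps only the base characters.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


def fuzzy_contains(haystack: str, needle: str) -> bool:
    """Accent- and case-insensitive substring test (hypothetical helper)."""
    return fold_accents(needle) in fold_accents(haystack)


titles = ["Gödel, Escher, Bach", "The Art of Computer Programming"]
query = "godel, escher, bach"  # typed on a keyboard without an ö key
print([t for t in titles if fuzzy_contains(t, query)])  # prints ['Gödel, Escher, Bach']
```

Both the stored text and the query are folded the same way, so the original "Gödel" is still what gets displayed; only the comparison ignores the marks.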

After all this nonsense, that's what the original poster is looking for
(I think; I can't be sure since it's been so many days now).  It seems to
me a Python module does this quite nicely:

https://pypi.python.org/pypi/Unidecode
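Unidecode's entry point is a unidecode() function that takes a str and returns an ASCII transliteration. A rough sketch of using it, assuming the package is installed from PyPI (pip install Unidecode); to_ascii is a hypothetical wrapper, and the standard-library fallback is an assumption added here so the example still runs without the package. The fallback is cruder: characters with no decomposition, such as "ß" or "ø", are dropped instead of transliterated.

```python
import unicodedata

try:
    # Third-party package from PyPI (pip install Unidecode).
    from unidecode import unidecode
except ImportError:  # package not installed
    unidecode = None


def to_ascii(text: str) -> str:
    """Convert text to plain ASCII, transliterating where possible."""
    if unidecode is not None:
        return unidecode(text)
    # Fallback: drop combining marks after NFKD decomposition, then discard
    # whatever is still non-ASCII (e.g. "ß" or "ø" simply disappear).
    decomposed = unicodedata.normalize("NFKD", text)
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Gödel, Escher, Bach"))  # prints Godel, Escher, Bach
```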