Path: csiph.com!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
Date: Mon, 28 Oct 2013 09:23:41 -0500
From: Tim Chase <python.list@tim.thechases.com>
To: wxjmfauth@gmail.com
Subject: Re: trying to strip out non ascii.. or rather convert non ascii
In-Reply-To: <d205042e-29cd-49df-9f6e-600e123f8483@googlegroups.com>
References: <mailman.1604.1382818293.18130.python-list@python.org> <526c412a$0$29972$c3e8da3$5496439d@news.astraweb.com> <mailman.1628.1382838024.18130.python-list@python.org> <pan.2013.10.27.03.21.57.202000@nowhere.com> <d205042e-29cd-49df-9f6e-600e123f8483@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1702.1382970129.18130.python-list@python.org>
Lines: 22
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:57826

On 2013-10-28 07:01, wxjmfauth@gmail.com wrote:
>> Simply ignoring diacritics won't get you very far.
>
> Right. As an example, these four French words :
> cote, côte, coté, côté .

Distinct words with distinct meanings, sure.

But when a naïve (naive? ☺) person, or one without an easy way to
enter characters with diacritics, searches for "cote", I want to
return possible matches containing any of your 4 examples.  It's
slightly fuzzier if they search for "coté", in which case they may
mean "coté", or they might be unable to figure out how to add a hat
and want to type "côté".  Either way, I'd rather get more results,
even if some of them only match fuzzily.
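
For what it's worth, here is a minimal sketch of that kind of search in
Python 3, assuming a plain list of words stands in for the real index.
The names fold() and search() are made up for illustration.  It
decomposes each string with unicodedata.normalize("NFKD", ...), drops
the combining marks, and puts exact matches ahead of the
fuzzy ones.  NFKD also folds compatibility characters, which may or may
not be what you want.

```python
import unicodedata

def fold(text):
    """Return text with combining marks removed and case folded."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.casefold()

def search(query, words):
    """Return matching words: exact matches first, then fuzzy ones."""
    # Compare in NFC so precomposed and decomposed input look the same.
    nfc_query = unicodedata.normalize("NFC", query)
    folded_query = fold(query)
    exact, fuzzy = [], []
    for word in words:
        if unicodedata.normalize("NFC", word) == nfc_query:
            exact.append(word)
        elif fold(word) == folded_query:
            fuzzy.append(word)
    return exact + fuzzy

words = ["cote", "côte", "coté", "côté"]
print(search("cote", words))   # all four words, "cote" first
print(search("coté", words))   # all four words, "coté" first
```

Searching for "coté" still returns the other three; they just sort
after the exact hit, so the fuzzy matches are there without crowding
out what was actually typed.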

Circumflexually-circumspectly-yers,

-tkc