Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #58141
| From | Roy Smith <roy@panix.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: trying to strip out non ascii.. or rather convert non ascii |
| Date | 2013-10-30 19:28 -0400 |
| Organization | PANIX Public Access Internet and UNIX, NYC |
| Message-ID | <roy-445BFB.19284330102013@news.panix.com> (permalink) |
| References | (6 earlier) <526f46a2$0$6512$c3e8da3$5496439d@news.astraweb.com> <e018a4c6-e7a5-4356-8929-e26a3fdcb75d@googlegroups.com> <mailman.1806.1383137592.18130.python-list@python.org> <d4e620ab-e42c-4939-92b4-0c5c62c0bc8b@googlegroups.com> <mailman.1821.1383156703.18130.python-list@python.org> |
In article <mailman.1821.1383156703.18130.python-list@python.org>,
Michael Torrie <torriem@gmail.com> wrote:
> On 10/30/2013 10:08 AM, wxjmfauth@gmail.com wrote:
> > My comment had nothing to do with Python, it was a
> > general comment. A diacritical mark just makes a letter
> > a different letter; a "ï " and a "i" are "as
> > diferent" as a "a" from a "z". A diacritical mark
> > is more than a simple ornementation.
>
> That's nice, but you didn't actually read what Ned said (or the OP).
> The OP doesn't care that "ï " and a "i" are as different as "a" and "z".
> For the purposes of his search he wants them treated as the same
> letter. A fuzzy searching treats them all the same.
That's one definition of fuzzy. But, there's nothing that says you
can't build a fuzzy matching algorithm which considers some mismatches
to be worse than others.
For example, it's reasonable to consider any vowel (or string of vowels,
for that matter) to be closer to another vowel than to a consonant. A
great example is the word, "bureaucrat". As far as I'm concerned, it's
spelled {b, vowels, r, vowels, c, r, a, t}. It usually takes me three
or four tries to get auto-correct to even recognize what I'm trying to
type and fix it for me.
Likewise for pairs like {c, s}, {j, g}, {v, w}, and so on.
In that spirit, I would think that a, á, and â would all be considered
more conservative replacements for each other than they would be for k,
x, or z.
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
trying to strip out non ascii.. or rather convert non ascii bruce <badouglas@gmail.com> - 2013-10-26 16:11 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-26 22:24 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-10-26 20:51 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Roy Smith <roy@panix.com> - 2013-10-26 21:11 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-27 02:05 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Chris Angelico <rosuav@gmail.com> - 2013-10-27 13:15 +1100
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-27 09:21 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-26 20:41 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Roy Smith <roy@panix.com> - 2013-10-26 21:54 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-26 21:17 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Nobody <nobody@nowhere.com> - 2013-10-27 03:21 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-28 07:01 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-28 14:13 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-28 09:23 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve@pearwood.info> - 2013-10-29 05:24 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 01:49 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Ned Batchelder <ned@nedbatchelder.com> - 2013-10-30 08:44 -0400
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 09:08 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-30 16:24 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Ned Batchelder <ned@nedbatchelder.com> - 2013-10-30 13:10 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Michael Torrie <torriem@gmail.com> - 2013-10-30 11:54 -0600
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 11:38 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Roy Smith <roy@panix.com> - 2013-10-30 19:28 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-31 06:46 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Terry Reedy <tjreedy@udel.edu> - 2013-10-30 17:56 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve@pearwood.info> - 2013-10-31 07:10 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-31 07:23 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-31 03:33 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-01 07:16 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-11-01 02:00 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-01 09:18 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve@pearwood.info> - 2013-10-29 05:22 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-29 08:38 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-29 10:52 -0500
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-29 12:16 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-29 19:54 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Piet van Oostrum <piet@vanoostrum.org> - 2013-10-29 21:33 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-30 09:19 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-29 15:56 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Chris Angelico <rosuav@gmail.com> - 2013-10-30 13:17 +1100
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 01:13 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-30 15:25 +0000
csiph-web