| Newsgroups | comp.lang.python |
|---|---|
| Date | 2013-10-30 09:08 -0700 |
| References | (4 earlier) <d205042e-29cd-49df-9f6e-600e123f8483@googlegroups.com> <mailman.1702.1382970129.18130.python-list@python.org> <526f46a2$0$6512$c3e8da3$5496439d@news.astraweb.com> <e018a4c6-e7a5-4356-8929-e26a3fdcb75d@googlegroups.com> <mailman.1806.1383137592.18130.python-list@python.org> |
| Message-ID | <d4e620ab-e42c-4939-92b4-0c5c62c0bc8b@googlegroups.com> (permalink) |
| Subject | Re: trying to strip out non ascii.. or rather convert non ascii |
| From | wxjmfauth@gmail.com |

On Wednesday, 30 October 2013 13:44:47 UTC+1, Ned Batchelder wrote:
> On 10/30/13 4:49 AM, wxjmfauth@gmail.com wrote:
> > On Tuesday, 29 October 2013 06:24:50 UTC+1, Steven D'Aprano wrote:
> >> On Mon, 28 Oct 2013 09:23:41 -0500, Tim Chase wrote:
> >>
> >>> On 2013-10-28 07:01, wxjmfauth@gmail.com wrote:
> >>>>> Simply ignoring diacritics won't get you very far.
> >>>> Right. As an example, these four French words: cote, côte, coté, côté.
> >>> Distinct words with distinct meanings, sure.
> >>> But when a naïve (naive? ☺) person or one without the easy ability to
> >>> enter characters with diacritics searches for "cote", I want to return
> >>> possible matches containing any of your 4 examples. It's slightly
> >>> fuzzier if they search for "coté", in which case they may mean "coté" or
> >>> they might be unable to figure out how to add a hat and want to
> >>> type "côté". Though I'd rather get more results, even if it has some
> >>> that only match fuzzily.
> >>
> >> The right solution to that is to treat it no differently from other fuzzy
> >> searches. A good search engine should be tolerant of spelling errors and
> >> alternative spellings for any letter, not just those with diacritics.
> >> Ideally, a good search engine would successfully match all three of
> >> "naïve", "naive" and "niave", and it shouldn't rely on special handling
> >> of diacritics.
> >
> > ------
> >
> > This is nonsense. The purpose of a diacritical mark is to
> > make a letter a different letter. If a tool is supposed to
> > match an ô, there is absolutely no reason to match something
> > else.
> >
> > jmf
>
> jmf, Tim Chase described his use case, and it seems reasonable to me.
> I'm not sure why you would describe it as nonsense.
>
> --Ned.

--------

My comment had nothing to do with Python; it was a general comment.

A diacritical mark makes a letter a different letter; an "ï" and an "i" are as different as an "a" and a "z". A diacritical mark is more than simple ornamentation.

From a Unicode perspective: Unicode.org "knows" these characters are very important, which is why they exist in two forms, precomposed and decomposed (a base letter plus a combining mark).

From a software perspective: luckily for end users, all serious software treats these characters equally. They all belong to the BMP (Basic Multilingual Plane). An "Ą" is treated the same as an "ê": same memory consumption, same performance, and so very smooth software.

jmf
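
For readers who want to try both positions in this thread, here is a minimal sketch, assuming Python 3 and the standard `unicodedata` module. The helper names `strip_diacritics`, `fuzzy_matches` and `exact_matches` are hypothetical and were not posted by anyone in the thread. It shows the precomposed and decomposed forms mentioned above, a strict match that keeps ô distinct from o, and the diacritic-insensitive match Tim Chase described.

```python
import unicodedata

words = ["cote", "côte", "coté", "côté"]

# The same word in precomposed (NFC) and decomposed (NFD) form.
nfc = unicodedata.normalize("NFC", "côté")  # 4 code points
nfd = unicodedata.normalize("NFD", "côté")  # 6 code points: o and e each get a combining mark


def strip_diacritics(text):
    """Drop combining marks after decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def exact_matches(query, candidates):
    """Strict match: diacritics count, but NFC and NFD spellings compare equal."""
    key = unicodedata.normalize("NFC", query)
    return [w for w in candidates if unicodedata.normalize("NFC", w) == key]


def fuzzy_matches(query, candidates):
    """Loose match: diacritics and case are ignored."""
    key = strip_diacritics(query).casefold()
    return [w for w in candidates if strip_diacritics(w).casefold() == key]


print(len(nfc), len(nfd))            # lengths differ although the text looks identical
print(exact_matches("côte", words))  # only the word with the same diacritics
print(fuzzy_matches("cote", words))  # all four words
```

Stripping combining marks only covers letters that decompose into a base letter plus a mark; characters such as "ø" or "ß" have no such decomposition and would need separate handling.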
trying to strip out non ascii.. or rather convert non ascii bruce <badouglas@gmail.com> - 2013-10-26 16:11 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-26 22:24 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-10-26 20:51 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Roy Smith <roy@panix.com> - 2013-10-26 21:11 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-27 02:05 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Chris Angelico <rosuav@gmail.com> - 2013-10-27 13:15 +1100
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-27 09:21 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-26 20:41 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Roy Smith <roy@panix.com> - 2013-10-26 21:54 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-26 21:17 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Nobody <nobody@nowhere.com> - 2013-10-27 03:21 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-28 07:01 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-28 14:13 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-28 09:23 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve@pearwood.info> - 2013-10-29 05:24 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 01:49 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Ned Batchelder <ned@nedbatchelder.com> - 2013-10-30 08:44 -0400
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 09:08 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-30 16:24 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Ned Batchelder <ned@nedbatchelder.com> - 2013-10-30 13:10 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Michael Torrie <torriem@gmail.com> - 2013-10-30 11:54 -0600
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 11:38 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Roy Smith <roy@panix.com> - 2013-10-30 19:28 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-31 06:46 -0500
Re: trying to strip out non ascii.. or rather convert non ascii Terry Reedy <tjreedy@udel.edu> - 2013-10-30 17:56 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve@pearwood.info> - 2013-10-31 07:10 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-31 07:23 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-31 03:33 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-11-01 07:16 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-11-01 02:00 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-11-01 09:18 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Steven D'Aprano <steve@pearwood.info> - 2013-10-29 05:22 +0000
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-29 08:38 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Tim Chase <python.list@tim.thechases.com> - 2013-10-29 10:52 -0500
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-29 12:16 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-29 19:54 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Piet van Oostrum <piet@vanoostrum.org> - 2013-10-29 21:33 -0400
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-30 09:19 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-29 15:56 +0000
Re: trying to strip out non ascii.. or rather convert non ascii Chris Angelico <rosuav@gmail.com> - 2013-10-30 13:17 +1100
Re: trying to strip out non ascii.. or rather convert non ascii wxjmfauth@gmail.com - 2013-10-30 01:13 -0700
Re: trying to strip out non ascii.. or rather convert non ascii Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-30 15:25 +0000