To: python-list@python.org
From: Mark Lawrence
Subject: Re: awesome slugify and unicode
Date: Thu, 23 Jan 2014 13:18:08 +0000
Newsgroups: comp.lang.python

On 23/01/2014 10:41, wxjmfauth@gmail.com wrote:
> On Thursday, 23 January 2014 10:14:48 UTC+1, Mark Lawrence wrote:
>> On 23/01/2014 07:18, wxjmfauth@gmail.com wrote:
>>> On Wednesday, 22 January 2014 20:23:55 UTC+1, Mark Lawrence wrote:
>>>> I thought this blog might interest some of you
>>>>
>>>> http://pydanny.com/awesome-slugify-human-readable-url-slugs-from-any-string.html
>>>>
>>>> My fellow Pythonistas, ask not what our language can do for you, ask
>>>> what you can do for our language.
>>>
>>> This is not "unicode", only string manipulations.
>>> The same work could be done with, let's say, cp1252.
>>> The difference lies in the repertoires of characters
>>> to be handled.
>>>
>>> A better way is to work with normalization() and/or
>>> with methods like .translate() with dedicated
>>> tables; the hard task being the creation of these tables.
>>>
>>> In short, very naive.
>>>
>>> jmf
>>
>> You'll have to excuse my ignorance of this stuff. How do I express the
>> following in cp1252?
>>
>> def test_musical_notes():
>>     txt = "Is ♬ ♫ ♪ ♩ a melody or just noise?"
>>     assert slugify(txt) == "Is-a-melody-or-just-noise"
>>     assert slugify_unicode(txt) == "Is-a-melody-or-just-noise"
>>
>> --
>> My fellow Pythonistas, ask not what our language can do for you, ask
>> what you can do for our language.
>
> I wrote: The same work could be done with, let's say, cp1252.
> Understand: The same work (string manipulation) ...
> Would something like this not be more informative?
>
> >>> "Is ♬ ♫ ♪ ♩ a melody or just noise?".encode('ascii', 'replace').decode('ascii')
> 'Is ? ? ? ? a melody or just noise?'
>
> cp1252 analogy.
>
> >>> 'abc€€€'.encode('cp1252').decode('ascii', 'replace').encode('ascii', 'replace').decode('ascii')
> 'abc???'
>
> Again, not a "unicode" question, more "how to handle strings in a
> judicious way?"
>
> jmf

Now I'm really confused. I thought that the musical notes I've shown
above are represented as unicode characters, so I'd like to see how you,
jmf, would represent them in cp1252. Instead you give me an example of a
simple string manipulation that just replaces the characters I want to
see with '?', followed by an even simpler example. Neither is what I
asked for.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
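
Note (not part of the original message): a minimal sketch of the point
under dispute, assuming Python 3. cp1252 has no code points for the
musical symbols, so a strict encode raises UnicodeEncodeError, whereas
the euro sign does exist in that code page. naive_slug is a hypothetical
helper written only for illustration, loosely following the
normalization idea jmf mentions; it is not how awesome-slugify is
implemented.

```python
import re
import unicodedata

notes = "♬ ♫ ♪ ♩"

# cp1252 has no entries for the musical symbols, so strict encoding fails.
try:
    notes.encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# The euro sign, by contrast, is part of cp1252 (single byte 0x80).
euro_bytes = "€".encode("cp1252")


def naive_slug(text):
    """Hypothetical helper for illustration; not awesome-slugify's code."""
    # Decompose accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop everything that has no ASCII representation.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep runs of ASCII letters and digits, joined by hyphens.
    return "-".join(re.findall(r"[A-Za-z0-9]+", ascii_only))


print(encodable)
print(euro_bytes)
print(naive_slug("Is ♬ ♫ ♪ ♩ a melody or just noise?"))
```

The sketch only shows that the notes cannot be represented in cp1252 at
all; whether a slug should drop them or transliterate them is a separate
design choice.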