| Subject | Re: 'Straße' ('Strasse') and Python 2 |
|---|---|
| From | Travis Griggs <travisgriggs@gmail.com> |
| Date | 2014-01-15 08:28 -0800 |
| References | <30dfa6f1-61b2-49b8-bc65-5fd18d498c38@googlegroups.com> <52D67873.2010502@chamonix.reportlab.co.uk> <lb5u13$9hs$1@ger.gmane.org> <52D68402.6030407@chamonix.reportlab.co.uk> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.5520.1389803336.18130.python-list@python.org> |
On Jan 15, 2014, at 4:50 AM, Robin Becker <robin@reportlab.com> wrote:
> On 15/01/2014 12:13, Ned Batchelder wrote:
> ........
>>> On my utf8 based system
>>>
>>>
>>>> robin@everest ~:
>>>> $ cat ooo.py
>>>> if __name__=='__main__':
>>>>     import sys
>>>>     s='A̅B'
>>>>     print('version_info=%s\nlen(%s)=%d' % (sys.version_info,s,len(s)))
>>>> robin@everest ~:
>>>> $ python ooo.py
>>>> version_info=sys.version_info(major=3, minor=3, micro=3,
>>>> releaselevel='final', serial=0)
>>>> len(A̅B)=3
>>>> robin@everest ~:
>>>> $
>>>
>>>
> ........
>> You are right that more than one codepoint makes up a grapheme, and that you'll
>> need code to deal with the correspondence between them. But let's not muddy
>> these already confusing waters by referring to that mapping as an encoding.
>>
>> In Unicode terms, an encoding is a mapping between codepoints and bytes. Python
>> 3's str is a sequence of codepoints.
>>
> Semantics is everything. For me, graphemes are (or should be) the endpoint; to get a proper rendering of a sequence of graphemes I can use either a sequence of bytes or a sequence of codepoints. They are both encodings of the graphemes; what Unicode says is an encoding doesn't define what encodings are, i.e. mappings from some source alphabet to a target alphabet.
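Ned's point that you need code to map between code points and graphemes can be sketched with the standard library. This is a rough approximation only, assuming that attaching each combining mark (a nonzero `unicodedata.combining()` value) to the preceding character is good enough; the helper name `naive_graphemes` is hypothetical. Full extended grapheme clusters as defined in Unicode UAX #29 need more rules (the third-party `regex` module handles them with `\X`).

```python
import unicodedata

def naive_graphemes(s):
    """Hypothetical helper: group each base character with the combining
    marks that follow it.

    Rough approximation only; it ignores most UAX #29 rules (Hangul
    syllables, emoji ZWJ sequences, regional indicators, spacing marks).
    """
    clusters = []
    for ch in s:
        # combining() returns the canonical combining class, which is
        # nonzero for most combining marks, e.g. U+0305 COMBINING OVERLINE
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

s = 'A\u0305B'                  # the string from Robin's ooo.py
print(len(s))                   # 3 code points
print(len(naive_graphemes(s)))  # 2 user-perceived characters
```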
But you’re talking about two levels of encoding, one running on top of the other. Insisting that you be able to call them all encodings makes the term pointless, because it becomes ambiguous what you’re referring to. Are you referring to encoding in the sense of representing code points with bytes? Or are you referring to what the Unicode guys call (normalization) “forms”?
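To make the two levels concrete, here is a small sketch using Robin's `'A̅B'` string: the `str` is a sequence of code points, and an encoding (UTF-8 or UTF-16 here) is a separate step that turns those code points into bytes. I use `'utf-16-le'` as an illustrative choice so the byte count does not include a byte-order mark.

```python
s = 'A\u0305B'  # 'A' + COMBINING OVERLINE + 'B', as in ooo.py

# Level 1: a Python 3 str is a sequence of code points
print([hex(ord(c)) for c in s])       # ['0x41', '0x305', '0x42']

# Level 2: an encoding maps those code points to bytes
utf8 = s.encode('utf-8')
utf16 = s.encode('utf-16-le')         # explicit byte order, no BOM
print(len(s), len(utf8), len(utf16))  # 3 4 6

# Decoding undoes only level 2; the code points come back unchanged
assert utf8.decode('utf-8') == s
```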
For example, the NFC form of ‘ñ’ is ‘\u00F1’. The NFD form represents the exact same grapheme, but is ‘\u006e\u0303’. You can call them encodings if you want, but I echo Ned’s sentiment that you keep that to yourself. Conventionally, they’re different forms, not different encodings. You can encode either form with an encoding, e.g.:
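Moving between the two forms is done with `unicodedata.normalize` from the standard library, not with an encoding. A minimal sketch:

```python
import unicodedata

nfc = '\u00F1'        # LATIN SMALL LETTER N WITH TILDE (precomposed)
nfd = '\u006e\u0303'  # 'n' + COMBINING TILDE (decomposed)

print(nfc == nfd)                                # False: different code points
print(unicodedata.normalize('NFD', nfc) == nfd)  # True
print(unicodedata.normalize('NFC', nfd) == nfc)  # True
print(len(nfc), len(nfd))                        # 1 2
```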
'\u00F1'.encode('utf8')
'\u00F1'.encode('utf16')
'\u006e\u0303'.encode('utf8')
'\u006e\u0303'.encode('utf16')
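What those calls produce shows why the distinction matters: the same grapheme in different forms gives different bytes under the same encoding. The sketch below uses `'utf-16-le'` rather than `'utf16'`, because Python's `'utf16'` codec prepends a byte-order mark in the machine's native order, which makes its output platform-dependent. Normalizing to one form before encoding (NFC here, an arbitrary choice) makes the bytes comparable.

```python
import unicodedata

forms = {'NFC': '\u00F1', 'NFD': '\u006e\u0303'}

for name, text in forms.items():
    # Same grapheme, different code points, hence different bytes
    print(name, text.encode('utf-8'), text.encode('utf-16-le'))
# NFC b'\xc3\xb1' b'\xf1\x00'
# NFD b'n\xcc\x83' b'n\x00\x03\x03'

# To compare text that may arrive in either form, normalize before encoding
a = unicodedata.normalize('NFC', forms['NFD']).encode('utf-8')
b = unicodedata.normalize('NFC', forms['NFC']).encode('utf-8')
print(a == b)  # True
```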
'Straße' ('Strasse') and Python 2 wxjmfauth@gmail.com - 2014-01-11 23:50 -0800
Re: 'Straße' ('Strasse') and Python 2 Peter Otten <__peter__@web.de> - 2014-01-12 09:31 +0100
Re: 'Straße' ('Strasse') and Python 2 Stefan Behnel <stefan_ml@behnel.de> - 2014-01-12 10:00 +0100
Re: 'Straße' ('Strasse') and Python 2 Ned Batchelder <ned@nedbatchelder.com> - 2014-01-12 07:17 -0500
Re: 'Straße' ('Strasse') and Python 2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-12 12:33 +0000
Re: 'Straße' ('Strasse') and Python 2 MRAB <python@mrabarnett.plus.com> - 2014-01-12 18:33 +0000
Re: 'Straße' ('Strasse') and Python 2 Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2014-01-13 09:27 +0100
Re: 'Straße' ('Strasse') and Python 2 wxjmfauth@gmail.com - 2014-01-13 01:54 -0800
Re: 'Straße' ('Strasse') and Python 2 Chris Angelico <rosuav@gmail.com> - 2014-01-13 21:26 +1100
Re: 'Straße' ('Strasse') and Python 2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-13 10:38 +0000
Re: 'Straße' ('Strasse') and Python 2 Chris Angelico <rosuav@gmail.com> - 2014-01-13 21:57 +1100
Re: 'Straße' ('Strasse') and Python 2 wxjmfauth@gmail.com - 2014-01-13 08:24 -0800
Re: 'Straße' ('Strasse') and Python 2 Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-13 17:02 +0000
Re: 'Straße' ('Strasse') and Python 2 Michael Torrie <torriem@gmail.com> - 2014-01-13 08:58 -0700
Re: 'Straße' ('Strasse') and Python 2 Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2014-01-13 19:37 +0100
Mistake or Troll (was Re: 'Straße' ('Strasse') and Python 2) Terry Reedy <tjreedy@udel.edu> - 2014-01-13 18:05 -0500
Re: 'Straße' ('Strasse') and Python 2 Robin Becker <robin@reportlab.com> - 2014-01-15 12:00 +0000
Re: 'Straße' ('Strasse') and Python 2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-16 00:43 +0000
Re: 'Straße' ('Strasse') and Python 2 Chris Angelico <rosuav@gmail.com> - 2014-01-16 12:26 +1100
Re: 'Straße' ('Strasse') and Python 2 Ned Batchelder <ned@nedbatchelder.com> - 2014-01-15 07:13 -0500
Re: 'Straße' ('Strasse') and Python 2 wxjmfauth@gmail.com - 2014-01-15 06:55 -0800
Re: 'Straße' ('Strasse') and Python 2 Chris Angelico <rosuav@gmail.com> - 2014-01-16 02:14 +1100
Re: 'Straße' ('Strasse') and Python 2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-16 00:32 +0000
Re: 'Straße' ('Strasse') and Python 2 Robin Becker <robin@reportlab.com> - 2014-01-16 10:51 +0000
Re: 'Straße' ('Strasse') and Python 2 Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-01-16 14:07 +0000
Re: 'Straße' ('Strasse') and Python 2 Tim Chase <python.list@tim.thechases.com> - 2014-01-16 09:24 -0600
Re: 'Straße' ('Strasse') and Python 2 Chris Angelico <rosuav@gmail.com> - 2014-01-16 21:58 +1100
Re: 'Straße' ('Strasse') and Python 2 "Frank Millman" <frank@chagford.com> - 2014-01-16 14:06 +0200
Re: 'Straße' ('Strasse') and Python 2 Robin Becker <robin@reportlab.com> - 2014-01-16 13:03 +0000
Re: 'Straße' ('Strasse') and Python 2 Travis Griggs <travisgriggs@gmail.com> - 2014-01-16 13:30 -0800
Re: 'Straße' ('Strasse') and Python 2 Robin Becker <robin@reportlab.com> - 2014-01-15 12:50 +0000
Re: 'Straße' ('Strasse') and Python 2 Travis Griggs <travisgriggs@gmail.com> - 2014-01-15 08:28 -0800
Re: 'Straße' ('Strasse') and Python 2 Robin Becker <robin@reportlab.com> - 2014-01-15 16:55 +0000
Re: 'Straße' ('Strasse') and Python 2 Chris Angelico <rosuav@gmail.com> - 2014-01-16 04:14 +1100
Re: 'Straße' ('Strasse') and Python 2 Robin Becker <robin@reportlab.com> - 2014-01-15 17:28 +0000
Re: 'Straße' ('Strasse') and Python 2 Ian Kelly <ian.g.kelly@gmail.com> - 2014-01-15 11:32 -0700
Re: 'Straße' ('Strasse') and Python 2 Terry Reedy <tjreedy@udel.edu> - 2014-01-15 19:27 -0500