Re: Beginner python 3 unicode question

Date Sat, 16 Nov 2013 22:19:31 +0100
From Laszlo Nagy <gandalf@shopzeus.com>
Subject Re: Beginner python 3 unicode question
Newsgroups comp.lang.python
Message-ID <mailman.2736.1384636777.18130.python-list@python.org>


> Why it is behaving differently on the command line? What should I do 
> to fix this?
>
I experimented with this a bit more and found some more confusing 
things. Can somebody please enlighten me?

Here is a test function:


     def password_hash(self,password):
         public = bytearray([random.randint(0,255) for _ in range(5)])
         private = bytearray([random.randint(0,255)])
         pwd = bytearray(password.encode())
         digest = hashlib.sha1(public+pwd+private).digest()
         print("digest",digest,type(digest))
         print("de",digest.encode())
         # and some more stuff here...

This function was called inside a script, and gave me this:

('digest', '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b', <type 'str'>)
Traceback (most recent call last):
   File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 478, in <module>
     pwmgr.run(parser,args)
   File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 241, in run
     self.authdb.user_create(name,password,propvalues)
   File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 205, in user_create
     "password":(password and Binary(self.password_hash(password))) or None,
   File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 134, in password_hash
     print("de",digest.encode())
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
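
Note that the error is a UnicodeDecodeError although the code calls encode(). A possible explanation, and only an assumption here, is that the script ran under Python 2, where str is a byte string and encode() first decodes it implicitly with the 'ascii' codec. The sketch below does that decode step explicitly in Python 3 on the first few bytes of the digest; the variable names are only for illustration.

```python
# First four bytes of the digest shown above, as a real bytes object.
raw = b'\xa0\x98\x8b\xff'

try:
    # Assumption: this is the implicit step Python 2 performs
    # before encoding a byte string.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

If this reasoning holds, the message printed here should have the same wording as the last line of the traceback.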

Then I have tried the very same thing from the interactive shell:

gandalf@gandalf-HP-G62-Notebook-PC:~/Python/Projects/appserver$ python3
Python 3.3.1 (default, Sep 25 2013, 19:29:01)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
 >>> digest = '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'
 >>> digest.encode()
b'\xc2\xa0\xc2\x98\xc2\x8b\xc3\xbf\x04\xc3\xb9V;\xc2\xbd\x1eIHzh\x10-\xc3\x85!\x14\x1b'
 >>>
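
A small sketch, assuming Python 3, of what the session above appears to be doing. The literal there is a str of code points rather than raw bytes, so encode() with the documented UTF-8 default produces two bytes for every code point above 0x7F. The names utf8, non_ascii and raw are hypothetical, chosen only for this example.

```python
# In Python 3 this literal is a str of 20 code points, not raw bytes.
digest = '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'

# Calling encode() without arguments is the same as asking for UTF-8.
utf8 = digest.encode()
same_as_explicit = (utf8 == digest.encode('utf-8'))

# Code points in U+0080..U+07FF take two bytes each in UTF-8,
# so the result grows by one byte per non-ASCII character.
non_ascii = sum(1 for ch in digest if ord(ch) > 0x7F)

# latin-1 maps code points 0..255 one-to-one onto byte values,
# which gives back 20 bytes with the same numeric values.
raw = digest.encode('latin-1')

print(same_as_explicit, len(digest), non_ascii, len(utf8), len(raw))
```

So the interactive result looks consistent with a UTF-8 default; the difference from the script would then have to come from somewhere else.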


WHAT??? Seems like the default value of the encoding parameter of the 
str.encode method is different if I start it interactively. But this 
contradicts its documentation:

 >>> print(digest.encode.__doc__)
S.encode(encoding='utf-8', errors='strict') -> bytes

Encode S using the codec registered for encoding. Default encoding
is 'utf-8'. errors may be given to set a different error
handling scheme. Default is 'strict' meaning that encoding errors raise
a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
'xmlcharrefreplace' as well as any other name registered with
codecs.register_error that can handle UnicodeEncodeErrors.


So is the default utf-8 or not? Should the documentation be updated? Or 
do we have a bug in the interactive shell?
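
One thing that may be worth checking before blaming the documentation: the script output above prints a tuple and <type 'str'>, which is the form Python 2 produces, while the shell session is Python 3.3.1. A minimal sketch for confirming which interpreter actually runs the script; describe_interpreter is a hypothetical helper, not part of the original code.

```python
import sys

def describe_interpreter():
    """Return the major version and the path of the running interpreter."""
    return sys.version_info[0], sys.executable

major, exe = describe_interpreter()
# A single formatted string prints the same way under Python 2 and 3.
print("running under Python %d at %s" % (major, exe))
```

Putting these lines at the top of the script and comparing the output with the interactive shell would show whether both use the same Python.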



