Date: Sat, 16 Nov 2013 22:19:31 +0100
From: Laszlo Nagy
To: python-list@python.org
Subject: Re: Beginner python 3 unicode question
Newsgroups: comp.lang.python

> Why it is behaving differently on the command line? What should I do
> to fix this?
>

I was experimenting with this a bit more and found some more confusing
things. Can somebody please enlighten me? Here is a test function:

    def password_hash(self,password):
        public = bytearray([random.randint(0,255) for _ in range(5)])
        private = bytearray([random.randint(0,255)])
        pwd = bytearray(password.encode())
        digest = hashlib.sha1(public+pwd+private).digest()
        print("digest",digest,type(digest))
        print("de",digest.encode())
        # and some more stuff here...

This function was called inside a script, and gave me this:

    ('digest', '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b', )
    Traceback (most recent call last):
      File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 478, in
        pwmgr.run(parser,args)
      File "/home/gandalf/Python/Lib/shopzeus/scripts/yaaf_pwmgr.py", line 241, in run
        self.authdb.user_create(name,password,propvalues)
      File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 205, in user_create
        "password":(password and Binary(self.password_hash(password))) or None,
      File "/home/gandalf/Python/Lib/shopzeus/yaaf/db/authdb.py", line 134, in password_hash
        print("de",digest.encode())
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)

Then I tried the very same thing from the interactive shell:

    gandalf@gandalf-HP-G62-Notebook-PC:~/Python/Projects/appserver$ python3
    Python 3.3.1 (default, Sep 25 2013, 19:29:01)
    [GCC 4.7.3] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> digest = '\xa0\x98\x8b\xff\x04\xf9V;\xbd\x1eIHzh\x10-\xc5!\x14\x1b'
    >>> digest.encode()
    b'\xc2\xa0\xc2\x98\xc2\x8b\xc3\xbf\x04\xc3\xb9V;\xc2\xbd\x1eIHzh\x10-\xc3\x85!\x14\x1b'
    >>>

WHAT??? It seems that the default value of the encoding parameter of the
str.encode method is different when I start Python interactively. But
this contradicts its documentation:

    >>> print(digest.encode.__doc__)
    S.encode(encoding='utf-8', errors='strict') -> bytes

    Encode S using the codec registered for encoding. Default encoding
    is 'utf-8'. errors may be given to set a different error
    handling scheme. Default is 'strict' meaning that encoding errors raise
    a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and
    'xmlcharrefreplace' as well as any other name registered with
    codecs.register_error that can handle UnicodeEncodeErrors.

So is the default utf-8 or not? Should the documentation be updated? Or
do we have a bug in the interactive shell?
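
One check that might narrow this down is a minimal sketch like the one below, which assumes it is run with Python 3. It prints the interpreter version, shows what encode() does to a str, and reports whether the SHA-1 digest object has an encode() method at all. The helper name describe_digest is made up for this illustration and is not part of the script above. The tuple-style print output and the 'ascii' codec message in the traceback look like what a Python 2 interpreter would produce, so comparing the version printed by the script with the one shown by the shell seems worthwhile.

```python
import hashlib
import sys


def describe_digest(data):
    """Return (type name, has_encode) for the SHA-1 digest of data (bytes).

    Hypothetical helper, only for this illustration.
    """
    digest = hashlib.sha1(data).digest()
    return type(digest).__name__, hasattr(digest, "encode")


# Which interpreter is really running this file?
print(sys.version_info[:3])

# On Python 3 a str is encoded with UTF-8 by default, so each of these
# two characters (code points above 127) becomes two bytes.
text = "\xa0\x98"
print(text.encode())

# On Python 3 the digest is a bytes object, which has no encode() method.
print(describe_digest(b"secret"))
```

If the script prints a 2.x version here while the shell reports 3.3.1, the two runs are not using the same interpreter, and the str.encode documentation quoted above would only apply to the shell session.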