From: Chris Kaynor
Date: Tue, 25 Aug 2015 14:28:36 -0700
Subject: Re: file.write() of non-ASCII characters differs in Interpreted Python than in script run
To: "python-list@python.org"
Newsgroups: comp.lang.python

On Tue, Aug 25, 2015 at 2:19 PM, RAH wrote:
> Dear All,
>
> I experienced an incomprehensible behavior (I've spent already many hours on this subject): the `file.write('string')` provides an error in run mode and not when interpreted at the console. The string must contain non-ASCII characters. If all ASCII, there is no error.
>
> The following example shows what I can see.
> I must overlook something because I cannot think Python makes a difference between interpreted and run modes and yet ... Can someone please check that subject.
>
> Thank you in advance.
> René
>
> Code extract from WSGI application (reply.py)
> =============================================
>
>     request_body = environ['wsgi.input'].read(request_body_size)    # bytes
>     rb = request_body.decode()                                      # string
>     d = parse_qs(rb)                                                # dict
>
>     f = open('logbytes', 'ab')
>     g = open('logstr', 'a')
>     h = open('logdict', 'a')
>
>     f.write(request_body)
>     g.write(str(type(request_body)) + '\t' + str(type(rb)) + '\t' + str(type(d)) + '\n')
>     h.write(str(d) + '\n')        <--- line 28 of the application
>
>     h.close()
>     g.close()
>     f.close()
>
>
> Tail of Apache2 error.log
> =========================
>
> [Tue Aug 25 20:24:04.657933 2015] [wsgi:error] [pid 3677:tid 3029764928] [remote 192.168.1.5:27575] File "reply.py", line 28, in application
> [Tue Aug 25 20:24:04.658001 2015] [wsgi:error] [pid 3677:tid 3029764928] [remote 192.168.1.5:27575] h.write(str(d) + '\\n')
> [Tue Aug 25 20:24:04.658201 2015] [wsgi:error] [pid 3677:tid 3029764928] [remote 192.168.1.5:27575] UnicodeEncodeError: 'ascii' codec can't encode character '\\xc7' in position 15: ordinal not in range(128)
>

What version of Python is Apache2 using? From the looks of the error, it is
probably using some version of Python 2, in which case you'll need to manually
encode the string, or pick an encoding for the file when you open it (via the
encoding argument to io.open; the built-in open function only accepts that
argument in Python 3). I'd recommend using UTF-8.

You can log the value of sys.version to find out the version number.
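
A minimal sketch of that advice, assuming Python 3 (on Python 2, io.open takes the same encoding argument, but str(d) would have to be replaced with a unicode string). The helper name append_line and the temporary log path are made up for illustration and are not part of reply.py:

```python
import os
import sys
import tempfile


def append_line(path, text, encoding="utf-8"):
    """Append text plus a newline to path using an explicit encoding."""
    # Passing encoding= means the result no longer depends on the
    # locale of whatever process (console or Apache) runs the code.
    with open(path, "a", encoding=encoding) as handle:
        handle.write(text + "\n")


# Hypothetical stand-in for the parsed request data in the original post.
d = {"userName": ["\u00c7a va !"]}

# Hypothetical log location; the original code writes to 'logdict'.
log_path = os.path.join(tempfile.mkdtemp(), "logdict")

# Record the interpreter version first, then the dictionary itself.
append_line(log_path, "Python " + sys.version.split()[0])
append_line(log_path, str(d))

with open(log_path, "rb") as handle:
    raw = handle.read()
```

With UTF-8, the character 'Ç' (U+00C7) ends up in the file as the two bytes C3 87, whichever process does the writing.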
> Trying similar code within the Python interpreter
> =================================================
>
> rse@Alibaba:~/test$ python
> Python 3.4.0 (default, Jun 19 2015, 14:18:46)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> di = {'userName': ['Ça va !']}        <--- A dictionary
> >>> str(di)
> "{'userName': ['Ça va !']}"               <--- and its string representation
> >>> type(str(di))
> <class 'str'>                             <--- Is a string indeed
> >>> fi = open('essai', 'a')
> >>> fi.write(str(di) + '\n')
> 26                                        <--- It works well
> >>> fi.close()
> >>>

In this run, you are using Python 3.4, where open() takes its default encoding
from the locale, which is evidently UTF-8 in your terminal session.
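
To compare the two environments, it may help to look at the encoding that open() falls back to when none is given. In Python 3 that is locale.getpreferredencoding(False). The sketch below prints that value and then forces encoding="ascii" to reproduce the error from the Apache log. That the Apache process really runs with an ASCII locale is a guess on my part, not something the log confirms, and the temporary path is made up for illustration:

```python
import locale
import os
import tempfile

# The encoding open() uses in text mode when encoding= is omitted.
preferred = locale.getpreferredencoding(False)
print("default text encoding:", preferred)

text = str({"userName": ["\u00c7a va !"]}) + "\n"

# Hypothetical scratch file standing in for 'essai'.
path = os.path.join(tempfile.mkdtemp(), "essai")

# Force ASCII to mimic a process whose locale gives an ASCII default.
failed = False
message = ""
try:
    with open(path, "a", encoding="ascii") as fi:
        fi.write(text)
except UnicodeEncodeError as exc:
    failed = True
    message = str(exc)

print(message)
```

Running the same first two lines inside the WSGI application and writing the result to a log would show whether the two processes really disagree about the default.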