Newsgroups: comp.lang.python
Date: Tue, 12 Feb 2013 07:29:00 -0800 (PST)
Subject: Re: UnicodeEncodeError when not running script from IDE
From: Magnus Pettersson

> Are you sure you are writing the same data? That would mean that pydev
> changes the default encoding -- which is evil.
>
> A portable approach would be to use codecs.open() or io.open() instead of
> the built-in:
>
> import io
> with io.open(filepath, "a") as f:
>     ...
>
> io.open() uses UTF-8 by default, but you can specify other encodings with
> io.open(filepath, mode, encoding=whatever).

Interesting. PyDev must be doing something behind the scenes, because when I changed open() to io.open() I now get an error inside Eclipse:

        f.write(card+"\n")
      File "C:\python27\lib\encodings\cp1252.py", line 19, in encode
        return codecs.charmap_encode(input,self.errors,encoding_table)[0]
    UnicodeEncodeError: 'charmap' codec can't encode character u'\u53c8' in position 32: character maps to <undefined>

So io.open() is not defaulting to UTF-8 here. The traceback shows it using cp1252, the Windows locale encoding, which has no mapping for this kanji. If I pass the encoding explicitly:

    .... io.open(filepath, "a", encoding="UTF-8") as f:

then it works in Eclipse. But I seem to have encoding problems all over the place: code that works inside Eclipse/PyDev fails outside of it.
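A minimal sketch for narrowing this down. It is an illustration, not code from the thread. With no encoding given, Python 2 falls back on two different defaults. Writing unicode to a file opened with the built-in open() goes through sys.getdefaultencoding(), which is normally 'ascii'. io.open() instead uses locale.getpreferredencoding(), which is cp1252 on the Windows setup in the traceback above. Printing both values inside PyDev and again in a terminal would show whether PyDev changes them; that it does is only an assumption. The helper names encoding_defaults and append_line, and the scratch file path, are hypothetical.

```python
import io
import locale
import os
import sys
import tempfile


def encoding_defaults():
    """Return the encodings Python falls back to when none is given."""
    return {
        # Python 2 uses this for implicit unicode <-> str conversions,
        # e.g. when unicode is written to a file from the built-in open().
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # io.open() uses this when no encoding argument is passed.
        "locale.getpreferredencoding()": locale.getpreferredencoding(False),
    }


def append_line(filepath, text, encoding="utf-8"):
    """Append one line of unicode text, encoding it explicitly."""
    with io.open(filepath, "a", encoding=encoding) as f:
        f.write(text + u"\n")


if __name__ == "__main__":
    # Run this once inside Eclipse/PyDev and once from a terminal, then compare.
    for name, value in sorted(encoding_defaults().items()):
        print("%s -> %s" % (name, value))

    # Hypothetical scratch file, used instead of the post's D:/iknow_kanji.txt.
    path = os.path.join(tempfile.mkdtemp(), "kanji_test.txt")
    append_line(path, u"\u53c8")  # the character from the traceback
    with io.open(path, encoding="utf-8") as f:
        print(repr(f.read()))
```

Because append_line always passes encoding="utf-8", its output no longer depends on which environment runs the script. The same one-argument change would apply to io.open() in savefile.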
Here is the flow of my data. I'm terrible at using unicode/encode/decode, so I could use some help here:

kanji_anki_gui.py:

    import kanji_anki

    def on_addButton_clicked(self):
        #code
        # self.kanji.text() comes from a kanji letter written into a PyQt4 QLineEdit
        kanji = unicode(self.kanji.text())
        card = kanji_anki.scrapeKanji(kanji, tags)
        #more code

kanji_anki.py:

    import io
    import urllib2

    def scrapeKanji(kanji, tags="", onlymeaning=False):
        baseurl = unicode("http://www.romajidesu.com/kanji/")
        url = unicode(baseurl + kanji)
        # Test: write the URL out to disk; this now works outside of Eclipse
        savefile([url])
        # Fetching the page works fine in Eclipse, but prints "Oh no..." in a terminal
        try:
            page = urllib2.urlopen(url)
        except:
            print "OH no website dont work"
            return None
        #Code that does some scraping and returns a string containing kanji letters
        return card

    def savefile(cardlist, filepath="D:/iknow_kanji.txt"):
        # No encoding argument, so io.open() uses the locale's preferred encoding
        with io.open(filepath, "a") as f:
            for card in cardlist:
                f.write(card + "\n")
        return True
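Here is one possible reason the page fetch fails only in the terminal. It is an assumption, because the bare except: hides the real exception. urllib2 is handed a unicode URL that contains a kanji, and outside PyDev the implicit conversion to bytes probably uses the 'ascii' codec. A common remedy is to percent-encode the kanji's UTF-8 bytes before building the URL. It also helps to catch only network errors, so any other exception still prints its traceback. The sketch below is written to run on both Python 2.7 and 3. kanji_url and fetch_page are hypothetical helpers, and it has not been verified how romajidesu.com handles percent-encoded paths.

```python
from contextlib import closing

try:  # Python 3
    from urllib.parse import quote
    from urllib.request import urlopen
    from urllib.error import URLError
except ImportError:  # Python 2.7, as used in the post
    from urllib import quote
    from urllib2 import urlopen, URLError

BASE_URL = u"http://www.romajidesu.com/kanji/"


def kanji_url(kanji):
    """Build an ASCII-only URL by percent-encoding the kanji's UTF-8 bytes."""
    # Encoding to bytes first makes quote() behave the same on 2.7 and 3.
    return BASE_URL + quote(kanji.encode("utf-8"))


def fetch_page(kanji):
    """Fetch the page for one kanji, or return None on a network error."""
    url = kanji_url(kanji)
    try:
        with closing(urlopen(url)) as response:
            return response.read()
    except URLError as e:
        # Only network/HTTP failures land here; encoding bugs still raise.
        print("Could not fetch %s: %s" % (url, e))
        return None
```

For example, kanji_url(u"\u53c8") returns http://www.romajidesu.com/kanji/%E5%8F%88. The resulting URL contains only ASCII, so it no longer matters which default encoding the interpreter has.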