| Path | csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!xlned.com!feeder7.xlned.com!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <davea@davea.name> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.003 |
| X-Spam-Evidence | '*H*': 0.99; '*S*': 0.00; 'ascii': 0.07; 'bits': 0.07; 'bytes.': 0.07; 'character,': 0.07; 'interpreted': 0.07; 'utf-8': 0.07; 'bits.': 0.09; 'defined.': 0.09; 'encode': 0.09; 'subject:script': 0.09; 'subset': 0.09; 'subject:not': 0.11; 'encoding': 0.15; 'file,': 0.15; '#this': 0.16; 'ascii,': 0.16; 'decoding': 0.16; 'encodings': 0.16; 'subject:when': 0.16; 'unicode?': 0.16; 'wrote:': 0.17; 'bytes': 0.17; 'skip:u 30': 0.17; 'unicode': 0.17; 'examples': 0.18; 'windows': 0.19; 'variable': 0.20; 'bit': 0.21; 'error.': 0.21; 'work.': 0.23; 'nearly': 0.23; 'external': 0.24; 'header:In-Reply-To:1': 0.25; 'header:User-Agent:1': 0.26; 'skip:" 20': 0.26; 'guess': 0.27; 'functions.': 0.27; 'character.': 0.29; 'character': 0.29; 'this.': 0.29; "i'm": 0.29; 'function': 0.30; 'error': 0.30; 'code': 0.31; 'gets': 0.32; 'file': 0.32; 'certain': 0.33; 'to:addr:python-list': 0.33; "can't": 0.34; 'pm,': 0.35; 'sometimes': 0.35; 'there': 0.35; 'but': 0.36; 'characters': 0.36; 'test': 0.36; 'possible': 0.37; 'does': 0.37; 'two': 0.37; 'uses': 0.37; 'why': 0.37; 'subject:: ': 0.38; 'page': 0.38; 'to:addr:python.org': 0.39; 'received:192': 0.39; 'notice': 0.39; 'called': 0.39; 'received:192.168': 0.40; 'skip:u 10': 0.60; 'most': 0.61; "you'll": 0.62; 'back': 0.62; 'more': 0.63; 'show': 0.63; 'dont': 0.64; 'talking': 0.66; 'received:74.208': 0.71; 'million': 0.72; 'directly.': 0.78; '128,': 0.84; 'different.': 0.84; 'fortunately': 0.84; 'received:74.208.4.194': 0.84; 'subject:running': 0.84; 'device,': 0.91 |
| Date | Tue, 12 Feb 2013 15:51:22 -0500 |
| From | Dave Angel <davea@davea.name> |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130106 Thunderbird/17.0.2 |
| MIME-Version | 1.0 |
| To | python-list@python.org |
| Subject | Re: UnicodeEncodeError when not running script from IDE |
| References | <650d144e-da3d-4ca7-ad3a-49f44ce9cbaa@googlegroups.com> <mailman.1696.1360666894.2939.python-list@python.org> <0d6d513d-fa12-4d51-a33d-7bb38f1ee6b2@googlegroups.com> <mailman.1700.1360680572.2939.python-list@python.org> <780d353a-de5c-4d04-8f51-11d81802351b@googlegroups.com> <mailman.1711.1360684727.2939.python-list@python.org> <a80a49be-b3c4-4549-bf94-523605dbbeec@googlegroups.com> |
| In-Reply-To | <a80a49be-b3c4-4549-bf94-523605dbbeec@googlegroups.com> |
| Content-Type | text/plain; charset=UTF-8; format=flowed |
| Content-Transfer-Encoding | 8bit |
| X-Provags-ID | V02:K0:TGjUjmqw89+F58wml9OnklNI/W7SX87zIlsrZySs9aJ FsWOBBs0z2KYBcKUsNeoPTmJJj+sxDcJ8ckTRL923ZsiIhhDkY jk1YT9tOBzh0EqIJanc0gpQ6EzPrmUwWSvvgDRQ58FqsmNlj1x TnYfS2Gdk77icvoQYtz0nC9TtQgLJ/A8DKVsTJ8b9gm4T7aoP3 lZwc5gz05vtfgNm/K+DzhJWLZTvgHvYPkMN/nYOZ6aS1rNncKy sD8+8Mk2eMSi+wVueqPA1sCP1pY84obna4NNjxL0odWl8MuuqW HDYyWnpIQIRyFJ23+307tnyrD6c1r5j92CTBB2+ZQKlm7m3cQ= = |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1725.1360702303.2939.python-list@python.org> (permalink) |
| Lines | 48 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1360702303 news.xs4all.nl 6963 [2001:888:2000:d::a6]:49800 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:38784 |
On 02/12/2013 12:12 PM, Magnus Pettersson wrote:
>> < snip >
>>
> # Here
> kanji = u"私"
> baseurl = u"http://www.romajidesu.com/kanji/"
> url = baseurl+kanji
> savefile([url]) #this test works now. uses: io.open(filepath, "a",encoding="UTF-8") as f:
> # This made the fetching of the website work.
You don't show the code that actually does the io.open(), nor the
url.encode, so I'm not going to guess what you're actually doing.
> Why did I have to write url.encode("UTF-8") when url already is unicode? I feel I don't have a good understanding of this.
> page = urllib2.urlopen(url.encode("UTF-8"))
UTF-8 is NOT Unicode; they are entirely different things. A Unicode
string is a sequence of characters (code points), conceptually 32 bits
per character, and is an internal representation. There is room for
about 1.1 million code points (U+0000 through U+10FFFF). Nearly always
when you're talking to an external device, you need bytes. Since you
can't cram 32 bits into 8, you have to encode the string. Examples of
such devices are any file, or the console. Sometimes, though, you can
use Unicode directly with certain functions. For example, Windows file
names are composed of Unicode characters, so Windows has function calls
that accept Unicode directly. But back to 8 bits:
One encoding is called ASCII, which covers only the first 128
characters (7 bits). Of course, encoding to ASCII raises an error if
the text contains any character above 127.
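A minimal Python 3 sketch of the ASCII case (the thread itself uses Python 2, where the same .encode() call exists on unicode objects, though the printed reprs differ). The helper name try_encode is made up for illustration.

```python
# Hypothetical helper: encode text, or return None if the encoding can't represent it.
def try_encode(text, encoding):
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        # Raised when a character has no byte representation in `encoding`.
        return None


print(try_encode(u"kanji", "ascii"))  # b'kanji': every character is below 128
print(try_encode(u"私", "ascii"))     # None: 私 is U+79C1, far above 127
```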
Other encodings, such as Latin-1 or Windows cp1252, pick a subset of at
most 256 of the million or so possible characters, so that each one fits
in a single byte. Again, if you happen to have a character that's not in
that subset, you'll get an error.
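Another Python 3 sketch, showing that two common 8-bit encodings pick different subsets. Latin-1 and cp1252 are just examples, and the helper encodable is hypothetical.

```python
# Hypothetical helper: True if every character of `text` exists in `encoding`.
def encodable(text, encoding):
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in (u"é", u"€", u"私"):
    # Print the code point rather than the character, so the console encoding doesn't matter.
    print("U+%04X latin-1=%s cp1252=%s"
          % (ord(ch), encodable(ch, "latin-1"), encodable(ch, "cp1252")))
# U+00E9 latin-1=True cp1252=True
# U+20AC latin-1=False cp1252=True
# U+79C1 latin-1=False cp1252=False
```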
There are also other encodings which are hard to describe, but
fortunately pretty rare these days.
Then there's UTF-8, which uses a variable number of bytes (one to four)
for each character. It's designed to encode characters below 128 exactly
as ASCII does, one byte each, and uses two to four bytes for all the
other characters. So it works out well when most characters happen to be
ASCII.
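A Python 3 sketch of UTF-8's variable length. The last lines touch the original question: in Python 3, urllib.parse.quote encodes a URL component to UTF-8 and then percent-escapes the bytes. That is a Python 3 substitute for, not a copy of, the urllib2 code quoted above.

```python
from urllib.parse import quote

samples = [u"A", u"é", u"私", u"\U0001F600"]  # ASCII, Latin, kanji, emoji
lengths = []
for ch in samples:
    data = ch.encode("utf-8")
    lengths.append(len(data))
    print("U+%04X -> %d byte(s): %r" % (ord(ch), len(data), data))
# U+0041 -> 1 byte(s): b'A'
# U+00E9 -> 2 byte(s): b'\xc3\xa9'
# U+79C1 -> 3 byte(s): b'\xe7\xa7\x81'
# U+1F600 -> 4 byte(s): b'\xf0\x9f\x98\x80'

# Pure-ASCII text encodes to the same bytes under UTF-8 and ASCII.
assert u"kanji".encode("utf-8") == u"kanji".encode("ascii")

# URLs are restricted to ASCII, so non-ASCII parts travel as percent-escaped UTF-8 bytes.
print(quote(u"私"))  # %E7%A7%81
```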
Once encoded, a stream of bytes can only be interpreted correctly if
you decode it with the same encoding that produced it.
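A Python 3 sketch of the round trip through a file, a typical external device. The scratch file name kanji.txt is hypothetical; io.open is the same function the quoted code uses (in Python 3 it is the built-in open).

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "kanji.txt")  # hypothetical scratch file

# Writing text to a file means encoding it; here UTF-8 is chosen explicitly.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"私")

# What is actually on disk: three bytes, not one "character".
with io.open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'\xe7\xa7\x81'

# Decoding with the same encoding gets the text back...
text = raw.decode("utf-8")
# ...but a different encoding silently yields garbage (other codecs may raise instead).
wrong = raw.decode("latin-1")
print(ascii(text), ascii(wrong))  # '\u79c1' '\xe7\xa7\x81'
```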
--
DaveA
UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 02:43 -0800
Re: UnicodeEncodeError when not running script from IDE Andrew Berg <bahamutzero8825@gmail.com> - 2013-02-12 05:01 -0600
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 06:24 -0800
Re: UnicodeEncodeError when not running script from IDE Peter Otten <__peter__@web.de> - 2013-02-12 15:49 +0100
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 07:29 -0800
Re: UnicodeEncodeError when not running script from IDE Peter Otten <__peter__@web.de> - 2013-02-12 16:48 +0100
Re: UnicodeEncodeError when not running script from IDE Dave Angel <davea@davea.name> - 2013-02-12 10:58 -0500
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 09:12 -0800
Re: UnicodeEncodeError when not running script from IDE Fabio Zadrozny <fabiofz@gmail.com> - 2013-02-12 18:04 -0200
Re: UnicodeEncodeError when not running script from IDE Dave Angel <davea@davea.name> - 2013-02-12 15:51 -0500
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 16:20 -0800
Re: UnicodeEncodeError when not running script from IDE Dave Angel <davea@davea.name> - 2013-02-12 22:51 -0500
Re: UnicodeEncodeError when not running script from IDE Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-13 11:21 +1100
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 16:40 -0800
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 07:29 -0800
Re: UnicodeEncodeError when not running script from IDE MRAB <python@mrabarnett.plus.com> - 2013-02-12 21:03 +0000
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 06:24 -0800
Re: UnicodeEncodeError when not running script from IDE Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-12 22:43 +1100
Re: UnicodeEncodeError when not running script from IDE Magnus Pettersson <magpettersson@gmail.com> - 2013-02-12 04:34 -0800
Re: UnicodeEncodeError when not running script from IDE Terry Reedy <tjreedy@udel.edu> - 2013-02-12 11:07 -0500