| Header | Value |
|---|---|
| From | dieter <dieter@handshake.de> |
| Subject | Re: requests.Session() how do you set 'replace' on the encoding? |
| Date | Tue, 07 Jul 2015 07:38:19 +0200 |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.337.1436247510.3674.python-list@python.org> |
Veek M <vek.m1234@gmail.com> writes:
> dieter wrote:
>
>> Veek M <vek.m1234@gmail.com> writes:
>>> UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in
>>> position 8: illegal multibyte sequence
>>
>> You give us very little context.
>
> It's a longish chunk of code: basically, i'm trying to download using the
> 'requests.Session' module and that should give me Unicode once it's told
> what encoding is being used 'gbk'.
>
> def get_page(s, url):
>     print(url)
>     r = s.get(url, headers = {
>         'User-Agent' : user_agent,
>         'Keep-Alive' : '3600',
>         'Connection' : 'keep-alive',
>     })
>     s.encoding='gbk'
It looks strange that you can set "s.encoding" after you have
called "s.get" - but, as you apparently get an error related to
the "gbk" encoding, it seems to work.
>     text = r.text
>     return text
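On the "s.encoding" line: as far as I know, "requests" takes the encoding used for ".text" from the response object ("r.encoding"), and I am not aware of a session-level "encoding" setting, so assigning to "s.encoding" probably just creates an unused attribute. A minimal sketch under that assumption (Python 3; the name get_page_text is made up for illustration, and "session" is anything with a requests-style "get" method):

```python
def get_page_text(session, url, encoding="gbk"):
    """Fetch url and return its body decoded with the given encoding.

    Assumes a requests-style session: session.get(url) returns an object
    with a writable `encoding` attribute and a `text` property.
    """
    r = session.get(url)
    # Tell the response (not the session) which encoding to use,
    # and do it before reading r.text.
    r.encoding = encoding
    return r.text
```

If I read the "requests" source correctly, "r.text" already decodes with errors='replace', so undecodable bytes from the server should not raise there. Note also that the traceback is a UnicodeEncodeError: it arises when Unicode text is encoded to "gbk" (for example on output), not when the downloaded bytes are decoded.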
>
> # Open output file
> fh=codecs.open('/tmp/out', 'wb')
> fh.write(header)
>
> # Download
> s = requests.Session()
> ------------
>
> If 'text' is NOT proper unicode because the server introduced some junk,
> then when i do anchor.getparent() on my 'text' it'll traceback..
> ergo the question, how do i set a replacement char within 'requests'
I see the following options for you:
* you look at the code (of "requests.Session"),
determine where "s.encoding" is taken care of and
check whether it also supports a replacement strategy there.
Then, you use this knowledge to set up your replacement.
* you avoid the "unicode" translating functionality of
"requests.Session". If it does not directly support this,
you can trick it using the "iso-8859-1" encoding (this maps
bytes to the first 256 unicode codepoints in a one-to-one way)
and then do the unicode handling in your own code -- with
facilities you already know of (including replacement)
* you contact the website administrator and ask him why
the delivered pages do not contain valid "gbk" content.
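
The second option can be sketched with plain bytes and strings, without touching "requests" at all. This is only an illustration under assumptions: it is Python 3, all function names are hypothetical, and "raw" stands for the undecoded body (what "requests" exposes as "r.content"). It shows replacement on decoding, the "iso-8859-1" round trip back to the original bytes, and replacement on encoding, which is the direction the reported UnicodeEncodeError points at.

```python
def decode_with_replacement(raw, encoding="gbk"):
    """Decode bytes, substituting U+FFFD for undecodable sequences."""
    return raw.decode(encoding, errors="replace")


def recover_bytes_from_latin1(text):
    """Undo an iso-8859-1 decode; this round trip is lossless for any bytes."""
    return text.encode("iso-8859-1")


def encode_with_replacement(text, encoding="gbk"):
    """Encode text, substituting '?' for characters the codec lacks."""
    return text.encode(encoding, errors="replace")


# Valid gbk content followed by one byte that is not valid gbk.
raw = "中文".encode("gbk") + b"\xff"

# Decode in your own code, with replacement.
decoded = decode_with_replacement(raw)

# The "trick": text obtained via iso-8859-1 still carries the original bytes.
as_latin1 = raw.decode("iso-8859-1")
assert recover_bytes_from_latin1(as_latin1) == raw

# Encoding side: u'\xa0' (no-break space) cannot be encoded to gbk.
safe_bytes = encode_with_replacement("price\xa0100")
```

Besides 'replace', the standard codecs also accept 'ignore' and 'xmlcharrefreplace' (the latter for encoding only); which one fits depends on what the output is used for.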