| From | Veek M <vek.m1234@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: requests.Session() how do you set 'replace' on the encoding? |
| Date | 2015-07-06 15:06 +0530 |
| Organization | Home |
| Message-ID | <mndi4e$kd6$1@dont-email.me> |
| References | <mn3oec$7ep$1@dont-email.me> <mailman.266.1435903179.3674.python-list@python.org> |
dieter wrote:
> Veek M <vek.m1234@gmail.com> writes:
>> UnicodeEncodeError: 'gbk' codec can't encode character u'\xa0' in
>> position 8: illegal multibyte sequence
>
> You give us very little context.
It's a longish chunk of code: basically, I'm trying to download pages using
'requests.Session', and that should give me Unicode once it's told which
encoding is being used ('gbk').
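import codecs
import requests

# user_agent and header are defined elsewhere in the full script.

def get_page(s, url):
    print(url)
    r = s.get(url, headers={
        'User-Agent': user_agent,
        'Keep-Alive': '3600',
        'Connection': 'keep-alive',
    })
    # The encoding belongs to the response, so set it on r
    # (setting s.encoding on the Session has no effect on r.text).
    r.encoding = 'gbk'
    return r.text

# Open output file
fh = codecs.open('/tmp/out', 'wb')
fh.write(header)

# Download
s = requests.Session()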
------------
If 'text' is NOT proper Unicode because the server introduced some junk,
then when I do anchor.getparent() on my 'text' it'll traceback.
Ergo the question: how do I set a replacement char within 'requests'?
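
A minimal sketch of one way to get a replacement character on the input side, assuming the raw bytes are decoded by hand instead of going through r.text (with requests, those bytes would come from r.content). The helper name decode_lenient and the sample bytes are made up for illustration, and nothing here touches the network.

```python
def decode_lenient(raw, encoding="gbk"):
    """Decode bytes, substituting U+FFFD for anything that is not valid."""
    return raw.decode(encoding, errors="replace")


# Two valid GBK characters, one junk byte (0xFF is not a valid GBK
# lead byte), then some plain ASCII.
sample = u"\u4e2d\u6587".encode("gbk") + b"\xff" + b" ok"

# A strict decode of the same bytes raises UnicodeDecodeError.
try:
    sample.decode("gbk")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

text = decode_lenient(sample)
print(repr(text))
```

The junk byte turns into U+FFFD and the rest of the text survives, so whatever parses 'text' afterwards gets a proper Unicode string.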
> In general: when you need control over encoding handling because
> deep in a framework an encoding causes problems (as apparently in
> your case), you can usually first take the plain text,
> fix any encoding problems and only then pass the fixed text to
> your framework.
>
>> I'm doing:
>> s = requests.Session()
>> to suck data in, so.. how do i 'replace' chars that fit gbk
>
> It does not seem that the problem occurs inside the "requests" module.
> Thus, you have a chance to "intercept" the downloaded text
> and fix encoding problems.
Okay, so I should use the 'raw' method in requests, clean up the raw text,
and then convert that to Unicode, vs. trying to do it using 'requests'?
The thing is, 'codecs' has xmlcharrefreplace_errors(...) etc., so I figured
if output has cleanup, input ought to have it too :p
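
For the output side, here is a small sketch of what the error handlers do with the character from the traceback (u'\xa0'), assuming the failure happens when Unicode text is encoded to 'gbk' on the way out. The sample string and the temporary file are made up for illustration. Note that 'replace' and 'ignore' work for decoding as well, while 'xmlcharrefreplace' is encode-only.

```python
import io
import os
import tempfile

sample = u"price:\xa010"  # contains the no-break space from the traceback

# Strict encoding reproduces the UnicodeEncodeError.
try:
    sample.encode("gbk")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# 'replace' substitutes '?'; 'xmlcharrefreplace' writes a numeric
# character reference, which suits HTML/XML output.
replaced = sample.encode("gbk", errors="replace")
as_charref = sample.encode("gbk", errors="xmlcharrefreplace")

# The same handler can be attached to an output file, so every write
# goes through it.
path = os.path.join(tempfile.mkdtemp(), "out")
with io.open(path, "w", encoding="gbk", errors="xmlcharrefreplace") as fh:
    fh.write(sample)
with io.open(path, "rb") as fh:
    on_disk = fh.read()

print(strict_ok, replaced, as_charref, on_disk)
```

codecs.open() accepts the same encoding and errors arguments, so the output file in the code above could be opened that way instead of in plain 'wb' mode.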