| Started by | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| First post | 2016-04-07 08:01 +0300 |
| Last post | 2016-04-08 08:02 +0300 |
| Articles | 2 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-04-07 08:01 +0300 |
| Subject | Re: Joining Strings |
| Message-ID | <lf5oa9mt3kv.fsf@ling.helsinki.fi> |
Emeka writes:
> Hello All,
>
> import urllib.request
> import re
>
> url = 'https://www.everyday.com/'
>
>
>
> req = urllib.request.Request(url)
> resp = urllib.request.urlopen(req)
> respData = resp.read()
>
>
> paragraphs = re.findall(r'\[(.*?)\]',str(respData))
> for eachP in paragraphs:
>     print("".join(eachP.split(',')[1:-2]))
>     print("\n")
>
>
>
> I got the below:
> "Coke - Yala Market Branch""NO. 113 IKU BAKR WAY YALA"""
> But what I need is
>
> 'Coke - Yala Market Branch NO. 113 IKU BAKR WAY YALA'
>
> How do I achieve the above?
A couple of things you could do to understand your problem and work
around it: Change your code to print(eachP). Change your "".join to
"!".join to see where the commas were. Experiment with data of that form
in the REPL. Sometimes it's good to print repr(datum) instead of datum,
though not in this case.
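Those experiments could look roughly like the sketch below. The match string is made up to reproduce the output you posted; the real response is not shown in this thread, and the names each_p, fields, joined_plain and joined_marked are only for illustration.

```python
# Hypothetical stand-in for one regex match (one eachP); the leading
# and trailing fields are invented so that [1:-2] has something to drop.
each_p = '1,"Coke - Yala Market Branch","NO. 113 IKU BAKR WAY YALA","",6.5,3.3'

# Same slicing as in the original loop.
fields = each_p.split(',')[1:-2]

# Joining with "" hides where the commas were; "!" makes them visible.
joined_plain = "".join(fields)
joined_marked = "!".join(fields)

print(each_p)         # the raw match, before any splitting
print(joined_plain)
print(joined_marked)
```

With the marker in place it becomes visible that each field still carries its own double quotes and that the last field is an empty quoted string.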
But are you trying to extract and parse paragraphs from a JSON response?
Do not use regex for that at all. Use json.load or json.loads to parse
it properly, and access the relevant data by indexing:
import json
x = json.loads('{"foo":[["Weather Forecast","It\'s Rain"],[]]}')
x            # ==> {'foo': [['Weather Forecast', "It's Rain"], []]}
x['foo']     # ==> [['Weather Forecast', "It's Rain"], []]
x['foo'][0]  # ==> ['Weather Forecast', "It's Rain"]
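Applied to data shaped like yours, that might look like the following sketch. The payload, the "branches" key and the helper branch_line are all assumptions made for illustration; the actual structure of the response is not shown in this thread, so the indexing would need to be adapted.

```python
import json

# Hypothetical payload; the real response structure is unknown.
text = ('{"branches": [[1, "Coke - Yala Market Branch", '
        '"NO. 113 IKU BAKR WAY YALA", "", 6.5, 3.3]]}')


def branch_line(row):
    """Join the non-empty fields of row[1:-2] with single spaces."""
    parts = [field for field in row[1:-2] if field]
    return " ".join(parts)


data = json.loads(text)
lines = [branch_line(row) for row in data["branches"]]
for line in lines:
    print(line)
```

Because the JSON parser has already removed the quotes and split the fields, no string surgery on commas or quote characters is needed.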
| From | Jussi Piitulainen <jussi.piitulainen@helsinki.fi> |
|---|---|
| Date | 2016-04-08 08:02 +0300 |
| Message-ID | <lf54mbc3d77.fsf@ling.helsinki.fi> |
| In reply to | #106612 |
Emeka writes:
> Thanks, it worked when parsed with json.loads. However, it needed this
> decode('utf-8'):
>
> data = json.loads(respData.decode('utf-8'))
So it does. The response data is bytes.
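A small offline sketch of that step, using a bytes literal as a stand-in for what resp.read() returns (the content is invented):

```python
import json

# Stand-in for respData = resp.read(); a real response would come
# from the network and is not reproduced here.
resp_data = b'{"name": "Coke - Yala Market Branch"}'

# Decode the bytes to str first, then parse the text as JSON.
text = resp_data.decode('utf-8')
data = json.loads(text)

print(type(resp_data).__name__, type(text).__name__)
print(data["name"])
```

From Python 3.6 on, json.loads also accepts bytes directly, so the explicit decode is only required on earlier versions.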
There's also a way to wrap a decoding reader between the response object
and the JSON parser (json.load instead of json.loads):
import codecs
import json
import urllib.request
response = urllib.request.urlopen(command) # a stream of bytes ...
please = codecs.getreader('UTF-8') # ... to characters
result = json.load(please(response))
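The same wrapping can be tried without a network connection. In this sketch io.BytesIO plays the role of the response object, and the JSON content is invented for illustration:

```python
import codecs
import io
import json

# io.BytesIO stands in for the urlopen() response: a readable stream of bytes.
fake_response = io.BytesIO('{"greeting": "päivää"}'.encode('utf-8'))

# codecs.getreader returns a StreamReader class; calling it on the byte
# stream gives a file-like object whose read() returns decoded text.
make_reader = codecs.getreader('utf-8')
reader = make_reader(fake_response)

# json.load reads from the text stream, so no explicit decode() is needed.
result = json.load(reader)
print(result["greeting"])
```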