
Re: Joining Strings

Started by	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
First post	2016-04-07 08:01 +0300
Last post	2016-04-08 08:02 +0300
Articles	2 — 1 participant


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: Joining Strings Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-07 08:01 +0300
    Re: Joining Strings Jussi Piitulainen <jussi.piitulainen@helsinki.fi> - 2016-04-08 08:02 +0300

#106612 — Re: Joining Strings

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-04-07 08:01 +0300
Subject	Re: Joining Strings
Message-ID	<lf5oa9mt3kv.fsf@ling.helsinki.fi>

Emeka writes:

> Hello All,
>
> import urllib.request
> import re
>
> url = 'https://www.everyday.com/
>
>
>
> req = urllib.request.Request(url)
> resp = urllib.request.urlopen(req)
> respData = resp.read()
>
>
> paragraphs = re.findall(r'\[(.*?)\]',str(respData))
> for eachP in paragraphs:
>     print("".join(eachP.split(',')[1:-2]))
>     print("\n")
>
>
>
> I got the below:
> "Coke -  Yala Market Branch""NO. 113 IKU BAKR WAY YALA"""
> But what I need is
>
> 'Coke -  Yala Market Branch NO. 113 IKU BAKR WAY YALA'
>
> How do I achieve the above?

A couple of things you could do to understand your problem and work
around it: Change your code to print(eachP). Change your "".join to
"!".join to see where the commas were. Experiment with data of that form
in the REPL. Sometimes it's good to print repr(datum) instead of datum,
though not in this case.
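
To illustrate those experiments, here is a small sketch. The value of each_p is made up: it is only a guess at what one regex match might look like, chosen so that the original slice reproduces the output shown in the question. The names each_p, fields, joined_plain and joined_marked are mine, not from the original code.

```python
# Hypothetical stand-in for one match (eachP); the real response data may differ.
each_p = '7,"Coke -  Yala Market Branch","NO. 113 IKU BAKR WAY YALA","",0,1'

# Same split and slice as in the original code.
fields = each_p.split(',')[1:-2]

joined_plain = "".join(fields)    # what the original code printed
joined_marked = "!".join(fields)  # a visible separator marks where the commas were

print(each_p)         # the whole match, before any slicing
print(joined_plain)
print(joined_marked)
```

With the "!" separator it becomes visible that the quote characters belong to the data itself, which is why joining the pieces cannot remove them.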

But are you trying to extract and parse paragraphs from a JSON response?
Do not use regex for that at all. Use json.load or json.loads to parse
it properly, and access the relevant data by indexing:

import json

x = json.loads('{"foo":[["Weather Forecast","It\'s Rain"],[]]}')

x ==> {'foo': [['Weather Forecast', "It's Rain"], []]}

x['foo'] ==> [['Weather Forecast', "It's Rain"], []]

x['foo'][0] ==> ['Weather Forecast', "It's Rain"]
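
Applied to the original question, the same idea might look like the sketch below. The JSON text and the key "branches" are invented for illustration, since the real response is not shown in the thread; only the slice [1:-2] is taken from the original code.

```python
import json

# Hypothetical response text; inspect the real structure by printing it first.
text = ('{"branches": [[7, "Coke -  Yala Market Branch", '
        '"NO. 113 IKU BAKR WAY YALA", "", 0, 1]]}')

data = json.loads(text)
for row in data["branches"]:
    # Same slice as in the original code, but on a real list of parsed values.
    # Keep only non-empty strings so the empty field adds no trailing space.
    parts = [item for item in row[1:-2] if isinstance(item, str) and item]
    line = " ".join(parts)
    print(line)
```

Because the parser has already removed the JSON quoting, the strings come out without the stray double quotes, and a plain " ".join is enough.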


#106645

From	Jussi Piitulainen <jussi.piitulainen@helsinki.fi>
Date	2016-04-08 08:02 +0300
Message-ID	<lf54mbc3d77.fsf@ling.helsinki.fi>
In reply to	#106612

Emeka writes:

> Thanks it worked when parsed with json.load. However, it needed this
> decode('utf'):
>
> data = json.loads(respData.decode('utf-8'))

So it does. The response data is bytes.
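
A minimal sketch of that step, with made-up bytes standing in for what resp.read() returns (the key "branch" is hypothetical). Note that from Python 3.6 on, json.loads also accepts bytes directly; the explicit decode works in either case.

```python
import json

# Made-up bytes standing in for the result of resp.read().
raw = b'{"branch": "Coke -  Yala Market Branch"}'

text = raw.decode('utf-8')   # bytes -> str
data = json.loads(text)      # str -> Python objects

print(type(raw).__name__, type(text).__name__)
print(data["branch"])
```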

There's also a way to wrap a decoding reader between the response object
and the JSON parser (json.load instead of json.loads):

import codecs
import json
import urllib.request

response = urllib.request.urlopen(command) # a stream of bytes ...
please = codecs.getreader('UTF-8') # ... to characters

result = json.load(please(response))
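
The wrapping can be tried without a network connection. In this sketch io.BytesIO is my stand-in for the response object (both are binary streams with a read method), and the JSON text is the sample from the earlier article; fake_response and make_reader are illustrative names.

```python
import codecs
import io
import json

# io.BytesIO stands in for the object returned by urllib.request.urlopen.
payload = '{"foo": [["Weather Forecast", "It\'s Rain"], []]}'.encode('utf-8')
fake_response = io.BytesIO(payload)

make_reader = codecs.getreader('utf-8')   # a StreamReader class, not an instance
result = json.load(make_reader(fake_response))

print(result['foo'][0])
```

The reader decodes while json.load reads, so no separate call to decode is needed.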

