


problem with regex

Started by dimmaim@gmail.com
First post 2014-04-28 05:52 -0700
Last post 2014-04-28 15:33 -0400
Articles 3 — 3 participants



Contents

  problem with regex dimmaim@gmail.com - 2014-04-28 05:52 -0700
    Re: problem with regex Roy Smith <roy@panix.com> - 2014-04-28 09:03 -0400
    Re:problem with regex Dave Angel <davea@davea.name> - 2014-04-28 15:33 -0400

#70676 — problem with regex

From dimmaim@gmail.com
Date 2014-04-28 05:52 -0700
Subject problem with regex
Message-ID <caeba811-441e-42a0-9b2b-c743205b1f82@googlegroups.com>

I want to find specific URLs in a text file, but I have some issues. First, when I copy and paste just two lines from the file and assign them to a variable like this, it works, but only with triple quotes:
 
test='''_*_n.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.386925795053},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180,\"height\":179}}}","subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/s100x100/*_*_*_s.jpg","contactId":"*==","contactType":"USER","friendshipStatus":"ARE_FRIENDS","graphApiWriteId":"contact_*:*:*","hugePictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-frc3/*_*_*_n.jpg","profileFbid":"1284503586","isMobilePushable":"NO","lookupKey":null,"name":{"displayName":"* *","firstName":"*","lastName":"*"},"nameSearchTokens":["*","*"],"phones":[],"phoneticName":{"displayName":null,"firstName":null,"lastName":null},"isMemorialized":false,"communicationRank":1.1144714,"canViewerSendGift":false,"canMessage":true}
*=={"bigPictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash3/*.*.*.*/s200x200/*_*_*_n.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.49137931034483},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-h-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180,\"height\":135}}}","subscribeStatus":"IS_SUBSCRIBED","smallPictureUrl":"https://fbcdn-profile-a.akamaihd.net/*-*-*/*.*.*.*/s100x100/*_*_*_a.jpg","contactId":"*==","contactType":"USER","friendshipStatus":"ARE_FRIENDS","graphApiWriteId":"contact_*:*:*","hugePictureUrl":"https://fbcdn-profile-a.akamaihd.net/hprofile-ak-ash3/c0.0.540.540/*_*_*_n.jpg","profileFbid":"*","isMobilePushable":"YES","lookupKey":null,"name":{"displayName":"* *","firstName":"*","lastName":"*"},"nameSearchTokens":["*","*"],"phones":[],"phoneticName":{"displayName":null,"firstName":null,"lastName":null},"isMemorialized":false,"communicationRank":1.2158813,"canViewerSendGift":false,"canMessage":true}'''

import re

uri = re.findall(r'''uri\":\"https://fbcdn-(a-z|photos)?([^\'" >]+)''',test)
print uri

It works fine and I get my result: [('photos', '-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg'), ('photos', '-h-a.akamaihd.net/*-*-*/*_*_*_a.jpg')]

But if I take those lines and save them into a text file, as they are in the original without the quotes, and do the following:

datafile=open('a.txt','r')
data_array=''
for line in datafile:
    data_array=data_array+line

uri = re.findall(r'''uri\":\"https://fbcdn-(a-z|photos)?([^\'" >]+)''',data_array)

After printing uri, I get an empty list. What should I do to make it work for the lines of a text file?



#70678

From Roy Smith <roy@panix.com>
Date 2014-04-28 09:03 -0400
Message-ID <roy-8B7422.09035728042014@news.panix.com>
In reply to #70676

In article <caeba811-441e-42a0-9b2b-c743205b1f82@googlegroups.com>,
 dimmaim@gmail.com wrote:

> I want to find specific URLs in a text file, but I have some issues. First,
> when I copy and paste just two lines from the file and assign them to a
> variable like this, it works, but only with triple quotes:
>  
> test='''<long string elided>'''
[...]
> But if I take those lines and save them into a text file, as they are in
> the original without the quotes [it doesn't work]

I suspect this has nothing to do with regular expressions, but it's just 
about string management.

The first thing you want to do is verify that the text you are reading 
in from the file is the same as the text you have in triple quotes.  So, 
write a program like this:

test='''<long string elided>'''

datafile=open('a.txt','r')
data_array=''
for line in datafile:
    data_array=data_array+line

print test == data_array

If that prints True, then you've got the same text in both cases (and 
you can go on to looking for other problems).  I suspect it will print 
False, though.  So, now your task is to figure out where those two 
strings differ.  Maybe something like:

for c1, c2 in zip(test, data_array):
    print c1 == c2, repr(c1), repr(c2)

and look for the first place they're not the same.  Hopefully that will 
give you a clue what's going wrong.
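
Printing every character pair can be a lot of output for strings this long. As a minimal sketch of the same idea, the helper below stops at the first mismatch and shows a little context. It is written for Python 3 (the code in this thread uses the Python 2 print statement), the name first_difference is made up for illustration, and the two short strings are stand-ins for the real test and data_array values, not the poster's actual data.

```python
def first_difference(a, b):
    """Return the index of the first position where a and b differ, or None if equal."""
    for i, (c1, c2) in enumerate(zip(a, b)):
        if c1 != c2:
            return i
    # One string may simply be a prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Hypothetical stand-ins: the first mimics a string built from a non-raw
# literal, the second mimics the same text as read from a file.
from_literal = '{"uri":"https://example.invalid/a.jpg"}'
from_file = r'{\"uri\":\"https://example.invalid/a.jpg\"}'

pos = first_difference(from_literal, from_file)
if pos is not None:
    # Show a short window of each string starting at the mismatch.
    print(pos, repr(from_literal[pos:pos + 10]), repr(from_file[pos:pos + 10]))
```

With the real strings, the repr() output at the first mismatch should make any stray backslash or newline visible.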



#70692

From Dave Angel <davea@davea.name>
Date 2014-04-28 15:33 -0400
Message-ID <mailman.9556.1398713268.18130.python-list@python.org>
In reply to #70676

dimmaim@gmail.com wrote in message:
> I want to find specific URLs in a text file, but I have some issues. First, when I copy and paste just two lines from the file and assign them to a variable like this, it works, but only with triple quotes:
>  
> test='''_*_n.jpg","timelineCoverPhoto":"{\"focus\":{\"x\":0.5,\"y\":0.386925795053},\"photo\":{\"__type__\":{\"name\":\"Photo\"},\"image_lowres\":{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*

Why did you start a second thread with similar content two minutes
after the first? Do you expect us to compare the two messages and
figure out what you changed, or were you just impatient for a
response? I only check in here about 6 times a day, and I
imagine some might check even less often.

Your test string literal has lots of backslashes in it, which get
interpreted as escape sequences in a literal, but not in a
file. If that's really what the file looks like, you're going
to want to use a raw string for the literal. I agree with Roy,
you're probably not getting the same string the two ways.
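
To illustrate the escape-sequence point, here is a minimal sketch in Python 3 (the thread's own code is Python 2). It assumes the file really contains the backslashes shown in the original post. The sample text is a shortened stand-in built from the posted data, and the names file_like and tolerant, along with the adjusted pattern itself, are assumptions for illustration rather than anything the posters proposed.

```python
import re

# In an ordinary literal, \" is an escape sequence that yields a plain quote,
# so the backslashes disappear from the resulting string.
literal = '{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180}'

# A raw literal keeps the backslashes, which is what reading the file gives.
file_like = r'{\"uri\":\"https://fbcdn-photos-f-a.akamaihd.net/*-*-*/*_*_*_a.jpg\",\"width\":180}'

# The pattern from the original post: in a regex, \" just means a quote,
# so it expects uri":" with no backslashes in the text.
original = r'''uri\":\"https://fbcdn-(a-z|photos)?([^\'" >]+)'''

# An adjusted pattern (an assumption, not from the thread): \\? allows an
# optional literal backslash before each quote, and the final character
# class also stops at a backslash so it is not captured with the URL.
tolerant = r'''uri\\?":\\?"https://fbcdn-(a-z|photos)?([^\\'" >]+)'''

print(literal == file_like)
print(re.findall(original, literal))
print(re.findall(original, file_like))
print(re.findall(tolerant, file_like))
```

The original pattern finds the URL only in the string built from the ordinary literal. The adjusted pattern finds it in the text with backslashes as well.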

-- 
DaveA


