Groups > comp.lang.python > #41022 > unrolled thread

Regular expression problem

Started by	mukesh tiwari <mukeshtiwari.iiitm@gmail.com>
First post	2013-03-10 10:42 -0700
Last post	2013-03-11 16:23 -0400
Articles	15 — 8 participants

Back to article view | Back to comp.lang.python

  Regular expression problem mukesh tiwari <mukeshtiwari.iiitm@gmail.com> - 2013-03-10 10:42 -0700
    Re: Regular expression problem Chris Angelico <rosuav@gmail.com> - 2013-03-11 04:59 +1100
      Re: Regular expression problem mukesh tiwari <mukeshtiwari.iiitm@gmail.com> - 2013-03-10 11:05 -0700
      Re: Regular expression problem mukesh tiwari <mukeshtiwari.iiitm@gmail.com> - 2013-03-10 11:05 -0700
    Re: Regular expression problem Chris Angelico <rosuav@gmail.com> - 2013-03-11 05:08 +1100
      Re: Regular expression problem mukesh tiwari <mukeshtiwari.iiitm@gmail.com> - 2013-03-10 11:48 -0700
        Re: Regular expression problem Chris Angelico <rosuav@gmail.com> - 2013-03-11 05:57 +1100
      Re: Regular expression problem mukesh tiwari <mukeshtiwari.iiitm@gmail.com> - 2013-03-10 11:48 -0700
    Re: Regular expression problem Terry Reedy <tjreedy@udel.edu> - 2013-03-10 22:06 -0400
      Re: Regular expression problem jmfauth <wxjmfauth@gmail.com> - 2013-03-11 02:28 -0700
        Re: Regular expression problem Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-11 10:19 +0000
        Re: Regular expression problem rusi <rustompmody@gmail.com> - 2013-03-11 06:18 -0700
        On topic, please [Was:Re: Regular expression problem] Ned Deily <nad@acm.org> - 2013-03-11 11:13 -0700
    Re: Regular expression problem Serhiy Storchaka <storchaka@gmail.com> - 2013-03-11 20:30 +0200
    Re: Regular expression problem Terry Reedy <tjreedy@udel.edu> - 2013-03-11 16:23 -0400

#41022 — Regular expression problem

From	mukesh tiwari <mukeshtiwari.iiitm@gmail.com>
Date	2013-03-10 10:42 -0700
Subject	Regular expression problem
Message-ID	<c24ab822-a3f4-4faf-8a6d-2e1fde6552ae@googlegroups.com>

Hello all 
I am trying to solve this problem[1] using regular expression. I wrote this code but I am getting time limit exceed. Could some one please tell me how to make this code run faster. 

import re

if __name__ == "__main__":
    n = int ( raw_input() )
    c = 1
    while c <= n :
        email =  filter ( lambda x : x !=  None , [ re.search ( '[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*' , x ) for x in raw_input().split(' ') ] )
        t = len ( email )
        print 'Case #' + str ( c ) + ': ' + str ( t )
        for i in xrange ( t ):
            print email[i].group()
        c += 1

Also rather than breaking the string at space, I tried to use findall but I am not getting correct result. 

>>> re.findall ( '[^~!@#$%^&*()<>?,.]*[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*@[a-zA-Z0-9]+.(com|edu|org|co.in)[^~!@#$%^&*()<>?,.a-zA-Z0-9]*[ ]+','mukeshtiwari.iiitmmmm@gmail.com mukeshtiwari.iiitmmmmm@gmail.com')
['com']

I am suppose to get [ mukeshtiwari.iiitmmm@gmail.com , mukeshtiwari.iiitmmm@gmail.com] ?

Regards
Mukesh Tiwari

[1] http://www.spoj.com/problems/MAIN12C/

[toc] | [next] | [standalone]

#41025

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-11 04:59 +1100
Message-ID	<mailman.3158.1362938407.2939.python-list@python.org>
In reply to	#41022

On Mon, Mar 11, 2013 at 4:42 AM, mukesh tiwari
<mukeshtiwari.iiitm@gmail.com> wrote:
> I am trying to solve this problem[1] using regular expression. I wrote this code but I am getting time limit exceed. Could some one please tell me how to make this code run faster.

What is the time limit? I just tried it (Python 2.6 under Windows) and
it finished in a humanly-immeasurable amount of time. Are you sure
that STDIN (eg raw_input()) is where your test data is coming from?

ChrisA

[toc] | [prev] | [next] | [standalone]

#41026

From	mukesh tiwari <mukeshtiwari.iiitm@gmail.com>
Date	2013-03-10 11:05 -0700
Message-ID	<d1f38f67-ec36-480d-8c4d-2d6b2e1c1f3d@googlegroups.com>
In reply to	#41025

Hi Chris
On the problem page, it is 3 second. 
 
> What is the time limit? I just tried it (Python 2.6 under Windows) and
> 
> it finished in a humanly-immeasurable amount of time. Are you sure
> 
> that STDIN (eg raw_input()) is where your test data is coming from?

Yes, on SPOJ we read data from STDIN. 

Regards 
Mukesh Tiwari

> 
> 
> 
> ChrisA

[toc] | [prev] | [next] | [standalone]

#41029

From	mukesh tiwari <mukeshtiwari.iiitm@gmail.com>
Date	2013-03-10 11:05 -0700
Message-ID	<mailman.3161.1362939467.2939.python-list@python.org>
In reply to	#41025

Hi Chris
On the problem page, it is 3 second. 
 
> What is the time limit? I just tried it (Python 2.6 under Windows) and
> 
> it finished in a humanly-immeasurable amount of time. Are you sure
> 
> that STDIN (eg raw_input()) is where your test data is coming from?

Yes, on SPOJ we read data from STDIN. 

Regards 
Mukesh Tiwari

> 
> 
> 
> ChrisA

[toc] | [prev] | [next] | [standalone]

#41028

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-11 05:08 +1100
Message-ID	<mailman.3160.1362938932.2939.python-list@python.org>
In reply to	#41022

On Mon, Mar 11, 2013 at 4:59 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Mon, Mar 11, 2013 at 4:42 AM, mukesh tiwari
> <mukeshtiwari.iiitm@gmail.com> wrote:
>> I am trying to solve this problem[1] using regular expression. I wrote this code but I am getting time limit exceed. Could some one please tell me how to make this code run faster.
>
> What is the time limit? I just tried it (Python 2.6 under Windows) and
> it finished in a humanly-immeasurable amount of time. Are you sure
> that STDIN (eg raw_input()) is where your test data is coming from?

Oops, reading comprehension fail. Time limit is 3s on a Pentium III.
I've no idea how long your code will take on that hardware, but I
doubt that it's taking three seconds. So my query regarding source of
test data still stands. Can you put together an uber-simple test
program that just echoes the lines of input, to make sure it really is
coming off stdin?

The problem description certainly does seem to imply stdin, but I
can't see why your code would take three seconds unless it's stalling
for some reason. Though perhaps on a P3 with the maximum 100 tests,
maybe that could take a while...

Something to try: Since you're using re.search(), see if you can drop
the complemented sets at the beginning [^~!@#$%^&*()<>?,.]* and end
[^~!@#$%^&*()<>?,.a-zA-Z0-9]* - they're going to be slow to process.
Also, you can simplify this:

[a-zA-Z0-9][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._][a-zA-Z0-9._]*

to this:

[a-zA-Z0-9][a-zA-Z0-9._]{4,}

The brace notation means "at least 4, at most infinity".

Try those out and see if you still get the results you want.

ChrisA

[toc] | [prev] | [next] | [standalone]

#41031

From	mukesh tiwari <mukeshtiwari.iiitm@gmail.com>
Date	2013-03-10 11:48 -0700
Message-ID	<a5a2bd74-4d7e-45c1-aad2-2f67cf9bd9fe@googlegroups.com>
In reply to	#41028

Hi Chris
Thank you! Now I am getting wrong answer so at least program is faster then previous one and I am looking for wrong answer reason. Thanks again!

import re

if __name__ == "__main__":
    n = int ( raw_input() )
    c = 1
    while c <= n :
        email =  filter ( lambda x : x != None , [ re.search ( '[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+.(com|edu|org|co.in)' , x ) for x in raw_input().split(' ') ] )
        t = len ( email )
        print 'Case #' + str ( c ) + ': ' + str ( t )
        for i in xrange ( t ) : 
            print email[i].group()
        c += 1

Regards 
Mukesh Tiwari

[toc] | [prev] | [next] | [standalone]

#41033

From	Chris Angelico <rosuav@gmail.com>
Date	2013-03-11 05:57 +1100
Message-ID	<mailman.3163.1362941836.2939.python-list@python.org>
In reply to	#41031

On Mon, Mar 11, 2013 at 5:48 AM, mukesh tiwari
<mukeshtiwari.iiitm@gmail.com> wrote:
> Hi Chris
> Thank you! Now I am getting wrong answer so at least program is faster then previous one and I am looking for wrong answer reason. Thanks again!

Excellent! Have fun.

Incidentally, regular expressions aren't the only way to solve this
sort of problem. If you get stuck with one method, it may be worth
trying another one, to see if you can get around the issue.

As they say, now you have two problems...

ChrisA

[toc] | [prev] | [next] | [standalone]

#41032

From	mukesh tiwari <mukeshtiwari.iiitm@gmail.com>
Date	2013-03-10 11:48 -0700
Message-ID	<mailman.3162.1362941307.2939.python-list@python.org>
In reply to	#41028

Hi Chris
Thank you! Now I am getting wrong answer so at least program is faster then previous one and I am looking for wrong answer reason. Thanks again!

import re

if __name__ == "__main__":
    n = int ( raw_input() )
    c = 1
    while c <= n :
        email =  filter ( lambda x : x != None , [ re.search ( '[a-zA-Z0-9][a-zA-Z0-9._]{4,}@[a-zA-Z0-9]+.(com|edu|org|co.in)' , x ) for x in raw_input().split(' ') ] )
        t = len ( email )
        print 'Case #' + str ( c ) + ': ' + str ( t )
        for i in xrange ( t ) : 
            print email[i].group()
        c += 1

Regards 
Mukesh Tiwari

[toc] | [prev] | [next] | [standalone]

#41049

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-10 22:06 -0400
Message-ID	<mailman.3174.1362967600.2939.python-list@python.org>
In reply to	#41022

On 3/10/2013 1:42 PM, mukesh tiwari wrote:
> Hello all
> I am trying to solve this problem[1]
> [1] http://www.spoj.com/problems/MAIN12C/

As I remember, and as it still appears, this site severely penalizes 
Python solvers by using the same time limit for all languages. Thus, a 
'slow' python program may work correctly but the site will not let you 
know. A test that refuses to answer is no test at all. In the meanwhile, 
an algorithmically equivalent C program will be run and judged correct, 
so the programmer can try to speed up while not losing correctness.

By teaching 'speed before correctness", this site promotes bad 
programming habits and thinking (and the use of low-level but faster 
languages). I quote your later response: "Now I am getting wrong answer 
so at least program is faster then previous one". If the previous one 
was correct and the revision wrong, you should toss the revision and go 
back to the correct program.

I recommend that you work on problems where you have tests that you can 
actually run even before you code.

-- 
Terry Jan Reedy

[toc] | [prev] | [next] | [standalone]

#41056

From	jmfauth <wxjmfauth@gmail.com>
Date	2013-03-11 02:28 -0700
Message-ID	<d5432f08-2fc4-4e1c-98e5-6f637e4d1c5d@w3g2000vba.googlegroups.com>
In reply to	#41049

On 11 mar, 03:06, Terry Reedy <tjre...@udel.edu> wrote:

>
> ...
> By teaching 'speed before correctness", this site promotes bad
> programming habits and thinking (and the use of low-level but faster
> languages).
> ...

This is exactly what "your" flexible string representation
does!

And away from technical aspects, you even succeeded to
somehow lose unicode compliance.

jmf

[toc] | [prev] | [next] | [standalone]

#41057

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2013-03-11 10:19 +0000
Message-ID	<mailman.3177.1362997176.2939.python-list@python.org>
In reply to	#41056

On 11/03/2013 09:28, jmfauth wrote:
> On 11 mar, 03:06, Terry Reedy <tjre...@udel.edu> wrote:
>
>>
>> ...
>> By teaching 'speed before correctness", this site promotes bad
>> programming habits and thinking (and the use of low-level but faster
>> languages).
>> ...
>
>
> This is exactly what "your" flexible string representation
> does!
>
> And away from technical aspects, you even succeeded to
> somehow lose unicode compliance.
>
> jmf
>

Please stick to something you know about such as sexual self abuse.

-- 
Cheers.

Mark Lawrence

[toc] | [prev] | [next] | [standalone]

#41062

From	rusi <rustompmody@gmail.com>
Date	2013-03-11 06:18 -0700
Message-ID	<160bcdfd-2400-4d51-8431-93f47d962149@vh9g2000pbb.googlegroups.com>
In reply to	#41056

On Mar 11, 2:28 pm, jmfauth <wxjmfa...@gmail.com> wrote:
> On 11 mar, 03:06, Terry Reedy <tjre...@udel.edu> wrote:
>
>
>
> > ...
> > By teaching 'speed before correctness", this site promotes bad
> > programming habits and thinking (and the use of low-level but faster
> > languages).
> > ...
>
> This is exactly what "your" flexible string representation
> does!

This is an old complaint of your with no new data for supporting it

>
> And away from technical aspects, you even succeeded to
> somehow lose unicode compliance.

This is a new complaint.
Just to make it clear:
1. All your recent complaints about unicode were in the realm of
performance
So your complaint that python has lost unicode compliance can mean one
of:
2a. The unicode standard mandates performance criteria
or
2b. There are problems with python's implementation (of strings?) that
have functional problems apart from your old performance complaints
or
2c. You need to look up what 'compliance' means

My own choice is to have a mid-point between
Very early binding: Narrow vs wide builds
Very late binding: String reps can change with the characters as they
are seen
This mid point would be perhaps a commandline switch to choose string-
engine attributes.

However to make this choice even worth a second look we need to have
hard data about performance that you have been unable to provide.

[See the recent thread of RoR vs Django to see the problems of
excessive spurious choice]

[toc] | [prev] | [next] | [standalone]

#41083 — On topic, please [Was:Re: Regular expression problem]

From	Ned Deily <nad@acm.org>
Date	2013-03-11 11:13 -0700
Subject	On topic, please [Was:Re: Regular expression problem]
Message-ID	<mailman.3201.1363025617.2939.python-list@python.org>
In reply to	#41056

A friendly reminder that this forum is for general discussion and 
questions about Python.

"Pretty much anything Python-related is fair game for discussion, and 
the group is even fairly tolerant of off-topic digressions; there have 
been entertaining discussions of topics such as floating point, good 
software design, and other programming languages such as Lisp and Forth."

But ...

"Rudeness and personal attacks, even in reaction to blatant flamebait, 
are strongly frowned upon. People may strongly disagree on an issue, but 
usually discussion remains civil. In case of an actual flamebait 
posting, you can ignore it, quietly plonk the offending poster in your 
killfile or mail filters, or write a sharp but still-polite response, 
but at all costs resist the urge to flame back."

http://www.python.org/community/lists/

It's up to all of us to help keep this group/list a place where people 
enjoy participating, without fear of gratuitous personal sniping.   
Thanks!

-- 
 Ned Deily,
 nad@acm.org

[toc] | [prev] | [next] | [standalone]

#41085

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2013-03-11 20:30 +0200
Message-ID	<mailman.3202.1363026660.2939.python-list@python.org>
In reply to	#41022

On 11.03.13 04:06, Terry Reedy wrote:
> On 3/10/2013 1:42 PM, mukesh tiwari wrote:
>> Hello all
>> I am trying to solve this problem[1]
>> [1] http://www.spoj.com/problems/MAIN12C/
>
> As I remember, and as it still appears, this site severely penalizes
> Python solvers by using the same time limit for all languages. Thus, a
> 'slow' python program may work correctly but the site will not let you
> know.

I'm sure the time limits are enough to solve most (if not all) of 
problems. Actually all submitted solutions on Python for this problem 
run from 0.47 to 0.61 seconds (http://www.spoj.com/ranks/MAIN12C/).

[toc] | [prev] | [next] | [standalone]

#41089

From	Terry Reedy <tjreedy@udel.edu>
Date	2013-03-11 16:23 -0400
Message-ID	<mailman.3206.1363033451.2939.python-list@python.org>
In reply to	#41022

On 3/11/2013 2:30 PM, Serhiy Storchaka wrote:
> On 11.03.13 04:06, Terry Reedy wrote:
>> On 3/10/2013 1:42 PM, mukesh tiwari wrote:
>>> Hello all
>>> I am trying to solve this problem[1]
>>> [1] http://www.spoj.com/problems/MAIN12C/
>>
>> As I remember, and as it still appears, this site severely penalizes
>> Python solvers by using the same time limit for all languages. Thus, a
>> 'slow' python program may work correctly but the site will not let you
>> know.
>
> I'm sure the time limits are enough to solve most (if not all) of
> problems. Actually all submitted solutions on Python for this problem
> run from 0.47 to 0.61 seconds (http://www.spoj.com/ranks/MAIN12C/).

You do not see the solutions that timed out. I suppose you are pointing 
to the fact that for this problem there are solutions close to but under 
the time limit. However, algorithm running times are not evenly 
distributed. Suppose, for instance) there is a correct O(n**2) solution 
and a correct O(n) solution and that the ones listed are the O(n) 
solutions. Then the Python O(n**2) solutions could easily take 10x 
longer to run and time out, while equivalent C solutions do not.

Mukesh is not the first to post here a reasonable looking solution for 
that site that he could not judge because the test quite and refused to 
answer. I point out again that he was 'happy' to have a faster but 
incorrect program, even though it might have been a regression from his 
original.

-- 
Terry Jan Reedy

[toc] | [prev] | [standalone]

csiph-web

Regular expression problem

Contents

#41022 — Regular expression problem

#41025

#41026

#41029

#41028

#41031

#41033

#41032

#41049

#41056

#41057

#41062

#41083 — On topic, please [Was:Re: Regular expression problem]

#41085

#41089