Path: csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed3a.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail
To: Palpandi <palpandi111@gmail.com>
cc: python-list@python.org
From: Laura Creighton <lac@openend.se>
Subject: Re: Regular Expression
In-Reply-To: Message from Palpandi <palpandi111@gmail.com> of "Thu, 04 Jun 2015 06:36:29 -0700." <c85bc324-37fe-469e-b3b1-b1d4e51bf7d8@googlegroups.com>
References: <c85bc324-37fe-469e-b3b1-b1d4e51bf7d8@googlegroups.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <1943.1433427675.1@fido>
Content-Transfer-Encoding: quoted-printable
Date: Thu, 04 Jun 2015 16:21:15 +0200
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.166.1433427684.13271.python-list@python.org>
Lines: 45
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:92059

In a message of Thu, 04 Jun 2015 06:36:29 -0700, Palpandi writes:
>Hi All,
>
>This is the case. To split "string2" from "string1_string2" I am using
>re.split('_', "string1_string2", 1)

And you shouldn't be.  The 3rd argument, 1, is maxsplit: it says stop after
one split, so everything after the first underscore comes back in one piece.
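
To see what that third argument does, here is a small sketch comparing the
call with and without it.  The variable names are mine, for illustration only.

```python
import re

s = "__string1_string2"

# maxsplit=1: split at the first underscore only and leave the rest untouched
limited = re.split('_', s, maxsplit=1)

# no maxsplit: split at every underscore
unlimited = re.split('_', s)

print(limited)
print(unlimited)
```

The second element of the limited result is the "_string1_string2" you are
seeing.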

>It is working fine for string "string1_string2" and output as "string2".
>But actually the problem is that if a string is "__string1_string2" and the
>output is "_string1_string2". It is wrong.
>
>How to fix this issue?

Depends on what you want.

Approach #1 - just use the string method, forget re, because you do not
need it.

>>> "__string1_string2".split("_")
['', '', 'string1', 'string2']
>>> "_string1_string2__".split("_")
['', 'string1', 'string2', '', '']
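
If what you actually want is only the last field, or the fields without the
empty strings, something like this might do.  It is a sketch that assumes that
is your goal; last_field and nonempty_fields are names I made up.

```python
def last_field(s, sep="_"):
    # rpartition splits once at the last separator: (head, sep, tail)
    return s.rpartition(sep)[2]


def nonempty_fields(s, sep="_"):
    # drop the empty strings produced by leading, trailing or doubled separators
    return [part for part in s.split(sep) if part]


print(last_field("__string1_string2"))
print(nonempty_fields("_string1_string2__"))
```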

Approach #2 -- use re but with a fixed string (probably a bad idea,
you should be using approach 1 instead if you have a fixed string)

>>> re.split('_', "__string1_string2")
['', '', 'string1', 'string2']
>>> re.split('_', "__string1_string2__")
['', '', 'string1', 'string2', '', '']

Approach #3 - there is a real pattern here I want to use, and the example
I posted to the list is a lot simpler than what I really want to do.
Ok, in this case we will match 'one or more underscores' for an
example.  (Use '_+', not '_*': a pattern that can match the empty string
makes Python 3.7 and later split between every character.)

>>> p = re.compile('_+')
>>> p.split("__string1_string2")
['', 'string1', 'string2']
>>> p.split("__string1__string2__")
['', 'string1', 'string2', '']
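
The empty strings at either end come from the separators at the start and end
of the string.  If you don't want them, one option is to match the fields
rather than the separators.  This is only a sketch; fields is a hypothetical
helper, not something from your code.

```python
import re


def fields(s):
    # return each maximal run of characters that are not underscores
    return re.findall(r'[^_]+', s)


print(fields("__string1__string2__"))
```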

Laura