


Regular Expression for the special character "|" pipe

Started by: Aman Kashyap <amankashyap1223@gmail.com>
First post: 2014-05-27 03:59 -0700
Last post: 2014-05-27 14:06 +0100
Articles: 8 (6 participants)



Contents

  Regular Expression for the special character "|" pipe Aman Kashyap <amankashyap1223@gmail.com> - 2014-05-27 03:59 -0700
    Re: Regular Expression for the special character "|" pipe Vlastimil Brom <vlastimil.brom@gmail.com> - 2014-05-27 13:09 +0200
      Re: Regular Expression for the special character "|" pipe Aman Kashyap <amankashyap1223@gmail.com> - 2014-05-27 04:20 -0700
    Re: Regular Expression for the special character "|" pipe Daniel <5960761@gmail.com> - 2014-05-27 14:29 +0300
      Re: Regular Expression for the special character "|" pipe Aman Kashyap <amankashyap1223@gmail.com> - 2014-05-27 04:39 -0700
        Re: Regular Expression for the special character "|" pipe Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> - 2014-05-27 13:55 +0200
          Re: Regular Expression for the special character "|" pipe Roy Smith <roy@panix.com> - 2014-05-27 08:35 -0400
        Re: Regular Expression for the special character "|" pipe Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-05-27 14:06 +0100

#72107 — Regular Expression for the special character "|" pipe

From: Aman Kashyap <amankashyap1223@gmail.com>
Date: 2014-05-27 03:59 -0700
Subject: Regular Expression for the special character "|" pipe
Message-ID: <9c8e58be-9619-44c7-8098-961a0134c422@googlegroups.com>

I would like to write a regular expression that also matches the special character "|".

e.g.

start=|ID=ter54rt543d|SID=ter54rt543d|end=|

I want to match only |ID=ter54rt543d| in the above string, but I am unable to write a pattern that contains the "|" pipe itself.

By default, Python treats "|" as an OR operator.

But in my case I want to use it as part of the search string.



#72110

From: Vlastimil Brom <vlastimil.brom@gmail.com>
Date: 2014-05-27 13:09 +0200
Message-ID: <mailman.10368.1401188967.18130.python-list@python.org>
In reply to: #72107

Hi,
you can just escape the pipe with a backslash like any other metacharacter:

r"start=\|ID=ter54rt543d"

be sure to use the raw string notation r"...", or you can double all
backslashes in the string.

hth,
   vbr
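
A minimal sketch of the advice above, using the sample string from the original post. The variable names `text`, `match` and `found` are illustrative and do not come from the thread:

```python
import re

text = "start=|ID=ter54rt543d|SID=ter54rt543d|end=|"

# \| matches a literal pipe; an unescaped | would mean "or" (alternation).
match = re.search(r"\|ID=ter54rt543d\|", text)
found = match.group(0) if match else None
print(found)  # |ID=ter54rt543d|
```

Because the pattern starts with an escaped pipe followed directly by `ID=`, the `|SID=...|` field is not matched.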



#72111

From: Aman Kashyap <amankashyap1223@gmail.com>
Date: 2014-05-27 04:20 -0700
Message-ID: <e0a96d65-95a1-401a-9335-ab2e24397896@googlegroups.com>
In reply to: #72110

Thanks, vbr, for the quick response.

I have this string:

|SOH=|ID=re65dgt5dd|DS=fjkjf|SDID=fhkhkf|ID=fkjfkf|EOM=|

and want to replace two substrings:
|ID=re65dgt5dd| with |ID=MAN|
|ID=fkjfkf| with |ID=MAN|

I am using the regular expression ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*|$

The actual output is:   |SOH=|ID=MAN|DS=fjkjf|SDID=MAN|ID=MAN|EOM=|ID=MAN
The expected output is: |SOH=|ID=MAN|DS=fjkjf|SDID=fhkhkf|ID=MAN|EOM=|

Could you please help me with this?
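
The thread does not show the exact call that was used, so the following is only a sketch that assumes `re.sub` with the replacement text `ID=MAN`. It reproduces the reported output and shows why it happens: the unescaped `|` splits the pattern into two alternatives, `ID=...` and `$`, so the empty match at the end of the string is replaced too, and `ID=` also matches inside `SDID=`. Escaping the pipes on both sides anchors the match to a whole field. The names `broken`, `fixed`, `out_broken` and `out_fixed` are illustrative.

```python
import re

text = "|SOH=|ID=re65dgt5dd|DS=fjkjf|SDID=fhkhkf|ID=fkjfkf|EOM=|"

# Unescaped pipe: the pattern means "ID=<letters/digits>" OR "end of string".
broken = r"ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*|$"
out_broken = re.sub(broken, "ID=MAN", text)
print(out_broken)  # |SOH=|ID=MAN|DS=fjkjf|SDID=MAN|ID=MAN|EOM=|ID=MAN

# Escaped pipes: only a field that is exactly |ID=...| is replaced.
fixed = r"\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|"
out_fixed = re.sub(fixed, "|ID=MAN|", text)
print(out_fixed)   # |SOH=|ID=MAN|DS=fjkjf|SDID=fhkhkf|ID=MAN|EOM=|
```

One caveat of this sketch: each match consumes its closing pipe, so two `ID` fields directly next to each other would share a delimiter and only the first would be replaced.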



#72112

From: Daniel <5960761@gmail.com>
Date: 2014-05-27 14:29 +0300
Message-ID: <mailman.10369.1401190577.18130.python-list@python.org>
In reply to: #72107

What about skipping the re module and trying this:

'start=|ID=ter54rt543d|SID=ter54rt543d|end=|'.split('|')[1][3:]
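
For reference, the one-liner above evaluates to `'ter54rt543d'`, that is, the value without the `|ID=` prefix. A slightly more general sketch of the same idea, assuming every field has the form `name=value`, splits the record into a dictionary. The names `text`, `fields` and `first_id` are illustrative:

```python
text = "start=|ID=ter54rt543d|SID=ter54rt543d|end=|"

# The one-liner from the post: second piece, minus the leading "ID=".
first_id = text.split("|")[1][3:]

# Split on the pipe, skip empty pieces, and cut each field at the first "=".
# partition() returns (name, "=", value); [::2] keeps name and value.
fields = dict(part.partition("=")[::2] for part in text.split("|") if part)

print(first_id)      # ter54rt543d
print(fields["ID"])  # ter54rt543d
```

A dictionary keeps only the last value for a repeated name, so this form does not suit records that contain the same field twice.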




#72113

From: Aman Kashyap <amankashyap1223@gmail.com>
Date: 2014-05-27 04:39 -0700
Message-ID: <3cc77455-39ed-4403-a46c-5dd8e640a483@googlegroups.com>
In reply to: #72112

Thanks for the response.

I finally got the answer.

This is the regular expression to use: \\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|



#72114

From: Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de>
Date: 2014-05-27 13:55 +0200
Message-ID: <mailman.10370.1401191774.18130.python-list@python.org>
In reply to: #72113

On 27.05.2014 13:39, Aman Kashyap wrote:
>> On 27.05.2014 14:09, Vlastimil Brom wrote:
>>
>>> you can just escape the pipe with a backslash like any other metacharacter:
>>>
>>> r"start=\|ID=ter54rt543d"
>>>
>>> be sure to use the raw string notation r"...", or you can double all
>>> backslashes in the string.
>>
> Thanks for the response.
>
> I got the answer finally.
>
> This is the regular expression to be used:\\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|
>

Or, more readably:

r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|'

This is what Vlastimil was talking about. A raw string saves you from
having to escape the backslashes.
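
A quick check, offered as a sketch, that the two spellings are the same pattern: the raw string and the doubled-backslash string compare equal, so the regex engine sees an identical sequence of characters. The names `raw`, `doubled`, `same` and `hit` are illustrative:

```python
import re

raw = r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|'
doubled = '\\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|'

# Both literals produce the same string: backslash, pipe, ..., backslash, pipe.
same = raw == doubled
print(same)  # True

hit = re.search(raw, "start=|ID=ter54rt543d|SID=ter54rt543d|end=|").group(0)
print(hit)   # |ID=ter54rt543d|
```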



#72115

From: Roy Smith <roy@panix.com>
Date: 2014-05-27 08:35 -0400
Message-ID: <roy-ED3A2B.08355327052014@news.panix.com>
In reply to: #72114

In article <mailman.10370.1401191774.18130.python-list@python.org>,
 Wolfgang Maier <wolfgang.maier@biologie.uni-freiburg.de> wrote:

> On 27.05.2014 13:39, Aman Kashyap wrote:
> >> On 27.05.2014 14:09, Vlastimil Brom wrote:
> >>
> >>> you can just escape the pipe with a backslash like any other metacharacter:
> >>>
> >>> r"start=\|ID=ter54rt543d"
> >>>
> >>> be sure to use the raw string notation r"...", or you can double all
> >>> backslashes in the string.
> >>
> > Thanks for the response.
> >
> > I got the answer finally.
> >
> > This is the regular expression to be 
> > used:\\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|
> >
> 
> or, and more readable:
> 
> r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|'
> 
> This is what Vlastimil was talking about. It saves you from having to 
> escape the backslashes.

Sometimes, instead of using backslashes, I put the problem character
into a character class by itself.  It's a matter of personal opinion
which way is easier to read, but it certainly eliminates all the
questions about "how many backslashes do I need?"

r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]'
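
As a sketch of why this works: inside a character class the pipe has no special meaning, so `[|]` and `\|` match the same thing. For a delimiter held in a variable, the standard `re.escape` function produces the backslash form. The names `with_class`, `with_escape` and `escaped_pipe` are illustrative:

```python
import re

text = "|SOH=|ID=re65dgt5dd|DS=fjkjf|SDID=fhkhkf|ID=fkjfkf|EOM=|"

# Character class form and backslash form find the same fields.
with_class = re.findall(r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]', text)
with_escape = re.findall(r'\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\|', text)
print(with_class)   # ['|ID=re65dgt5dd|', '|ID=fkjfkf|']

# re.escape() adds the backslash for you.
escaped_pipe = re.escape("|")
print(escaped_pipe)  # \|
```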

Another thing that can help make regexes easier to read is the VERBOSE 
flag.  It ignores whitespace inside the regex, except in a character 
class or after a backslash, and lets you add # comments (see 
https://docs.python.org/2/library/re.html#module-contents for details).  
So, you can write something like:

import re

pattern = re.compile(r'''[|]
                         ID=
                         [a-z]*
                         [0-9]*
                         [a-z]*
                         [0-9]*
                         [a-z]*
                         [|]''',
                     re.VERBOSE)
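
A sketch of the same verbose pattern with a comment on each piece, which is the main thing the flag buys you. The comments and the test string are additions for illustration; the pattern itself is unchanged:

```python
import re

pattern = re.compile(r'''
    [|]       # opening delimiter, matched literally
    ID=       # field name
    [a-z]*    # optional letters
    [0-9]*    # optional digits
    [a-z]*    # optional letters
    [0-9]*    # optional digits
    [a-z]*    # optional letters
    [|]       # closing delimiter
    ''', re.VERBOSE)

text = "|SOH=|ID=re65dgt5dd|DS=fjkjf|SDID=fhkhkf|ID=fkjfkf|EOM=|"
matches = pattern.findall(text)
print(matches)  # ['|ID=re65dgt5dd|', '|ID=fkjfkf|']
```

In this mode a literal space or `#` in the pattern has to be escaped or placed in a character class, otherwise it is ignored or treated as the start of a comment.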

Or, alternatively, take advantage of the fact that Python concatenates 
adjacent string literals, and write it like this:

pattern = re.compile(r'[|]'
                     r'ID='
                     r'[a-z]*'
                     r'[0-9]*'
                     r'[a-z]*'
                     r'[0-9]*'
                     r'[a-z]*'
                     r'[|]'
                    )
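
A small check, as a sketch, that the concatenated literals are exactly the one-line pattern, so neither the line breaks nor the indentation end up in the regex. The names `pieces`, `one_line` and `same` are illustrative:

```python
import re

# Adjacent string literals are joined by the compiler into one string.
pieces = (r'[|]'
          r'ID='
          r'[a-z]*'
          r'[0-9]*'
          r'[a-z]*'
          r'[0-9]*'
          r'[a-z]*'
          r'[|]')

one_line = r'[|]ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*[|]'
same = pieces == one_line
print(same)  # True

found = re.search(pieces, "start=|ID=ter54rt543d|SID=ter54rt543d|end=|").group(0)
print(found)  # |ID=ter54rt543d|
```

One thing to watch with this style: a stray comma between two of the literals turns the expression into a tuple instead of a single string.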



#72116

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2014-05-27 14:06 +0100
Message-ID: <mailman.10371.1401195996.18130.python-list@python.org>
In reply to: #72113

On 27/05/2014 12:39, Aman Kashyap wrote:
>
> Thanks for the response.
>
> I got the answer finally.
>
> This is the regular expression to be used:\\|ID=[a-z]*[0-9]*[a-z]*[0-9]*[a-z]*\\|
>

I'm pleased to see that you have answers.  In return would you please 
use the mailing list 
https://mail.python.org/mailman/listinfo/python-list or read and action 
this https://wiki.python.org/moin/GoogleGroupsPython to prevent us 
seeing double line spacing and single line paragraphs, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence





