


I can't understand re.sub

Started by: Mr Zaug <matthew.herzog@gmail.com>
First post: 2015-11-29 13:36 -0800
Last post: 2015-12-01 21:31 +0000
Articles: 10 (5 participants)



Contents

  I can't understand re.sub Mr Zaug <matthew.herzog@gmail.com> - 2015-11-29 13:36 -0800
    Re: I can't understand re.sub Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-29 22:01 +0000
      Re: I can't understand re.sub Mr Zaug <matthew.herzog@gmail.com> - 2015-11-29 17:20 -0800
    Re: I can't understand re.sub Rick Johnson <rantingrickjohnson@gmail.com> - 2015-11-29 17:12 -0800
      Re: I can't understand re.sub Mr Zaug <matthew.herzog@gmail.com> - 2015-11-29 17:24 -0800
    Re: I can't understand re.sub Erik <python@lucidity.plus.com> - 2015-11-29 21:53 +0000
      Re: I can't understand re.sub Jussi Piitulainen <harvesting@is.invalid> - 2015-11-30 10:51 +0200
        Re: I can't understand re.sub Erik <python@lucidity.plus.com> - 2015-12-01 01:26 +0000
          Re: I can't understand re.sub Jussi Piitulainen <harvesting@is.invalid> - 2015-12-01 07:28 +0200
            Re: I can't understand re.sub Erik <python@lucidity.plus.com> - 2015-12-01 21:31 +0000

#99700 — I can't understand re.sub

From: Mr Zaug <matthew.herzog@gmail.com>
Date: 2015-11-29 13:36 -0800
Subject: I can't understand re.sub
Message-ID: <af27abe4-f81e-4d44-a504-c58d9e71986a@googlegroups.com>
I need to use re.sub to replace strings in a text file. I can't seem to understand how to use the re module to this end.

result = re.sub(pattern, repl, string, count=0, flags=0);

I think I understand that pattern is the regex I'm searching for and repl is the thing I want to substitute for whatever pattern finds but what is string?

The items I'm searching for are few and they do not change. They are "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template file. They do not appear together on any line and they only appear once on each line.

This should be simple, right?



#99702

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-11-29 22:01 +0000
Message-ID: <n3fsju$348$2@dont-email.me>
In reply to: #99700
On Sun, 29 Nov 2015 13:36:57 -0800, Mr Zaug wrote:

> result = re.sub(pattern, repl, string, count=0, flags=0);

re.sub works on a string, not on a file.

Read the file to a string, pass it in as the string.

Or pre-compile the search pattern(s) and process the file line by line:

import re

patts = [
 (re.compile("axe"), "hammer"),
 (re.compile("cat"), "dog"),
 (re.compile("tree"), "fence")
 ]

with open("input.txt","r") as inf, open("output.txt","w") as ouf:
    # iterate over every line; a single readline() would process only the first
    for line in inf:
        for patt in patts:
            line = patt[0].sub(patt[1], line)
        ouf.write(line)

Not tested, but I think it should do the trick.

Or use a single patt and a replacement func:

import re

patt = re.compile("(axe)|(cat)|(tree)")

def replfunc(match):
    # re.sub passes a match object, not a string; group(0) is the matched text
    word = match.group(0)
    if word == 'axe':
        return 'hammer'
    if word == 'cat':
        return 'dog'
    if word == 'tree':
        return 'fence'
    return word

with open("input.txt","r") as inf, open("output.txt","w") as ouf:
    # iterate over every line; a single readline() would process only the first
    for line in inf:
        line = patt.sub(replfunc, line)
        ouf.write(line)

(also not tested)
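
The first suggestion above (read the file into a string and pass that in as the string argument) might look like the sketch below. It reuses the axe/cat/tree pairs from the examples above. The helper names substitute and substitute_file are hypothetical, and a dictionary lookup stands in for the chain of if tests.

```python
import re

# Illustrative pairs taken from the examples above; not the OP's real values.
REPLACEMENTS = {"axe": "hammer", "cat": "dog", "tree": "fence"}

# One alternation built from the keys; re.escape guards against regex
# metacharacters in a key.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def substitute(text):
    """Return text with every key of REPLACEMENTS replaced by its value."""
    # The replacement callable receives a match object; group(0) is the
    # text that matched.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


def substitute_file(src_path, dst_path):
    """Read src_path into one string, substitute, write the result to dst_path."""
    with open(src_path, "r") as inf:
        text = inf.read()
    with open(dst_path, "w") as ouf:
        ouf.write(substitute(text))
```

Reading the whole file at once is fine for a small template; for very large files the line-by-line loop above uses less memory.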

-- 
Denis McMahon, denismfmcmahon@gmail.com



#99707

From: Mr Zaug <matthew.herzog@gmail.com>
Date: 2015-11-29 17:20 -0800
Message-ID: <58af2723-cd82-4ce5-a6fd-fbe31d4bf692@googlegroups.com>
In reply to: #99702
Thanks. That does help quite a lot.



#99706

From: Rick Johnson <rantingrickjohnson@gmail.com>
Date: 2015-11-29 17:12 -0800
Message-ID: <feee81b6-2549-4bfa-b741-35da861a0317@googlegroups.com>
In reply to: #99700
On Sunday, November 29, 2015 at 3:37:34 PM UTC-6, Mr Zaug wrote:

> The items I'm searching for are few and they do not change. They are "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template file. They do not appear together on any line and they only appear once on each line. This should be simple, right?

Yes. In fact so simple that string methods and a "for loop" will suffice. Using regexps for this task would be like using a dump truck to haul a teaspoon of salt.
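
A minimal sketch of the string-method approach, assuming the three placeholders named in the original post. The replacement values and the sample template lines are made up for illustration, since the thread does not give the real ones.

```python
# Hypothetical values; the thread names the placeholders but not what
# should replace them.
VALUES = {
    "CONTENT_PATH": "/content/snakeoil",
    "ENV": "pre-prod",
    "NNN": "087",
}


def fill_line(line, values=VALUES):
    """Replace each placeholder in one line using plain str.replace."""
    for placeholder, value in values.items():
        line = line.replace(placeholder, value)
    return line


# Made-up template lines, one placeholder per line as the OP describes.
template = [
    "path = CONTENT_PATH\n",
    "environment = ENV\n",
    "id = NNN\n",
]
filled = [fill_line(line) for line in template]
```

This relies on no replacement value containing another placeholder; if one did, a later pass of the loop would rewrite it.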



#99708

From: Mr Zaug <matthew.herzog@gmail.com>
Date: 2015-11-29 17:24 -0800
Message-ID: <967ecfa3-b240-44d6-9a75-bbd9f3865da4@googlegroups.com>
In reply to: #99706
On Sunday, November 29, 2015 at 8:12:25 PM UTC-5, Rick Johnson wrote:
> On Sunday, November 29, 2015 at 3:37:34 PM UTC-6, Mr Zaug wrote:
> 
> > The items I'm searching for are few and they do not change. They are "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template file. They do not appear together on any line and they only appear once on each line. This should be simple, right?
> 
> Yes. In fact so simple that string methods and a "for loop" will suffice. Using regexps for this task would be like using a dump truck to haul a teaspoon of salt.

I rarely get a chance to do any scripting so yeah, I stink at it.

Ideally I would have a script that will spit out a config file such as 087_pre-prod_snakeoil_farm.any and not need to manually rename said output file.
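
One way such a script could name its own output is sketched below. The naming scheme (NNN, then ENV, then a site name, then "_farm.any") is only guessed from the single example filename above, and render, output_name, write_config and the site parameter are all hypothetical names.

```python
def render(template_text, values):
    """Replace every placeholder in the template text with its value."""
    for placeholder, value in values.items():
        template_text = template_text.replace(placeholder, value)
    return template_text


def output_name(values, site):
    """Build the output filename (assumed scheme: NNN_ENV_site_farm.any)."""
    return "{NNN}_{ENV}_{site}_farm.any".format(site=site, **values)


def write_config(template_path, values, site):
    """Fill the template and write it under the generated name."""
    with open(template_path, "r") as inf:
        text = render(inf.read(), values)
    name = output_name(values, site)
    with open(name, "w") as ouf:
        ouf.write(text)
    return name


# Hypothetical values chosen to reproduce the example filename.
VALUES = {"NNN": "087", "ENV": "pre-prod", "CONTENT_PATH": "/content/snakeoil"}
```

Calling write_config("template.any", VALUES, "snakeoil") would then write 087_pre-prod_snakeoil_farm.any in the current directory, with "template.any" standing in for the real template path.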



#99728

From: Erik <python@lucidity.plus.com>
Date: 2015-11-29 21:53 +0000
Message-ID: <mailman.26.1448872519.14615.python-list@python.org>
In reply to: #99700
On 29/11/15 21:36, Mr Zaug wrote:
> I need to use re.sub to replace strings in a text file.

Do you? Is there any other way?

> result = re.sub(pattern, repl, string, count=0, flags=0);
>
> I think I understand that pattern is the regex I'm searching for and
> repl is the thing I want to substitute for whatever pattern finds but
> what is string?

Where do you think the function gets the string you want to transform from?

> This should be simple, right?

It is. And it could be even simpler if you don't bother with regexes at 
all (if your input is as fixed as you say it is):

 >>> foo = "foo bar baz spam CONTENT_PATH bar spam"
 >>> ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
'foo bar baz spam Substitute bar spam'
 >>>

E.



#99731

From: Jussi Piitulainen <harvesting@is.invalid>
Date: 2015-11-30 10:51 +0200
Message-ID: <lf54mg3eupq.fsf@ling.helsinki.fi>
In reply to: #99728
Erik writes:

> On 29/11/15 21:36, Mr Zaug wrote:
>> This should be simple, right?
>
> It is. And it could be even simpler if you don't bother with regexes
> at all (if your input is as fixed as you say it is):
>
> >>> foo = "foo bar baz spam CONTENT_PATH bar spam"
> >>> ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
> 'foo bar baz spam Substitute bar spam'

Surely the straight thing to say is:

   >>> foo.replace(' CONTENT_PATH ', ' Substitute ')
   'foo bar baz spam Substitute bar spam'

But there was no guarantee of spaces around the target. If you wish to,
say, replace "spam" in your foo with "REDACTED" but leave it intact in
"May the spammer be prosecuted", a regex might be attractive after all.



#99762

From: Erik <python@lucidity.plus.com>
Date: 2015-12-01 01:26 +0000
Message-ID: <mailman.49.1448933226.14615.python-list@python.org>
In reply to: #99731
On 30/11/15 08:51, Jussi Piitulainen wrote:
> Surely the straight thing to say is:
>
>     >>> foo.replace(' CONTENT_PATH ', ' Substitute ')
>     'foo bar baz spam Substitute bar spam'

Not quite the same thing (but yes, with a third argument of 1, it would be).
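
The difference only shows when the target occurs more than once. A small illustration, using a made-up string with two occurrences:

```python
# Made-up input with the target appearing twice.
foo = "foo CONTENT_PATH bar CONTENT_PATH baz"

# split with maxsplit=1 cuts at the first occurrence only.
split_join = " Substitute ".join(foo.split(" CONTENT_PATH ", 1))

# replace with no count replaces every occurrence.
replace_all = foo.replace(" CONTENT_PATH ", " Substitute ")

# replace with a third argument of 1 matches the split/join result.
replace_once = foo.replace(" CONTENT_PATH ", " Substitute ", 1)

print(split_join)    # foo Substitute bar CONTENT_PATH baz
print(replace_all)   # foo Substitute bar Substitute baz
print(replace_once)  # foo Substitute bar CONTENT_PATH baz
```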

> But there was no guarantee of spaces around the target.

I know. It was just an example to show that there might be an option 
that's not a regex for the specific use indicated. It's up to the OP to 
decide whether they think the spaces (or any other, or no, delimiter) 
would actually be required or useful. Or whether they really prefer a 
regex after all.

> If you wish to,
> say, replace "spam" in your foo with "REDACTED" but leave it intact in
> "May the spammer be prosecuted", a regex might be attractive after all.

But that's not what the OP said they wanted to do. They said everything 
was very fixed - they did not want a general purpose human language text 
processing solution ... ;)

E.



#99768

From: Jussi Piitulainen <harvesting@is.invalid>
Date: 2015-12-01 07:28 +0200
Message-ID: <lf5r3j6ka9q.fsf@ling.helsinki.fi>
In reply to: #99768
Erik writes:
> On 30/11/15 08:51, Jussi Piitulainen wrote:
[- -]
>> If you wish to,
>> say, replace "spam" in your foo with "REDACTED" but leave it intact in
>> "May the spammer be prosecuted", a regex might be attractive after all.
>
> But that's not what the OP said they wanted to do. They said
> everything was very fixed - they did not want a general purpose human
> language text processing solution ... ;)

Language processing is not what I had in mind here. Merely this, that
there is some sort of word boundary, be it punctuation, whitespace, or
an end of the string:

   >>> re.sub(r'\bspam\b', '****', 'spamalot spam')
   'spamalot ****'

That's not perfect either, but it's simple and might be somewhat
proportional to the problem.

A real solution should be aware of the actual structure of those lines,
assuming they follow some defined syntax.
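
A few cases that show where \b helps and where it is "not perfect", applied to the OP's ENV placeholder. The sample lines and the replacement value "pre-prod" are invented for illustration.

```python
import re

pattern = re.compile(r"\bENV\b")

# A longer word that merely starts with ENV is left alone.
a = pattern.sub("pre-prod", "ENVIRONMENT = ENV")

# The underscore counts as a word character, so there is no boundary
# between "_" and "E" and nothing is replaced.
b = pattern.sub("pre-prod", "MY_ENV")

# A hyphen is not a word character, so ENV is replaced here, which may
# or may not be what is wanted.
c = pattern.sub("pre-prod", "ENV-1")

print(a)  # ENVIRONMENT = pre-prod
print(b)  # MY_ENV
print(c)  # pre-prod-1
```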



#99819

From: Erik <python@lucidity.plus.com>
Date: 2015-12-01 21:31 +0000
Message-ID: <mailman.85.1449005519.14615.python-list@python.org>
In reply to: #99768
On 01/12/15 05:28, Jussi Piitulainen wrote:
> A real solution should be aware of the actual structure of those lines,
> assuming they follow some defined syntax.

I think that we are in violent agreement on this ;)

E.




