| Started by | Mr Zaug <matthew.herzog@gmail.com> |
|---|---|
| First post | 2015-11-29 13:36 -0800 |
| Last post | 2015-12-01 21:31 +0000 |
| Articles | 10 — 5 participants |
I can't understand re.sub Mr Zaug <matthew.herzog@gmail.com> - 2015-11-29 13:36 -0800
Re: I can't understand re.sub Denis McMahon <denismfmcmahon@gmail.com> - 2015-11-29 22:01 +0000
Re: I can't understand re.sub Mr Zaug <matthew.herzog@gmail.com> - 2015-11-29 17:20 -0800
Re: I can't understand re.sub Rick Johnson <rantingrickjohnson@gmail.com> - 2015-11-29 17:12 -0800
Re: I can't understand re.sub Mr Zaug <matthew.herzog@gmail.com> - 2015-11-29 17:24 -0800
Re: I can't understand re.sub Erik <python@lucidity.plus.com> - 2015-11-29 21:53 +0000
Re: I can't understand re.sub Jussi Piitulainen <harvesting@is.invalid> - 2015-11-30 10:51 +0200
Re: I can't understand re.sub Erik <python@lucidity.plus.com> - 2015-12-01 01:26 +0000
Re: I can't understand re.sub Jussi Piitulainen <harvesting@is.invalid> - 2015-12-01 07:28 +0200
Re: I can't understand re.sub Erik <python@lucidity.plus.com> - 2015-12-01 21:31 +0000
| From | Mr Zaug <matthew.herzog@gmail.com> |
|---|---|
| Date | 2015-11-29 13:36 -0800 |
| Subject | I can't understand re.sub |
| Message-ID | <af27abe4-f81e-4d44-a504-c58d9e71986a@googlegroups.com> |
I need to use re.sub to replace strings in a text file. I can't seem to understand how to use the re module to this end.
result = re.sub(pattern, repl, string, count=0, flags=0);
I think I understand that pattern is the regex I'm searching for and repl is the thing I want to substitute for whatever pattern finds but what is string?
The items I'm searching for are few and they do not change. They are "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template file. They do not appear together on any line and they only appear once on each line.
This should be simple, right?
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-11-29 22:01 +0000 |
| Message-ID | <n3fsju$348$2@dont-email.me> |
| In reply to | #99700 |
On Sun, 29 Nov 2015 13:36:57 -0800, Mr Zaug wrote:
> result = re.sub(pattern, repl, string, count=0, flags=0);
re.sub works on a string, not on a file.
Read the file to a string, pass it in as the string.
Or pre-compile the search pattern(s) and process the file line by line:
    import re

    # Pre-compiled (pattern, replacement) pairs
    patts = [
        (re.compile("axe"), "hammer"),
        (re.compile("cat"), "dog"),
        (re.compile("tree"), "fence"),
    ]

    with open("input.txt", "r") as inf, open("output.txt", "w") as ouf:
        # Iterate over every line; a single readline() would stop after the first
        for line in inf:
            for patt, repl in patts:
                line = patt.sub(repl, line)
            ouf.write(line)
Not tested, but I think it should do the trick.
Or use a single patt and a replacement func:
    import re

    patt = re.compile("(axe)|(cat)|(tree)")

    def replfunc(match):
        # re.sub passes a match object to the function, not the matched string
        word = match.group(0)
        if word == 'axe':
            return 'hammer'
        if word == 'cat':
            return 'dog'
        if word == 'tree':
            return 'fence'
        return word

    with open("input.txt", "r") as inf, open("output.txt", "w") as ouf:
        for line in inf:
            line = patt.sub(replfunc, line)
            ouf.write(line)
(also not tested)
--
Denis McMahon, denismfmcmahon@gmail.com
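For the first suggestion above (read the file into a string and pass it in as the `string` argument), a minimal sketch might look like the following. The placeholder names come from the original question; the replacement values, the sample template text and the helper name `fill_template` are invented for illustration and are not from the thread.

```python
import re

# Placeholder names are from the original question; the replacement
# values are invented for this example.
REPLACEMENTS = {
    "CONTENT_PATH": "/content/snakeoil",
    "ENV": "pre-prod",
    "NNN": "087",
}

# One alternation built from the keys; re.escape guards against any
# regex metacharacters in a placeholder name.
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def fill_template(text):
    """Return text with every placeholder replaced by its value."""
    # re.sub hands the callable a match object; group(0) is the matched text.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


# With a real file, the text would come from open("input.txt").read().
template = "path = CONTENT_PATH\nenv = ENV\nid = NNN\n"
result = fill_template(template)
print(result, end="")
```

Here `string` is simply the text to be searched: the whole file contents in this sketch, or one line at a time in the loops above.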
| From | Mr Zaug <matthew.herzog@gmail.com> |
|---|---|
| Date | 2015-11-29 17:20 -0800 |
| Message-ID | <58af2723-cd82-4ce5-a6fd-fbe31d4bf692@googlegroups.com> |
| In reply to | #99702 |
Thanks. That does help quite a lot.
| From | Rick Johnson <rantingrickjohnson@gmail.com> |
|---|---|
| Date | 2015-11-29 17:12 -0800 |
| Message-ID | <feee81b6-2549-4bfa-b741-35da861a0317@googlegroups.com> |
| In reply to | #99700 |
On Sunday, November 29, 2015 at 3:37:34 PM UTC-6, Mr Zaug wrote:
> The items I'm searching for are few and they do not change. They are "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template file. They do not appear together on any line and they only appear once on each line. This should be simple, right?
Yes. In fact so simple that string methods and a "for loop" will suffice. Using regexps for this task would be like using a dump truck to haul a teaspoon of salt.
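A rough sketch of the "string methods and a for loop" approach, for comparison with the regex versions. The placeholder names are the ones from the question; the replacement values and the sample lines are made up, and a list of strings stands in for the template file.

```python
# Placeholder names are from the original question; the values are
# invented for this example.
replacements = {
    "CONTENT_PATH": "/content/snakeoil",
    "ENV": "pre-prod",
    "NNN": "087",
}

# Stand-in for the lines of the template file.
template_lines = [
    "root CONTENT_PATH\n",
    "environment ENV\n",
    "farm NNN\n",
    "nothing to replace here\n",
]

output_lines = []
for line in template_lines:
    # str.replace returns the line unchanged when the placeholder is absent.
    for placeholder, value in replacements.items():
        line = line.replace(placeholder, value)
    output_lines.append(line)

print("".join(output_lines), end="")
```

This relies on the placeholders never appearing inside other words, which is what the question states but is worth checking against the real template.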
| From | Mr Zaug <matthew.herzog@gmail.com> |
|---|---|
| Date | 2015-11-29 17:24 -0800 |
| Message-ID | <967ecfa3-b240-44d6-9a75-bbd9f3865da4@googlegroups.com> |
| In reply to | #99706 |
On Sunday, November 29, 2015 at 8:12:25 PM UTC-5, Rick Johnson wrote:
> On Sunday, November 29, 2015 at 3:37:34 PM UTC-6, Mr Zaug wrote:
> > The items I'm searching for are few and they do not change. They are "CONTENT_PATH", "ENV" and "NNN". These appear on a few lines in a template file. They do not appear together on any line and they only appear once on each line. This should be simple, right?
>
> Yes. In fact so simple that string methods and a "for loop" will suffice. Using regexps for this task would be like using a dump truck to haul a teaspoon of salt.
I rarely get a chance to do any scripting so yeah, I stink at it. Ideally I would have a script that will spit out a config file such as 087_pre-prod_snakeoil_farm.any and not need to manually rename said output file.
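One way the output file could be named by the script itself is sketched below. The naming scheme `<NNN>_<ENV>_<site>_farm.any` is only a guess from the single file name mentioned above, and the functions `output_name` and `render`, the `site` argument, the values and the template text are all hypothetical.

```python
import os
import tempfile


def output_name(nnn, env, site):
    # Assumed naming scheme, guessed from the one example file name
    # in the post: <NNN>_<ENV>_<site>_farm.any
    return "{}_{}_{}_farm.any".format(nnn, env, site)


def render(template_text, values):
    # Plain string replacement of each placeholder.
    for placeholder, value in values.items():
        template_text = template_text.replace(placeholder, value)
    return template_text


# Invented values and template text for the example.
values = {"CONTENT_PATH": "/content/snakeoil", "ENV": "pre-prod", "NNN": "087"}
template_text = "root CONTENT_PATH\nenvironment ENV\nfarm NNN\n"

out_dir = tempfile.mkdtemp()  # scratch directory so the example is harmless
out_path = os.path.join(
    out_dir, output_name(values["NNN"], values["ENV"], "snakeoil")
)
with open(out_path, "w") as ouf:
    ouf.write(render(template_text, values))

print(os.path.basename(out_path))
```

Because the file name is built from the same values that fill the template, no manual rename is needed afterwards.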
| From | Erik <python@lucidity.plus.com> |
|---|---|
| Date | 2015-11-29 21:53 +0000 |
| Message-ID | <mailman.26.1448872519.14615.python-list@python.org> |
| In reply to | #99700 |
On 29/11/15 21:36, Mr Zaug wrote:
> I need to use re.sub to replace strings in a text file.
Do you? Is there any other way?
> result = re.sub(pattern, repl, string, count=0, flags=0);
>
> I think I understand that pattern is the regex I'm searching for and
> repl is the thing I want to substitute for whatever pattern finds but
> what is string?
Where do you think the function gets the string you want to transform from?
> This should be simple, right?
It is. And it could be even simpler if you don't bother with regexes at
all (if your input is as fixed as you say it is):
>>> foo = "foo bar baz spam CONTENT_PATH bar spam"
>>> ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
'foo bar baz spam Substitute bar spam'
>>>
E.
| From | Jussi Piitulainen <harvesting@is.invalid> |
|---|---|
| Date | 2015-11-30 10:51 +0200 |
| Message-ID | <lf54mg3eupq.fsf@ling.helsinki.fi> |
| In reply to | #99728 |
Erik writes:
> On 29/11/15 21:36, Mr Zaug wrote:
>> This should be simple, right?
>
> It is. And it could be even simpler if you don't bother with regexes
> at all (if your input is as fixed as you say it is):
>
> >>> foo = "foo bar baz spam CONTENT_PATH bar spam"
> >>> ' Substitute '.join(foo.split(' CONTENT_PATH ', 1))
> 'foo bar baz spam Substitute bar spam'
Surely the straight thing to say is:
>>> foo.replace(' CONTENT_PATH ', ' Substitute ')
'foo bar baz spam Substitute bar spam'
But there was no guarantee of spaces around the target. If you wish to,
say, replace "spam" in your foo with "REDACTED" but leave it intact in
"May the spammer be prosecuted", a regex might be attractive after all.
| From | Erik <python@lucidity.plus.com> |
|---|---|
| Date | 2015-12-01 01:26 +0000 |
| Message-ID | <mailman.49.1448933226.14615.python-list@python.org> |
| In reply to | #99731 |
On 30/11/15 08:51, Jussi Piitulainen wrote:
> Surely the straight thing to say is:
>
> >>> foo.replace(' CONTENT_PATH ', ' Substitute ')
> 'foo bar baz spam Substitute bar spam'
Not quite the same thing (but yes, with a third argument of 1, it would be).
> But there was no guarantee of spaces around the target.
I know. It was just an example to show that there might be an option
that's not a regex for the specific use indicated. It's up to the OP to
decide whether they think the spaces (or any other, or no, delimiter)
would actually be required or useful. Or whether they really prefer a
regex after all.
> If you wish to,
> say, replace "spam" in your foo with "REDACTED" but leave it intact in
> "May the spammer be prosecuted", a regex might be attractive after all.
But that's not what the OP said they wanted to do. They said everything
was very fixed - they did not want a general purpose human language text
processing solution ... ;)
E.
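The difference discussed above between split/join and `str.replace` only shows up when the target occurs more than once. A small sketch, using an invented sample string with two occurrences:

```python
foo = "a CONTENT_PATH b CONTENT_PATH c"

# split with maxsplit=1 followed by join changes only the first occurrence
joined = " Substitute ".join(foo.split(" CONTENT_PATH ", 1))

# replace with no count changes every occurrence
replaced_all = foo.replace(" CONTENT_PATH ", " Substitute ")

# replace with a count of 1 gives the same result as the split/join form
replaced_once = foo.replace(" CONTENT_PATH ", " Substitute ", 1)

print(joined)
print(replaced_all)
print(replaced_once)
```

If each placeholder really appears only once per line, as the question says, all three forms give the same result.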
| From | Jussi Piitulainen <harvesting@is.invalid> |
|---|---|
| Date | 2015-12-01 07:28 +0200 |
| Message-ID | <lf5r3j6ka9q.fsf@ling.helsinki.fi> |
| In reply to | #99762 |
Erik writes:
> On 30/11/15 08:51, Jussi Piitulainen wrote:
[- -]
>> If you wish to,
>> say, replace "spam" in your foo with "REDACTED" but leave it intact in
>> "May the spammer be prosecuted", a regex might be attractive after all.
>
> But that's not what the OP said they wanted to do. They said
> everything was very fixed - they did not want a general purpose human
> language text processing solution ... ;)
Language processing is not what I had in mind here. Merely this, that there is some sort of word boundary, be it punctuation, whitespace, or an end of the string:
    >>> re.sub(r'\bspam\b', '****', 'spamalot spam')
    'spamalot ****'
That's not perfect either, but it's simple and might be somewhat proportional to the problem. A real solution should be aware of the actual structure of those lines, assuming they follow some defined syntax.
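Applied to one of the placeholders from the original question, the word-boundary idea might look like this. The sample line is invented to show the edge cases; it is not from the OP's template.

```python
import re

line = "ENV=MY_ENV ENVIRONMENT ENV"

# Plain replacement also rewrites ENV inside longer names.
plain = line.replace("ENV", "pre-prod")

# \b only matches at a word boundary; letters, digits and the
# underscore all count as word characters, so MY_ENV and
# ENVIRONMENT are left alone.
bounded = re.sub(r"\bENV\b", "pre-prod", line)

print(plain)
print(bounded)
```

Whether the underscore case matters depends on how the template is actually written, which the thread does not show.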
| From | Erik <python@lucidity.plus.com> |
|---|---|
| Date | 2015-12-01 21:31 +0000 |
| Message-ID | <mailman.85.1449005519.14615.python-list@python.org> |
| In reply to | #99768 |
On 01/12/15 05:28, Jussi Piitulainen wrote:
> A real solution should be aware of the actual structure of those lines,
> assuming they follow some defined syntax.
I think that we are in violent agreement on this ;)
E.