textfile: copy between 2 keywords

Started by	Gerald <schweiger.gerald@gmail.com>
First post	2015-09-10 04:18 -0700
Last post	2015-09-10 19:41 +0000
Articles	8 — 7 participants

  textfile: copy between 2 keywords Gerald <schweiger.gerald@gmail.com> - 2015-09-10 04:18 -0700
    Re: textfile: copy between 2 keywords Steven D'Aprano <steve@pearwood.info> - 2015-09-10 22:10 +1000
    Re: textfile: copy between 2 keywords Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-09-10 16:47 +0300
    Re: textfile: copy between 2 keywords Vlastimil Brom <vlastimil.brom@gmail.com> - 2015-09-10 16:33 +0200
      Re: textfile: copy between 2 keywords Jussi Piitulainen <harvesting@makes.email.invalid> - 2015-09-10 18:48 +0300
    Re: textfile: copy between 2 keywords Christian Gollwitzer <auriocus@gmx.de> - 2015-09-10 19:29 +0200
      Re: textfile: copy between 2 keywords wxjmfauth@gmail.com - 2015-09-10 12:11 -0700
        Re: textfile: copy between 2 keywords alister <alister.nospam.ware@ntlworld.com> - 2015-09-10 19:41 +0000

#96254 — textfile: copy between 2 keywords

From	Gerald <schweiger.gerald@gmail.com>
Date	2015-09-10 04:18 -0700
Subject	textfile: copy between 2 keywords
Message-ID	<13d875f0-ae8d-43de-85b4-c943a0e7f5e2@googlegroups.com>

Hey,

Is there an easy way to copy the content between 2 unique keywords in a .txt file?

example.txt

1, 2, 3, 4
#keyword1
3, 4, 5, 6
2, 3, 4, 5
#keyword2 
4, 5, 6 ,7


Thank you very much

#96255

From	Steven D'Aprano <steve@pearwood.info>
Date	2015-09-10 22:10 +1000
Message-ID	<55f17326$0$1655$c3e8da3$5496439d@news.astraweb.com>
In reply to	#96254

On Thu, 10 Sep 2015 09:18 pm, Gerald wrote:

> Hey,
> 
> is there an easy way to copy the content between 2 unique keywords in a
> .txt file?
> 
> example.txt
> 
> 1, 2, 3, 4
> #keyword1
> 3, 4, 5, 6
> 2, 3, 4, 5
> #keyword2
> 4, 5, 6 ,7
> 
> 
> Thank you very much


Copy in what sense? Write to another file, or just copy to memory?

Either way, your solution will look something like this:

* read each line from the input file, until you reach the first keyword;
* as soon as you see the first keyword, change to "copy mode" and start
copying lines in whatever way you feel is best;
* until you see the second keyword, then stop.


E.g.

with open("input.txt") as f:
    # Skip lines as fast as possible.
    for line in f:
        if line == "START\n":
            break
    # Instead of copying, I'll just print the lines. That's sort of a copy.
    for line in f:  # This will pick up where the previous for loop ended.
        if line == "STOP\n":
            break
        print(line, end="")  # line already ends with a newline
    # If you like, you can just finish now.
    # Or, we can read the rest of the lines.
    for line in f:  # continue from just after the STOP keyword.
        pass  # This is a waste of time...
print("Done!")



-- 
Steven

#96258

From	Jussi Piitulainen <harvesting@makes.email.invalid>
Date	2015-09-10 16:47 +0300
Message-ID	<lf57fnyo078.fsf@ling.helsinki.fi>
In reply to	#96254

Gerald writes:

> Hey,
>
> is there an easy way to copy the content between 2 unique keywords in a
> .txt file?
>
> example.txt
>
> 1, 2, 3, 4
> #keyword1
> 3, 4, 5, 6
> 2, 3, 4, 5
> #keyword2 
> 4, 5, 6 ,7

Depending on your notion of easy, you may or may not like itertools.
The following code gets you the first keyword and the lines between but
consumes the second keyword. If I needed more control, I'd probably
write what Steven D'Aprano wrote but as a generator function, to get the
flexibility of deciding separately what kind of copy I want in the end.

And I'd be anxious about the possibility that the second keyword is not
there in the input at all. Steven's code and mine simply take every line
after the first keyword in that case. Worth a comment in the code, if
not an exception. Depends.

Code:

from itertools import dropwhile, takewhile
from sys import stdin

def notbeg(line): return line != '#keyword1\n'
def notend(line): return line != '#keyword2 \n' # sic!

if __name__ == '__main__':
    print(list(takewhile(notend, dropwhile(notbeg, stdin))))

Output with your original mail as input in stdin:

['#keyword1\n', '3, 4, 5, 6\n', '2, 3, 4, 5\n']
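
The generator variant mentioned above could be sketched roughly as follows. This is only an illustration: the name `lines_between`, the decision to compare stripped lines (so the trailing space after `#keyword2` does not matter), and raising `ValueError` when the second keyword is missing are all assumptions, not code from this thread.

```python
def lines_between(lines, start, stop):
    """Yield the lines strictly between the start and stop keyword lines.

    Hypothetical helper: raises ValueError if start was seen but stop
    never appears, instead of silently yielding the rest of the input.
    """
    copying = False
    for line in lines:
        stripped = line.strip()
        if not copying:
            if stripped == start:
                copying = True      # switch to "copy mode"
        elif stripped == stop:
            return                  # second keyword reached: stop cleanly
        else:
            yield line
    if copying:
        raise ValueError("start keyword found, but stop keyword is missing")


sample = [
    "1, 2, 3, 4\n",
    "#keyword1\n",
    "3, 4, 5, 6\n",
    "2, 3, 4, 5\n",
    "#keyword2 \n",
    "4, 5, 6 ,7\n",
]
result = list(lines_between(sample, "#keyword1", "#keyword2"))
print(result)
```

Because it is a generator, the caller decides what "copy" means: collect the lines with list(), write them with writelines(), or just print them.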

#96261

From	Vlastimil Brom <vlastimil.brom@gmail.com>
Date	2015-09-10 16:33 +0200
Message-ID	<mailman.318.1441895625.8327.python-list@python.org>
In reply to	#96254

2015-09-10 13:18 GMT+02:00 Gerald <schweiger.gerald@gmail.com>:
> Hey,
>
> is there an easy way to copy the content between 2 unique keywords in a .txt file?
>
> example.txt
>
> 1, 2, 3, 4
> #keyword1
> 3, 4, 5, 6
> 2, 3, 4, 5
> #keyword2
> 4, 5, 6 ,7
>
>
> Thank you very much

Hi,
just to add another possible approach: you can use a regular expression
search for this task, e.g.
(after you have read the text content into an input string):

>>> import re
>>> input_txt ="""1, 2, 3, 4
... #keyword1
... 3, 4, 5, 6
... 2, 3, 4, 5
... #keyword2
... 4, 5, 6 ,7"""
>>> re.findall(r"(?s)(#keyword1)(.*?)(#keyword2)", input_txt)
[('#keyword1', '\n3, 4, 5, 6\n2, 3, 4, 5\n', '#keyword2')]
>>>

As in the other approaches, you might need to specify the details
for specific cases: no keywords, only one of them, repeated keywords
(possibly in a different order, overlapping or "crossed"), handling of
newlines, etc.
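
One way to cover the "keyword missing" case with the regex approach might look like the sketch below. The function name `between_keywords` and the choice to return None when there is no match are assumptions made for illustration; re.escape is used so that keywords containing regex metacharacters are matched literally.

```python
import re


def between_keywords(text, start, stop):
    """Return the text between the first start and the next stop, or None.

    Hypothetical helper: None signals that one of the keywords is missing
    (or that stop does not occur after start).
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(stop)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


input_txt = """1, 2, 3, 4
#keyword1
3, 4, 5, 6
2, 3, 4, 5
#keyword2
4, 5, 6 ,7"""

found = between_keywords(input_txt, "#keyword1", "#keyword2")
missing = between_keywords(input_txt, "#keyword1", "#keyword3")
print(repr(found))
print(missing)
```

Repeated or "crossed" keywords would still need a decision of their own; this sketch simply takes the first start keyword and the nearest stop keyword after it.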

hth,
   vbr

#96265

From	Jussi Piitulainen <harvesting@makes.email.invalid>
Date	2015-09-10 18:48 +0300
Message-ID	<lf5wpvymg12.fsf@ling.helsinki.fi>
In reply to	#96261

Vlastimil Brom writes:

> just to add another possible approach, you can use regular expression

Now you have three problems: whatever the two problems are that you are
alleged to have whenever you decide to use regular expressions for
anything at all, plus all the people piling on to tell you that a Jamie
Zawinski once said that whenever you decide to use regular expressions
to solve a problem, you end up with two problems.

:)

#96273

From	Christian Gollwitzer <auriocus@gmx.de>
Date	2015-09-10 19:29 +0200
Message-ID	<mssehl$pb8$1@dont-email.me>
In reply to	#96254

On 10.09.15 at 13:18, Gerald wrote:
> Hey,
>
> is there an easy way to copy the content between 2 unique keywords in a .txt file?
>
> example.txt
>
> 1, 2, 3, 4
> #keyword1
> 3, 4, 5, 6
> 2, 3, 4, 5
> #keyword2
> 4, 5, 6 ,7

If "copying" means copying it to another file, and you are not obliged
to use Python, awk is hard to beat for this:

Apfelkiste:Tests chris$ cat kw.txt
1, 2, 3, 4
#keyword1
3, 4, 5, 6
2, 3, 4, 5
#keyword2
4, 5, 6 ,7
Apfelkiste:Tests chris$ awk '/keyword1/,/keyword2/' kw.txt
#keyword1
3, 4, 5, 6
2, 3, 4, 5
#keyword2

Consequently, awk '/keyword1/,/keyword2/' kw.txt  > kw_copy.txt

would write it out to kw_copy.txt

Beware that between the two slashes there are regexps, so if you have 
metacharacters in your keywords, you need to quote them.
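
For comparison, a rough Python counterpart of the awk range pattern could look like this. It is a sketch under stated assumptions, not an exact port: the name `awk_like_range` is made up, and it uses plain substring tests (`in`) instead of regexps, so metacharacters in the keywords need no escaping.

```python
def awk_like_range(lines, start, stop):
    """Yield each line from one containing start through one containing stop.

    Hypothetical helper. Like the awk range, both keyword lines are
    included and a new range can begin again later in the input.
    """
    inside = False
    for line in lines:
        if not inside and start in line:
            inside = True
        if inside:
            yield line
            if stop in line:
                inside = False


sample = [
    "1, 2, 3, 4\n",
    "#keyword1\n",
    "3, 4, 5, 6\n",
    "2, 3, 4, 5\n",
    "#keyword2\n",
    "4, 5, 6 ,7\n",
]
result = list(awk_like_range(sample, "keyword1", "keyword2"))
print("".join(result), end="")
```

To copy into kw_copy.txt, open kw.txt for reading and kw_copy.txt for writing, then pass the generator to the output file's writelines() method.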

	Christian

#96292

From	wxjmfauth@gmail.com
Date	2015-09-10 12:11 -0700
Message-ID	<a41d94a3-5be0-4b95-98ad-f7a00520b984@googlegroups.com>
In reply to	#96273

>>> s = """1, 2, 3, 4
... #keyword1
... 3, 4, 5, 6
... 2, 3, 4, 5
... #keyword2
... 4, 5, 6 ,7"""
>>> s[s.find('keyword1') + len('keyword1'):s.find('keyword2') - 1]
'\n3, 4, 5, 6\n2, 3, 4, 5\n'
>>> #or
>>> s[s.find('keyword1') + len('keyword1') + 1:s.find('keyword2') - 2]
'3, 4, 5, 6\n2, 3, 4, 5'
>>>

#96296

From	alister <alister.nospam.ware@ntlworld.com>
Date	2015-09-10 19:41 +0000
Message-ID	<mssmcl$hgu$1@speranza.aioe.org>
In reply to	#96292

On Thu, 10 Sep 2015 12:11:55 -0700, wxjmfauth wrote:

> >>> s = """1, 2, 3, 4
> ... #keyword1
> ... 3, 4, 5, 6
> ... 2, 3, 4, 5
> ... #keyword2
> ... 4, 5, 6 ,7"""
> >>> s[s.find('keyword1') + len('keyword1'):s.find('keyword2') - 1]
> '\n3, 4, 5, 6\n2, 3, 4, 5\n'
> >>> #or
> >>> s[s.find('keyword1') + len('keyword1') + 1:s.find('keyword2') - 2]
> '3, 4, 5, 6\n2, 3, 4, 5'
> >>>

split works well as a simple one-liner (well, two if you include the
string setup):

>>> a = "crap word1 more crap word1 again word2 still more crap"
>>> a.split('word1', 1)[1].split('word2')[0]
' more crap word1 again '
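
A variant of the same idea using str.partition, which makes a missing keyword easy to detect, might look like the sketch below. The helper name `between` and returning None for a missing keyword are assumptions for illustration only.

```python
def between(text, start, stop):
    """Return the text between the first start and the next stop, or None.

    Hypothetical helper: str.partition returns an empty separator when
    the keyword is not found, which is used here to detect that case.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    middle, found_stop, _ = rest.partition(stop)
    if not found_stop:
        return None
    return middle


a = "crap word1 more crap word1 again word2 still more crap"
result = between(a, "word1", "word2")
print(repr(result))
```

With the chained split one-liner, a missing first keyword raises IndexError and a missing second keyword silently returns everything after the first; the explicit checks here trade brevity for a clear signal.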



-- 
All bad precedents began as justifiable measures.
		-- Gaius Julius Caesar, quoted in "The Conspiracy of
		   Catiline", by Sallust
