Groups > comp.lang.python > #49645 > unrolled thread

Parsing Text file

Started by: sas429s@gmail.com
First post: 2013-07-02 11:45 -0700
Last post: 2013-07-02 21:28 +0100
Articles 9 — 5 participants

Contents

  Parsing Text file sas429s@gmail.com - 2013-07-02 11:45 -0700
    Re: Parsing Text file Neil Cerutti <neilc@norwich.edu> - 2013-07-02 19:24 +0000
      Re: Parsing Text file sas429s@gmail.com - 2013-07-02 12:30 -0700
        Re: Parsing Text file Tobiah <toby@tobiah.org> - 2013-07-02 12:50 -0700
          Re: Parsing Text file Neil Cerutti <neilc@norwich.edu> - 2013-07-02 20:12 +0000
            Re: Parsing Text file sas429s@gmail.com - 2013-07-02 13:28 -0700
              Re: Parsing Text file Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-02 21:56 +0100
              Re: Parsing Text file Denis McMahon <denismfmcmahon@gmail.com> - 2013-07-03 00:55 +0000
          Re: Parsing Text file Joshua Landau <joshua.landau.ws@gmail.com> - 2013-07-02 21:28 +0100

#49645 — Parsing Text file

From: sas429s@gmail.com
Date: 2013-07-02 11:45 -0700
Subject: Parsing Text file
Message-ID: <08ae2828-1532-47b6-a9cb-208549189467@googlegroups.com>
I have a text file like this:

Sometext
Somemore
Somemore
maskit

Sometext
Somemore
Somemore
Somemore
maskit

Sometext
Somemore
maskit

I want to search for the string maskit in this file and also print the Sometext above it. The location of Sometext can vary, as you can see above.

In the first instance it is 3 lines above maskit, in the second instance it is 4 lines above it, and so on.

How can I do this?


#49647

From: Neil Cerutti <neilc@norwich.edu>
Date: 2013-07-02 19:24 +0000
Message-ID: <b3gnnaFbe60U1@mid.individual.net>
In reply to: #49645
On 2013-07-02, sas429s@gmail.com <sas429s@gmail.com> wrote:
> I have a text file like this:
>
> Sometext
> Somemore
> Somemore
> maskit
>
> Sometext
> Somemore
> Somemore
> Somemore
> maskit
>
> Sometext
> Somemore
> maskit
>
> I want to search for the string maskit in this file and also
> need to print Sometext above it..SOmetext location can vary as
> you can see above.
>
> In the first instance it is 3 lines above mask it, in the
> second instance it is 4 lines above it and so on..
>
> Please help how to do it?

How can you tell the difference between Sometext and Somemore?

-- 
Neil Cerutti


#49648

From: sas429s@gmail.com
Date: 2013-07-02 12:30 -0700
Message-ID: <8ea32ea7-2cee-4e61-8cbd-066721d88d4a@googlegroups.com>
In reply to: #49647
Somemore can be anything, for instance:

Sometext
mail
maskit

Sometext
rupee
dollar
maskit

and so on..

Is there a way I can achieve this?



#49650

From: Tobiah <toby@tobiah.org>
Date: 2013-07-02 12:50 -0700
Message-ID: <I9GAt.794$ct1.646@newsfe07.iad>
In reply to: #49648
On 07/02/2013 12:30 PM, sas429s@gmail.com wrote:
> Somemore can be anything for instance:
>
> Sometext
> mail
> maskit
>
> Sometext
> rupee
> dollar
> maskit
>
> and so on..
>
> Is there a way I can achieve this?

How do we know whether we have Sometext?
If it's really just a literal 'Sometext', then
just print that when you hit maskit.

Otherwise:


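memory = None

for line in open('file.txt'):
    # Drop the trailing newline so the comparison below can match
    line = line.rstrip('\n')

    # is_sometext() is not defined here; supply your own test
    if is_sometext(line):
        memory = line

    if line == 'maskit':
        print(memory)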


#49651

From: Neil Cerutti <neilc@norwich.edu>
Date: 2013-07-02 20:12 +0000
Message-ID: <b3gqi7FbvfsU1@mid.individual.net>
In reply to: #49650
On 2013-07-02, Tobiah <toby@tobiah.org> wrote:
> On 07/02/2013 12:30 PM, sas429s@gmail.com wrote:
>> Somemore can be anything for instance:
>>
>> Sometext
>> mail
>> maskit
>>
>> Sometext
>> rupee
>> dollar
>> maskit
>>
>> and so on..
>>
>> Is there a way I can achieve this?
>
> How do we know whether we have Sometext?
> If it's really just a literal 'Sometext', then
> just print that when you hit maskit.
>
> Otherwise:
>
>
> for line in open('file.txt').readlines():
> 	
> 	if is_sometext(line):
> 		memory = line
>
> 	if line == 'maskit':
> 		print memory

Tobiah's solution fits what little we can make of your problem.

My feeling is that you've simplified your question a little too
much in hopes that it would help us provide a better solution.
Can you provide more context? 

-- 
Neil Cerutti


#49652

From: sas429s@gmail.com
Date: 2013-07-02 13:28 -0700
Message-ID: <7e82becd-77c1-4800-8f4e-7624b19de82b@googlegroups.com>
In reply to: #49651
Ok here is a snippet of the text file I have:

config/meal/governor_mode_config.h
  #define GOVERNOR_MODE_TASK_RATE         SSS_TID_0015MSEC
  #define GOVERNOR_MODE_WORK_MODE_MASK    (CEAL_MODE_WORK_MASK_GEAR| \
                                           CEAL_MODE_WORK_MASK_PARK_BRAKE | \
                                           CEAL_MODE_WORK_MASK_VEHICLE_SPEED)
  #define GOVERNOR_MODE_IDLE_CHECK        FALSE
  #define GOVERNOR_MODE_SPD_THRES         50
  #define GOVERNOR_MODE_SPDDES_THRES      10

config/meal/components/source/kso_aic_core_config.h
  #define CEAL_KSO_AIC_CORE_TASK_RATE          SSS_TID_0120MSEC
  #define CEAL_KSO_AIC_LOAD_FAC_AVG_TIME       300
  #define CEAL_KSO_AIC_LOAD_FAC_HYST_TIME      30
  #define CEAL_KSO_AIC_TEMP_DPF_INSTALLED      TRUE
  #define CEAL_KSO_AIC_TEMP_DPF_ENABLE         450
  #define CEAL_KSO_AIC_TEMP_DPF_HYST           25
  #define CEAL_KSO_AIC_DPF_ROC_TIME            10
  #define CEAL_KSO_AIC_TEMP_EXHAUST_INSTALLED  FALSE
  #define CEAL_KSO_AIC_TEMP_EXHAUST_ENABLE     275
  #define CEAL_KSO_AIC_TEMP_EXHAUST_HYST       25
  #define CEAL_KSO_AIC_EXHAUST_ROC_TIME        10
  #define CEAL_KSO_AIC_WORK_MODE_MASK   (CEAL_MODE_WORK_MASK_GEAR       | \
                                   CEAL_MODE_WORK_MASK_PARK_BRAKE | \
                                   CEAL_MODE_WORK_MASK_VEHICLE_SPEED)
  #define CEAL_KSO_AIC_OV_TIME                 15

Here I am looking for the line that contains "WORK_MODE_MASK". I want to print that line as well as the file name above it: config/meal/governor_mode_config.h
or config/meal/components/source/ceal_PackD_kso_aic_core_config.h.

So the output should be something like this:
config/meal/governor_mode_config.h

#define GOVERNOR_MODE_WORK_MODE_MASK    (CEAL_MODE_WORK_MASK_GEAR| \
                                           CEAL_MODE_WORK_MASK_PARK_BRAKE | \
                                           CEAL_MODE_WORK_MASK_VEHICLE_SPEED)

config/meal/components/source/kso_aic_core_config.h
#define CEAL_KSO_AIC_WORK_MODE_MASK   (CEAL_MODE_WORK_MASK_GEAR       | \
                                   CEAL_MODE_WORK_MASK_PARK_BRAKE | \
                                   CEAL_MODE_WORK_MASK_VEHICLE_SPEED)

I hope this helps..

Thanks for your help




#49656

From: Joshua Landau <joshua.landau.ws@gmail.com>
Date: 2013-07-02 21:56 +0100
Message-ID: <mailman.4126.1372798662.3114.python-list@python.org>
In reply to: #49652
On 2 July 2013 21:28,  <sas429s@gmail.com> wrote:
> Here I am looking for the line that contains: "WORK_MODE_MASK", I want to print that line as well as the file name above it: config/meal/governor_mode_config.h
> or config/meal/components/source/ceal_PackD_kso_aic_core_config.h.
>
> SO the output should be something like this:
> config/meal/governor_mode_config.h
>
> #define GOVERNOR_MODE_WORK_MODE_MASK    (CEAL_MODE_WORK_MASK_GEAR| \
>                                            CEAL_MODE_WORK_MASK_PARK_BRAKE | \
>                                            CEAL_MODE_WORK_MASK_VEHICLE_SPEED)
>
> config/meal/components/source/kso_aic_core_config.h
> #define CEAL_KSO_AIC_WORK_MODE_MASK   (CEAL_MODE_WORK_MASK_GEAR       | \
>                                    CEAL_MODE_WORK_MASK_PARK_BRAKE | \
>                                    CEAL_MODE_WORK_MASK_VEHICLE_SPEED)

(Please don't top-post.)

    filename = None

    with open("tmp.txt") as file:
        # Lines keep their trailing "\n", so test the stripped text
        nonblanklines = (line for line in file if line.strip())

        for line in nonblanklines:
            if line.lstrip().startswith("#define"):
                defn, name, *other = line.split()
                if name.endswith("WORK_MODE_MASK"):
                    print(filename, line, sep="")

            else:
                filename = line

Basically, you loop through remembering what lines you need, match a
little bit and ignore blank lines. If this isn't a solid
specification, you'll 'ave to tell me more about the edge-cases.

You said that

> #define CEAL_KSO_AIC_WORK_MODE_MASK   (CEAL_MODE_WORK_MASK_GEAR       | \
>                                    CEAL_MODE_WORK_MASK_PARK_BRAKE | \
>                                    CEAL_MODE_WORK_MASK_VEHICLE_SPEED)

was one line. If it is not, I suggest doing a pre-process to "wrap"
lines with trailing "\"s before running the algorithm:

    def wrapped(lines):
        wrap = ""
        for line in lines:
            if line.rstrip().endswith("\\"):
                wrap += line

            else:
                yield wrap + line
                wrap = ""

...
        nonblanklines = (line for line in wrapped(file) if line.strip())
...


This doesn't handle all wrapped lines properly, as it leaves the "\"
in so may interfere with matching. That's easily fixable, and there
are many other ways to do this.
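
One possible way to drop the trailing "\" while joining is sketched below. It is only an illustration, not the code from this post: the name `join_continuations` is made up, and it assumes that a continuation is always marked by a backslash as the last non-whitespace character of the line.

```python
def join_continuations(lines):
    """Yield logical lines, merging physical lines that end in a backslash."""
    parts = []
    for raw in lines:
        line = raw.rstrip()
        continued = line.endswith("\\")
        if continued:
            # Remove the backslash and any spaces in front of it
            line = line[:-1].rstrip()
        # Keep the indentation of the first physical line, trim the others
        parts.append(line if not parts else line.strip())
        if not continued:
            yield " ".join(parts)
            parts = []
    if parts:
        # The input ended in the middle of a continuation
        yield " ".join(parts)
```

Each multi-line #define comes out as a single line without newline or backslash, so the printed result will not keep the original line breaks.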

What did you try?


#49686

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2013-07-03 00:55 +0000
Message-ID: <kqvspq$ita$1@dont-email.me>
In reply to: #49652
On Tue, 02 Jul 2013 13:28:33 -0700, sas429s wrote:

> Ok here is a snippet of the text file I have:
> I hope this helps..
> .....
> Thanks for your help

ok ... so you need to figure out how best to distinguish the filename, 
then loop through the file, remember each filename as you find it, and 
when you find lines containing your target text, print the current value 
of filename and the target text line.

filenames might be distinguished by one or more of the following:

They always start in column 0 and nothing else starts in column 0
They never contain spaces and all other lines contain spaces or are blank
They always contain at least one / character
They always terminate with a . followed by one or more characters
All the characters in them are lower case
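
As a rough illustration, several of these tests can be combined into one predicate. This is a sketch under assumptions, not a definitive rule: the name `looks_like_filename` is invented here, and the lower-case test is left out because one of the paths mentioned earlier in the thread contains capital letters.

```python
def looks_like_filename(line):
    """Guess whether a line from the input is a file path."""
    text = line.rstrip("\n")
    if not text or text[0].isspace():
        return False  # blank, or does not start in column 0
    if " " in text or "\t" in text:
        return False  # paths in the sample contain no whitespace
    if "/" not in text:
        return False  # paths in the sample contain at least one /
    # Require a "." followed by an extension after the last /
    stem, dot, ext = text.rpartition(".")
    return bool(dot) and bool(ext) and "/" not in ext
```

Which tests are really needed depends on the whole input file, so it is worth checking the predicate against real data before relying on it.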

Then loop through the file in something like the following manner:

open input file;
open output file;
for each line in input file: {
	if line is a filename: {
		thisfile = line; }
	elif line matches search term: {
		print thisfile in output file;
		print line in output file; } }
close input file;
close output file;

(Note this is an algorithm written in a sort of pythonic manner, rather 
than actual python code - also because some newsreaders may break 
indenting etc, I've used ; as line terminators and {} to group blocks)
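
For comparison, the same loop might look roughly like this in real Python. It is a sketch only: `find_matches` is a made-up name, filenames are assumed to be the only lines that start in column 0, and the results are yielded rather than written to an output file.

```python
def find_matches(lines, term="WORK_MODE_MASK"):
    """Yield (filename, line) pairs for every line that contains term."""
    thisfile = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # Assumption: only filenames start in column 0
            thisfile = line
        elif term in line:
            yield thisfile, line.strip()


if __name__ == "__main__":
    sample = [
        "config/a.h\n",
        "  #define A_TASK_RATE 1\n",
        "  #define A_WORK_MODE_MASK (X)\n",
    ]
    for name, match in find_matches(sample):
        print(name)
        print(match)
```

Only the physical line that contains the search term is reported, so the continuation lines of a multi-line #define would need to be joined beforehand.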

-- 
Denis McMahon, denismfmcmahon@gmail.com


#49653

From: Joshua Landau <joshua.landau.ws@gmail.com>
Date: 2013-07-02 21:28 +0100
Message-ID: <mailman.4123.1372796949.3114.python-list@python.org>
In reply to: #49650
On 2 July 2013 20:50, Tobiah <toby@tobiah.org> wrote:
> How do we know whether we have Sometext?
> If it's really just a literal 'Sometext', then
> just print that when you hit maskit.
>
> Otherwise:
>
>
> for line in open('file.txt').readlines():
>
>         if is_sometext(line):
>                 memory = line
>
>         if line == 'maskit':
>                 print memory

My understanding of the question follows more like:

# Python 3, UNTESTED

memory = []
for line in open('file.txt'):
    # Lines keep their trailing "\n", so compare the stripped text
    text = line.strip()
    if text == 'maskit':
        print(*memory, sep="")

    elif text:
        memory.append(line)

    else:
        memory = []
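
If only the first line of each block is wanted (the Sometext line in the original question), it may be enough to remember one header per block. A minimal sketch, assuming blocks are separated by blank lines and the marker sits on a line by itself; `headers_before` is a name invented for this illustration:

```python
def headers_before(lines, marker="maskit"):
    """Yield the first line of each blank-line-separated block that contains marker."""
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            header = None  # a blank line ends the current block
        elif header is None:
            header = line  # first non-blank line of a new block
        elif line == marker:
            yield header


if __name__ == "__main__":
    sample = ["Sometext\n", "mail\n", "maskit\n", "\n",
              "Other\n", "rupee\n", "dollar\n", "maskit\n"]
    for header in headers_before(sample):
        print(header)
```

A block whose very first line is the marker produces nothing, since there is no line above it to report.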
