Python - parsing nested information and provide it in proper format from log file

Started by	Jay T <jt11378@gmail.com>
First post	2015-02-19 18:42 -0800
Last post	2015-02-20 15:27 +0100
Articles	4 — 3 participants

  Python - parsing nested information and provide it in proper format from log file Jay T <jt11378@gmail.com> - 2015-02-19 18:42 -0800
    Re: Python - parsing nested information and provide it in proper format from log file Peter Otten <__peter__@web.de> - 2015-02-20 14:10 +0100
      Re: Python - parsing nested information and provide it in proper format from log file jt11380@gmail.com - 2015-02-20 05:31 -0800
        Re: Python - parsing nested information and provide it in proper format from log file Peter Otten <__peter__@web.de> - 2015-02-20 15:27 +0100

#85939 — Python - parsing nested information and provide it in proper format from log file

From	Jay T <jt11378@gmail.com>
Date	2015-02-19 18:42 -0800
Subject	Python - parsing nested information and provide it in proper format from log file
Message-ID	<0097dab0-301c-42e1-a6be-b21eb5356567@googlegroups.com>

I have a log file with nested data that I want to filter, reporting the details for each student together with total counts.

Here is my log file sample: 
Student name is ABC 
Student age is 12 
student was late 
student was late 
student was late 
Student name is DEF 
student age is 13 
student was late 
student was late

I want to parse the file and show each student's name, age, and the number of times the student was late, e.g.:
Name Age TotalCount
ABC  12  3
DEF  13  2

Please help me with a solution; I would be really grateful.

thanks, Jt

#85968

From	Peter Otten <__peter__@web.de>
Date	2015-02-20 14:10 +0100
Message-ID	<mailman.18920.1424437896.18130.python-list@python.org>
In reply to	#85939

What have you tried? Please show us some code.

The basic idea would be to iterate over the lines and split the current line 
into words. 

If the second word is "name" and it's not the first iteration print the 
student's name, age, and was_late count. Then set the name variable to the 
new name and reset age and was_late to 0. To detect the first iteration you 
can set 

name = None

before you enter the loop and then check for that value before printing:

if name is not None:
    ... # print student data

If the second word is "age" convert the 4th word to integer and set the age 
variable.

If the second word is "was" increment the was_late counter.

Remember that when the loop ends and the file was not empty you have one 
more student's data to print.

#85970

From	jt11380@gmail.com
Date	2015-02-20 05:31 -0800
Message-ID	<03cf1f25-b31b-4c48-96c9-be86a2ecdbc8@googlegroups.com>
In reply to	#85968

I tried to implement the code below and got stuck on how to do a nested loop to count, instead of doing separate logic and parsing:

import re
def GetName(input_string):
                  myName=input_string.split()
                  myName1= myName[1]
                  return myName1
def GetAge(input_string):
                  myAge=input_string.split()
                  myAge1= myAge[2]
                  return myAge1
 
                  
        
                     
file = open('mylogfile')
log_data = file.readlines()
print 'entered'
for eachline in log_data:
           input_string = eachline
           if 'name' in input_string:
                    sometextval = GetName(input_string)
                    print "name", sometextval
           if 'Age' in input_string:
                     sometextval2 = GetAge(input_string)
                     print "Age", sometextval2

Now I am stuck on getting the total_late count, since it belongs with the name and age. How do I write logic that counts it as part of each student's group?

Any help would be appreciated.

-J

#85972

From	Peter Otten <__peter__@web.de>
Date	2015-02-20 15:27 +0100
Message-ID	<mailman.18922.1424442491.18130.python-list@python.org>
In reply to	#85970

jt11380@gmail.com wrote:

> Now I am stuck on getting the total_late count, since it belongs with the
> name and age. How do I write logic that counts it as part of each
> student's group?
> 
> Any help would be appreciated.

Try to write code that does what I describe in my outline. Initialise name 
before the loop and dump the collected data when you encounter a new name.

name = None        # no student seen yet
total_late = 0
age = "unknown"

with open("student.txt") as instream:
    for line in instream:
        words = line.split()
        if len(words) < 2:
            continue  # skip blank or one-word lines
        if words[1] == "name":
            # a new record starts: dump the previous student first
            if name is not None:
                print("%s %s %s" % (name, age, total_late))
            name = " ".join(words[3:])
            age = "unknown"
            total_late = 0
        elif words[1] == "was":
            total_late += 1
        elif words[1] == "age":
            age = int(words[3])
        else:
            print("don't know what to do with line %r" % line)
# the last student has not been printed yet
if name is not None:
    print("%s %s %s" % (name, age, total_late))
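
To get the Name/Age/TotalCount table asked for in the first post, the same loop could collect records instead of printing them right away. What follows is only a sketch, written for Python 3. The function names parse_students and format_table are made up for this illustration, the column widths are arbitrary, and it assumes every record starts with a "name" line as in the sample log.

```python
def parse_students(lines):
    """Return a list of [name, age, total_late] records from log lines."""
    students = []
    for line in lines:
        words = line.lower().split()  # lowercase so "Student"/"student" match
        if len(words) < 2:
            continue  # skip blank or one-word lines
        if words[1] == "name":
            # take the name from the original line to keep its spelling
            students.append([" ".join(line.split()[3:]), "unknown", 0])
        elif not students:
            continue  # data before the first "name" line cannot be assigned
        elif words[1] == "age":
            students[-1][1] = int(words[3])
        elif words[1] == "was":
            students[-1][2] += 1
    return students


def format_table(students):
    """Format the records as left-aligned columns with a header row."""
    rows = [("Name", "Age", "TotalCount")] + [tuple(s) for s in students]
    return "\n".join("%-10s %-7s %s" % row for row in rows)


sample = """\
Student name is ABC
Student age is 12
student was late
student was late
student was late
Student name is DEF
student age is 13
student was late
student was late
"""
print(format_table(parse_students(sample.splitlines())))
```

Because a file object yields its lines when iterated, an open file can be passed to parse_students directly in place of sample.splitlines().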

Checking whole words has the advantage that there will be no match if the 
string "name" or "age" is part of the student's name.
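
A small illustration of the difference, using a made-up student called "Sage" who does not appear in the original log. The substring test fires on the name line, while comparing the second word does not. This is a sketch for Python 3.

```python
line = "Student name is Sage"  # hypothetical student, not from the sample log
words = line.split()

substring_hit = "age" in line       # substring test: "age" occurs inside "Sage"
whole_word_hit = words[1] == "age"  # whole-word test: the second word is "name"

print(substring_hit, whole_word_hit)
```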

> I tried to implement the code below and got stuck on how to do a nested
> loop to count, instead of doing separate logic and parsing:
> 
> import re
> def GetName(input_string):
>                   myName=input_string.split()
>                   myName1= myName[1]

That's the wrong index: for "Student name is ABC", myName[1] is the word "name"; the student's name starts at index 3.

>                   return myName1
> def GetAge(input_string):
>                   myAge=input_string.split()
>                   myAge1= myAge[2]

That's the wrong index: for "Student age is 12", myAge[2] is the word "is"; the age is at index 3.

>                   return myAge1

In general you should test your functions independently from the whole 
program. That way you can build on known-good components and thus reduce the 
area where you have to look for remaining bugs.
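
As a sketch of what such an independent test might look like: the helpers below are renamed variants of GetName and GetAge that use index 3 instead of 1 and 2, checked with plain assert statements against lines from the sample log. The names get_name and get_age are assumptions for this illustration, and the code is written for Python 3.

```python
def get_name(input_string):
    # "Student name is ABC".split() -> ['Student', 'name', 'is', 'ABC']
    return " ".join(input_string.split()[3:])


def get_age(input_string):
    # "Student age is 12".split() -> ['Student', 'age', 'is', '12']
    return int(input_string.split()[3])


# Check each helper on its own before using it in the main loop.
assert get_name("Student name is ABC") == "ABC"
assert get_age("student age is 13") == 13
print("helpers OK")
```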
                      
> file = open('mylogfile')
> log_data = file.readlines()

The file is probably short enough that it doesn't matter here, but iterating 
over the file directly is a good habit to get into. Example:

with open("mylogfile") as log_data:
    for eachline in log_data:
        ...

> print 'entered'
> for eachline in log_data:
>            input_string = eachline
>            if 'name' in input_string:
>                     sometextval = GetName(input_string)
>                     print "name", sometextval
>            if 'Age' in input_string:

This test is problematic because Python takes case into account when 
comparing strings:

>>> "age" == "AGE"
False
>>> s = "RAGE"
>>> "age" in s
False

If case isn't consistent you should convert the string to lowercase:

>>> "age" == "AGE".lower()
True
>>> "age" in s.lower()
True

>                      sometextval2 = GetAge(input_string)
>                      print "Age", sometextval2
>
