


Suggested datatype for getting latest information from log files

Started by: ltomassmail@gmail.com
First post: 2016-02-11 10:07 -0800
Last post: 2016-02-11 20:19 -0500
Articles: 6 (4 participants)



Contents

  Suggested datatype for getting latest information from log files ltomassmail@gmail.com - 2016-02-11 10:07 -0800
    Re: Suggested datatype for getting latest information from log files jmp <jeanmichel@sequans.com> - 2016-02-11 19:16 +0100
      Re: Suggested datatype for getting latest information from log files ltomassmail@gmail.com - 2016-02-11 10:25 -0800
      Re: Suggested datatype for getting latest information from log files ltomassmail@gmail.com - 2016-02-11 10:31 -0800
    Re: Suggested datatype for getting latest information from log files "Martin A. Brown" <martin@linux-ip.net> - 2016-02-11 10:58 -0800
    Re: Suggested datatype for getting latest information from log files Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-02-11 20:19 -0500

#102823 — Suggested datatype for getting latest information from log files

From: ltomassmail@gmail.com
Date: 2016-02-11 10:07 -0800
Subject: Suggested datatype for getting latest information from log files
Message-ID: <621f72dc-ad32-48e8-ad37-a2e7eb2d4bd8@googlegroups.com>
I have timestamped log files I need to read through and keep track of the most up-to-date information.

For example, let's say we had a log file

timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten

I need to keep track of every 'name' in this table. I don't want duplicate values, so if values come in from a later timestamp that are different, then they need to get updated. For example, if a later timestamp showed 'dave' with fewer marbles, that should get updated.

I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update. However, because they are unordered and I need to do some more processing on the data afterwards, I'm having trouble.
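A minimal sketch of the dictionary approach, assuming each log line has already been parsed into a tuple in the column order above and that timeStamp has been converted to something comparable (a float here). The function name latest_by_name and the sample rows are made up for illustration:

```python
def latest_by_name(rows):
    """Return a dict mapping each name to its newest row.

    Each row is assumed to be a tuple in the log's column order:
    (timeStamp, name, marblesHeld, timeNow, timeSinceLastEaten),
    with timeStamp already converted to a comparable value.
    """
    latest = {}
    for row in rows:
        timestamp, name = row[0], row[1]
        # The first sighting of a name creates its entry; a later row
        # replaces it only if its timestamp is newer.
        if name not in latest or timestamp > latest[name][0]:
            latest[name] = row
    return latest


# Made-up sample rows.
rows = [
    (1.0, "dave", 10, 100.0, 40.0),
    (2.0, "steve", 5, 100.0, 90.0),
    (3.0, "dave", 7, 110.0, 60.0),
]
latest = latest_by_name(rows)
```

Comparing timestamps, rather than simply overwriting, keeps the result correct even if the log lines are not strictly in time order.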

For example, let's assume that once I have the most up-to-date values for dave, steve and jenny, I want to compute timeNow - timeSinceLastEaten to get an interval, then write all the info together to some other database. Crucially, order is important here.
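One way to sketch this post-processing step, assuming the newest row per name is already in a dict keyed by name and that "order" means ascending timeStamp (an assumption; the post does not say which order). The function name and sample values are hypothetical:

```python
def intervals_in_timestamp_order(latest):
    """Turn the newest row per name into output records sorted by timeStamp.

    `latest` maps name -> (timeStamp, name, marblesHeld, timeNow,
    timeSinceLastEaten). Sorting by timeStamp is an assumed meaning of
    "order" here.
    """
    records = []
    for ts, name, marbles, time_now, since_eaten in sorted(
            latest.values(), key=lambda row: row[0]):
        # The interval the post describes: timeNow - timeSinceLastEaten.
        records.append((ts, name, marbles, time_now - since_eaten))
    return records


latest = {
    "dave": (3.0, "dave", 7, 110.0, 60.0),
    "steve": (2.0, "steve", 5, 100.0, 90.0),
}
records = intervals_in_timestamp_order(latest)
# Each record could then be written to the other database in this order.
```

Because the ordering is applied only when the output is produced, the dictionary itself never needs to be ordered.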

I don't know if a particular name will appear in the records or not, so it needs to be created on the first instance and updated from then on.

Could anyone suggest some good approaches or suggested data structures for this?

I thought about trying to create an object for each 'name', then checking if that object exists and updating values within that object. However, that seemed like
a. overkill
b. beyond my Python skills for the timeframe I have
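For comparison, the object-per-name idea need not be heavy. A minimal sketch using a dataclass (Python 3.7+); the class name Player, its field names and the update() helper are hypothetical, chosen to mirror the log columns:

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Latest known state for one name (field names are illustrative)."""
    name: str
    timestamp: float
    marbles_held: int
    time_now: float
    time_since_last_eaten: float

    def interval(self):
        return self.time_now - self.time_since_last_eaten


players = {}


def update(name, timestamp, marbles_held, time_now, time_since_last_eaten):
    """Create the Player on first sight; replace it when a newer row arrives."""
    current = players.get(name)
    if current is None or timestamp > current.timestamp:
        players[name] = Player(name, timestamp, marbles_held,
                               time_now, time_since_last_eaten)


update("dave", 1.0, 10, 100.0, 40.0)
update("dave", 3.0, 7, 110.0, 60.0)
update("dave", 2.0, 9, 105.0, 50.0)   # older row: ignored
```

The dict still does the "does this name exist yet?" bookkeeping; the class only gives the fields names and a place for derived values such as the interval.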



#102824

From: jmp <jeanmichel@sequans.com>
Date: 2016-02-11 19:16 +0100
Message-ID: <mailman.57.1455214580.22075.python-list@python.org>
In reply to: #102823
On 02/11/2016 07:07 PM, ltomassmail@gmail.com wrote:
> I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update. However, because they are unordered and I need to do some more processing on the data afterwards, I'm having trouble.

If it's your only concern about using dictionaries, then you may have a 
look  at 
https://docs.python.org/2/library/collections.html#collections.OrderedDict

JM
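A short sketch of what OrderedDict offers here, using made-up (name, marbles) events. Re-assigning an existing key keeps its original position; OrderedDict.move_to_end() can instead move it to the end, so iteration follows the most recent updates. (Since Python 3.7 a plain dict also preserves insertion order, but move_to_end() is specific to OrderedDict.) The function name latest_marbles is hypothetical:

```python
from collections import OrderedDict


def latest_marbles(events, order_by_last_update=False):
    """Map name -> latest marble count with a predictable key order.

    By default keys stay in first-seen order. With
    order_by_last_update=True, each updated key is moved to the end.
    """
    latest = OrderedDict()
    for name, marbles in events:
        latest[name] = marbles
        if order_by_last_update:
            latest.move_to_end(name)
    return latest


events = [("dave", 10), ("steve", 5), ("dave", 7)]
print(list(latest_marbles(events).items()))        # [('dave', 7), ('steve', 5)]
print(list(latest_marbles(events, True).items()))  # [('steve', 5), ('dave', 7)]
```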






#102827

From: ltomassmail@gmail.com
Date: 2016-02-11 10:25 -0800
Message-ID: <fdff52e9-da6d-4a0d-b3e7-19d92e2793d5@googlegroups.com>
In reply to: #102824
On Thursday, February 11, 2016 at 6:16:35 PM UTC, jmp wrote:
> On 02/11/2016 07:07 PM, ltomassmail@gmail.com wrote:
> > I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update. However, because they are unordered and I need to do some more processing on the data afterwards, I'm having trouble.
> 
> If it's your only concern about using dictionaries, then you may have a 
> look  at 
> https://docs.python.org/2/library/collections.html#collections.OrderedDict
> 
> JM

I did look into that, but I'm trying to do something like this, which doesn't work. I guess I'm struggling a little with the implementation.

    fillinfo = {}
    fillInfo['name'] = OrderedDict('info1','info2','info3','info4','info5',)
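The snippet above has two separate errors: `fillinfo` and `fillInfo` are different names (Python is case-sensitive), and OrderedDict accepts at most one positional argument (a mapping or an iterable of key/value pairs), so passing five strings raises a TypeError. A possible corrected shape, assuming the intent is "one entry per name, kept in order", with the names, labels and values below as placeholders:

```python
from collections import OrderedDict

# The outer mapping keeps names in the order they were first seen.
fillInfo = OrderedDict()

# The per-name value can simply be a tuple of the fields...
fillInfo['dave'] = ('info1', 'info2', 'info3', 'info4', 'info5')

# ...or, if the fields need labels, an OrderedDict built from (key, value) pairs.
fillInfo['steve'] = OrderedDict([('marblesHeld', 5), ('timeNow', 100)])
```

Only the outer container needs to be ordered for the names to come out in a predictable sequence; the per-name values can be any type.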



#102828

From: ltomassmail@gmail.com
Date: 2016-02-11 10:31 -0800
Message-ID: <80862362-7edb-403d-8854-7b17d3f69cc2@googlegroups.com>
In reply to: #102824
On Thursday, February 11, 2016 at 6:16:35 PM UTC, jmp wrote:
> On 02/11/2016 07:07 PM, ltomassmail@gmail.com wrote:
> > I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update. However, because they are unordered and I need to do some more processing on the data afterwards, I'm having trouble.
> 
> If it's your only concern about using dictionaries, then you may have a 
> look  at 
> https://docs.python.org/2/library/collections.html#collections.OrderedDict
> 
> JM

I did look into this, but I'm struggling a little with the implementation. I'm currently trying to do something like this, which doesn't work:

    fillInfo = {}
    p = re.compile('PATTERN')
    with (open(path,'r')) as f:
        for row in f:
            m = p.search(row)
            if m == None:
                continue
            else:
                fillInfo[m.group(5)] = OrderedDict(m.group(1),m.group(2),m.group(3),m.group(4),m.group(6))
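A sketch of the same loop with the OrderedDict call fixed: the matched groups are stored as a tuple (m.group() with several indices returns one), and the None check uses `is`. The real 'PATTERN' was elided, so the regex below is a hypothetical stand-in that splits six comma-separated fields with the name in group 5, matching the indexing above. The function name load_latest is also hypothetical:

```python
import re
from collections import OrderedDict

# Hypothetical stand-in for the elided 'PATTERN': six comma-separated
# fields, with the name in group 5 to match the indexing in the post.
p = re.compile(r'^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$')


def load_latest(lines, pattern):
    """Keep the last matching row per name, in first-seen name order."""
    fillInfo = OrderedDict()
    for row in lines:
        m = pattern.search(row.rstrip('\n'))
        if m is None:          # compare to None with 'is', not '=='
            continue
        # m.group() with several indices returns a tuple of those groups.
        fillInfo[m.group(5)] = m.group(1, 2, 3, 4, 6)
    return fillInfo


# With a real file:  with open(path) as f: fillInfo = load_latest(f, p)
sample = ['a,b,c,d,dave,f\n', 'not a record\n', 'a2,b2,c2,d2,dave,f2\n']
fillInfo = load_latest(sample, p)
```

Note that this simply lets the last matching line win, which is only "most up to date" if the file is in chronological order; comparing the timestamp field before overwriting would remove that assumption.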



#102829

From: "Martin A. Brown" <martin@linux-ip.net>
Date: 2016-02-11 10:58 -0800
Message-ID: <mailman.58.1455217124.22075.python-list@python.org>
In reply to: #102823
Greetings,

>I have timestamped log files I need to read through and keep track 
>of the most up-to-date information.
>
>For example, let's say we had a log file
>
>timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten

I do not quite understand the distinction between timeStamp and 
timeNow.

>I need to keep track of every 'name' in this table. I don't want 
>duplicate values, so if values come in from a later timestamp that 
>are different, then they need to get updated. For example, if a later 
>timestamp showed 'dave' with fewer marbles, that should get updated.
>
>I thought a dictionary would be a good idea because of the key 
>restrictions ensuring no duplicates, so the data would always 
>update.

Yes.  A dictionary seems reasonable.

>However, because they are unordered and I need to do some more 
>processing on the data afterwards, I'm having trouble.

Ordered how?  For each name, you need to keep the stream of data 
ordered?  This is what I'm assuming based on your problem 
description.

If the order of names (dave, steve and jenny) is important, then you 
should look to OrderedDict as JM has suggested.

I am inferring from your description that the order of events (along 
a timeline) is what is important, not the sequence of players 
relative to each other (since that is already in the logfile).

>For example, let's assume that once I have the most up-to-date values 
>for dave, steve and jenny, I want to compute timeNow - timeSinceLastEaten 
>to get an interval, then write all the info together to some other 
>database. Crucially, order is important here.

Again, it's not utterly clear what "order" means.  If order of 
events for a single player is important, then see below.

>I don't know if a particular name will appear in the records or 
>not, so it needs to be created on the first instance and updated from 
>then on.

Again, a dictionary is great for this.

It seems that you could also benefit from a list (to store an event 
and the time at which the event occurred).  But you don't want to 
store all of history, so you want to use a bounded-length list.  You 
may find a collections.deque useful here.
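A tiny illustration of the bounded-length behaviour, using made-up marble counts: a deque created with maxlen discards its oldest entry each time a new one is appended after it is full.

```python
from collections import deque

history = deque(maxlen=3)    # keep only the 3 most recent entries
for marbles in [10, 8, 8, 5]:
    history.append(marbles)  # once full, each append drops the oldest entry

print(list(history))  # [8, 8, 5]
```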

>Could anyone suggest some good approaches or suggested data 
>structures for this?

First, JM already pointed you to OrderedDict, which may help 
depending on exactly what you are trying to order.

There are two other data structures in the collections module that 
may be helpful for you.  I perceive the following (from your 
description).

You have a set of names (players).
You wish to store, for each name, a value (marblesHeld).
You wish to store, for each name, a value (timeSinceLastEaten).

I recommend learning how to use both:

  collections.defaultdict [0]:  so you can dynamically create 
    entries for new players in the marble game without checking if 
    they already exist in the dictionary (very convenient!)

  collections.deque [1]:  in this case, I'm suggesting using it as 
    a bounded-length list; you keep adding stuff to it and after
    it stores X entries, the old ones will "fall off"
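A small sketch of the defaultdict behaviour described in the first item, with made-up data: indexing a missing key calls the factory (list here) and stores the result, so new players need no existence check.

```python
from collections import defaultdict

history = defaultdict(list)  # a missing key is created as an empty list
for name, marbles in [("dave", 10), ("jenny", 4), ("dave", 7)]:
    history[name].append(marbles)  # no "if name in history" check needed

print(dict(history))  # {'dave': [10, 7], 'jenny': [4]}
```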

Note: I fabricated the players and data, but the part you are 
probably interested in is the interaction between the dictionary, 
whose keys are the names of the players, and whose values are the 
deques capturing the last 10 entries of each user's marble count 
and the time at which each occurred.

  mydeque = functools.partial(collections.deque, maxlen=10)

  record = collections.defaultdict(mydeque)

Storing both the marble count and the time will allow you to
calculate, at any later time, how long it has been since the 
user's marble count last changed.
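A sketch of that calculation, assuming the history holds (marbles, timestamp) pairs oldest first, as Martin's marblegame() appends them. The function name and sample values are hypothetical. With a bounded deque, if every retained entry has the same count, the actual change may predate the oldest entry, so the result is then only a lower bound:

```python
import collections


def seconds_since_last_change(history, now):
    """Return how long the current marble count has been unchanged.

    `history` holds (marbles, timestamp) pairs, oldest first. If all
    retained entries share the current count, this is a lower bound.
    """
    if not history:
        return None
    current, since = history[-1]
    # Walk backwards through the run of entries with the current count;
    # the oldest entry in that run is when the count last changed.
    for marbles, timestamp in reversed(history):
        if marbles != current:
            break
        since = timestamp
    return now - since


history = collections.deque([(5, 100.0), (7, 110.0), (7, 120.0)], maxlen=10)
print(seconds_since_last_change(history, now=130.0))  # 20.0
```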

I don't understand how the eating fits into your problem, but maybe 
my code (below) will afford you an example of how to approach the 
problem with a few of Python's wonderfully convenient standard 
library data structures.

Good luck,

-Martin

P.S. I just read your reply to JM, and it looks like you also are 
trying to figure out how to read the input data.  Is it CSV?  Could 
you simply use the csv module [2]?
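If the log is plain CSV with a header row (an assumption; the original post only lists the column names), csv.DictReader can replace the regex entirely. A sketch with made-up sample data in an in-memory file; with a real file, open it with open(path, newline='') as the csv documentation recommends:

```python
import csv
import io

# Made-up sample; assumes a header row with these column names.
sample = io.StringIO(
    "timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten\n"
    "1,dave,10,100,40\n"
    "2,steve,5,100,90\n"
    "3,dave,7,110,60\n"
)

latest = {}
for row in csv.DictReader(sample):
    # csv yields strings, so convert before comparing or subtracting.
    ts = float(row["timeStamp"])
    name = row["name"]
    if name not in latest or ts > float(latest[name]["timeStamp"]):
        latest[name] = row

for name, row in latest.items():
    interval = float(row["timeNow"]) - float(row["timeSinceLastEaten"])
    print(name, row["marblesHeld"], interval)
```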

  [0] https://docs.python.org/3/library/collections.html#collections.defaultdict
  [1] https://docs.python.org/3/library/collections.html#collections.deque
  [2] https://docs.python.org/3/library/csv.html


#! /usr/bin/python3

import time
import random
import functools
import collections

import pprint

players = ['Steve', 'Jenny', 'Dave', 'Samuel', 'Jerzy', 'Ellen']
# A deque factory that keeps only the 10 most recent entries.
mydeque = functools.partial(collections.deque, maxlen=10)

def marblegame(rounds):
    # Unknown names get a fresh bounded deque on first access.
    record = collections.defaultdict(mydeque)
    for _ in range(rounds):
        now = time.time()
        who = random.choice(players)
        marbles = random.randint(0, 100)
        # Store the marble count together with when it was observed.
        record[who].append((marbles, now))
    for whom, marblehistory in record.items():
        print(whom, end=": ")
        pprint.pprint(marblehistory)

if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        count = int(sys.argv[1])
    else:
        count = 30
    marblegame(count)

# -- end of file

-- 
Martin A. Brown
http://linux-ip.net/



#102838

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2016-02-11 20:19 -0500
Message-ID: <mailman.64.1455239954.22075.python-list@python.org>
In reply to: #102823
On Thu, 11 Feb 2016 10:07:21 -0800 (PST), ltomassmail@gmail.com declaimed
the following:

>
>timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten
>
	As mentioned, what is the difference between "timeStamp" and "timeNow"?

	non-sequitur: are they eating marbles? Otherwise, "marblesHeld" and
"timeSinceLastEaten" seem unrelated.

>
>I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update. However, because they are unordered and I need to do some more processing on the data afterwards, I'm having trouble.
>
	Off hand -- my first thought, given the nature of the data -- would be
to just stuff it all into a database, and try to come up with database
queries to produce the needed order/subset information.

>For example, let's assume that once I have the most up-to-date values for dave, steve and jenny, I want to compute timeNow - timeSinceLastEaten to get an interval, then write all the info together to some other database. Crucially, order is important here.
>
	(very pseudo SQL... I'm not even sure the aggregates work that way)
	select name, marblesHeld, timeNow - timeSinceLastEaten
	group by name having max(timeStamp)
	order by timeStamp
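A runnable version of that idea using Python's built-in sqlite3 module and an in-memory database. Rather than "having max(timeStamp)", this sketch uses the portable pattern of joining against a per-name MAX(timeStamp) subquery. The table name marble_log, the eatenInterval alias and the sample rows are hypothetical. If one name has two rows with the same maximum timestamp, both are returned:

```python
import sqlite3


def latest_rows(rows):
    """Return the newest row per name, ordered by timeStamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE marble_log (timeStamp REAL, name TEXT,"
        " marblesHeld INTEGER, timeNow REAL, timeSinceLastEaten REAL)"
    )
    conn.executemany("INSERT INTO marble_log VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT l.name, l.marblesHeld,
               l.timeNow - l.timeSinceLastEaten AS eatenInterval,
               l.timeStamp
        FROM marble_log AS l
        JOIN (SELECT name, MAX(timeStamp) AS latest
              FROM marble_log GROUP BY name) AS m
          ON l.name = m.name AND l.timeStamp = m.latest
        ORDER BY l.timeStamp
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [
    (1.0, "dave", 10, 100.0, 40.0),
    (2.0, "steve", 5, 100.0, 90.0),
    (3.0, "dave", 7, 110.0, 60.0),
]
for record in latest_rows(rows):
    print(record)
```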

-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/




