comp.lang.python, thread #102823 (unrolled)
| Started by | ltomassmail@gmail.com |
|---|---|
| First post | 2016-02-11 10:07 -0800 |
| Last post | 2016-02-11 20:19 -0500 |
| Articles | 6 — 4 participants |
Suggested datatype for getting latest information from log files ltomassmail@gmail.com - 2016-02-11 10:07 -0800
Re: Suggested datatype for getting latest information from log files jmp <jeanmichel@sequans.com> - 2016-02-11 19:16 +0100
Re: Suggested datatype for getting latest information from log files ltomassmail@gmail.com - 2016-02-11 10:25 -0800
Re: Suggested datatype for getting latest information from log files ltomassmail@gmail.com - 2016-02-11 10:31 -0800
Re: Suggested datatype for getting latest information from log files "Martin A. Brown" <martin@linux-ip.net> - 2016-02-11 10:58 -0800
Re: Suggested datatype for getting latest information from log files Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-02-11 20:19 -0500
| From | ltomassmail@gmail.com |
|---|---|
| Date | 2016-02-11 10:07 -0800 |
| Subject | Suggested datatype for getting latest information from log files |
| Message-ID | <621f72dc-ad32-48e8-ad37-a2e7eb2d4bd8@googlegroups.com> |
I have timestamped log files I need to read through, keeping track of the most up-to-date information. For example, let's say we had a log file with these columns:

timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten

I need to keep track of every 'name' in this table. I don't want duplicate values, so if a later timestamp brings different values, the entry needs to be updated. For example, if a later timestamp showed 'dave' with fewer marbles, that should get updated.

I thought a dictionary would be a good idea because its unique keys ensure no duplicates, so the data would always update. However, because dictionaries are unordered and I need to do some more processing on the data afterwards, I'm having trouble. For example, let's assume that once I have the most up-to-date values for dave, steve and jenny, I want to compute timeNow - timeSinceLastEaten to get an interval, then write all the info together to some other database. Crucially, order is important here.

I don't know whether a particular name will appear in the records or not, so its entry needs to be created on the first instance and updated from then on.

Could anyone suggest some good approaches or data structures for this? I thought about creating an object for each 'name', then checking if that object exists and updating the values within it. However, that seemed (a) overkill and (b) beyond my Python skills for the timeframe I have.
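The object-per-name idea takes less code than it might seem. A minimal sketch, assuming Python 3.7+ (for `dataclasses`) and timestamps that are already numbers; the names `PlayerState`, `update` and `players` are hypothetical. Since Python 3.7 a plain dict also keeps insertion order, so names come back in the order they were first seen.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    """Latest known values for one name (hypothetical class; fields follow the log header)."""
    timeStamp: float
    marblesHeld: int
    timeNow: float
    timeSinceLastEaten: float

    @property
    def interval(self):
        # timeNow - timeSinceLastEaten, as described in the post
        return self.timeNow - self.timeSinceLastEaten


def update(players, name, timeStamp, marblesHeld, timeNow, timeSinceLastEaten):
    """Create the entry on first sight; afterwards overwrite it only with newer data."""
    current = players.get(name)
    if current is None or timeStamp >= current.timeStamp:
        players[name] = PlayerState(timeStamp, marblesHeld, timeNow, timeSinceLastEaten)


players = {}  # name -> PlayerState
update(players, "dave", 1.0, 12, 100.0, 40.0)
update(players, "steve", 2.0, 3, 110.0, 95.0)
update(players, "dave", 3.0, 7, 120.0, 100.0)  # newer row: replaces dave's entry
update(players, "dave", 0.5, 99, 90.0, 10.0)   # older row: ignored
```

Comparing timestamps inside `update` means the result is correct even if log lines arrive out of order.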
| From | jmp <jeanmichel@sequans.com> |
|---|---|
| Date | 2016-02-11 19:16 +0100 |
| Message-ID | <mailman.57.1455214580.22075.python-list@python.org> |
| In reply to | #102823 |
On 02/11/2016 07:07 PM, ltomassmail@gmail.com wrote:
> I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update - However because they are unordered and I need to do some more processing on the data afterwards I'm having trouble.

If it's your only concern about using dictionaries, then you may have a look at
https://docs.python.org/2/library/collections.html#collections.OrderedDict

JM
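A short illustration of what OrderedDict does and does not reorder (names and counts are made up). Updating an existing key keeps its original position; if "order" should instead mean "most recently updated last", `move_to_end` (Python 3 only) moves the key explicitly.

```python
from collections import OrderedDict

latest = OrderedDict()
latest["dave"] = 12
latest["steve"] = 3
latest["jenny"] = 8

# Re-assigning an existing key updates the value but keeps its original position.
latest["dave"] = 7
print(list(latest))  # ['dave', 'steve', 'jenny']

# Move the updated key to the end so iteration follows update order.
latest.move_to_end("dave")
print(list(latest))  # ['steve', 'jenny', 'dave']
```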
| From | ltomassmail@gmail.com |
|---|---|
| Date | 2016-02-11 10:25 -0800 |
| Message-ID | <fdff52e9-da6d-4a0d-b3e7-19d92e2793d5@googlegroups.com> |
| In reply to | #102824 |
On Thursday, February 11, 2016 at 6:16:35 PM UTC, jmp wrote:
> On 02/11/2016 07:07 PM, ltomassmail@gmail.com wrote:
> > I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update - However because they are unordered and I need to do some more processing on the data afterwards I'm having trouble.
>
> If it's your only concern about using dictionaries, then you may have a
> look at
> https://docs.python.org/2/library/collections.html#collections.OrderedDict
>
> JM
I did look into that, but I'm trying to do something like this, which doesn't work. I guess I'm struggling a little with the implementation.
fillinfo = {}
fillInfo['name'] = OrderedDict('info1','info2','info3','info4','info5',)
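The snippet above fails for two reasons: `fillinfo` and `fillInfo` are different names, and OrderedDict (like dict) accepts a mapping, an iterable of key/value pairs, or keyword arguments, not several bare positional values. A sketch of the pair-based form follows; the values `"a"`..`"e"` are hypothetical, and the placeholders `info1`..`info5` are treated as field names. If the five items are just values in a fixed order with no names, a plain list or tuple is enough.

```python
from collections import OrderedDict

# Hypothetical field names (the post's placeholders) and made-up values.
FIELDS = ("info1", "info2", "info3", "info4", "info5")
values = ("a", "b", "c", "d", "e")

fillInfo = {}  # spelled the same way everywhere
# Pair each field name with its value; zip yields (key, value) tuples.
fillInfo["name"] = OrderedDict(zip(FIELDS, values))

# Equivalent, written as an explicit list of pairs.
same = OrderedDict([("info1", "a"), ("info2", "b"), ("info3", "c"),
                    ("info4", "d"), ("info5", "e")])
```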
| From | ltomassmail@gmail.com |
|---|---|
| Date | 2016-02-11 10:31 -0800 |
| Message-ID | <80862362-7edb-403d-8854-7b17d3f69cc2@googlegroups.com> |
| In reply to | #102824 |
On Thursday, February 11, 2016 at 6:16:35 PM UTC, jmp wrote:
> On 02/11/2016 07:07 PM, ltomassmail@gmail.com wrote:
> > I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update - However because they are unordered and I need to do some more processing on the data afterwards I'm having trouble.
>
> If it's your only concern about using dictionaries, then you may have a
> look at
> https://docs.python.org/2/library/collections.html#collections.OrderedDict
>
> JM
I did look into this but am struggling a little with the implementation. I'm currently trying to do something like this, which doesn't work:
fillInfo = {}
p = re.compile('PATTERN')
with (open(path,'r')) as f:
    for row in f:
        m = p.search(row)
        if m == None:
            continue
        else:
            fillInfo[m.group(5)] = OrderedDict(m.group(1),m.group(2),m.group(3),m.group(4),m.group(6))
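A working variant of that loop, as a sketch under stated assumptions: the real 'PATTERN' is not shown, so `LINE_RE` is a hypothetical stand-in that captures the five columns from the original post as named groups, and `read_latest` is a hypothetical helper. It assumes lines arrive in chronological order, so a later line for the same name simply overwrites the earlier one.

```python
import re
from collections import OrderedDict

# Hypothetical stand-in for 'PATTERN': one comma-separated record per line.
LINE_RE = re.compile(
    r"^(?P<timeStamp>[^,]+),(?P<name>[^,]+),(?P<marblesHeld>[^,]+),"
    r"(?P<timeNow>[^,]+),(?P<timeSinceLastEaten>[^,]+)$"
)


def read_latest(lines):
    """Map each name to a dict of its latest fields, in first-seen order."""
    fillInfo = OrderedDict()
    for row in lines:
        m = LINE_RE.search(row.strip())
        if m is None:      # compare with None using 'is', not '=='
            continue       # skip lines that do not look like records
        fields = m.groupdict()
        name = fields.pop("name")
        fillInfo[name] = fields  # a later line for the same name overwrites
    return fillInfo


# With a real file: with open(path) as f: info = read_latest(f)
```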
| From | "Martin A. Brown" <martin@linux-ip.net> |
|---|---|
| Date | 2016-02-11 10:58 -0800 |
| Message-ID | <mailman.58.1455217124.22075.python-list@python.org> |
| In reply to | #102823 |
Greetings,
>I have timestamped log files I need to read through and keep track
>of the most upto date information.
>
>For example lets say we had a log file
>
>timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten
I do not quite understand the distinction between timeStamp and
timeNow.
>I need to keep track of every 'name' in this table, I don't want
>duplicate values so if values come in from a later timestamp that
>is different then that needs to get updated. For example if a later
>timestamp showed 'dave' with less marbles that should get updated.
>
>I thought a dictionary would be a good idea because of the key
>restrictions ensuring no duplicates, so the data would always
>update -
Yes. A dictionary seems reasonable.
>However because they are unordered and I need to do some more
>processing on the data afterwards I'm having trouble.
Ordered how? For each name, you need to keep the stream of data
ordered? This is what I'm assuming based on your problem
description.
If the order of names (dave, steve and jenny) is important, then you
should look to OrderedDict as JM has suggested.
I am inferring from your description that the order of events (along
a timeline) is what is important, not the order of the players
relative to each other (since that is already in the logfile).
>For example lets assume that once I have the most upto date values
>from dave,steve,jenny I wanted to do timeNow - timeSinceLastEaten
>to get an interval then write all the info together to some other
>database. Crucially order is important here.
Again, it's not utterly clear what "order" means. If order of
events for a single player is important, then see below.
>I don't know of a particular name will appear in the records or
>not, so it needs to created on the first instance and updated from
>then on.
Again, a dictionary is great for this.
It seems that you could also benefit from a list (to store an event
and the time at which the event occurred). But you don't want to
store all of history, so you want a bounded-length list. You
may find a collections.deque useful here.
>Could anyone suggest some good approaches or suggested data
>structures for this?
First, JM already pointed you to OrderedDict, which may help
depending on exactly what you are trying to order.
There are two other data structures in the collections module that
may be helpful for you. I perceive the following (from your
description).
You have a set of names (players).
You wish to store, for each name, a value (marblesHeld).
You wish to store, for each name, a value (timeSinceLastEaten).
I recommend learning how to use both:
collections.defaultdict [0]: so you can dynamically create
entries for new players in the marble game without checking if
they already exist in the dictionary (very convenient!)
collections.deque [1]: in this case, I'm suggesting using it as
a bounded-length list; you keep adding stuff to it and after
it stores X entries, the old ones will "fall off"
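The two behaviours described above, shown in isolation with made-up values:

```python
import collections

# A deque with maxlen keeps only the newest entries; older ones fall off the left.
recent = collections.deque(maxlen=3)
for marbles in [10, 9, 8, 7, 6]:
    recent.append(marbles)
print(list(recent))  # [8, 7, 6]

# defaultdict calls its factory the first time a missing key is used,
# so a new name needs no "if name not in counts" check.
counts = collections.defaultdict(list)
counts["dave"].append(12)
counts["jenny"].append(5)
counts["dave"].append(7)
print(dict(counts))  # {'dave': [12, 7], 'jenny': [5]}
```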
Note, I fabricated players and data, but the bit that you are
probably interested in is the interaction between the dictionary,
whose keys are the names of the players, and whose values are deques
capturing the last 10 entries of each user's marble count
and the time at which each count was recorded.
mydeque = functools.partial(collections.deque, maxlen=10)
record = collections.defaultdict(mydeque)
Storing both the marble count and the time will allow you to
calculate at any time later the duration since the user last had a
marble count change.
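A minimal sketch of that calculation, assuming each deque entry is a `(marbles, timestamp)` tuple appended oldest-first, as in the marblegame example; the function name is hypothetical. Because the deque is bounded, a change older than the oldest kept entry cannot be seen, so in that case the result is only a lower bound.

```python
import collections


def seconds_since_last_change(history, now):
    """Seconds since the marble count last changed, from (count, time) pairs."""
    if not history:
        return None
    entries = list(history)
    # If no change is visible, fall back to the oldest timestamp we still hold.
    last_change = entries[0][1]
    for (prev_count, _), (count, when) in zip(entries, entries[1:]):
        if count != prev_count:
            last_change = when
    return now - last_change


history = collections.deque([(10, 100.0), (10, 105.0), (7, 110.0), (7, 120.0)], maxlen=10)
print(seconds_since_last_change(history, now=130.0))  # 20.0
```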
I don't understand how the eating fits into your problem, but maybe
my code (below) will afford you an example of how to approach the
problem with a few of Python's wonderfully convenient standard
library data structures.
Good luck,
-Martin
P.S. I just read your reply to JM, and it looks like you also are
trying to figure out how to read the input data. Is it CSV? Could
you simply use the csv module [2]?
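If the input is CSV, a sketch with `csv.DictReader` might look like this. It assumes a header row with the column names from the original post and numeric timestamps; the sample rows and the `latest_rows` helper are hypothetical. The last step sorts by timestamp before output, which is one reading of "order is important".

```python
import csv
import io

# Made-up sample in the column layout from the original post (header row assumed).
SAMPLE = """timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten
1,dave,12,100,40
2,steve,3,110,95
3,dave,7,120,100
"""


def latest_rows(f):
    """Keep the newest row per name, comparing timeStamp numerically."""
    latest = {}
    for row in csv.DictReader(f):
        ts = float(row["timeStamp"])  # assumes numeric timestamps
        name = row["name"]
        if name not in latest or ts >= float(latest[name]["timeStamp"]):
            latest[name] = row
    return latest


# For a real file: with open(path, newline="") as f: latest = latest_rows(f)
latest = latest_rows(io.StringIO(SAMPLE))

# Emit in timestamp order, with the interval described in the post.
for row in sorted(latest.values(), key=lambda r: float(r["timeStamp"])):
    interval = float(row["timeNow"]) - float(row["timeSinceLastEaten"])
    print(row["name"], row["marblesHeld"], interval)
```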
[0] https://docs.python.org/3/library/collections.html#collections.defaultdict
[1] https://docs.python.org/3/library/collections.html#collections.deque
[2] https://docs.python.org/3/library/csv.html
#! /usr/bin/python3
import time
import random
import functools
import collections
import pprint
players = ['Steve', 'Jenny', 'Dave', 'Samuel', 'Jerzy', 'Ellen']
mydeque = functools.partial(collections.deque, maxlen=10)
def marblegame(rounds):
    record = collections.defaultdict(mydeque)
    for _ in range(rounds):
        now = time.time()
        who = random.choice(players)
        marbles = random.randint(0, 100)
        record[who].append((marbles, now))
    for whom, marblehistory in record.items():
        print(whom, end=": ")
        pprint.pprint(marblehistory)
if __name__ == '__main__':
    import sys
    if len(sys.argv) > 1:
        count = int(sys.argv[1])
    else:
        count = 30
    marblegame(count)
# -- end of file
--
Martin A. Brown
http://linux-ip.net/
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2016-02-11 20:19 -0500 |
| Message-ID | <mailman.64.1455239954.22075.python-list@python.org> |
| In reply to | #102823 |
On Thu, 11 Feb 2016 10:07:21 -0800 (PST), ltomassmail@gmail.com declaimed
the following:
>
>timeStamp,name,marblesHeld,timeNow,timeSinceLastEaten
>
As mentioned, what is the difference between "timeStamp" and "timeNow"?
non-sequitur: are they eating marbles? Otherwise "marblesHeld" and
"timeSinceLastEaten" seem unrelated.
>
>I thought a dictionary would be a good idea because of the key restrictions ensuring no duplicates, so the data would always update - However because they are unordered and I need to do some more processing on the data afterwards I'm having trouble.
>
Off hand -- my first thought, given the nature of the data -- would be
to just stuff it all into a database, and try to come up with database
queries to produce the needed order/subset information.
>For example lets assume that once I have the most upto date values from dave,steve,jenny I wanted to do timeNow - timeSinceLastEaten to get an interval then write all the info together to some other database. Crucially order is important here.
>
(very pseudo SQL... I'm not even sure the aggregates work that way)
select name, marblesHeld, timeNow - timeSinceLastEaten
group by name having max(timeStamp)
order by timeStamp
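A runnable version of the database idea, sketched with the standard-library `sqlite3` module and an in-memory database; the table name `marble_log` and the sample rows are hypothetical. Instead of `HAVING max(timeStamp)`, it joins each row to its name's `MAX(timeStamp)`, which is portable SQL for "latest row per name".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE marble_log (timeStamp REAL, name TEXT, marblesHeld INTEGER,"
    " timeNow REAL, timeSinceLastEaten REAL)"
)
conn.executemany(
    "INSERT INTO marble_log VALUES (?, ?, ?, ?, ?)",
    [(1, "dave", 12, 100, 40), (2, "steve", 3, 110, 95), (3, "dave", 7, 120, 100)],
)

# Latest row per name: join each row against that name's newest timestamp.
QUERY = """
SELECT l.name, l.marblesHeld, l.timeNow - l.timeSinceLastEaten AS eat_interval
FROM marble_log AS l
JOIN (SELECT name, MAX(timeStamp) AS ts FROM marble_log GROUP BY name) AS newest
  ON l.name = newest.name AND l.timeStamp = newest.ts
ORDER BY l.timeStamp
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('steve', 3, 15.0), ('dave', 7, 20.0)]
```

If two rows for the same name share the maximum timestamp, this query returns both.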
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/