| Started by | Guy Tamir <guytamir1@gmail.com> |
|---|---|
| First post | 2013-08-14 06:18 -0700 |
| Last post | 2013-08-15 00:16 -0700 |
| Articles | 5 — 3 participants |
Reading log and saving data to DB Guy Tamir <guytamir1@gmail.com> - 2013-08-14 06:18 -0700
Re: Reading log and saving data to DB "marduk@python.net" <marduk@python.net> - 2013-08-14 09:46 -0400
Re: Reading log and saving data to DB Guy Tamir <guytamir1@gmail.com> - 2013-08-15 00:23 -0700
Re: Reading log and saving data to DB Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-08-14 18:34 -0400
Re: Reading log and saving data to DB Guy Tamir <guytamir1@gmail.com> - 2013-08-15 00:16 -0700
| From | Guy Tamir <guytamir1@gmail.com> |
|---|---|
| Date | 2013-08-14 06:18 -0700 |
| Subject | Reading log and saving data to DB |
| Message-ID | <4de52359-d352-4e35-afed-73df672c945c@googlegroups.com> |
Hi all,

I have an Ubuntu server running NGINX that logs data for me. I want to write a Python script that reads my customized logs and, after a little rearrangement, saves the new data into my DB (PostgreSQL).

The process should run about every 5 minutes, and I'm expecting large chunks of data in several of the 5-minute windows.

My plan for achieving this is to install Python on the server, write a script, and add it to cron.

My question is: what is the simplest way to do this? Should I use any Python frameworks? For my Python app I'm using Django, but on this server I just need to read a file, do some manipulation, and save to the DB.

If any part of my plan seems troubling in any way, I'd love to hear.

Regards,
Guy
| From | "marduk@python.net" <marduk@python.net> |
|---|---|
| Date | 2013-08-14 09:46 -0400 |
| Message-ID | <mailman.576.1376487972.1251.python-list@python.org> |
| In reply to | #52514 |
On Wed, Aug 14, 2013, at 09:18 AM, Guy Tamir wrote:
> Hi all,
>
> I have a Ubuntu server running NGINX that logs data for me.
> I want to write a python script that reads my customized logs and after
> a little rearrangement save the new data into my DB (postgresql).
>
> The process should run about every 5 minutes and i'm expecting large
> chunks of data on several 5 minute windows..
>
> My plan for achieving this is to install python on the server, write a
> script and add it to cron.
>
> My question is what the simplest way to do this?
> should i use any python frameworks?

Rarely do I put "framework" and "simplest way" in the same set.

I would do 1 of 2 things:

* Write a simple script that reads lines from stdin and writes to the db. Make sure it gets started in init before nginx does, and pipe tail -F -n 0 of the log file into that script. Don't worry about the 5-minute cron.

* Similar to the above, but if you want to use cron, also store in the db the offset of the last byte read in the file. When the cron job kicks off again, seek to that position + 1 and begin reading; at EOF, write the offset again.

This is irrespective of any log rotating that is going on behind the scenes, of course.
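The second option above (offset tracking from cron) could be sketched roughly as follows. This is only an illustration under stated assumptions: to stay self-contained it keeps the offset in a small state file rather than in the database as the post suggests, and `read_new_lines`, `log_path`, and `state_path` are hypothetical names. Because the saved value is whatever `tell()` reports after the last complete line, the next run seeks to exactly that value rather than to "position + 1". The reset when the file has shrunk is an added guess at handling rotation, not something the post prescribes.

```python
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call."""
    # Load the offset saved by the previous run (0 on the first run).
    try:
        with open(state_path) as state:
            offset = int(state.read().strip() or 0)
    except FileNotFoundError:
        offset = 0

    # Assumption: a file smaller than the saved offset was rotated or
    # truncated, so reading starts again from the beginning.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        while True:
            line = log.readline()
            # Stop at EOF, or at a last line that is still being written.
            if not line.endswith(b"\n"):
                break
            lines.append(line.decode("utf-8", errors="replace").rstrip("\n"))
            # Advance the offset only past complete lines.
            offset = log.tell()

    # Persist the offset for the next cron run.
    with open(state_path, "w") as state:
        state.write(str(offset))
    return lines
```

Run from cron every 5 minutes, each call would return only the lines added since the previous run; a half-written final line is left for the next run.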
| From | Guy Tamir <guytamir1@gmail.com> |
|---|---|
| Date | 2013-08-15 00:23 -0700 |
| Message-ID | <b5c22671-7fce-4f7f-9053-3eb430430fb2@googlegroups.com> |
| In reply to | #52516 |
On Wednesday, August 14, 2013 4:46:09 PM UTC+3, mar...@python.net wrote:
> On Wed, Aug 14, 2013, at 09:18 AM, Guy Tamir wrote:
> > Hi all,
> >
> > I have a Ubuntu server running NGINX that logs data for me.
> > I want to write a python script that reads my customized logs and after
> > a little rearrangement save the new data into my DB (postgresql).
> >
> > The process should run about every 5 minutes and i'm expecting large
> > chunks of data on several 5 minute windows..
> >
> > My plan for achieving this is to install python on the server, write a
> > script and add it to cron.
> >
> > My question is what the simplest way to do this?
> > should i use any python frameworks?
>
> Rarely do I put "framework" and "simplest way" in the same set.
>
> I would do 1 of 2 things:
>
> * Write a simple script that reads lines from stdin, and writes to the
> db. Make sure it gets run in init before nginx does and tail -F -n 0 to
> that script. Don't worry about the 5-minute cron.
>
> * Similar to above but if you want to use cron also store in the db the
> offset of the last byte read in the file, then when the cron job kicks
> off again seek to that position + 1 and begin reading, at EOF write the
> offset again.
>
> This is irrespective of any log rotating that is going on behind the
> scenes, of course.

I'm not sure I understood the first option, or what it means to run the script before nginx. The second option sounds more like what I had in mind. Aren't there any existing components like this that I can use?

Since the log fills up quickly, I'm having trouble reading so much data and writing it all to the DB in a reasonable amount of time. The table receiving the new data is somewhat complex. Its purpose is to save data about ads shown from my app. The fields are:

(ad_id, user_source_site, user_location, day_date, specific_hour, views, clicks)

Each row is distinct by the first 5 fields, since I need to show different types of stats. Because each new log line may or may not already have a row in the DB, I have to run an upsert command (update or insert) for each one. This leads to very poor performance.

Do you have any ideas about how I can make this script more efficient?
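One possible way to cut down the number of upserts, not proposed anywhere in the thread, is to sum views and clicks in memory per distinct key before touching the database, so each key is written once per 5-minute batch instead of once per log line. The sketch below assumes the log lines have already been parsed into tuples; the function name `aggregate_events`, the tuple layout, and the "view"/"click" markers are all hypothetical, since the actual log format is not given.

```python
from collections import defaultdict


def aggregate_events(events):
    """Collapse parsed log events into one (views, clicks) total per key.

    Each event is assumed to be a tuple of the form
    (ad_id, user_source_site, user_location, day_date, specific_hour, kind)
    where kind is "view" or "click" (a hypothetical parsed form).
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [views, clicks]
    for *key_fields, kind in events:
        key = tuple(key_fields)
        if kind == "view":
            totals[key][0] += 1
        elif kind == "click":
            totals[key][1] += 1
    # One output row per distinct key, in the table's column order.
    return [key + (views, clicks) for key, (views, clicks) in totals.items()]
```

Each returned row then needs only a single upsert that adds its totals to any existing row. On PostgreSQL 9.5 and later, which postdate this thread, that step can likely be expressed with INSERT ... ON CONFLICT DO UPDATE.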
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2013-08-14 18:34 -0400 |
| Message-ID | <mailman.588.1376519688.1251.python-list@python.org> |
| In reply to | #52514 |
On Wed, 14 Aug 2013 06:18:08 -0700 (PDT), Guy Tamir <guytamir1@gmail.com>
declaimed the following:
>Hi all,
>
>I have a Ubuntu server running NGINX that logs data for me.
Is the log coming from NGINX or (since you mention Django below) coming
solely from the Django application?
If the logging is from the Django application only, you should be able
to have it connect to the database and write directly to it.
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
| From | Guy Tamir <guytamir1@gmail.com> |
|---|---|
| Date | 2013-08-15 00:16 -0700 |
| Message-ID | <e15e4e19-e25b-4ed2-af8d-a063015e35bd@googlegroups.com> |
| In reply to | #52533 |
On Thursday, August 15, 2013 1:34:38 AM UTC+3, Dennis Lee Bieber wrote:
> On Wed, 14 Aug 2013 06:18:08 -0700 (PDT), Guy Tamir <guytamir1@gmail.com>
> declaimed the following:
>
> >Hi all,
> >
> >I have a Ubuntu server running NGINX that logs data for me.
>
> Is the log coming from NGINX or (since you mention Django below) coming
> solely from the Django application.
>
> If the logging is from the Django application only, you should be able
> to have it connect to the database and write directly to it.
> --
> Wulfraed Dennis Lee Bieber AF6VN
> wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/

The log is from NGINX.