Groups > comp.lang.python > #40609 > unrolled thread
| Started by | Νίκος Γκρ33κ <nikos.gr33k@gmail.com> |
|---|---|
| First post | 2013-03-05 23:45 -0800 |
| Last post | 2013-03-06 08:09 -0800 |
| Articles | 20 on this page of 40 — 15 participants |
sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-05 23:45 -0800
Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 09:19 +0100
Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 00:57 -0800
Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 10:24 +0100
Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:41 -0800
Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:43 -0800
Re: sync databse table based on current directory data without losign previous values Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:15 -0800
Re: sync databse table based on current directory data without losign previous values Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:15 -0800
Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:43 -0800
Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 11:27 +0100
Re: sync databse table based on current directory data without losign previous values Dave Angel <davea@davea.name> - 2013-03-06 08:31 -0500
Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 15:16 +0100
Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:41 -0800
Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 00:57 -0800
Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-06 10:11 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:25 -0800
RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-06 12:31 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dave Angel <davea@davea.name> - 2013-03-06 08:18 -0500
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dave Angel <davea@davea.name> - 2013-03-06 08:25 -0500
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:25 -0800
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-06 23:34 +0000
RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-07 06:33 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Chris Angelico <rosuav@gmail.com> - 2013-03-07 18:19 +1100
RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-08 09:08 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-03-08 19:40 -0500
RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-09 08:07 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Grant Edwards <invalid@invalid.invalid> - 2013-03-09 19:18 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Roy Smith <roy@panix.com> - 2013-03-09 15:04 -0500
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Grant Edwards <invalid@invalid.invalid> - 2013-03-09 20:35 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Roy Smith <roy@panix.com> - 2013-03-09 16:44 -0500
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Grant Edwards <invalid@invalid.invalid> - 2013-03-11 14:27 +0000
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dave Angel <davea@davea.name> - 2013-03-09 06:02 -0500
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Isaac To <isaac.to@gmail.com> - 2013-03-09 23:02 +0800
Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Terry Reedy <tjreedy@udel.edu> - 2013-03-06 05:59 -0500
Re: sync databse table based on current directory data without losign previous values Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-06 11:52 +0000
RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-06 12:36 +0000
Re: sync databse table based on current directory data without losign previous values Chris Angelico <rosuav@gmail.com> - 2013-03-07 00:40 +1100
Re: sync databse table based on current directory data without losign previous values "Michael Ross" <gmx@ross.cx> - 2013-03-06 15:04 +0100
Re: sync databse table based on current directory data without losign previous values nagia.retsina@gmail.com - 2013-03-06 08:09 -0800
Re: sync databse table based on current directory data without losign previous values nagia.retsina@gmail.com - 2013-03-06 08:09 -0800
| From | Νίκος Γκρ33κ <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-03-05 23:45 -0800 |
| Subject | sync databse table based on current directory data without losign previous values |
| Message-ID | <390f0dc5-5750-4849-9433-a19d90cc8566@googlegroups.com> |
I am using this snippet to read the current directory, insert all the filenames into a database, and then display them.
But what happens when files get removed from the directory?
The records already inserted into the database remain.
How can I update the database so it contains only the existing filenames, without losing the previously stored data?
Here is what I have so far:
==================================
path = "/home/nikos/public_html/data/files/"
#read the containing folder and insert new filenames
for result in os.walk(path):
    for filename in result[2]:
        try:
            #find the needed counter for the page URL
            cur.execute('''SELECT URL FROM files WHERE URL = %s''', (filename,) )
            data = cur.fetchone() #URL is unique, so should only be one
            if not data:
                #first time for file; primary key is automatic, hit is defaulted
                cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, date) )
        except MySQLdb.Error as e:
            #report the database error itself
            print( "Query Error: %s" % e )
======================
Thank you.
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2013-03-06 09:19 +0100 |
| Message-ID | <mailman.2928.1362557959.2939.python-list@python.org> |
| In reply to | #40609 |
Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
> How can I update the database so it contains only the existing filenames, without losing the previously stored data?
Basically you need to keep a list (or better, a set) containing all
current filenames that you are going to insert, and finally do another
"inverse" loop where you scan all the records and delete those that are
not present anymore.
Of course, this assumes you have a "bidirectional" identity between the
filenames you are loading and the records you are inserting, which is
not the case in the code you show:
> #read the containing folder and insert new filenames
> for result in os.walk(path):
>     for filename in result[2]:
Here "filename" is just that, not the full path: this could result in
collisions if you are actually loading a *tree* instead of a flat
directory, that is, multiple source files are squeezed into a single
record in your database (imagine "/foo/index.html" and
"/foo/subdir/index.html").
With that in mind, I would do something like the following:
# Compute a set of current fullpaths
current_fullpaths = set()
for root, dirs, files in os.walk(path):
    for fullpath in files:
        current_fullpaths.add(os.path.join(root, file))
# Load'em
for fullpath in current_fullpaths:
    try:
        #find the needed counter for the page URL
        cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
        data = cur.fetchone() #URL is unique, so should only be one
        if not data:
            #first time for file; primary key is automatic, hit is defaulted
            cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
    except MySQLdb.Error as e:
        #report the database error itself
        print( "Query Error: %s" % e )
# Delete spurious
cur.execute('''SELECT url FROM files''')
for rec in cur:
    fullpath = rec[0]
    if fullpath not in current_fullpaths:
        other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
Of course here I am assuming a lot (a typical thing we do to answer your
questions :-), in particular that the "url" field content matches the
filesystem layout, which may not be the case. Adapt it to your use case.
hope this helps,
ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it | -- Fortunato Depero, 1929.
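The idea in the post above (collect the current full paths in a set, insert what is missing, delete what has vanished) can also be written with two set differences. What follows is only a minimal sketch under stated assumptions: it swaps the thread's MySQLdb connection for the standard-library sqlite3 module so it runs on its own, and the function name `sync_files_table` and the single `url` column are hypothetical stand-ins for the poster's real schema.

```python
import os
import sqlite3


def sync_files_table(conn, top):
    """Make the `files` table match the files currently under `top`.

    Rows for files that still exist are never touched, so whatever
    else is stored in them (counters, timestamps) survives the sync.
    `conn` is assumed to be a sqlite3 connection (a stand-in for MySQLdb).
    """
    # Full paths of every file currently on disk.
    on_disk = set()
    for root, dirs, names in os.walk(top):
        for name in names:
            on_disk.add(os.path.join(root, name))

    # Paths the table already knows about.
    in_db = set(row[0] for row in conn.execute("SELECT url FROM files"))

    added = on_disk - in_db      # on disk but not yet recorded
    removed = in_db - on_disk    # recorded but no longer on disk

    conn.executemany("INSERT INTO files (url) VALUES (?)",
                     [(p,) for p in sorted(added)])
    conn.executemany("DELETE FROM files WHERE url = ?",
                     [(p,) for p in sorted(removed)])
    conn.commit()
    return added, removed
```

Because only the two differences are written back, rows for files that are still present keep all their other columns, which is the "without losing previous values" part of the question.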
| From | Νίκος Γκρ33κ <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-03-06 00:57 -0800 |
| Message-ID | <3958fc2b-a2fe-4c85-a104-4c6f551cf787@googlegroups.com> |
| In reply to | #40612 |
On Wednesday, 6 March 2013 10:19:06 AM UTC+2, Lele Gaifax wrote:
> Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
>
> > How can I update the database so it contains only the existing filenames, without losing the previously stored data?
>
> Basically you need to keep a list (or better, a set) containing all
> current filenames that you are going to insert, and finally do another
> "inverse" loop where you scan all the records and delete those that are
> not present anymore.
>
> Of course, this assumes you have a "bidirectional" identity between the
> filenames you are loading and the records you are inserting, which is
> not the case in the code you show:
>
> > #read the containing folder and insert new filenames
> > for result in os.walk(path):
> >     for filename in result[2]:
>
> Here "filename" is just that, not the full path: this could result in
> collisions if you are actually loading a *tree* instead of a flat
> directory, that is, multiple source files are squeezed into a single
> record in your database (imagine "/foo/index.html" and
> "/foo/subdir/index.html").
>
> With that in mind, I would do something like the following:
>
> # Compute a set of current fullpaths
> current_fullpaths = set()
> for root, dirs, files in os.walk(path):
>     for fullpath in files:
>         current_fullpaths.add(os.path.join(root, file))
>
> # Load'em
> for fullpath in current_fullpaths:
>     try:
>         #find the needed counter for the page URL
>         cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
>         data = cur.fetchone() #URL is unique, so should only be one
>         if not data:
>             #first time for file; primary key is automatic, hit is defaulted
>             cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
>     except MySQLdb.Error as e:
>         #report the database error itself
>         print( "Query Error: %s" % e )
>
> # Delete spurious
> cur.execute('''SELECT url FROM files''')
> for rec in cur:
>     fullpath = rec[0]
>     if fullpath not in current_fullpaths:
>         other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
>
> Of course here I am assuming a lot (a typical thing we do to answer your
> questions :-), in particular that the "url" field content matches the
> filesystem layout, which may not be the case. Adapt it to your use case.
>
> hope this helps,
> ciao, lele.
You are fantastic! Your straightforward logic amazes me!
Thank you very much for making things clear to me!!
But there is a slight problem: when I try to run the code I get an error. You can see its output here:
http://superhost.gr/cgi-bin/files.py
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2013-03-06 10:24 +0100 |
| Message-ID | <mailman.2931.1362561897.2939.python-list@python.org> |
| In reply to | #40613 |
Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
> Thank you very much for making things clear to me!!
You're welcome, even more if you spend 1 second to trim your answers,
removing unneeded citation :-)
> But there is a slight problem: when I try to run the code I get an error. You can see its output here:
>
> http://superhost.gr/cgi-bin/files.py
Sorry, this seems completely unrelated, and from the little snippet that
appears on that page I cannot understand what's going on there.
ciao, lele.
| From | Νίκος Γκρ33κ <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-03-06 01:41 -0800 |
| Message-ID | <28828019-279a-4d8c-a7ea-ba677b4a8165@googlegroups.com> |
| In reply to | #40616 |
It's about the following line of code:
current_fullpaths.add( os.path.join(root, files) )
that presents the following error:
<type 'exceptions.AttributeError'>: 'list' object has no attribute 'startswith'
args = ("'list' object has no attribute 'startswith'",)
message = "'list' object has no attribute 'startswith'"
join calls into a module that runs into trouble at this line:
/usr/lib64/python2.6/posixpath.py in join(a='/home/nikos/public_html/data/files/', *p=(['\xce\x9a\xcf\x8d\xcf\x81\xce\xb9\xce\xb5 \xce\x99\xce\xb7\xcf\x83\xce\xbf\xcf\x8d \xce\xa7\xcf\x81\xce\xb9\xcf\x83\xcf\x84\xce\xad \xce\x95\xce\xbb\xce\xad\xce\xb7\xcf\x83\xce\xbf\xce\xbd \xce\x9c\xce\xb5.mp3', '\xce\xa0\xce\xb5\xcf\x81\xce\xaf \xcf\x84\xcf\x89\xce\xbd \xce\x9b\xce\xbf\xce\xb3\xce\xb9\xcf\x83\xce\xbc\xcf\x8e\xce\xbd.mp3'],))
63 path = a
64 for b in p:
65 if b.startswith('/'):
| From | Νίκος Γκρ33κ <nikos.gr33k@gmail.com> |
|---|---|
| Date | 2013-03-06 01:43 -0800 |
| Message-ID | <b5d4c6ef-4ec7-4059-8e88-022a16745a1d@googlegroups.com> |
| In reply to | #40617 |
Perhaps this error is presented because my filenames are in Greek letters, but I am not sure..... Maybe we can join root+files and store it in the set() some different way....
| From | Bryan Devaney <bryan.devaney@gmail.com> |
|---|---|
| Date | 2013-03-06 02:15 -0800 |
| Message-ID | <fbe1716e-0b95-4abb-9795-b28388515310@googlegroups.com> |
| In reply to | #40619 |
On Wednesday, March 6, 2013 9:43:34 AM UTC, Νίκος Γκρ33κ wrote:
> Perhaps this error is presented because my filenames are in Greek letters, but I am not sure.....
>
> Maybe we can join root+files and store it in the set() some different way....
Well, the error refers to the line "if b.startswith('/'): " and states "'list' object has no attribute 'startswith'",
so b is bound to a list, and list does not have a 'startswith' method or attribute.
I thought .startswith() was a string method, but if it's your own method then I apologize (though if it is, I personally would have made a class that inherited from list rather than adding it to list itself).
Can you show where you are assigning b (or whether it's meant to be a list or a string object)?
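The traceback can be reproduced without any Greek filenames, which suggests the encoding is not the cause: `os.path.join` expects each argument to be a single path component, and here it was handed the whole `files` list. A small sketch follows; the helper `try_join` and the two sample names are made up for illustration. On the Python 2.6 of the traceback the exception is the AttributeError shown above, while current Python 3 raises TypeError instead, so the sketch catches both.

```python
import os

root = "/home/nikos/public_html/data/files/"
files = ["a.mp3", "b.mp3"]   # os.walk() hands back the names as a list


def try_join(*parts):
    """Return the joined path, or the name of the exception raised."""
    try:
        return os.path.join(*parts)
    except (AttributeError, TypeError) as exc:
        return type(exc).__name__


# Passing the list itself fails, whatever characters the names contain.
list_result = try_join(root, files)

# Joining one name at a time is what os.path.join expects.
good = [os.path.join(root, name) for name in files]
```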
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2013-03-06 11:27 +0100 |
| Message-ID | <mailman.2936.1362565666.2939.python-list@python.org> |
| In reply to | #40617 |
Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
> Its about the following line of code:
>
> current_fullpaths.add( os.path.join(root, files) )
I'm sorry, typo on my part.
That should have been "fullpath", not "file" (and neither "files" as you
wrongly reported back!):
# Compute a set of current fullpaths
current_fullpaths = set()
for root, dirs, files in os.walk(path):
    for fullpath in files:
        current_fullpaths.add(os.path.join(root, fullpath))
ciao, lele.
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-03-06 08:31 -0500 |
| Message-ID | <mailman.2945.1362576723.2939.python-list@python.org> |
| In reply to | #40617 |
On 03/06/2013 05:27 AM, Lele Gaifax wrote:
> Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
>
>> Its about the following line of code:
>>
>> current_fullpaths.add( os.path.join(root, files) )
>
> I'm sorry, typo on my part.
>
> That should have been "fullpath", not "file" (and neither "files" as you
> wrongly reported back!):
>
> # Compute a set of current fullpaths
> current_fullpaths = set()
> for root, dirs, files in os.walk(path):
>     for fullpath in files:
'fullpath' is a rather misleading name to use, since the 'files' list contains only the terminal node of the file name. It's only a full path after you do the following join.
>         current_fullpaths.add(os.path.join(root, fullpath))
--
DaveA
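The naming point ties back to the collision concern raised earlier in the thread: what `os.walk` yields in `files` are bare names, and two different files can share one. A short sketch with more literal variable names is below; the function `collect` is hypothetical and exists only to make the difference visible.

```python
import os


def collect(top):
    """Return (basenames, full_paths) for every file under `top`."""
    basenames, full_paths = [], []
    for root, dirs, files in os.walk(top):
        for name in files:                  # only the last path component
            basenames.append(name)
            # Joined with its directory, the name becomes a full path.
            full_paths.append(os.path.join(root, name))
    return basenames, full_paths
```

For a tree holding both `index.html` and `subdir/index.html`, the two basenames are identical while the two full paths stay distinct, so only the full path is safe to use as a unique database key.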
| From | Lele Gaifax <lele@metapensiero.it> |
|---|---|
| Date | 2013-03-06 15:16 +0100 |
| Message-ID | <mailman.2948.1362579381.2939.python-list@python.org> |
| In reply to | #40617 |
Dave Angel <davea@davea.name> writes:
>> # Compute a set of current fullpaths
>> current_fullpaths = set()
>> for root, dirs, files in os.walk(path):
>>     for fullpath in files:
>
> 'fullpath' is a rather misleading name to use, since the 'files' list
> contains only the terminal node of the file name. It's only a full
> path after you do the following join.
Yes, you're right. Dunno what urged me to ``M-x replace-string file fullpath`` introducing both an error and a bad variable name, most probably the unconscious desire of not clobbering builtin names... :-)
Thanks for pointing it out,
ciao, lele.
| From | Wong Wah Meng-R32813 <r32813@freescale.com> |
|---|---|
| Date | 2013-03-06 10:11 +0000 |
| Subject | Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) |
| Message-ID | <mailman.2934.1362564689.2939.python-list@python.org> |
| In reply to | #40609 |
Hello there,
I am using Python 2.7.1 built on HP-UX 11.23, an Itanium 64-bit box.
I discovered the following behavior: the python process doesn't seem to release the memory it used even after a variable is set to None and "deleted". I use the glance tool to monitor the memory used by this process. Obviously, after the for loop is executed, the memory used by this process has climbed to a few MB. However, after "del" is executed on both the i and str variables, the memory of that process still stays where it was.
Any idea why?
>>> for i in range(100000L):
...     str=str+"%s"%(i,)
...
>>> i=None
>>> str=None
>>> del i
>>> del str
| From | Bryan Devaney <bryan.devaney@gmail.com> |
|---|---|
| Date | 2013-03-06 02:25 -0800 |
| Subject | Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) |
| Message-ID | <18272cff-cfb0-4bd3-8f5d-3b4641ba828e@googlegroups.com> |
| In reply to | #40622 |
On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
> Hello there,
>
> I am using python 2.7.1 built on HP-11.23 a Itanium 64 bit box.
>
> I discovered following behavior whereby the python process doesn't seem to release memory utilized even after a variable is set to None, and "deleted". I use glance tool to monitor the memory utilized by this process. Obviously after the for loop is executed, the memory used by this process has hiked to a few MB. However, after "del" is executed to both I and str variables, the memory of that process still stays at where it was.
>
> Any idea why?
>
> >>> for i in range(100000L):
> ...     str=str+"%s"%(i,)
> ...
> >>> i=None
> >>> str=None
> >>> del i
> >>> del str
Hi, I'm new here so I'm making mistakes too, but I know they don't like it when you ask your question in someone else's question.
That being said, to answer your question: Python uses a 'garbage collector'. When you delete something, all references are removed from the object in memory, but the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects without references in memory are removed and the memory freed.
If you wait a while you should see that memory free itself.
| From | Wong Wah Meng-R32813 <r32813@freescale.com> |
|---|---|
| Date | 2013-03-06 12:31 +0000 |
| Subject | RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) |
| Message-ID | <mailman.2940.1362574941.2939.python-list@python.org> |
| In reply to | #40625 |
Apologies: after being away from the group for a while, I have forgotten how not to post a question on top of another question. Very sorry, and I appreciate your replies.
I tried explicitly calling gc.collect() and didn't manage to see the memory footprint reduced. I probably haven't left the process idle long enough to see the internal garbage collection take place, but I will leave it idle for more than 8 hours and check again. Thanks!
-----Original Message-----
From: Python-list [mailto:python-list-bounces+wahmeng=freescale.com@python.org] On Behalf Of Bryan Devaney
Sent: Wednesday, March 06, 2013 6:25 PM
To: python-list@python.org
Cc: python-list@python.org
Subject: Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
> Hello there,
>
> I am using python 2.7.1 built on HP-11.23 a Itanium 64 bit box.
>
> I discovered following behavior whereby the python process doesn't seem to release memory utilized even after a variable is set to None, and "deleted". I use glance tool to monitor the memory utilized by this process. Obviously after the for loop is executed, the memory used by this process has hiked to a few MB. However, after "del" is executed to both I and str variables, the memory of that process still stays at where it was.
>
> Any idea why?
>
> >>> for i in range(100000L):
> ...     str=str+"%s"%(i,)
> ...
> >>> i=None
> >>> str=None
> >>> del i
> >>> del str
Hi, I'm new here so I'm making mistakes too but I know they don't like it when you ask your question in someone else's question.
that being said, to answer your question: Python uses a 'garbage collector'. When you delete something, all references are removed from the object in memory, the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects without references in memory are removed and the memory freed.
If you wait a while you should see that memory free itself.
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-03-06 08:18 -0500 |
| Subject | Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) |
| Message-ID | <mailman.2943.1362575899.2939.python-list@python.org> |
| In reply to | #40625 |
On 03/06/2013 05:25 AM, Bryan Devaney wrote:
> On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
>> Hello there,
>>
>> I am using python 2.7.1 built on HP-11.23 a Itanium 64 bit box.
>>
>> I discovered following behavior whereby the python process doesn't seem to release memory utilized even after a variable is set to None, and "deleted". I use glance tool to monitor the memory utilized by this process. Obviously after the for loop is executed, the memory used by this process has hiked to a few MB. However, after "del" is executed to both I and str variables, the memory of that process still stays at where it was.
>>
>> <SNIP>
>
> Python uses a 'garbage collector'. When you delete something, all references are removed from the object in memory, the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects without references in memory are removed and the memory freed. If you wait a while you should see that memory free itself.

Actually, no. The problem with monitoring memory usage from outside the process is that memory "ownership" is hierarchical, and each hierarchy deals in bigger chunks. So when the CPython runtime calls free() on a particular piece of memory, the C runtime may or may not actually release the memory for use by other processes. Since the C runtime grabs big pieces from the OS, and parcels out little pieces to CPython, a particular big piece can only be freed if ALL the little pieces are free. And even then, it may or may not choose to do so.

Completely separate from that are the two mechanisms that CPython uses to free its pieces. It does reference counting, and it does garbage collecting. In this case, only the reference counting is relevant, as when it's done there's no garbage left to collect. When an object is no longer referenced by anything, its count will be zero, and it will be freed by calling the C library function.

GC is only interesting when there are cycles in the references, such as when a list contains as one of its elements a tuple, which in turn contains the original list. Sound silly? No, it's quite common once complex objects are created which reference each other. The counts don't go to zero, and the objects wait for garbage collection.

OP: There's no need to set to None and also to del the name. Since there's only one None object, keeping another named reference to that object has very little cost.

--
DaveA
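
The difference between the two mechanisms described above can be observed from inside the interpreter. The following is only a minimal sketch, assuming CPython 3; the `Node` class and both function names are hypothetical helpers invented for illustration (a plain class is used because lists and tuples do not support weak references). A weak reference serves as a probe that does not keep its target alive.

```python
import gc
import weakref


class Node(object):
    """Hypothetical helper class; instances support weak references."""
    pass


def freed_by_refcount():
    """Return True if an object dies as soon as its only name is deleted."""
    obj = Node()
    probe = weakref.ref(obj)   # weak reference: does not keep obj alive
    del obj                    # count drops to zero, CPython frees it at once
    return probe() is None


def freed_only_by_gc():
    """Return (alive_after_del, alive_after_collect) for a reference cycle."""
    gc.disable()               # keep an automatic collection from interfering
    try:
        a = Node()
        b = Node()
        a.other = b
        b.other = a            # a -> b -> a: the counts cannot reach zero
        probe = weakref.ref(a)
        del a, b               # names are gone, but the cycle keeps both alive
        alive_after_del = probe() is not None
        gc.collect()           # the cycle collector finds and frees the pair
        alive_after_collect = probe() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_collect
```

In this sketch the first function shows plain reference counting at work with no collector involved, while the second shows the one case where `gc.collect()` matters. Neither says anything about whether the C runtime hands memory back to the operating system.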
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-03-06 08:25 -0500 |
| Subject | Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) |
| Message-ID | <mailman.2944.1362576355.2939.python-list@python.org> |
| In reply to | #40625 |
On 03/06/2013 07:31 AM, Wong Wah Meng-R32813 wrote:
> Apologies as after I have left the group for a while I have forgotten how not to post a question on top of another question. Very sorry and appreciate your replies.
>
> I tried explicitly calling gc.collect() and didn't manage to see the memory footprint reduced. I probably haven't left the process idle long enough to see the internal garbage collection takes place but I will leave it idle for more than 8 hours and check again. Thanks!

You're top-posting, which makes things very confusing, since your contribution to the message is out of accepted order. Put your remarks after the part you're commenting on, and delete anything following your message, as it clearly didn't need your comments.

Once you've called gc.collect(), there's no point in waiting 8 hours for it to run again. It either triggered the C runtime's logic or it didn't, and running it again won't help unless in the meantime you rearranged the remaining allocated blocks.

Accept the fact that not all freeing of memory blocks can possibly make it through to the OS. If they did, we'd have a minimum object size of at least 4k on the Pentium, and larger on some other processors. We'd also have performance that would crawl.

So an external tool can only give you a very approximate size for what's going on in your own code.

--
DaveA
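
One way to look at allocations from inside the process, rather than through an external tool, is the standard library's `tracemalloc` module. This is only a sketch under a loud assumption: `tracemalloc` exists in Python 3.4 and later, so it is not available on the Python 2.7.1 build discussed in this thread. The function name `traced_sizes` and its parameter `n` are hypothetical.

```python
import tracemalloc


def traced_sizes(n=100000):
    """Return (bytes traced while the data is alive, bytes traced after del)."""
    tracemalloc.start()
    try:
        data = [str(i) for i in range(n)]   # allocate many small objects
        held, _peak = tracemalloc.get_traced_memory()
        del data                            # reference counting frees them here
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return held, after
```

The traced figure should drop once `del data` runs, even if the process size reported by the operating system stays where it was. That gap is the effect described above: the blocks are free as far as Python is concerned, but they have not necessarily been returned to the OS.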
| From | Bryan Devaney <bryan.devaney@gmail.com> |
|---|---|
| Date | 2013-03-06 02:25 -0800 |
| Subject | Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) |
| Message-ID | <mailman.2938.1362568281.2939.python-list@python.org> |
| In reply to | #40622 |
On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
> Hello there,
>
> I am using python 2.7.1 built on HP-11.23 a Itanium 64 bit box.
>
> I discovered following behavior whereby the python process doesn't seem to release memory utilized even after a variable is set to None, and "deleted". I use glance tool to monitor the memory utilized by this process. Obviously after the for loop is executed, the memory used by this process has hiked to a few MB. However, after "del" is executed to both I and str variables, the memory of that process still stays at where it was.
>
> Any idea why?
>
> >>> for i in range(100000L):
> ...     str=str+"%s"%(i,)
> ...
> >>> i=None
> >>> str=None
> >>> del i
> >>> del str

Hi, I'm new here so I'm making mistakes too but I know they don't like it when you ask your question in someone else's question. That being said, to answer your question:

Python uses a 'garbage collector'. When you delete something, all references are removed from the object in memory, the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects without references in memory are removed and the memory freed. If you wait a while you should see that memory free itself.