sync databse table based on current directory data without losign previous values

Started by: Νίκος Γκρ33κ <nikos.gr33k@gmail.com>
First post: 2013-03-05 23:45 -0800
Last post: 2013-03-06 08:09 -0800
Articles: 20 on this page of 40 (15 participants)


Contents

  sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-05 23:45 -0800
    Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 09:19 +0100
      Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 00:57 -0800
        Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 10:24 +0100
          Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:41 -0800
            Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:43 -0800
              Re: sync databse table based on current directory data without losign previous values Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:15 -0800
              Re: sync databse table based on current directory data without losign previous values Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:15 -0800
            Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:43 -0800
            Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 11:27 +0100
            Re: sync databse table based on current directory data without losign previous values Dave Angel <davea@davea.name> - 2013-03-06 08:31 -0500
            Re: sync databse table based on current directory data without losign previous values Lele Gaifax <lele@metapensiero.it> - 2013-03-06 15:16 +0100
          Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 01:41 -0800
      Re: sync databse table based on current directory data without losign previous values Νίκος Γκρ33κ <nikos.gr33k@gmail.com> - 2013-03-06 00:57 -0800
    Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-06 10:11 +0000
      Re: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:25 -0800
        RE: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-06 12:31 +0000
        Re: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dave Angel <davea@davea.name> - 2013-03-06 08:18 -0500
        Re: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dave Angel <davea@davea.name> - 2013-03-06 08:25 -0500
      Re: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Bryan Devaney <bryan.devaney@gmail.com> - 2013-03-06 02:25 -0800
      Re: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-03-06 23:34 +0000
        RE: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-07 06:33 +0000
        Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Chris Angelico <rosuav@gmail.com> - 2013-03-07 18:19 +1100
        RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-08 09:08 +0000
        Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-03-08 19:40 -0500
        RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-09 08:07 +0000
          Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Grant Edwards <invalid@invalid.invalid> - 2013-03-09 19:18 +0000
            Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Roy Smith <roy@panix.com> - 2013-03-09 15:04 -0500
              Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Grant Edwards <invalid@invalid.invalid> - 2013-03-09 20:35 +0000
                Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Roy Smith <roy@panix.com> - 2013-03-09 16:44 -0500
                  Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Grant Edwards <invalid@invalid.invalid> - 2013-03-11 14:27 +0000
        Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Dave Angel <davea@davea.name> - 2013-03-09 06:02 -0500
        Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Isaac To <isaac.to@gmail.com> - 2013-03-09 23:02 +0800
    Re: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Terry Reedy <tjreedy@udel.edu> - 2013-03-06 05:59 -0500
    Re: sync databse table based on current directory data without losign previous values Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-03-06 11:52 +0000
    RE: Set x to  to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64) Wong Wah Meng-R32813 <r32813@freescale.com> - 2013-03-06 12:36 +0000
    Re: sync databse table based on current directory data without losign previous values Chris Angelico <rosuav@gmail.com> - 2013-03-07 00:40 +1100
    Re: sync databse table based on current directory data without losign previous values "Michael Ross" <gmx@ross.cx> - 2013-03-06 15:04 +0100
      Re: sync databse table based on current directory data without losign previous values nagia.retsina@gmail.com - 2013-03-06 08:09 -0800
      Re: sync databse table based on current directory data without losign previous values nagia.retsina@gmail.com - 2013-03-06 08:09 -0800


#40609 — sync databse table based on current directory data without losign previous values

From: Νίκος Γκρ33κ <nikos.gr33k@gmail.com>
Date: 2013-03-05 23:45 -0800
Subject: sync databse table based on current directory data without losign previous values
Message-ID: <390f0dc5-5750-4849-9433-a19d90cc8566@googlegroups.com>
I am using this snippet to read the current directory and insert all filenames into a database, and then display them.

But what happens when files get removed from the directory?
The records already inserted into the database remain.
How can I update the database to contain only the existing filenames without losing the previously stored data?

Here is what I have so far:

==================================
path = "/home/nikos/public_html/data/files/"

#read the containing folder and insert new filenames
for result in os.walk(path):
	for filename in result[2]:
		try:
			#find the needed counter for the page URL
			cur.execute('''SELECT URL FROM files WHERE URL = %s''', (filename,) ) 
			data = cur.fetchone()        #URL is unique, so should only be one
			
			if not data:
				#first time for file; primary key is automatic, hit is defaulted
				cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (filename, host, date) )
		except MySQLdb.Error as e:
			print ( "Query Error: %s" % e )
======================

Thank you.



#40612

From: Lele Gaifax <lele@metapensiero.it>
Date: 2013-03-06 09:19 +0100
Message-ID: <mailman.2928.1362557959.2939.python-list@python.org>
In reply to: #40609
Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:

> How can I update the database to contain only the existing filenames without losing the previously stored data?

Basically you need to keep a list (or better, a set) containing all
current filenames that you are going to insert, and finally do another
"inverse" loop where you scan all the records and delete those that are
not present anymore.

Of course, this assumes you have a "bidirectional" identity between the
filenames you are loading and the records you are inserting, which is
not the case in the code you show:

> #read the containing folder and insert new filenames
> for result in os.walk(path):
> 	for filename in result[2]:

Here "filename" is just that, not the full path: this could result in
collisions if you are actually loading a *tree* instead of a flat
directory, that is, multiple source files would be squeezed into a single
record in your database (imagine "/foo/index.html" and
"/foo/subdir/index.html").

With that in mind, I would do something like the following:

  # Compute a set of current fullpaths
  current_fullpaths = set()
  for root, dirs, files in os.walk(path):
    for fullpath in files:
      current_fullpaths.add(os.path.join(root, file))

  # Load'em
  for fullpath in current_fullpaths:
    
    try:
      #find the needed counter for the page URL
      cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) ) 
      data = cur.fetchone()        #URL is unique, so should only be one

      if not data:
        #first time for file; primary key is automatic, hit is defaulted
        cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
    except MySQLdb.Error as e:
      print ( "Query Error: %s" % e )

  # Delete spurious
  cur.execute('''SELECT url FROM files''')  
  for rec in cur:
    fullpath = rec[0]
    if fullpath not in current_fullpaths:
      other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))

Of course here I am assuming a lot (a typical thing we do to answer your
questions :-), in particular that the "url" field content matches the
filesystem layout, which may not be the case. Adapt it to your use case.
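
For readers who want to try the idea end to end, here is a self-contained sketch of the same three steps. It is not the code from this thread: it assumes Python 3, swaps MySQLdb for the standard-library sqlite3 module (so the placeholder is ? rather than %s), and uses a made-up "files" table whose "hits" column stands in for the previously stored data. The function name sync_table and the file names are likewise hypothetical.

```python
import os
import sqlite3
import tempfile


def sync_table(conn, top):
    """Make the files table mirror the tree under top, keeping old rows."""
    # Step 1: compute the set of paths that exist right now.
    current = set()
    for root, dirs, files in os.walk(top):
        for filename in files:
            current.add(os.path.join(root, filename))

    # Step 2: see what the table already knows about.
    known = {row[0] for row in conn.execute("SELECT url FROM files")}

    # Step 3: insert only new paths, delete only vanished ones.
    conn.executemany("INSERT INTO files (url) VALUES (?)",
                     [(p,) for p in current - known])
    conn.executemany("DELETE FROM files WHERE url = ?",
                     [(p,) for p in known - current])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (url TEXT UNIQUE, hits INTEGER DEFAULT 0)")

with tempfile.TemporaryDirectory() as top:
    kept = os.path.join(top, "kept.mp3")
    gone = os.path.join(top, "gone.mp3")
    for p in (kept, gone):
        open(p, "w").close()

    sync_table(conn, top)
    # Simulate data accumulated for a file that stays in the directory.
    conn.execute("UPDATE files SET hits = 5 WHERE url = ?", (kept,))

    os.remove(gone)          # a file disappears from the directory
    sync_table(conn, top)    # resync

    rows = conn.execute("SELECT url, hits FROM files").fetchall()

print(rows == [(kept, 5)])
```

Because existing rows are never rewritten, the counter for the surviving file is untouched, and only the row for the vanished file is deleted.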

hope this helps,
ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.



#40613

From: Νίκος Γκρ33κ <nikos.gr33k@gmail.com>
Date: 2013-03-06 00:57 -0800
Message-ID: <3958fc2b-a2fe-4c85-a104-4c6f551cf787@googlegroups.com>
In reply to: #40612
On Wednesday, 6 March 2013 10:19:06 AM UTC+2, Lele Gaifax wrote:
> Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
>
> > How can I update the database to contain only the existing filenames without losing the previously stored data?
>
> Basically you need to keep a list (or better, a set) containing all
> current filenames that you are going to insert, and finally do another
> "inverse" loop where you scan all the records and delete those that are
> not present anymore.
>
> Of course, this assumes you have a "bidirectional" identity between the
> filenames you are loading and the records you are inserting, which is
> not the case in the code you show:
>
> > #read the containing folder and insert new filenames
> > for result in os.walk(path):
> > 	for filename in result[2]:
>
> Here "filename" is just that, not the full path: this could result in
> collisions if you are actually loading a *tree* instead of a flat
> directory, that is, multiple source files would be squeezed into a single
> record in your database (imagine "/foo/index.html" and
> "/foo/subdir/index.html").
>
> With that in mind, I would do something like the following:
>
>   # Compute a set of current fullpaths
>   current_fullpaths = set()
>   for root, dirs, files in os.walk(path):
>     for fullpath in files:
>       current_fullpaths.add(os.path.join(root, file))
>
>   # Load'em
>   for fullpath in current_fullpaths:
>
>     try:
>       #find the needed counter for the page URL
>       cur.execute('''SELECT URL FROM files WHERE URL = %s''', (fullpath,) )
>       data = cur.fetchone()        #URL is unique, so should only be one
>
>       if not data:
>         #first time for file; primary key is automatic, hit is defaulted
>         cur.execute('''INSERT INTO files (URL, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, date) )
>     except MySQLdb.Error as e:
>       print ( "Query Error: %s" % e )
>
>   # Delete spurious
>   cur.execute('''SELECT url FROM files''')
>   for rec in cur:
>     fullpath = rec[0]
>     if fullpath not in current_fullpaths:
>       other_cur.execute('''DELETE FROM files WHERE url = %s''', (fullpath,))
>
> Of course here I am assuming a lot (a typical thing we do to answer your
> questions :-), in particular that the "url" field content matches the
> filesystem layout, which may not be the case. Adapt it to your use case.
>
> hope this helps,
> ciao, lele.
> --
> nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
> real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
> lele@metapensiero.it  |                 -- Fortunato Depero, 1929.

You are fantastic! Your straightforward logic amazes me!

Thank you very much for making things clear to me!!

But there is a slight problem: when I try to run the code I get an error. You can see its output here:

http://superhost.gr/cgi-bin/files.py



#40616

From: Lele Gaifax <lele@metapensiero.it>
Date: 2013-03-06 10:24 +0100
Message-ID: <mailman.2931.1362561897.2939.python-list@python.org>
In reply to: #40613
Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:

> Thank you very much for making things clear to me!!

You're welcome, even more so if you spend one second trimming the
unneeded quoted text from your replies :-)

>
> But there is a slight problem: when I try to run the code I get an error. You can see its output here:
>
> http://superhost.gr/cgi-bin/files.py

Sorry, this seems completely unrelated, and from the little snippet that
appears on that page I cannot understand what's going on there.

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.



#40617

From: Νίκος Γκρ33κ <nikos.gr33k@gmail.com>
Date: 2013-03-06 01:41 -0800
Message-ID: <28828019-279a-4d8c-a7ea-ba677b4a8165@googlegroups.com>
In reply to: #40616
It's about the following line of code:

current_fullpaths.add( os.path.join(root, files) )


that presents the following error:

<type 'exceptions.AttributeError'>: 'list' object has no attribute 'startswith' 
      args = ("'list' object has no attribute 'startswith'",) 
      message = "'list' object has no attribute 'startswith'"

join is a function in the posixpath module, and it fails on this line:

 /usr/lib64/python2.6/posixpath.py in join(a='/home/nikos/public_html/data/files/', *p=(['\xce\x9a\xcf\x8d\xcf\x81\xce\xb9\xce\xb5 \xce\x99\xce\xb7\xcf\x83\xce\xbf\xcf\x8d \xce\xa7\xcf\x81\xce\xb9\xcf\x83\xcf\x84\xce\xad \xce\x95\xce\xbb\xce\xad\xce\xb7\xcf\x83\xce\xbf\xce\xbd \xce\x9c\xce\xb5.mp3', '\xce\xa0\xce\xb5\xcf\x81\xce\xaf \xcf\x84\xcf\x89\xce\xbd \xce\x9b\xce\xbf\xce\xb3\xce\xb9\xcf\x83\xce\xbc\xcf\x8e\xce\xbd.mp3'],))
   63     path = a
   64     for b in p:
   65         if b.startswith('/'):



#40619

From: Νίκος Γκρ33κ <nikos.gr33k@gmail.com>
Date: 2013-03-06 01:43 -0800
Message-ID: <b5d4c6ef-4ec7-4059-8e88-022a16745a1d@googlegroups.com>
In reply to: #40617
Perhaps this error appears because my filenames are in Greek letters, but I'm not sure.

Maybe we can join root+files and store the result in the set() in some different way.



#40623

From: Bryan Devaney <bryan.devaney@gmail.com>
Date: 2013-03-06 02:15 -0800
Message-ID: <fbe1716e-0b95-4abb-9795-b28388515310@googlegroups.com>
In reply to: #40619
On Wednesday, March 6, 2013 9:43:34 AM UTC, Νίκος Γκρ33κ wrote:
> Perhaps this error appears because my filenames are in Greek letters, but I'm not sure.
>
> Maybe we can join root+files and store the result in the set() in some different way.

Well, the error refers to the line "if b.startswith('/'):" and states "'list' object has no attribute 'startswith'".

So b is bound to a list, and list does not have a 'startswith' method or attribute.

I thought .startswith() was a string method, but if it's your own method then I apologize (though if it is, I personally would have made a class that inherited from list rather than adding it to list itself).

Can you show where you are assigning b (or whether it's meant to be a list or a string object)?
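
The failure can be reproduced without any Greek filenames. A minimal sketch, with made-up file names: os.walk yields the files of each directory as a list, and os.path.join expects strings, so passing the whole list fails whatever characters the names contain. On the Python 2.6 of the traceback the failure is the AttributeError shown; on Python 3 it is a TypeError instead, so the sketch catches both.

```python
import os.path

root = "/home/nikos/public_html/data/files/"
# Hypothetical names; os.walk hands back a list like this for each directory.
files = ["first.mp3", "second.mp3"]

try:
    os.path.join(root, files)        # passing the whole list: fails
    failed = False
except (AttributeError, TypeError) as exc:
    failed = True
    print("join rejected the list: %s" % exc)

# Joining one name at a time works.
joined = [os.path.join(root, name) for name in files]
print(joined)
```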


#40626

From: Lele Gaifax <lele@metapensiero.it>
Date: 2013-03-06 11:27 +0100
Message-ID: <mailman.2936.1362565666.2939.python-list@python.org>
In reply to: #40617
Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:

> It's about the following line of code:
>
> current_fullpaths.add( os.path.join(root, files) )

I'm sorry, typo on my part. 

That should have been "fullpath", not "file" (and not "files" either, as
you wrongly reported back!):

  # Compute a set of current fullpaths
  current_fullpaths = set()
  for root, dirs, files in os.walk(path):
    for fullpath in files:
      current_fullpaths.add(os.path.join(root, fullpath))

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.



#40635

From: Dave Angel <davea@davea.name>
Date: 2013-03-06 08:31 -0500
Message-ID: <mailman.2945.1362576723.2939.python-list@python.org>
In reply to: #40617
On 03/06/2013 05:27 AM, Lele Gaifax wrote:
> Νίκος Γκρ33κ <nikos.gr33k@gmail.com> writes:
>
>> Its about the following line of code:
>>
>> current_fullpaths.add( os.path.join(root, files) )
>
> I'm sorry, typo on my part.
>
> That should have been "fullpath", not "file" (and neither "files" as you
> wrongly reported back!):
>
>    # Compute a set of current fullpaths
>    current_fullpaths = set()
>    for root, dirs, files in os.walk(path):
>      for fullpath in files:

'fullpath' is a rather misleading name to use, since the 'files' list 
contains only the terminal node of the file name.  It's only a full path 
after you do the following join.

>        current_fullpaths.add(os.path.join(root, fullpath))
>
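
To make the naming point concrete, here is a minimal sketch, assuming Python 3 and a throwaway tree built with tempfile (the directory and file names are invented for illustration). Each entry of 'files' is a bare name, so two files called index.html in different directories are indistinguishable until they are joined with 'root'.

```python
import os
import tempfile

# Build a tiny hypothetical tree: <tmp>/index.html and <tmp>/subdir/index.html
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "subdir"))
    for relpath in ("index.html", os.path.join("subdir", "index.html")):
        with open(os.path.join(top, relpath), "w") as handle:
            handle.write("placeholder")

    basenames = set()
    fullpaths = set()
    for root, dirs, files in os.walk(top):
        for filename in files:                # bare name, e.g. "index.html"
            fullpath = os.path.join(root, filename)
            basenames.add(filename)
            fullpaths.add(fullpath)

print(len(basenames))  # the two files collapse into one bare name
print(len(fullpaths))  # the joined paths stay distinct
```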


-- 
DaveA



#40639

From: Lele Gaifax <lele@metapensiero.it>
Date: 2013-03-06 15:16 +0100
Message-ID: <mailman.2948.1362579381.2939.python-list@python.org>
In reply to: #40617
Dave Angel <davea@davea.name> writes:

>>    # Compute a set of current fullpaths
>>    current_fullpaths = set()
>>    for root, dirs, files in os.walk(path):
>>      for fullpath in files:
>
> 'fullpath' is a rather misleading name to use, since the 'files' list
> contains only the terminal node of the file name.  It's only a full
> path after you do the following join.

Yes, you're right. 

Dunno what urged me to ``M-x replace-string file fullpath`` introducing
both an error and a bad variable name, most probably the unconscious
desire of not clobbering builtin names... :-)

Thanks for pointing it out,
ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
lele@metapensiero.it  |                 -- Fortunato Depero, 1929.


#40622 — Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)

From: Wong Wah Meng-R32813 <r32813@freescale.com>
Date: 2013-03-06 10:11 +0000
Subject: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
Message-ID: <mailman.2934.1362564689.2939.python-list@python.org>
In reply to: #40609
Hello there,

I am using python 2.7.1 built on HP-UX 11.23, an Itanium 64-bit box.

I discovered the following behavior: the python process doesn't seem to release the memory it used, even after a variable is set to None and "deleted". I use the glance tool to monitor the memory used by this process. As expected, after the for loop is executed the memory used by this process has climbed to a few MB. However, after "del" is executed on both the i and str variables, the memory of the process stays where it was.

Any idea why?

>>> for i in range(100000L):
...     str=str+"%s"%(i,) 
... 
>>> i=None
>>> str=None
>>> del i
>>> del str



#40625 — Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)

From: Bryan Devaney <bryan.devaney@gmail.com>
Date: 2013-03-06 02:25 -0800
Subject: Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
Message-ID: <18272cff-cfb0-4bd3-8f5d-3b4641ba828e@googlegroups.com>
In reply to: #40622
On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
> Hello there,
>
> I am using python 2.7.1 built on HP-UX 11.23, an Itanium 64-bit box.
>
> I discovered the following behavior: the python process doesn't seem to release the memory it used, even after a variable is set to None and "deleted". I use the glance tool to monitor the memory used by this process. As expected, after the for loop is executed the memory used by this process has climbed to a few MB. However, after "del" is executed on both the i and str variables, the memory of the process stays where it was.
>
> Any idea why?
>
> >>> for i in range(100000L):
> ...     str=str+"%s"%(i,)
> ...
> >>> i=None
> >>> str=None
> >>> del i
> >>> del str

Hi, I'm new here so I'm making mistakes too, but I know they don't like it when you ask your question in someone else's thread.

That being said, to answer your question:

Python uses a 'garbage collector'. When you delete something, all references to the object in memory are removed, but the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects without references are removed and their memory is freed. If you wait a while you should see that memory free itself.



#40630 — RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)

From: Wong Wah Meng-R32813 <r32813@freescale.com>
Date: 2013-03-06 12:31 +0000
Subject: RE: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
Message-ID: <mailman.2940.1362574941.2939.python-list@python.org>
In reply to: #40625
Apologies: having been away from the group for a while, I have forgotten how not to post a question on top of another question. Very sorry, and I appreciate your replies.

I tried explicitly calling gc.collect() and did not see the memory footprint reduced. I probably haven't left the process idle long enough to see the internal garbage collection take place, but I will leave it idle for more than 8 hours and check again. Thanks!



#40633 — Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)

From: Dave Angel <davea@davea.name>
Date: 2013-03-06 08:18 -0500
Subject: Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
Message-ID: <mailman.2943.1362575899.2939.python-list@python.org>
In reply to: #40625
On 03/06/2013 05:25 AM, Bryan Devaney wrote:
> On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
>> Hello there,
>>
>> I am using python 2.7.1 built on HP-UX 11.23, an Itanium 64-bit box.
>>
>> I discovered the following behavior: the python process doesn't seem to release the memory it used, even after a variable is set to None and "deleted". I use the glance tool to monitor the memory used by this process. As expected, after the for loop is executed the memory used by this process has climbed to a few MB. However, after "del" is executed on both the i and str variables, the memory of the process stays where it was.
>>
>>   <SNIP>
>>
>
> Python uses a 'garbage collector'. When you delete something, all references to the object in memory are removed, but the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects without references are removed and their memory is freed. If you wait a while you should see that memory free itself.
>

Actually, no.  The problem with monitoring memory usage from outside the 
process is that memory "ownership" is hierarchical, and each hierarchy 
deals in bigger chunks.  So when the CPython runtime calls free() on a 
particular piece of memory, the C runtime may or may not actually 
release the memory for use by other processes.  Since the C runtime 
grabs big pieces from the OS, and parcels out little pieces to CPython, 
a particular big piece can only be freed if ALL the little pieces are 
free.  And even then, it may or may not choose to do so.
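
To make the "big pieces, little pieces" point concrete, here is a toy model in Python. It is only a sketch under stated assumptions: ToyArena, ToyAllocator and demo are hypothetical names invented for illustration, and the fixed slot count is arbitrary. It is not how the C runtime or CPython's own allocator is actually implemented; it only mimics the rule that a big chunk can go back to the OS when every little piece in it is free.

```python
class ToyArena(object):
    """Hypothetical model of one big chunk obtained from the 'OS'."""

    def __init__(self, slots):
        self.used = [False] * slots

    def is_empty(self):
        return not any(self.used)


class ToyAllocator(object):
    """Hypothetical allocator that parcels out slots from big chunks."""

    def __init__(self, slots_per_arena=4):
        self.slots_per_arena = slots_per_arena
        self.arenas = []

    def allocate(self):
        # Reuse the first free slot in an existing chunk if there is one.
        for arena in self.arenas:
            for slot, used in enumerate(arena.used):
                if not used:
                    arena.used[slot] = True
                    return (arena, slot)
        # Otherwise grab a new big chunk from the 'OS'.
        arena = ToyArena(self.slots_per_arena)
        self.arenas.append(arena)
        arena.used[0] = True
        return (arena, 0)

    def free(self, handle):
        arena, slot = handle
        arena.used[slot] = False
        # The chunk is handed back only when ALL of its slots are free.
        if arena.is_empty():
            self.arenas.remove(arena)

    def os_visible_slots(self):
        # What an external monitoring tool would see in this model.
        return len(self.arenas) * self.slots_per_arena


def demo():
    alloc = ToyAllocator(slots_per_arena=4)
    handles = [alloc.allocate() for _ in range(8)]
    peak = alloc.os_visible_slots()
    # Free six of the eight pieces, leaving one live piece per chunk.
    for index, handle in enumerate(handles):
        if index not in (0, 4):
            alloc.free(handle)
    fragmented = alloc.os_visible_slots()
    alloc.free(handles[0])
    alloc.free(handles[4])
    final = alloc.os_visible_slots()
    return peak, fragmented, final
```

In this model, demo() reports the same outside-visible footprint before and after six of the eight pieces are freed, and the footprint drops only once the last piece in each chunk is released. A real C runtime may keep the chunk even then.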

Completely separate from that are the two mechanisms that CPython uses 
to free its pieces.  It does reference counting, and it does garbage 
collecting. In this case, only the reference counting is relevant, as 
when it's done there's no garbage left to collect.  When an object is no 
longer referenced by anything, its count will be zero, and it will be 
freed by calling the C library function.  GC is only interesting when 
there are cycles in the references, such as when a list contains as one 
of its elements a tuple, which in turn contains the original list. 
Sound silly?  No, it's quite common once complex objects are created 
which reference each other.  The counts don't go to zero, and the 
objects wait for garbage collection.
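
A minimal sketch of the difference, assuming CPython (other implementations may free objects at different times). TrackedList is a hypothetical helper subclass, needed only because plain list objects cannot be weakly referenced; the weak reference is used as a probe to see whether the object still exists.

```python
import gc
import weakref


class TrackedList(list):
    """Hypothetical list subclass; unlike plain lists it supports weakrefs."""


def refcount_frees_immediately():
    data = TrackedList([1, 2, 3])
    probe = weakref.ref(data)
    del data  # the count drops to zero, so CPython frees the object here
    return probe() is None


def cycle_needs_gc():
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        outer = TrackedList()
        inner = (outer,)      # the tuple refers to the list
        outer.append(inner)   # and the list refers back to the tuple
        probe = weakref.ref(outer)
        del outer, inner      # names are gone, but the counts are not zero
        alive_before = probe() is not None
        gc.collect()          # the cycle detector finds and frees the pair
        alive_after = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_before, alive_after
```

Under CPython, the first function should report that the object is gone as soon as the name is deleted, while the second should show the list surviving `del` and disappearing only after gc.collect(). The OP's loop builds plain strings with no cycles, so only the first case applies there.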

OP:  There's no need to set to None and also to del the name.  Since 
there's only one None object, keeping another named reference to that 
object has very little cost.



-- 
DaveA



#40634 — Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)

From: Dave Angel <davea@davea.name>
Date: 2013-03-06 08:25 -0500
Subject: Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
Message-ID: <mailman.2944.1362576355.2939.python-list@python.org>
In reply to: #40625

On 03/06/2013 07:31 AM, Wong Wah Meng-R32813 wrote:
> Apologies as after I have left the group for a while I have forgotten how not to post a question on top of another question. Very sorry and appreciate your replies.
>
> I tried explicitly calling gc.collect() and didn't manage to see the memory footprint reduced. I probably haven't left the process idle long enough to see the internal garbage collection takes place but I will leave it idle for more than 8 hours and check again. Thanks!
>

You're top-posting, which makes things very confusing, since your 
contribution to the message is out of accepted order.  Put your remarks 
after the part you're commenting on, and delete anything following your 
message, as it clearly didn't need your comments.

Once you've called gc.collect(), there's no point in waiting 8 hours for 
it to run again.  It either triggered the C runtime's logic or it 
didn't, and running it again won't help unless in the meantime you 
rearranged the remaining allocated blocks.

Accept the fact that not all freeing of memory blocks can possibly make 
it through to the OS.  If they did, we'd have a minimum object size of 
at least 4k on the Pentium, and larger on some other processors.  We'd 
also have performance that would crawl. So an external tool can only 
give you a very approximate size for what's going on in your own code.

-- 
DaveA



#40628 — Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)

From: Bryan Devaney <bryan.devaney@gmail.com>
Date: 2013-03-06 02:25 -0800
Subject: Re: Set x to to None and del x doesn't release memory in python 2.7.1 (HPUX 11.23, ia64)
Message-ID: <mailman.2938.1362568281.2939.python-list@python.org>
In reply to: #40622

On Wednesday, March 6, 2013 10:11:12 AM UTC, Wong Wah Meng-R32813 wrote:
> Hello there,
>
> I am using python 2.7.1 built on HP-11.23 a Itanium 64 bit box.
>
> I discovered following behavior whereby the python process doesn't seem to release memory utilized even after a variable is set to None, and "deleted". I use glance tool to monitor the memory utilized by this process. Obviously after the for loop is executed, the memory used by this process has hiked to a few MB. However, after "del" is executed to both I and str variables, the memory of that process still stays at where it was.
>
> Any idea why?
>
> >>> for i in range(100000L):
> ...     str=str+"%s"%(i,)
> ...
> >>> i=None
> >>> str=None
> >>> del i
> >>> del str

Hi, I'm new here so I'm making mistakes too, but I know they don't like it when you ask your question in someone else's thread.

That being said, to answer your question:

Python uses a 'garbage collector'. When you delete something, all references are removed from the object in memory, but the memory itself will not be freed until the next time the garbage collector runs. When that happens, all objects in memory without references are removed and the memory is freed. If you wait a while, you should see that memory free itself.
