Groups > comp.lang.python > #102833 > unrolled thread
| Started by | Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> |
|---|---|
| First post | 2016-02-12 00:31 +0000 |
| Last post | 2016-02-12 11:46 +0530 |
| Articles | 20 — 11 participants |
Storing a big amount of path names Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-12 00:31 +0000
Re: Storing a big amount of path names Chris Angelico <rosuav@gmail.com> - 2016-02-12 11:39 +1100
Re: Storing a big amount of path names Ben Finney <ben+python@benfinney.id.au> - 2016-02-12 11:44 +1100
Re: Storing a big amount of path names Tim Chase <python.list@tim.thechases.com> - 2016-02-11 19:13 -0600
Re: Storing a big amount of path names Rob Gaddi <rgaddi@highlandtechnology.invalid> - 2016-02-12 02:17 +0000
Re: Storing a big amount of path names MRAB <python@mrabarnett.plus.com> - 2016-02-12 03:13 +0000
Re: Storing a big amount of path names Chris Angelico <rosuav@gmail.com> - 2016-02-12 14:49 +1100
Re: Storing a big amount of path names Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-12 04:15 +0000
Re: Storing a big amount of path names Chris Angelico <rosuav@gmail.com> - 2016-02-12 15:23 +1100
Re: Storing a big amount of path names Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-12 04:45 +0000
Re: Storing a big amount of path names Chris Angelico <rosuav@gmail.com> - 2016-02-12 16:02 +1100
Re: Storing a big amount of path names Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> - 2016-02-12 05:49 +0000
Re: Storing a big amount of path names Steven D'Aprano <steve@pearwood.info> - 2016-02-12 16:51 +1100
Re: Storing a big amount of path names Rob Gaddi <rgaddi@highlandtechnology.invalid> - 2016-02-12 17:05 +0000
Re: Storing a big amount of path names Chris Angelico <rosuav@gmail.com> - 2016-02-13 04:18 +1100
Re: Storing a big amount of path names Mark Lawrence <breamoreboy@yahoo.co.uk> - 2016-02-12 21:37 +0000
Re: Storing a big amount of path names Ben Finney <ben+python@benfinney.id.au> - 2016-02-13 08:49 +1100
Re: Storing a big amount of path names Matt Wheeler <funkyhat@gmail.com> - 2016-02-12 23:31 +0000
Re: Storing a big amount of path names mkondrashin@gmail.com - 2016-02-13 12:19 -0800
Re: Storing a big amount of path names srinivas devaki <mr.eightnoteight@gmail.com> - 2016-02-12 11:46 +0530
| From | Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> |
|---|---|
| Date | 2016-02-12 00:31 +0000 |
| Subject | Storing a big amount of path names |
| Message-ID | <n9j94f$712$1@gioia.aioe.org> |
Hi!
What is the best (lowest memory usage) way to store lots of pathnames
in memory where:
1. Path names are pathname=(dirname,filename)
2. There are many different dirnames, but far fewer than pathnames
3. dirnames are generally long
The idea is to share the common dirnames.
More realistically, not only the pathnames are stored but objects, each
object being a MyFile containing
self.name - <base name>
getPathname(self) - <full pathname>
other stuff
import os
class MyFile:
    __allfiles=[]
    def __init__(self,dirname,filename):
        self.dirname=dirname # But I want to share this with other files
        self.name=filename
        MyFile.__allfiles.append(self)
        ...
    def getPathname(self):
        return os.path.join(self.dirname,self.name)
    ...
Thanks for any suggestion.
Paulo
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-02-12 11:39 +1100 |
| Message-ID | <mailman.60.1455237579.22075.python-list@python.org> |
| In reply to | #102833 |
On Fri, Feb 12, 2016 at 11:31 AM, Paulo da Silva
<p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
> What is the best (shortest memory usage) way to store lots of pathnames
> in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically not only the pathnames are stored but objects each
> object being a MyFile containing
> self.name - <base name>
> getPathname(self) - <full pathname>
> other stuff
Just store them in the most obvious way, and don't worry about memory
usage. How many path names are you likely to have? A million? You can
still afford to have 1KB pathnames and it'll take up no more than a
gigabyte of RAM - and most computers throw around gigs of virtual
memory like it's nothing.
ChrisA
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2016-02-12 11:44 +1100 |
| Message-ID | <mailman.61.1455237865.22075.python-list@python.org> |
| In reply to | #102833 |
Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> writes:
> What is the best (shortest memory usage) way to store lots of
> pathnames in memory
I challenge the premise. Why is “shortest memory usage” your criterion
for “best”, here? How have you determined that factors like “easily
understandable when reading”, or “using standard Python idioms”, are
less important?
As for “lots of pathnames”, how many are you expecting? Python's
built-in container types are highly optimised for quite large amounts
of data. Have you measured an implementation with normal built-in
container types with your expected quantity of items, and confirmed
that the performance is unacceptable?
> Thanks for any suggestion.
I would suggest that the assumption you have too much data for Python's
built-in container types, is an assumption that should be rigorously
tested because it is likely not true.
--
“We suffer primarily not from our vices or our weaknesses, but
from our illusions.” --Daniel J. Boorstin, historian, 1914–2004
Ben Finney
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2016-02-11 19:13 -0600 |
| Message-ID | <mailman.63.1455239783.22075.python-list@python.org> |
| In reply to | #102833 |
On 2016-02-12 00:31, Paulo da Silva wrote:
> What is the best (shortest memory usage) way to store lots of
> pathnames in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
Well, you can create a dict that has dirname->list(filenames) which
will reduce the dirname to a single instance. You could store that
dict in the class, shared by all of the instances, though that starts
to pick up a code-smell.
But unless you're talking about an obscenely large number of
dirnames & filenames, or a severely resource-limited machine, just
use the default built-ins. If you start to push the boundaries of
system resources, then I'd try the "anydbm" module or use the
"shelve" module to marshal them out to disk. Finally, you *could*
create an actual sqlite database on disk if size really does exceed
reasonable system specs.
-tkc
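The dirname->list(filenames) idea from this post could look roughly like the sketch below. It assumes Python 3; the helper name `group_by_dirname` and the sample paths are invented for illustration and do not come from the thread.

```python
import os
from collections import defaultdict

def group_by_dirname(pathnames):
    """Map each dirname to the list of base names found under it.

    Every distinct dirname is stored once, as a dict key.
    """
    groups = defaultdict(list)
    for pathname in pathnames:
        dirname, filename = os.path.split(pathname)
        groups[dirname].append(filename)
    return dict(groups)

# Hypothetical sample data
paths = ["/srv/data/a.txt", "/srv/data/b.txt", "/srv/logs/x.log"]
groups = group_by_dirname(paths)
```

The trade-off is that a filename no longer knows its own directory; you reach it through the dict.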
| From | Rob Gaddi <rgaddi@highlandtechnology.invalid> |
|---|---|
| Date | 2016-02-12 02:17 +0000 |
| Message-ID | <n9jfcc$oqr$1@dont-email.me> |
| In reply to | #102837 |
Tim Chase wrote:
> On 2016-02-12 00:31, Paulo da Silva wrote:
>> What is the best (shortest memory usage) way to store lots of
>> pathnames in memory where:
>>
>> 1. Path names are pathname=(dirname,filename)
>> 2. There many different dirnames but much less than pathnames
>> 3. dirnames have in general many chars
>>
>> The idea is to share the common dirnames.
>
> Well, you can create a dict that has dirname->list(filenames) which
> will reduce the dirname to a single instance. You could store that
> dict in the class, shared by all of the instances, though that starts
> to pick up a code-smell.
>
> But unless you're talking about an obscenely large number of
> dirnames & filenames, or a severely resource-limited machine, just
> use the default built-ins. If you start to push the boundaries of
> system resources, then I'd try the "anydbm" module or use the
> "shelve" module to marshal them out to disk. Finally, you *could*
> create an actual sqlite database on disk if size really does exceed
> reasonable system specs.
>
> -tkc
>
Probably more memory efficient to make a list of lists, and just declare
that element[0] of each list is the dirname. That way you're not
wasting memory on the unused entries of the hashtable. But unless the
OP has both a) upwards of a million entries and b) let's say at least 20
filenames to each dirname, it's not worth doing.
Now, if you do really have a million entries, one thing that would help
with memory is setting __slots__ for MyFile rather than letting it
create an instance dictionary for each one.
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
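As a rough illustration of the __slots__ suggestion, here is a minimal sketch based on the MyFile class from the original post. The two slot names are taken from that post; the "other stuff" it mentions is left out, and the sample values are made up.

```python
import os

class MyFile:
    # __slots__ gives each instance fixed attribute slots
    # instead of a per-instance __dict__
    __slots__ = ("dirname", "name")

    def __init__(self, dirname, filename):
        self.dirname = dirname
        self.name = filename

    def getPathname(self):
        return os.path.join(self.dirname, self.name)

f = MyFile("/srv/data", "a.txt")          # hypothetical sample values
has_instance_dict = hasattr(f, "__dict__")
```

With __slots__ in place, assigning an attribute that is not listed raises AttributeError, so every attribute the class needs has to be declared up front.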
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2016-02-12 03:13 +0000 |
| Message-ID | <mailman.65.1455246809.22075.python-list@python.org> |
| In reply to | #102833 |
On 2016-02-12 00:31, Paulo da Silva wrote:
> Hi!
>
> What is the best (shortest memory usage) way to store lots of pathnames
> in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically not only the pathnames are stored but objects each
> object being a MyFile containing
> self.name - <base name>
> getPathname(self) - <full pathname>
> other stuff
>
> class MyFile:
>
>     __allfiles=[]
>
>     def __init__(self,dirname,filename):
>         self.dirname=dirname # But I want to share this with other files
>         self.name=filename
>         MyFile.__allfiles.append(self)
>         ...
>
>     def getPathname(self):
>         return os.path.join(self.dirname,self.name)
>
>     ...
>
Apart from all of the other answers that have been given:
>>> p1 = 'foo/bar'
>>> p2 = 'foo/bar'
>>> id(p1), id(p2)
(982008930176, 982008930120)
>>> d = {}
>>> id(d.setdefault(p1, p1))
982008930176
>>> id(d.setdefault(p2, p2))
982008930176
The dict maps equal strings (dirnames) to the same string, so you won't
have multiple copies.
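The setdefault trick above can be wrapped in a small helper. This is only a sketch: `make_pool` and `share` are hypothetical names, and the strings are built at run time so that they start out as separate objects.

```python
def make_pool():
    """Return a function that maps equal strings to one shared object."""
    pool = {}

    def share(s):
        # setdefault stores s the first time it is seen and
        # returns the stored object on every later call
        return pool.setdefault(s, s)

    return share

share = make_pool()
d1 = "/".join(["foo", "bar"])
d2 = "/".join(["foo", "bar"])
s1 = share(d1)
s2 = share(d2)
```

Once all dirnames have been stored, the pool itself can be dropped; the shared strings stay alive through the objects that reference them.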
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-02-12 14:49 +1100 |
| Message-ID | <mailman.66.1455248959.22075.python-list@python.org> |
| In reply to | #102833 |
On Fri, Feb 12, 2016 at 2:13 PM, MRAB <python@mrabarnett.plus.com> wrote:
> Apart from all of the other answers that have been given:
>
>>>> p1 = 'foo/bar'
>>>> p2 = 'foo/bar'
>>>> id(p1), id(p2)
> (982008930176, 982008930120)
>>>> d = {}
>>>> id(d.setdefault(p1, p1))
> 982008930176
>>>> id(d.setdefault(p2, p2))
> 982008930176
>
> The dict maps equal strings (dirnames) to the same string, so you won't have
> multiple copies.
Simpler to let the language do that for you:
>>> import sys
>>> p1 = sys.intern('foo/bar')
>>> p2 = sys.intern('foo/bar')
>>> id(p1), id(p2)
(139621017266528, 139621017266528)
ChrisA
| From | Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> |
|---|---|
| Date | 2016-02-12 04:15 +0000 |
| Message-ID | <n9jm9n$m77$1@gioia.aioe.org> |
| In reply to | #102841 |
At 03:49 on 12-02-2016, Chris Angelico wrote:
> On Fri, Feb 12, 2016 at 2:13 PM, MRAB <python@mrabarnett.plus.com> wrote:
>> Apart from all of the other answers that have been given:
>>
...
>
> Simpler to let the language do that for you:
>
>>>> import sys
>>>> p1 = sys.intern('foo/bar')
>>>> p2 = sys.intern('foo/bar')
>>>> id(p1), id(p2)
> (139621017266528, 139621017266528)
>
I didn't know about id or sys.intern :-)
I need to look at them ...
As far as I understand, in the MyFile class I can do
self.dirname=sys.intern(dirname) # dirname passed as arg to the __init__
and the character string doesn't get repeated.
Is this correct?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-02-12 15:23 +1100 |
| Message-ID | <mailman.67.1455251003.22075.python-list@python.org> |
| In reply to | #102842 |
On Fri, Feb 12, 2016 at 3:15 PM, Paulo da Silva
<p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
> At 03:49 on 12-02-2016, Chris Angelico wrote:
>> On Fri, Feb 12, 2016 at 2:13 PM, MRAB <python@mrabarnett.plus.com> wrote:
>>> Apart from all of the other answers that have been given:
>>>
> ...
>>
>> Simpler to let the language do that for you:
>>
>>>>> import sys
>>>>> p1 = sys.intern('foo/bar')
>>>>> p2 = sys.intern('foo/bar')
>>>>> id(p1), id(p2)
>> (139621017266528, 139621017266528)
>>
>
> I didn't know about id or sys.intern :-)
> I need to look at them ...
>
> As I can understand I can do in MyFile class
>
> self.dirname=sys.intern(dirname) # dirname passed as arg to the __init__
>
> and the character string doesn't get repeated.
> Is this correct?
Correct. Two equal strings, passed to sys.intern(), will come back as
identical strings, which means they use the same memory. You can have
a million references to the same string and it takes up no additional
memory.
But I reiterate: Don't even bother with this unless you know your
program is running short of memory. Start by coding things in the
simple and obvious way, and then fix problems only when you see them.
ChrisA
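A minimal sketch of the sys.intern() approach confirmed above, applied to the MyFile class from the original post. It assumes Python 3; the sample paths are invented, and the dirnames are built at run time so that they begin as distinct string objects.

```python
import os
import sys

class MyFile:
    def __init__(self, dirname, filename):
        # Equal dirname strings collapse to one shared object
        self.dirname = sys.intern(dirname)
        self.name = filename

    def getPathname(self):
        return os.path.join(self.dirname, self.name)

a = MyFile("/".join(["", "srv", "data"]), "a.txt")
b = MyFile("/".join(["", "srv", "data"]), "b.txt")
shared = a.dirname is b.dirname
```

Both instances end up referencing the same string object, so the text of each dirname is held in memory once no matter how many files live in that directory.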
| From | Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> |
|---|---|
| Date | 2016-02-12 04:45 +0000 |
| Message-ID | <n9jo24$o42$1@gioia.aioe.org> |
| In reply to | #102843 |
At 04:23 on 12-02-2016, Chris Angelico wrote:
> On Fri, Feb 12, 2016 at 3:15 PM, Paulo da Silva
> <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
>> At 03:49 on 12-02-2016, Chris Angelico wrote:
>>> On Fri, Feb 12, 2016 at 2:13 PM, MRAB <python@mrabarnett.plus.com> wrote:
>>>> Apart from all of the other answers that have been given:
>>>>
>> ...
>>>
>>> Simpler to let the language do that for you:
>>>
>>>>>> import sys
>>>>>> p1 = sys.intern('foo/bar')
>>>>>> p2 = sys.intern('foo/bar')
>>>>>> id(p1), id(p2)
>>> (139621017266528, 139621017266528)
>>>
>>
>> I didn't know about id or sys.intern :-)
>> I need to look at them ...
>>
>> As I can understand I can do in MyFile class
>>
>> self.dirname=sys.intern(dirname) # dirname passed as arg to the __init__
>>
>> and the character string doesn't get repeated.
>> Is this correct?
>
> Correct. Two equal strings, passed to sys.intern(), will come back as
> identical strings, which means they use the same memory. You can have
> a million references to the same string and it takes up no additional
> memory.
I have been playing with this and found that it is not always true!
For example:
In [1]: def f(s):
   ...:     print(id(sys.intern(s)))
   ...:
In [2]: import sys
In [3]: f("12345")
139805480756480
In [4]: f("12345")
139805480755640
In [5]: f("12345")
139805480756480
In [6]: f("12345")
139805480756480
In [7]: f("12345")
139805480750864
I think a dict, as MRAB suggested, is needed.
At the end of the store process I may delete the dict.
>
> But I reiterate: Don't even bother with this unless you know your
> program is running short of memory.
Yes, it is.
This is part of a previous post (sets of equal files) and I need lots of
memory for performance reasons. I only have 2G in this computer.
I already had implemented a solution. I used two dicts. One to map
dirnames to an int handler and the other to map the handler to dir
names. At the end I deleted the 1st. one because I only need to get the
dirname from the handler. But I thought there should be a better choice.
Thanks
Paulo
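The two-map handle scheme described in this post might be sketched as follows. The class and method names are hypothetical, and the reverse map is a list indexed by handle rather than the second dict the post mentions, which is a substitution made here for brevity.

```python
class DirTable:
    """Assign a small integer handle to each distinct dirname."""

    def __init__(self):
        self._handle_of = {}   # dirname -> handle, needed only while loading
        self._dirnames = []    # handle -> dirname (the list index is the handle)

    def handle(self, dirname):
        h = self._handle_of.get(dirname)
        if h is None:
            h = len(self._dirnames)
            self._handle_of[dirname] = h
            self._dirnames.append(dirname)
        return h

    def dirname(self, handle):
        return self._dirnames[handle]

    def finish_loading(self):
        # Drop the forward map once no new dirnames will be added
        self._handle_of = None

table = DirTable()
h1 = table.handle("/srv/data")
h2 = table.handle("/srv/logs")
h3 = table.handle("/srv/data")
table.finish_loading()
```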
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-02-12 16:02 +1100 |
| Message-ID | <mailman.68.1455253371.22075.python-list@python.org> |
| In reply to | #102844 |
On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva
<p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
>> Correct. Two equal strings, passed to sys.intern(), will come back as
>> identical strings, which means they use the same memory. You can have
>> a million references to the same string and it takes up no additional
>> memory.
> I have being playing with this and found that it is not always true!
> For example:
>
> In [1]: def f(s):
> ...: print(id(sys.intern(s)))
> ...:
>
> In [2]: import sys
>
> In [3]: f("12345")
> 139805480756480
>
> In [4]: f("12345")
> 139805480755640
>
> In [5]: f("12345")
> 139805480756480
>
> In [6]: f("12345")
> 139805480756480
>
> In [7]: f("12345")
> 139805480750864
>
> I think a dict, as MRAB suggested, is needed.
> At the end of the store process I may delete the dict.
I'm not 100% sure of what's going on here, but my suspicion is that a
string that isn't being used is allowed to be flushed from the
dictionary. If you retain a reference to the string (not to its id,
but to the string itself), you shouldn't see that change. By doing the
dict yourself, you guarantee that ALL the strings will be retained,
which can never be _less_ memory than interning them all, and can
easily be _more_.
>> But I reiterate: Don't even bother with this unless you know your
>> program is running short of memory.
>
> Yes, it is.
> This is part of a previous post (sets of equal files) and I need lots of
> memory for performance reasons. I only have 2G in this computer.
How many files, roughly? Do you ever look at the contents of the
files? Most likely, you'll be dwarfing the files' names with their
contents. Unless you actually have over two million unique files, each
one with over a thousand characters in the name, you can't use all
that 2GB with file names.
If virtual memory is active, all that'll happen is that you dip into
the swapper / page file a bit... and THAT is when you start looking at
reducing memory usage. Don't bother optimizing until you need to, and
even then, you measure first to see what part of the program actually
needs to be optimized.
> I already had implemented a solution. I used two dicts. One to map
> dirnames to an int handler and the other to map the handler to dir
> names. At the end I deleted the 1st. one because I only need to get the
> dirname from the handler. But I thought there should be a better choice.
If all your dir names are interned, their identities (approximately
the values returned by id(), but not quite) will be those handlers for
you, without any overhead and without any complexity.
ChrisA
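For the "measure first" advice in this post, one option in Python 3.4 and later is the standard tracemalloc module. The sketch below is an assumption about how such a measurement could be set up; `measure_peak` and `build_plain` are hypothetical names and the workload is invented.

```python
import tracemalloc

def measure_peak(build):
    """Run build() and return (result, peak traced bytes while it ran)."""
    tracemalloc.start()
    try:
        result = build()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

def build_plain(n=10000):
    # Hypothetical workload: n pathnames spread over 10 long dirnames,
    # with a fresh dirname string created for every entry
    return [("/very/long/directory/name/%d" % (i % 10), "file%d" % i)
            for i in range(n)]

paths, peak = measure_peak(build_plain)
print("peak bytes:", peak)
```

Running the same measurement on an interned or otherwise shared variant of the builder shows whether the extra complexity buys anything on your data.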
| From | Paulo da Silva <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> |
|---|---|
| Date | 2016-02-12 05:49 +0000 |
| Message-ID | <n9jrq0$sce$1@gioia.aioe.org> |
| In reply to | #102845 |
At 05:02 on 12-02-2016, Chris Angelico wrote:
> On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva
> <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
...
>> I think a dict, as MRAB suggested, is needed.
>> At the end of the store process I may delete the dict.
>
> I'm not 100% sure of what's going on here, but my suspicion is that a
> string that isn't being used is allowed to be flushed from the
> dictionary.
You are right. I have tried with a small class and it seems to work.
Thanks.
...
>
> How many files, roughly? Do you ever look at the contents of the
> files? Most likely, you'll be dwarfing the files' names with their
> contents. Unless you actually have over two million unique files, each
> one with over a thousand characters in the name, you can't use all
> that 2GB with file names.
It's not only the filenames. The more memory I have, the more
expensive but faster an algorithm I can implement.
Thank you very much for your nice suggestion, which also contributed to
my Python knowledge.
Thank you to all who responded.
Paulo
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-02-12 16:51 +1100 |
| Message-ID | <56bd72db$0$1615$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #102845 |
On Fri, 12 Feb 2016 04:02 pm, Chris Angelico wrote:
> On Fri, Feb 12, 2016 at 3:45 PM, Paulo da Silva
> <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
>>> Correct. Two equal strings, passed to sys.intern(), will come back as
>>> identical strings, which means they use the same memory. You can have
>>> a million references to the same string and it takes up no additional
>>> memory.
>> I have being playing with this and found that it is not always true!
It is true, but only for the lifetime of the string. Once the string is
garbage collected, it is removed from the cache as well. If you then add
the string again, you may not get the same id.
py> mystr = "hello world"
py> str2 = sys.intern(mystr)
py> str3 = "hello world"
py> mystr is str2 # same string object, as str2 is interned
True
py> mystr is str3 # not the same string object
False
But if we delete all references to the string objects, the intern cache is
also flushed, and we may not get the same id:
py> del str2, str3
py> id(mystr) # remember this ID number
3079482600
py> del mystr
py> id(sys.intern("hello world")) # a new entry in the cache
3079227624
This is the behaviour you want: if a string is completely deleted, you don't
want it remaining in the intern cache taking up memory.
> I'm not 100% sure of what's going on here, but my suspicion is that a
> string that isn't being used is allowed to be flushed from the
> dictionary. If you retain a reference to the string (not to its id,
> but to the string itself), you shouldn't see that change. By doing the
> dict yourself, you guarantee that ALL the strings will be retained,
> which can never be _less_ memory than interning them all, and can
> easily be _more_.
Yep. Back in the early days, interned strings were immortal and lasted
forever. That wasted memory, and is no longer the case.
--
Steven
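The lifetime point made in this post can be checked with a short script. This is a sketch assuming CPython 3; `make_dirname` is a hypothetical helper that builds the string at run time, so each call returns a fresh, non-interned object.

```python
import sys

def make_dirname():
    # Built at run time, so this is a new string object on every call
    return "/".join(["", "home", "user", "photos"])

kept = sys.intern(make_dirname())    # a reference is held from here on
again = sys.intern(make_dirname())   # equal string, interned while kept is alive
same_object = again is kept
```

In the earlier experiment nothing held on to the result of sys.intern(), so each call could produce a new object. Once a reference is kept, as here or in an instance attribute, later calls with an equal string return that same object.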
| From | Rob Gaddi <rgaddi@highlandtechnology.invalid> |
|---|---|
| Date | 2016-02-12 17:05 +0000 |
| Message-ID | <n9l3ck$v6o$1@dont-email.me> |
| In reply to | #102843 |
Chris Angelico wrote:
> Start by coding things in the
> simple and obvious way, and then fix problems only when you see them.
Is that statement available in 10 foot letters etched into stone?
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2016-02-13 04:18 +1100 |
| Message-ID | <mailman.82.1455297504.22075.python-list@python.org> |
| In reply to | #102866 |
On Sat, Feb 13, 2016 at 4:05 AM, Rob Gaddi
<rgaddi@highlandtechnology.invalid> wrote:
> Chris Angelico wrote:
>
>> Start by coding things in the
>> simple and obvious way, and then fix problems only when you see them.
>
> Is that statement available in 10 foot letters etched into stone?
I actually had that built behind my house, at one point. Sadly, the
letters sank until they were partly embedded into the ground, and
what's left says, in the local language, "Go stick your head in a
PHP", so it's lit up only for special celebrations.
ChrisA
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2016-02-12 21:37 +0000 |
| Message-ID | <mailman.86.1455313059.22075.python-list@python.org> |
| In reply to | #102866 |
On 12/02/2016 17:05, Rob Gaddi wrote:
> Chris Angelico wrote:
>
>> Start by coding things in the
>> simple and obvious way, and then fix problems only when you see them.
>
> Is that statement available in 10 foot letters etched into stone?
>
Hopefully not as that would be a waste, it should be made more obvious
by using a red hot poker to engrave it onto every newbie's forehead.
Even then some simply wouldn't take a blind bit of notice.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
| From | Ben Finney <ben+python@benfinney.id.au> |
|---|---|
| Date | 2016-02-13 08:49 +1100 |
| Message-ID | <mailman.88.1455313781.22075.python-list@python.org> |
| In reply to | #102866 |
Chris Angelico <rosuav@gmail.com> writes:
> I actually had that built behind my house, at one point. Sadly, the
> letters sank until they were partly embedded into the ground, and
> what's left says, in the local language, "Go stick your head in a
> PHP", so it's lit up only for special celebrations.
Douglas Adams, you are sorely missed.
--
“The greatest tragedy in mankind's entire history may be the
hijacking of morality by religion.” --Arthur C. Clarke, 1991
Ben Finney
| From | Matt Wheeler <funkyhat@gmail.com> |
|---|---|
| Date | 2016-02-12 23:31 +0000 |
| Message-ID | <mailman.89.1455319898.22075.python-list@python.org> |
| In reply to | #102866 |
On 12 Feb 2016 21:37, "Mark Lawrence" <breamoreboy@yahoo.co.uk> wrote:
> Hopefully not as that would be a waste, it should be made more obvious
> by using a red hot poker to engrave it onto every newbies' forehead.
> Even then some simply wouldn't take a blind bit of notice.
Yes sorry about that, I think our aim was a little off with a few of
the brandings.
--
Matt Wheeler
http://funkyh.at
| From | mkondrashin@gmail.com |
|---|---|
| Date | 2016-02-13 12:19 -0800 |
| Message-ID | <243825dd-451a-405e-ad41-855622e80b06@googlegroups.com> |
| In reply to | #102876 |
In my application I have used two approaches:
1. To store paths as a tree (as directories form a tree).
2. For long lists of similar paths, to store the differences between strings.
Though this was a C++/Obj-C project, I can share the diff code with you
if you drop me a line (mkondrashin & gmail , com)
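The post does not say how the string differences were stored. One common technique in that spirit is front coding, where each path in a sorted list is stored as the length of the prefix it shares with the previous path plus the remaining suffix. The sketch below shows that technique as an assumption, not the poster's actual code; all names and sample paths are hypothetical.

```python
def front_encode(sorted_paths):
    """Encode each path as (shared prefix length with previous path, suffix)."""
    encoded = []
    previous = ""
    for path in sorted_paths:
        # Count the leading characters shared with the previous path
        n = 0
        limit = min(len(path), len(previous))
        while n < limit and path[n] == previous[n]:
            n += 1
        encoded.append((n, path[n:]))
        previous = path
    return encoded

def front_decode(encoded):
    """Rebuild the original list of paths from the encoded pairs."""
    paths = []
    previous = ""
    for n, suffix in encoded:
        path = previous[:n] + suffix
        paths.append(path)
        previous = path
    return paths

paths = sorted(["/srv/data/a.txt", "/srv/data/b.txt", "/srv/logs/x.log"])
encoded = front_encode(paths)
```

In pure Python the per-entry tuple overhead can cancel out the savings, and decoding one path means walking from the start of the list, so this is worth measuring before adopting it.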
| From | srinivas devaki <mr.eightnoteight@gmail.com> |
|---|---|
| Date | 2016-02-12 11:46 +0530 |
| Message-ID | <mailman.69.1455257811.22075.python-list@python.org> |
| In reply to | #102833 |
On Feb 12, 2016 6:05 AM, "Paulo da Silva" <p_s_d_a_s_i_l_v_a_ns@netcabo.pt> wrote:
>
> Hi!
>
> What is the best (shortest memory usage) way to store lots of pathnames
> in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically not only the pathnames are stored but objects each
> object being a MyFile containing
> self.name - <base name>
> getPathname(self) - <full pathname>
> other stuff
>
> class MyFile:
>
>     __allfiles=[]
>
>     def __init__(self,dirname,filename):
>         self.dirname=dirname # But I want to share this with other files
>         self.name=filename
>         MyFile.__allfiles.append(self)
>         ...
>
>     def getPathname(self):
>         return os.path.join(self.dirname,self.name)
>
What you want is a trie data structure, which won't use extra memory
when your strings share a common base path.
Instead of constructing a character trie, make it a string trie,
i.e. each directory name is a node and all the files and folders in it
are its children. Each node can be one of two types, a file or a folder.
If you think about it, this is the most intuitive way to represent the
file structure in your program.
You can extract the directory name from a file object by traversing its
parents.
I hope this helps.
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines (IIT Dhanbad)
Computer Science and Engineering Department
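A string trie along the lines described here could be sketched as below. It assumes POSIX-style absolute paths separated by "/"; the `Node` class and `insert` function are hypothetical names, and for simplicity file nodes carry an empty children dict that a tighter version would omit.

```python
class Node:
    """One path component, either a file or a folder."""
    __slots__ = ("name", "parent", "children")

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}      # component name -> Node; stays empty for files

    def child(self, name):
        # Reuse the existing node so each component is stored once per folder
        node = self.children.get(name)
        if node is None:
            node = self.children[name] = Node(name, self)
        return node

    def pathname(self):
        # Walk up through the parents, then reverse into root-first order
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/" + "/".join(reversed(parts))

def insert(root, pathname):
    """Add pathname to the trie and return the node for its last component."""
    node = root
    for part in pathname.strip("/").split("/"):
        node = node.child(part)
    return node

root = Node("")
a = insert(root, "/srv/data/a.txt")
b = insert(root, "/srv/data/b.txt")
```

Both files share the single node for `/srv/data`, and the full pathname is rebuilt on demand rather than stored.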