Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #65473
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: Finding size of Variable |
| Date | 2014-02-05 09:27 +0100 |
| Organization | None |
| References | <8e4c1ab1-e65d-483f-ad9d-6933ae2052c3@googlegroups.com> <mailman.6402.1391541507.18130.python-list@python.org> <723729ee-8e74-4d65-aa6f-742051a94101@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.6417.1391588841.18130.python-list@python.org> (permalink) |
Ayushi Dalmia wrote:
> On Wednesday, February 5, 2014 12:51:31 AM UTC+5:30, Dave Angel wrote:
>> Ayushi Dalmia <ayushidalmia2604@gmail.com> Wrote in message:
>>
>>
>>
>> >
>>
>> > Where am I going wrong? What are the alternatives I can try?
>>
>>
>>
>> You've rejected all the alternatives so far without showing your
>>
>> code, or even properly specifying your problem.
>>
>>
>>
>> To get the "total" size of a list of strings, try (untested):
>>
>>
>>
>> a = sys.getsizeof (mylist )
>>
>> for item in mylist:
>>
>> a += sys.getsizeof (item)
>>
>>
>>
>> This can be high if some of the strings are interned and get
>>
>> counted twice. But you're not likely to get closer without some
>>
>> knowledge of the data objects and where they come
>>
>> from.
>>
>>
>>
>> --
>>
>> DaveA
>
> Hello Dave,
>
> I just thought that saving others time is better and hence I explained
> only the subset of my problem. Here is what I am trying to do:
>
> I am trying to index the current wikipedia dump without using databases
> and create a search engine for Wikipedia documents. Note, I CANNOT USE
> DATABASES. My approach:
>
> I am parsing the wikipedia pages using SAX Parser, and then, I am dumping
> the words along with the posting list (a list of doc ids in which the word
> is present) into different files after reading 'X' number of pages. Now
> these files may have the same word and hence I need to merge them and
> write the final index again. Now these final indexes must be of limited
> size as I need to be of limited size. This is where I am stuck. I need to
> know how to determine the size of content in a variable before I write
> into the file.
>
> Here is the code for my merging:
>
> def mergeFiles(pathOfFolder, countFile):
> listOfWords={}
> indexFile={}
> topOfFile={}
> flag=[0]*countFile
> data=defaultdict(list)
> heap=[]
> countFinalFile=0
> for i in xrange(countFile):
> fileName = pathOfFolder+'\index'+str(i)+'.txt.bz2'
> indexFile[i]= bz2.BZ2File(fileName, 'rb')
> flag[i]=1
> topOfFile[i]=indexFile[i].readline().strip()
> listOfWords[i] = topOfFile[i].split(' ')
> if listOfWords[i][0] not in heap:
> heapq.heappush(heap, listOfWords[i][0])
At this point you have already done it wrong as your heap contains the
complete data and you have done a lot of O(N) tests on the heap.
This is both slow and consumes a lot of memory. See
http://code.activestate.com/recipes/491285-iterator-merge/
for a sane way to merge sorted data from multiple files. Your code becomes
(untested)
with open("outfile.txt", "wb") as outfile:
infiles = []
for i in xrange(countFile):
filename = os.path.join(pathOfFolder, 'index'+str(i)+'.txt.bz2')
infiles.append(bz2.BZ2File(filename, "rb"))
outfile.writelines(imerge(*infiles))
for infile in infiles:
infile.close()
Once you have your data in a single file you can read from that file and do
the postprocessing you mention below.
> while any(flag)==1:
> temp = heapq.heappop(heap)
> for i in xrange(countFile):
> if flag[i]==1:
> if listOfWords[i][0]==temp:
>
> //This is where I am stuck. I cannot wait until memory
> //error, as I need to do some postprocessing too. try:
> data[temp].extend(listOfWords[i][1:])
> except MemoryError:
> writeFinalIndex(data, countFinalFile,
> pathOfFolder) data=defaultdict(list)
> countFinalFile+=1
>
> topOfFile[i]=indexFile[i].readline().strip()
> if topOfFile[i]=='':
> flag[i]=0
> indexFile[i].close()
>
os.remove(pathOfFolder+'\index'+str(i)+'.txt.bz2')
> else:
> listOfWords[i] = topOfFile[i].split(' ')
> if listOfWords[i][0] not in heap:
> heapq.heappush(heap, listOfWords[i][0])
> writeFinalIndex(data, countFinalFile, pathOfFolder)
>
> countFile is the number of files and writeFileIndex method writes into the
> file.
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 03:28 -0800
Re: Finding size of Variable Peter Otten <__peter__@web.de> - 2014-02-04 12:40 +0100
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 04:43 -0800
Re: Finding size of Variable Asaf Las <roegltd@gmail.com> - 2014-02-04 04:53 -0800
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 05:18 -0800
Re: Finding size of Variable Dave Angel <davea@davea.name> - 2014-02-04 08:09 -0500
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 05:19 -0800
Re: Finding size of Variable Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-02-04 09:06 -0500
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 21:00 -0800
Re:Finding size of Variable Dave Angel <davea@davea.name> - 2014-02-04 14:21 -0500
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 21:15 -0800
Re: Finding size of Variable Peter Otten <__peter__@web.de> - 2014-02-05 09:27 +0100
Re: Finding size of Variable Tim Golden <mail@timgolden.me.uk> - 2014-02-04 19:28 +0000
Re: Finding size of Variable Tim Chase <python.list@tim.thechases.com> - 2014-02-04 13:29 -0600
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 21:35 -0800
Re: Finding size of Variable Rustom Mody <rustompmody@gmail.com> - 2014-02-04 21:45 -0800
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-04 22:00 -0800
Re: Finding size of Variable Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-02-05 11:00 +0000
Re: Finding size of Variable Chris Angelico <rosuav@gmail.com> - 2014-02-05 22:44 +1100
Re: Finding size of Variable wxjmfauth@gmail.com - 2014-02-06 02:15 -0800
Re: Finding size of Variable Ned Batchelder <ned@nedbatchelder.com> - 2014-02-06 06:10 -0500
Re: Finding size of Variable wxjmfauth@gmail.com - 2014-02-06 05:51 -0800
Re: Finding size of Variable wxjmfauth@gmail.com - 2014-02-06 06:15 -0800
Re: Finding size of Variable Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-02-08 02:48 +0000
Re: Finding size of Variable Ethan Furman <ethan@stoneleaf.us> - 2014-02-07 19:02 -0800
Re: Finding size of Variable Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-08 13:17 +0000
Re: Finding size of Variable David Hutto <dwightdhutto@gmail.com> - 2014-02-08 17:45 -0500
Re: Finding size of Variable Rustom Mody <rustompmody@gmail.com> - 2014-02-08 17:25 -0800
Re: Finding size of Variable David Hutto <dwightdhutto@gmail.com> - 2014-02-08 21:56 -0500
Re: Finding size of Variable Chris Angelico <rosuav@gmail.com> - 2014-02-09 13:59 +1100
Re: Finding size of Variable David Hutto <dwightdhutto@gmail.com> - 2014-02-08 22:07 -0500
Re: Finding size of Variable Ned Batchelder <ned@nedbatchelder.com> - 2014-02-08 22:09 -0500
Re: Finding size of Variable David Hutto <dwightdhutto@gmail.com> - 2014-02-08 22:09 -0500
Re: Finding size of Variable Ned Batchelder <ned@nedbatchelder.com> - 2014-02-08 22:16 -0500
Re: Finding size of Variable Rustom Mody <rustompmody@gmail.com> - 2014-02-08 19:30 -0800
Re: Finding size of Variable wxjmfauth@gmail.com - 2014-02-10 06:07 -0800
Re: Finding size of Variable Asaf Las <roegltd@gmail.com> - 2014-02-10 06:25 -0800
Re: Finding size of Variable Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-10 14:39 +0000
Re: Finding size of Variable Tim Chase <python.list@tim.thechases.com> - 2014-02-10 08:43 -0600
Re: Finding size of Variable wxjmfauth@gmail.com - 2014-02-11 10:53 -0800
Re: Finding size of Variable Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-11 19:04 +0000
Re: Finding size of Variable wxjmfauth@gmail.com - 2014-02-11 23:49 -0800
Re: Finding size of Variable Chris Angelico <rosuav@gmail.com> - 2014-02-12 19:06 +1100
Re: Finding size of Variable Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2014-02-12 10:57 +0200
Re: Finding size of Variable Chris Angelico <rosuav@gmail.com> - 2014-02-12 20:24 +1100
Re: Finding size of Variable Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2014-02-12 11:35 +0200
Working with the set of real numbers (was: Finding size of Variable) Ben Finney <ben+python@benfinney.id.au> - 2014-02-12 19:17 +1100
Re: Working with the set of real numbers (was: Finding size of Variable) wxjmfauth@gmail.com - 2014-02-12 00:35 -0800
Re: Working with the set of real numbers (was: Finding size of Variable) wxjmfauth@gmail.com - 2014-02-12 00:46 -0800
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-02-12 19:52 +1100
Re: Working with the set of real numbers (was: Finding size of Variable) Grant Edwards <invalid@invalid.invalid> - 2014-02-12 15:24 +0000
Re: Working with the set of real numbers (was: Finding size of Variable) "Gisle Vanem" <gvanem@yahoo.no> - 2014-02-12 17:23 +0100
Re: Working with the set of real numbers (was: Finding size of Variable) Chris Angelico <rosuav@gmail.com> - 2014-02-12 19:47 +1100
Re: Working with the set of real numbers (was: Finding size of Variable) Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2014-02-12 11:23 +0200
Re: Working with the set of real numbers (was: Finding size of Variable) albert@spenarnc.xs4all.nl (Albert van der Horst) - 2014-03-04 02:45 +0000
Re: Working with the set of real numbers (was: Finding size of Variable) Chris Angelico <rosuav@gmail.com> - 2014-03-04 14:02 +1100
Re: Working with the set of real numbers (was: Finding size of Variable) Rustom Mody <rustompmody@gmail.com> - 2014-03-03 19:13 -0800
Re: Working with the set of real numbers (was: Finding size of Variable) Chris Angelico <rosuav@gmail.com> - 2014-03-04 14:46 +1100
Re: Working with the set of real numbers (was: Finding size of Variable) Rustom Mody <rustompmody@gmail.com> - 2014-03-03 21:19 -0800
Re: Working with the set of real numbers (was: Finding size of Variable) Steven D'Aprano <steve@pearwood.info> - 2014-03-04 05:53 +0000
Re: Working with the set of real numbers (was: Finding size of Variable) Chris Angelico <rosuav@gmail.com> - 2014-03-04 17:35 +1100
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-03-05 00:05 +1300
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-03-04 23:43 +1100
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-03-04 21:49 +0200
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-03-05 06:58 +1100
Re: Working with the set of real numbers Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2014-03-04 20:55 +0000
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-03-04 23:05 +0200
Re: Working with the set of real numbers Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2014-03-04 22:08 +0000
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-03-05 08:18 +1100
Re: Working with the set of real numbers Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2014-03-04 22:02 +0000
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-03-05 09:18 +1100
Re: Working with the set of real numbers Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2014-03-04 22:54 +0000
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-03-05 10:01 +1100
Re: Working with the set of real numbers Dave Angel <davea@davea.name> - 2014-03-04 18:20 -0500
Re: Working with the set of real numbers (was: Finding size of Variable) Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-04 04:19 -0700
Re: Working with the set of real numbers (was: Finding size of Variable) albert@spenarnc.xs4all.nl (Albert van der Horst) - 2014-03-05 02:27 +0000
Re: Working with the set of real numbers (was: Finding size of Variable) Ian Kelly <ian.g.kelly@gmail.com> - 2014-03-04 04:23 -0700
Re: Working with the set of real numbers (was: Finding size of Variable) albert@spenarnc.xs4all.nl (Albert van der Horst) - 2014-03-05 02:15 +0000
Re: Working with the set of real numbers (was: Finding size of Variable) Steven D'Aprano <steve@pearwood.info> - 2014-03-05 03:41 +0000
Re: Working with the set of real numbers (was: Finding size of Variable) Rustom Mody <rustompmody@gmail.com> - 2014-03-04 20:15 -0800
Re: Working with the set of real numbers (was: Finding size of Variable) Roy Smith <roy@panix.com> - 2014-03-04 23:25 -0500
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-03-05 15:37 +1100
Re: Working with the set of real numbers Rustom Mody <rustompmody@gmail.com> - 2014-03-04 20:57 -0800
Re: Working with the set of real numbers Roy Smith <roy@panix.com> - 2014-03-05 00:29 -0500
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-02-12 19:56 +1100
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-12 20:16 +1100
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-02-12 21:07 +1100
Re: Working with the set of real numbers Rustom Mody <rustompmody@gmail.com> - 2014-02-12 06:11 -0800
Re: Working with the set of real numbers Ian Kelly <ian.g.kelly@gmail.com> - 2014-02-12 13:45 -0700
Re: Working with the set of real numbers Rustom Mody <rustompmody@gmail.com> - 2014-02-12 17:47 -0800
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-02-13 11:09 +1300
Re: Working with the set of real numbers Steven D'Aprano <steve@pearwood.info> - 2014-02-13 03:31 +0000
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-02-13 14:45 +1100
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-13 15:17 +1100
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-12 21:20 +1100
Re: Working with the set of real numbers wxjmfauth@gmail.com - 2014-02-12 02:55 -0800
Re: Working with the set of real numbers Ned Batchelder <ned@nedbatchelder.com> - 2014-02-12 06:55 -0500
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-02-12 14:48 +0200
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-13 00:20 +1100
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-02-12 16:13 +0200
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-13 04:52 +1100
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-02-13 11:24 +1300
Re: Working with the set of real numbers Dave Angel <davea@davea.name> - 2014-02-12 17:56 -0500
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-02-14 18:26 +1300
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-02-12 22:44 +1100
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-12 22:58 +1100
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-02-13 11:32 +1300
Re: Working with the set of real numbers Grant Edwards <invalid@invalid.invalid> - 2014-02-12 23:23 +0000
Re: Finding size of Variable Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-12 14:04 +0000
Re: Finding size of Variable Rustom Mody <rustompmody@gmail.com> - 2014-02-12 06:14 -0800
Re: Finding size of Variable Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-12 14:25 +0000
Re: Finding size of Variable Rustom Mody <rustompmody@gmail.com> - 2014-02-12 06:32 -0800
Re: Working with the set of real numbers Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2014-02-13 12:48 +0000
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-02-13 16:00 +0200
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-14 06:25 +1100
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-02-13 21:47 +0200
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-14 07:08 +1100
Re: Working with the set of real numbers Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-02-13 22:05 -0800
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-02-15 00:30 +1300
Re: Working with the set of real numbers Devin Jeanpierre <jeanpierreda@gmail.com> - 2014-02-14 16:26 -0800
Re: Working with the set of real numbers albert@spenarnc.xs4all.nl (Albert van der Horst) - 2014-03-05 02:38 +0000
Re: Working with the set of real numbers Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2014-02-14 19:37 +1300
Re: Working with the set of real numbers Chris Angelico <rosuav@gmail.com> - 2014-02-14 17:44 +1100
Re: Working with the set of real numbers Rustom Mody <rustompmody@gmail.com> - 2014-02-14 07:13 -0800
Re: Working with the set of real numbers Dave Angel <davea@davea.name> - 2014-02-14 07:30 -0500
Re: Working with the set of real numbers Grant Edwards <invalid@invalid.invalid> - 2014-02-14 15:09 +0000
Re: Working with the set of real numbers Rotwang <sg552@hotmail.co.uk> - 2014-02-13 21:29 +0000
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-02-14 00:00 +0200
Re: Working with the set of real numbers Rotwang <sg552@hotmail.co.uk> - 2014-02-13 22:21 +0000
Re: Working with the set of real numbers Marko Rauhamaa <marko@pacujo.net> - 2014-02-14 01:16 +0200
Re: Working with the set of real numbers Ben Finney <ben+python@benfinney.id.au> - 2014-02-14 03:57 +1100
Re: Finding size of Variable Ned Batchelder <ned@nedbatchelder.com> - 2014-02-10 10:02 -0500
Re: Finding size of Variable Neil Cerutti <neilc@norwich.edu> - 2014-02-11 14:29 +0000
Re: Finding size of Variable Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2014-02-05 22:14 -0500
Re: Finding size of Variable Dave Angel <davea@davea.name> - 2014-02-05 08:43 -0500
Re: Finding size of Variable Ayushi Dalmia <ayushidalmia2604@gmail.com> - 2014-02-05 06:33 -0800
Re: Finding size of Variable Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-02-05 15:22 +0000
csiph-web