


Re: Wikipedia XML Dump

Date 2014-01-29 00:47 +0200
From Burak Arslan <burak.arslan@arskom.com.tr>
Subject Re: Wikipedia XML Dump
References <9ec53bc0-f2da-46f4-ad58-2c9a75653dbf@googlegroups.com> <7500190f-18a6-42b2-a77a-982672ce1644@googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.6083.1390949255.18130.python-list@python.org> (permalink)



hi,

On 01/29/14 00:31, Kevin Glover wrote:
> Thanks for the comments, guys. The Wikipedia download is a single XML document, 43.1GB. Any further thoughts?
>
>

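in that case, event-driven (incremental) parsing as described at
http://lxml.de/tutorial.html#event-driven-parsing seems to be your only
option, since a 43.1GB document is far too large to load into memory as
a single tree.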
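Here is a minimal sketch of that streaming approach. To keep it dependency-free it uses the standard library's xml.etree.ElementTree.iterparse, which has the same event-driven style as lxml's iterparse. Swapping in lxml.etree should mostly come down to changing the import, but treat that as an assumption and check it. The sketch also assumes the dump is made of <page> elements with a <title> child, and it strips whatever MediaWiki namespace URI the dump uses rather than hardcoding one. The function name iter_page_titles is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_titles(source):
    """Yield the <title> text of each <page> element without building the
    whole tree.

    `source` is a filename or a binary file object. Hypothetical helper:
    assumes pages look like <page><title>...</title>...</page>.
    """
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if root is None:
                root = elem  # the first start event is the document root
            continue
        # "end" event: this element and all of its children are fully parsed
        if local_name(elem.tag) == "page":
            title = None
            for child in elem:
                if local_name(child.tag) == "title":
                    title = child.text
                    break
            yield title
            elem.clear()  # free this page's children and text
            root.clear()  # drop the root's references to finished pages


# Tiny in-memory stand-in for the real dump (namespace URI is illustrative).
sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page><title>Alpha</title><revision><text>a</text></revision></page>
  <page><title>Beta</title><revision><text>b</text></revision></page>
</mediawiki>"""
print(list(iter_page_titles(io.BytesIO(sample))))  # ['Alpha', 'Beta']
```

For the real dump you would pass the file path instead of the BytesIO object. Clearing each finished page, and the root's list of finished pages, keeps memory use roughly proportional to one page rather than to the whole 43.1GB file.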

hth,
burak



Thread

Wikipedia XML Dump kevingloveruk@gmail.com - 2014-01-28 03:45 -0800
  Re: Wikipedia XML Dump Rustom Mody <rustompmody@gmail.com> - 2014-01-28 09:11 -0800
    Re: Wikipedia XML Dump Skip Montanaro <skip@pobox.com> - 2014-01-28 12:15 -0600
  Re: Wikipedia XML Dump Kevin Glover <kevingloveruk@gmail.com> - 2014-01-28 14:31 -0800
    Re: Wikipedia XML Dump Burak Arslan <burak.arslan@arskom.com.tr> - 2014-01-29 00:47 +0200
      Re: Wikipedia XML Dump Rustom Mody <rustompmody@gmail.com> - 2014-01-28 17:52 -0800
  Re: Wikipedia XML Dump alex23 <wuwei23@gmail.com> - 2014-01-29 11:39 +1000
