| From | Peter Otten <__peter__@web.de> |
|---|---|
| Subject | Re: program to generate data helpful in finding duplicate large files |
| Date | 2014-09-18 22:23 +0200 |
| Organization | None |
| References | <CALDD_==AYbQNPu29jRoLFp8WPZaZ9mMs79334-m_z3dgdxZRJw@mail.gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.14123.1411071819.18130.python-list@python.org> (permalink) |
David Alban wrote:
> sep = ascii_nul
>
> print "%s%c%s%c%d%c%d%c%d%c%d%c%s" % ( thishost, sep, md5sum, sep,
> dev, sep, ino, sep, nlink, sep, size, sep, file_path )
file_path may contain newlines, so you should probably use "\0" to
separate the records. The other fields cannot contain whitespace, so it's
safe to use " " as the field separator. When you deserialize a record, you
can keep file_path from being broken up by passing maxsplit to the
str.split() method:
    for record in infile.read().split("\0"):
        print(record.split(" ", 6))
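For illustration, here is a rough sketch of the writing side together with the split shown above. The helper names format_record and parse_records are made up for this example, and it assumes the host name and md5sum contain no whitespace:

```python
def format_record(host, md5sum, dev, ino, nlink, size, file_path):
    # Fields are separated by a single space; file_path comes last so it
    # may contain spaces or newlines. "\0" terminates the record.
    return "%s %s %d %d %d %d %s\0" % (
        host, md5sum, dev, ino, nlink, size, file_path)


def parse_records(data):
    # maxsplit=6 yields at most 7 fields, so file_path stays in one piece.
    # Skip the empty string that follows the final "\0".
    return [record.split(" ", 6) for record in data.split("\0") if record]


data = (format_record("hostA", "abc123", 2049, 17, 1, 4096, "/tmp/a b\nc")
        + format_record("hostA", "def456", 2049, 18, 2, 8192, "/tmp/x"))
for fields in parse_records(data):
    print(fields)
```

Note that every field comes back as a string, so dev, ino, nlink and size would need int() if you want numbers again.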
Splitting into records without reading the whole data into memory left as an
exercise ;)
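One possible way to do that exercise, as a sketch only: read the file in fixed-size chunks and carry the incomplete tail over to the next chunk. The name iter_records and the chunk_size default are assumptions for this example, and infile is assumed to be opened in text mode:

```python
import io


def iter_records(infile, sep="\0", chunk_size=64 * 1024):
    """Yield sep-delimited records without reading the whole file at once."""
    pending = ""
    while True:
        chunk = infile.read(chunk_size)
        if not chunk:
            break
        parts = (pending + chunk).split(sep)
        # The last piece may be an incomplete record; keep it for later.
        pending = parts.pop()
        for record in parts:
            yield record
    if pending:
        # Data after the final separator, if the file does not end with one.
        yield pending


# Small in-memory stand-in for a real file, with a tiny chunk size so that
# records span several chunks.
infile = io.StringIO("h abc 1 2 3 4 /a b\nc\0h def 5 6 7 8 /x\0")
for record in iter_records(infile, chunk_size=5):
    print(record.split(" ", 6))
```

Memory use is then bounded by the chunk size plus the longest single record rather than by the size of the file.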