| From | "Larry.Martell@gmail.com" <larry.martell@gmail.com> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Finding duplicate file names and modifying them based on elements of the path |
| Date | 2012-07-19 18:01 -0700 |
| Organization | http://groups.google.com |
| Message-ID | <2862aea5-9d5c-4979-8ca2-0bb01f9db32c@m3g2000vbl.googlegroups.com> |
| References | <b2f1993c-8872-44ed-9e69-0895e4059532@mi5g2000pbc.googlegroups.com> <7xipdkwuqd.fsf@ruckus.brouhaha.com> <14831ee0-fd74-4906-852c-764ba2d8b1d5@h20g2000yqe.googlegroups.com> <7xipdjilko.fsf@ruckus.brouhaha.com> |
On Jul 19, 1:43 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> "Larry.Mart...@gmail.com" <larry.mart...@gmail.com> writes:
> > Thanks for the reply Paul. I had not heard of itertools. It sounds
> > like just what I need for this. But I am having 1 issue - how do you
> > know how many items are in each group?
>
> Simplest is:
>
>     for key, group in groupby(xs, lambda x:(x[-1],x[4],x[5])):
>         gs = list(group)  # convert iterator to a list
>         n = len(gs)       # this is the number of elements
>
> there is some theoretical inelegance in that it requires each group to
> fit in memory, but you weren't really going to have billions of files
> with the same basename.
>
> If you're not used to iterators and itertools, note there are some
> subtleties to using groupby to iterate over files, because an iterator
> actually has state. It bumps a pointer and maybe consumes some input
> every time you advance it. In a situation like the above, you've got
> some nested iterators (the groupby iterator generating groups, and the
> individual group iterators that come out of the groupby) that wrap the
> same file handle, so bad confusion can result if you advance both
> iterators without being careful (one can consume file input that you
> thought would go to another).

It seems that if you do a list(group) you have consumed the list. This
screwed me up for a while, and seems very counter-intuitive.

> This isn't as bad as it sounds once you get used to it, but it can be
> a source of frustration at first.
>
> BTW, if you just want to count the elements of an iterator (while
> consuming it),
>
>     n = sum(1 for x in xs)
>
> counts the elements of xs without having to expand it into an in-memory
> list.
>
> Itertools really makes Python feel a lot more expressive and clean,
> despite little kinks like the above.
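For anyone who hits the same surprise, here is a minimal sketch of what is going on. The sample data and the names `paths`, `sizes`, `leftovers`, and `count_items` are made up for illustration and are not from the thread; the key function here groups on the basename only, which is simpler than the key in the quoted code. The point is that each group handed out by groupby is a one-shot iterator: the first list(group) drains it, so a second pass sees nothing, while the list you built can be reused freely.

```python
from itertools import groupby

# Hypothetical sample data: (directory, basename) pairs.
paths = [
    ("a", "report.txt"),
    ("b", "report.txt"),
    ("c", "notes.txt"),
]

def basename(pair):
    """Key function: group on the file's basename."""
    return pair[1]

def count_items(iterable):
    """Count items without building a list; this consumes the iterator."""
    return sum(1 for _ in iterable)

# groupby only merges adjacent items, so sort by the same key first.
paths.sort(key=basename)

sizes = {}
leftovers = {}
for name, group in groupby(paths, basename):
    gs = list(group)                # materialize the group exactly once
    sizes[name] = len(gs)           # gs is a plain list and can be reused
    leftovers[name] = list(group)   # group is already exhausted: empty list

print(sizes)      # {'notes.txt': 1, 'report.txt': 2}
print(leftovers)  # {'notes.txt': [], 'report.txt': []}
```

So the usual pattern is to call list(group) once at the top of the loop body and work with that list from then on. If only the count matters, count_items(group) avoids building the list, but it drains the group in the same way.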