| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Finding duplicate file names and modifying them based on elements of the path |
| References | <b2f1993c-8872-44ed-9e69-0895e4059532@mi5g2000pbc.googlegroups.com> |
| Date | 2012-07-18 15:49 -0700 |
| Message-ID | <7xipdkwuqd.fsf@ruckus.brouhaha.com> |
| Organization | Nightsong/Fort GNOX |

"Larry.Martell@gmail.com" <larry.martell@gmail.com> writes:
> I have an interesting problem I'm trying to solve. I have a solution
> almost working, but it's super ugly, and know there has to be a
> better, cleaner way to do it.
...
>
> My solution involves multiple maps and multiple iterations through the
> data. How would you folks do this?

You could post your code and ask for suggestions how to improve it.
There are a lot of not-so-natural constraints in that problem, so it
stands to reason that the code will be a bit messy. The whole
specification seems like an antipattern though. You should just give a
sensible encoding for the filename regardless of whether other fields
are duplicated or not. You also don't seem to address the case where
basename, dir4, and dir5 are all duplicated.

The approach I'd take for the spec as you wrote it is:

1. Sort the list on the (basename, dir4, dir5) triple, saving original
   location (numeric index) of each item
2. Use itertools.groupby to group together duplicate basenames.
3. Within the groups, use groupby again to gather duplicate dir4's,
4. Within -those- groups, group by dir5 and assign sequence numbers in
   groups where there's more than one file
5. Unsort to get the rewritten items back into the original order.

Actual code is left as an exercise.
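
For readers who want a starting point, the five steps above might be sketched roughly as follows. This is only an illustration, not code from the thread: the function name `disambiguate`, the input shape (a list of `(basename, dir4, dir5)` tuples), and the underscore-joined naming scheme are all assumptions, since the exact renaming rules of the original spec are not quoted here.

```python
from itertools import groupby
from operator import itemgetter


def disambiguate(records):
    """Return one name per (basename, dir4, dir5) record, in input order.

    Hypothetical naming scheme (an assumption, not the original spec):
      unique basename                  -> basename
      basename shared, dir4 unique     -> dir4_basename
      dir4 also shared, dir5 unique    -> dir4_dir5_basename
      whole triple duplicated          -> dir4_dir5_basename_N
    """
    # Step 1: sort on the triple, keeping each item's original index.
    indexed = sorted(enumerate(records), key=itemgetter(1))
    out = [None] * len(records)

    # Step 2: group together duplicate basenames.
    for base, by_base in groupby(indexed, key=lambda p: p[1][0]):
        by_base = list(by_base)
        if len(by_base) == 1:
            out[by_base[0][0]] = base
            continue
        # Step 3: within a basename group, gather duplicate dir4 values.
        for d4, by_d4 in groupby(by_base, key=lambda p: p[1][1]):
            by_d4 = list(by_d4)
            if len(by_d4) == 1:
                out[by_d4[0][0]] = f"{d4}_{base}"
                continue
            # Step 4: within those groups, group by dir5 and number
            # the files only where a group has more than one member.
            for d5, by_d5 in groupby(by_d4, key=lambda p: p[1][2]):
                by_d5 = list(by_d5)
                if len(by_d5) == 1:
                    out[by_d5[0][0]] = f"{d4}_{d5}_{base}"
                else:
                    for seq, (idx, _) in enumerate(by_d5, 1):
                        out[idx] = f"{d4}_{d5}_{base}_{seq}"
    # Step 5: "unsorting" is implicit, because every result was
    # written back at its original index.
    return out


records = [
    ("a.txt", "x", "p"),
    ("b.txt", "x", "p"),
    ("a.txt", "y", "p"),
    ("a.txt", "x", "q"),
    ("a.txt", "x", "q"),
]
print(disambiguate(records))
```

Because `sorted` is stable, fully duplicated triples keep their original relative order, so the sequence numbers follow input order. Writing results back by saved index avoids a second sort for step 5.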
Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-18 15:20 -0700
Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-18 15:49 -0700
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 12:00 -0700
Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-19 12:43 -0700
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 18:01 -0700
Re: Finding duplicate file names and modifying them based on elements of the path Peter Otten <__peter__@web.de> - 2012-07-20 09:35 +0200
Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-20 00:51 -0700
Re: Finding duplicate file names and modifying them based on elements of the path Paul Rudin <paul.nospam@rudin.co.uk> - 2012-07-20 09:37 +0100
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 11:52 -0700
Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-19 12:56 -0700
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 17:58 -0700
Re: Finding duplicate file names and modifying them based on elements of the path Simon Cropper <simoncropper@fossworkflowguides.com> - 2012-07-19 10:36 +1000
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 11:54 -0700
RE: Finding duplicate file names and modifying them based on elements of the path "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-07-19 19:02 +0000
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 12:06 -0700
Re: Finding duplicate file names and modifying them based on elements of the path MRAB <python@mrabarnett.plus.com> - 2012-07-19 22:32 +0100
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 18:01 -0700
Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 20:07 -0700
Re: Finding duplicate file names and modifying them based on elements of the path MRAB <python@mrabarnett.plus.com> - 2012-07-20 16:45 +0100