Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #25590

Re: Finding duplicate file names and modifying them based on elements of the path

From Paul Rubin <no.email@nospam.invalid>
Newsgroups comp.lang.python
Subject Re: Finding duplicate file names and modifying them based on elements of the path
References <b2f1993c-8872-44ed-9e69-0895e4059532@mi5g2000pbc.googlegroups.com>
Date 2012-07-18 15:49 -0700
Message-ID <7xipdkwuqd.fsf@ruckus.brouhaha.com> (permalink)
Organization Nightsong/Fort GNOX

Show all headers | View raw


"Larry.Martell@gmail.com" <larry.martell@gmail.com> writes:
> I have an interesting problem I'm trying to solve. I have a solution
> almost working, but it's super ugly, and know there has to be a
> better, cleaner way to do it. ...
>
> My solution involves multiple maps and multiple iterations through the
> data. How would you folks do this?

You could post your code and ask for suggestions how to improve it.
There are a lot of not-so-natural constraints in that problem, so it
stands to reason that the code will be a bit messy.  The whole
specification seems like an antipattern though.  You should just give a
sensible encoding for the filename regardless of whether other fields
are duplicated or not.  You also don't seem to address the case where
basename, dir4, and dir5 are all duplicated.

The approach I'd take for the spec as you wrote it is:

1. Sort the list on the (basename, dir4, dir5) triple, saving original
   location (numeric index) of each item  
2. Use itertools.groupby to group together duplicate basenames.
3. Within the groups, use groupby again to gather duplicate dir4's,
4. Within -those- groups, group by dir5 and assign sequence numbers in
   groups where there's more than one file
5. Unsort to get the rewritten items back into the original order.

Actual code is left as an exercise.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-18 15:20 -0700
  Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-18 15:49 -0700
    Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 12:00 -0700
      Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-19 12:43 -0700
        Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 18:01 -0700
          Re: Finding duplicate file names and modifying them based on elements of the path Peter Otten <__peter__@web.de> - 2012-07-20 09:35 +0200
          Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-20 00:51 -0700
          Re: Finding duplicate file names and modifying them based on elements of the path Paul Rudin <paul.nospam@rudin.co.uk> - 2012-07-20 09:37 +0100
    Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 11:52 -0700
      Re: Finding duplicate file names and modifying them based on elements of the path Paul Rubin <no.email@nospam.invalid> - 2012-07-19 12:56 -0700
        Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 17:58 -0700
  Re: Finding duplicate file names and modifying them based on elements of the path Simon Cropper <simoncropper@fossworkflowguides.com> - 2012-07-19 10:36 +1000
    Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 11:54 -0700
      RE: Finding duplicate file names and modifying them based on elements of the path "Prasad, Ramit" <ramit.prasad@jpmorgan.com> - 2012-07-19 19:02 +0000
        Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 12:06 -0700
          Re: Finding duplicate file names and modifying them based on elements of the path MRAB <python@mrabarnett.plus.com> - 2012-07-19 22:32 +0100
            Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 18:01 -0700
              Re: Finding duplicate file names and modifying them based on elements of the path "Larry.Martell@gmail.com" <larry.martell@gmail.com> - 2012-07-19 20:07 -0700
                Re: Finding duplicate file names and modifying them based on elements of the path MRAB <python@mrabarnett.plus.com> - 2012-07-20 16:45 +0100

csiph-web