


Re: Finding duplicate file names and modifying them based on elements of the path

Date 2012-07-19 10:36 +1000
From Simon Cropper <simoncropper@fossworkflowguides.com>
Subject Re: Finding duplicate file names and modifying them based on elements of the path
References <b2f1993c-8872-44ed-9e69-0895e4059532@mi5g2000pbc.googlegroups.com>
Newsgroups comp.lang.python
Message-ID <mailman.2281.1342658195.4697.python-list@python.org>



On 19/07/12 08:20, Larry.Martell@gmail.com wrote:
> I have an interesting problem I'm trying to solve. I have a solution
> almost working, but it's super ugly, and I know there has to be a
> better, cleaner way to do it.
>
> I have a list of path names that have this form:
>
> /dir0/dir1/dir2/dir3/dir4/dir5/dir6/file
>
> I need to find all the file names (basenames) in the list that are
> duplicates, and for each one that is a dup, prepend dir4 to the
> filename as long as the dir4/file pair is unique. If there are
> multiple dir4/files in the list, then I also need to add a sequence
> number based on the sorted value of dir5 (which is a date in ddMONyy
> format).
>
> For example, if my list contains:
>
> /dir0/dir1/dir2/dir3/qwer/09Jan12/dir6/file3
> /dir0/dir1/dir2/dir3/abcd/08Jan12/dir6/file1
> /dir0/dir1/dir2/dir3/abcd/08Jan12/dir6/file2
> /dir0/dir1/dir2/dir3/xyz/08Jan12/dir6/file1
> /dir0/dir1/dir2/dir3/qwer/07Jan12/dir6/file3
>
> Then I want to end up with:
>
> /dir0/dir1/dir2/dir3/qwer/09Jan12/dir6/qwer_01_file3
> /dir0/dir1/dir2/dir3/abcd/08Jan12/dir6/abcd_file1
> /dir0/dir1/dir2/dir3/abcd/08Jan12/dir6/file2
> /dir0/dir1/dir2/dir3/xyz/08Jan12/dir6/xyz_file1
> /dir0/dir1/dir2/dir3/qwer/07Jan12/dir6/qwer_00_file3
>
> My solution involves multiple maps and multiple iterations through the
> data. How would you folks do this?
>
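For reference, the renaming rule in the quoted question can be sketched in Python as below. This is a minimal sketch, not code from the thread: it assumes every path has exactly the layout shown (so dir4 and dir5 are the fourth and third components from the end) and that dir5 always parses as ddMONyy (e.g. 09Jan12) under Python's default C locale. The name `rename_duplicates` is hypothetical. It groups the paths by base name once, then renames only the groups that have more than one member.

```python
from collections import defaultdict
from datetime import datetime


def rename_duplicates(paths):
    """Return new paths, in input order, following the rule in the question.

    Assumes every path looks like /dir0/dir1/dir2/dir3/dir4/dir5/dir6/file
    and that dir5 is a ddMONyy date such as 09Jan12.
    """
    parts = [p.split("/") for p in paths]   # POSIX-style path strings
    new_names = [pp[-1] for pp in parts]    # default: keep the base name

    # Group path indices by base name.
    by_name = defaultdict(list)
    for i, pp in enumerate(parts):
        by_name[pp[-1]].append(i)

    for name, idxs in by_name.items():
        if len(idxs) < 2:
            continue                        # not a duplicate: leave it alone
        # Among the duplicates, group by dir4 (4th component from the end).
        by_dir4 = defaultdict(list)
        for i in idxs:
            by_dir4[parts[i][-4]].append(i)
        for dir4, group in by_dir4.items():
            if len(group) == 1:
                new_names[group[0]] = f"{dir4}_{name}"
                continue
            # Several dir4/file pairs: number them by the dir5 date, oldest first.
            group.sort(key=lambda i: datetime.strptime(parts[i][-3], "%d%b%y"))
            for seq, i in enumerate(group):
                new_names[i] = f"{dir4}_{seq:02d}_{name}"

    return ["/".join(pp[:-1] + [new]) for pp, new in zip(parts, new_names)]
```

Called on the five example paths in the question, it returns the five expected paths in the same order, with file2 left unchanged because its base name is not duplicated.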

Hi Larry,

I am assuming that you intend to collapse the directory tree and store 
every file in a single directory; otherwise I can't see why you would 
need to do this.

If that is the case, then I would:

1. read all the file paths into a list
2. parse each path to extract the fourth-level directory name and the base name
3. iterate through the list
    3.1 check whether the base name already exists in the recipient directory
    3.2 if not, copy the file to the recipient directory
    3.3 if it is present, add the directory name to the file name, then save
    3.4 log the success or failure
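The steps above can be sketched as follows. This is an illustration, not code from the thread; it assumes the paths are local files and that step 3.3 means putting the fourth-level directory name in front of the base name, as in Larry's examples. `collapse_tree` and the logger name are hypothetical. Rather than silently overwrite, it logs a failure if even the prefixed name is already taken.

```python
import logging
import os
import shutil
from pathlib import Path

log = logging.getLogger("collapse")  # hypothetical logger name


def collapse_tree(paths, dest_dir):
    """Copy every file in *paths* into the single directory *dest_dir*.

    Returns a list of (source, destination or None) pairs as a simple log.
    """
    os.makedirs(dest_dir, exist_ok=True)
    results = []
    for src in paths:                                   # 1. the list of files
        dir4 = Path(src).parts[-4]                      # 2. fourth-level dir...
        base = os.path.basename(src)                    #    ...and base name
        target = os.path.join(dest_dir, base)
        if os.path.exists(target):                      # 3.1 name already taken?
            target = os.path.join(dest_dir, f"{dir4}_{base}")  # 3.3 prefix dir4
            if os.path.exists(target):
                log.error("name clash for %s", src)     # 3.4 log the failure
                results.append((src, None))
                continue
        try:
            shutil.copy2(src, target)                   # 3.2 copy (keeps mtime)
            log.info("copied %s -> %s", src, target)    # 3.4 log the success
            results.append((src, target))
        except OSError as exc:
            log.error("failed %s: %s", src, exc)        # 3.4 log the failure
            results.append((src, None))
    return results
```

With this approach, which copy keeps the bare name depends only on the order of the input list, so the same data can produce different names on different runs.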

Personally, I would not name some files abcd_file1 and leave others as 
plain file2. If it was important enough to store a file in a separate 
directory, you should also record where file2 came from. Otherwise, 
when you look at your results later, you will have to open file2 
(which I presume records what it relates to) to figure out where it 
came from. If the origin is in the name, it is easier to review.

In short, consistency is the name of the game: if you are going to do it 
for some files, do it for all of them. It will also be easier for others 
to work out later what you have done.
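One way to follow that advice is to derive every name from where the file came from, duplicate or not. The scheme below (`<dir4>_<date>_<file>`, with the ddMONyy folder rewritten as an ISO date so the names sort chronologically) is an assumption for illustration, as are the function names `consistent_name` and `consistent_names`; it also refuses to proceed if two paths would still map to the same name.

```python
from datetime import datetime
from pathlib import PurePosixPath


def consistent_name(path):
    """Hypothetical scheme: <dir4>_<dir5 as YYYY-MM-DD>_<file> for every file."""
    parts = PurePosixPath(path).parts
    dir4, dir5, base = parts[-4], parts[-3], parts[-1]
    # ddMONyy (e.g. 07Jan12) -> 2012-01-07, so names sort by date
    date = datetime.strptime(dir5, "%d%b%y").date().isoformat()
    return f"{dir4}_{date}_{base}"


def consistent_names(paths):
    """Name every file the same way; fail loudly if two names collide."""
    names = [consistent_name(p) for p in paths]
    if len(set(names)) != len(names):
        raise ValueError("dir4/dir5/file does not identify these paths uniquely")
    return names
```

For /dir0/dir1/dir2/dir3/qwer/07Jan12/dir6/file3 this gives qwer_2012-01-07_file3, and file2 becomes abcd_2012-01-08_file2 instead of staying bare.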

-- 
Cheers Simon

    Simon Cropper - Open Content Creator

    Free and Open Source Software Workflow Guides
    ------------------------------------------------------------
    Introduction               http://www.fossworkflowguides.com
    GIS Packages           http://www.fossworkflowguides.com/gis
    bash / Python    http://www.fossworkflowguides.com/scripting


