Date: Sun, 28 Aug 2011 00:48:20 +0100
From: MRAB <python@mrabarnett.plus.com>
To: python-list@python.org
Subject: Re: Arrange files according to a text file
Newsgroups: comp.lang.python
Message-ID: <mailman.495.1314488964.27778.python-list@python.org>

On 28/08/2011 00:18, Ric@rdo.python.org wrote:
> Thank you so much. The code worked perfectly.
>
> This is what I tried using Emile's code. The only time it picked the
> wrong name from the list was when the file was named like this:
>
> Data Mark Stone.doc
>
> How can I fix this? Hope I am not asking too much?
>
Have you tried the alternative word orders, "Mark Stone" as well as
"Stone, Mark", picking whichever name has the best ratio for either?
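
To make that concrete, here is a minimal sketch of the idea, not a tested drop-in for your script. The helper names (is_junk, name_variants, best_owner) are made up for illustration, and it assumes every line in the list has the form "Last, First". Each listed name is scored in both word orders and the better of the two ratios is kept:

```python
from difflib import SequenceMatcher


def is_junk(ch):
    # Same junk rule as the quoted code: ignore spaces, commas and periods.
    return ch in ' ,.'


def name_variants(listed_name):
    """Return the name as listed plus its reordered 'First Last' form."""
    last, sep, first = listed_name.partition(',')
    variants = [listed_name]
    if sep and first.strip():
        variants.append('%s %s' % (first.strip(), last.strip()))
    return variants


def best_owner(filename, listed_names):
    """Return the listed name whose best-scoring variant matches filename."""
    def score(listed_name):
        return max(SequenceMatcher(is_junk, filename, variant).ratio()
                   for variant in name_variants(listed_name))
    return max(listed_names, key=score)


listed_names = ['Adler, Jack', 'Smith, John', 'Smith, Sally', 'Stone, Mark']
print('%s : %s' % ('Data Mark Stone.doc',
                   best_owner('Data Mark Stone.doc', listed_names)))
```

The value returned is always the name exactly as it appears in the list, so it can still be used as the directory name whichever word order matched.
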
>
> import os
> from difflib import SequenceMatcher as SM
>
> path = r'D:\Files'
> txt_names = []
>
>
> with open(r'D:/python/log1.txt') as f:
>      for txt_name in f.readlines():
>          txt_names.append(txt_name.strip())
>
> def ignore(x):
>       return x in ' ,.'
>
> for filename in os.listdir(path):
>       ratios = [SM(ignore,filename,txt_name).ratio() for txt_name in
> txt_names]
>       best = max(ratios)
>       owner = txt_names[ratios.index(best)]
>       print filename,":",owner
>
>
>
>
>
> On Sat, 27 Aug 2011 14:08:17 -0700, Emile van Sebille<emile@fenx.com>
> wrote:
>
>> On 8/27/2011 1:15 PM Ric@rdo.python.org said...
>>>
>>> Hello Emile ,
>>>
>>> Thank you for the code below. I have not encountered SequenceMatcher
>>> before and will have to take a closer look at it.
>>>
>>> My question: would it work for a text file with a list of about 25k
>>> names and a directory with, say, 100 files inside?
>>
>> Sure.
>>
>> Emile
>>
>>
>>>
>>> Thank you once again.
>>>
>>>
>>> On Sat, 27 Aug 2011 11:06:22 -0700, Emile van Sebille<emile@fenx.com>
>>> wrote:
>>>
>>>> On 8/27/2011 10:03 AM Ric@rdo.python.org said...
>>>>> Hello,
>>>>>
>>>>> What would be the best way to accomplish this task?
>>>>
>>>> I'd do something like:
>>>>
>>>>
>>>> usernames = """Adler, Jack
>>>> Smith, John
>>>> Smith, Sally
>>>> Stone, Mark""".split('\n')
>>>>
>>>> filenames = """Smith, John - 02-15-75 - business files.doc
>>>> Random Data - Adler Jack - expenses.xls
>>>> More Data Mark Stone files list.doc""".split('\n')
>>>>
>>>> from difflib import SequenceMatcher as SM
>>>>
>>>>
>>>> def ignore(x):
>>>>       return x in ' ,.'
>>>>
>>>>
>>>> for filename in filenames:
>>>>       ratios = [SM(ignore,filename,username).ratio() for username in
>>>> usernames]
>>>>       best = max(ratios)
>>>>       owner = usernames[ratios.index(best)]
>>>>       print filename,":",owner
>>>>
>>>>
>>>> Emile
>>>>
>>>>
>>>>
>>>>> I have many files in separate directories. Each file name
>>>>> contains a person's name, but never in the same spot.
>>>>> I need to find that name, which is listed in a large
>>>>> text file in the following format: last name, comma,
>>>>> first name. Last names can be duplicated.
>>>>>
>>>>> Adler, Jack
>>>>> Smith, John
>>>>> Smith, Sally
>>>>> Stone, Mark
>>>>> etc.
>>>>>
>>>>>
>>>>> The file names don't necessarily follow any standard
>>>>> format.
>>>>>
>>>>> Smith, John - 02-15-75 - business files.doc
>>>>> Random Data - Adler Jack - expenses.xls
>>>>> More Data Mark Stone files list.doc
>>>>> etc
>>>>>
>>>>> I need some way to pull the name from the file name, find it in the
>>>>> text list, create a directory based on the name on the list
>>>>> (e.g. "Smith, John"), and move all files named with that client's
>>>>> name into that directory.
>>>>
>>
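
For the last step in the original request (create a directory named after the listed name and move the matching files into it), something along these lines might do. This is only a sketch: move_to_owner_dir is a hypothetical helper, and it assumes the owner directories should be created inside the same directory that holds the files.

```python
import os
import shutil


def move_to_owner_dir(root, filename, owner):
    """Move root/filename into root/owner/, creating that directory if needed."""
    target_dir = os.path.join(root, owner)
    if not os.path.isdir(target_dir):
        os.makedirs(target_dir)
    destination = os.path.join(target_dir, filename)
    shutil.move(os.path.join(root, filename), destination)
    return destination
```

Note that os.listdir() returns directory names as well as file names, so once owner directories exist under the same path the matching loop should skip any entry for which os.path.isfile() is false. It is probably also worth printing the proposed filename/owner pairs and checking them by eye before moving anything, since the best ratio is only a guess.
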