Path: csiph.com!usenet.pasdenom.info!dedibox.gegeweb.org!gegeweb.eu!nntpfeed.proxad.net!proxad.net!feeder1-2.proxad.net!usenet-fr.net!nerim.net!novso.com!newsfeed.xs4all.nl!newsfeed4.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
To: python-list@python.org
From: Neil Cerutti <neilc@norwich.edu>
Subject: Re: Python glob and raw string
Date: Thu, 16 Jan 2014 18:14:30 +0000 (UTC)
Organization: Norwich University
References: <6d3dd7d7-6836-4b55-9d1c-9e70e18b66fd@googlegroups.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
User-Agent: slrn/0.9.9p1/mm/ao (Win32)
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.5596.1389896098.18130.python-list@python.org>
Lines: 69
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:64096

On 2014-01-16, Xaxa Urtiz <urtizvereaxaxa@gmail.com> wrote:
> Hello everybody, i've got a little problem, i've made a script
> which look after some files in some directory, typically my
> folder are organized like this :
>
> [share]
> folder1
> ->20131201
> -->file1.xml
> -->file2.txt
> ->20131202
> -->file9696009.tmp
> -->file421378932.xml
> etc....
> so basically in the share i've got some folder
> (=folder1,folder2.....) and inside these folder i've got these
> folder whose name is the date (20131201,20131202,20131203
> etc...) and inside them i want to find all the xml files.
> So, what i've done is to iterate over all the folder1/2/3 that
> i want and look, for each one, the xml file with that:
>
> for f in glob.glob(dir +r"\20140115\*.xml"):
> ->yield f
>
> dir is the folder1/2/3 everything is ok but i want to do
> something like that :
>
> for i in range(10,16):
> ->for f in glob.glob(dir +r"\201401{0}\*.xml".format(i)):
> -->yield f
>
> but the glob does not find any file.... (and of course there is
> some xml and the old way found them...) 
> Any help would be appreciate :) 

I've done this two different ways. The simple way is very similar
to what you are now doing. It sucks because I have to manually
maintain the list of subdirectories to traverse every time I
create a new subdir.

Here's the other way, using glob and isdir from os.path, adapted
from actual production code.

class Miner:
    def __init__(self, archive):
        # setup goes here; prepare to acquire the data
        self.descend(os.path.join(archive, '*'))

    def descend(self, path):
        for fname in glob.glob(os.path.join(path, '*')):
            if os.path.isdir(fname):
                self.descend(fname)
            else:
                self.process(fname)

    def process(self, path):
        # Do what I want done with an actual file path.
	# This is where I add to the data.

In your case you might not want to process unless the path also
looks like an xml file.

mine = Miner('myxmldir')

Hmmm... I might be doing too much in __init__. ;)

-- 
Neil Cerutti