Groups > comp.lang.python > #64090 > unrolled thread

Python glob and raw string

Started by	Xaxa Urtiz <urtizvereaxaxa@gmail.com>
First post	2014-01-16 08:49 -0800
Last post	2014-01-16 18:45 +0000
Articles	7 — 4 participants


  Python glob and raw string Xaxa Urtiz <urtizvereaxaxa@gmail.com> - 2014-01-16 08:49 -0800
    Re: Python glob and raw string Xaxa Urtiz <urtizvereaxaxa@gmail.com> - 2014-01-16 10:03 -0800
    Re: Python glob and raw string Neil Cerutti <neilc@norwich.edu> - 2014-01-16 18:14 +0000
      Re: Python glob and raw string Xaxa Urtiz <urtizvereaxaxa@gmail.com> - 2014-01-17 08:45 -0800
        Re: Python glob and raw string Mark Lawrence <breamoreboy@yahoo.co.uk> - 2014-01-17 17:38 +0000
    Re: Python glob and raw string Chris Angelico <rosuav@gmail.com> - 2014-01-17 05:19 +1100
    Re: Python glob and raw string Neil Cerutti <neilc@norwich.edu> - 2014-01-16 18:45 +0000

#64090 — Python glob and raw string

From	Xaxa Urtiz <urtizvereaxaxa@gmail.com>
Date	2014-01-16 08:49 -0800
Subject	Python glob and raw string
Message-ID	<6d3dd7d7-6836-4b55-9d1c-9e70e18b66fd@googlegroups.com>

Hello everybody, I've got a little problem. I've written a script which looks for files in some directories. Typically my folders are organized like this:

[share]
folder1
->20131201
-->file1.xml
-->file2.txt
->20131202
-->file9696009.tmp
-->file421378932.xml
etc.

So basically, in the share I've got some folders (folder1, folder2, ...), and inside each of them I've got folders named after the date (20131201, 20131202, 20131203, etc.). Inside those I want to find all the XML files.
What I've done is iterate over all the folder1/2/3 that I want and, for each one, look for the XML files like this:

for f in glob.glob(dir + r"\20140115\*.xml"):
    yield f

Here dir is folder1/2/3. Everything is OK, but I want to do something like this:

for i in range(10, 16):
    for f in glob.glob(dir + r"\201401{0}\*.xml".format(i)):
        yield f

but glob does not find any files (and of course there are some XML files, and the old way found them).
Any help would be appreciated :)


#64094

From	Xaxa Urtiz <urtizvereaxaxa@gmail.com>
Date	2014-01-16 10:03 -0800
Message-ID	<1d76e596-fa6e-44c2-aefd-391788a4fd0d@googlegroups.com>
In reply to	#64090

On Thursday, 16 January 2014 17:49:57 UTC+1, Xaxa Urtiz wrote:
> for i in range(10, 16):
>     for f in glob.glob(dir + r"\201401{0}\*.xml".format(i)):
>         yield f
>
> but glob does not find any files (and of course there are some
> XML files, and the old way found them).

I feel stupid; it was my mistake. This works:

for i in range(1, 16):
    for f in glob.glob(dir + r"\201401{0:02}\*.xml".format(i)):
        yield f
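
A minimal sketch of why the zero-padded format spec matters, assuming the day folders are always named YYYYMMDD as in the first post. The helper name `day_patterns` and the `month_prefix` parameter are illustrative and not part of the original script. With a bare `{0}`, days 1 to 9 produce seven-character names that match no folder, while `{0:02}` pads them to two digits.

```python
import os


def day_patterns(base, first_day, last_day, month_prefix="201401"):
    """Build one glob pattern per day folder (hypothetical helper)."""
    patterns = []
    for day in range(first_day, last_day + 1):
        # {1:02} pads single-digit days with a leading zero: 1 -> "01".
        folder = "{0}{1:02}".format(month_prefix, day)
        # os.path.join avoids hand-written backslashes; in a non-raw
        # string literal, "\201" would be read as an octal escape.
        patterns.append(os.path.join(base, folder, "*.xml"))
    return patterns


# Unpadded versus padded folder names for day 5:
unpadded = "201401{0}".format(5)     # "2014015", seven characters
padded = "201401{0:02}".format(5)    # "20140105", matches YYYYMMDD
```

Each returned pattern can then be passed to `glob.glob` in the same way as the loop above.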


#64096

From	Neil Cerutti <neilc@norwich.edu>
Date	2014-01-16 18:14 +0000
Message-ID	<mailman.5596.1389896098.18130.python-list@python.org>
In reply to	#64090

On 2014-01-16, Xaxa Urtiz <urtizvereaxaxa@gmail.com> wrote:
> Hello everybody, I've got a little problem. I've written a script
> which looks for files in some directories. Typically my folders
> are organized like this:
>
> [share]
> folder1
> ->20131201
> -->file1.xml
> -->file2.txt
> ->20131202
> -->file9696009.tmp
> -->file421378932.xml
> etc.
>
> [snip]

I've done this two different ways. The simple way is very similar
to what you are now doing. It sucks because I have to manually
maintain the list of subdirectories to traverse every time I
create a new subdir.

Here's the other way, using glob and isdir from os.path, adapted
from actual production code.

import glob
import os


class Miner:
    def __init__(self, archive):
        # setup goes here; prepare to acquire the data
        self.descend(os.path.join(archive, '*'))

    def descend(self, path):
        # Recurse into subdirectories; hand every other path to process().
        for fname in glob.glob(os.path.join(path, '*')):
            if os.path.isdir(fname):
                self.descend(fname)
            else:
                self.process(fname)

    def process(self, path):
        # Do what I want done with an actual file path.
        # This is where I add to the data.
        pass

In your case you might not want to process unless the path also
looks like an xml file.

mine = Miner('myxmldir')

Hmmm... I might be doing too much in __init__. ;)
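
For the XML-only case mentioned above, one possible variation is sketched here as a plain generator built on `os.walk` rather than on recursive `glob` calls. This is a swapped-in technique, not the production code, and the function name `iter_xml_files` is hypothetical.

```python
import os


def iter_xml_files(archive):
    """Yield the path of every .xml file found anywhere under archive."""
    for dirpath, dirnames, filenames in os.walk(archive):
        for name in filenames:
            # Only hand back paths that look like XML files.
            if name.lower().endswith(".xml"):
                yield os.path.join(dirpath, name)
```

It could be used as `for path in iter_xml_files('myxmldir'): ...`, with the per-file work done in the loop body.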

-- 
Neil Cerutti


#64178

From	Xaxa Urtiz <urtizvereaxaxa@gmail.com>
Date	2014-01-17 08:45 -0800
Message-ID	<78b3f119-4b8b-4c4c-b54e-c9078736869c@googlegroups.com>
In reply to	#64096

On Thursday, 16 January 2014 19:14:30 UTC+1, Neil Cerutti wrote:
> Here's the other way, using glob and isdir from os.path, adapted
> from actual production code.
>
> [snip]
>
> In your case you might not want to process unless the path also
> looks like an xml file.

I only have one level of subdirectories. The loop is only for the case where I don't want to process all the dates (otherwise I run a glob on '/*/*'), so there is no need for any recursion.
Thanks for the answer!
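
A minimal sketch of the single-level case described here, assuming the folderN/YYYYMMDD/*.xml layout from the first post. The function names and the `month_prefix` and `days` parameters are illustrative only.

```python
import glob
import os


def all_xml(folder):
    """XML files in every date folder directly under folder; no recursion."""
    return sorted(glob.glob(os.path.join(folder, "*", "*.xml")))


def xml_for_days(folder, month_prefix, days):
    """XML files for selected day folders only, e.g. days=range(1, 16)."""
    found = []
    for day in days:
        # Zero-pad the day so that 5 becomes "05".
        day_folder = "{0}{1:02}".format(month_prefix, day)
        found.extend(glob.glob(os.path.join(folder, day_folder, "*.xml")))
    return sorted(found)
```

The first function covers the "process every date" case with one wildcard pattern; the second keeps the explicit loop for a restricted range of days.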


#64186

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2014-01-17 17:38 +0000
Message-ID	<mailman.5652.1389980335.18130.python-list@python.org>
In reply to	#64178

On 17/01/2014 16:45, Xaxa Urtiz wrote:

[masses of double spaced lines snipped]

Would you please read and action this 
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the 
double line spacing in your posts, thanks.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence


#64097

From	Chris Angelico <rosuav@gmail.com>
Date	2014-01-17 05:19 +1100
Message-ID	<mailman.5597.1389896390.18130.python-list@python.org>
In reply to	#64090

On Fri, Jan 17, 2014 at 5:14 AM, Neil Cerutti <neilc@norwich.edu> wrote:
> class Miner:
>     def __init__(self, archive):
>         # setup goes here; prepare to acquire the data
>         self.descend(os.path.join(archive, '*'))
>
>     def descend(self, path):
>         for fname in glob.glob(os.path.join(path, '*')):
>             if os.path.isdir(fname):
>                 self.descend(fname)
>             else:
>                 self.process(fname)
>
>     def process(self, path):
>         # Do what I want done with an actual file path.
>         # This is where I add to the data.
>
> In your case you might not want to process unless the path also
> looks like an xml file.
>
> mine = Miner('myxmldir')
>
> Hmmm... I might be doing too much in __init__. ;)

Hmm, why is it even a class? :) I guess you elided all the stuff that
makes it impractical to just use a non-class function.

ChrisA


#64098

From	Neil Cerutti <neilc@norwich.edu>
Date	2014-01-16 18:45 +0000
Message-ID	<mailman.5598.1389897976.18130.python-list@python.org>
In reply to	#64090

On 2014-01-16, Chris Angelico <rosuav@gmail.com> wrote:
>> Hmmm... I might be doing too much in __init__. ;)
>
> Hmm, why is it even a class? :) I guess you elided all the
> stuff that makes it impractical to just use a non-class
> function.

I didn't remove anything that makes it obviously class-worthy,
just timestamp checking, and several dicts and sets to store
data.

The original version of that code is just a set of three
functions, but the return result of that version was a single
dict. Once the return value got complicated enough to require
building up a class instance, it became a convenient place to
hang the functions.
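
As a rough illustration of the "three functions returning a single dict" shape described above, here is a hypothetical sketch. The function names, the extension-keyed dict, and everything inside `process` are assumptions made for the example; the original code is not shown in this thread.

```python
import glob
import os


def descend(path, data):
    """Recurse through path, handing every regular file to process()."""
    for fname in glob.glob(os.path.join(path, "*")):
        if os.path.isdir(fname):
            descend(fname, data)
        else:
            process(fname, data)


def process(path, data):
    """Record one file; here the data is simply grouped by extension."""
    ext = os.path.splitext(path)[1].lower()
    data.setdefault(ext, []).append(path)


def mine(archive):
    """Entry point: return the single dict built up during the descent."""
    data = {}
    descend(archive, data)
    return data
```

Once the result needs several dicts and sets instead of one, passing them all through every call gets awkward, which is where a class instance becomes a convenient home for the same functions.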

-- 
Neil Cerutti

