


Function for examine content of directory

Started by: Tigerstyle <laddosingh@gmail.com>
First post: 2012-09-06 07:56 -0700
Last post: 2012-09-07 15:21 -0400
Articles 10 — 5 participants



Contents

  Function for examine content of directory Tigerstyle <laddosingh@gmail.com> - 2012-09-06 07:56 -0700
    Re: Function for examine content of directory Ian Foote <ian@feete.org> - 2012-09-06 16:06 +0100
    Re: Function for examine content of directory MRAB <python@mrabarnett.plus.com> - 2012-09-06 16:20 +0100
      Re: Function for examine content of directory Tigerstyle <laddosingh@gmail.com> - 2012-09-06 13:26 -0700
      Re: Function for examine content of directory Tigerstyle <laddosingh@gmail.com> - 2012-09-06 13:26 -0700
    Re: Function for examine content of directory Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-09-06 17:26 -0400
    Re: Function for examine content of directory Chris Angelico <rosuav@gmail.com> - 2012-09-07 07:48 +1000
    Re: Function for examine content of directory Tigerstyle <laddosingh@gmail.com> - 2012-09-07 07:23 -0700
    Re: Function for examine content of directory Tigerstyle <laddosingh@gmail.com> - 2012-09-07 07:28 -0700
      Re: Function for examine content of directory Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-09-07 15:21 -0400

#28606 — Function for examine content of directory

From: Tigerstyle <laddosingh@gmail.com>
Date: 2012-09-06 07:56 -0700
Subject: Function for examine content of directory
Message-ID: <feae1d5a-c477-4e21-8136-c14e38672cb9@googlegroups.com>
Hi guys,

I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
 
This is the code so far:
--
import os

path = "v:\\workspace\\Python2_Homework03\\src\\"
dirs = os.listdir( path )
filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name , ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)

for ext in extensions:
    print("Count for %s: " %ext, extensions.count(ext))
    
--

When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:

this.pdf
the_other.txt
this.doc
that.txt
this.txt
that.pdf
first.txt
that.doc
Count for .pdf:  2
Count for .txt:  4
Count for .doc:  2
Count for .txt:  4
Count for .txt:  4
Count for .pdf:  2
Count for .txt:  4
Count for .doc:  2

Any help is appreciated.

T



#28607

From: Ian Foote <ian@feete.org>
Date: 2012-09-06 16:06 +0100
Message-ID: <mailman.307.1346944006.27098.python-list@python.org>
In reply to: #28606
On 06/09/12 15:56, Tigerstyle wrote:
> Hi guys,
>
> I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
>   
> This is the code so far:
> --
> import os
>
> path = "v:\\workspace\\Python2_Homework03\\src\\"
> dirs = os.listdir( path )
> filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
> extensions = []
Try using a set here instead of a list:
     extensions = set()
> for filename in filenames:
>      f = open(filename, "w")
>      f.write("Some text\n")
>      f.close()
>      name , ext = os.path.splitext(f.name)
>      extensions.append(ext)
and use:
         extensions.add(ext)

This should take care of duplicates for you.

Regards,
Ian
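
A minimal sketch of the set idea, with one caveat: a set has no count() method, so the original print loop would stop working if `extensions` itself became a set. The version below keeps the list for counting and only loops over a set built from it. The filenames are the ones from the original post; sorted() is an addition here, used only to make the output order predictable.

```python
import os

filenames = ["this.txt", "that.txt", "the_other.txt", "this.doc",
             "that.doc", "this.pdf", "first.txt", "that.pdf"]

# Keep every extension in a list so that list.count() still works.
extensions = [os.path.splitext(name)[1] for name in filenames]

# Loop over the distinct extensions only, so each one is reported once.
lines = []
for ext in sorted(set(extensions)):
    lines.append("Count for %s: %d" % (ext, extensions.count(ext)))

print("\n".join(lines))
```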



#28609

From: MRAB <python@mrabarnett.plus.com>
Date: 2012-09-06 16:20 +0100
Message-ID: <mailman.309.1346944827.27098.python-list@python.org>
In reply to: #28606
On 06/09/2012 15:56, Tigerstyle wrote:
> Hi guys,
>
> I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
>
> This is the code so far:
> --
> import os
>
> path = "v:\\workspace\\Python2_Homework03\\src\\"
> dirs = os.listdir( path )
> filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
> extensions = []
> for filename in filenames:
>      f = open(filename, "w")
>      f.write("Some text\n")
>      f.close()
>      name , ext = os.path.splitext(f.name)
>      extensions.append(ext)
>
> # This would print all the files and directories
> for file in dirs:
>      print(file)
>
> for ext in extensions:
>      print("Count for %s: " %ext, extensions.count(ext))
>
> --
>
> When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:
>
> this.pdf
> the_other.txt
> this.doc
> that.txt
> this.txt
> that.pdf
> first.txt
> that.doc
> Count for .pdf:  2
> Count for .txt:  4
> Count for .doc:  2
> Count for .txt:  4
> Count for .txt:  4
> Count for .pdf:  2
> Count for .txt:  4
> Count for .doc:  2
>
That's because each extension can occur multiple times in the list.

Try the Counter class:

from collections import Counter

for ext, count in Counter(extensions).items():
     print("Count for %s: " % ext, count)
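
For reference, a self-contained version of the Counter approach that can be run without touching the disk. It uses the filenames from the original post; most_common() is an optional extra, not part of the suggestion above, that lists the most frequent extension first.

```python
import os
from collections import Counter

filenames = ["this.txt", "that.txt", "the_other.txt", "this.doc",
             "that.doc", "this.pdf", "first.txt", "that.pdf"]

# Counter tallies each distinct extension as it is fed the values.
counts = Counter(os.path.splitext(name)[1] for name in filenames)

# most_common() yields (extension, count) pairs, highest count first.
for ext, count in counts.most_common():
    print("Count for %s: %d" % (ext, count))
```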



#28636

From: Tigerstyle <laddosingh@gmail.com>
Date: 2012-09-06 13:26 -0700
Message-ID: <08938014-e4e0-4f6d-b11a-e843ed4a0da3@googlegroups.com>
In reply to: #28609
Thanks, just what I was looking for :-)

T

On Thursday, 6 September 2012 at 17:20:27 UTC+2, MRAB wrote:
> On 06/09/2012 15:56, Tigerstyle wrote:
> > Hi guys,
> >
> > I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
> >
> > This is the code so far:
> > --
> > import os
> >
> > path = "v:\\workspace\\Python2_Homework03\\src\\"
> > dirs = os.listdir( path )
> > filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
> > extensions = []
> > for filename in filenames:
> >      f = open(filename, "w")
> >      f.write("Some text\n")
> >      f.close()
> >      name , ext = os.path.splitext(f.name)
> >      extensions.append(ext)
> >
> > # This would print all the files and directories
> > for file in dirs:
> >      print(file)
> >
> > for ext in extensions:
> >      print("Count for %s: " %ext, extensions.count(ext))
> >
> > --
> >
> > When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:
> >
> > this.pdf
> > the_other.txt
> > this.doc
> > that.txt
> > this.txt
> > that.pdf
> > first.txt
> > that.doc
> > Count for .pdf:  2
> > Count for .txt:  4
> > Count for .doc:  2
> > Count for .txt:  4
> > Count for .txt:  4
> > Count for .pdf:  2
> > Count for .txt:  4
> > Count for .doc:  2
> >
> That's because each extension can occur multiple times in the list.
>
> Try the Counter class:
>
> from collections import Counter
>
> for ext, count in Counter(extensions).items():
>      print("Count for %s: " % ext, count)





#28644

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2012-09-06 17:26 -0400
Message-ID: <mailman.331.1346967004.27098.python-list@python.org>
In reply to: #28606
On Thu, 6 Sep 2012 07:56:29 -0700 (PDT), Tigerstyle
<laddosingh@gmail.com> declaimed the following in
gmane.comp.python.general:


>     extensions.append(ext)
> 
	Don't append an ext if it is already in the list...

	if ext not in extensions: extensions.append(ext)
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
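
A short sketch of where the membership test leads, under one assumption: the counts are still wanted. Once duplicates are skipped, the guarded list can no longer be counted, so this version keeps a second list with every extension in it. The names `unique_exts` and `all_exts` are made up for the illustration.

```python
import os

filenames = ["this.txt", "that.txt", "the_other.txt", "this.doc",
             "that.doc", "this.pdf", "first.txt", "that.pdf"]

unique_exts = []   # each extension once, in first-seen order
all_exts = []      # every extension, duplicates included

for name in filenames:
    ext = os.path.splitext(name)[1]
    all_exts.append(ext)
    if ext not in unique_exts:      # the guard suggested above
        unique_exts.append(ext)

for ext in unique_exts:
    print("Count for %s: %d" % (ext, all_exts.count(ext)))
```

Unlike a set, the guarded list keeps the order in which the extensions were first seen.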



#28646

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-09-07 07:48 +1000
Message-ID: <mailman.333.1346968110.27098.python-list@python.org>
In reply to: #28606
On Fri, Sep 7, 2012 at 12:56 AM, Tigerstyle <laddosingh@gmail.com> wrote:
> I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)

If you haven't already, look into the Python 'dict' type; you may find
it easier to work with for this sort of job. You can map an extension
("txt") to its count (4) directly.

ChrisA
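
One way the dict idea might look, as a sketch rather than the only way to write it. It uses dict.get() with a default of 0 so that the first sighting of an extension needs no special case. Note that os.path.splitext() keeps the leading dot, so the keys here are ".txt" rather than "txt".

```python
import os

filenames = ["this.txt", "that.txt", "the_other.txt", "this.doc",
             "that.doc", "this.pdf", "first.txt", "that.pdf"]

counts = {}
for name in filenames:
    ext = os.path.splitext(name)[1]
    # get() returns 0 for an extension that has not been seen yet.
    counts[ext] = counts.get(ext, 0) + 1

for ext in sorted(counts):
    print("Count for %s: %d" % (ext, counts[ext]))
```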



#28689

From: Tigerstyle <laddosingh@gmail.com>
Date: 2012-09-07 07:23 -0700
Message-ID: <1eb7dbea-cff5-4038-a441-d22707a4a7de@googlegroups.com>
In reply to: #28606
On Thursday, 6 September 2012 at 16:56:29 UTC+2, Tigerstyle wrote:
> Hi guys,
>
> I'm trying to write a module containing a function to examine the contents of the current working directory and print out a count of how many files have each extension (".txt", ".doc", etc.)
>
> This is the code so far:
> --
> import os
>
> path = "v:\\workspace\\Python2_Homework03\\src\\"
> dirs = os.listdir( path )
> filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
> extensions = []
> for filename in filenames:
>     f = open(filename, "w")
>     f.write("Some text\n")
>     f.close()
>     name , ext = os.path.splitext(f.name)
>     extensions.append(ext)
>
> # This would print all the files and directories
> for file in dirs:
>     print(file)
>
> for ext in extensions:
>     print("Count for %s: " %ext, extensions.count(ext))
>
> --
>
> When I'm trying to get the module to print how many files each extension has, it prints the count of each ext multiple times for each extension type. Like this:
>
> this.pdf
> the_other.txt
> this.doc
> that.txt
> this.txt
> that.pdf
> first.txt
> that.doc
> Count for .pdf:  2
> Count for .txt:  4
> Count for .doc:  2
> Count for .txt:  4
> Count for .txt:  4
> Count for .pdf:  2
> Count for .txt:  4
> Count for .doc:  2
>
> Any help is appreciated.
>
> T



#28690

From: Tigerstyle <laddosingh@gmail.com>
Date: 2012-09-07 07:28 -0700
Message-ID: <c93f63c5-e34f-4747-ab11-c18f9252e997@googlegroups.com>
In reply to: #28606
Ok I'm now totally stuck.

This is the code:

---
import os
from collections import Counter
 
path = ":c\\mypath\dir"
dirs = os.listdir( path )
filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
extensions = []
for filename in filenames:
    f = open(filename, "w")
    f.write("Some text\n")
    f.close()
    name , ext = os.path.splitext(f.name)
    extensions.append(ext)

# This would print all the files and directories
for file in dirs:
    print(file)



for ext, count in Counter(extensions).items(): 
    print("Count for %s: " % ext, count) 

---

I need to make this module into a function and write a separate module to verify by testing that the function gives correct results.

Help and pointers are much appreciated.

T



#28706

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2012-09-07 15:21 -0400
Message-ID: <mailman.367.1347045722.27098.python-list@python.org>
In reply to: #28690
On Fri, 7 Sep 2012 07:28:03 -0700 (PDT), Tigerstyle
<laddosingh@gmail.com> declaimed the following in
gmane.comp.python.general:

> Ok I'm now totally stuck.
> 
> This is the code:
>
	This code is full of errors...
 
> ---
> import os
> from collections import Counter
>  
> path = ":c\\mypath\dir"

	Not a valid Windows path: the colon goes after the drive letter, so
the format should be "c:\mypath\dir". (Actually, to use \ you should
probably declare it a raw string. Much simpler, since the Python/OS
functions don't care, is to use /, as in "c:/mypath/dir".)
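
The three spellings can be compared directly. This is a small sketch that uses the standard ntpath module, which applies Windows path rules on any platform, so it runs the same way on Linux or macOS. The path itself is the placeholder from the post, not a real directory.

```python
import ntpath  # Windows path rules, importable on any platform

escaped = "c:\\mypath\\dir"   # every backslash doubled
raw = r"c:\mypath\dir"        # raw string: backslashes taken literally
forward = "c:/mypath/dir"     # forward slashes

# The escaped and raw spellings are the same string.
print(escaped == raw)

# normpath() converts forward slashes to backslashes under Windows rules.
print(ntpath.normpath(forward) == raw)
```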

> dirs = os.listdir( path )

	Warning, this will also list items that are not files (like
subdirectories). (hence "dirs" is a misleading name)
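
A sketch of one way to leave the subdirectories out, assuming only regular files should be counted. It builds a throwaway directory with tempfile so that it can be run anywhere; the names "notes.txt" and "src" are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # One regular file and one subdirectory.
    open(os.path.join(folder, "notes.txt"), "w").close()
    os.mkdir(os.path.join(folder, "src"))

    entries = sorted(os.listdir(folder))

    # listdir() returns bare names, so join them to the folder before testing.
    files_only = [name for name in entries
                  if os.path.isfile(os.path.join(folder, name))]

print(entries)
print(files_only)
```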


> filenames = {"this.txt", "that.txt", "the_other.txt","this.doc","that.doc","this.pdf","first.txt","that.pdf"}
> extensions = []
> for filename in filenames:
>     f = open(filename, "w")
>     f.write("Some text\n")
>     f.close()
>     name , ext = os.path.splitext(f.name)
>     extensions.append(ext)
> 
> # This would print all the files and directories
> for file in dirs:
>     print(file)

	This prints the file/directory /name/

	NOTE: you grabbed the list of names BEFORE you created your test
data files, so...

> 
> 
> 
> for ext, count in Counter(extensions).items(): 
>     print("Count for %s: " % ext, count) 
>
... this is not really a count of files grouped by extension IN the
directory -- this is only the count based on the file names you defined
to be created.
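
The ordering problem can be shown in a few lines. This is only a sketch, run in a temporary directory so nothing is left behind: a listing taken before the files are written does not include them, while one taken afterwards does. It also writes the files into the directory being listed; a bare open(filename, "w") as in the posted code writes to the current working directory, which is only the same place if the script happens to be run from there.

```python
import os
import tempfile
from collections import Counter

with tempfile.TemporaryDirectory() as folder:
    before = os.listdir(folder)          # taken too early: still empty

    for name in ("this.txt", "that.txt", "this.doc"):
        # Join with the folder so the file lands in the directory we list.
        with open(os.path.join(folder, name), "w") as f:
            f.write("Some text\n")

    after = os.listdir(folder)           # taken after the files exist
    counts = Counter(os.path.splitext(name)[1] for name in after)

print(before)
print(sorted(after))
print(counts)
```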

	I'm not going to create test files, nor a test suite, and what I
have done is still too much... but...

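-=-=-=-=-
import os
import collections

PATH = "e:/userdata/wulfraed/my documents/python progs"

# Names of everything in PATH (files and subdirectories), sorted
fids = sorted(os.listdir(PATH))

# Length of the longest name, used to right-align the first column
nmlen = max(len(f) for f in fids)

# Build the row format; called fmt so the built-in format() is not shadowed
fmt = "%%%ss %%10s" % nmlen

cntr = collections.Counter()

for fid in fids:
    prefix, ext = os.path.splitext(fid)
    # print() with a single argument behaves the same in Python 2 and 3
    print(fmt % (prefix, ext))
    cntr.update([ext])

print("\n\n")

for ext, cnt in cntr.items():
    print("%10s %10s" % (ext, cnt))
-=-=-=-=-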

                  .project          
             .pydevproject          
                 .settings          
                       ABA       .py
                       ADC       .py
                  BookList      .zip
                 CGIServer          
                      DGen       .py
               DiskCatalog       .py
               DiskCatalog      .pyc
                     Dload       .py
                  Firearms      .csv
                    GWhist       .py
                      HTML       .py
                     Hanoi       .py
                     Hanoi      .pyc
                  HierHead       .py
                 Intervals       .py
                 MBX_Split       .py
                 MySQLTest       .py
                 MySQLTest      .pyc
                   MySQLdb     .html
             MySQLdb_files          
                      NIM1       .py
             NumberPrinter       .py
                PhotoFrame       .py
               Probability       .py
               ProgressBar       .py
              ProgressBar2       .py
              RandomScores       .py
                       SQL       .py
                SQLiteTest       .py
                SampleData      .txt
              SampleFormat      .tsv
                   Script1       .py
                   Script2       .py
                   Script3       .py
                   Script3      .pyc
            Sociable_Chain       .py
            Sociable_Chain      .pyc
                    Stereo       .py
                      TAGS       .py
               azel_interp       .py
                    binadd       .py
                   binadd2       .py
                bsddb-test       .py
                   cgiform       .py
                chessclock       .py
                   counter       .py
             counterthread       .py
                        cp       .py
                      data      .txt
              databasetest       .py
             databasetest2       .py
                    dbfail       .py
                       dbg       .py
                       dbg      .pyc
                     dbtst       .py
                   dirwalk       .py
                   execsub       .py
                 extractor       .py
                   filecnt       .py
                    filter       .py
              fulldicttest       .py
                       h2b       .py
                       h2b      .pyc
                   headers       .py
                 highScore       .py
                 htmlparse       .py
                       i2b       .py
                       i2b      .pyc
                   infile1      .tsv
                   infile2      .tsv
                   infile3      .tsv
                   int2wrd       .py
                   int2wrd      .pyc
                  int2wrd2       .py
                  int2wrd2      .pyc
              intervalfile      .txt
                   invoice      .csv
                      junk       .py
                   justify       .py
                linkedlist       .py
                     llist       .py
                      main       .py
             make_ou_class       .py
             make_ou_class      .pyc
                   mileage       .py
                    minmax       .py
                      mofn       .py
                   mofn.py      .zip
                 movefiles       .py
                    moving       .py
                   mptest1       .py
              myhtmlparser       .py
              myhtmlparser      .pyc
                    mytest       .py
                    mytest      .pyc
                      node       .py
                      node      .pyc
                 pcdtojpeg       .py
                       pst       .py
                   queens1       .py
                   queens2       .py
                queens2.py      .zip
                     query       .py
                  railroad       .py
                       rpg       .py
                       run       .py
                         s      .txt
                    sample      .tsv
                  scramble       .py
                   scratch       .db
                   script1     .html
                   script1      .sql
                   script2     .html
    setuptools-0.6c6-py2.4      .egg
                      sgml       .py
                      spam       .py
                   sqltest       .py
                     sqrot       .py
                       src          
                       sub       .py
                    sub_p1       .py
                    sub_p3       .py
                    sudoku       .py
                 sudoku.py      .bak
                    sudoku      .pyc
              summup_dict1          
              summup_dict2          
             summup_dict2b          
              summup_dict3          
               summup_list          
                         t      .dat
                         t       .py
                  tabspace       .py
                  tabspace      .pyc
                   tdriver       .py
                      test      .csd
                      test       .db
                      test      .sql
                      test      .txt
                   testABA       .py
                   testABA      .pyc
                   tgsetup       .py
                    thread       .py
              threadsample       .py
                threadswap       .py
                  timetest       .py
                    timing       .py
                     trips      .dat
                update_log          
                     ut_00       .py
                  wordprob       .py



                   12
      .pyc         17
      .bak          1
      .sql          2
      .tsv          5
      .csv          2
       .db          2
      .dat          2
       .py         98
      .txt          5
     .html          3
      .csd          1
      .egg          1
      .zip          3
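
The "%%%ss %%10s" % nmlen line in the script is the least obvious part, so here is a small sketch of what it does. Each %% collapses to a literal %, and %s takes the width, which leaves a second format string to be used on each row. The three names are picked from the listing above; the "%*s" form at the end is an alternative spelling, not what the script uses.

```python
names = ["ABA", "DiskCatalog", "setuptools-0.6c6-py2.4"]

# Width of the longest name decides the first column.
width = max(len(n) for n in names)

# First pass: "%%" becomes "%", "%s" becomes the width.
fmt = "%%%ss %%10s" % width
print(fmt)

# Second pass: the generated format right-aligns both columns.
row = fmt % ("ABA", ".py")
print(row)

# "%*s" reads the width from the argument tuple and gives the same row.
same_row = "%*s %10s" % (width, "ABA", ".py")
print(row == same_row)
```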
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
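
For the "function plus test module" part of the question, a hedged sketch of one possible shape follows. The function name count_extensions, its default argument and the test class are all inventions for this example, and skipping subdirectories is an assumption about what the homework wants. Both halves are shown in one block so it runs as is; in practice the test class would sit in its own file and import the function.

```python
import os
import tempfile
import unittest
from collections import Counter


def count_extensions(path="."):
    """Return a Counter mapping each file extension in path to its count."""
    counts = Counter()
    for name in os.listdir(path):
        # Assumption: only regular files count, subdirectories are skipped.
        if os.path.isfile(os.path.join(path, name)):
            counts[os.path.splitext(name)[1]] += 1
    return counts


class CountExtensionsTest(unittest.TestCase):
    def setUp(self):
        # A fresh directory per test, removed again afterwards.
        self.tmp = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmp.cleanup)
        for name in ("this.txt", "that.txt", "this.doc", "README"):
            with open(os.path.join(self.tmp.name, name), "w") as f:
                f.write("Some text\n")
        # A directory whose name looks like a text file; it must not be counted.
        os.mkdir(os.path.join(self.tmp.name, "sub.txt"))

    def test_counts_files_by_extension(self):
        self.assertEqual(count_extensions(self.tmp.name),
                         {".txt": 2, ".doc": 1, "": 1})
```

Saved to a file, the test can be run with "python -m unittest" followed by the file name. A file with no extension, such as README, is counted under the empty string, which matches the blank first row in the counts above.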


