From: Dennis Lee Bieber
To: python-list@python.org
Subject: Re: Checking Common File Types
Date: Sun, 01 Dec 2013 18:23:22 -0500
Newsgroups: comp.lang.python

On Sun, 1 Dec 2013 18:27:16 +0000, jade declaimed the following:

>Hello,
>I'm trying to create a script that checks all the files in my 'downloaded' directory against common file types and then tells me how many of the files in that directory aren't either a GIF or a JPG file. I'm familiar with basic Python but this is the first time I've attempted anything like this and I'm looking for a little help or a point in the right direction?
>
>file_sigs = {'\xFF\xD8\xFF':('JPEG','jpg'), '\x47\x49\x46':('GIF','gif')}

Apparently you presume the file extensions are inaccurate, as you are digging into the files for signatures.

>def readFile():
>    filename = r'c:/temp/downloads'
>    fh = open(filename, 'r')
>    file_sig = fh.read(4)
>    print '[*] check_sig() File:',filename #, 'Hash Sig:', binascii.hexlify(file_sig)

Note: if you are hardcoding forward slashes, you don't need the raw indicator...

That said, what is "c:/temp/downloads"? You apparently are opening IT as the file to be examined. Is it supposed to be a directory containing many files, a file containing a list of files, ???

What is "check_sig" -- it looks like a function you haven't defined -- but it's inside the quotes, so it is just part of a string literal and will never be called anyway.
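A minimal sketch of the signature check the original poster seems to be after, written for Python 3 (the interactive session in this post is Python 2). It reuses the JPEG and GIF prefixes from the quoted file_sigs dict, but as bytes; the name sniff_type is hypothetical. The file is opened in binary mode ('rb'), and because the signatures are only three bytes long, the four bytes read are tested with startswith() rather than looked up as dictionary keys.

```python
# Magic-number prefixes from the quoted file_sigs dict, as bytes (Python 3)
FILE_SIGS = {
    b"\xFF\xD8\xFF": ("JPEG", "jpg"),
    b"\x47\x49\x46": ("GIF", "gif"),  # same bytes as b"GIF"
}


def sniff_type(path):
    """Return the (name, extension) tuple whose signature starts the file,
    or None if no known signature matches.  (Hypothetical helper.)"""
    # Binary mode: signatures are raw bytes, not text
    with open(path, "rb") as fh:
        header = fh.read(4)
    # Signatures are shorter than the bytes read, so compare prefixes
    for sig, info in FILE_SIGS.items():
        if header.startswith(sig):
            return info
    return None
```

Called on each regular file (not on the directory itself), a result of None marks a file that is neither GIF nor JPEG by content, whatever its extension claims.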
If you are just concerned with one directory of files, you might want to read the help file on the glob module, along with os.path (join/splitext/etc). Or just string methods...

>>> import glob
>>> import os.path
>>> TARGET = os.path.join(os.environ["USERPROFILE"],
...                       "documents/BW-conversion/*")
>>> files = glob.glob(TARGET)
>>> for fn in files:
...     fp, fx = os.path.splitext(fn)
...     print "File %s purports to be of type %s" % (fn, fx.upper())
...
File C:\Users\Wulfraed\documents/BW-conversion\BW-1.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\BW-2.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\BW-3.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\BW-4.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\BWConv.html purports to be of type .HTML
File C:\Users\Wulfraed\documents/BW-conversion\roo_b1.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\roo_b2.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\roo_b3.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\roo_b4.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\roo_b5.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\roo_b6.jpg purports to be of type .JPG
File C:\Users\Wulfraed\documents/BW-conversion\roo_col.jpg purports to be of type .JPG
>>>

-- 
    Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
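Following the glob/os.path suggestion, here is a minimal Python 3 sketch that answers the original question ("how many files aren't a GIF or a JPG") by extension alone, trusting what each file name purports to be. The name count_non_images and the set of accepted extensions (including .jpeg) are assumptions for illustration.

```python
import glob
import os.path

# Extensions treated as GIF or JPEG (assumption: .jpeg counts as JPEG too)
IMAGE_EXTS = {".gif", ".jpg", ".jpeg"}


def count_non_images(directory):
    """Count regular files in `directory` whose extension is not a GIF/JPEG one.
    (Hypothetical helper; extension-based only, file contents are not read.)"""
    count = 0
    # Note: glob's "*" does not match names starting with a dot
    for fn in glob.glob(os.path.join(directory, "*")):
        if not os.path.isfile(fn):
            continue  # skip subdirectories
        ext = os.path.splitext(fn)[1].lower()  # case-insensitive: .JPG == .jpg
        if ext not in IMAGE_EXTS:
            count += 1
    return count
```

For the original poster's case that would be something like count_non_images("c:/temp/downloads"), assuming that path really is the downloads directory rather than a file. If the extensions can't be trusted, the extension test would be replaced by a check of each file's leading bytes.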