Fast way of extracting files from various folders

Newsgroups comp.lang.python
Date 2015-05-01 05:28 -0700
Message-ID <6ba8934e-2f1a-4bcf-b72a-0dd276182ca2@googlegroups.com> (permalink)
Subject Fast way of extracting files from various folders
From subhabrata.banerji@gmail.com


Dear Group,

I have several million documents spread across several folders and subfolders on my machine.
I wrote the script below to extract all the .doc files and convert them to text, but it is taking too much time.

import os
from fnmatch import fnmatch
import win32com.client

def listallfiles2(root=r'C:\Cand_Res', pattern="*.doc"):
    """Return a list of word lists, one per .doc file found under root."""
    list1 = []
    for path, subdirs, files in os.walk(root):
        for name in files:
            # fnmatch matches the whole name, so "*.doc" already excludes .docx
            if not fnmatch(name, pattern):
                continue
            file_name1 = os.path.join(path, name)
            try:
                doc = win32com.client.GetObject(file_name1)
                try:
                    text = doc.Range().Text
                finally:
                    # close without saving so documents do not pile up in Word
                    doc.Close(False)
                # drop non-ASCII characters, then split into words
                text1 = text.encode('ascii', 'ignore')
                list1.append(text1.split())
                print("It is a Doc file")
            except Exception:
                print("DOC ISSUE")
    return list1

Converting each file to text and appending it to the list seems to be what takes so long. Is there a way to do this faster? I am using Python 2.7 on Windows 7 Professional Edition. Apologies for any indentation errors.
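To check whether the directory walk or the Word conversion is the slow part, the file search can be timed on its own. The following is only a minimal sketch: the helper names `iter_doc_files` and `time_search` are hypothetical and not part of the script above, and it deliberately leaves out the `win32com` conversion step.

```python
from __future__ import print_function  # keeps the sketch valid on Python 2.7 too

import fnmatch
import os
import time


def iter_doc_files(root, pattern="*.doc"):
    """Yield full paths of files under root whose names match pattern.

    Hypothetical helper: a generator, so paths are produced one at a time
    instead of being collected in a list first.
    """
    for path, subdirs, files in os.walk(root):
        # fnmatch.filter matches whole names, so "*.doc" skips ".docx" files
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(path, name)


def time_search(root, pattern="*.doc"):
    """Return (number of matching files, seconds spent walking the tree)."""
    start = time.time()
    count = sum(1 for _ in iter_doc_files(root, pattern))
    return count, time.time() - start
```

Calling `time_search(r'C:\Cand_Res')` would give the number of .doc files and the time spent on the walk alone. If that time is small compared with the full run, the cost is presumably in opening each document through Word rather than in finding the files.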

I would be grateful if anyone could suggest a solution.

Regards,
Subhabrata Banerjee. 

