| Newsgroups | comp.lang.python |
|---|---|
| Date | 2015-05-02 02:00 -0700 |
| References | <6ba8934e-2f1a-4bcf-b72a-0dd276182ca2@googlegroups.com> |
| Message-ID | <fc6342cd-1bcd-4f66-903b-1e99f8fbd4a1@googlegroups.com> |
| Subject | Re: Fast way of extracting files from various folders |
| From | subhabrata.banerji@gmail.com |
On Friday, May 1, 2015 at 5:58:50 PM UTC+5:30, subhabrat...@gmail.com wrote:
> Dear Group,
>
> I have several million documents in folders and subfolders on my machine.
> I wrote the script below to extract all the .doc files and convert them to text, but it is taking too much time.
>
> import os
> from fnmatch import fnmatch
> import win32com.client
> import zipfile, re
>
> def listallfiles2(n):
>     # n is unused here; kept from the original signature
>     root = r'C:\Cand_Res'
>     pattern = "*.doc"
>     list1 = []
>     for path, subdirs, files in os.walk(root):
>         for name in files:
>             if fnmatch(name, pattern):
>                 file_name1 = os.path.join(path, name)
>                 # EXTRACTING ONLY .DOC FILES (skip anything with .docx in its path)
>                 if ".docx" not in file_name1:
>                     try:
>                         # Open the document through Word's COM interface
>                         doc = win32com.client.GetObject(file_name1)
>                         text = doc.Range().Text
>                         text1 = text.encode('ascii', 'ignore')
>                         text_word = text1.split()
>                         #print "Text for Document File Is:",text1
>                         list1.append(text_word)
>                         print "It is a Doc file"
>                     except Exception:
>                         print "DOC ISSUE"
>     # Return the collected word lists so the caller can use them
>     return list1
>
> It seems to take too much time to convert each file to text and append it to the list. Is there any way I can do this faster? I am using Python 2.7 on Windows 7 Professional Edition. Apologies for any indentation errors.
>
> Could anyone kindly suggest a solution?
>
> Regards,
> Subhabrata Banerjee.
Thanks. You are right, the conversions are what take the time. I will certainly check. The rest is okay. Regards, Subhabrata Banerjee.
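
One way to check where the time goes is to separate the directory walk from the conversion step and time each on its own. The following is only a rough sketch, not the script from the original post: `iter_doc_paths` and `timed_convert` are hypothetical helper names, and `convert` is assumed to be any function that takes a path and returns the extracted text.

```python
import os
import time
from fnmatch import fnmatch


def iter_doc_paths(root, pattern="*.doc"):
    """Yield full paths of files under root whose names match pattern.

    Hypothetical helper: it only walks the tree and opens no files.
    """
    for path, subdirs, files in os.walk(root):
        for name in files:
            if fnmatch(name, pattern):
                yield os.path.join(path, name)


def timed_convert(paths, convert):
    """Call convert(path) for each path and time the whole loop.

    Returns (results, failures, seconds). `convert` is an assumed
    callable supplied by the caller, for example a wrapper around
    the COM call used in the quoted script.
    """
    results = []
    failures = []
    start = time.time()
    for p in paths:
        try:
            results.append(convert(p))
        except Exception:
            # Record the failing path instead of only printing a message
            failures.append(p)
    return results, failures, time.time() - start
```

Timing `list(iter_doc_paths(root))` by itself and then passing that list to `timed_convert` with the Word conversion wrapped in a small function should show whether the walk or the conversion dominates. Keeping the failed paths also makes it possible to retry just those files later.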
Fast way of extracting files from various folders subhabrata.banerji@gmail.com - 2015-05-01 05:28 -0700
Re: Fast way of extracting files from various folders Irmen de Jong <irmen.NOSPAM@xs4all.nl> - 2015-05-01 18:36 +0200
Re: Fast way of extracting files from various folders subhabrata.banerji@gmail.com - 2015-05-02 02:00 -0700
Re: Fast way of extracting files from various folders Peter Otten <__peter__@web.de> - 2015-05-02 11:22 +0200
Re: Fast way of extracting files from various folders subhabrata.banerji@gmail.com - 2015-05-02 03:44 -0700