comp.lang.python, thread #68199 (unrolled)
| Started by | Jaiprakash Singh <jaiprakash@wisepromo.com> |
|---|---|
| First post | 2014-03-10 22:52 -0700 |
| Last post | 2014-03-11 10:18 -0700 |
| Articles | 2 — 2 participants |
querry on queue ( thread safe ) multithreading Jaiprakash Singh <jaiprakash@wisepromo.com> - 2014-03-10 22:52 -0700
Re: querry on queue ( thread safe ) multithreading Jim Gibson <jimsgibson@gmail.com> - 2014-03-11 10:18 -0700
| From | Jaiprakash Singh <jaiprakash@wisepromo.com> |
|---|---|
| Date | 2014-03-10 22:52 -0700 |
| Subject | querry on queue ( thread safe ) multithreading |
| Message-ID | <de667c72-c1f7-449e-86cf-ce8c6818b87d@googlegroups.com> |
Hey, I am working on scraping a site, so I am using the multithreading concept.
I wrote code based on a queue (which is thread-safe), but my code still blocks after some time. Please help; I have searched a lot but have been unable to resolve it. I am stuck here.
My code is below:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
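import logging
import multiprocessing
import subprocess
import time

# These are worker processes, so log the process name, not the thread name.
logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(processName)-10s) %(message)s')

num_fetch_threads = 150  # number of worker processes (up to 150 concurrent scrapy runs)
enclosure_queue = multiprocessing.JoinableQueue()

def main3(i, q):
    # Consume paths until the None sentinel arrives.
    for pth in iter(q.get, None):
        try:
            # Run each spider in its own child process. scrapy's cmdline.execute()
            # ends by calling sys.exit() and starts a Twisted reactor that cannot
            # be restarted, so it should not be called repeatedly in one process.
            subprocess.call(['scrapy', 'runspider', 'page3_second_scrapy_flipkart.py',
                             '-a', 'pth=%s' % pth])
            print(pth)
        except Exception:
            logging.exception('failed on %s', pth)  # report errors instead of hiding them
        finally:
            q.task_done()  # exactly one task_done() per item, or join() waits forever
        time.sleep(i + 2)  # staggered pause: worker i waits i + 2 seconds between jobs
    q.task_done()  # acknowledge the None sentinel itself

def main2(output):
    procs = []
    for i in range(num_fetch_threads):
        procs.append(multiprocessing.Process(target=main3, args=(i, enclosure_queue)))
        procs[-1].start()
    for pth in output:
        enclosure_queue.put(pth)
    print('*** Main thread waiting')
    enclosure_queue.join()
    print('*** Done')
    for p in procs:
        enclosure_queue.put(None)  # one sentinel per worker
    enclosure_queue.join()
    for p in procs:
        p.join()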
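The code above depends on every `get()` being matched by exactly one `task_done()`, including the `None` sentinels; if any item goes unacknowledged, `join()` blocks forever. A minimal sketch of that bookkeeping in isolation, assuming threads and `queue.Queue` (a swapped-in choice; it follows the same `put`/`get`/`task_done`/`join` contract as `multiprocessing.JoinableQueue`). The names `run`, `worker` and `handle` are hypothetical; `handle` stands in for one scrapy run.

```python
import queue
import threading

def worker(q, handle, results, lock):
    # Consume items until this worker receives its None sentinel.
    for item in iter(q.get, None):
        try:
            value = handle(item)
            with lock:
                results.append(value)
        except Exception as exc:
            # Record the failure instead of silently swallowing it.
            with lock:
                results.append(('error', item, repr(exc)))
        finally:
            q.task_done()  # exactly one task_done() per get(), even on error
    q.task_done()  # account for the sentinel itself

def run(items, handle, num_workers=4):
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, handle, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:
        q.put(None)  # one sentinel per worker, queued after all real items
    q.join()  # returns once every put() has a matching task_done()
    for t in threads:
        t.join()
    return results
```

For example, `run(range(10), lambda x: x * x, num_workers=3)` returns the ten squares in completion order; an exception in `handle` shows up as an `('error', item, ...)` tuple rather than a hang.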
| From | Jim Gibson <jimsgibson@gmail.com> |
|---|---|
| Date | 2014-03-11 10:18 -0700 |
| Message-ID | <110320141018377289%jimsgibson@gmail.com> |
| In reply to | #68199 |
In article <de667c72-c1f7-449e-86cf-ce8c6818b87d@googlegroups.com>, Jaiprakash Singh <jaiprakash@wisepromo.com> wrote:

> hey i am working on scraping a site , so i am using multi-threading concept.
> i wrote a code based on queue (thread safe) but still my code block out after
> sometime, please help , i have searched a lot but unable to resolve it.
> please help i stuck here.

Do you really want to subject the web server to 150 simultaneous requests? Some would consider that a denial-of-service attack.

When I scrape a site, and I have been doing that occasionally of late, I put a 10-second sleep after each HTTP request. That makes my program more considerate of other people's resources and a better web citizen. It is also much easier to program.

-- Jim Gibson
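The one-request-at-a-time approach described above can be sketched as a plain loop with a fixed pause between requests. This is a minimal sketch, not Jim's actual code: `polite_crawl` and its `fetch` callable are hypothetical, and the `sleep` parameter is injectable only so the pacing can be checked without waiting.

```python
import time

def polite_crawl(urls, fetch, delay=10.0, sleep=time.sleep):
    """Fetch URLs sequentially, pausing `delay` seconds between requests."""
    results = []
    for n, url in enumerate(urls):
        if n:
            sleep(delay)  # wait after the previous request before sending the next
        results.append(fetch(url))
    return results
```

In real use `fetch` might wrap `urllib.request.urlopen`, e.g. `polite_crawl(urls, lambda u: urlopen(u).read())`. With no threads or processes there is no queue bookkeeping that can deadlock.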