

Newsgroup: comp.lang.python, article #68199 (unrolled thread)

querry on queue ( thread safe ) multithreading

Started by: Jaiprakash Singh <jaiprakash@wisepromo.com>
First post: 2014-03-10 22:52 -0700
Last post: 2014-03-11 10:18 -0700
Articles: 2 (2 participants)



Contents

  querry  on queue ( thread safe ) multithreading Jaiprakash Singh <jaiprakash@wisepromo.com> - 2014-03-10 22:52 -0700
    Re: querry  on queue ( thread safe ) multithreading Jim Gibson <jimsgibson@gmail.com> - 2014-03-11 10:18 -0700

#68199 — querry on queue ( thread safe ) multithreading

From: Jaiprakash Singh <jaiprakash@wisepromo.com>
Date: 2014-03-10 22:52 -0700
Subject: querry on queue ( thread safe ) multithreading
Message-ID: <de667c72-c1f7-449e-86cf-ce8c6818b87d@googlegroups.com>

Hey, I am working on scraping a site, so I am using multithreading.
I wrote code based on a (thread-safe) queue, but my code still blocks after some time. I have searched a lot but have been unable to resolve it. Please help; I am stuck here.

My code is below:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

import subprocess
import multiprocessing
import logging
from scrapy import cmdline
import time

logging.basicConfig(level=logging.DEBUG,
                    format='[%(levelname)s] (%(threadName)-10s) %(message)s',)


num_fetch_threads = 150
enclosure_queue = multiprocessing.JoinableQueue()



def main3(i, q):
    for pth in iter(q.get, None):
        try:
            cmdline.execute(['scrapy',  'runspider',   'page3_second_scrapy_flipkart.py',  '-a',  'pth=%s' %(pth)])
            print pth
        except:
            pass

        time.sleep(i + 2)
        q.task_done()

    q.task_done()




def main2(output):
    procs = []

    for i in range(num_fetch_threads):
        procs.append(multiprocessing.Process(target=main3, args=(i, enclosure_queue,)))
        #worker.setDaemon(True)
        procs[-1].start()

    for pth in output:
        enclosure_queue.put(pth)

    print '*** Main thread waiting'
    enclosure_queue.join()
    print '*** Done'

    for p in procs:
        enclosure_queue.put(None)

    enclosure_queue.join()

    for p in procs:
        p.join()
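
A minimal sketch of a more robust version of this pattern, written in Python 3 rather than the poster's Python 2, and offered under assumptions rather than as a confirmed diagnosis. One plausible culprit is calling `scrapy.cmdline.execute` over and over inside long-lived workers: it is the entry point of the `scrapy` command, finishes with `sys.exit()`, and runs Twisted's reactor, which cannot be restarted in the same process, while the bare `except: pass` hides any resulting error. The sketch instead runs each crawl as its own `scrapy runspider` process through `subprocess` (imported above but never used). Because the real work happens in those child processes, a few ordinary threads with `queue.Queue` are enough to drive them; the posted code, by contrast, starts 150 `multiprocessing.Process` workers. The names `scrapy_cmd`, `worker` and `run_all`, and the default of 4 workers, are hypothetical.

```python
import queue
import subprocess
import threading


def scrapy_cmd(pth):
    # Hypothetical helper: rebuilds the command line used in the original post.
    return ['scrapy', 'runspider', 'page3_second_scrapy_flipkart.py',
            '-a', 'pth=%s' % pth]


def worker(jobs, results, make_cmd):
    """Run one external command per job until the None sentinel arrives."""
    for item in iter(jobs.get, None):
        try:
            # Each crawl runs in its own OS process, so nothing has to be
            # restarted inside this interpreter between jobs.
            results.put((item, subprocess.call(make_cmd(item))))
        except Exception as exc:
            # Record the failure instead of silently discarding it.
            results.put((item, exc))
        finally:
            # Always mark the job done, or jobs.join() would wait forever.
            jobs.task_done()
    jobs.task_done()  # account for the sentinel itself


def run_all(items, make_cmd, num_workers=4):
    """Process every item with a small, fixed pool of worker threads."""
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(jobs, results, make_cmd))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        jobs.put(item)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker, queued after the real jobs
    jobs.join()  # returns once every job and every sentinel is marked done
    for t in threads:
        t.join()
    # All workers have exited, so draining the results queue is safe now.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Usage would be something like `statuses = run_all(output, scrapy_cmd)`. Each entry is `(pth, exit_code)` or `(pth, exception)`, so a nonzero exit code marks a failed crawl instead of being silently discarded.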


 

[toc] | [next] | [standalone]


#68232

From: Jim Gibson <jimsgibson@gmail.com>
Date: 2014-03-11 10:18 -0700
Message-ID: <110320141018377289%jimsgibson@gmail.com>
In reply to: #68199

In article <de667c72-c1f7-449e-86cf-ce8c6818b87d@googlegroups.com>,
Jaiprakash Singh <jaiprakash@wisepromo.com> wrote:

> hey i am working on scraping a site , so  i am using multi-threading concept.
> i wrote a code based on queue (thread safe) but still my code block out after
> sometime, please help , i have searched a lot but unable to resolve it.
> please help i stuck here.

Do you really want to subject the web server to 150 simultaneous
requests? Some would consider that a denial-of-service attack.

When I scrape a site, and I have been doing that occasionally of late,
I put a 10-second sleep after each HTTP request. That makes my program
more considerate of other people's resources and a better web citizen.
It is also much easier to program.

-- 
Jim Gibson
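
The sequential approach described in the reply (one request at a time, a 10-second sleep after each) can be sketched as follows. This is an illustration under assumptions, not Jim's actual code: it uses the standard library's `urllib.request` instead of Scrapy, and the names `polite_fetch_all` and `_default_fetch`, the injectable `fetch`/`sleep` parameters, and the 30-second timeout are hypothetical choices made so the loop is easy to read and test.

```python
import time
import urllib.request


def _default_fetch(url):
    # Plain stdlib HTTP GET; the 30-second timeout is an arbitrary choice.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def polite_fetch_all(urls, delay=10.0, fetch=_default_fetch, sleep=time.sleep):
    """Fetch URLs strictly one at a time, pausing `delay` seconds after each."""
    pages = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            pages[url] = exc
        sleep(delay)  # pause after every request, whether it succeeded or not
    return pages
```

At one request every 10 seconds, 150 pages take 1500 seconds (25 minutes): slower than 150 parallel workers, but far lighter on the server and with no shared queue that can hang.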
