Groups > comp.lang.python > #71118 > unrolled thread
| Started by | Andrew McLean <lists@andros.org.uk> |
|---|---|
| First post | 2014-05-08 19:55 +0100 |
| Last post | 2014-05-08 22:56 +0300 |
| Articles | 2 — 2 participants |
Real-world use of concurrent.futures Andrew McLean <lists@andros.org.uk> - 2014-05-08 19:55 +0100
Re: Real-world use of concurrent.futures Marko Rauhamaa <marko@pacujo.net> - 2014-05-08 22:56 +0300
| From | Andrew McLean <lists@andros.org.uk> |
|---|---|
| Date | 2014-05-08 19:55 +0100 |
| Subject | Real-world use of concurrent.futures |
| Message-ID | <mailman.9790.1399575348.18130.python-list@python.org> |
I have a problem that would benefit from a multithreaded implementation, and I am having trouble understanding how to approach it using concurrent.futures.

The details don't really matter, but it will probably help to be explicit. I have a large CSV file that contains a lot of fields, amongst them one containing email addresses. I want to write a program that validates the email addresses by checking that the domain names have a valid MX record. The output will be a copy of the file with any invalid email addresses removed. Because of latency in the DNS lookup this could benefit from multithreading.

I have written similar code in the past using explicit threads communicating via queues. For this example, I could have a thread that reads the file using csv.DictReader, putting dicts containing records from the input file into a (finite length) queue. Then I would have a number of worker threads reading the queue, performing the validation and putting validated results in a second queue. A final thread would read from the second queue, writing the results to the output file.

So far so good. However, I thought this would be an opportunity to explore concurrent.futures and to see whether it offered any benefits over the more explicit approach discussed above. The problem I am having is that all the discussions I can find of the use of concurrent.futures show use with toy problems involving just a few tasks. The URL downloader in the documentation is typical; it proceeds as follows:

1. Get an instance of concurrent.futures.ThreadPoolExecutor
2. Submit a few tasks to the executor
3. Iterate over the results using concurrent.futures.as_completed

That's fine, but I suspect that isn't a helpful pattern if I have a very large number of tasks. In my case I could run out of memory if I tried submitting all of the tasks to the executor before processing any of the results.

I'm guessing what I want to do is submit tasks in batches of perhaps a few hundred, iterate over the results until most are complete, then submit some more tasks and so on. I'm struggling to see how to do this elegantly without a lot of messy code just there to do "bookkeeping".

This can't be an uncommon scenario. Am I missing something, or is this just not a job suitable for futures?

Regards,

Andrew
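
The batch idea in the post above might look roughly like the following. This is only a minimal sketch under stated assumptions, not code from the thread: `has_valid_mx` is a hypothetical stand-in that performs no DNS lookup, the `"email"` field name is assumed, and the helper names `batches` and `filter_valid` are invented for illustration. It is also simpler than what the post describes, because it waits for a whole batch to finish rather than "most" of it before submitting the next one.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def has_valid_mx(email):
    """Hypothetical stand-in for a real MX lookup (no DNS is performed)."""
    domain = email.rpartition("@")[2]
    return domain.endswith(".example")


def batches(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch


def filter_valid(records, batch_size=200, max_workers=20):
    """Yield records whose 'email' field passes has_valid_mx, in input order.

    Only one batch of records is held in memory at a time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for batch in batches(records, batch_size):
            emails = [record["email"] for record in batch]
            # executor.map returns results in the order of its inputs.
            for record, ok in zip(batch, executor.map(has_valid_mx, emails)):
                if ok:
                    yield record
```

Because `filter_valid` is a generator, it could be fed directly from a `csv.DictReader` and drained into a `csv.DictWriter`, so neither the input nor the output has to fit in memory. The cost of this simplicity is that the pool sits partly idle while the slowest lookup in each batch finishes.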
| From | Marko Rauhamaa <marko@pacujo.net> |
|---|---|
| Date | 2014-05-08 22:56 +0300 |
| Message-ID | <87ppjot57x.fsf@elektro.pacujo.net> |
| In reply to | #71118 |
Andrew McLean <lists@andros.org.uk>:

> That's fine, but I suspect that isn't a helpful pattern if I have a
> very large number of tasks. In my case I could run out of memory if I
> tried submitting all of the tasks to the executor before processing
> any of the results.

This is related to flow control. You'll need an object for each flow (transaction). When new work comes in from the network, you'll have to see if you are hitting the maximum number of pending transactions, and not start another one before previous transactions have been processed. Whenever a transaction is completed, you pull in more work.

Marko
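
One way to read the flow-control advice above in terms of concurrent.futures is to cap the number of futures in flight and only submit a new task once an earlier one has completed. The sketch below is an illustration under assumptions rather than anything posted in the thread: the function name `bounded_run` and the `max_pending` parameter are hypothetical, while `ThreadPoolExecutor`, `wait` and `FIRST_COMPLETED` are the standard library's own.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def bounded_run(func, items, max_pending=100, max_workers=20):
    """Yield (item, result) pairs as tasks finish.

    At most `max_pending` tasks are submitted and not yet consumed at
    any time, so `items` can be an arbitrarily long iterator.
    """
    pending = {}  # maps each in-flight future to the item it was given
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for item in items:
            # At the limit: block until at least one task completes.
            if len(pending) >= max_pending:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    yield pending.pop(future), future.result()
            pending[executor.submit(func, item)] = item
        # Input exhausted: drain whatever is still in flight.
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                yield pending.pop(future), future.result()
```

Results come back in completion order, not input order, so a caller that needs the output file to match the input order would have to carry a row index with each item and reorder, or accept a reshuffled file. `future.result()` re-raises any exception raised inside `func`, which here would end the generator; a real program would probably want to catch lookup failures per item.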