Groups > comp.lang.java.programmer > #38701 > unrolled thread
| Started by | Sandro Leinzinger <leinzinger.sandro@googlemail.com> |
|---|---|
| First post | 2019-01-17 06:19 -0800 |
| Last post | 2019-01-18 07:57 +0100 |
| Articles | 8 — 6 participants |
Massiv creating and copying of files Sandro Leinzinger <leinzinger.sandro@googlemail.com> - 2019-01-17 06:19 -0800
Re: Massiv creating and copying of files Arne Vajhøj <arne@vajhoej.dk> - 2019-01-17 09:30 -0500
Re: Massiv creating and copying of files Eric Sosman <esosman@comcast-dot-net.invalid> - 2019-01-17 11:30 -0500
Re: Massiv creating and copying of files Sandro Leinzinger <leinzinger.sandro@googlemail.com> - 2019-01-17 11:11 -0800
Re: Massiv creating and copying of files Eric Douglas <e.d.programmer@gmail.com> - 2019-01-17 11:24 -0800
Re: Massiv creating and copying of files Eric Sosman <esosman@comcast-dot-net.invalid> - 2019-01-17 17:16 -0500
Re: Massiv creating and copying of files Martin Gregorie <martin@mydomain.invalid> - 2019-01-17 23:33 +0000
Re: Massiv creating and copying of files Marcel Mueller <news.5.maazl@spamgourmet.org> - 2019-01-18 07:57 +0100
| From | Sandro Leinzinger <leinzinger.sandro@googlemail.com> |
|---|---|
| Date | 2019-01-17 06:19 -0800 |
| Subject | Massiv creating and copying of files |
| Message-ID | <7e871fd2-1583-4d9c-a5e6-d72c7bfb3f8c@googlegroups.com> |
I wrote a program that does the following:

1. Create a temp folder.
2. Request a file from a web server and copy it to the temp folder.
3. Generate some files in the temp folder from step 1.
4. Generate an .iso file from the temp folder's content with mkisofs.
5. Zip the whole content (3 GB) of the temp folder.

All these operations happen on the same hard disk array. The problem is that the array gets slower and slower and runs at 100% load even with only 5 threads executing this task.

Can someone suggest an optimisation for this task? Maybe create the files in RAM and then copy the zip to the HDD? But wouldn't that also bring the disks to 100% load?
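For reference, steps 1 and 5 of the pipeline described above might look roughly like this in plain Java. This is a minimal sketch, not the OP's code: the class and method names are hypothetical, the 1 MiB output buffer is an illustrative value, and the archive is assumed to be written outside the folder being zipped.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class TempFolderZipper {
    /** Step 5: zips every regular file under dir into zipFile and returns the number of entries. */
    static int zipFolder(Path dir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Collect the listing first so it is complete before the archive is written.
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(zipFile), 1 << 20))) {
            for (Path file : files) {
                // Entry names are relative to dir and always use '/' as the separator.
                String name = dir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Step 1: a per-job temp folder.
        Path temp = Files.createTempDirectory("job-");
        Files.write(temp.resolve("example.txt"), "placeholder".getBytes());
        // Keep the archive outside the folder being zipped.
        Path zip = Files.createTempFile("job-", ".zip");
        System.out.println(zipFolder(temp, zip) + " file(s) zipped into " + zip);
    }
}
```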
| From | Arne Vajhøj <arne@vajhoej.dk> |
|---|---|
| Date | 2019-01-17 09:30 -0500 |
| Message-ID | <q1q3io$cnk$1@gioia.aioe.org> |
| In reply to | #38701 |
On 1/17/2019 9:19 AM, Sandro Leinzinger wrote:
> I wrote a program which does this:
> [...]
> Can some one tell me some optimisation in this task? Maybe create
> them in ram and then copy the zip to the hdd? But this would also
> bring the disks to 100% of load?

I don't think there is anything Java-specific in this.

It seems to involve:
* writing 3 GB of files to disk
* creating an ISO, which will be another 3 GB
* creating a ZIP, which will probably be another 1.0-1.5 GB

Depending on the I/O system, that may take some time.

And it sounds like all files eventually need to be on disk, so there is no real option for avoiding any writes.

So back to basic optimization:
* ensure that your code reads and writes really large chunks (in Java, not a raw FileXxxxStream but one wrapped in a BufferedXxxxStream)
* ensure that there is plenty of free space on the disk (less than half full may be best) to avoid fragmentation
* ensure that the OS has sufficient memory for I/O buffers

But there will be a hard limit on what optimization can achieve. If the I/O system takes time T to write 7.5 GB, then that is it.

Arne
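A minimal Java sketch of Arne's first point. Files.newInputStream and Files.newOutputStream play the role of the raw file streams; the application still moves data in 8 KiB pieces, but the buffered wrappers turn them into requests of up to 1 MiB (an illustrative size, not a tuned one). The class and method names are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedCopy {
    // Illustrative buffer size for the stream wrappers; not a tuned value.
    static final int BUFFER_SIZE = 1 << 20;

    /** Copies source to target and returns the number of bytes copied. */
    static long copy(Path source, Path target) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source), BUFFER_SIZE);
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target), BUFFER_SIZE)) {
            // Small application-level chunks; the wrappers batch them into
            // requests of up to BUFFER_SIZE bytes for the underlying file streams.
            byte[] chunk = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("src", ".bin");
        Files.write(src, new byte[3 * BUFFER_SIZE + 123]);
        Path dst = Files.createTempFile("dst", ".bin");
        System.out.println(copy(src, dst) + " bytes copied");
    }
}
```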
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Date | 2019-01-17 11:30 -0500 |
| Message-ID | <q1qaim$72q$1@dont-email.me> |
| In reply to | #38701 |
On 1/17/2019 9:19 AM, Sandro Leinzinger wrote:
> I wrote a program which does this:
>
> 1. Create a temp folder
> 2. Requesting a file from a web server and copy to temp folder
> 3. Generating some files in the temp folder from 1.
> 4. Generating a .iso-file from the temp-folder content with mkisofs
> 5. Zipping the whole content (3GB) of temp
>
> All this operations are happening on the same harddisk array. The porblem is, that the array is getting slower and slower and works on 100% even with only 5 Threads executing this task.
>
> Can some one tell me some optimisation in this task? Maybe create them in ram and then copy the zip to the hdd? But this would also bring the disks to 100% of load?
See Arne Vajhøj's response. I'll just add one possibility: If the
destination is truly an "array" of multiple disks, you might gain some
speed by splitting the I/O operations between distinct drives. For
example, if the ISO file and the temp folder are on different drives,
so ISO creation reads from one drive while writing to another, you may
spend less time in disk seeks.
However, if the "array" is already configured as one giant virtual
drive, this strategy is probably inapplicable.
--
esosman@comcast-dot-net.invalid
Seven hundred thirty-four days to go.
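Eric's suggestion could be applied from Java by launching mkisofs with its -o output on a different physical drive from the temp folder it reads. This is a sketch only: the mount points /mnt/disk1 and /mnt/disk2 are hypothetical and assumed to be separate drives, the class name is hypothetical, and only the basic mkisofs -o option is shown; any other options the real job needs are omitted.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class SplitDriveIso {
    /** mkisofs reads sourceDir and writes the image to the file given with -o. */
    static List<String> mkisofsCommand(Path sourceDir, Path isoFile) {
        return List.of("mkisofs", "-o", isoFile.toString(), sourceDir.toString());
    }

    /** Runs mkisofs and returns its exit code; its console output goes to this JVM's console. */
    static int buildIso(Path sourceDir, Path isoFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(mkisofsCommand(sourceDir, isoFile)).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical mount points, assumed to be two separate physical drives:
        // the temp folder is read from one while the ISO is written to the other.
        Path temp = Paths.get("/mnt/disk1/tmp/job-1");
        Path iso = Paths.get("/mnt/disk2/out/job-1.iso");
        System.out.println("mkisofs exited with " + buildIso(temp, iso));
    }
}
```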
| From | Sandro Leinzinger <leinzinger.sandro@googlemail.com> |
|---|---|
| Date | 2019-01-17 11:11 -0800 |
| Message-ID | <2382d05b-5eeb-48d2-8a13-448650eeafe8@googlegroups.com> |
| In reply to | #38703 |
On Thursday, January 17, 2019 at 17:30:25 UTC+1, Eric Sosman wrote:
> On 1/17/2019 9:19 AM, Sandro Leinzinger wrote:
> [...]
> See Arne Vajhøj's response. I'll just add one possibility: If the
> destination is truly an "array" of multiple disks, you might gain some
> speed by splitting the I/O operations between distinct drives. [...]

Thanks for your help... I could speed up the process by creating the files on a RAM disk (4 at the same time) and then writing them to disk. But you are right... I/O takes time and you have to wait :)
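Sandro used an OS-level RAM disk. A rough plain-Java analogue, assuming the generated files fit in the heap: four threads build file contents in memory, and a single writer then puts the finished files on disk one after another, so this program's writes reach the array as a sequential stream. generate() is a hypothetical stand-in for the real step 3, and the class name and output directory are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RamStaging {
    /** Hypothetical generator standing in for step 3; builds one file's content in memory. */
    static byte[] generate(int id) {
        return ("generated content #" + id + "\n").repeat(1000).getBytes(StandardCharsets.UTF_8);
    }

    /** Generates files on 4 threads in RAM, then writes them to outDir one at a time. */
    static void run(int count, Path outDir)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<byte[]>> results = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int id = i;
                Callable<byte[]> task = () -> generate(id);
                results.add(pool.submit(task));
            }
            // Single writer: whole files are written one after another
            // instead of several threads' writes being interleaved.
            for (int i = 0; i < count; i++) {
                Files.write(outDir.resolve("file-" + i + ".dat"), results.get(i).get());
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path out = Files.createTempDirectory("staged-");
        run(20, out);
        System.out.println("wrote 20 files to " + out);
    }
}
```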
| From | Eric Douglas <e.d.programmer@gmail.com> |
|---|---|
| Date | 2019-01-17 11:24 -0800 |
| Message-ID | <0c5852a5-cce3-47e8-b264-ea5ef6301060@googlegroups.com> |
| In reply to | #38704 |
On Thursday, January 17, 2019 at 2:11:54 PM UTC-5, Sandro Leinzinger wrote:
>
> Thanks for your help ... i could boost the process by creating the files in a ram disk (4 at the same time) and then write it to disk. But you are right ... IO takes time and you have to wait :)

The simple answer: while multiple threads can access memory simultaneously, as far as I know we still cannot access a hard disk simultaneously, so multi-threading only makes file writing much faster when each thread accesses a different physical disk.

If all your output must end up on a single disk but you have other physical disks available, you could write to multiple disks and then kick off a post-processor to merge the outputs. That finishes the main work sooner, even though the final output is not completed any faster.

The only other answer is to get a faster disk. Unless you're storing many copies of this output, 3 GB is relatively small, so if purchasing a disk is an option, this could be a relatively cheap solution. I'm looking to get a 970 EVO myself to speed up my personal PC.
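Eric Douglas's write-then-merge idea, sketched in Java under the assumption that each directory in dirs lives on a different physical disk (the class name, file names and layout are hypothetical). Parts are written concurrently, assigned round-robin across the directories, and a post-processor then concatenates them in order into the final file.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MultiDiskWriter {
    /** Writes parts concurrently, assigning them round-robin to the given directories. */
    static List<Path> writeParts(List<byte[]> parts, List<Path> dirs)
            throws InterruptedException, ExecutionException {
        // Pool sized to the number of target disks.
        ExecutorService pool = Executors.newFixedThreadPool(dirs.size());
        try {
            List<Future<Path>> futures = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                Path target = dirs.get(i % dirs.size()).resolve("part-" + i);
                byte[] data = parts.get(i);
                Callable<Path> task = () -> Files.write(target, data);
                futures.add(pool.submit(task));
            }
            List<Path> written = new ArrayList<>();
            for (Future<Path> f : futures) {
                written.add(f.get()); // keeps the original part order
            }
            return written;
        } finally {
            pool.shutdown();
        }
    }

    /** Post-processor: concatenates the parts, in order, into a single file on the final disk. */
    static void merge(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(target), 1 << 20)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-ins; in practice each directory would sit on a different physical disk.
        List<Path> dirs = List.of(Files.createTempDirectory("disk1-"), Files.createTempDirectory("disk2-"));
        List<byte[]> parts = List.of(new byte[1_000_000], new byte[1_000_000], new byte[500_000]);
        Path merged = Files.createTempFile("merged-", ".dat");
        merge(writeParts(parts, dirs), merged);
        System.out.println(Files.size(merged) + " bytes merged into " + merged);
    }
}
```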
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Date | 2019-01-17 17:16 -0500 |
| Message-ID | <q1qurg$gdf$1@dont-email.me> |
| In reply to | #38705 |
On 1/17/2019 2:24 PM, Eric Douglas wrote:
>
> The simple answer, while we have multiple threads which can simultaneously access memory, as far as I know we still cannot access a hard disk simultaneously, so file writing is only much faster with multi-threading, with each thread accessing a different physical disk. [...]
Are there *any* disk controllers nowadays that *don't* accept
multiple requests and attempt to perform them in an optimal order?
https://en.wikipedia.org/wiki/Tagged_Command_Queuing
https://en.wikipedia.org/wiki/Native_Command_Queuing
--
esosman@comcast-dot-net.invalid
Seven hundred thirty-four days to go.
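Command queuing can only help if the program actually has several requests outstanding. One way to arrange that from Java is java.nio.channels.AsynchronousFileChannel: the sketch below starts all positional writes before waiting on any of them. Whether they reach the drive concurrently depends on the JDK and OS (many implementations service them from a thread pool), so treat this as an illustration, not a guaranteed speed-up; the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

class QueuedWrites {
    /**
     * Starts one positional write per chunk before waiting for any of them, so several
     * requests can be outstanding at once. Returns the total number of bytes written.
     */
    static long writeChunks(Path file, List<ByteBuffer> chunks)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            List<Future<Integer>> pending = new ArrayList<>();
            long[] starts = new long[chunks.size()];
            long position = 0;
            for (int i = 0; i < chunks.size(); i++) {
                ByteBuffer chunk = chunks.get(i);
                starts[i] = position;
                // Read the size before the write starts consuming the buffer.
                int size = chunk.remaining();
                pending.add(ch.write(chunk, position));
                position += size;
            }
            for (int i = 0; i < chunks.size(); i++) {
                ByteBuffer chunk = chunks.get(i);
                long next = starts[i] + pending.get(i).get();
                // A write may complete partially; finish the rest at the correct offset.
                while (chunk.hasRemaining()) {
                    next += ch.write(chunk, next).get();
                }
            }
            return position;
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("queued", ".dat");
        List<ByteBuffer> chunks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            chunks.add(ByteBuffer.allocate(1 << 20)); // 8 x 1 MiB of zeros
        }
        System.out.println(writeChunks(file, chunks) + " bytes written to " + file);
    }
}
```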
| From | Martin Gregorie <martin@mydomain.invalid> |
|---|---|
| Date | 2019-01-17 23:33 +0000 |
| Message-ID | <q1r3bi$oa5$1@news.albasani.net> |
| In reply to | #38706 |
On Thu, 17 Jan 2019 17:16:14 -0500, Eric Sosman wrote:

> On 1/17/2019 2:24 PM, Eric Douglas wrote:
>>
>> The simple answer, while we have multiple threads which can
>> simultaneously access memory, as far as I know we still cannot access a
>> hard disk simultaneously, so file writing is only much faster with
>> multi-threading, with each thread accessing a different physical disk.
>> [...]
>
> Are there *any* disk controllers nowadays that *don't* accept
> multiple requests and attempt to perform them in an optimal order?
>
There's another issue, too, which would tend to prevent either the OS or the disk controller from reordering disk accesses: in the setup the OP has described, there are relatively few threads issuing I/O requests, and they all have constraints on ordering.

Thread 1 is writing what appear to be serial files to a holding pool, so even if it's using asynchronous I/O, the majority of its I/O can't be reordered, simply because it's appending data to a single file.

Threads 2-4 are presumably worker threads that work in parallel, each dealing with one of the files received from the web server before taking the next input file from the holding pool. Again, these could work faster if they used async I/O, but they would seem to be constrained to sequential operation. The exception is that each input file appears to give rise to several outputs to the ISO, and these could be handled in parallel if each worker thread spawned a set of subsidiary threads, one for each file being output.

Thread 5 looks as though it will always be single-threaded, because it must first create an ISO image and then compress it into a ZIP archive. AND it can't start until threads 1 to 4 have finished and the ISO image is ready to be read and compressed. Some speed could be gained, though, by splitting it into two threads: 5a building the ISO image from the output of threads 2-4 while 5b compresses it.

It just seems to me that, if the OP's five threads are all that his program uses, there simply aren't enough independently scheduled I/O requesters to allow the disk subsystem, no matter how clever it is, to gain any significant advantage from reordering each drive's I/O request queue. OTOH, if threads 2-4 can be further split into parallel threads, this may help, as would splitting thread 1 into a small set of parallel threads, though of course this will only help if inputs from the web server arrive fast enough to make it likely that several requests will be waiting to be serviced.

Finally, it looks as though thread 5 must always wait until there is enough data to complete the ISO image before it can start to run, so there's very little room to optimise it apart from using separate ISO reader and ZIP writer threads.

--
Martin | martin at Gregorie | gregorie dot org
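Martin's 5a/5b split can be sketched with java.io's PipedInputStream and PipedOutputStream: one thread writes the image into a pipe while the calling thread compresses it into a ZIP, so compression overlaps image building instead of following it. ImageWriter is a hypothetical interface standing in for whatever builds the image, the class name is hypothetical, and the buffer sizes are illustrative.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class PipelinedIsoZip {
    /** Hypothetical stand-in for "thread 5a": anything that writes the image to a stream. */
    interface ImageWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Runs 5a on its own thread and compresses its output (5b) on the calling thread. */
    static void buildAndZip(ImageWriter stage5a, String entryName, Path zipFile)
            throws IOException, InterruptedException {
        PipedInputStream pipeIn = new PipedInputStream(1 << 20); // 1 MiB pipe buffer
        PipedOutputStream pipeOut = new PipedOutputStream(pipeIn);
        IOException[] failure = new IOException[1];
        Thread producer = new Thread(() -> {
            // Closing the pipe signals end-of-stream to the compressor.
            try (OutputStream out = pipeOut) {
                stage5a.writeTo(out);
            } catch (IOException e) {
                failure[0] = e;
            }
        }, "stage-5a");
        producer.start();
        try (InputStream in = pipeIn;
             ZipOutputStream zip = new ZipOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(zipFile), 1 << 20))) {
            zip.putNextEntry(new ZipEntry(entryName));
            in.transferTo(zip); // 5b compresses while 5a is still producing
            zip.closeEntry();
        }
        producer.join(); // join() also makes failure[0] safely visible here
        if (failure[0] != null) {
            throw failure[0];
        }
    }

    public static void main(String[] args) throws Exception {
        Path zip = Files.createTempFile("image", ".zip");
        // Placeholder producer: 10 MB of zeros instead of a real ISO image.
        buildAndZip(out -> out.write(new byte[10_000_000]), "image.iso", zip);
        System.out.println("wrote " + Files.size(zip) + " bytes to " + zip);
    }
}
```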
| From | Marcel Mueller <news.5.maazl@spamgourmet.org> |
|---|---|
| Date | 2019-01-18 07:57 +0100 |
| Message-ID | <q1rtd8$or7$1@gwaiyur.mb-net.net> |
| In reply to | #38701 |
On 17.01.19 at 15:19, Sandro Leinzinger wrote:
> I wrote a program which does this:
>
> 1. Create a temp folder
> 2. Requesting a file from a web server and copy to temp folder
> 3. Generating some files in the temp folder from 1.
> 4. Generating a .iso-file from the temp-folder content with mkisofs
> 5. Zipping the whole content (3GB) of temp
> [...]

Nothing in your question depends on Java.

Traditional single hard disks slow down by roughly a factor of 10 on random access, and random access is likely with concurrent operations. Disk arrays perform a bit better if well configured, as long as the number of concurrent requests is not significantly larger than the number of physical disks.

So if you want to speed things up, you have exactly three options:
1. Reduce the amount of I/O.
2. Prefer linear reads and writes to avoid the slowdown.
3. Use an SSD. SSDs do not show the slowdown on concurrent I/O.

The first can be accomplished by writing fewer temp files, e.g. by switching to stream processing. Currently you are writing all data roughly 4 times (steps 2 to 5). You can eliminate at least step 4 by using a pipe to pass the output of mkisofs on the fly to the compression task. You may further eliminate I/O by passing the result of the web requests directly to your generation code from step 3, if this is possible.

To reduce the impact of concurrent I/O, ensure that all buffers used for disk I/O are at least on the order of 5 MB. When reading or writing 5 MB at once, the slowdown of HDDs typically becomes negligible. This applies to buffered Java InputStreams and OutputStreams as well as to your external applications that read and write data.

Marcel
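Marcel's pipe suggestion, sketched in Java: mkisofs's output is read from the process's stdout and compressed straight into a ZIP, so the ISO image is never written to disk, and the output side uses a buffer of about 5 MB as he recommends. This assumes, as Marcel does, that the ISO image is what gets compressed, and that mkisofs writes the image to standard output when no -o option is given (check your mkisofs/genisoimage manual); the paths, entry name and class name are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StreamingIsoZip {
    // Output buffer on the order Marcel suggests: compressed data reaches the disk
    // in writes of up to about 5 MB.
    static final int BUFFER_SIZE = 5 * 1024 * 1024;

    /** Compresses source straight into a one-entry ZIP; no intermediate file is written. */
    static void streamToZip(InputStream source, String entryName, Path zipFile) throws IOException {
        try (InputStream in = source;
             ZipOutputStream zip = new ZipOutputStream(
                     new BufferedOutputStream(Files.newOutputStream(zipFile), BUFFER_SIZE))) {
            zip.putNextEntry(new ZipEntry(entryName));
            in.transferTo(zip);
            zip.closeEntry();
        }
    }

    /** Runs mkisofs without -o (assumed to mean "image on stdout") and zips that stream. */
    static int isoDirectlyToZip(Path sourceDir, Path zipFile) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("mkisofs", sourceDir.toString())
                .redirectError(ProcessBuilder.Redirect.INHERIT) // progress messages to the console
                .start();
        streamToZip(p.getInputStream(), "image.iso", zipFile);
        return p.waitFor();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical paths.
        int exit = isoDirectlyToZip(Paths.get("/tmp/job-1"), Paths.get("/data/job-1.zip"));
        System.out.println("mkisofs exited with " + exit);
    }
}
```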