Groups > comp.lang.ruby > #4352 > unrolled thread

Optimize write of large file

Started by: "Yoann M." <yoann6@gmail.com>
First post: 2011-05-12 07:58 -0500
Last post: 2011-05-13 02:21 -0500
Articles: 7, 4 participants


Contents

  Optimize write of large file "Yoann M." <yoann6@gmail.com> - 2011-05-12 07:58 -0500
    Re: Optimize write of large file Markus Schirp <mbj@seonic.net> - 2011-05-12 08:17 -0500
    Re: Optimize write of large file Jeremy Bopp <jeremy@bopp.net> - 2011-05-12 09:19 -0500
    Re: Optimize write of large file "Yoann M." <yoann6@gmail.com> - 2011-05-12 10:07 -0500
      Re: Optimize write of large file Robert Klemme <shortcutter@googlemail.com> - 2011-05-12 10:25 -0500
      Re: Optimize write of large file Markus Schirp <mbj@seonic.net> - 2011-05-12 10:51 -0500
    Re: Optimize write of large file "Yoann M." <yoann6@gmail.com> - 2011-05-13 02:21 -0500

#4352 — Optimize write of large file

From: "Yoann M." <yoann6@gmail.com>
Date: 2011-05-12 07:58 -0500
Subject: Optimize write of large file
Message-ID: <f93da777afad73da30f77526cdcab9ee@ruby-forum.com>

Hello,
I have data to process and write into files progressively. The data
files end up very large, but I append only small strings to them. I
suppose buffering the strings before appending to the file would be
faster. I don't need the files to be written before the end of the whole
process (i.e. I don't read their content in the meantime).

I've searched for information about how File buffers its data, but it
seems we cannot configure anything about this. Did I miss something?
My first idea was to buffer everything myself, appending lines to a
string or an array of strings and writing once I reach a big enough
amount of data. But if File uses a buffer anyway, that would be a waste
of time, I suppose?
Do you have any advice for optimizing the writing of large files?
Thanks!

-- 
Posted via http://www.ruby-forum.com/.


#4355

From: Markus Schirp <mbj@seonic.net>
Date: 2011-05-12 08:17 -0500
Message-ID: <20110512131739.GA6897@mbj>
In reply to: #4352

Hi,

Ruby, glibc, the kernel, etc. are already doing buffering.
There is usually no need for explicit buffering in Ruby.

You can test this for yourself: try writing the same string a
million times in a loop. Not every write triggers a disk transaction.
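
A minimal sketch of that test, assuming a scratch file created with Tempfile. The payload, the count, and the variable names are illustrative and not taken from the thread:

```ruby
require 'benchmark'
require 'tempfile'

line  = "some small string\n"  # illustrative payload
count = 1_000_000

file_size = nil
elapsed   = nil

Tempfile.create('write_test') do |file|
  elapsed = Benchmark.realtime do
    count.times { file.write(line) }  # each call lands in Ruby's IO buffer first
    file.flush                        # hand whatever is still buffered to the OS
  end
  file_size = File.size(file.path)
end

puts format('%d writes, %d bytes, %.3fs', count, file_size, elapsed)
```

If every one of the million calls were a separate disk transaction, the loop would take far longer than it does in practice.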

Regards,

Markus 



#4365

From: Jeremy Bopp <jeremy@bopp.net>
Date: 2011-05-12 09:19 -0500
Message-ID: <4DCBEC4A.9020707@bopp.net>
In reply to: #4352

As mentioned, the file writes are already being buffered by lower
layers; however, if you are closing and reopening the files throughout
your processing, the buffers aren't helping you much.  Try to ensure
that you open each file only once and keep those file references around
to use until you know you're permanently done writing to each one.
Unless you have a large number of files to open, you shouldn't have to
worry about resource constraints on the number of concurrently open files.
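
One way to follow this advice is to open each file lazily on first use and cache the handle. The sketch below assumes a temporary directory and two made-up file names; none of it comes from the original poster's code:

```ruby
require 'tmpdir'

even_lines = nil

Dir.mktmpdir do |dir|
  # Open each file at most once, on first use, and cache the handle.
  handles = Hash.new do |cache, name|
    cache[name] = File.open(File.join(dir, name), 'a')
  end

  begin
    1_000.times do |i|
      name = i.even? ? 'even.txt' : 'odd.txt'  # hypothetical file names
      handles[name] << "#{i}\n"                # reuses the already open handle
    end
  ensure
    handles.each_value(&:close)                # close only when completely done
  end

  even_lines = File.readlines(File.join(dir, 'even.txt')).size
end

puts even_lines
```

Each file is opened once and closed once, so the buffers of the lower layers stay useful across all the small appends.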

-Jeremy


#4374

From: "Yoann M." <yoann6@gmail.com>
Date: 2011-05-12 10:07 -0500
Message-ID: <c43cd613ab24e8cf05e03bc756926032@ruby-forum.com>
In reply to: #4352

You're right, buffering the data myself does not make it faster. For
writing 10 million lines with an array of strings, a single string, and
no homemade buffer (code is attached):
Buffer array: 11.141s
Buffer string: 9.748s
No buffer: 10.344s

Don't you think using more RAM before writing to disk could make the
process faster? I thought so, so I'd like to tell File how much RAM
it can use to speed things up, because I have a lot of RAM available.

Regards

Attachments:
http://www.ruby-forum.com/attachment/6191/test_write.rb




#4379

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2011-05-12 10:25 -0500
Message-ID: <BANLkTinm-EZ0qMAZMGMCoCuOuj4Zq9uRGw@mail.gmail.com>
In reply to: #4374

On Thu, May 12, 2011 at 5:07 PM, Yoann M. <yoann6@gmail.com> wrote:
> You're right, doing the buffer myself does not make it faster. For
> writing 10 millions lines, with an array of strings, one string, and no
> homemade-buffer (code is attached) :
> Buffer array : 11.141s
> Buffer string : 9.748s
> No buffer : 10.344s
>
> Don't you think using more RAM before writing on disk could make the
> process faster ? I thought so, then I'd like to say to File how much RAM
> it can uses to speed things up, because I can use a lot of RAM.

No, more memory does not help more.  With modern operating systems you
never write directly through to the disk.*  The OS is buffering your
writes anyway.  Even worse: using a lot of memory in the process to hold
the whole file can make your program slower because of the overhead of
memory allocation.  In the worst case your program is paged out to disk.
Don't worry too much about this.

* Note there are some circumstances where you do write directly to disk:
"direct IO" bypasses the OS cache, and "synchronous IO" makes the write
operation return only after the disk has acknowledged the data.  This
makes sense in special circumstances only (some RDBMS can do it).
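
The buffering layers can be observed from Ruby itself. This is a small sketch, assuming a scratch file from Tempfile and a regular file system; the `sizes` hash is only there for illustration:

```ruby
require 'tempfile'

sizes = {}

Tempfile.create('layers') do |file|
  file.sync = false   # the default for files: writes go to Ruby's own buffer
  file.write('hello')
  # Usually still 0 here, because the data has not left Ruby's buffer yet.
  sizes[:after_write] = File.size(file.path)

  file.flush          # Ruby's buffer -> operating system cache
  sizes[:after_flush] = File.size(file.path)

  file.fsync          # ask the OS to push its cached data to the device
  sizes[:after_fsync] = File.size(file.path)
end

p sizes
```

`IO#flush` only moves data from the process to the OS, while `IO#fsync` additionally waits for the OS to write it out, which is the expensive step that ordinary buffered writes avoid.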

> Attachments:
> http://www.ruby-forum.com/attachment/6191/test_write.rb

You can make your life easier by using Benchmark for this.

require 'benchmark'

Benchmark.bm 20 do |x|
  x.report "a test" do
    ...
  end

  x.report "another test" do
    ..
  end
end
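
The attached script is not reproduced in the thread, so the following is only a sketch of how the three strategies might be compared with Benchmark. The payload, the reduced line count, and the `threshold` of lines held before each write are assumptions:

```ruby
require 'benchmark'
require 'tempfile'

line      = "a small line of data\n"  # illustrative payload
count     = 200_000                   # the thread used 10 million lines
threshold = 10_000                    # hypothetical: lines held before each write
sizes     = {}

Benchmark.bm 20 do |x|
  x.report 'buffer array' do
    Tempfile.create('array') do |f|
      buffer = []
      count.times do
        buffer << line
        if buffer.size >= threshold
          f.write(buffer.join)
          buffer.clear
        end
      end
      f.write(buffer.join)  # write the remainder
      f.flush
      sizes[:array] = File.size(f.path)
    end
  end

  x.report 'buffer string' do
    Tempfile.create('string') do |f|
      buffer = String.new
      count.times do
        buffer << line
        if buffer.bytesize >= threshold * line.bytesize
          f.write(buffer)
          buffer.clear
        end
      end
      f.write(buffer)       # write the remainder
      f.flush
      sizes[:string] = File.size(f.path)
    end
  end

  x.report 'no buffer' do
    Tempfile.create('plain') do |f|
      count.times { f.write(line) }
      f.flush
      sizes[:plain] = File.size(f.path)
    end
  end
end
```

The timings depend on the machine and file system, so no expected numbers are given here; the point is only that all three variants can be measured side by side in one run.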

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/


#4381

From: Markus Schirp <mbj@seonic.net>
Date: 2011-05-12 10:51 -0500
Message-ID: <20110512155124.GB8710@mbj>
In reply to: #4374

IMHO the primary speed bottleneck is the disk drive itself and "possible"
file system fragmentation.

RAM just lets the operating system do the writes "as optimally as
possible". The effect of drastically more RAM won't be more than 1-5%.

If you use a ramdisk, this will differ a lot ;)

But if you are worried about file persistence, you should not do this
*g*

I do not know any details about your use case, but there are other
possibilities:
* writing directly to the block device, bypassing file systems
* mirroring ramdisk writes to other machines for persistence
* ?


#4464

From: "Yoann M." <yoann6@gmail.com>
Date: 2011-05-13 02:21 -0500
Message-ID: <f9d385ebe8248d0e4a0b667311f71a4b@ruby-forum.com>
In reply to: #4352

Thanks for your answers, I'll let the OS optimize this on its own then ;-)


