

Groups > comp.lang.java.programmer > #11777 > unrolled thread

How do I scp extremely large files

Started by: Mike <mikesmith813@gmail.com>
First post: 2012-02-04 18:03 -0800
Last post: 2012-02-08 22:37 -0800
Articles: 16 (7 participants)



Contents

  How do I scp extremely large files Mike <mikesmith813@gmail.com> - 2012-02-04 18:03 -0800
    Re: How do I scp extremely large files Robert Klemme <shortcutter@googlemail.com> - 2012-02-05 23:54 -0800
      Re: How do I scp extremely large files Mike <mikesmith813@gmail.com> - 2012-02-06 12:54 -0800
        Re: How do I scp extremely large files Lew <lewbloch@gmail.com> - 2012-02-06 17:21 -0800
          Re: How do I scp extremely large files Mike <mikesmith813@gmail.com> - 2012-02-08 16:29 -0800
            Re: How do I scp extremely large files Daniel Pitts <newsgroup.nospam@virtualinfinity.net> - 2012-02-08 21:19 -0800
        Re: How do I scp extremely large files Eric Sosman <esosman@ieee-dot-org.invalid> - 2012-02-07 08:12 -0500
          Re: How do I scp extremely large files Arne Vajhøj <arne@vajhoej.dk> - 2012-02-07 17:02 -0500
            Re: How do I scp extremely large files Eric Sosman <esosman@ieee-dot-org.invalid> - 2012-02-07 20:47 -0500
              Re: How do I scp extremely large files Arne Vajhøj <arne@vajhoej.dk> - 2012-02-07 20:56 -0500
        Re: How do I scp extremely large files Robert Klemme <shortcutter@googlemail.com> - 2012-02-07 22:50 +0100
        Re: How do I scp extremely large files Arne Vajhøj <arne@vajhoej.dk> - 2012-02-07 17:12 -0500
    Re: How do I scp extremely large files Arne Vajhøj <arne@vajhoej.dk> - 2012-02-07 17:09 -0500
    Re: How do I scp extremely large files Arne Vajhøj <arne@vajhoej.dk> - 2012-02-07 17:10 -0500
    Re: How do I scp extremely large files Roedy Green <see_website@mindprod.com.invalid> - 2012-02-08 22:33 -0800
    Re: How do I scp extremely large files Roedy Green <see_website@mindprod.com.invalid> - 2012-02-08 22:37 -0800

#11777 — How do I scp extremely large files

From: Mike <mikesmith813@gmail.com>
Date: 2012-02-04 18:03 -0800
Subject: How do I scp extremely large files
Message-ID: <a7abf79b-a373-4f52-9c1c-6c24d509fbcb@o13g2000vbf.googlegroups.com>

I need to copy extremely large files (30-100G) from a remote server to
a machine where my code will be running.  I have code in place now
that uses an sftp connection to scan the file directory watching for
files to show up for me to copy.  My question is how do I read in
chunks of these large files and write them out in chunks?  I cannot
hold the bytes in memory obviously.  Any help or pseudo code is
greatly appreciated!



#11788

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-02-05 23:54 -0800
Message-ID: <7f3940d0-cc97-466d-87bf-7a420e53df5c@gi10g2000vbb.googlegroups.com>
In reply to: #11777

On 5 Feb., 03:03, Mike <mikesmith...@gmail.com> wrote:
> I need to copy extremely large files (30-100G) from a remote server to
> a machine where my code will be running.  I have code in place now
> that uses an sftp connection to scan the file directory watching for
> files to show up for me to copy.  My question is how do I read in
> chunks of these large files and write them out in chunks?  I cannot
> hold the bytes in memory obviously.  Any help or pseudo code is
> greatly appreciated!

I am not sure I understand your question properly.  Any Java library
which implements sftp or scp's protocols will have a means to copy
remote files or at least open remote files and obtain an InputStream
or Channel, from which you can read in chunks and store data locally.
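
As a rough illustration of the "read in chunks and store data locally" idea, here is a minimal sketch using only the standard library. The class name ChunkedCopy, the method copyInChunks and the buffer size are hypothetical and not from this thread; the InputStream is assumed to come from whatever sftp/scp code is already in place, which is not shown because no specific library is named here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedCopy {

    // Hypothetical helper: copies everything from 'in' to 'target' through a
    // fixed-size buffer, so memory use stays at bufferSize bytes no matter
    // how large the file is. Returns the number of bytes copied.
    // Assumes bufferSize > 0. The caller remains responsible for closing 'in'.
    public static long copyInChunks(InputStream in, Path target, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            int n;
            // read() returns how many bytes were actually read, or -1 at end of stream.
            while ((n = in.read(buffer)) != -1) {
                // Write only the bytes that were read, not the whole buffer.
                out.write(buffer, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

The important detail is that read() may fill only part of the buffer, so the write uses the returned count rather than the buffer length.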

If you need to do the transfer on a regular basis maybe rsync is
better than cooking your own version of it.

Kind regards

robert



#11794

From: Mike <mikesmith813@gmail.com>
Date: 2012-02-06 12:54 -0800
Message-ID: <c3594e5c-4f7d-4964-ba02-f59ed0145765@n12g2000yqb.googlegroups.com>
In reply to: #11788

On Feb 6, 2:54 am, Robert Klemme <shortcut...@googlemail.com> wrote:
> I am not sure I understand your question properly.  Any Java library
> which implements sftp or scp's protocols will have a means to copy
> remote files or at least open remote files and obtain an InputStream
> or Channel, from which you can read in chunks and store data locally.
>
> If you need to do the transfer on a regular basis maybe rsync is
> better than cooking your own version of it.
>
> Kind regards
>
> robert

Thanks for the response Robert. rsync is not an option.  I cannot use
any outside software, open-source or otherwise.  I believe you stated
just what I need to do... obtain a channel to read chunks from and
store locally.  I'm just a little fuzzy on the details.  Haven't done
much with channels or inputStreams for that matter.



#11797

From: Lew <lewbloch@gmail.com>
Date: 2012-02-06 17:21 -0800
Message-ID: <851500.10.1328577714941.JavaMail.geo-discussion-forums@pbboj1>
In reply to: #11794

Mike wrote:
> Thanks for the response Robert. rsync is not an option.  I cannot use
> any outside software, open-source or otherwise.  I believe you stated
> just what I need to do... obtain a channel to read chunks from and
> store locally.  I'm just a little fuzzy on the details.  Haven't done
> much with channels or inputStreams for that matter.

I gather Java doesn't count as "outside software".

You can read these to get a start with streams, readers, channels and all that:

<http://docs.oracle.com/javase/tutorial/essential/io/index.html>
<http://docs.oracle.com/javase/tutorial/networking/sockets/index.html>

the tutorials being a great place to start
<http://docs.oracle.com/javase/tutorial/reallybigindex.html>
if you just need the basics.

Oracle has a ton more Java documentation besides the tutorials, of course, as 
does IBM Developerworks.

-- 
Lew



#11861

From: Mike <mikesmith813@gmail.com>
Date: 2012-02-08 16:29 -0800
Message-ID: <9a3935b6-2f0f-48b8-a065-a485e013bec6@k6g2000vbz.googlegroups.com>
In reply to: #11797

On Feb 6, 8:21 pm, Lew <lewbl...@gmail.com> wrote:
> Mike wrote:
> > Thanks for the response Robert. rsync is not an option.  I cannot use
> > any outside software, open-source or otherwise.  I believe you stated
> > just what I need to do... obtain a channel to read chunks from and
> > store locally.  I'm just a little fuzzy on the details.  Haven't done
> > much with channels or inputStreams for that matter.
>
> I gather Java doesn't count as "outside software".
>
> You can read these to get a start with streams, readers, channels and all that:
>
> <http://docs.oracle.com/javase/tutorial/essential/io/index.html>
> <http://docs.oracle.com/javase/tutorial/networking/sockets/index.html>
>
> the tutorials being a great place to start
> <http://docs.oracle.com/javase/tutorial/reallybigindex.html>
> if you just need the basics.
>
> Oracle has a ton more Java documentation besides the tutorials, of course, as
> does IBM Developerworks.
>
> --
> Lew

Thanks for the advice, Lew.  You got me on the right track and I
accomplished what I needed.  You were correct: Java was obviously
allowed, just not third-party libraries.  Sorry for the confusion I added
to this thread.



#11868

From: Daniel Pitts <newsgroup.nospam@virtualinfinity.net>
Date: 2012-02-08 21:19 -0800
Message-ID: <AHIYq.1272$Qb4.564@newsfe21.iad>
In reply to: #11861

On 2/8/12 4:29 PM, Mike wrote:
> On Feb 6, 8:21 pm, Lew<lewbl...@gmail.com>  wrote:
>> Mike wrote:
>>> Thanks for the response Robert. rsync is not an option.  I cannot use
>>> any outside software, open-source or otherwise.  I believe you stated
>>> just what I need to do... obtain a channel to read chunks from and
>>> store locally.  I'm just a little fuzzy on the details.  Haven't done
>>> much with channels or inputStreams for that matter.
>>
>> I gather Java doesn't count as "outside software".
>>
>> You can read these to get a start with streams, readers, channels and all that:
>>
>> <http://docs.oracle.com/javase/tutorial/essential/io/index.html>
>> <http://docs.oracle.com/javase/tutorial/networking/sockets/index.html>
>>
>> the tutorials being a great place to start
>> <http://docs.oracle.com/javase/tutorial/reallybigindex.html>
>> if you just need the basics.
>>
>> Oracle has a ton more Java documentation besides the tutorials, of course, as
>> does IBM Developerworks.
>>
>> --
>> Lew
>
> Thanks for the advice Lew.  You got me on the right track and I
> accomplished what I needed.  You were correct, Java was obviously
> allowed just not 3rd party libraries.  Sorry for the confusion I added
> to this thread.

rsync is not a library, but a full, well-supported, mature program. It
is used in many production environments for many different purposes.
Hopefully you're not overlooking it because of political nonsense.



#11812

From: Eric Sosman <esosman@ieee-dot-org.invalid>
Date: 2012-02-07 08:12 -0500
Message-ID: <jgr808$cn9$2@dont-email.me>
In reply to: #11794

On 2/6/2012 3:54 PM, Mike wrote:
>[...]
> Thanks for the response Robert. rsync is not an option.  I cannot use
> any outside software, open-source or otherwise.  [...]

     How much time have you allotted for writing your own JVM,
Java compiler, operating system, and BIOS?

-- 
Eric Sosman
esosman@ieee-dot-org.invalid



#11831

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-07 17:02 -0500
Message-ID: <4f319f83$0$294$14726298@news.sunsite.dk>
In reply to: #11812

On 2/7/2012 8:12 AM, Eric Sosman wrote:
> On 2/6/2012 3:54 PM, Mike wrote:
>> [...]
>> Thanks for the response Robert. rsync is not an option. I cannot use
>> any outside software, open-source or otherwise. [...]
>
> How much time have you allotted for writing your own JVM,
> Java compiler, operating system, and BIOS?

:-)

Arne



#11840

From: Eric Sosman <esosman@ieee-dot-org.invalid>
Date: 2012-02-07 20:47 -0500
Message-ID: <jgsk74$qbh$1@dont-email.me>
In reply to: #11831

On 2/7/2012 5:02 PM, Arne Vajhøj wrote:
> On 2/7/2012 8:12 AM, Eric Sosman wrote:
>> On 2/6/2012 3:54 PM, Mike wrote:
>>> [...]
>>> Thanks for the response Robert. rsync is not an option. I cannot use
>>> any outside software, open-source or otherwise. [...]
>>
>> How much time have you allotted for writing your own JVM,
>> Java compiler, operating system, and BIOS?
>
> :-)

     Okay, my question was rather tongue-in-cheek.  But the issue it's
intended to raise is the matter of "trust" in software: If the O.P. is
willing to trust an externally-supplied JVM, Java compiler, operating
system, and BIOS, the ban on "outside software" is obviously not
absolute.  If it's not absolute, there must be some procedure,
somewhere, that declares "THIS outside software is acceptable; THAT
outside software is not."  And (here's the exciting conclusion) if the
allowed/forbidden procedure bans rsync and its sixteen years of
development history in favor of a not-yet-written, not-yet-debugged,
not-even-designed half-hearted home-grown imitation, ...  Well, the
procedure may have found an anatomically unlikely place to put its head.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid



#11841

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-07 20:56 -0500
Message-ID: <4f31d637$0$293$14726298@news.sunsite.dk>
In reply to: #11840

On 2/7/2012 8:47 PM, Eric Sosman wrote:
> On 2/7/2012 5:02 PM, Arne Vajhøj wrote:
>> On 2/7/2012 8:12 AM, Eric Sosman wrote:
>>> On 2/6/2012 3:54 PM, Mike wrote:
>>>> [...]
>>>> Thanks for the response Robert. rsync is not an option. I cannot use
>>>> any outside software, open-source or otherwise. [...]
>>>
>>> How much time have you allotted for writing your own JVM,
>>> Java compiler, operating system, and BIOS?
>>
>> :-)
>
> Okay, my question was rather tongue-in-cheek. But the issue it's
> intended to raise is the matter of "trust" in software: If the O.P. is
> willing to trust an externally-supplied JVM, Java compiler, operating
> system, and BIOS, the ban on "outside software" is obviously not
> absolute. If it's not absolute, there must be some procedure,
> somewhere, that declares "THIS outside software is acceptable; THAT
> outside software is not." And (here's the exciting conclusion) if the
> allowed/forbidden procedure bans rsync and its sixteen years of
> development history in favor of a not-yet-written, not-yet-debugged,
> not-even-designed half-hearted home-grown imitation, ... Well, the
> procedure may have found an anatomically unlikely place to put its head.

I agree.

But weird rules are sometimes seen.

Arne



#11830

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2012-02-07 22:50 +0100
Message-ID: <9pdklmF1pmU2@mid.individual.net>
In reply to: #11794

On 02/06/2012 09:54 PM, Mike wrote:
> On Feb 6, 2:54 am, Robert Klemme<shortcut...@googlemail.com>  wrote:
>> I am not sure I understand your question properly.  Any Java library
>> which implements sftp or scp's protocols will have a means to copy
>> remote files or at least open remote files and obtain an InputStream
>> or Channel, from which you can read in chunks and store data locally.
>>
>> If you need to do the transfer on a regular basis maybe rsync is
>> better than cooking your own version of it.

> Thanks for the response Robert. rsync is not an option.

Why?

>  I cannot use
> any outside software, open-source or otherwise.

Why aren't you able to use "outside software"?  Or did you mean you are 
not allowed to?

> I believe you stated
> just what I need to do... obtain a channel to read chunks from and
> store locally.  I'm just a little fuzzy on the details.  Haven't done
> much with channels or inputStreams for that matter.

IO is pretty basic - in any language.  I am surprised you haven't been 
exposed to it yet.  For the details please follow Lew's advice / links.

Kind regards

	robert



#11834

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-07 17:12 -0500
Message-ID: <4f31a1e7$0$294$14726298@news.sunsite.dk>
In reply to: #11794

On 2/6/2012 3:54 PM, Mike wrote:
> On Feb 6, 2:54 am, Robert Klemme<shortcut...@googlemail.com>  wrote:
>> I am not sure I understand your question properly.  Any Java library
>> which implements sftp or scp's protocols will have a means to copy
>> remote files or at least open remote files and obtain an InputStream
>> or Channel, from which you can read in chunks and store data locally.
>>
>> If you need to do the transfer on a regular basis maybe rsync is
>> better than cooking your own version of it.
>
> Thanks for the response Robert. rsync is not an option.  I cannot use
> any outside software, open-source or otherwise.  I believe you stated
> just what I need to do... obtain a channel to read chunks from and
> store locally.  I'm just a little fuzzy on the details.  Haven't done
> much with channels or inputStreams for that matter.

scp and sftp are nontrivial to implement.

If reading chunks is a problem, then you will not be
able to implement them.

You need to get whoever made that rule to give you
permission to use a library that supports one of
those protocols.

Arne




#11832

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-07 17:09 -0500
Message-ID: <4f31a115$0$294$14726298@news.sunsite.dk>
In reply to: #11777

On 2/4/2012 9:03 PM, Mike wrote:
> I need to copy extremely large files (30-100G) from a remote server to
> a machine where my code will be running.  I have code in place now
> that uses an sftp connection to scan the file directory watching for
> files to show up for me to copy.  My question is how do I read in
> chunks of these large files and write them out in chunks?  I cannot
> hold the bytes in memory obviously.  Any help or pseudo code is
> greatly appreciated!

I would assume your SFTP code has some method that allows
reading of data as a stream which will allow you to write as
a stream too.
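
If the existing SFTP code does hand back an InputStream, one possible way to get from there to the channel style mentioned earlier in the thread is to wrap it with java.nio.channels.Channels and pump a ByteBuffer into a FileChannel. This is only a sketch under that assumption; ChannelCopy, copyViaChannel and the 64 KiB buffer size are made-up names and values, not part of any SFTP library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelCopy {

    // Hypothetical helper: wraps the remote stream in a channel and copies it
    // to 'target' one buffer at a time. Returns the number of bytes written.
    // Note: closing the wrapping channel also closes the 'remote' stream.
    public static long copyViaChannel(InputStream remote, Path target) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocate(64 * 1024); // assumed chunk size
        long total = 0;
        try (ReadableByteChannel src = Channels.newChannel(remote);
             FileChannel dst = FileChannel.open(target,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            // read() fills the buffer from its current position and returns -1 at end of stream.
            while (src.read(buffer) != -1) {
                buffer.flip();                  // switch the buffer from filling to draining
                while (buffer.hasRemaining()) { // a single write() may not drain everything
                    total += dst.write(buffer);
                }
                buffer.clear();                 // make the whole buffer available again
            }
        }
        return total;
    }
}
```

The flip/drain/clear cycle is the part that tends to trip people up when they are new to channels; the rest is the same loop as a plain stream copy.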

Arne



#11833

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2012-02-07 17:10 -0500
Message-ID: <4f31a168$0$294$14726298@news.sunsite.dk>
In reply to: #11777

On 2/4/2012 9:03 PM, Mike wrote:
> I need to copy extremely large files (30-100G) from a remote server to
> a machine where my code will be running.  I have code in place now
> that uses an sftp connection to scan the file directory watching for
> files to show up for me to copy.  My question is how do I read in
> chunks of these large files and write them out in chunks?  I cannot
> hold the bytes in memory obviously.  Any help or pseudo code is
> greatly appreciated!

Note that scp and sftp are not the same, so your first decision
is whether you want to use scp or sftp.

Arne



#11871

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-02-08 22:33 -0800
Message-ID: <f2q6j7dgtg46gudoj95c69m9plmnagjfce@4ax.com>
In reply to: #11777

On Sat, 4 Feb 2012 18:03:22 -0800 (PST), Mike <mikesmith813@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>I need to copy extremely large files (30-100G) from a remote server to
>a machine where my code will be running. 

I know you have a legal requirement not to use "outside" code.

Perhaps a VPN or WebDav would pass.

http://mindprod.com/jgloss/vpn.html
http://mindprod.com/jgloss/webdav.html
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
One of the most useful comments you can put in a program is 
"If you change this, remember to change ?XXX? too".
 



#11872

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2012-02-08 22:37 -0800
Message-ID: <76q6j7laa716ckt0cklok9d7n1fbiecmeo@4ax.com>
In reply to: #11777

On Sat, 4 Feb 2012 18:03:22 -0800 (PST), Mike <mikesmith813@gmail.com>
wrote, quoted or indirectly quoted someone who said :

>I need to copy extremely large files (30-100G) from a remote server to
>a machine where my code will be running.  I have code in place now
>that uses an sftp connection to scan the file directory watching for
>files to show up for me to copy.  My question is how do I read in
>chunks of these large files and write them out in chunks?  I cannot
>hold the bytes in memory obviously.  Any help or pseudo code is
>greatly appreciated!

If you roll your own system, you might want to consider bundling the
small files and zipping them, as the Replicator does.  You would have to
run a script on the server to unpack them.  The Replicator only uploads
files that have changed.  See
http://mindprod.com/webstart/replicator.html
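
For the bundling idea, a minimal sketch with java.util.zip from the standard library might look like the following. It is not how the Replicator itself works (its internals are not described in this thread); SmallFileBundler and bundle are hypothetical names, and entries are named by file name only, so two inputs with the same name in different directories would collide.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SmallFileBundler {

    // Hypothetical helper: packs the given files into a single zip archive so
    // that one transfer replaces many small ones. Each file is streamed into
    // the archive, so nothing is held fully in memory.
    public static void bundle(List<Path> files, Path zipFile) throws IOException {
        try (OutputStream raw = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(raw)) {
            for (Path file : files) {
                // Assumption: entry name is just the file name; duplicate
                // names would make putNextEntry throw a ZipException.
                zip.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zip); // streams the file contents into the current entry
                zip.closeEntry();
            }
        }
    }
}
```

A caller would collect the small files, call bundle(files, archivePath), transfer the one archive, and unpack it on the other side.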

If the files have only minor changes, you might consider hosting a
version control system on the server.  The server can then check out
files that have changed.  You get the advantage of atomic updates,
something you will not get with an FTP system.  You also can check in
files from many sources directly to the server.

-- 
Roedy Green Canadian Mind Products
http://mindprod.com
One of the most useful comments you can put in a program is 
"If you change this, remember to change ?XXX? too".
 


