From: Bob Proulx
Newsgroups: gnu.bash.bug
Subject: Re: bash sockets: printf \x0a does TCP fragmentation
Date: Sun, 23 Sep 2018 12:29:14 -0600
To: bug-bash@gnu.org

Robert Elz wrote:
> Bob Proulx wrote:
>   | Using the same buffer size
>   | for input and output is usually most efficient.
>
> Yes, but as the objective seemed to be to make big packets, that is probably
> not as important.

The original complaint concerned a data blob being flushed at every
newline (0x0a) character due to line buffering, with the buffer up to
that point being passed to write(2).  As I am sure you already know,
that causes the network stack in the kernel to emit whatever data has
been buffered up to that point.
That was apparently a smallish amount of data, so instead of some
number of full MTU sized packets there were many more smaller ones.
It shouldn't have been about big packets, nor fragmentation, but about
streaming efficiency and performance.  The behavior was correct, but
with more buffer flushes than desired it was apparently less efficient
than they wanted, hence the complaint.  They wanted the data blob
buffered as much as possible so as to use the fewest number of TCP
network packets.

My choice of a large one meg buffer size was to be larger than any
network MTU size.  My intention was that the network stack would then
split the data blob up into MTU sizes for transmission.  The largest
MTU size that I routinely see is 64k.  I expect that to increase
further in size in the future, when 1 meg might not be big enough.
And I avoid mentioning jumbo frames.

>   | $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
>   | one
>   | two
>   | ...
>   | read(0, "one\ntwo\n", 512)              = 8
>
> What is relevant there is that you're getting both lines from the printf in
> one read.  If that had happened, there would be no need for any rebuffering.
> The point of the original complaint was that that was not happening, and
> the reads were being broken at the \n ... here it might easily make a
> difference whether the output is a pipe or a socket (I have no idea.)

I dug into this further and see that we were both right. :-)

I was getting misdirected by the Linux kernel's pipeline buffering.
The pipeline buffering was causing me to think that it did not matter.
But digging deeper I see that it was a race condition timing issue and
could go either way.  That's obviously a mistake on my part.

You are right that depending upon timing this must be handled properly
or it might fail.  I was wrong that it would always work regardless of
timing.
However it was working in my test case, which is why I had not
noticed.  Thank you for pushing me to see the problem here.

>   | It can then use the same buffer that data was read into for the output
>   | buffer directly.
>
> No, it can't, that's what bs= does - you're right, that is most efficient,
> but there is no rebuffering, whatever is read, is written, and in that case
> even more efficient is not to interpose dd at all.  The whole point was
> to get the rebuffering.
>
> Try tests more like
>
>     { printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd ....
>
> so there will be clearly 2 different writes, and small reads for dd
> (however big the input buffer is) - with obs= (something big enough)
> there will be just 1 write, with bs= (anything big enough for the whole
> output) there will still be two writes.

  $ { command printf "one\n"; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576)          = 8
  write(1, "one\ntwo\n", 8)               = 8
  read(0, "", 1048576)                    = 0
  +++ exited with 0 +++

Above, the data is definitely written by two separate printf commands,
but due to Linux kernel buffering in the pipeline it is read in one
read.  The data is written into the pipeline so quickly, before the
next stage of the pipeline could read it out, that by the time the
read eventually happened it was able to read the multiple writes as
one data block.  This is what I had been seeing, but you are right
that it is a timing related success and could also be a timing related
failure.

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 1048576)               = 4
  write(1, "one\n", 4)                    = 4
  read(0, "two\n", 1048576)               = 4
  write(1, "two\n", 4)                    = 4
  read(0, "", 1048576)                    = 0
  +++ exited with 0 +++

The above illustrates the point you were trying to make.  Thank you
for persevering in educating me as to the issue. :-)

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | { sleep 2; strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out ;}
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576)          = 8
  write(1, "one\ntwo\n", 8)               = 8
  read(0, "", 1048576)                    = 0
  +++ exited with 0 +++

The above is just me showing that it is definitely a race condition
problem that can go either way.  But obviously race conditions are
timing bugs and should never be counted upon always working one way or
the other.  Just showing why I got sucked into it. :-(

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none obs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 512)                   = 4
  read(0, "two\n", 512)                   = 4
  read(0, "", 512)                        = 0
  write(1, "one\ntwo\n", 8)               = 8
  +++ exited with 0 +++

And the above using a large output block size, as you suggest, shows
the solution where dd is re-blocking the output.

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none ibs=1M obs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 1048576)               = 4
  read(0, "two\n", 1048576)               = 4
  read(0, "", 1048576)                    = 0
  write(1, "one\ntwo\n", 8)               = 8
  +++ exited with 0 +++

And just for completeness I will show the above with both a large
input buffer and a large output buffer of the same size and show that
result too.  The required dd option, as you correctly insisted, really
is obs= in order to set the output block size.  I stand corrected.
:-)

I had missed the documented dd behavior:

  ‘bs=BYTES’
       Set both input and output block sizes to BYTES.  This makes ‘dd’
       read and write BYTES per block, overriding any ‘ibs’ and ‘obs’
       settings.  In addition, if no data-transforming ‘conv’ option is
       specified, input is copied to the output as soon as it’s read, even
       if it is smaller than the block size.

It is always good to learn something new about fundamental behavior in
a command one has been using for some decades! :-)

> ps: this is not really the correct place to discuss dd.

The help-bash list would generally be better for random shell stuff,
but the discussion started here in this bug thread and this part of
the discussion is topical to the solution for it.  This is the right
place for this.

Bob
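
P.S. For anyone without strace to hand, the bs= versus obs= difference
can probably also be seen from timing alone.  What follows is only a
rough sketch, not something from the runs above.  The helper names
arrival_gap and slow_writer are made up for illustration, and it
assumes a dd that accepts plain byte counts for bs= and obs= and
behaves as the documentation quoted above describes.

```shell
#!/bin/sh
# Hypothetical helper: print the whole seconds elapsed between the
# arrival of the first and the last line on stdin.
arrival_gap() {
    first='' last=''
    while IFS= read -r _line; do
        now=$(date +%s)
        [ -n "$first" ] || first=$now
        last=$now
    done
    echo $(( ${last:-0} - ${first:-0} ))
}

# Hypothetical helper: two small writes, three seconds apart.
slow_writer() {
    printf 'one\n'
    sleep 3
    printf 'two\n'
}

# bs= copies each short read straight through, so the reader should
# see the two lines about three seconds apart.
gap_bs=$(slow_writer | dd bs=1048576 2>/dev/null | arrival_gap)

# obs= collects the short reads into one output block that is written
# at end of input, so the reader should see both lines together.
gap_obs=$(slow_writer | dd obs=1048576 2>/dev/null | arrival_gap)

echo "bs=  arrival gap: ${gap_bs}s"
echo "obs= arrival gap: ${gap_obs}s"
```

If dd behaves as in the traces, the bs= gap should come out near three
seconds and the obs= gap near zero.  The timestamps only have one
second resolution, so treat the numbers as approximate.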