From: Bob Proulx <bob@proulx.com>
Newsgroups: gnu.bash.bug
Subject: Re: bash sockets: printf \x0a does TCP fragmentation
Date: Sun, 23 Sep 2018 12:29:14 -0600
Approved: bug-bash@gnu.org
Message-ID: <mailman.1181.1537727363.1284.bug-bash@gnu.org>
References: <20180922231240358868037@bob.proulx.com> <20180922111950901701520@bob.proulx.com> <c6de6616-dda0-570d-de56-419e7676be8a@cbii-hh.de> <20180921231101307758654@bob.proulx.com> <714e1ba0-0052-2f2b-676d-778f2b7129c1@testssl.sh> <7769.1537667711@jinx.noi.kre.to> <24434.1537694402@jinx.noi.kre.to>
To: bug-bash@gnu.org
Envelope-to: bug-bash@gnu.org
Mail-Followup-To: bug-bash@gnu.org
Content-Disposition: inline
In-Reply-To: <24434.1537694402@jinx.noi.kre.to>
User-Agent: Mutt/1.10.1 (2018-07-13)

Robert Elz wrote:
> Bob Proulx wrote:
>   | Using the same buffer size
>   | for input and output is usually most efficient.
>
> Yes, but as the objective seemed to be to make big packets, that is probably
> not as important.

The original complaint concerned a data blob being flushed at every
newline (0x0a) character because of line buffering, with the buffer
up to that point handed to write(2) each time.  As I am sure you
already know, that causes the kernel's network stack to emit whatever
has been buffered up to that point, which was apparently a smallish
amount of data.  So instead of some number of full MTU-sized packets
there were many more smaller ones.  It shouldn't have been about big
packets, nor fragmentation, but about streaming efficiency and
performance.  The behavior was correct, but with more buffer flushes
than desired it was apparently less efficient than they wanted, and
that is what they were complaining about.  They wanted the data blob
buffered as much as possible so as to use the fewest TCP packets.  I
chose a large one meg buffer size so that it would be larger than any
network MTU size, intending that the network stack would then split
the data blob into MTU-sized pieces for transmission.  The largest
MTU size that I routinely see is 64k.  I expect that to increase
further in the future, when 1 meg might not be big enough.  And I
avoid mentioning jumbo frames.
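
As a minimal sketch of that goal, assuming GNU dd and bash's
/dev/tcp redirection, the stream can be passed through dd with a
large obs= before it reaches the socket.  The helper name
send_coalesced and the host and port in the usage comment are
hypothetical placeholders, not anything provided by bash or
coreutils.

```bash
#!/usr/bin/env bash
# Sketch only: send_coalesced is a hypothetical helper, not part of
# bash or coreutils.  It copies stdin to DEST through GNU dd with a
# 1 MiB output block size (obs=1M), so short reads are collected into
# one large buffer that is written out when it fills or input hits EOF.
# In bash, DEST may be a /dev/tcp/HOST/PORT pseudo-path, which bash
# itself turns into a TCP connection; an ordinary file works too.
send_coalesced() {
  local dest=$1
  dd obs=1M status=none > "$dest"
}

# Usage, with a placeholder host and port:
#   { printf 'one\n'; printf 'two\n'; } | send_coalesced /dev/tcp/example.com/12345
```

The trade-off is latency: with obs=1M, dd holds everything until
1 MiB has accumulated or its input reaches EOF, so this suits sending
a complete blob rather than an interactive exchange.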

>   |   $ printf -- "%s\n" one two | strace -o /tmp/out -e write,read dd status=none obs=1M ; cat /tmp/out
>   |   one
>   |   two
>   |   ...
>   |   read(0, "one\ntwo\n", 512)              = 8
>
> What is relevant there is that you're getting both lines from the printf in
> one read.  If that had happened, there would be no need for any rebuffering.
> The point of the original complaint was that that was not happening, and
> the reads were being broken at the \n ... here it might easily make a
> difference whether the output is a pipe or a socket (I have no idea.)

I dug into this further and see that we were both right. :-)

I had been misled by the Linux kernel's pipe buffering, which made me
think that it did not matter.  Digging deeper, I see that it is a
timing-dependent race condition that could go either way.  That was
clearly a mistake on my part.

You are right that, because it depends on timing, this must be
handled properly or it might fail.  I was wrong that it would always
work regardless of timing.  However, it happened to work in my test
case, which is why I had not noticed.  Thank you for pushing me to see
the problem here.

>   | It can then use the same buffer that data was read into for the output
>   | buffer directly.
>
> No, it can't, that's what bs= does - you're right, that is most efficient,
> but there is no rebuffering, whatever is read, is written, and in that case
> even more efficient is not to interpose dd at all.  The whole point was
> to get the rebuffering.
>
> Try tests more like
>
> 	{ printf %s\\n aaa; sleep 1; printf %s\\n bbb ; } | dd ....
>
> so there will be clearly 2 different writes, and small reads for dd
> (however big the input buffer is) - with obs= (something big enough)
> there will be just 1 write, with bs= (anything big enough for the whole
> output) there will still be two writes.

  $ { command printf "one\n"; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576)          = 8
  write(1, "one\ntwo\n", 8)               = 8
  read(0, "", 1048576)                    = 0
  +++ exited with 0 +++

Above, the data is definitely written by two separate printf
commands, but because of the Linux kernel's pipe buffering it is
picked up by a single read.  The data is written into the pipe so
quickly, before the next stage of the pipeline can read it out, that
by the time the read eventually happens it gets both writes as one
data block.  This is what I had been seeing, but you are right that
it is a timing-related success and could just as well be a
timing-related failure.

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 1048576)               = 4
  write(1, "one\n", 4)                    = 4
  read(0, "two\n", 1048576)               = 4
  write(1, "two\n", 4)                    = 4
  read(0, "", 1048576)                    = 0
  +++ exited with 0 +++

The above illustrates the point you were trying to make.  Thank you
for persevering in educating me as to the issue. :-)

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | { sleep 2; strace -v -o /tmp/dd.strace.out -e write,read dd status=none bs=1M ; head /tmp/*.strace.out ;}
  one
  two
  ...
  read(0, "one\ntwo\n", 1048576)          = 8
  write(1, "one\ntwo\n", 8)               = 8
  read(0, "", 1048576)                    = 0
  +++ exited with 0 +++

The above just shows that it is definitely a race condition that can
go either way.  Race conditions are timing bugs, and one should never
count on them always working one way or the other.  I am just showing
why I got sucked into it.  :-(
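
To reproduce both outcomes of the race on purpose, without strace,
one can read GNU dd's own statistics: in its "records in" line, the
number after the plus sign counts partial (short) reads.  This sketch
assumes GNU coreutils dd with status=noxfer; gen and records_in are
hypothetical helper names, and the one- and two-second sleeps make
the ordering predictable in practice, not guaranteed.

```bash
#!/usr/bin/env bash
# Sketch: make the pipe race repeatable by choosing who runs first.
# gen and records_in are hypothetical helper names.
gen() {
  printf 'one\n'; sleep 1; printf 'two\n'
}

# Print GNU dd's "records in" line; "0+N" means N partial (short) reads.
# dd writes its statistics to stderr: keep them, discard the data.
# LC_ALL=C keeps the message untranslated.
records_in() {
  LC_ALL=C dd status=noxfer 2>&1 >/dev/null | grep 'records in'
}

# Reader starts at once: "one\n" and "two\n" arrive in separate reads.
gen | records_in                  # typically: 0+2 records in

# Reader starts two seconds late: both lines already sit in the pipe
# buffer, so one read(2) returns all 8 bytes.
gen | { sleep 2; records_in; }    # typically: 0+1 records in
```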

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none obs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 512)                   = 4
  read(0, "two\n", 512)                   = 4
  read(0, "", 512)                        = 0
  write(1, "one\ntwo\n", 8)               = 8
  +++ exited with 0 +++

And the above, using a large output block size as you suggested,
shows the solution: dd re-blocks the output.
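
Without strace, GNU dd's own statistics show the same difference
between bs= and obs=.  In its "records out" line, 0+N means N partial
output blocks, which in this test is also the number of write(2)
calls.  This is a sketch assuming GNU coreutils dd; gen and
records_out are hypothetical helper names, and the results depend on
the one-second sleep keeping the two lines in separate reads.

```bash
#!/usr/bin/env bash
# Sketch: compare bs= and obs= by counting dd's output blocks.
# gen and records_out are hypothetical helper names.
gen() {
  printf 'one\n'; sleep 1; printf 'two\n'
}

# Run dd with the given operands and print only its "records out"
# line; "0+N" means N partial output blocks, one write(2) each here.
records_out() {
  LC_ALL=C dd "$@" status=noxfer 2>&1 >/dev/null | grep 'records out'
}

gen | records_out bs=1M     # each read written at once; typically: 0+2 records out
gen | records_out obs=1M    # re-blocked into one write; typically: 0+1 records out
```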

  $ { command printf "one\n"; sleep 1; command printf "two\n" ;} | strace -v -o /tmp/dd.strace.out -e write,read dd status=none ibs=1M obs=1M ; head /tmp/*.strace.out
  one
  two
  ...
  read(0, "one\n", 1048576)               = 4
  read(0, "two\n", 1048576)               = 4
  read(0, "", 1048576)                    = 0
  write(1, "one\ntwo\n", 8)               = 8
  +++ exited with 0 +++

And, just for completeness, the above shows the result with both a
large input buffer and a large output buffer of the same size.  The
required dd option, as you correctly insisted, really is obs= in order
to set the output block size.  I stand corrected. :-)

I had missed the documented dd behavior:

  ‘bs=BYTES’
     Set both input and output block sizes to BYTES.  This makes ‘dd’
     read and write BYTES per block, overriding any ‘ibs’ and ‘obs’
     settings.  In addition, if no data-transforming ‘conv’ option is
     specified, input is copied to the output as soon as it’s read, even
     if it is smaller than the block size.

It is always good to learn something new about fundamental behavior in
a command one has been using for some decades! :-)
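
The documented rule that bs= overrides ibs= and obs= can be checked
from GNU dd's "records out" statistics line, where 0+N counts N
partial output blocks (here, N write(2) calls).  Sketch only: gen and
records_out are hypothetical helper names, the results assume GNU
coreutils dd, and no conv= option is used, so the "copied as soon as
it's read" clause applies.

```bash
#!/usr/bin/env bash
# Sketch: bs= overrides ibs= and obs=, which also turns off re-blocking.
# gen and records_out are hypothetical helper names.
gen() {
  printf 'one\n'; sleep 1; printf 'two\n'
}

records_out() {
  # "0+N records out": N partial output blocks, one write(2) each here.
  LC_ALL=C dd "$@" status=noxfer 2>&1 >/dev/null | grep 'records out'
}

gen | records_out ibs=1M obs=1M         # re-blocked; typically: 0+1 records out
gen | records_out ibs=1M obs=1M bs=1M   # bs= wins;   typically: 0+2 records out
```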

> ps: this is not really the correct place to discuss dd.

The help-bash list would generally be better for random shell topics,
but this discussion started here in this bug thread, and this part of
it is relevant to the solution for that bug.  So this is the right
place for it.

Bob