Path: csiph.com!xmission!news.snarked.org!news.linkpendium.com!news.linkpendium.com!panix!usenet.stanford.edu!not-for-mail
From: Bob Proulx <bob@proulx.com>
Newsgroups: gnu.utils.bug
Subject: Re: shar
Date: Tue, 24 Mar 2020 16:59:25 -0600
Lines: 182
Approved: bug-gnu-utils@gnu.org
Message-ID: <mailman.374.1585090774.3020.bug-gnu-utils@gnu.org>
References: <BY5PR16MB3240F83B2CA04DD1D99F645B84FA0@BY5PR16MB3240.namprd16.prod.outlook.com> <20200320184258524569738@bob.proulx.com> <20200324162319309000129@bob.proulx.com>
NNTP-Posting-Host: lists.gnu.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
To: Ben White <ben.white@cohu.com>, bug-gnu-utils@gnu.org
Envelope-to: bug-gnu-utils@gnu.org
Mail-Followup-To: Ben White <ben.white@cohu.com>, bug-gnu-utils@gnu.org
Content-Disposition: inline
In-Reply-To: <20200320184258524569738@bob.proulx.com>
Precedence: list
Xref: csiph.com gnu.utils.bug:2283

Ben White wrote:
> The shar command is uuencoding some of the files in my archive, even
> though I specified -text-files on the command line.  All of the
> files are all .cpp files.  I suspect the reason for this behavior is
> that some files have line length up to 288 characters.

You are correct.  It is a bug in the older shar 4.13.3.

Bob Proulx wrote:
> Could you share with me one of the files that is triggering the
> uuencode behavior?  Since I know it is a cpp file, you could email it
> to me directly and I won't share it further.

> > I am running shar 4.13.3 on Linux CentOS 7.2.  And this is not an
> > urgent matter; please spend only as much time on this question as
> > you can afford to.

I ran your file through 4.13.3 compiled fresh for the test and
recreated the problem you were reporting.  And on latest 4.15.2 the
problem is not there.

I dug into the source and the problem is in the --text-files option
handling, which (unfortunately for me) uses autoopts, so I could
not follow the logic.  But the result is that in the working case the
option handling sets the correct value.

  switch (WHICH_OPT_MIXED_UUENCODE) {
  case VALUE_OPT_TEXT_FILES: return 0;
  case VALUE_OPT_UUENCODE:   return 1;
  default: break;
  }

In the working case the WHICH_OPT_MIXED_UUENCODE macro is
VALUE_OPT_TEXT_FILES but in the non-working case the default case is
hit instead.  In the failing case the switch breaks out without
returning, and then the long line test for lines over 200 characters
causes shar to uuencode.
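
If you want to know ahead of time which files would trip that long
line test, here is a minimal sketch.  The limit of 200 is simply the
number from my description above, and the sample file names are made
up for the illustration.

```shell
#!/bin/sh
# Sketch: list files having at least one line longer than 200 characters.
# The limit of 200 is taken from the description above; the sample
# files are hypothetical.
demo=$(mktemp -d)

# short.cpp has only short lines.
printf 'int main() { return 0; }\n' > "$demo/short.cpp"

# long.cpp has one line of 288 'x' characters.
awk 'BEGIN { for (i = 0; i < 288; i++) printf "x"; print "" }' > "$demo/long.cpp"

# awk exits 0 only if some line exceeds the limit, so find prints
# just those files.
long_files=$(cd "$demo" && find . -type f -name '*.cpp' \
  -exec awk 'length($0) > 200 { found = 1; exit } END { exit !found }' {} \; \
  -print)

echo "$long_files"
rm -rf "$demo"
```

Run in a real source tree, only the find command is needed.  The rest
just builds sample data.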

This has been fixed and in the most recent 4.15.2 the test returns
properly when -T,--text-files is set.

> May I share this [following] on the mailing list?  I am imagining
> that the work flow might generate some interesting discussion.

Ben White wrote:
> Sure, no problem.

Awesome!

> The reason that I use the options no-md5-digest and
> no-character-count is because I am using shar as a tool to do global
> search and replace on multiple source files.  If those two checks
> were enabled, I would be unable to unpack the shar file at the end
> of the editing process.
>
> The reason for no-timestamp is that the Make utility would not know
> that all the source files need to be re-built.

Ah!  That is an interesting use of shar.

> ... I have found a way to do search and replace across multiple
> source files using the vim editor.  That's my new workflow (if I
> understand your use of that term), and that's also why the shar
> behavior is no longer an urgent matter for me.

This was a good bug report.  The problem is definitely a bug in shar
version 4.13.3.  However since it is fixed in the later version 4.15.2
I don't think there is anything more to be done.  (Other than me
wagging my finger at the maintainer for lack of good changelog
information.)

As to your workflow of modifying files globally...

Perhaps I can suggest using find plus sed?  Here is an example.  Let's
say I need to change all occurrences of foo to bar in all files in a
subdirectory.

  find . -type f -name '*.txt' -exec sed --in-place 's/foo/bar/g' {} +

That runs find to locate all of the files and sed to edit them.  I
like find because it allows me to quickly create a list of files to
operate upon.  Its set of file tests is rich.  And once learned, find
can be combined with other utilities to do very powerful things.  Let
me show with an example from sharutils source itself.

  $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; -print
  ./src/uuencode-opts.c
  ./src/shar.c
  ./src/uuencode.c
  ./src/shar-opts.c

That is the list of files that contain the word UUENCODE anywhere in
them, looking only at .c files.  I could make this arbitrarily
complex.  I am just generating a contrived example here.  Let's say
that in those files I want to change VALUE_OPT_TEXT_FILES to MYEXAMPLE
instead.  It is a long name so I don't need to worry about a common
substring match with it.  It won't match things I don't want to
change.  So I would just do it with sed using the --in-place option.
But first I will test without --in-place.

  find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
    -exec sed -n s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/gp {} +

I continued it for the mail with a backslash newline so that it would
not be a really long line.  But on the command line I would just do
that in one long line.

The find command processes the "little program" written there on the
command line.  It processes it left to right.  What I have written
above is a little control flow that filters and runs sed on some files.

The "." part says to find through the current directory.

The "-type f" in "-type f -name '*.c'" is a little redundant because
the -name test already matches only things that are named *.c.  It
would be unusual to have a symlink, pipe, special device node, or
whatever as a .c file in a source code directory.  But not unheard of!
Therefore I play it safe and look only for regular files that are
named something .c.

The first expression is "-type f".  If false then the file is skipped
and find moves on to the next file.  If true/success then find tries
the next test which is "-name '*.c'" which if false stops processing
that file but if true then find continues to the right.  It is a long
logical AND operation evaluating the expression left to right until
something results in a false which stops the logical AND expression.
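
Here is a small sketch of that AND behavior.  All of the file names
are invented for the demonstration.  It compares -name alone against
-type f plus -name in a scratch directory.

```shell
#!/bin/sh
# Sketch: how find's implicit AND filters entries left to right.
# All file names here are made up for the demonstration.
demo=$(mktemp -d)
touch "$demo/real.c" "$demo/notes.txt"
ln -s real.c "$demo/link.c"     # a symlink named *.c, which -type f rejects
mkdir "$demo/dir.c"             # a directory named *.c, also rejected

# Only -name: the regular file, the symlink and the directory all match.
by_name=$(cd "$demo" && find . -name '*.c' -print | sort)

# -type f AND -name: only the regular file survives both tests.
by_both=$(cd "$demo" && find . -type f -name '*.c' -print | sort)

echo "name only:"
echo "$by_name"
echo "type and name:"
echo "$by_both"
rm -rf "$demo"
```

The second find prints only the regular file because the symlink and
the directory fail "-type f" before "-name" is ever consulted.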

The "-exec grep -q PATTERN {} \;" part will run grep across the files
so far matched.  Using "\;" here so that files are fed to grep one at
a time.  I included the -q option to grep.  That won't output anything
from grep and we will only use the return code.  If grep matches then
we get an exit code 0 for success.  If grep does not find the pattern
then it exits non-zero for not-success.  The find command notes this
and continues the processing left to right across the line only
continuing if the expression returns true/success.
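
A minimal sketch of grep -q acting as a find test follows.  The file
names and contents are invented.  Only the exit status of grep is
used.

```shell
#!/bin/sh
# Sketch: grep -q used as a find test.  File names and contents are
# invented for the demonstration.
demo=$(mktemp -d)
printf '#define UUENCODE 1\n' > "$demo/has.c"
printf 'int x;\n'             > "$demo/hasnot.c"

# grep -q prints nothing; only its exit status matters.
grep -q UUENCODE "$demo/has.c";    status_match=$?
grep -q UUENCODE "$demo/hasnot.c"; status_nomatch=$?
echo "match: $status_match  no match: $status_nomatch"

# With "\;" grep runs once per file, so find gets a true/false
# answer for each file separately.
filtered=$(cd "$demo" && find . -type f -name '*.c' \
  -exec grep -q UUENCODE {} \; -print | sort)
echo "$filtered"
rm -rf "$demo"
```

Note that the "+" form would not work as a filter here, because find
treats "-exec ... {} +" as always true.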

The next expression is the "-exec sed ..." part.  Here I am testing so
I am using sed to only print the lines I tell it to print.  That's the
sed -n option to say don't print by default.  And then I added 'p' to
the substitute command to say print any line on which the substitution
was performed.  s/.../.../gp  It's that p flag at the end that is doing
the printing after doing the substitution globally across the line.
The 'g' flag says do the substitution as many times across the line as
there are matched patterns.  Otherwise the default is to do only the
first match and stop.
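
To see those flags in isolation, here is a tiny sketch using made up
input on a pipe instead of real files.

```shell
#!/bin/sh
# Sketch: the effect of sed -n and the g and p flags on sample input.
input='foo and foo
nothing here'

# -n plus p: print only lines where a substitution happened.
changed_only=$(printf '%s\n' "$input" | sed -n 's/foo/bar/gp')

# Without g only the first match on each line is replaced.
first_only=$(printf '%s\n' "$input" | sed -n 's/foo/bar/p')

# Without -n and p every line is printed, changed or not.
everything=$(printf '%s\n' "$input" | sed 's/foo/bar/g')

echo "$changed_only"
echo "$first_only"
echo "$everything"
```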

  $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
      -exec sed -n s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/gp {} +
    case MYEXAMPLE: return false;
    {  /* entry idx, value */ 12, MYEXAMPLE,

Hey!  That looks like my contrived example was correct.  I would
hack on it until it does what I think it should do.  Then I would
carefully undo my testing.


  $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
      -exec sed s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g {} + \
    | less

Does that still look like what I want it to do?  Don't forget to
remove the -n or the file will be truncated to just the changed lines
and not everything else!  Remove the p flag too or the changed lines
will be written twice.  But they are useful for the debugging part.
So always carefully review the change to see if it is what you want.
If yes and only after careful review (and possibly with a safety
backup!) I add the sed option to make the change for real by editing
the file in place.
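
For the safety backup, one possibility is to let sed make it.  This is
a sketch that assumes GNU sed, where --in-place accepts an optional
backup suffix.  The sample file and the ".orig" suffix are arbitrary
choices for the demonstration.

```shell
#!/bin/sh
# Sketch: in-place edit that keeps a backup copy, assuming GNU sed.
# The file name and the ".orig" suffix are arbitrary demo choices.
demo=$(mktemp -d)
printf 'case VALUE_OPT_TEXT_FILES: return 0;\n' > "$demo/sample.c"

# --in-place=.orig edits sample.c and saves the original contents
# as sample.c.orig next to it.
(cd "$demo" && find . -type f -name '*.c' \
  -exec sed --in-place=.orig 's/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g' {} +)

edited=$(cat "$demo/sample.c")
backup=$(cat "$demo/sample.c.orig")

# diff exits non-zero when the files differ, which is expected here.
changes=$(diff "$demo/sample.c.orig" "$demo/sample.c")
echo "$changes"
rm -rf "$demo"
```

Because the backups end in .orig they no longer match -name '*.c', so
running the find command again will not touch them.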

  $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
      -exec sed --in-place s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g {} +

And with that one command it can easily make global changes across a
large set of files.
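
Afterward the same find idiom can confirm that nothing was missed.
Here is a sketch on invented sample files, again assuming GNU sed for
--in-place.  It uses grep -l to list any file still containing the
old name.

```shell
#!/bin/sh
# Sketch: confirm that no .c file still contains the old name.
# Sample files are invented; --in-place assumes GNU sed.
demo=$(mktemp -d)
printf 'case VALUE_OPT_TEXT_FILES: return 0;\n' > "$demo/a.c"
printf 'int unrelated;\n'                       > "$demo/b.c"

# grep -l lists the names of the files that contain the pattern.
before=$(cd "$demo" && find . -type f -name '*.c' \
  -exec grep -l VALUE_OPT_TEXT_FILES {} +)

(cd "$demo" && find . -type f -name '*.c' \
  -exec sed --in-place 's/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g' {} +)

# After the edit the same search should list nothing.
after=$(cd "$demo" && find . -type f -name '*.c' \
  -exec grep -l VALUE_OPT_TEXT_FILES {} +)

echo "before: $before"
echo "after: $after"
rm -rf "$demo"
```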

Since you have a Vim technique that you know and understand then I
would keep doing it there.  I myself would use Emacs which has some
powerful ways to do this too.  The best way is almost always the way
you understand.  Because then you can fix it when it goes wrong.  But
find plus utilities is very powerful and a good thing to learn.

Maybe not useful every day.  But a good technique to have in the
toolbox.  Hope that helps!

Bob