From: Bob Proulx
Newsgroups: gnu.utils.bug
Subject: Re: shar
Date: Tue, 24 Mar 2020 16:59:25 -0600
To: Ben White, bug-gnu-utils@gnu.org

Ben White wrote:
> The shar command is uuencoding some of the files in my archive, even
> though I specified -text-files on the command line.
> All of the files are all .cpp files.  I suspect the reason for this
> behavior is that some files have line length up to 288 characters.

You are correct.  It is a bug in the older shar 4.13.3.

Bob Proulx wrote:
> Could you share with me one of the files that is triggering the
> uuencode behavior?  Since I know it is a cpp file, you could email it
> to me directly and I won't share it further.
>
> > I am running shar 4.13.3 on Linux CentOS 7.2.  And this is not an
> > urgent matter; please spend only as much time on this question as
> > you can afford to.

I ran your file through 4.13.3, compiled fresh for the test, and
reproduced the problem you reported.  On the latest 4.15.2 the problem
is not there.

I dug into the source and the problem is in the --text-files option
handling.  That code (unfortunately for me) uses autoopts, so I could
not follow the logic.  But the result is that in the working case the
option handling sets the correct value.

    switch (WHICH_OPT_MIXED_UUENCODE) {
    case VALUE_OPT_TEXT_FILES: return 0;
    case VALUE_OPT_UUENCODE:   return 1;
    default: break;

In the working case the WHICH_OPT_MIXED_UUENCODE macro is
VALUE_OPT_TEXT_FILES, but in the non-working case the default case is
hit instead.  In the failing case this breaks out of the switch, and
then the long-line test for lines over 200 characters causes shar to
uuencode.  This has been fixed, and in the most recent 4.15.2 the test
returns properly when -T,--text-files is set.

> May I share this [following] on the mailing list?  I am imagining
> that the work flow might generate some interesting discussion.

Ben White wrote:
> Sure, no problem.

Awesome!

> The reason that I use the options no-md5-digest and
> no-character-count is because I am using shar as a tool to do global
> search and replace on multiple source files.  If those two checks
> were enabled, I would be unable to unpack the shar file at the end
> of the editing process.
>
> The reason for no-timestamp is that the Make utility would not know
> that all the source files need to be re-built.

Ah!  That is an interesting use of shar.

> ... I have found a way to do search and replace across multiple
> source files using the vim editor.  That's my new workflow (if I
> understand your use of that term), and that's also why the shar
> behavior is no longer an urgent matter for me.

This was a good bug report.  The problem is definitely a bug in shar
version 4.13.3.  However, since it is fixed in the later version 4.15.2
I don't think there is anything more to be done.  (Other than me
wagging my finger at the maintainer for lack of good changelog
information.)

As to your workflow of modifying files globally...  Perhaps I can
suggest using find plus sed?  Here is an example.  Let's say I need to
change all occurrences of foo to bar in all of the .txt files under a
directory.

    find . -type f -name '*.txt' -exec sed --in-place 's/foo/bar/g' {} +

That runs find to locate all of the files.  I like find because it
allows me to quickly create a list of files to operate upon.  Its set
of file tests is rich.  And once you have learned find, it can be
combined with other utilities to do very powerful things.  Let me show
with an example from the sharutils source itself.

    find . -type f -name '*.c' -exec grep -q UUENCODE {} \; -print
    ./src/uuencode-opts.c
    ./src/shar.c
    ./src/uuencode.c
    ./src/shar-opts.c

That is the list of files that contain the word UUENCODE anywhere in
them, looking only at .c files.  I could make this arbitrarily complex.
I am just generating a contrived example here.

Let's say that in those files I want to change VALUE_OPT_TEXT_FILES to
MYEXAMPLE instead.  It is a long name, so I don't need to worry about
it matching as a substring of something else.  It won't match things I
don't want to change.  So I would just do it with sed and its
--in-place option.  But first I will test without --in-place.

    find . \
      -type f -name '*.c' -exec grep -q UUENCODE {} \; \
      -exec sed -n s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/gp {} +

I continued it for the mail with a backslash newline so that it would
not be a really long line.  But on the command line I would just type
it as one long line.

The find command processes the "little program" written there on the
command line.  It processes it left to right.  What I have written
above is a little control flow that filters and runs sed on some
files.

The "." part says to search down through the current directory.

The "-type f" in "-type f -name '*.c'" is a little redundant, because
things named *.c are almost always regular files.  It would be unusual
to have a symlink, pipe, special device node, or whatever as a .c file
in a source code directory.  But not unheard of!  Therefore I play it
safe and look only for regular files that are named something.c.

The first expression is "-type f".  If it is false then the file is
skipped and find moves on to the next file.  If it is true/success
then find tries the next test, which is "-name '*.c'".  If that is
false, processing of this file stops; if it is true, find continues to
the right.  The whole thing is one long logical AND, evaluated left to
right until some expression returns false, which ends the AND.

The "-exec grep -q PATTERN {} \;" part runs grep on each file that has
matched so far.  I use "\;" here so that files are fed to grep one at a
time and each file gets its own exit status.  (With "+" find batches
many files into one grep and the -exec always counts as true, so it
could not act as a filter.)  I included the -q option to grep.  With
-q grep outputs nothing and we use only the return code.  If grep
matches then we get exit code 0 for success.  If grep does not find
the pattern then it exits non-zero for not-success.  The find command
notes this and continues left to right across the line only if the
expression returned true/success.

The next expression is the "-exec sed ..." part.  Here I am testing,
so I am using sed to print only the lines I tell it to print.  That's
the sed -n option to say don't print by default.
And then I added 'p' to the substitute command to say print any line
on which it performs the substitution.

    s/.../.../g'p'

It's that p flag at the end that does the printing after the
substitution has been done globally across the line.  The 'g' flag
says do the substitution as many times across the line as there are
matches.  Otherwise the default is to replace only the first match on
each line and stop.

    $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
        -exec sed -n s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/gp {} +
    case MYEXAMPLE: return false;
    { /* entry idx, value */ 12, MYEXAMPLE,

Hey!  That looks like my contrived example was correct.  I would hack
on it until it does what I think it should do.  Then I would carefully
undo my testing.

    $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
        -exec sed s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g {} + \
        | less

Does that still look like what I want it to do?  Don't forget to
remove the -n, or the file will be truncated to just the changed lines
and not everything else!  (Remove the p flag along with it, or the
changed lines will appear twice.)  But -n is useful for the debugging
part.  So always carefully review the change to see if it is what you
want.  If yes, and only after careful review (and possibly with a
safety backup!), I add the sed option to make the change for real by
editing the files in place.

    $ find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
        -exec sed --in-place s/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g {} +

And with that one command it can easily make global changes across a
large set of files.

Since you have a Vim technique that you know and understand, I would
keep doing it there.  I myself would use Emacs, which has some
powerful ways to do this too.  The best way is almost always the way
you understand.  Because then you can fix it when it goes wrong.  But
find plus utilities is very powerful and a good thing to learn.  Maybe
not useful every day.  But a good technique to have in the toolbox.

Hope that helps!
Bob
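
P.S.  The dry-run-then-edit sequence above can be tried safely on
throwaway files first.  Here is a minimal sketch, assuming a find and
sed that accept "-exec ... {} +" and "-i.bak" (GNU versions do).  The
file names and contents are made up for the demonstration and are not
from sharutils, and it uses -i.bak rather than a bare --in-place so
that a backup copy of each edited file is kept.

```shell
#!/bin/sh
# Sketch only: hypothetical files in a scratch directory, not sharutils.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"

# a.c mentions UUENCODE, so it passes the grep filter and gets edited.
printf 'case VALUE_OPT_TEXT_FILES: /* UUENCODE */\n' > a.c
# b.c has the name to replace but no UUENCODE, so grep -q rejects it.
printf 'int unrelated = VALUE_OPT_TEXT_FILES;\n' > b.c
# notes.txt is not a .c file, so -name rejects it before grep runs.
printf 'VALUE_OPT_TEXT_FILES UUENCODE\n' > notes.txt

# Dry run: -n plus the p flag prints only the lines that would change.
find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
    -exec sed -n 's/VALUE_OPT_TEXT_FILES/MYEXAMPLE/gp' {} +

# Real run: edit in place, leaving the original of each file as *.bak.
find . -type f -name '*.c' -exec grep -q UUENCODE {} \; \
    -exec sed -i.bak 's/VALUE_OPT_TEXT_FILES/MYEXAMPLE/g' {} +

# Show the result: only a.c should have changed.
cat a.c b.c notes.txt
```

Only a.c should come out changed, with the original saved as a.c.bak.
The other two files are filtered out before sed ever sees them, which
is the left-to-right AND behavior described above.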