Groups > comp.lang.python > #94784 > unrolled thread

How to re-write this bash script in Python?

Started by	sutanu.das@gmail.com
First post	2015-07-30 11:31 -0700
Last post	2015-08-01 00:53 +1000
Articles	6 — 5 participants


  How to re-write this bash script in Python? sutanu.das@gmail.com - 2015-07-30 11:31 -0700
    Re: How to re-write this bash script in Python? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-07-30 20:36 +0100
    Re: How to re-write this bash script in Python? random832@fastmail.us - 2015-07-30 16:17 -0400
    Re: How to re-write this bash script in Python? Chris Angelico <rosuav@gmail.com> - 2015-07-31 17:47 +1000
      Re: How to re-write this bash script in Python? Grant Edwards <invalid@invalid.invalid> - 2015-07-31 14:26 +0000
        Re: How to re-write this bash script in Python? Chris Angelico <rosuav@gmail.com> - 2015-08-01 00:53 +1000

#94784 — How to re-write this bash script in Python?

From	sutanu.das@gmail.com
Date	2015-07-30 11:31 -0700
Subject	How to re-write this bash script in Python?
Message-ID	<b07086dc-0946-4398-b621-23d564d54507@googlegroups.com>

#!/bin/bash

_maillist='pager@email.com'
_hname=`hostname`
_logdir=/hadoop/logs
_dirlog=${_logdir}/directory_check.log

_year=$(date -d "-5 hour" +%Y)
_month=$(date -d "-5 hour" +%m)
_day=$(date -d "-5 hour" +%d)
_hour=$(date -d "-5 hour" +%H)

_hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`

echo "Checking for HDFS directories:" > ${_dirlog}
echo >> ${_dirlog}

for _currdir in $_hdfsdir
do
hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
done

if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
then
echo "Verify Flume is working for all  servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
fi


#94786

From	Mark Lawrence <breamoreboy@yahoo.co.uk>
Date	2015-07-30 20:36 +0100
Message-ID	<mailman.1096.1438284995.3674.python-list@python.org>
In reply to	#94784

On 30/07/2015 19:31, sutanu.das@gmail.com wrote:
> #!/bin/bash
>
> _maillist='pager@email.com'
> _hname=`hostname`
> _logdir=/hadoop/logs
> _dirlog=${_logdir}/directory_check.log
>
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)
>
> _hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`
>
> echo "Checking for HDFS directories:" > ${_dirlog}
> echo >> ${_dirlog}
>
> for _currdir in $_hdfsdir
> do
> hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
> done
>
> if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
> then
> echo "Verify Flume is working for all  servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
> fi
>

Read the documentation here https://docs.python.org/3/ and then run up 
your favourite editor and start typing.  When and if you hit problems 
come back with a snippet of code that shows the problem, what you 
expected to happen, what actually happened, and the full traceback if 
there is one.  Please use cut and paste to ensure that you get the data 
correct.

-- 
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence


#94789

From	random832@fastmail.us
Date	2015-07-30 16:17 -0400
Message-ID	<mailman.1097.1438287442.3674.python-list@python.org>
In reply to	#94784

On Thu, Jul 30, 2015, at 14:31, sutanu.das@gmail.com wrote:
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)

What is the purpose of the -5 hour offset? Is it an attempt to
compensate for timezones?
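
For reference, the four `date -d "-5 hour"` calls amount to one timestamp five hours in the past, split into fields. A rough Python sketch, assuming the offset simply means "five hours before the local clock" (the thread never confirms what it is for); the name `path_fields` is made up for illustration:

```python
from datetime import datetime, timedelta

def path_fields(now=None, hours_back=5):
    """Return (year, month, day, hour) strings for `hours_back` hours before `now`.

    Hypothetical helper; `hours_back=5` mirrors the script's "-5 hour".
    """
    if now is None:
        now = datetime.now()  # naive local time
    then = now - timedelta(hours=hours_back)
    # Same zero-padded formats as date's +%Y, +%m, +%d, +%H
    return (then.strftime("%Y"), then.strftime("%m"),
            then.strftime("%d"), then.strftime("%H"))

year, month, day, hour = path_fields()
print(year, month, day, hour)
```

Taking the timestamp once also avoids a small hazard in the shell version: four separate `date` calls can straddle an hour boundary and return fields from two different hours.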


#94804

From	Chris Angelico <rosuav@gmail.com>
Date	2015-07-31 17:47 +1000
Message-ID	<mailman.1110.1438328836.3674.python-list@python.org>
In reply to	#94784

On Fri, Jul 31, 2015 at 4:31 AM,  <sutanu.das@gmail.com> wrote:
> #!/bin/bash
>
> _maillist='pager@email.com'
> _hname=`hostname`
> _logdir=/hadoop/logs
> _dirlog=${_logdir}/directory_check.log
>
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)
>
> _hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`
>
> echo "Checking for HDFS directories:" > ${_dirlog}
> echo >> ${_dirlog}
>
> for _currdir in $_hdfsdir
> do
> hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
> done
>
> if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
> then
> echo "Verify Flume is working for all  servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
> fi

There are two basic approaches to this kind of job.

1) Go through every line of bash code and translate it into equivalent
Python code. You should then have a Python script which blindly and
naively accomplishes the same goal by the same method.

2) Start by describing what you want to accomplish, and then implement
that in Python, using algorithmic notes from the bash code.

The second option seems like a lot more work, but long-term it often
isn't, because you end up with better code. For example, bash lacks
decent timezone support, so I can well believe random832's guess that
your five-hour offset is a simulation of that; but Python can do much
better work with timezones, so you can get that actually correct.
Also, file handling, searching, and text manipulation and so on can
usually be done more efficiently and readably in Python directly than
by piping things through grep and awk.
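
To make the timezone point concrete, here is a minimal sketch that builds the `year/month/day/hour` part of the path from an aware UTC timestamp converted into a target zone. It assumes the five-hour offset really is a stand-in for a UTC-5 zone, which is only a guess in this thread; `ingest_subpath` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

def ingest_subpath(tz, now_utc=None):
    """Return 'YYYY/MM/DD/HH' for the current hour as seen in timezone `tz`.

    Hypothetical helper; `tz` is any tzinfo object.
    """
    if now_utc is None:
        now_utc = datetime.now(timezone.utc)  # aware timestamp in UTC
    local = now_utc.astimezone(tz)            # convert, do not hand-subtract hours
    return local.strftime("%Y/%m/%d/%H")

# Fixed UTC-5 offset, standing in for whatever zone the script really means.
utc_minus_5 = timezone(timedelta(hours=-5))
print(ingest_subpath(utc_minus_5))
```

If the offset stands for a named zone, `zoneinfo.ZoneInfo` (Python 3.9+) can supply a tzinfo that follows daylight saving changes; which zone the OP's cluster uses is not stated anywhere in the thread.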

ChrisA


#94813

From	Grant Edwards <invalid@invalid.invalid>
Date	2015-07-31 14:26 +0000
Message-ID	<mpg0ik$mfi$1@reader1.panix.com>
In reply to	#94804

On 2015-07-31, Chris Angelico <rosuav@gmail.com> wrote:

> There are two basic approaches to this kind of job.
>
> 1) Go through every line of bash code and translate it into
>    equivalent Python code. You should then have a Python script which
>    blindly and naively accomplishes the same goal by the same method.

In my experience, that works OK for C (with a little post-translation
tweaking and re-factoring).  But, it's a pretty lousy method for bash
scripts.  There are a lot of things that are trivial in Python and
complex/hard in bash (and a few vice versa), so a direct translation
usually turns out to be a mess.  You end up with a lot of Python code
where only a couple lines are really needed. You also end up doing
things in a bizarre manner in Python because the simple, easy, right
way wasn't supported by bash.
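
As an illustration of "only a couple lines are really needed", the `awk '{print $8}'` and `grep -i` steps from the OP's script come down to a split and a substring test once the text is in Python. This is a sketch, not a drop-in replacement: both function names are invented here, and the sample line is only an assumption about what `hdfs dfs -ls` prints (permissions, replication, owner, group, size, date, time, path), so check it against real output.

```python
def listed_paths(ls_output):
    """Return the 8th whitespace-separated field of each line, like awk '{print $8}'."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 8:        # skip blank or short lines
            paths.append(fields[7])
    return paths

def has_missing_dir(log_text):
    """Case-insensitive substring test, like grep -i "No such file or directory"."""
    return "no such file or directory" in log_text.lower()

# Made-up sample line, for illustration only.
sample = "drwxr-xr-x   - flume hadoop          0 2015-07-01 00:00 /hadoop/flume_ingest_a/2015/07"
print(listed_paths(sample))
print(has_missing_dir("ls: `/hadoop/x': No such file or directory"))
```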

> 2) Start by describing what you want to accomplish, and then
>    implement that in Python, using algorithmic notes from the bash code.
>
> The second option seems like a lot more work, but long-term it often
> isn't, because you end up with better code.

And the code works. :)

For bash, I really recommend 2)

-- 
Grant Edwards               grant.b.edwards        Yow! GOOD-NIGHT, everybody
                                  at               ... Now I have to go
                              gmail.com            administer FIRST-AID to my
                                                   pet LEISURE SUIT!!


#94814

From	Chris Angelico <rosuav@gmail.com>
Date	2015-08-01 00:53 +1000
Message-ID	<mailman.1116.1438354383.3674.python-list@python.org>
In reply to	#94813

On Sat, Aug 1, 2015 at 12:26 AM, Grant Edwards <invalid@invalid.invalid> wrote:
> On 2015-07-31, Chris Angelico <rosuav@gmail.com> wrote:
>
>> There are two basic approaches to this kind of job.
>>
>> 1) Go through every line of bash code and translate it into
>>    equivalent Python code. You should then have a Python script which
>>    blindly and naively accomplishes the same goal by the same method.
>
> In my experience, that works OK for C (with a little post-translation
> tweaking and re-factoring).  But, it's a pretty lousy method for bash
> scripts.  There are a lot of things that are trivial in Python and
> complex/hard in bash (and a few vice versa), so a direct translation
> usually turns out to be a mess.  You end up with a lot of Python code
> where only a couple lines are really needed. You also end up doing
> things in a bizarre manner in Python because the simple, easy, right
> way wasn't supported by bash.

Right. The two techniques I suggested can be generalized to any
language pair, but some work better this way than others do. Shell
scripts are something of a special case, because they're massively
optimized toward running other programs and piping output into input,
which application languages like Python are not as good at; so the
naive transformation leads to code that goes to ridiculous lengths to
invoke five subprocesses and move data between them, where a more
intelligent approach might invoke one process, and then do the rest in
Python code. The trouble is, you really need to know what your code is
doing, because the non-naive transformation generally has a different
set of assumptions. For instance, the OP's shell script calls on the
'mailx' command. What's it do? Presumably it sends an email... well,
Python can do that. But what if the mailx command on this host has
been carefully configured to pass mail along via a specific relay
host, and that direct access on port 25 has been blocked? How would
you know? So it's not just a matter of translating the script, you
have to know its execution environment as well.
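
One cautious way to handle the mail step, when you do not know how the host's mail is set up, is to keep calling the same `mailx` binary from Python so any relay configuration still applies. The sketch below reuses the exact flags from the OP's script (`-s`, `-a`); what `-a` means depends on which mailx is installed, so that is an assumption carried over from the original. The function names are hypothetical.

```python
import socket
import subprocess

def mailx_command(subject, attachment, recipient):
    """Build the same argument list the shell script passes to mailx."""
    return ["mailx", "-s", subject, "-a", attachment, recipient]

def send_alert(body, subject, attachment, recipient):
    """Pipe `body` to the host's mailx, so its existing configuration is used."""
    cmd = mailx_command(subject, attachment, recipient)
    # check=True raises CalledProcessError if mailx exits non-zero
    subprocess.run(cmd, input=body, text=True, check=True)

if __name__ == "__main__":
    subject = "HDFS Hadoop Failure on Flume: " + socket.gethostname()
    # Only prints the command; call send_alert() to actually send.
    print(mailx_command(subject, "/hadoop/logs/directory_check.log", "pager@email.com"))
```

Passing a list instead of a shell string means the subject and paths need no quoting. Switching to `smtplib` later is possible, but only after confirming which relay host and port the machine is allowed to use.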

>> 2) Start by describing what you want to accomplish, and then
>>    implement that in Python, using algorithmic notes from the bash code.
>>
>> The second option seems like a lot more work, but long-term it often
>> isn't, because you end up with better code.
>
> And the code works. :)
>
> For bash, I really recommend 2)

Yeah. You remove the ability for environmental changes to unexpectedly
affect the script, which is often a feature and not a bug.

ChrisA

