| Started by | sutanu.das@gmail.com |
|---|---|
| First post | 2015-07-30 11:31 -0700 |
| Last post | 2015-08-01 00:53 +1000 |
| Articles | 6 — 5 participants |
How to re-write this bash script in Python? sutanu.das@gmail.com - 2015-07-30 11:31 -0700
Re: How to re-write this bash script in Python? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-07-30 20:36 +0100
Re: How to re-write this bash script in Python? random832@fastmail.us - 2015-07-30 16:17 -0400
Re: How to re-write this bash script in Python? Chris Angelico <rosuav@gmail.com> - 2015-07-31 17:47 +1000
Re: How to re-write this bash script in Python? Grant Edwards <invalid@invalid.invalid> - 2015-07-31 14:26 +0000
Re: How to re-write this bash script in Python? Chris Angelico <rosuav@gmail.com> - 2015-08-01 00:53 +1000
| From | sutanu.das@gmail.com |
|---|---|
| Date | 2015-07-30 11:31 -0700 |
| Subject | How to re-write this bash script in Python? |
| Message-ID | <b07086dc-0946-4398-b621-23d564d54507@googlegroups.com> |
#!/bin/bash

_maillist='pager@email.com'
_hname=`hostname`
_logdir=/hadoop/logs
_dirlog=${_logdir}/directory_check.log

_year=$(date -d "-5 hour" +%Y)
_month=$(date -d "-5 hour" +%m)
_day=$(date -d "-5 hour" +%d)
_hour=$(date -d "-5 hour" +%H)

_hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`

echo "Checking for HDFS directories:" > ${_dirlog}
echo >> ${_dirlog}

for _currdir in $_hdfsdir
do
    hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
done

if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
then
    echo "Verify Flume is working for all servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
fi
| From | Mark Lawrence <breamoreboy@yahoo.co.uk> |
|---|---|
| Date | 2015-07-30 20:36 +0100 |
| Message-ID | <mailman.1096.1438284995.3674.python-list@python.org> |
| In reply to | #94784 |
On 30/07/2015 19:31, sutanu.das@gmail.com wrote:
> #!/bin/bash
>
> _maillist='pager@email.com'
> _hname=`hostname`
> _logdir=/hadoop/logs
> _dirlog=${_logdir}/directory_check.log
>
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)
>
> _hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`
>
> echo "Checking for HDFS directories:" > ${_dirlog}
> echo >> ${_dirlog}
>
> for _currdir in $_hdfsdir
> do
> hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
> done
>
> if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
> then
> echo "Verify Flume is working for all servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
> fi
>
Read the documentation here https://docs.python.org/3/ and then run up
your favourite editor and start typing. When and if you hit problems
come back with a snippet of code that shows the problem, what you
expected to happen, what actually happened, and the full traceback if
there is one. Please use cut and paste to ensure that you get the data
correct.
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
| From | random832@fastmail.us |
|---|---|
| Date | 2015-07-30 16:17 -0400 |
| Message-ID | <mailman.1097.1438287442.3674.python-list@python.org> |
| In reply to | #94784 |
On Thu, Jul 30, 2015, at 14:31, sutanu.das@gmail.com wrote:
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)

What is the purpose of the -5 hour offset? Is it an attempt to
compensate for timezones?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-07-31 17:47 +1000 |
| Message-ID | <mailman.1110.1438328836.3674.python-list@python.org> |
| In reply to | #94784 |
On Fri, Jul 31, 2015 at 4:31 AM, <sutanu.das@gmail.com> wrote:
> #!/bin/bash
>
> _maillist='pager@email.com'
> _hname=`hostname`
> _logdir=/hadoop/logs
> _dirlog=${_logdir}/directory_check.log
>
> _year=$(date -d "-5 hour" +%Y)
> _month=$(date -d "-5 hour" +%m)
> _day=$(date -d "-5 hour" +%d)
> _hour=$(date -d "-5 hour" +%H)
>
> _hdfsdir=`hdfs dfs -ls -d /hadoop/flume_ingest_*/$_year/$_month | awk '{print $8}'`
>
> echo "Checking for HDFS directories:" > ${_dirlog}
> echo >> ${_dirlog}
>
> for _currdir in $_hdfsdir
> do
> hdfs dfs -ls -d $_currdir/$_day/$_hour &>> ${_dirlog}
> done
>
> if [[ `grep -i "No such file or directory" ${_dirlog}` ]];
> then
> echo "Verify Flume is working for all servers" | mailx -s "HDFS Hadoop Failure on Flume: ${_hname}" -a ${_dirlog} ${_maillist}
> fi
There are two basic approaches to this kind of job.
1) Go through every line of bash code and translate it into equivalent
Python code. You should then have a Python script which blindly and
naively accomplishes the same goal by the same method.
2) Start by describing what you want to accomplish, and then implement
that in Python, using algorithmic notes from the bash code.
The second option seems like a lot more work, but long-term it often
isn't, because you end up with better code. For example, bash lacks
decent timezone support, so I can well believe random832's guess that
your five-hour offset is a simulation of that; but Python can do much
better work with timezones, so you can get that actually correct.
Also, file handling, searching, and text manipulation and so on can
usually be done more efficiently and readably in Python directly than
by piping things through grep and awk.
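
The timezone point can be sketched roughly as follows. This is only an illustration, assuming random832's guess is right and the "-5 hour" is meant as a UTC-5 offset; the thread never confirms that. The function name hour_parts is made up for this example. If the offset should follow a real zone with daylight saving, a zoneinfo.ZoneInfo object (Python 3.9+) could be passed as tz instead of the fixed offset.

```python
from datetime import datetime, timedelta, timezone


def hour_parts(now_utc, tz):
    """Return (year, month, day, hour) strings for now_utc as seen in tz.

    Replaces the four separate `date -d "-5 hour"` calls with one
    conversion, so all four values come from the same instant.
    """
    local = now_utc.astimezone(tz)
    return tuple(local.strftime("%Y %m %d %H").split())


# Assumption: the script's "-5 hour" stands for a fixed UTC-5 offset.
UTC_MINUS_5 = timezone(timedelta(hours=-5))

if __name__ == "__main__":
    year, month, day, hour = hour_parts(datetime.now(timezone.utc), UTC_MINUS_5)
    print(year, month, day, hour)
```

Taking the current time once also avoids the small risk in the shell version that the hour rolls over between two of the date calls.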
ChrisA
| From | Grant Edwards <invalid@invalid.invalid> |
|---|---|
| Date | 2015-07-31 14:26 +0000 |
| Message-ID | <mpg0ik$mfi$1@reader1.panix.com> |
| In reply to | #94804 |
On 2015-07-31, Chris Angelico <rosuav@gmail.com> wrote:
> There are two basic approaches to this kind of job.
>
> 1) Go through every line of bash code and translate it into
> equivalent Python code. You should then have a Python script which
> blindly and naively accomplishes the same goal by the same method.
In my experience, that works OK for C (with a little post-translation
tweaking and re-factoring). But, it's a pretty lousy method for bash
scripts. There are a lot of things that are trivial in Python and
complex/hard in bash (and a few vice versa), so a direct translation
usually turns out to be a mess. You end up with a lot of Python code
where only a couple lines are really needed. You also end up doing
things in a bizarre manner in Python because the simple, easy, right
way wasn't supported by bash.
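
As one possible illustration of "only a couple lines are really needed", the awk and grep steps of the OP's script might reduce to something like the sketch below. The function names are invented for this example, and it assumes the `hdfs dfs -ls` output has the path in the eighth whitespace-separated column, which is all the script's `awk '{print $8}'` tells us.

```python
def listed_paths(ls_output):
    """Rough equivalent of `awk '{print $8}'`: eighth field of each line.

    Lines with fewer than eight fields are skipped rather than yielding
    empty strings.
    """
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        if len(fields) >= 8:
            paths.append(fields[7])
    return paths


def has_missing_dir(log_text):
    """Rough equivalent of `grep -i "No such file or directory"`."""
    return "no such file or directory" in log_text.lower()
```

Neither function starts a subprocess or needs a temporary log file; the text can stay in memory until it is time to report it.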
> 2) Start by describing what you want to accomplish, and then
> implement that in Python, using algorithmic notes from the bash code.
>
> The second option seems like a lot more work, but long-term it often
> isn't, because you end up with better code.
And the code works. :)
For bash, I really recommend 2)
--
Grant Edwards               grant.b.edwards        Yow! GOOD-NIGHT, everybody
                                  at               ... Now I have to go
                              gmail.com            administer FIRST-AID to my
                                                   pet LEISURE SUIT!!
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2015-08-01 00:53 +1000 |
| Message-ID | <mailman.1116.1438354383.3674.python-list@python.org> |
| In reply to | #94813 |
On Sat, Aug 1, 2015 at 12:26 AM, Grant Edwards <invalid@invalid.invalid> wrote:
> On 2015-07-31, Chris Angelico <rosuav@gmail.com> wrote:
>
>> There are two basic approaches to this kind of job.
>>
>> 1) Go through every line of bash code and translate it into
>> equivalent Python code. You should then have a Python script which
>> blindly and naively accomplishes the same goal by the same method.
>
> In my experience, that works OK for C (with a little post-translation
> tweaking and re-factoring). But, it's a pretty lousy method for bash
> scripts. There are a lot of things that are trivial in Python and
> complex/hard in bash (and a few vice versa), so a direct translation
> usually turns out to be a mess. You end up with a lot of Python code
> where only a couple lines are really needed. You also end up doing
> things in a bizarre manner in Python because the simple, easy, right
> way wasn't supported by bash.

Right. The two techniques I suggested can be generalized to any
language pair, but some work better this way than others do. Shell
scripts are something of a special case, because they're massively
optimized toward running other programs and piping output into input,
which applications languages like Python are not as good at; so the
naive transformation leads to code that goes to ridiculous lengths to
invoke five subprocesses and move data between them, where a more
intelligent approach might invoke one process, and then do the rest in
Python code.

The trouble is, you really need to know what your code is doing,
because the non-naive transformation generally has a different set of
assumptions. For instance, the OP's shell script calls on the 'mailx'
command. What's it do? Presumably it sends an email... well, Python
can do that. But what if the mailx command on this host has been
carefully configured to pass mail along via a specific relay host, and
that direct access on port 25 has been blocked? How would you know?
So it's not just a matter of translating the script, you have to know
its execution environment as well.

>> 2) Start by describing what you want to accomplish, and then
>> implement that in Python, using algorithmic notes from the bash code.
>>
>> The second option seems like a lot more work, but long-term it often
>> isn't, because you end up with better code.
>
> And the code works. :)
>
> For bash, I really recommend 2)

Yeah. You remove the ability for environmental changes to unexpectedly
affect the script, which is often a feature and not a bug.

ChrisA
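
For what it's worth, the "Python can do that" part might look something like the sketch below, using only the standard library. The function names, the sender address and the relay_host default are placeholders invented for this example; nothing in the thread says how mail actually leaves the OP's host, so the send step in particular is an assumption that would need checking against the real mailx configuration.

```python
import smtplib
from email.message import EmailMessage


def build_alert(hostname, log_text, sender, recipient):
    """Build the alert mail: fixed body text plus the log as an attachment."""
    msg = EmailMessage()
    msg["Subject"] = f"HDFS Hadoop Failure on Flume: {hostname}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Verify Flume is working for all servers")
    # Attach the log text, mirroring `mailx -a ${_dirlog}` in the script.
    msg.add_attachment(log_text, filename="directory_check.log")
    return msg


def send_alert(msg, relay_host="localhost", port=25):
    """Hand the message to an SMTP server.

    relay_host and port are assumptions; if the site routes mail through
    a specific relay or blocks port 25, these must match that setup.
    """
    with smtplib.SMTP(relay_host, port) as smtp:
        smtp.send_message(msg)
```

Keeping message construction separate from delivery means the first half can be tested anywhere, while the second half is the only piece that depends on the execution environment.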