mrjob v0.2.5 released

From Jimmy Retzlaff <jimmy@retzlaff.com>
Date 2011-04-30 10:20 -0700
Subject mrjob v0.2.5 released
Newsgroups comp.lang.python
Message-ID <mailman.1024.1304184062.9059.python-list@python.org>


What is mrjob?
-----------------------

mrjob is a Python package that helps you write and run Hadoop Streaming jobs.

mrjob fully supports Amazon's Elastic MapReduce (EMR) service, which
allows you to buy time on a Hadoop cluster on an hourly basis. It also
works with your own Hadoop cluster.
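
For readers new to Hadoop Streaming, here is a minimal sketch of the
kind of job mrjob helps you write, using only the standard library.
It is not mrjob's API: the names mapper, reducer, and run_locally are
made up for illustration, and the sort in run_locally merely stands in
for the shuffle that Hadoop performs between the two phases.

```python
import itertools


def mapper(lines):
    """Emit one tab-separated "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word.lower(), 1)


def reducer(sorted_records):
    """Sum the counts for each word; records must arrive sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


def run_locally(lines):
    """Stand-in for Hadoop: map, sort by key, then reduce."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    for record in run_locally(["the cat", "The dog"]):
        print(record)
```

On a real cluster the mapper and reducer would be separate scripts
reading stdin and writing stdout; wiring that up is the part mrjob
takes care of.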

Some important features:

  * Run jobs on EMR, your own Hadoop cluster, or locally (for testing).
  * Write multi-step jobs (one map-reduce step feeds into the next)
  * Duplicate your production environment inside Hadoop
      * Upload your source tree and put it in your job's $PYTHONPATH
      * Run make and other setup scripts
      * Set environment variables (e.g. $TZ)
      * Easily install Python packages from tarballs (EMR only)
      * Setup handled transparently by mrjob.conf config file
  * Automatically interpret error logs from EMR
  * SSH tunnel to Hadoop job tracker on EMR
  * Minimal setup
      * To run on EMR, set $AWS_ACCESS_KEY_ID and $AWS_SECRET_ACCESS_KEY
      * To run on your Hadoop cluster, install simplejson and make
        sure $HADOOP_HOME is set.
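
To make "one map-reduce step feeds into the next" concrete, the
following is a rough in-memory sketch in plain Python. It does not use
mrjob and does not reflect how mrjob declares steps; run_step, run_job,
and the four mapper/reducer functions are hypothetical names chosen for
the example. Step 1 counts words, and step 2 regroups the words by
their count.

```python
from itertools import groupby
from operator import itemgetter


def run_step(mapper, reducer, records):
    """Run one map-reduce step over (key, value) pairs, in memory."""
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    mapped.sort(key=itemgetter(0))  # stand-in for Hadoop's shuffle and sort
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        output.extend(reducer(key, [value for _, value in group]))
    return output


# Step 1: count occurrences of each word.
def map_words(_, line):
    for word in line.split():
        yield word.lower(), 1


def reduce_counts(word, counts):
    yield word, sum(counts)


# Step 2: regroup by count to see which words share each frequency.
def map_by_count(word, count):
    yield count, word


def reduce_words(count, words):
    yield count, sorted(words)


def run_job(lines):
    """Feed the output of each step into the next one."""
    records = [(None, line) for line in lines]
    for mapper, reducer in [(map_words, reduce_counts),
                            (map_by_count, reduce_words)]:
        records = run_step(mapper, reducer, records)
    return records
```

The only point of the sketch is the loop in run_job: the output pairs
of one step become the input pairs of the next.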

More info:

  * Install mrjob: python setup.py install
  * Documentation: http://packages.python.org/mrjob/
  * PyPI: http://pypi.python.org/pypi/mrjob
  * Discussion: http://groups.google.com/group/mrjob
  * Development is hosted at GitHub: http://github.com/Yelp/mrjob


What's new?
-------------------

v0.2.5, 2011-04-29 -- Hadoop input and output formats
  * Added hadoop_input_format and hadoop_output_format options
  * You can now specify a custom Hadoop streaming jar (hadoop_streaming_jar)
  * Extra args to Hadoop now come before -mapper/-reducer on EMR, so
    that e.g. -libjars will work (this has worked in hadoop mode since
    v0.2.2)
  * hadoop mode now supports s3n:// URIs (Issue #53)
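
The argument-ordering change matters because Hadoop Streaming expects
generic options (such as -libjars) ahead of streaming options (such as
-mapper and -reducer). The sketch below illustrates that ordering only;
build_streaming_args is a hypothetical helper, not mrjob's internal
code, and the jar and script names in the usage lines are placeholders.

```python
def build_streaming_args(streaming_jar, extra_args, mapper, reducer,
                         input_uri, output_uri):
    """Return an argv list with extra args ahead of the streaming options."""
    args = ["hadoop", "jar", streaming_jar]
    args.extend(extra_args)  # generic options such as -libjars go first
    args.extend(["-input", input_uri, "-output", output_uri])
    args.extend(["-mapper", mapper, "-reducer", reducer])
    return args


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    print(" ".join(build_streaming_args(
        "streaming.jar", ["-libjars", "custom.jar"],
        "mapper.py", "reducer.py", "input_dir", "output_dir")))
```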
