Twister Installation

SALSA Group
PTI Indiana University
June 17th 2010

Contents

1. Introduction

2. Prerequisite

3. Single Machine

4. Cluster

5. Amazon EC2

1. Introduction

The Twister MapReduce Framework is a Java implementation of the MapReduce programming model, extended with features such as support for iterative MapReduce computations. It works with different messaging software to execute MapReduce jobs. Two messaging systems are currently supported: NaradaBrokering and ActiveMQ. This guide focuses on the settings for Twister with NaradaBrokering.

This Twister Tutorial Package includes the complete Twister Framework and three Twister applications for the tutorial: Twister-WordCount, Twister-BLAST, and Twister-Kmeans.

2. Prerequisite

Twister runs on Linux. Before installing Twister, download the Twister Tutorial Package (here) and unzip it to a directory, referred to below as $TWISTER_HOME. Then download NaradaBrokering (here) and unzip it to another directory, referred to below as $NBHOME. Both environment variables must be exported in .bashrc (you need to log in again for the change to take effect).

e.g.

export NBHOME=/full/path/to/NaradaBrokering

export TWISTER_HOME=/full/path/to/Twister
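After logging in again, a quick check can confirm that both variables are visible to the shell and point to real directories. The sketch below is only an illustration; check_home is a hypothetical helper and is not part of Twister or NaradaBrokering.

```shell
# check_home NAME VALUE
# Hypothetical helper: succeeds only if VALUE is non-empty and names an existing directory.
check_home() {
    name="$1"
    value="$2"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}

# ${VAR:-} expands to an empty string when the variable is unset.
check_home NBHOME "${NBHOME:-}" || echo "check the NBHOME export line in .bashrc"
check_home TWISTER_HOME "${TWISTER_HOME:-}" || echo "check the TWISTER_HOME export line in .bashrc"
```

If either line reports a problem, correct the path in .bashrc and log in again before continuing.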

The latest version of Java should be installed and configured first, so that Java programs can be run from the command line.

In addition, SSH should be configured so that the machines can connect to each other without a password. A sample solution can be viewed here.
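One common way to set up password-less SSH is key-based authentication with the standard OpenSSH tools. The sketch below assumes OpenSSH (ssh-keygen, ssh-copy-id, ssh) is installed and that the node addresses are available in a file with one host per line. It is not necessarily the sample solution referred to above, and the function names are hypothetical.

```shell
# make_key [KEYFILE]
# Create an RSA key pair with an empty passphrase unless KEYFILE already exists.
make_key() {
    key="${1:-$HOME/.ssh/id_rsa}"
    mkdir -p "$(dirname "$key")"
    [ -f "$key" ] || ssh-keygen -q -t rsa -N "" -f "$key"
}

# copy_key_to_nodes NODES_FILE [KEYFILE]
# Install the public key on every host listed in NODES_FILE.
# ssh-copy-id prompts for the password of each host once.
copy_key_to_nodes() {
    key="${2:-$HOME/.ssh/id_rsa}"
    for host in $(cat "$1"); do
        ssh-copy-id -i "$key.pub" "$host"
    done
}

# check_nodes NODES_FILE
# BatchMode=yes makes ssh fail instead of prompting, so a failure here
# means the host is unreachable or would still ask for a password.
check_nodes() {
    for host in $(cat "$1"); do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "$host: ok"
        else
            echo "$host: unreachable or password still required"
        fi
    done
}

# Example (not run here):
#   make_key && copy_key_to_nodes /full/path/to/nodes && check_nodes /full/path/to/nodes
```

On a cluster whose home directories are shared over NFS, copying the key to a single node may already be enough, since all nodes then see the same authorized_keys file.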

Notice:

Starting NaradaBrokering and Twister requires separate command windows. To run a Twister application, at least two more command windows are needed: one for executing Twister commands and another for executing the Twister application commands.

All Twister nodes are required to have a directory for managing the Twister data on that node. The directory name is the same on all Twister nodes. This is called the Twister common data directory.

When you reach the step of specifying this directory (data_dir in $TWISTER_HOME/bin/twister.properties), you can run ./twister.sh initdir [directory to create, given as a complete path] under $TWISTER_HOME to create it.

e.g. ./twister.sh initdir /full/path/to/data_dir

3. Single Machine

Configure the three files nb.properties, nodes and twister.properties under $TWISTER_HOME/bin/.

  1. Set broker_host in nb.properties to "127.0.0.1".
  2. Clear the nodes file, then put "127.0.0.1" on its first line.
  3. In twister.properties, set nodes_file to the full path of the nodes file. The path is usually $TWISTER_HOME/bin/nodes.
  4. In twister.properties, set app_dir. The jar packages of Twister applications must be placed in this directory. The path is usually $TWISTER_HOME/apps/.
  5. In twister.properties, set data_dir. It is the root of the Twister common data directory on each node operated by Twister.
  6. Run ./startbr.sh under $NBHOME/bin/.
  7. Run ./start_twister.sh under $TWISTER_HOME/bin/.
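Steps 1 to 5 above can also be scripted. The sketch below assumes that nb.properties and twister.properties use plain "key = value" lines; this guide does not confirm that format, so check it against the shipped files first. set_prop and configure_single are hypothetical helpers, not Twister commands.

```shell
# set_prop FILE KEY VALUE
# Remove any existing "KEY = ..." line from FILE and append the new entry at the end.
# Assumption: the file uses "key = value" lines.
set_prop() {
    file="$1"
    key="$2"
    value="$3"
    scratch="$file.tmp"
    # grep -v exits non-zero when nothing is left to print; that is not an error here.
    grep -v "^[[:space:]]*$key[[:space:]]*=" "$file" > "$scratch" || true
    echo "$key = $value" >> "$scratch"
    mv "$scratch" "$file"
}

# configure_single BIN_DIR DATA_DIR
# Apply steps 1 to 5 to the configuration files found in BIN_DIR.
configure_single() {
    bin="$1"
    data_dir="$2"
    set_prop "$bin/nb.properties" broker_host 127.0.0.1
    echo "127.0.0.1" > "$bin/nodes"          # overwrite, so the file holds one line only
    set_prop "$bin/twister.properties" nodes_file "$bin/nodes"
    set_prop "$bin/twister.properties" app_dir "$(dirname "$bin")/apps/"
    set_prop "$bin/twister.properties" data_dir "$data_dir"
}

# Example (not run here):
#   configure_single "$TWISTER_HOME/bin" /full/path/to/data_dir
```

Because set_prop moves the edited key to the end of the file, it is worth keeping a backup copy of the original files before running it.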

4. Cluster

Because Twister and NaradaBrokering are installed on an NFS file system that is shared between the nodes in the cluster, they only need to be installed and configured once. This section assumes that the IP addresses of the machines are known. Configure nb.properties, nodes and twister.properties under $TWISTER_HOME/bin/.

  1. Set broker_host in nb.properties to the IP address of one of the machines.
  2. Put the list of machine IP addresses in the nodes file, one per line.
  3. In twister.properties, set nodes_file to the current location of the nodes file. The path is usually $TWISTER_HOME/bin/nodes.
  4. Set daemons_per_node and workers_per_daemon in twister.properties. Usually there is one daemon per node, and the number of workers equals the number of cores of the machine.
  5. In twister.properties, set app_dir. The jar packages of Twister applications must be placed in this directory. The path is usually $TWISTER_HOME/apps/.
  6. In twister.properties, set data_dir. This is the directory in which Twister application data is managed.
  7. Run ./startbr.sh under $NBHOME/bin/.
  8. Run ./start_twister.sh under $TWISTER_HOME/bin/.
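For steps 2 and 4, the nodes file and the worker count can be derived rather than typed by hand. The sketch below is an illustration only: write_nodes and worker_settings are hypothetical helpers, the "key = value" layout of twister.properties is an assumption, and the core count is taken from the machine the script runs on, which matches the other nodes only if the cluster is homogeneous.

```shell
# write_nodes OUT_FILE IP...
# Write the given IP addresses to OUT_FILE, one per line, replacing old content.
write_nodes() {
    out="$1"
    shift
    printf '%s\n' "$@" > "$out"
}

# worker_settings CORES
# Print the entries suggested by step 4: one daemon per node, one worker per core.
worker_settings() {
    echo "daemons_per_node = 1"
    echo "workers_per_daemon = $1"
}

# nproc is part of GNU coreutils; getconf is used as a fallback.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
worker_settings "$cores"

# Example (not run here; the addresses are placeholders):
#   write_nodes "$TWISTER_HOME/bin/nodes" 192.168.1.10 192.168.1.11 192.168.1.12
```

The two printed lines can then be copied into twister.properties in place of the existing entries.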

5. Amazon EC2

The Twister image (Twister-Tutorial-Img-0.6) is available on Amazon EC2 EBS. In this image, Twister and NaradaBrokering are both installed under the /home/ directory. Once a user has launched several instances, the following steps are needed to start the Twister environment.

A. Edit Configuration

  1. Choose one instance, log into it with the root account, and edit nb.properties, nodes and twister.properties under /home/Twister/bin/.
  2. Set broker_host in nb.properties to the IP address of the instance where you start NaradaBrokering. Normally, this is the instance you are currently working on.
  3. Put the IP addresses of all the instances you launched in the nodes file, one IP address per line.
  4. Set daemons_per_node and workers_per_daemon in twister.properties. Usually there is one daemon per node, and the number of workers equals the number of cores of the node.
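Since the nodes file is edited by hand here, a small sanity check before replicating the configuration can catch stray blank lines, typos, or an instance listed twice. The sketch below is a hypothetical helper, not part of the Twister image. It only checks that each line has the shape of a dotted IPv4 address and does not verify that each number is at most 255.

```shell
# validate_nodes NODES_FILE
# Print offending lines and return non-zero if any line is not shaped like an
# IPv4 address or if an address appears more than once.
validate_nodes() {
    file="$1"
    status=0
    # grep -v selects the lines that do NOT match; -n prefixes them with line numbers.
    if grep -nvE '^([0-9]{1,3}\.){3}[0-9]{1,3}$' "$file"; then
        echo "the lines above are not IPv4 addresses"
        status=1
    fi
    dups=$(sort "$file" | uniq -d)
    if [ -n "$dups" ]; then
        echo "listed more than once:"
        echo "$dups"
        status=1
    fi
    return $status
}

# Example (not run here):
#   validate_nodes /home/Twister/bin/nodes && echo "nodes file looks fine"
```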

B. Run configure.sh

  1. Run ./configure.sh under /home/Twister/bin/ to replicate the configuration files between nodes.

C. Start Twister

  1. Run ./startbr.sh under /home/NaradaBrokering/bin/.
  2. Run ./start_twister.sh under /home/Twister/bin/.