Apache Hadoop Lab 2

Introduction

Hadoop-Blast is an advanced Hadoop program that lets BLAST, a bioinformatics application, use the computing capability of Hadoop. This exercise shows the details of its implementation and provides an example of how to apply a similar approach to other applications.

BLAST (Basic Local Alignment Search Tool) is one of the most widely used bioinformatics applications. It is written in C++, and the version we are using is v2.2.23. This version is considered a newer release of the software, with new features and better performance than legacy BLAST. The database used in the following settings is the 8.5 GB nr database, whose full name is the Non-redundant protein sequence database.


Goals

Prerequisites

Hands-on Exercise 1: Blast installation

Hands-on Exercise 2: Setting up an Apache Hadoop Cluster for Hadoop-Blast

Hands-on Exercise 3: Running Hadoop-Blast in Distributed Hadoop

Hands-on Exercise 4: Programming the Hadoop-Blast