Science Cloud Reproducible Environment Tutorial

SALSA Group
PTI Indiana University
June 12th 2012

1. Introduction

A reproducible environment on a public or private cloud is important for any commercial or academic executable binary. With reproducibility, data analyses can be run without worrying about the complicated system setup process, the learning curve of different cloud infrastructure tools, or detailed background knowledge of the cloud. Programmers and scientists can focus on writing their own applications and launch them on the cloud with scalability support. Here, we provide an on-demand dynamic provisioning infrastructure (SalsaDPI) running on Eucalyptus on the FutureGrid testbed. This tutorial will guide you through the idea behind this work, the usage of the infrastructure, and examples of using the software to run a user-defined binary, from sandbox mode to a scalable cloud environment.

Before using the provided framework, please make sure the following dependencies are installed on your own laptop or machine.

Prerequisite Software dependencies:

2. Hands-on Assignments

 

Changes after 07/31/2012 Day 2 tutorial

We have made the following changes to the Lab Session: Reproducible Environment for Scientific Applications:

1. The slides have been updated: http://salsahpc.indiana.edu/ScienceCloud/slides/salsaDPI_presentation_v1.pptx. Please remove the old copy so that your local browser does not serve the cached file.

2. The tutorial pages for "Lab Session: Reproducible Environment for Scientific Applications" have been updated to follow the presentation order of 07/31/2012: http://salsahpc.indiana.edu/ScienceCloud/reproduce-intro.html

3. A special update requires the audience to download a newly compiled Java executable, salsaDPI.jar. Detailed instructions can be found on the hands-on 1 (http://salsahpc.indiana.edu/ScienceCloud/handson1_chef_sandbox.html) and hands-on 2 (http://salsahpc.indiana.edu/ScienceCloud/handson2_chef_cloud.html) pages.

4. All students, new or returning, should go directly to the slides and the tutorial entry page to find what they need: http://salsahpc.indiana.edu/ScienceCloud/reproduce-intro.html

FAQ

    FAQ when setting up the FutureGrid environment
  1. Where is the eucarc file stored?
    Ans: It is stored on the FutureGrid India head node, india.futuregrid.org. You will need the SSH private key generated in the Day 1 tutorial. If you haven't generated one, please see this page: https://portal.futuregrid.org/projects/241/register
  2. How can I log in to FutureGrid using my SSH private key and download the eucarc and FutureGrid Eucalyptus VM SSH private key files?
    Ans: Linux users can use an SSH terminal directly. Under Windows, log in to FutureGrid with your SSH private key and use WinSCP to download the eucarc and VM SSH private key files to your local machine.
  3. Why can't I log in to FutureGrid using an SSH key generated inside VirtualBox under Windows (*.ppk)?
    Ans: You have to convert it to OpenSSH format. Please see this guide: http://salsahpc.indiana.edu/ScienceCloud/fg_euca_guide.html


    FAQ when running salsaDPI.jar
  1. Why can't salsaDPI.jar run correctly and generate output under sandbox/cloud mode?
    Ans: If you cannot get output from sandbox mode, the cause is probably a typo in the template conf. file or in the java execution command. Also, please make sure you have the latest salsaDPI. Instructions can be found in the "Important update" section of the hands-on 1 and hands-on 2 pages.
  2. If salsaDPI.jar fails, where can I find the log/error messages?
    Ans: salsaDPI writes log and error files under /tmp/ with names of the form "bootstrap_info_node_*".
  3. What does "Message was generated in the future" or "Message was generated in the past" mean in the error message?
    Ans: The clock of the VirtualBox VM is set differently from that of the Chef server, which causes authentication and authorization errors. Please run the command "ntpdate ntp.indiana.edu" inside the VirtualBox VM to sync its clock with Indiana time.
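
For the log-file question above, the following is a minimal sketch (not part of salsaDPI itself) of how the log and error files could be located and inspected from Python. Only the /tmp/ directory and the "bootstrap_info_node_*" name pattern come from the FAQ; the helper names find_bootstrap_logs and tail are hypothetical, and the files are assumed to be plain text.

```python
import glob
import os


def find_bootstrap_logs(log_dir="/tmp", pattern="bootstrap_info_node_*"):
    """Return paths matching the pattern, most recently modified first."""
    paths = glob.glob(os.path.join(log_dir, pattern))
    return sorted(paths, key=os.path.getmtime, reverse=True)


def tail(path, max_lines=20):
    """Return the last max_lines lines of a file (assumed to be plain text)."""
    with open(path, "r", errors="replace") as handle:
        return handle.readlines()[-max_lines:]


if __name__ == "__main__":
    # Print the end of each log, newest file first.
    for log_path in find_bootstrap_logs():
        print("==> " + log_path)
        print("".join(tail(log_path)), end="")
```

Run inside the machine where salsaDPI.jar was executed, this prints the last lines of each matching file, which is usually where the most recent error appears.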