Science Cloud Reproducible Environment Tutorial
SALSA Group
PTI Indiana University
June 12th 2012
A reproducible environment on a public or private cloud is valuable for any commercial or academic executable. With a reproducible environment, data analysis can be carried out without worrying about the complicated system setup process, the learning curve of different cloud infrastructure tools, or detailed background knowledge of the cloud. Programmers and scientists can focus on writing their own applications and launch them on the cloud with support for scalability. Here, we provide an on-demand dynamic provisioning infrastructure (SalsaDPI) running on Eucalyptus on the FutureGrid testbed. This tutorial guides you through the idea behind this work, the usage of the infrastructure, and examples of using the software to run user-defined binaries, from sandbox mode to a scalable cloud environment.
Before using the provided framework, please make sure the following dependencies are installed on your own laptop or machine.
Prerequisite Software dependencies:
- Install and use the prepackaged VirtualBox VM image (see the installation instructions)
2. Hands-on Assignments
- Hands-on 1: Run User-defined Hadoop WordCount on a Sandbox Standalone Machine
- Hands-on 2: Run User-defined Twister WordCount on FutureGrid Eucalyptus
FAQ when setting up the FutureGrid environment
- Where is the eucarc file stored?
Ans: It is stored on the FutureGrid India head node, india.futuregrid.org. You will need the ssh private key generated in the Day 1 tutorial. If you haven't generated one, please see this page: https://portal.futuregrid.org/projects/241/register
- How can I log in to FutureGrid using my ssh private key and download the eucarc and FutureGrid Eucalyptus VM ssh private key files?
Ans: Linux users can use an ssh terminal directly. Under Windows, you can log in to FutureGrid with your ssh private key and use WinSCP to download the eucarc and VM ssh private key files to your local machine.
- Why can't I log in to FutureGrid using the ssh key generated inside VirtualBox under Windows (*.ppk)?
Ans: You have to convert it to OpenSSH format; please see this guide: http://salsahpc.indiana.edu/ScienceCloud/fg_euca_guide.html
- Why doesn't the salsaDPI jar run correctly and generate output in sandbox/cloud mode?
Ans: If you cannot get output from sandbox mode, the cause is probably a typo in the template conf. file or in the java execution command. Also, please make sure you have the latest salsaDPI. Instructions can be found in the "Important update" section of the hands-on 1 and hands-on 2 pages.
- If salsaDPI.jar fails, where can I see the log/error messages?
Ans: salsaDPI generates log and error files under /tmp/ with names of the form "bootstrap_info_node_*".
- What does "Message was generated in the future" or "Message was generated in the past" mean in the error message?
Ans: The clock of the VirtualBox VM differs from that of the chef server, which causes authentication and authorization errors. Please run the command "ntpdate ntp.indiana.edu" inside the VirtualBox VM to sync it with Indiana time.
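The last two answers can be combined into a small troubleshooting sketch. It is only an illustration: the function names show_salsadpi_logs and sync_vm_clock, the default of 20 trailing lines, and the use of sudo are our own assumptions and are not part of salsaDPI. Only the /tmp location, the bootstrap_info_node_* file name pattern, and the ntpdate command come from the answers above.

```shell
#!/bin/sh
# Troubleshooting sketch for salsaDPI runs (helper names are hypothetical).

# Print the last lines of every salsaDPI bootstrap log.
# $1: directory to search; defaults to /tmp as stated in the FAQ.
# $2: number of trailing lines to show (our own default of 20).
show_salsadpi_logs() {
    log_dir="${1:-/tmp}"
    lines="${2:-20}"
    found=0
    for f in "$log_dir"/bootstrap_info_node_*; do
        [ -f "$f" ] || continue    # the glob stays literal when nothing matches
        found=1
        echo "==> $f <=="
        tail -n "$lines" "$f"
    done
    if [ "$found" -eq 0 ]; then
        echo "no bootstrap_info_node_* files found in $log_dir" >&2
        return 1
    fi
}

# Re-sync the VM clock with the NTP server named in the FAQ, then print
# the new local time. Setting the clock normally needs root privileges;
# sudo is an assumption here, so drop it if you are already root.
sync_vm_clock() {
    ntp_server="${1:-ntp.indiana.edu}"
    sudo ntpdate "$ntp_server" && date
}
```

After sourcing this file inside the VirtualBox VM, call show_salsadpi_logs following a failed run. If the output contains one of the "Message was generated in the future/past" errors, call sync_vm_clock and then rerun salsaDPI.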
FAQ when running salsaDPI.jar