Class Schedule (Tentative)

Lectures | Topics | Literature | Assignments | Lab

Lecture 1
Jan 13, 15

  • Course introduction
  • Data Intensive Sciences
  • Data Center Model
  • Current Clouds with Infrastructure, Platform and Software as a Service

Assignment 0

 

Lecture 2
Jan 20, 22

  • Cloud project ideas
  • Previous cloud projects
Lab 1 Environment Setup

Lecture 3
Jan 27, 29

  • Apache Data Analysis Open Stack
  • MapReduce
  • Hadoop
    • Hadoop Framework
    • Hadoop tasks
    • Fault Tolerance
  • Hadoop: The Definitive Guide, O’Reilly publishers.
    • Chapter 3
    • Chapter 4
    • Chapter 6
    • Chapter 7
Project 1 Hadoop WordCount
Lab 2 Hadoop configuration

Lecture 4
Feb 3, 5

  • Guest Talk: "Sublinear" Models for streaming and/or distributed data by Prof. Qin Zhang


  • Programming on a Computer Cluster
  • How Hadoop Runs a MapReduce Job

Project 2 Hadoop PageRank
Lab 3 Hadoop PageRank

Lecture 5
Feb 10, 12

  • Hadoop PageRank
  • Parallel Thinking
    • SIMD vs. MIMD
    • SPMD vs. MPMD

Lecture 6
Feb 17, 19

  • Hadoop BLAST
  • Research Issues
    • Data Locality
    • Task Granularity
    • Resource Utilization and Speculative Execution
  • mpiBLAST
  • CloudBLAST
  • CloudBurst
  • AzureBLAST
  • TwisterBLAST
Project 3 Hadoop BLAST
Lab 4 BLAST

Lecture 7
Feb 24, 26

  • Iterative MapReduce and EM algorithms
  • Twister
  • Parallel data mining algorithms using Twister

Lecture 8
Mar 3, 5

  • Performance Issues
  • Data mining algorithms
    • Clustering by Deterministic Annealing (DAC)
    • Multi-Dimensional Scaling (MDS)
    • Latent Dirichlet Allocation (LDA)
   

Lecture 9
Mar 10, 12

  • Midterm Review
Midterm

Lecture 10
Mar 24, 26

  • OpenStack
  • FutureSystems
  • Deploying a Virtual Cluster
  • Deploying a Hadoop Cluster
Lab 5 OpenStack

Lecture 11
Mar 31, Apr 2

  • RDBMS vs. NoSQL
  • NoSQL Characteristics
  • BigTable
  • HBase
  • HBase Coding
Project 4 HBase WordCount
Lab 6 Load Data into HBase

Lecture 12
Apr 7, 9

  • Indexing Technologies
  • Case Study of Social Media Analysis using IndexedHBase
Project 5 Building an Inverted Index

Lecture 13
Apr 14, 16

  • Pig and Hive
  • Pig PageRank
Project 6 Pig PageRank
Lab 7 Pig configuration

Lecture 14
Apr 21, 23

  • Pig K-means
Project 7 Pig K-means

Lecture 15
Apr 28, 30

  • Building a Search Engine
Project 8 Search Engine
Lab 8 Search Engine

Lecture 16
May 5, 7

  • Project Presentations
  • Course Review
Final Exam