Lec 1


Distributed System Models and Enabling Technologies
Scalable Computing over the Internet, Technologies for Network-Based Systems, System Models for Distributed and Cloud Computing, Software Environments for Distributed Systems and Clouds, Performance, Security

- How to Read a Paper, S. Keshav, 2012.

- Above the Clouds: A Berkeley View of Cloud Computing, Technical Report, 2009.




Lec 2

Computer Clusters for Scalable Computing
Clustering for Massive Parallelism, Computer Clusters and MPP Architectures, Design Principles of Computer Clusters, Cluster Job and Resource Management, Case Studies of Top Supercomputer Systems

- What is Parallel Computing?



Lec 3

Virtual Machines and Virtualization of Clusters and Datacenters
Implementation Levels of Virtualization, Virtualization Structures/Tools and Mechanisms, Virtualization of CPU, Memory, and I/O Devices, Virtual Clusters and Resource Management, Virtualization for Data-Center Automation

- Xen and the Art of Virtualization, 2003.

- A Comparison of Software and Hardware Techniques for x86 Virtualization, 2006.



Lec 4

Cloud Platform Architecture over Virtualized Data Centers:
Data Center Design and Networking
What is a Data Center? What does a Data Center Look Like? Warehouse-Scale Data Center Design, Power and Cooling Requirements, Data-Center Interconnection Networks, Design Considerations for WSC


- The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, L. A. Barroso, U. Hölzle, Google Inc., 2009.

- High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities, D. Abts, J. Kim, 2011.

- A Guided Tour through Data-center Networking, D. Abts, B. Felderman, ACM Queue, May 3, 2012.

- A Scalable, Commodity Data Center Network Architecture, M. Al-Fares, A. Loukissas, A. Vahdat, SIGCOMM’08, August 17–22, 2008.

Videos on Data Centers:

- Explore a Google Data Center with Street View

- Google Container Data Center


Lec 5

Cloud Platform Architecture over Virtualized Data Centers:
Cloud Computing Service Models
Cloud Computing Services Stack, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), Today’s Cloud Services Stack, Public, Private & Hybrid Clouds, Market-Oriented Cloud Architecture, Inter-Cloud Resource Management, Cloud Security and Trust Management

- Amazon Web Services (AWS)
- Getting Started with AWS
- Introduction to Amazon Web Services (video tutorial)

- Google App Engine

- Introduction to Google App Engine For Developers (video tutorial)

- Microsoft Azure

Lec 6

Cloud Platform Architecture over Virtualized Data Centers:
Major Cloud Service Providers
Public Clouds, Amazon Web Services (AWS), Google App Engine, Microsoft Azure

Lec 7.1

Cloud Programming and Software Environments:
MapReduce and Hadoop Framework

Big Data and Parallel Computing, History of MapReduce, New Parallel Programming Paradigm: MapReduce, The MapReduce Programming Model, Hadoop Framework, Writing Jobs for Hadoop, Hadoop Distributed File System (HDFS), Hadoop Internals, Hadoop 1.0 vs 2.0, MapReduce Cloud Service


- The Google File System, S. Ghemawat et al., SOSP, 2003.

- MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S. Ghemawat, OSDI, 2004.

- Hadoop home page

- Beyond Batch: The Evolution of the Hadoop Ecosystem, Doug Cutting

- HDFS-Comics

- MapReduce Tutorial (Apache Hadoop 1.2.1) 
- MapReduce Tutorial (Apache Hadoop 2.6.0)
- Google MapReduce Tutorial

Lec 7.2

Cloud Programming and Software Environments:
Introduction to YARN and MapReduce 2
Overview of MapReduce 1 and 2, YARN Architecture, MapReduce v2, Managing a YARN Cluster, Cloudera and MR2


- Hadoop Tutorial: Introducing Apache Hadoop (17 minutes) 
- Hadoop Tutorial: Intro To Hadoop Developer Training | Cloudera (1 hour)
- MapReduce Programming Demo - Global Climate Analysis Example from Hadoop: The Definitive Guide
- Hadoop - Just the Basics for Big Data Rookies (1 hour 25 minutes) 
- Big Data and Hadoop Tutorials (28 videos, 20 hours)
- Hadoop MapReduce Fundamentals 1 of 5 
- Intro To MapReduce 

Lec 7.3

Lec 7.4


Cloud Programming and Software Environments:
Hadoop MapReduce 2 Tutorial

Hadoop Ecosystem and HPC Integration

- Hadoop installation and configuration on notebooks: 1-, 2-, and 4-node clusters using the Cloudera 4.1.1 and 5.3 Hadoop distributions
- Hadoop installation and configuration on FutureSystems, the Indiana University clusters (our project portal address)

Lec 8


Big Data Applications & Analytics Case Study
K-Means, Analysis of 4 Artificial Clusters, K-Means in Java using Mahout, MapReduce Revisited: Advanced Topics, K-Means and MapReduce Parallelism, PageRank

Lec 9

How to Store Data (NoSQL)
RDBMS vs NoSQL, NoSQL Characteristics, BigTable, HBase, HBase Coding, Indexing Technologies, Related Work, Social Media Searches, Analysis Algorithms


Lec 10

How to Build a Search Engine (SaaS)
Architecture for a Search Engine, Google Architecture, Evolution of Google’s Search Systems


