Lecture Schedule

| Week | Date | Lecture | Topics Covered | Lab | Technology Training | Supplementary | HW |
|---|---|---|---|---|---|---|---|
| 0 | 10/02 | | Course Introduction | | | | HW1 |
| 1 | 17/02 | Lec 1 | Course Motivation (1/2). Emerging Technologies, Data Deluge, Industry Trends, Computing Model: Clouds, Data Centers, Virtualization. | | | | |
| 2 | 24/02 | | Course Motivation (2/2). Research Model: 4th Paradigm; Data Science Process: DIKW; Features of Data Deluge; Data Analytics; Cloud Applications: Physics-Informatics, Recommender Systems, Information Retrieval; Cloud Applications in Research: Science Clouds, Internet of Things; Parallel Computing and MapReduce. | Lab 1, Lab 2 | Python for Big Data and X-Informatics; NumPy, SciPy and Matplotlib (essential tools for every data scientist who uses Python; this training covers Canopy, an IDE for Python); FutureSystems | | HW2 |
| 3 | 03/03 | Lec 2 | Recommender Systems and Algorithms. Recommender Systems as an Optimization Problem, Kaggle Competitions, Examples of Recommender Systems: Netflix, Google News Personalization Engine, Yahoo Recommender Systems. | Lab 3 | Using Plotviz: Plotviz is a data visualization tool developed at Indiana University for displaying point distributions in 3D. | | HW3 |
| 4 | 09/03 | | Recommender Systems and Algorithms. Algorithms: User-Based Nearest-Neighbor Collaborative Filtering, Vector Space Formulation of Recommender Systems, Item-Based Collaborative Filtering, k-Nearest Neighbors and High-Dimensional Spaces. | Lab 4, Lab 5 | kNN: Recommender Systems with K-Nearest Neighbors (Python and Java tracks); Clustering: clustering and heuristic methods. | | HW4 |
| 5 | 17/03 | Lec 3.1, Lec 3.2 | Cloud Computing Technology Part I: Introduction, Software and Systems. Cyberinfrastructure; What Is Cloud Computing: Introduction; What and Why Is Cloud Computing: Several Other Views; Simple Examples of Cloud Computing Use; Value of Cloud Computing; Public, Private and Hybrid Clouds; Cloud Software Architecture: IaaS and PaaS; Using the HPC-ABDS Software Stack. Cloud Computing Technology Part II: Architectures, Applications and Systems. Cloud (Data Center) Architectures, Analysis of Major Cloud Providers, Commercial Cloud Storage Trends, Cloud Applications, Science Clouds: Science Applications and Internet of Things, Security, Comments on Fault Tolerance and Synchronicity Constraints. | | | | |
| 6 | 24/03 | Lec 3.3 | Cloud Computing Technology Part III: Data Systems. The 10 Interaction Scenarios (Access Patterns) I, The 10 Interaction Scenarios: Science Examples, Remaining General Access Patterns, Data in the Cloud Applications, Processing Big Data. | | | | |
| 7 | 31/03 | | Midterm Exam | | | | |
| 8 | 07/04 | Lec 4.1 | Cloud Programming and Software Environments: MapReduce and Hadoop Framework. Big Data and Parallel Computing, History of MapReduce, New Parallel Programming Paradigm: MapReduce, The MapReduce Programming Model, Hadoop Framework, Writing Jobs for Hadoop, Hadoop Distributed File System (HDFS), Hadoop Internals, Hadoop 1.0 vs. 2.0, MapReduce Cloud Service. | | Hadoop installation and configuration on notebooks: 1-, 2- and 4-node clusters using the Cloudera 4.1.1 and 5.3 Hadoop distributions; Hadoop installation and configuration on FutureSystems (Indiana University clusters), our project portal address | | HW5 |
| 9 | 14/04 | Lec 4.2, Lec 4.3, Lec 4.4 | Cloud Programming and Software Environments: Introduction to YARN and MapReduce 2. Overview of MapReduce 1 and 2, YARN Architecture, MapReduce v2, Managing a YARN Cluster, Cloudera and MR2; Hadoop MapReduce 2 Tutorial; Hadoop Ecosystem and HPC Integration. | | | | |
| 10 | 21/04 | Lec 5 | Big Data Applications and Analytics Case Study: Web Search and Text Mining. Web and Document/Text Search: The Problem, Information Retrieval, Web Search Solutions in General (Starting with History), Key Principles behind Web Search, Information Retrieval (Web Search) Components, Search Engines, Boolean and Vector Space Models, Web Crawling and Document Preparation, Indices, TF-IDF and Probabilistic Models, Data Analytics for Web Search, Link Structure Analysis including PageRank, Web Advertising and Search, Clustering and Topic Models. | | | | |
| 11 | 28/04 | Lec 6 | Technology for Big Data Applications & Analytics. K-Means, Analysis of 4 Artificial Clusters, K-Means in Java Using Mahout, MapReduce Revisited: Advanced Topics, K-Means and MapReduce Parallelism, PageRank. | | | | |
| 12 | 05/05 | Lec 7 | How to Store Data (NoSQL). RDBMS vs. NoSQL, NoSQL Characteristics, Bigtable, HBase, HBase Coding, Indexing Technologies, Related Work, Social Media Searches, Analysis Algorithms. | | | | |
| 13 | 12/05 | Lec 8 | How to Build a Search Engine (SaaS). Architecture for a Search Engine, Google Architecture, Evolution of Google’s Search Systems. | | | | |
| 14 | 19/05 | | Project Demonstrations | | | | |