Gediz University, Computer Engineering Department
Spring 2015
Tuesday: 13:00-14:45, A-Z04

Instructor: Halûk Gümüşkaya
Office: D107
Office Hours: Mon: 15:00-17:00, Tue: 16:00-17:00
Phone: 0232-355 0000 - 2305
e-mail: haluk.gumuskaya@gediz.edu.tr

Course Description (3-0-3)
Data Deluge, Computing Model: Clouds, Data Centers, Virtualization, Research Model: 4th Paradigm, Data Science Process: DIKW (Data, Information, Knowledge, Wisdom), Recommender Systems, Algorithms: User-based Nearest-Neighbor Collaborative Filtering, Vector Space Formulation of Recommender Systems, Item-based Collaborative Filtering, k Nearest Neighbors and High-Dimensional Spaces, Basic Principles of Parallel Computing, Cloud Computing Technologies for Big Data Applications and Analytics: Apache Data Analysis Open Stack, MapReduce, Hadoop, Web Search, Text Mining and their Technologies, K-means and MapReduce Parallelism, PageRank, NoSQL, BigTable, HBase, Indexing Technologies, Pig and Hive, Pig PageRank, Pig K-means, Building a Search Engine, Internet of Things and Sensors.

Prerequisites
None (Catalog), but the following courses are recommended:
COM 440 Distributed Systems
COM 444 Cloud Computing

Lecture Schedule
This is the tentative lecture schedule. Please check this page at least once a week during the semester.

Textbooks
Cloud Computing, Data Science and Data Processing Platforms
The Fourth Paradigm: Data-Intensive Scientific Discovery, T. Hey, S. Tansley and K. Tolle (Editors), Microsoft Research, 2009. (The book can be downloaded from its web site.)
Python for Data Analysis, W. McKinney, O'Reilly, 2013.
Machine Learning in Action, P. Harrington, Manning Publications, 2012.
Hadoop: The Definitive Guide, T. White, O'Reilly, 2012.
Mahout in Action, S. Owen, R. Anil, T. Dunning, E. Friedman, Manning Publications, 2012.

Tools and Platforms
FutureSystems - Indiana University clusters, our project portal address, and all projects.
NumPy, SciPy, Matplotlib - Powerful tools that every data scientist who uses Python must know
Canopy - An IDE for Python
PlotViz - A data visualization tool developed at Indiana University for displaying point distributions in 3D
Virtualization software: Oracle VM VirtualBox
Hadoop Ecosystem - Cloud software tools to develop and run data-intensive applications
Java development environments

Grading
30%: Project
15%: Homework
10%: Attendance, Discussion, Contribution
20%: Midterm Exam
25%: Final Exam