Gediz University, Computer Engineering Department
Spring 2016
Lecture: Tuesday, 09:00 - 11:45, A-117
Instructor: Halûk Gümüşkaya
Office: D107
Office Hours: Mon-Wed, 13:00 - 14:00
e-mail: haluk.gumuskaya@gediz.edu.tr
Course Description
Introduction to data mining. Descriptions of data. Data preprocessing: data cleaning, integration, and reduction. Data warehousing and online analytical processing. Association and correlation analysis. Classification: decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, and logistic regression. Cluster analysis. Outlier analysis.
Prerequisite
The course assumes only a modest background in statistics or mathematics; no database knowledge is needed.
Lecture Schedule (tentative)
W | Lec | Topics Covered
0 (23/02) | Lec 0 | Course Overview. Data Deluge and Technical Challenges, Definitions for Data Mining and Related Sciences, Data Science Related Jobs and University Graduate Programs, DIKW Process Examples, Tools and Languages for Data Science, Course Description and Objectives, Requirements and Assumptions, Course Outline, Text Books, Other Lecture Materials and Tools, Course Activities and Grading
1 (01/03) | Lec 1 | Introduction to Data Mining. Why Data Mining? What is Data Mining? What Kinds of Data can be Mined? Data Mining Functions, Which Sciences and Technologies are Used? Which Kinds of Applications are Targeted? Major Issues in Data Mining, A Brief History of Data Mining and Data Mining Society
2 (08/03) | Lec 2 | Getting to Know Your Data. Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Data Visualization, Measuring Data Similarity and Dissimilarity
3 (15/03) | Lec 3 | Data Preprocessing. Data Preprocessing: An Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization
4 (22/03) | Lab 1 | Getting to Know Your Data and Data Preprocessing using WEKA
5 (29/03) | Lec 4 | Association Analysis: Basic Concepts and Methods. Introduction, Frequent Itemsets, Scalable Mining Methods and Apriori Algorithm, Finding Association Rules with Apriori, Learning Association Rules using WEKA
6 (05/04) | Lec 5 | Classification: Basic Concepts, Decision Trees, and Model Evaluation. Basic Concepts, Decision Tree Based Classification, Practical Issues of Classification, Model Evaluation, Classification using WEKA
7 (12/04) | | Midterm Exam I
8 (19/04) | | Classification: Advanced Methods. Rule-Based Classification, Bayes Classification Methods, Neural Networks and Classification by Backpropagation, Support Vector Machines, Classification by Using Frequent Patterns, Lazy Learners (or Learning from Your Neighbors)
9 (26/04) | | Classification: Additional Topics. Other Classification Methods, Model Evaluation and Selection, Techniques to Improve Classification Accuracy: Ensemble Methods
10 (03/05) | | Classification Applications
11 (10/05) | | Cluster Analysis: Basic Concepts and Methods. Cluster Analysis: Basic Concepts, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Evaluation of Clustering
12 (17/05) | | Cluster Analysis: Advanced Methods. Probability Model-Based Clustering, Clustering High-Dimensional Data, Clustering Graphs and Network Data, Clustering with Constraints
13 (24/05) | | Outlier Analysis. Outlier and Outlier Analysis, Outlier Detection Methods, Statistical Approaches, Proximity-Based Approaches, Clustering-Based Approaches, Classification Approaches, Mining Contextual and Collective Outliers, Outlier Detection in High Dimensional Data
14 (31/05) | | Midterm Exam II
Textbooks
Main Textbooks and Materials
Data Mining: Concepts and Techniques, 3rd Edition, J. Han, M. Kamber, J. Pei, Morgan Kaufmann, 769 pp, 2011.
Introduction to Data Mining, P. Tan, M. Steinbach, V. Kumar, Addison-Wesley, 769 pp, 2006.
Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Ian H. Witten, E. Frank, M. A. Hall, Morgan Kaufmann, 629 pp, 2011.
Two Data Mining with Weka courses (YouTube channel), Ian H. Witten, MOOCs (Massive Open Online Courses) from the University of Waikato, New Zealand.
Weka Tutorial by Rushdi Shams
Recommended
Mining of Massive Datasets, 2nd Edition, J. Leskovec, A. Rajaraman, J. Ullman, Cambridge University Press, 2014. Available for download (511 pages, 3 MB).
Introduction to Machine Learning, 2nd Edition, Ethem Alpaydın, The MIT Press, 2010.
Python for Data Analysis, W. McKinney, O'Reilly, 2013.
Machine Learning in Action, P. Harrington, Manning Publications, 2012.
Mahout in Action, S. Owen, R. Anil, T. Dunning, E. Friedman, Manning Publications, 2012.
Tools and Development Environments
Weka, Data Mining Software in Java
MATLAB
Grading
20%: HW Assignments
25%: Midterm 1
25%: Midterm 2
30%: Final
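As a rough illustration of how the grading weights combine, the sketch below computes a weighted course grade in Python. It assumes each component is scored on a 0-100 scale and that the components are combined linearly, which the syllabus does not state explicitly. The function name `course_grade` and the sample scores are hypothetical.

```python
# Weights taken from the grading breakdown (percent of the course grade).
WEIGHTS = {"homework": 20, "midterm1": 25, "midterm2": 25, "final": 30}


def course_grade(scores):
    """Return the weighted grade for component scores on a 0-100 scale.

    Assumption: components are combined linearly by their weights.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical student scores, for illustration only.
example = {"homework": 80, "midterm1": 70, "midterm2": 60, "final": 90}
print(course_grade(example))  # 75.5
```

With these sample scores the contributions are 16 (homework), 17.5 and 15 (midterms), and 27 (final), which sum to 75.5.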