Gediz University, Computer Engineering Department
Spring Semester 2012, Tuesday
Instructor: Halûk Gümüşkaya | Teaching Assistant: |
Office: D107 | Office: |
Office Hours: Mon, Wed, Thu: 13:00 - 13:45 | Office Hours: |
Phone: 0232-355 0000 - 2305 | Phone: |
e-mail: haluk.gumuskaya@gediz.edu.tr | e-mail: |
Course Description
Introduction to data mining. Descriptions of data. Data preprocessing: data cleaning, integration, and reduction. Data warehousing and online analytical processing. Association and correlation analysis. Classification: decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, and logistic regression. Cluster analysis. Outlier analysis.
Prerequisite
Probability and Statistics
Lecture Schedule (tentative)
Week | Date | Lec | Topics Covered |
1 | 14/02 | | Introduction: An Overview of Data Mining |
2 | 21/02 | | Getting to Know Your Data: Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Data Visualization, Measuring Data Similarity and Dissimilarity |
3 | 28/02 | | Data Preprocessing (1/2): Overview of Data Preprocessing (Data Quality, Major Tasks in Data Preprocessing), Data Cleaning, Data Integration |
4 | 06/03 | | Data Preprocessing (2/2): Data Reduction, Data Transformation and Data Discretization |
5 | 13/03 | | Mining Frequent Patterns, Associations and Correlations - Basic Concepts and Methods: Basic Concepts, Frequent Itemset Mining Methods, Which Patterns Are Interesting? Pattern Evaluation Methods |
6 | 20/03 | | Advanced Frequent Pattern Mining: Pattern Mining: A Road Map, Pattern Mining in Multi-Level, Multi-Dimensional Space, Constraint-Based Frequent Pattern Mining, Mining High-Dimensional Data and Colossal Patterns, Mining Compressed or Approximate Patterns, Pattern Exploration and Application |
7 | 27/03 | | Classification - Basic Concepts: Classification: Basic Concepts, Decision Tree Induction |
8 | 03/04 | | Classification - Advanced Methods: Rule-Based Classification, Bayes Classification Methods, Neural Networks and Classification by Backpropagation, Support Vector Machines, Classification Using Frequent Patterns, Lazy Learners (or Learning from Your Neighbors) |
9 | 10/04 | | Classification - Additional Topics: Other Classification Methods, Model Evaluation and Selection, Techniques to Improve Classification Accuracy: Ensemble Methods |
10 | 17/04 | | Cluster Analysis - Basic Concepts and Methods: Cluster Analysis: Basic Concepts, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Evaluation of Clustering |
11 | 24/04 | | Cluster Analysis - Advanced Methods: Probability Model-Based Clustering, Clustering High-Dimensional Data, Clustering Graphs and Network Data, Clustering with Constraints |
12 | 01/05 | | Outlier Analysis: Outliers and Outlier Analysis, Outlier Detection Methods, Statistical Approaches, Proximity-Based Approaches, Clustering-Based Approaches, Classification Approaches, Mining Contextual and Collective Outliers, Outlier Detection in High-Dimensional Data |
13 | 08/05 | | Project Demonstrations 1 |
14 | 15/05 | | Project Demonstrations 2 |
Textbooks
Main Textbook
Recommended
Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, I. H. Witten, E. Frank, M. A. Hall, Morgan Kaufmann, 629 pp, 2011.
Introduction to Data Mining, P. Tan, M. Steinbach, V. Kumar, Addison-Wesley, 769 pp, 2006.
Introduction to Machine Learning, 2nd Edition, Ethem Alpaydın, The MIT Press, 2010.
Tools and Development Environments
Weka, Data Mining Software in Java
Matlab
Grading
10% : ADC (Attendance, Discussion and Contribution)
20% : HW Assignments
20% : Midterm 1 (Classification)
20% : Midterm 2 (Clustering)
30% : Project
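To make the weighting concrete, the sketch below combines the five components listed above into a single course grade. It assumes every component is scored out of 100, which the syllabus does not state. The function name `final_grade` and the sample scores are hypothetical and serve only as an illustration.

```python
# Weights in percent, copied from the grading list above.
WEIGHTS = {
    "ADC": 10,
    "HW Assignments": 20,
    "Midterm 1": 20,
    "Midterm 2": 20,
    "Project": 30,
}


def final_grade(scores):
    """Return the weighted course grade on a 0-100 scale.

    Assumption: `scores` maps each component name to a score out of 100.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "ADC": 80,
    "HW Assignments": 70,
    "Midterm 1": 60,
    "Midterm 2": 90,
    "Project": 85,
}
print(final_grade(example))  # 77.5
```

Because the project carries the largest single weight (30%), a 10-point change in the project score moves the final grade by 3 points, while the same change in either midterm moves it by 2.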