SMY 535 Data Mining


 

Gediz University, Computer Engineering Department
Spring Semester 2012, Tuesdays

 
  Instructor: Halûk Gümüşkaya
  Office: D107
  Office Hours: Mon, Wed, Thu: 13:00 - 13:45
  Phone: 0232-355 0000, ext. 2305
  e-mail: haluk.gumuskaya@gediz.edu.tr
- Course Description
- Prerequisite
- Lecture Schedule
- Textbooks
- Tools and Development Environments
- Grading

  Course Description

Introduction to data mining; describing data; data preprocessing: data cleaning, integration and reduction; data warehousing and online analytical processing (OLAP); association and correlation analysis; classification: decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, logistic regression; cluster analysis; outlier analysis.

  Prerequisite

   Probability and Statistics

  Lecture Schedule (tentative)

Week  Date   Topics Covered

1     14/02  Introduction: An Overview of Data Mining
2     21/02  Getting to Know Your Data: Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Data Visualization, Measuring Data Similarity and Dissimilarity
3     28/02  Data Preprocessing (1/2): Overview (Data Quality, Major Tasks in Data Preprocessing), Data Cleaning, Data Integration
4     06/03  Data Preprocessing (2/2): Data Reduction, Data Transformation and Data Discretization
5     13/03  Mining Frequent Patterns, Associations and Correlations - Basic Concepts and Methods: Basic Concepts, Frequent Itemset Mining Methods, Pattern Evaluation Methods (Which Patterns Are Interesting?)
6     20/03  Advanced Frequent Pattern Mining: A Pattern Mining Road Map, Pattern Mining in Multilevel and Multidimensional Space, Constraint-Based Frequent Pattern Mining, Mining High-Dimensional Data and Colossal Patterns, Mining Compressed or Approximate Patterns, Pattern Exploration and Application
7     27/03  Classification - Basic Concepts: Basic Concepts, Decision Tree Induction
8     03/04  Classification - Advanced Methods: Rule-Based Classification, Bayes Classification Methods, Neural Networks and Classification by Backpropagation, Support Vector Machines, Classification Using Frequent Patterns, Lazy Learners (Learning from Your Neighbors)
9     10/04  Classification - Additional Topics: Other Classification Methods, Model Evaluation and Selection, Techniques to Improve Classification Accuracy (Ensemble Methods)
10    17/04  Cluster Analysis - Basic Concepts and Methods: Basic Concepts, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Evaluation of Clustering
11    24/04  Cluster Analysis - Advanced Methods: Probabilistic Model-Based Clustering, Clustering High-Dimensional Data, Clustering Graph and Network Data, Clustering with Constraints
12    01/05  Outlier Analysis: Outliers and Outlier Analysis, Outlier Detection Methods, Statistical Approaches, Proximity-Based Approaches, Clustering-Based Approaches, Classification-Based Approaches, Mining Contextual and Collective Outliers, Outlier Detection in High-Dimensional Data
13    08/05  Project Demonstrations 1
14    15/05  Project Demonstrations 2

  Textbooks

    Main Textbook

- Data Mining: Concepts and Techniques, 3rd Edition, J. Han, M. Kamber, J. Pei, Morgan Kaufmann, 769 pp, 2011.

    Recommended

- Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, I. H. Witten, E. Frank, M. A. Hall, Morgan Kaufmann, 629 pp, 2011.
- Introduction to Data Mining, P. Tan, M. Steinbach, V. Kumar, Addison-Wesley, 769 pp, 2006.
- Introduction to Machine Learning, 2nd Edition, E. Alpaydın, The MIT Press, 2010.

  Tools and Development Environments

- Weka, Data Mining Software in Java
- MATLAB

  Grading

    10 % : ADC (Attendance, Discussion and Contribution)
    20 % : Homework Assignments
    20 % : Midterm 1 (Classification)
    20 % : Midterm 2 (Clustering)
    30 % : Project
 
