COM 451 Data Mining


 

Gediz University, Computer Engineering Department
Spring 2016

Lecture: Tuesday, 09:00 - 11:45, A-117

 
  Instructor: Halûk Gümüşkaya
  Office: D107
  Office Hours: Mon-Wed: 13:00 - 14:00
  e-mail: haluk.gumuskaya@gediz.edu.tr
   
- Course Description
- Textbooks
- Prerequisite
- Tools and Development Environments
- Lecture Schedule
- Grading

  Course Description

Introduction to data mining. Descriptions of data. Data preprocessing: data cleaning, integration, and reduction. Data warehousing and online analytical processing (OLAP). Association and correlation analysis. Classification: decision trees, naïve Bayesian classification, support vector machines, neural networks, rule-based classification, pattern-based classification, and logistic regression. Cluster analysis. Outlier analysis.

  Prerequisite

   The course assumes only a modest background in statistics or mathematics; no database knowledge is needed.

  Lecture Schedule (tentative)

| W | Date | Lec | Topics Covered |
|---|------|-----|----------------|
| 0 | 23/02 | Lec 0 Course Overview | Data Deluge and Technical Challenges, Definitions for Data Mining and Related Sciences, Data Science Related Jobs and University Graduate Programs, DIKW Process Examples, Tools and Languages for Data Science, Course Description and Objectives, Requirements and Assumptions, Course Outline, Textbooks, Other Lecture Materials and Tools, Course Activities and Grading |
| 1 | 01/03 | Lec 1 Introduction to Data Mining | Why Data Mining? What is Data Mining? What Kinds of Data can be Mined? Data Mining Functions, Which Sciences and Technologies are Used? Which Kinds of Applications are Targeted? Major Issues in Data Mining, A Brief History of Data Mining and Data Mining Society |
| 2 | 08/03 | Lec 2 Getting to Know Your Data | Data Objects and Attribute Types, Basic Statistical Descriptions of Data, Data Visualization, Measuring Data Similarity and Dissimilarity |
| 3 | 15/03 | Lec 3 Data Preprocessing | Data Preprocessing: An Overview, Data Cleaning, Data Integration, Data Reduction, Data Transformation and Data Discretization |
| 4 | 22/03 | Lab 1 Getting to Know Your Data and Data Preprocessing using WEKA | |
| 5 | 29/03 | Lec 4 Association Analysis-Basic Concepts and Methods | Introduction, Frequent Itemsets, Scalable Mining Methods and Apriori Algorithm, Finding Association Rules with Apriori, Learning Association Rules using WEKA |
| 6 | 05/04 | Lec 5 Classification: Basic Concepts, Decision Trees, and Model Evaluation | Basic Concepts, Decision Tree Based Classification, Practical Issues of Classification, Model Evaluation, Classification using WEKA |
| 7 | 12/04 | Midterm Exam I | |
| 8 | 19/04 | Classification-Advanced Methods | Rule-Based Classification, Bayes Classification Methods, Neural Networks and Classification by Backpropagation, Support Vector Machines, Classification by Using Frequent Patterns, Lazy Learners (or Learning from Your Neighbors) |
| 9 | 26/04 | Classification-Additional Topics | Other Classification Methods, Model Evaluation and Selection, Techniques to Improve Classification Accuracy: Ensemble Methods |
| 10 | 03/05 | Classification Applications | |
| 11 | 10/05 | Cluster Analysis-Basic Concepts and Methods | Cluster Analysis: Basic Concepts, Partitioning Methods, Hierarchical Methods, Density-Based Methods, Grid-Based Methods, Evaluation of Clustering |
| 12 | 17/05 | Cluster Analysis-Advanced Methods | Probability Model-Based Clustering, Clustering High-Dimensional Data, Clustering Graphs and Network Data, Clustering with Constraints |
| 13 | 24/05 | Outlier Analysis | Outlier and Outlier Analysis, Outlier Detection Methods, Statistical Approaches, Proximity-Based Approaches, Clustering-Based Approaches, Classification Approaches, Mining Contextual and Collective Outliers, Outlier Detection in High Dimensional Data |
| 14 | 31/05 | Midterm Exam II | |

  Textbooks

   Main Textbooks and Materials

- Data Mining: Concepts and Techniques, 3rd Edition, J. Han, M. Kamber, J. Pei, Morgan Kaufmann, 769 pp, 2011.
- Introduction to Data Mining, P. Tan, M. Steinbach, V. Kumar, Addison-Wesley, 769 pp, 2006.
- Data Mining: Practical Machine Learning Tools and Techniques, 3rd Edition, Ian H. Witten, E. Frank, M. A. Hall, Morgan Kaufmann, 629 pp, 2011.
- Two Data Mining with Weka courses (YouTube channel), Ian H. Witten, MOOCs (Massive Open Online Courses) from the University of Waikato, New Zealand.
- Weka Tutorial by Rushdi Shams

   Recommended

- Mining Massive Datasets, A. Rajaraman and J. Ullman, 2nd Edition, Cambridge University Press, 2014. Available for download (511 pages, 3 MB).
- Introduction to Machine Learning, 2nd Edition, Ethem Alpaydın, The MIT Press, 2010.
- Python for Data Analysis, W. McKinney, O’Reilly, 2013.
- Machine Learning in Action, P. Harrington, Manning Publications, 2012.
- Mahout in Action, S. Owen, R. Anil, T. Dunning, E. Friedman, Manning Publications, 2012.

  Tools and Development Environments

- Weka, Data Mining Software in Java
- MATLAB

  Grading

   
    20 % : HW Assignments
    25 % : Midterm 1
    25 % : Midterm 2
    30 % : Final
 
