Course Schedule
Week 1
Tuesday (October 16): Course Introduction
- Topics (Session Slides):
- Getting acquainted
- Course overview, goals, and administrivia
- Introduction to Data Mining
- Overview/goals of data mining (DM) and knowledge discovery (KD)
- Myths about data mining
- Readings:
- TB: Chapters 1 & 2
- "Data, Data Everywhere" and "The Data Deluge," The Economist, 2/10.
- A Golden Vein, The Economist, 1/04.
- Cases:
- Diamonds in the Data Mine, HBR, 5/03
- How Verizon cut Customer Churn, Financial Express, 10/03.
- Hard Hats for Data Miners: Myths and Pitfalls of Data Mining, DMReview, 5/05.
- Resources:
- Online Data Mining Portals: KDnuggets, The Data Mine
Thursday (October 18): The Data Mining Process - Data Extraction and Manipulation
- Topics (Session Slides):
- Overview of the Data Mining Process
- The Relational Data Model and Relational DBMS
- Enterprise Reporting
- Relational Algebra
- Principles of Query Formulation
- Database Definition and Manipulation in MS ACCESS
- Readings:
- TB: Chapter 2
- Skim: CRISP-DM Process Model
- Resources:
- MS Access 2010 Tutorials
- MS Access 2007 Quick (Interface) Tutorial: 1, 2.
- Interactive SQL Tutorial
- Assignments:
- Download and Complete the MS Access Lab Tutorial.
- Assignment 1 -- Data Manipulation Using Access (due Thursday, October 25)
Week 2
Tuesday (October 23): Data Warehousing, OLAP and Multidimensional Data Analysis
- Topics (Session Slides: 1, 2):
- The Case for Data Warehousing
- Multidimensional Databases
- On-Line Analytical Processing
- Demo - Pivot Tables
- Readings:
- H. Watson and B. Haley, "Datawarehousing: Managerial Considerations," Communications of the ACM, Vol. 41, No. 9 (Sept. 1998), Pages 32-37.
- An Introduction to OLAP Multidimensional Terminology and Technology
Thursday (October 25): Data Exploration
- Topics (Session Slides):
- Data Exploration
- Review of Descriptive Statistics and Probability Concepts
- Readings:
- TB: Chapters 3 & 4
- Basic Probability Notes
Week 3
Tuesday (October 30): Association Rule Mining
- Topics (Session Slides):
- Market Basket Analysis and Other Applications
- Frequent Itemset and Association Rule Mining
- Rule Support & Confidence
- Readings:
- TB: Chapter 13
Thursday (November 1): Association Rule Mining (Continued)
- Topics:
- Apriori Algorithm
- Rule Evaluation
- Sequential patterns
- Mining for Association Rules using XLMiner
- Readings:
- R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994. (FYI; only skim)
- Assignments:
- Assignment 2 -- Association Mining (due Tuesday, November 6)
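As a minimal sketch of the rule support and confidence measures listed in this week's topics, the Python below computes both for a single rule. The basket data and the function names are made up for illustration; they are not taken from the course materials or from XLMiner.

```python
# Hypothetical market-basket data (items and baskets are invented for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule under evaluation: {diapers} -> {beer}
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(f"support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

Here three of the five baskets contain both diapers and beer, and four contain diapers, so the rule has support 3/5 and confidence 3/4.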
Week 4
Tuesday (November 6): Cluster Analysis
- Topics (Session Slides):
- Segmentation and Personalization
- Similarity Measures
- The K-means algorithm (Excel Spreadsheet Demo)
- Hierarchical Clustering
- Cluster Validation and Interpretation
- Readings:
- TB: Chapter 14
Thursday (November 8): Cluster Analysis (Continued)
- Topics:
- Cluster Evaluation
- Clustering using XLMiner
- Demo: Synthetic Dataset
- Assignments:
- Assignment 3 -- Cluster Analysis (due Thursday, November 15)
Week 5
Tuesday (November 13): Midterm Exam
Thursday (November 15): Predictive Modeling - Classification and Regression Trees
- Topics (Session Slides):
- General Approach to Solving Classification Problems
- Tree Induction
- Readings:
- TB: Chapter 9
- Other:
- Solution to Assignment 3 on Cluster Analysis: 14.2_Pharmaceuticals (Spreadsheet), 14.3_Cereals (Spreadsheet)
Week 6
Tuesday (November 20): Model Evaluation
- Topics (Session Slides):
- More on Classification
- Overfitting and Underfitting
- Accuracy & Recall
- Classification (Confusion) Matrix
- Building Decision Tree Models in XLMiner
- Response Modeling (Handout)
- Readings:
- TB: Chapter 5
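As a rough illustration of the classification (confusion) matrix, accuracy, and recall topics for this session, the sketch below tallies the four matrix cells for a binary classifier and derives both measures from them. The labels and the helper name `confusion_counts` are hypothetical and are not drawn from the course handouts.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Hypothetical labels: 1 = responder, 0 = non-responder.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(actual, predicted)
accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all cases classified correctly
recall = tp / (tp + fn)  # share of actual positives that were found
print(f"TP={tp} FP={fp} FN={fn} TN={tn} accuracy={accuracy:.2f} recall={recall:.2f}")
```

With these labels the matrix is TP=3, FP=1, FN=1, TN=5, which gives an accuracy of 0.80 and a recall of 0.75.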
Thursday (November 22): No Class - Happy Thanksgiving!
Week 7
Tuesday (November 27) and Thursday (November 29): Predictive Modeling Using Regression
- Topics (Session Slides):
- Response Modeling (Continued)
- Review of OLS Regression
- Variable Selection and Stepwise Regression
- Logistic Regression
- Model Evaluation and Interpretation
- Readings:
- TB: Chapters 6 & 10
Week 8
Tuesday (December 4): Predictive Modeling Using Neural Networks & Ensemble Methods
- Topics (Session Slides: Neural Networks, Ensembles):
- Introduction to Neural Networks
- Neural Networks vs. Regression
- Ensemble Methods
- Readings:
- TB: Chapter 11
- Related Links (FYI):
- Introduction to Neural Networks
- Classification-based: Forecasting Fraudulent Financial Statements using Data Mining (IJCI 2006).
Tuesday (December 11, 4 PM): Final Exam