Course Schedule
Week 1
Thursday (September 24): Course Introduction
- Topics (Session Slides):
- Getting acquainted
- Course overview, goals, and administrivia
- Introduction to Data Mining
- Overview/goals of data mining (DM) and knowledge discovery (KD)
- Myths about data mining
- Readings:
- TB: Chapters 1 & 2
- A Golden Vein, The Economist, 1/04.
- Cases:
- Diamonds in the Data Mine, HBR, 5/03
- How Verizon Cut Customer Churn, Financial Express, 10/03.
- Hard Hats for Data Miners: Myths and Pitfalls of Data Mining, DMReview, 5/05.
- Skim: Competing on Analytics, HBR, 1/06; Staring You in the Face: The Path to New Products Might Start with the Customer Data You've Already Collected, WSJ, 9/08.
- Other:
- SAS Data Mining Certificate at Fisher
- Online Data Mining Portals: KDnuggets, The Data Mine
Week 2
Tuesday (September 29): The Data Mining Process - Data Extraction and Manipulation
- Topics (Session Slides):
- Overview of the Data Mining Process
- The Relational Data Model and Relational DBMS
- Enterprise Reporting
- Relational Algebra
- SQL: The Relational Query Language
- Readings:
- TB: Chapter 2
- CRISP-DM Process Model
- Resources:
- MS Access (2003) Primer
- MS Access 2007 Quick (Interface) Tutorial: 1, 2.
- Interactive SQL Tutorial
Thursday (October 1): Data Extraction and Manipulation
- Topics:
- Database Schemas and Instances
- Principles of Query Formulation
- Database Definition and Manipulation in MS Access
- Query formulation (Demo DB)
- Readings:
- Skim chapters 1-4 of the online Access tutorial book.
- Assignments:
- Download and complete the MS Access Lab Tutorial.
- Assignment 1 -- Data Manipulation Using Access (due Tuesday, October 6)
Week 3
Tuesday (October 6): Data Warehousing
- Topics (Session Slides):
- The Case for Data Warehousing
- Building a Data Warehouse
- Readings:
- H. Watson and B. Haley, "Datawarehousing: Managerial Considerations," Communications of the ACM, Vol. 41, No. 9 (Sept. 1998), Pages 32-37.
- Related Links/Resources (FYI):
Thursday (October 8): OLAP and MDDB
- Topics (Session Slides):
- Multidimensional Databases
- On-Line Analytical Processing
- Demo - Pivot Tables
- Readings:
- Assignments:
- Assignment 2 -- Data Partitioning, Binning, and Summarization (due Thursday, October 15)
- Related Links (FYI):
Week 4
Tuesday (October 13): Data Exploration
- Topics (Session Slides):
- Data Types
- Data Summarization and Visualization
- Measures of Association
- Basic probability concepts
- Readings:
- TB: Chapter 3
- Probability Notes
Thursday (October 15): Association & Market-Basket Analysis
- Topics (Session Slides):
- Market Basket Analysis and Other Applications
- Frequent Itemset and Association Rule Mining
- Rule Support & Confidence
- Readings:
- TB: Chapter 11
- Related Links (FYI):
Week 5
Tuesday (October 20): Association Rule Mining (Continued)
- Topics:
- Apriori Algorithm
- Rule Evaluation
- Sequential Patterns
- Readings:
- R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994. (FYI; only skim)
- Assignments:
- Assignment 3 -- Association Mining (due Tuesday, October 27)
Thursday (October 22): Association Rule Mining (Continued)
- Topics:
- Mining for Association Rules using XLMiner
- Mining for Association Rules using SAS EM
- Resources:
Week 6
Tuesday (October 27): Cluster Analysis
- Topics (Session Slides):
- Segmentation and Personalization
- Similarity Measures
- The K-means algorithm (Excel Spreadsheet Demo)
- Hierarchical Clustering
- Cluster Validation and Interpretation
- Readings:
- TB: Chapter 12
- Related Links (FYI):
Thursday (October 29): Cluster Analysis (Continued)
- Topics:
- Clustering using XLMiner
- Clustering using SAS EM
- Resources:
- Assignments:
- Assignment 4 -- Cluster Analysis (due Thursday, November 5)
Week 7
Tuesday (November 3): Midterm Exam
Thursday (November 5): Classification
- Topics (Session Slides):
- General Approach to Solving Classification Problems
- Decision Tree Induction
- Readings:
- TB: Chapter 7
- Related Links (FYI):
Week 8
Tuesday (November 10): Model Evaluation
- Topics (Session Slides):
- Decision Trees (continued)
- Accuracy measures
- Lift Charts
- Building Decision Tree Models in XLMiner
- Other:
- Solution to Assignment 4: 12.2_Pharmaceuticals (Spreadsheet), 12.3_Cereals (Spreadsheet)
Thursday (November 12): Model Evaluation (Continued)
- Topics:
- Lift Charts
- Response Modeling
Week 9
Tuesday (November 17): Predictive Modeling Using Regression
- Topics (Session Slides):
- Introduction to OLS Regression
- Simple and Multiple Regression
- Variable Selection and Stepwise Regression
- Readings:
- TB: Chapter 5
- Related Links (FYI):
Thursday (November 19): Predictive Modeling Using Logistic Regression & Neural Networks
- Topics (Session Slides: Logistic Regression, Neural Networks):
- Logistic Regression
- Introduction to Neural Networks
- Neural Networks vs. Regression
- Model Evaluation
- Readings:
- TB: Chapters 8 & 9
- Related Links (FYI):
- Assignments:
- Assignment 5 -- Predictive Modeling (due Tuesday, December 1)