
CSC 588: Introduction to Data Warehousing and Data Mining

Instructor

Dr. Ghada Badr


Lectures: Tuesdays 8:00-10:30 am, building 17, part 2 (for parallel program: Base., room 25)
Office Location: Building 20, office 12, basement
Office hours: TBA
E-mail: badrghada at hotmail.com, ghbadr at ksu.edu.sa

Teaching Assistants
TBA
Labs: Tuesdays 12:00-1:00 pm
Office Location: TBA
Office hours: TBA
E-mail: TBA

About the Course


As an introductory course on data mining, this course introduces the concepts, algorithms, techniques, and systems of data warehousing and data mining, including (1) what data mining is, (2) getting to know your data, and data preprocessing, integration, and transformation, (3) design and implementation of data warehouse and OLAP systems, (4) data cube technology, (5) mining frequent patterns and associations: basic concepts and advanced methods, (6) classification and prediction: basic concepts, and (7) cluster analysis: basic concepts. The course serves both senior-level computer science undergraduates and first-year graduate students interested in the field. It may also attract students from other disciplines who need to understand, develop, and use data warehouse and data mining systems to analyze large amounts of data.

Prerequisites

Background: "Data Structure and Software Principles" or consent of instructor (a good knowledge of statistics and machine learning will help in understanding the course materials).
Programming: We will give one programming assignment. You will need to be familiar with at least one programming language, such as C++ or Java. We will not cover programming-specific issues in this course.

Textbook

Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann, 2006.

References
The following texts are recommended but not required. There are numerous other books and online resources on data mining available.

E. Alpaydin, Introduction to Machine Learning, 2nd ed., MIT Press, 2011.
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed., Springer-Verlag, 2009.
T. M. Mitchell, Machine Learning, McGraw Hill, 1997.
P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison Wesley, 2005.
I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, 2nd ed., Morgan Kaufmann, 2005.

Lecture slides contain most of the technical briefing and reference materials. Please study them when preparing for class and when reviewing afterwards. Many research papers will also help you understand the course contents. Please check the references of this course for further information.

Source codes and implementations of data mining algorithms


Source code for frequent pattern mining, clustering, time series, and Web mining algorithms implemented by the Chinese Univ. of Hong Kong: http://appsrv.cse.cuhk.edu.hk/~kdd/program.html
FIMI workshops: datasets and source code for frequent itemset mining implementations: http://fimi.cs.helsinki.fi/
Frequent itemset mining algorithm implementations by Bart Goethals: http://www.adrem.ua.ac.be/~goethals/software/
IlliMine: repository of implementations from the UIUC data mining research package: http://illimine.cs.uiuc.edu/
Weka: Weka 3 - Data Mining with Open Source Machine Learning Software in Java: http://www.cs.waikato.ac.nz/ml/weka/
Graph mining algorithm implementations: gSpan and CloseGraph

Course Format, Activities, Evaluation


This course will draw its materials mainly from the textbook; the course slides are also important references. Students are expected to study the materials and complete all the course requirements.

Reading: Before and After Classes


We encourage students to read ahead on the materials to be discussed before each lecture. Please check the schedule page to see what will be covered in each lecture before the class begins.

Homework and programming assignments


There will be about 3 assignments, spaced out over the course of the semester. One of them (or at least part of it) will be a programming assignment.

Examinations
There will be three exams: two midterm exams, each 1.5 hours in length, and a final exam, 3 hours in length. We will not normally give make-ups for missed exams.

Evaluation
We plan to determine final grades of the course in the following way:

Assignments: 6% (2 homework assignments, 3% each)
Quizzes: 6% (3 quizzes, 2% each)
Lab work: 3% (attendance and lab work)
Two midterm exams: 30% (first exam 10%, second exam 20%)
Final exam: 40%
Project: 15%, as either
Option 1: survey (10%) + assignment 3 (5%)
Option 2: software project or research project (15%)
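As a quick check of the arithmetic, the weights above can be combined into a final grade as in the minimal sketch below. The weights come from the breakdown above; the function name `final_grade`, the dictionary keys, and the convention that each score is a fraction between 0 and 1 are illustrative assumptions, not part of the official grading procedure.

```python
# Weights from the Evaluation breakdown, in percent of the final grade.
# The key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "assignments": 6.0,   # 2 homework assignments, 3% each
    "quizzes": 6.0,       # 3 quizzes, 2% each
    "lab_work": 3.0,      # attendance and lab work
    "midterm_1": 10.0,
    "midterm_2": 20.0,
    "final_exam": 40.0,
    "project": 15.0,      # either project option totals 15%
}


def final_grade(scores):
    """Return the weighted final grade in percent.

    `scores` maps each component name to the fraction earned (0.0 to 1.0).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# The components should account for the whole grade.
total_weight = sum(WEIGHTS.values())
```

For instance, calling `final_grade` with every component set to `1.0` yields the full 100%, which confirms that the listed weights add up.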

Course project
You can choose one of the following options:

1. Survey (2-3 students): Write a comprehensive survey on a focused topic in data warehousing or data mining, for example, a survey on data warehouse architectures, clustering methods, or frequent itemset techniques. You will need to give a talk by the end of the year (no PowerPoint presentation is required). For this option you will be required to do and submit assignment 3.

2. Data mining software function maker or a full data warehouse (4-5 students): Implement one high-performance, fully documented open source data mining function maker or a full data warehouse application, as discussed in the textbook, in Java or C++ (or any programming language that you prefer). This should include a user interface and a visualization package. You will be required to write a report and give a presentation. Students who choose this option are exempted from assignment 3, and its mark is added to the project mark. [Note: copying online open source software is considered plagiarism!]

3. Research project (3-4 students): You can also propose and carry out a research project. In this project you compare two or three algorithms and study their time, accuracy, or space performance. From your results you may draw a conclusion about which algorithm is best to use and in which cases. You will be required to write a report and give a presentation. You are exempted from assignment 3, and its mark is added to the project mark.

Project Schedule
1. One page proposal (week 4) (1%)
A one-page project proposal, with name, title, abstract, and reference list, should be handed in for comments and feedback. Please submit the proposal in class, or bring it to Dr. Ghada Badr's office (Room 12, basement) before 1:30 pm on Wednesday, Oct. 5th.

2. Mid-term review (week 10) (4%)


Check the progress of the project. Discuss your problems, progress, and further work with the TA.

3. Final submission (week 16) (10%)


Submit the final report or survey, and give the talks and presentations.

Class Schedule for CSC 588


This page provides our class schedule for previous semesters. This semester the schedule may be modified slightly based on the progress of the class.

Week  Topic
1     Class Outline / Chapter 1: Introduction
1     Chapter 1: Introduction
2     Chapter 1: Introduction
2     Chapter 1: Introduction
3     Chapter 2: Data Preprocessing
3     Chapter 2: Data Preprocessing
4     Chapter 2: Data Preprocessing
4     Chapter 2: Data Preprocessing
5     Chapter 2: Data Preprocessing
5     Chapter 2: Data Preprocessing
6     Chapter 3: Data Warehousing and Data Cube
6     Chapter 3: Data Warehousing and Data Cube
7     First Midterm exam
7     Chapter 3: Data Warehousing and Data Cube
8     Chapter 3: OLAP
8     Chapter 3: OLAP
9     Winter break
9     Winter break
10    Chapter 3: OLAP
10    Chapter 3: OLAP
11    Chapter 5: Mining Frequent Patterns
11    Chapter 5: Mining Frequent Patterns
12    Chapter 5: Mining Frequent Patterns
12    Chapter 6: Classification: Basic Concepts
13    Second Midterm exam
13    Chapter 6: Classification: Basic Concepts
14    Chapter 6: Classification: Basic Concepts
14    Chapter 7: Cluster Analysis: Basic Concepts
15    Chapter 7: Cluster Analysis: Basic Concepts
15    Chapter 7: Cluster Analysis: Basic Concepts
16    Presentations
16    Presentations
17    Final Exam

Assignments out and due, in chronological order:
Assign#1 out
One-page project proposal due
Assign#1 due
Assign#2 out
Mid-term project review
Assign#2 due
Assign#3 out
Assign#3 due, projects due, surveys due (hand in the final project and survey reports)
Final exam
