
CPE 695 WS: Applied Machine Learning

Lecture 0: Course Logistics and Introduction to ML

Dr. Shucheng Yu, Associate Professor


Department of Electrical and Computer Engineering
Stevens Institute of Technology
Course Logistics

Instructor: Dr. Shucheng Yu, shucheng.yu@stevens.edu, Burchard 412


Office hours: Wednesday 3:00pm – 5:00pm
Course web address: https://sit.instructure.com/courses/32374 (Canvas shell)
Prerequisite: College mathematics; Python programming is a plus
Textbook: No required textbook
Recommended books:
1) Machine Learning, by Tom M. Mitchell, McGraw-Hill

2) Hands-On Machine Learning with Scikit-Learn & TensorFlow, by Aurélien Géron, O’Reilly Media.

3) Deep Learning, by Ian Goodfellow et al., MIT Press

4) Pattern Recognition and Machine Learning, by Christopher Bishop. PDF is available HERE.

2
Course Objectives

After successful completion of this course, students will be able to


• Understand the basic principles and algorithms of representative machine learning systems
including supervised learning, unsupervised learning, batch learning, online learning,
instance-based learning, and model-based learning;
• Select appropriate machine learning algorithms for real-world tasks;
• Implement learning systems and train models with programming languages such as
Python;
• Choose appropriate performance metrics, then tune and evaluate the trained
model against those metrics;
• Apply related data analytic techniques for data acquisition, cleaning and visualization.

3
Grading Policy

Homework:       32%    320 points
Final project:  36%    360 points
Midterm exam:   20%    200 points
Final exam:     12%    120 points
-----------------------------------------------------------
Total:         100%   1000 points

4
Grading Policy

Final letter grades:
A:  [93%, 100%]
A-: [90%, 93%)
B+: [87%, 90%)
B:  [83%, 87%)
B-: [80%, 83%)
C+: [77%, 80%)
C:  [73%, 77%)
C-: [70%, 73%)
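
The point totals and letter-grade intervals above can be combined into a small calculation. The following is a minimal sketch, not an official grading tool: the names `MAX_POINTS`, `LETTER_CUTOFFS`, `total_percent`, and `letter_grade` are hypothetical, and returning `None` below 70% is an assumption, since the slides do not list grades under C-.

```python
# Maximum points per component, taken from the grading slide (1000 in total).
MAX_POINTS = {
    "homework": 320,
    "final_project": 360,
    "midterm": 200,
    "final_exam": 120,
}

# Inclusive lower bound (in percent) of each letter-grade interval, highest first.
LETTER_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"),
    (80, "B-"), (77, "C+"), (73, "C"), (70, "C-"),
]


def total_percent(earned):
    """Convert earned points per component into an overall percentage."""
    for name, points in earned.items():
        if name not in MAX_POINTS:
            raise KeyError(f"unknown component: {name}")
        if not 0 <= points <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return 100.0 * sum(earned.values()) / sum(MAX_POINTS.values())


def letter_grade(percent):
    """Map a percentage to a letter grade using the half-open intervals above."""
    for cutoff, letter in LETTER_CUTOFFS:
        if percent >= cutoff:
            return letter
    # Assumption: the slides do not specify a grade below 70%.
    return None
```

For example, a student with 300 homework points, 330 project points, 180 midterm points, and 100 final exam points has 910 of 1000 points, which falls in the A- interval.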

5
Homework (320 points)

• Eight (8) homework assignments in total


• Each counts for 40 points on average (some more, some less)
• Usually you have two (2) weeks to finish a homework if it includes programming
questions, or one (1) week if it does not.
• 5 points off per day after due date
§ Computer broken on the last day?
o But you had two weeks, why last minute?
§ Submitted a wrong file?
o It is your responsibility to double check.
§ Did not notice the deadline?
o Bad excuse.
§ Other excuses …
6
Midterm & Final Exams

• Midterm: 200 points; Final: 120 points


• Both are online take-home open-book exams
• The final exam is NOT cumulative
• NO makeup exams
• An excused absence requires prior arrangement with, and consent from, the instructor
• University integrity code will be strictly followed!

7
Final Group Project (360 points)
• Identify a machine learning problem. The instructor will provide a list of projects
from Kaggle.com.
• Do a literature review to find existing solutions (e.g., by other teams on Kaggle) to
your problem. Understand what methods they used.
• Identify your machine learning algorithm(s) for the problem, implement and
optimize it (them), upload to Kaggle, and obtain your ranking on Kaggle.
• Your final report shall include:
1) Introduction: the statement of the problem
2) Related work: existing solutions to this problem
3) Your solution: i) a description of the dataset, ii) the machine learning algorithm(s)
you use and why they were chosen, iii) the implementation process, including how
you tuned the parameters to get the best results
4) Comparison of your results with those of top-ranking Kaggle teams in terms of
accuracy, computational cost, and other advantages/disadvantages
5) Any future research directions to improve your algorithm
6) Conclusion
7) References
• THREE (3) members in each group. The instructor will assign you to a team if you can’t find one.

8
Final Group Project (360 points)
Deliverables:
1. Proposal (20 points): one-page PDF file including
1) Problem statement
2) Description of data set
3) Implementation plan
4) Team members & task allocation

2. Mid-stage report (40 points): ≥ 3 pages in PDF


1) Introduction (problem statement)
2) Related work
3) Your solution
a) Data set description
b) Machine learning algorithm(s) you used
c) Preliminary implementation and results

9
Final Group Project (360 points)
Deliverables:
3. Final report (240 points): ≥ 6 pages in a PDF file
1) Shall contain ALL seven sections (see slide # 8)
2) Written using the IEEE format template (available in Canvas)
Editing tool: LaTeX (most widely used in academia; learn to use it)
3) Source code: made available at Kaggle.com
4. Final presentation video (60 points)
1) Up to 12 minutes per team
2) Slides shall be concise but contain enough detail
Final Report Grading criteria:
a) PDF report (60 points): mainly based on the quality of writing, e.g., it shall contain all
seven sections, provide sufficient discussion, and be written and organized
professionally.
b) Overall quality (180 points): mainly based on your ranking at Kaggle.com.
top 10%: 180 points; top 10-20%: 170 points; top 20-30%: 160 points;
top 30-40%: 150 points; top 40-50%: 140 points; top 50-60%: 130 points;
top 60-70%: 120 points; top 70-80%: 110 points; top 80-90%: 100 points;
other: based on actual code quality, but no more than 100 points.
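
The ranking bands above amount to a simple lookup from leaderboard position to points. The sketch below is illustrative only: the function name `ranking_points` is hypothetical, and it assumes that a band's upper boundary belongs to the better band (a team at exactly the 10% mark counts as top 10%), which the slide does not specify.

```python
def ranking_points(rank, num_teams):
    """Return the overall-quality points for a Kaggle leaderboard position.

    rank is 1-based (1 = best). Returns None when the team falls outside
    the top 90%, because that case is graded on code quality instead
    (capped at 100 points) and cannot be computed from the rank alone.
    """
    if num_teams < 1 or not 1 <= rank <= num_teams:
        raise ValueError("rank must be between 1 and num_teams")

    # Band 1 is the top 10%, band 2 is 10-20%, and so on up to band 10.
    # Integer ceiling division avoids floating-point rounding at boundaries.
    band = -(-10 * rank // num_teams)

    if band <= 9:
        # 180 points for band 1, dropping 10 points per band down to 100.
        return 190 - 10 * band
    return None
```

For instance, rank 11 of 100 teams falls in the 10-20% band and would earn 170 points under this reading.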

10
Tentative Schedule (see syllabus)

11
Machine Learning Datasets
• UCI Repository:
o http://www.ics.uci.edu/~mlearn/MLRepository.html

• UCI KDD Archive:

o http://kdd.ics.uci.edu/summary.data.application.html

• MNIST handwritten digit database:

o http://yann.lecun.com/exdb/mnist/
• Face Databases (a good collection of various face databases)

o http://web.mit.edu/emeyers/www/face_databases.html#ar

• Statlib:

o http://lib.stat.cmu.edu/

• Delve:
o http://www.cs.utoronto.ca/~delve/

• Kaggle (business applications)

o https://www.kaggle.com/

12
Related Journals and Conferences
• Journals:
– Journal of Machine Learning Research
– Machine Learning
– IEEE Transactions on Pattern Analysis and Machine Intelligence
– IEEE Transactions on Neural Networks and Learning Systems (formerly IEEE Transactions on Neural Networks)
– Neural Computation
– Neural Networks
– ...
• Conferences:
– International Conference on Machine Learning (ICML)
– The IEEE International Conference on Data Mining (ICDM)
– Conference on Neural Information Processing Systems (NeurIPS, formerly NIPS)
– International Conference on Knowledge Discovery and Data Mining (KDD)
– ...

13
Machine Learning Software

• Python
o Default language for programming assignments & project
• MATLAB
• R/S-PLUS
• Java
• SAS

14
Ethics Statement

All Stevens graduate students promise to be fully truthful and avoid dishonesty, fraud,
misrepresentation, and deceit of any type in relation to their academic work. A student’s
submission of work for academic credit indicates that the work is the student's own. All outside
assistance must be acknowledged. Any student who violates this code or who knowingly
assists another student in violating this code shall be subject to discipline.

All graduate students are bound to the Graduate Student Code of Academic Integrity by
enrollment in graduate coursework at Stevens. It is the responsibility of each graduate student
to understand and adhere to the Graduate Student Code of Academic Integrity. More
information including types of violations, the process for handling perceived violations, and
types of sanctions can be found at www.stevens.edu/provost/graduate-academics.

Please note that assignments in this class may be submitted to www.turnitin.com, a
web-based anti-plagiarism system, for an evaluation of their originality.

15
Ethics Statement

In addition to university integrity code enforcement, this course will
penalize identical or near-identical submissions in the following ways:
• First occurrence: 20% off both submissions
• Second occurrence: 50% off both submissions
• Third occurrence: 100% off both submissions
This policy will be strictly enforced and applies to ALL assignments
and exams in this course.

16
