
Introduction to

Machine Learning

Max Kleiner 10.2018


ML Agenda
4 Cases with 5 Scripts
• Data Reduction - EKON22_PCA_1.py
• Regression - EKON22_REG_2.py
• Clustering - EKON22_CLU_3.py
• Classification - EKON22_CLA_4.py
• Decision Tree, Random Forest - EKON23_DET_5.py
• Cluster & classify with different inputs, algorithms and configs
• Define labels, features or topic ratings, hyper-parameters, tests
  – assumed/implicit labels, predict versus target, random state
• Conclusions / ML Process Summary
http://www.softwareschule.ch/examples/machinelearning.jpg
2
PCA (Principal Component Analysis)
C:\maXbox\EKON22\EKON22_scripts\EKON22_PCA_1.py

Visualizing 2- or 3-dimensional data is not that challenging. However, even the Iris dataset has 4 dimensions. Use PCA to reduce the 4-dimensional data to 2 or 3 dimensions so that you can plot and understand the data better. Use StandardScaler to bring the features onto unit scale (mean = 0, variance = 1), which PCA needs to perform well.
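A minimal sketch of that reduction on the Iris data (the 2-component choice just mirrors the point above; the plotting itself is left out):

# Scale the 4 Iris features to unit variance, then project onto 2 principal components
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = datasets.load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)   # mean = 0, variance = 1 per feature
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print(X_2d.shape)                       # (150, 2) - ready for a 2-D scatter plot
print(pca.explained_variance_ratio_)    # share of variance kept by each component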

http://playground.tensorflow.org_maXbox2
https://www.springboard.com/blog/data-mining-python-tutorial/
3
Topic IRIS Classify Task

https://sebastianraschka.com/images/blog/2015/principal_component_analysis_files/iris.png
https://www.springboard.com/blog/data-mining-python-tutorial/
4
Regression and Correlation

@C:\maXbox\EKON22\EKON22_scripts\EKON22_REG_2.py

CASSANDRA System
2. C:\maXbox\mX46210\ntwdblib.dll\UnsharpDetector-master\UnsharpDetector-master\inference_gui.py
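For orientation, a minimal regression/correlation sketch on two Iris features; this is an illustration only, not necessarily what EKON22_REG_2.py computes:

# Linear regression and Pearson correlation on two Iris features (illustrative)
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

iris = datasets.load_iris()
x = iris.data[:, 2].reshape(-1, 1)   # petal length
y = iris.data[:, 3]                  # petal width

reg = LinearRegression().fit(x, y)
print('slope:', reg.coef_[0], 'intercept:', reg.intercept_)
print('R^2:', reg.score(x, y))
print('Pearson correlation:', np.corrcoef(x.ravel(), y)[0, 1])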
5
From Correlation to 4 Dim Cluster
Finding the question is often more important than finding the answer - John Tukey

https://www.soovle.com/

https://answerthepublic.com/reports/
6
Clustering from module import class
@C:\maXbox\EKON22\EKON22_scripts\EKON22_CLU_3.py
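A hedged k-means sketch of what such a clustering step can look like (cluster count and scaling are assumptions, not necessarily the settings in EKON22_CLU_3.py):

# Unsupervised k-means clustering of the scaled Iris data (illustrative)
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

iris = datasets.load_iris()
X = StandardScaler().fit_transform(iris.data)

km = KMeans(n_clusters=3, random_state=0).fit(X)   # no labels used
print('cluster sizes:', [(km.labels_ == k).sum() for k in range(3)])
print('cluster centers (in scaled feature space):')
print(km.cluster_centers_)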

CASSANDRA System
7
GEO Cluster Story

An agent or probe that collects threat data from the security sensors, plus correlation middleware; a console and associated database for managing the solution and its alerts.
https://www.esecurityplanet.com/views/article.php/1501001/Security-Threat-Correlation-The-Next-Battlefield.htm
8
IRIS Classification Concept
from sklearn import datasets, metrics, tree

# Load the Iris data and fit a decision tree on the full set
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
y_pred = clf.predict(iris.data)

# Training accuracy: predicting on the same data the tree was fit on
print('Train accuracy_score:', metrics.accuracy_score(iris.target, y_pred))

CASSANDRA System

Demo in VSCode /maXbox4


C:\maXbox\softwareschule\MT-HS12-05\mentor_xml\casra2017\crawler\plot_iris_dataset_mx.py
@C:\maXbox\EKON22\EKON22_scripts\EKON22_CLA_4.py
9
IRIS Confusion Matrix
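A small sketch of how such a confusion matrix can be produced for Iris predictions; the train/test split and the tree classifier here are assumptions for illustration:

# Confusion matrix for an Iris classifier (illustrative split and model)
from sklearn import datasets, metrics, tree
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=20)

clf = tree.DecisionTreeClassifier().fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(metrics.confusion_matrix(y_test, y_pred))   # rows: true class, columns: predicted class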


10
IRIS Decision Tree

@C:\maXbox\EKON22\EKON22_scripts\EKON23_DET_5.py
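One way to inspect the fitted tree as text rules; a sketch that assumes scikit-learn >= 0.21 for export_text, not necessarily how the EKON script renders its tree:

# Print the fitted Iris decision tree as text rules (needs sklearn >= 0.21)
from sklearn import datasets, tree

iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
print(tree.export_text(clf, feature_names=iris.feature_names))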
11
MongoDB My Cluster sacred.runs & completed

CASSANDRA System
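A hedged pymongo sketch of reading completed sacred runs back out of the cluster; the connection URI and database name are placeholders, not the actual Atlas settings:

# Query completed sacred runs from MongoDB (connection details are placeholders)
from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017')   # replace with the Atlas "My Cluster" URI
runs = client['sacred']['runs']                     # sacred's MongoObserver writes to the runs collection

for run in runs.find({'status': 'COMPLETED'},
                     {'experiment.name': 1, 'result': 1, 'stop_time': 1}):
    print(run['experiment']['name'], run.get('result'), run.get('stop_time'))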
12
Task II


13
What's behind the test? (backend pattern, cross-entropy)
60000/60000 [==============================] - 426s 7ms/step - loss: 0.4982 - acc: 0.8510 - val_loss: 0.0788 - val_acc: 0.9749
Using TensorFlow backend.
INFO - MNIST-Convnet4 - Result: 0.9749
INFO - MNIST-Convnet4 - Completed after 0:07:27
Test loss: 0.0788029053777
Test accuracy: 0.9749

59392/60000 [============================>.] - ETA: 5s - loss: 0.0571 - acc: 0.9829
59520/60000 [============================>.] - ETA: 3s - loss: 0.0572 - acc: 0.9829
59648/60000 [============================>.] - ETA: 2s - loss: 0.0572 - acc: 0.9829
59776/60000 [============================>.] - ETA: 1s - loss: 0.0572 - acc: 0.9829
59904/60000 [============================>.] - ETA: 0s - loss: 0.0573 - acc: 0.9829
60000/60000 [==============================] - 513s 9ms/step - loss: 0.0573 - acc: 0.9829 - val_loss: 0.0312 - val_acc: 0.9891
INFO - MNIST-Convnet4 - Result: 0.9891
INFO - MNIST-Convnet4 - Completed after 0:33:28
Test loss: 0.0311644290059
Test accuracy: 0.9891

14
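The log above comes from a Keras convnet tracked by sacred. A minimal sketch of a comparable training/evaluation loop; the architecture and hyper-parameters are assumptions, not the exact MNIST-Convnet4 configuration:

# Minimal Keras MNIST convnet sketch (assumed layers, not the exact MNIST-Convnet4)
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.utils import to_categorical

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))

loss, acc = model.evaluate(x_test, y_test)   # produces progress lines like the log above
print('Test loss:', loss)
print('Test accuracy:', acc)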
What's behind the code? (Classification Summary)

>>> from sklearn import datasets, metrics, svm
>>> from sklearn.model_selection import train_test_split
>>> iris = datasets.load_iris()
>>> X = iris.data[:, 1:3]   # two features only: sepal width, petal length
>>> y = iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=20)
>>> classifier = svm.SVC(kernel='linear', C=1.0)
>>> classifier.fit(X_train, y_train)
>>> y_pred = classifier.predict(X_test)
>>> print("Test - Accuracy SVC:", metrics.accuracy_score(y_test, y_pred))
Test - Accuracy SVC: 0.9555555555555556
https://www.programcreek.com/python/example/103267/keras.datasets.mnist.load_data
15
What's behind the code II? (Check for duplicates)
>>> import numpy as np
>>> import pandas as pd
>>> y_test
array([0, 1, 1, 2, 1, 1, 2, 0, 2, 0, 2, 1, 2, 0, 0, 2, 0, 1, 2, 1, 1, 2,
       2, 0, 1, 1, 1, 0, 2, 2, 1, 1, 0, 0, 0, 2, 1, 0, 1, 2, 1, 2, 0, 1, 1])
>>> unique, counts = np.unique(y_test, return_counts=True)
>>> dict(zip(unique, counts))
{0: 13, 1: 18, 2: 14}
>>> Xyt = np.column_stack((X_test, y_test))   # glue the two features and the label together
>>> csort = Xyt[Xyt[:, 2].argsort()]          # sort rows by class label
>>> dfiris = pd.DataFrame(csort)
>>> dfiris[0:13].groupby([0, 1]).size()       # class 0 only: count identical feature pairs
3.0  1.1    1
     1.4    2
3.1  1.5    2
3.2  1.2    1
3.4  1.4    1
     1.6    1
     1.7    1
3.5  1.4    1
3.7  1.5    1
     1.4    1
3.8  1.6    1
>>> sum(dfiris[0:13].groupby([0, 1]).size() > 1)   # how many duplicated feature pairs
2

16
What's behind Python: PIP3 Install
pip3 install sacred
Collecting sacred
  Downloading https://files.pythonhosted.org/packages/2d/86/7be3afa4d4c1c0c76a5de03e5ff779797ab2654e377685255c11c13c0ea5/sacred-0.7.3-py2.py3-none-any.whl (82kB)

Collecting pymongo
  Downloading https://files.pythonhosted.org/packages/46/39/b9bb7fed3e3a0ea621a1512a938c105cd996320d7d9894d8239ca9093340/pymongo-3.6.1-cp36-cp36m-win_amd64.whl (291kB)
    100% |████████████████████████████████| 296kB 728kB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.6.1

https://github.com/pinae/Sacred-MNIST/blob/master/train_convnet.py
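Below, a minimal sketch of how a sacred experiment is wired to a MongoDB observer so that runs end up in sacred.runs; the experiment name, URI and database name are placeholders, see the linked train_convnet.py for the real setup:

# Minimal sacred experiment skeleton (names and URI are placeholders)
from sacred import Experiment
from sacred.observers import MongoObserver

ex = Experiment('MNIST-Convnet4')
ex.observers.append(MongoObserver.create(url='mongodb://localhost:27017',
                                         db_name='sacred'))

@ex.config
def cfg():
    epochs = 1          # hyper-parameters become part of the stored run
    batch_size = 128

@ex.automain
def run(epochs, batch_size):
    print('training for', epochs, 'epoch(s), batch size', batch_size)
    test_accuracy = 0.9749      # placeholder for the real model.evaluate(...) result
    return test_accuracy        # logged as "Result: ..." in the run document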

17
Machine Learning Process Chain

• Collab (set a control thesis, understand the problem, get resources such as Python)
• Collect (scrape data with Scrapy, store it, mine it, filter out inconsistent or incomplete records)
• Consolidate or Clean data (normalization and aggregation, PCA data reduction, regression, filters; slice out irrelevant or ambiguous data, fix character/Unicode mapping problems)
• Cluster (k-means for categories, collocates for N-keywords – unsupervised)
• Classify (SVM, Sequential, Bayes – supervised)
• Conclude and Control (predict or report the context thesis and drive data to a decision) – see the pipeline sketch below
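A compact sketch of how the middle steps of this chain can be wired together in scikit-learn; the scaler, the 2-component PCA and the linear SVC are illustrative choices, not a prescription:

# Illustrative scikit-learn pipeline: clean/scale -> reduce -> classify
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=20)

pipe = Pipeline([('scale', StandardScaler()),             # consolidate/clean: unit scale
                 ('pca', PCA(n_components=2)),            # reduce 4 dims to 2
                 ('svc', SVC(kernel='linear', C=1.0))])   # classify (supervised)
pipe.fit(X_train, y_train)
print('Pipeline test accuracy:', pipe.score(X_test, y_test))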

http://www.softwareschule.ch/examples/machinelearning.jpg
https://maxbox4.wordpress.com/code/
18
https://www.kaggle.com/

Similarity of doc a to doc b, with v(doc, j) the weight of word j in that doc:

sim(a, b) = \frac{\sum_j v(a, j)\, v(b, j)}{\sqrt{\sum_{j'} v(a, j')^2}\;\sqrt{\sum_{j'} v(b, j')^2}} = A' \cdot B'

CASSANDRA System
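A small Python sketch of this cosine similarity on raw word counts; the helper name and the toy documents are made up for illustration:

# Cosine similarity of two documents over word-count vectors (toy example)
import numpy as np
from collections import Counter

def cosine_sim(doc_a, doc_b):
    words = sorted(set(doc_a) | set(doc_b))
    ca, cb = Counter(doc_a), Counter(doc_b)
    va = np.array([ca[w] for w in words], dtype=float)   # v(a, j)
    vb = np.array([cb[w] for w in words], dtype=float)   # v(b, j)
    return va.dot(vb) / (np.linalg.norm(va) * np.linalg.norm(vb))

a = 'machine learning with python'.split()
b = 'learning python for machine learning'.split()
print(cosine_sim(a, b))   # 1.0 = same direction, 0.0 = no shared words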
19
Double Trouble with ML → https://stats.stackexchange.com/
THE TEST: Stackexchange, Stackoverflow

OVERVIEW - sacred run status
Status        Description
QUEUED        The run was just queued and not run yet
RUNNING       Currently running (but see below)
COMPLETED     Completed successfully
FAILED        The run failed due to an exception
INTERRUPTED   The run was cancelled with a KeyboardInterrupt
TIMED_OUT     The run was aborted using a TimeoutInterrupt
[custom]      A custom sacred.utils.SacredInterrupt occurred

MemoryError from the clustering metric:
File "C:\Users\max\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 254, in calinski_harabaz_score
  intra_disp += np.sum((cluster_k - mean_k) ** 2)
MemoryError

Crawler statistics:
  No. of URLs removed                  76,732,515
+ No. of robots.txt requests            3,675,634
- No. of excluded URLs                  3,050,768
= No. of HTTP requests                 77,357,381
  No. of HTTP requests not responded    1,763,850
20
QUESTIONS?
17:45 - 18:30
Machine Learning II
Artificial Neural Networks

Best book in my opinion:

Mastering Machine Learning with Python in Six Steps:
A Practical Implementation Guide to Predictive Data Analytics Using Python

21
