What is Graphviz?
...ation as diagrams of abstract graphs and networks. It has important applications in networking, bio-informatics, software engineering, database and web design, machine learning, and in visual interfaces for other technical domains.
CASSANDRA System
Assumptions made:
Distance-based (matching, VSIM, k-NN, CR, PageRank, Kmeans)
– Features used
– Structures exploited
Model-based (rules, BC, BN, boosting)
Social vs content
from keras.datasets import mnist
[Figure: sample MNIST digits for the classes 0 to 9]
There are 60,000 training images and 10,000 test images, all
of which are 28 pixels by 28 pixels.
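With those dimensions in mind, a typical preparation step before training a convolutional network might look like the sketch below. It uses NumPy on a small synthetic batch instead of the real download, and the helper name `prepare_mnist_like` is hypothetical, not part of Keras.

```python
import numpy as np

def prepare_mnist_like(images, labels, num_classes=10):
    """Scale 28x28 grayscale images to [0, 1] and one-hot encode the labels."""
    # Add a trailing channel axis: (n, 28, 28) -> (n, 28, 28, 1)
    x = images.astype("float32").reshape(-1, 28, 28, 1) / 255.0
    # Row i of the identity matrix is the one-hot vector for class i
    y = np.eye(num_classes, dtype="float32")[labels]
    return x, y

# Synthetic stand-in batch (assumption: real pixel values are 0..255 integers)
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
fake_labels = np.array([0, 3, 9, 1, 7])

x, y = prepare_mnist_like(fake_images, fake_labels)
print(x.shape, y.shape)
```

The same function would apply unchanged to the full 60,000 training and 10,000 test images.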
Keyword counts:
76 english
74 special
74 voice
44 voa
38 content="voa
36 美国之音 (Chinese: Voice of America)
36 америки (Russian: of America)
36 голос (Russian: voice)
from module import class
from sacred import Experiment
from sacred.observers import MongoObserver
from sacred.utils import apply_backspaces_and_linefeeds
import pymongo, pickle, os
import pydot as pdot
import numpy as np
import tensorflow as tf
from keras import backend as K
@ex.automain
def define_and_train(batch_size, epochs,
                     convolution_layers,
                     maxpooling_pool_size, maxpooling_dropout,
                     dense_layers, dense_dropout,
                     final_dropout, _run):
    from keras.datasets import mnist
    from keras.models import Sequential  # convolution
    # MaxPooling2D is an assumption: the import list is cut off after "Conv2D,"
    from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
    from keras.utils import to_categorical
    from keras.losses import categorical_crossentropy
    from keras.optimizers import Adadelta
    from keras import backend as K
    from keras.callbacks import ModelCheckpoint, Callback
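The parameters of `define_and_train` are not passed by hand: Sacred fills them from the experiment configuration by name. The sketch below illustrates that idea with the standard library only. It is not Sacred's implementation, and the decorator `inject_config` and the config values are hypothetical.

```python
import inspect

def inject_config(config):
    """Decorator factory: fill a function's parameters from a config dict by name."""
    def decorator(func):
        wanted = inspect.signature(func).parameters
        def wrapper(**overrides):
            merged = {**config, **overrides}
            # Pass only the entries the function actually declares
            kwargs = {name: merged[name] for name in wanted if name in merged}
            return func(**kwargs)
        return wrapper
    return decorator

# Hypothetical values; the slides do not show the actual configuration
config = {"batch_size": 128, "epochs": 2, "dense_dropout": 0.5, "unused": True}

@inject_config(config)
def define_and_train(batch_size, epochs, dense_dropout):
    return "batch_size={}, epochs={}, dense_dropout={}".format(
        batch_size, epochs, dense_dropout)

print(define_and_train())           # values come from the config
print(define_and_train(epochs=12))  # a single value overridden for one run
```

Keeping every hyperparameter in the configuration is what lets each run be stored and compared later.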
GEO Cluster Demo
• An agent or probe that collects threat data from the security sensor.
• Normalization and correlation middleware.
• A console and associated database for managing the solution and its alerts.
https://www.esecurityplanet.com/views/article.php/1501001/Security-Threat-Correlation-The-Next-Battlefield.htm
MongoDB
Start the shell process mongod from
Call from script
MongoDB My Cluster sacred.runs & completed
Task Manager
https://pythonexample.com/search/mnist%20tensorboard%20demo/8
Task II
Task III
What's behind the test? (backend pattern, crossentropy)
60000/60000 [==============================] - 426s 7ms/step - loss: 0.4982 - acc: 0.8510 - val_loss: 0.0788 - val_acc: 0.9749
Using TensorFlow backend.
INFO - MNIST-Convnet4 - Result: 0.9749
INFO - MNIST-Convnet4 - Completed after 0:07:27
Test loss: 0.0788029053777
Test accuracy: 0.9749
59392/60000 [============================>.] - ETA: 5s - loss: 0.0571 - acc: 0.9829
59520/60000 [============================>.] - ETA: 3s - loss: 0.0572 - acc: 0.9829
59648/60000 [============================>.] - ETA: 2s - loss: 0.0572 - acc: 0.9829
59776/60000 [============================>.] - ETA: 1s - loss: 0.0572 - acc: 0.9829
59904/60000 [============================>.] - ETA: 0s - loss: 0.0573 - acc: 0.9829
60000/60000 [==============================] - 513s 9ms/step - loss: 0.0573 - acc: 0.9829 - val_loss: 0.0312 - val_acc: 0.9891
Using TensorFlow backend.
INFO - MNIST-Convnet4 - Result: 0.9891
INFO - MNIST-Convnet4 - Completed after 0:33:28
Test loss: 0.0311644290059
Test accuracy: 0.9891
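The `loss` and `val_loss` columns are presumably categorical cross-entropy values, since the training script imports `categorical_crossentropy`. A minimal NumPy sketch of that quantity, with made-up predictions for three classes, could look like this. It is an illustration, not the Keras backend code.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch of one-hot targets."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes (illustrative values only)
y_true = np.array([[0, 0, 1], [1, 0, 0]], dtype=float)
confident = np.array([[0.05, 0.05, 0.90], [0.80, 0.10, 0.10]])
uncertain = np.array([[0.30, 0.30, 0.40], [0.40, 0.30, 0.30]])

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, uncertain))
```

Both prediction sets pick the correct class, so accuracy would be identical, yet the confident set has the lower loss. This is why loss keeps falling in the logs even when accuracy is already high.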
What's behind the code? (keras, pymongo, graphviz)
db = pymongo.MongoClient('mongodb://localhost:27017/').sacred
print(tf.__version__)
# Make the Graphviz binaries visible on the PATH (Windows install location)
os.environ["PATH"] += os.pathsep + 'C:/Program Files (x86)/Graphviz2.38/bin/'
ex = Experiment("MNIST-Convnet4")
ex.observers.append(MongoObserver.create())
ex.captured_out_filter = apply_backspaces_and_linefeeds
https://www.programcreek.com/python/example/103267/keras.datasets.mnist.load_data
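The `captured_out_filter` matters because Keras progress bars redraw the same console line with carriage returns, which would otherwise flood the stored output. The sketch below is a simplified re-implementation of the idea behind `apply_backspaces_and_linefeeds`, not Sacred's actual code. The function name `apply_terminal_controls` and the sample text are hypothetical.

```python
def apply_terminal_controls(text):
    """Interpret carriage returns and backspaces like a terminal, line by line."""
    cleaned = []
    for line in text.split("\n"):
        chars, cursor = [], 0
        for ch in line:
            if ch == "\r":
                cursor = 0                    # carriage return: back to column 0
            elif ch == "\b":
                cursor = max(0, cursor - 1)   # backspace: one column left
            elif cursor == len(chars):
                chars.append(ch)              # write at the end of the line
                cursor += 1
            else:
                chars[cursor] = ch            # overwrite an existing character
                cursor += 1
        cleaned.append("".join(chars))
    return "\n".join(cleaned)

raw = "59904/60000 - ETA: 0s\r60000/60000 - 513s done\nTest accuracy: 0.9891"
print(apply_terminal_controls(raw))
```

Only the final state of each redrawn line survives, which is what ends up in the stored run.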
PIP3 Install
pip3 install sacred
Collecting sacred
Downloading https://files.pythonhosted.org/packages/2d/86/7be3afa4d4c1c0c76a5de03e5ff779797ab2654e377685255c11c13c0ea5/sacred-0.7.3-py2.py3-none-any.whl (82kB)
Collecting pymongo
Downloading https://files.pythonhosted.org/packages/46/39/b9bb7fed3e3a0ea621a1512a938c105cd996320d7d9894d8239ca9093340/pymongo-3.6.1-cp36-cp36m-win_amd64.whl (291kB)
100% |████████████████████████████████| 296kB 728kB/s
Installing collected packages: pymongo
Successfully installed pymongo-3.6.1
https://github.com/pinae/Sacred-MNIST/blob/master/train_convnet.py
Cluster with different inputs, parameters, label
from collections import Counter, defaultdict
import matplotlib.pyplot as plx
import nltk, scipy, spacy
from sklearn.cluster import KMeans
from gensim import corpora, models, similarities
stemmer = nltk.SnowballStemmer("english")
# Dataset columns: 0:Title, 1:URL, 2:Tags, 3:Keywords, 4:Relevance, 5:Text, 6:Kword
train_size = 70
CLUSTERSIZE = 8        # 11
COMMONKEYWORDS = 90    # 2000
SETCOL = 6
DATSET = 15070
https://www.springboard.com/blog/data-mining-python-tutorial/
[0 1 2 1 1 1 1 0 0 1 1 0 0 2 1]
(array([0, 1, 2]), array([5, 8, 2], dtype=int64))
(array([0, 1, 2]), array([8, 5, 2], dtype=int64))
metrics.adjusted_mutual_info_score(labels1, labels2): 1.0
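The two count arrays differ only in which cluster ID got which group, so a label-invariant score reports a perfect match. The sketch below shows this with plain normalized mutual information from the standard library. It is a simpler relative of the adjusted score used above, not the same formula. The second labelling is an assumption: the same partition with IDs 0 and 1 swapped, which is consistent with the printed counts.

```python
from collections import Counter
from math import log, sqrt

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed label pairs
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )

def normalized_mutual_info(a, b):
    denom = sqrt(entropy(a) * entropy(b))
    return mutual_information(a, b) / denom if denom > 0 else 1.0

labels1 = [0, 1, 2, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 2, 1]
swap = {0: 1, 1: 0, 2: 2}                 # assumed relabelling of the second run
labels2 = [swap[label] for label in labels1]

print(sorted(Counter(labels1).items()))   # cluster sizes of the first run
print(sorted(Counter(labels2).items()))   # same sizes under swapped IDs
print(round(normalized_mutual_info(labels1, labels2), 6))
```

Comparing raw label arrays element by element would wrongly report a mismatch here, which is why clustering runs are compared with such scores.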
https://backlinko.com/long-tail-keywords
Create Questions (Method, Algos, Tools)
Dataset
"Finding the question is often more important than finding the answer" (John Tukey)
https://www.soovle.com/
https://answerthepublic.com/reports/
Machine Learning Process Chain
• Collab (set a control thesis, understand the data, get resources such as Python, etc.)
• Collect (Scrapy data, store, filter data)
- TextCrawled_20180420_IR7_all.xlsx
http://www.softwareschule.ch/examples/machinelearning.jpg
TASK12: Vectorise Data
Time Series Autocorrelation
https://spacy.io/
EXAMPLE: keyword gen., Cluster Framework
Keyword List
This tool shows which words are unusually frequent (or infrequent) in the corpus in comparison with the words in a reference corpus. This allows you to identify characteristic words in the corpus, for example, as part of a genre or ESP study.
School of Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku-ku, Tokyo 169-8555, Japan
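The keyword-list idea can be sketched as a ratio of relative frequencies between a target corpus and a reference corpus. This is a deliberately simplified score with add-one smoothing. Tools of this kind often use statistics such as log-likelihood instead. The function name `keyness` and both toy corpora are hypothetical.

```python
from collections import Counter

def keyness(target_tokens, reference_tokens, top=3):
    """Rank words by relative frequency in the target vs the reference corpus."""
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    vocab = set(target) | set(reference)
    n_t = len(target_tokens) + len(vocab)      # add-one smoothed totals
    n_r = len(reference_tokens) + len(vocab)
    scores = {
        w: ((target[w] + 1) / n_t) / ((reference[w] + 1) / n_r)
        for w in vocab
    }
    # Highest ratio first; ties broken alphabetically for stable output
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]

target = "voice of america news voice special english voa voa".split()
reference = "the news of the day and the weather of the week".split()

for word, score in keyness(target, reference):
    print(word, round(score, 2))
```

Words with a ratio well above 1 are characteristic of the target corpus. Words well below 1 are unusually rare in it.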
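Visualizing Cosine Distance
similarity of doc a to doc b:
$$\mathrm{sim}(a, b) = \frac{\sum_{j} v(a, j)\, v(b, j)}{\sqrt{\sum_{j'} v(a, j')^{2}}\;\sqrt{\sum_{j'} v(b, j')^{2}}}$$
with the vector length $\|A\| = \sqrt{\sum_{j'} v(a, j')^{2}}$, where $v(a, j)$ is the weight of word $j$ in doc $a$.
[Figure: documents (doc c, doc d, ...) drawn as vectors in a space with the axes word 1, word 2, ..., word i, ..., word n]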
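Cosine similarity divides the dot product of two document vectors by the product of their lengths, so only the direction of the vectors matters, not the document size. A minimal sketch over word-count dictionaries follows. The three toy documents are made up for illustration.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two term-weight vectors given as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

doc_a = {"voice": 2, "america": 1, "news": 1}
doc_b = {"voice": 1, "america": 1, "english": 1}
doc_c = {"weather": 3, "week": 1}

print(round(cosine_similarity(doc_a, doc_b), 3))  # shared vocabulary
print(cosine_similarity(doc_a, doc_c))            # no shared words
```

The cosine distance used for clustering is then one minus this similarity.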
Double Trouble
THE TEST OVERVIEW
Status        Description
QUEUED        The run was just queued and not run yet
RUNNING       Currently running (but see below)
COMPLETED     Completed successfully
FAILED        The run failed due to an exception
INTERRUPTED   The run was cancelled with a KeyboardInterrupt
TIMED_OUT     The run was aborted using a TimeoutInterrupt
[custom]      A custom sacred.utils.SacredInterrupt occurred

  No. of URLs removed              76,732,515
+ No. of robots.txt requests        3,675,634
- No. of excluded URLs              3,050,768
= No. of HTTP requests             77,357,381
  HTTP requests without response    1,763,850

File "C:\Users\max\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\metrics\cluster\unsupervised.py", line 254, in calinski_harabaz_score
    intra_disp += np.sum((cluster_k - mean_k) ** 2)
MemoryError
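The failing line belongs to the Calinski-Harabasz score, which compares the dispersion between clusters with the dispersion inside them. The NumPy sketch below reproduces that computation on a toy data set. It is written from the formula, not copied from scikit-learn, and it assumes at least two clusters. The `MemoryError` presumably comes from the temporary array `cluster_k - mean_k`, which is as large as the cluster itself.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n_samples, clusters = len(X), np.unique(labels)
    overall_mean = X.mean(axis=0)
    extra_disp, intra_disp = 0.0, 0.0
    for k in clusters:
        cluster_k = X[labels == k]
        mean_k = cluster_k.mean(axis=0)
        extra_disp += len(cluster_k) * np.sum((mean_k - overall_mean) ** 2)
        # Same expression as in the traceback: allocates a full-size temporary
        intra_disp += np.sum((cluster_k - mean_k) ** 2)
    if intra_disp == 0.0:
        return 1.0
    n_clusters = len(clusters)  # must be >= 2, otherwise the ratio is undefined
    return float(extra_disp * (n_samples - n_clusters)
                 / (intra_disp * (n_clusters - 1)))

# Toy data: two tight, well separated clusters on a line
X = [[0.0], [2.0], [10.0], [12.0]]
labels = [0, 0, 1, 1]
print(calinski_harabasz(X, labels))
```

For large dense matrices, scoring a random sample of the rows is one way to keep memory use bounded.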
SUMMARY & QUESTIONS
Which Stat / TF Package – Proposal Keras, Bayes & KMeans
Mindtoolset: https://basta.net/speaker/max-kleiner/
KMeans-Watson-ElasticSearch-SQLServer-Scrapy-TensorFlow-SVM-RandomForest-Sacred-MongoDB
https://sacred.readthedocs.io/en/latest/tensorflow.html
https://www.dewresearch.com/
https://ofai.github.io/million-post-corpus/
singularitynet.io