Apache Spark MLlib

What is Apache Spark?

What is MLlib?

Functionality

Dependencies

Books

Ecosystem

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

Spark: What is it?

An alternative to MapReduce for certain applications

A low-latency cluster computing system

For very large data sets

Can be up to 100 times faster than MapReduce for some workloads

Used with Hadoop / HDFS

Uses in-memory cluster computing

Memory access is faster than disk access

Has APIs in Scala / Java / Python
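The MapReduce model that Spark generalizes can be sketched in a few lines of plain Python. This is an illustrative sketch, not Spark code: the function name `word_count` and the sample input are hypothetical, and everything runs in a single process, whereas Spark distributes the same map, shuffle and reduce steps across a cluster and can keep intermediate data in memory.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines):
    """Count words using explicit map, shuffle and reduce phases, all in one process."""
    # Map phase: emit a (word, 1) pair for every whitespace-separated word.
    pairs = chain.from_iterable(((word, 1) for word in line.split()) for line in lines)
    # Shuffle phase: group the emitted values by key (the word).
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    # Reduce phase: sum the grouped values for each key.
    return {word: sum(ones) for word, ones in groups.items()}


print(word_count(["spark keeps data in memory", "mapreduce writes data to disk"]))
```

In PySpark the same job is usually written with RDD methods such as `flatMap`, `map` and `reduceByKey`; calling `cache()` on an RDD asks Spark to keep it in memory for reuse, which is one source of its speed-up on iterative jobs.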

Spark MLlib: What is it?

The Spark Machine Learning Library

Provided with the Spark install

Code in Scala / Java / Python

Contains two libraries

spark.mllib

spark.ml (introduced in v1.2)

Provides common functionality

classification, regression, clustering

collaborative filtering, dimensionality reduction
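Clustering is one of the listed tasks. As a rough illustration of what such an algorithm does, the plain-Python sketch below runs Lloyd's k-means iterations on a small in-memory list of points. It is not the MLlib implementation: the function `kmeans`, its parameters and the sample points are hypothetical, and MLlib's own `KMeans` works on distributed data and by default uses the parallel k-means|| initialization rather than the simple random sampling used here.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centers, labels), where labels[i] is the cluster index of points[i].
    """
    rng = random.Random(seed)
    # Simplified initialization (an assumption): sample k distinct input points as centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center unchanged
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.1, 0.2), (9.0, 9.0), (9.2, 8.8)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

Each iteration alternates an assignment step and an update step; a distributed version parallelizes the assignment over data partitions and aggregates per-cluster sums to compute the new centers.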

Spark MLlib Functionality

Basic statistics

Classification and regression

Collaborative Filtering

Clustering

Dimensionality reduction

Feature extraction and transformation

Optimization
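As a sketch of the basic statistics functionality, the function below computes per-column summary statistics over a small list of rows, roughly the kind of summary that MLlib's `Statistics.colStats` returns for an RDD of vectors (count, mean, variance, min, max, number of non-zeros). The name `col_stats` and the sample data are hypothetical; this is plain single-machine Python, and it uses the sample variance (denominator n - 1), so it needs at least two rows.

```python
from statistics import mean, variance


def col_stats(rows):
    """Summarize each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))  # transpose rows into columns
    return {
        "count": len(rows),
        "mean": [mean(col) for col in columns],
        "variance": [variance(col) for col in columns],  # sample variance, n - 1 denominator
        "min": [min(col) for col in columns],
        "max": [max(col) for col in columns],
        "numNonzeros": [sum(1 for x in col if x != 0) for col in columns],
    }


rows = [(1.0, 10.0, 100.0), (2.0, 20.0, 200.0), (3.0, 30.0, 300.0)]
summary = col_stats(rows)
print(summary["mean"])  # [2.0, 20.0, 200.0]
```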

Spark MLlib Dependencies

NumPy for Python

Breeze (linear algebra)

netlib-java

jblas

gfortran runtime library

Available Books

See our Hadoop book from Apress / Springer

Big Data Made Easy

Look out for our Apache Spark-based book from Packt in 2015

Spark Ecosystem

Contact Us

Feel free to contact us at

www.semtech-solutions.co.nz

info@semtech-solutions.co.nz

We offer IT project consultancy

We are happy to hear about your problems

You pay only for the hours you need to solve your problems
