
    Scala for Machine Learning - Patrick R. Nicolas

    Table of Contents

    Scala for Machine Learning

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why subscribe?

    Free access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Getting Started

    Mathematical notation for the curious

    Why machine learning?

    Classification

    Prediction

    Optimization

    Regression

    Why Scala?

    Abstraction

    Scalability

    Configurability

    Maintainability

    Computation on demand

    Model categorization

    Taxonomy of machine learning algorithms

    Unsupervised learning

    Clustering

    Dimension reduction

    Supervised learning

    Generative models

    Discriminative models

    Reinforcement learning

    Tools and frameworks

    Java

    Scala

    Apache Commons Math

    Description

    Licensing

    Installation

    JFreeChart

    Description

    Licensing

    Installation

    Other libraries and frameworks

    Source code

    Context versus view bounds

    Presentation

    Primitives and implicits

    Primitive types

    Type conversions

    Operators

    Immutability

    Performance of Scala iterators

    Let's kick the tires

    Overview of computational workflows

    Writing a simple workflow

    Selecting a dataset

    Loading the dataset

    Preprocessing the dataset

    Basic statistics

    Normalization and Gauss distribution

    Plotting data

    Creating a model (learning)

    Classify the data

    Summary

    2. Hello World!

    Modeling

    A model by any other name

    Model versus design

    Selecting a model's features

    Extracting features

    Designing a workflow

    The computational framework

    The pipe operator

    Monadic data transformation

    Dependency injection

    Workflow modules

    The workflow factory

    Examples of workflow components

    The preprocessing module

    The clustering module

    Assessing a model

    Validation

    Key metrics

    Implementation

    K-fold cross-validation

    Bias-variance decomposition

    Overfitting

    Summary

    3. Data Preprocessing

    Time series

    Moving averages

    The simple moving average

    The weighted moving average

    The exponential moving average

    Fourier analysis

    Discrete Fourier transform (DFT)

    DFT-based filtering

    Detection of market cycles

    The Kalman filter

    The state space estimation

    The transition equation

    The measurement equation

    The recursive algorithm

    Prediction

    Correction

    Kalman smoothing

    Experimentation

    Alternative preprocessing techniques

    Summary

    4. Unsupervised Learning

    Clustering

    K-means clustering

    Measuring similarity

    Overview of the K-means algorithm

    Step 1 – cluster configuration

    Defining clusters

    Defining K-means

    Initializing clusters

    Step 2 – cluster assignment

    Step 3 – iterative reconstruction

    Curse of dimensionality

    Experiment

    Tuning the number of clusters

    Validation

    Expectation-maximization (EM) algorithm

    Gaussian mixture model

    EM overview

    Implementation

    Testing

    Online EM

    Dimension reduction

    Principal components analysis (PCA)

    Algorithm

    Implementation

    Test case

    Evaluation

    Other dimension reduction techniques

    Performance considerations

    K-means

    EM

    PCA

    Summary

    5. Naïve Bayes Classifiers

    Probabilistic graphical models

    Naïve Bayes classifiers

    Introducing the multinomial Naïve Bayes

    Formalism

    The frequentist perspective

    The predictive model

    The zero-frequency problem

    Implementation

    Software design

    Training

    Classification

    Labeling

    Results

    Multivariate Bernoulli classification

    Model

    Implementation

    Naïve Bayes and text mining

    Basics of information retrieval

    Implementation

    Extraction of terms

    Scoring of terms

    Testing

    Retrieving textual information

    Evaluation

    Pros and cons

    Summary

    6. Regression and Regularization

    Linear regression

    One-variate linear regression

    Implementation

    Test case

    Ordinary least squares (OLS) regression

    Design

    Implementation

    Test case 1 – trending

    Test case 2 – features selection

    Regularization

    Ln roughness penalty

    The ridge regression

    Implementation

    The test case

    Numerical optimization

    The logistic regression

    The logit function

    Binomial classification

    Software design

    The training workflow

    Configuring the least squares optimizer

    Computing the Jacobian matrix

    Defining the exit conditions

    Defining the least squares problem

    Minimizing the loss function

    Test

    Classification

    Summary

    7. Sequential Data Models

    Markov decision processes

    The Markov property

    The first-order discrete Markov chain

    The hidden Markov model (HMM)

    Notation

    The lambda model

    HMM execution state

    Evaluation (CF-1)

    Alpha class (the forward variable)

    Beta class (the backward variable)

    Training (CF-2)

    Baum-Welch estimator (EM)

    Decoding (CF-3)

    The Viterbi algorithm

    Putting it all together

    Test case

    The hidden Markov model for time series analysis

    Conditional random fields

    Introduction to CRF

    Linear chain CRF

    CRF and text analytics

    The feature functions model

    Software design

    Implementation

    Building the training set

    Generating tags

    Extracting data sequences

    CRF control parameters

    Putting it all together

    Tests

    The training convergence profile

    Impact of the size of the training set

    Impact of the L2 regularization factor

    Comparing CRF and HMM

    Performance consideration

    Summary

    8. Kernel Models and Support Vector Machines

    Kernel functions

    Overview

    Common discriminative kernels

    The support vector machine (SVM)

    The linear SVM

    The separable case (hard margin)

    The nonseparable case (soft margin)

    The nonlinear SVM

    Max-margin classification

    The kernel trick

    Support vector classifier (SVC)

    The binary SVC

    LIBSVM

    Software design

    Configuration parameters

    SVM Formulation

    The SVM kernel function

    SVM execution

    SVM implementation

    C-penalty and margin

    Kernel evaluation

    Application to risk analysis

    Features and labels

    Anomaly detection with one-class SVC

    Support vector regression (SVR)

    Overview

    SVR versus linear regression

    Performance considerations

    Summary

    9. Artificial Neural Networks

    Feed-forward neural networks (FFNN)

    The Biological background

    The mathematical background

    The multilayer perceptron (MLP)

    The activation function

    The network architecture

    Software design

    Model definition

    Layers

    Synapses

    Connections

    Training cycle/epoch

    Step 1 – input forward propagation

    The computational model

    Objective

    Softmax

    Step 2 – sum of squared errors

    Step 3 – error backpropagation

    Error propagation

    The computational model

    Step 4 – synapse/weights adjustment

    Momentum factor for gradient descent

    Implementation

    Step 5 – convergence criteria

    Configuration

    Putting all together

    Training strategies and classification

    Online versus batch training

    Regularization

    Model instantiation

    Prediction

    Evaluation

    Impact of learning rate

    Impact of the momentum factor

    Test case

    Implementation

    Models evaluation

    Impact of hidden layers architecture

    Benefits and limitations

    Summary

    10. Genetic Algorithms

    Evolution

    The origin

    NP problems

    Evolutionary computing

    Genetic algorithms and machine learning

    Genetic algorithm components

    Encodings

    Value encoding

    Predicate encoding

    Solution encoding

    The encoding scheme

    Flat encoding

    Hierarchical encoding

    Genetic operators

    Selection

    Crossover

    Mutation

    Fitness score

    Implementation

    Software design

    Key components

    Selection

    Controlling population growth

    GA configuration

    Crossover

    Population

    Chromosomes

    Genes

    Mutation

    Population

    Chromosomes

    Genes

    The reproduction cycle

    GA for trading strategies

    Definition of trading strategies

    Trading operators

    The cost/unfitness function

    Trading signals

    Trading strategies

    Signal encoding

    Test case

    Data extraction

    Initial population

    Configuration

    GA instantiation

    GA execution

    Tests

    The unweighted score

    The weighted score

    Advantages and risks of genetic algorithms

    Summary

    11. Reinforcement Learning

    Introduction

    The problem

    A solution – Q-learning

    Terminology

    Concept

    Value of policy

    Bellman optimality equations

    Temporal difference for model-free learning

    Action-value iterative update

    Implementation

    Software design

    States and actions

    Search space

    Policy and action-value

    The Q-learning training

    Tail recursion to the rescue

    Prediction

    Option trading using Q-learning

    Option property

    Option model

    Function approximation

    Constrained state-transition

    Putting it all together

    Evaluation

    Pros and cons of reinforcement learning

    Learning classifier systems

    Introduction to LCS

    Why LCS

    Terminology

    Extended learning classifier systems (XCS)

    XCS components

    Application to portfolio management

    XCS core data

    XCS rules

    Covering

    Example of implementation

    Benefits and limitation of learning classifier systems

    Summary

    12. Scalable Frameworks

    Overview

    Scala

    Controlling object creation

    Parallel collections

    Processing a parallel collection

    Benchmark framework

    Performance evaluation

    Scalability with Actors

    The Actor model

    Partitioning

    Beyond actors – reactive programming

    Akka

    Master-workers

    Messages exchange

    Worker actors

    The workflow controller

    The master Actor

    Master with routing

    Distributed discrete Fourier transform

    Limitations

    Futures

    The Actor life cycle

    Blocking on futures

    Handling future callbacks

    Putting all together

    Apache Spark

    Why Spark

    Design principles

    In-memory persistency

    Laziness

    Transforms and Actions

    Shared variables

    Experimenting with Spark

    Deploying Spark

    Using Spark shell

    MLlib

    RDD generation

    K-means using Spark

    Performance evaluation

    Tuning parameters

    Tests

    Performance considerations

    Pros and cons

    0xdata Sparkling Water

    Summary

    A. Basic Concepts

    Scala programming

    List of libraries

    Format of code snippets

    Encapsulation

    Class constructor template

    Companion objects versus case classes

    Enumerations versus case classes

    Overloading

    Design template for classifiers

    Data extraction

    Data sources

    Extraction of documents

    Matrix class

    Mathematics

    Linear algebra

    QR Decomposition

    LU factorization

    LDL decomposition

    Cholesky factorization

    Singular value decomposition

    Eigenvalue decomposition

    Algebraic and numerical libraries

    First order predicate logic

    Jacobian and Hessian matrices

    Summary of optimization techniques

    Gradient descent methods

    Steepest descent

    Conjugate gradient

    Stochastic gradient descent

    Quasi-Newton algorithms

    BFGS

    L-BFGS

    Nonlinear least squares minimization

    Gauss-Newton

    Levenberg-Marquardt

    Lagrange multipliers

    Overview of dynamic programming

    Finances 101

    Fundamental analysis

    Technical analysis

    Terminology

    Trading signals and strategy

    Price patterns

    Options trading

    Financial data sources

    Suggested online courses

    References

    Index

    Scala for Machine Learning

    Copyright © 2014 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: December 2014

    Production reference: 1121214

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78355-874-2

    www.packtpub.com

    Credits

    Author

    Patrick R. Nicolas

    Reviewers

    Subhajit Datta

    Rui Gonçalves

    Patricia Hoffman, PhD

    Md Zahidul Islam

    Commissioning Editor

    Owen Roberts

    Acquisition Editor

    Owen Roberts

    Content Development Editor

    Mohammed Fahad

    Technical Editors

    Madhuri Das

    Taabish Khan

    Copy Editors

    Janbal Dharmaraj

    Vikrant Phadkay

    Project Coordinator

    Danuta Jones

    Proofreaders

    Simran Bhogal

    Maria Gould

    Paul Hindle

    Elinor Perry-Smith

    Chris Smith

    Indexer

    Mariammal Chettiyar

    Graphics

    Sheetal Aute

    Valentina D'silva

    Disha Haria

    Abhinash Sahu

    Production Coordinator

    Arvindkumar Gupta

    Cover Work

    Arvindkumar Gupta

    About the Author

    Patrick R. Nicolas is a lead R&D engineer at Dell in Santa Clara, California. He has 25 years of experience in software engineering and building large-scale applications in C++, Java, and Scala, and has held several managerial positions. His interests include real-time analytics, modeling, and optimization.

    Special thanks to the Packt Publishing team: Mohammed Fahad for his patience and encouragement, Owen Roberts for the opportunity, and the reviewers for their guidance and dedication.

    About the Reviewers

    Subhajit Datta is a passionate software developer.

    He earned his Bachelor of Engineering in Information Technology (BE in IT) from the Indian Institute of Engineering Science and Technology, Shibpur (IIEST, Shibpur), formerly known as Bengal Engineering and Science University, Shibpur.

    He completed his Master of Technology in Computer Science and Engineering (MTech CSE) from Indian Institute of Technology Bombay (IIT Bombay); his thesis focused on topics in natural language processing.

    He has experience in the investment banking and web application domains, and is a polyglot who has worked with Java, Scala, Python, Unix shell scripting, VBScript, JavaScript, C#.NET, and PHP. He is interested in learning and applying new and different technologies.

    He believes that choosing the right programming language, tool, and framework for the problem at hand is more important than trying to fit all problems in one technology.

    He also has experience working in the Waterfall and Agile processes. He is excited about the Agile software development processes.

    Rui Gonçalves is an all-round, hardworking, and dedicated software engineer. He is an enthusiast of software architecture, programming paradigms, algorithms, and data structures with the ambition of developing products and services that have a great impact on society.

    He currently works at ShiftForward, where he is a software engineer in the online advertising field. He is focused on designing and implementing highly efficient, concurrent, and scalable systems as well as machine learning solutions. In order to achieve this, he uses Scala as the main development language of these systems on a day-to-day basis.

    Patricia Hoffman, PhD, is a consultant at iCube Consulting Service Inc., with over 25 years of experience in modeling and simulation, the last six of which concentrated on machine learning and data mining technologies. Her software development experience ranges from modeling stochastic partial differential equations to image processing. She is currently an adjunct faculty member at International Technical University, teaching machine learning courses. She also teaches machine learning and data mining at the University of California, Santa Cruz—Silicon Valley Campus. She chaired the Association for Computing Machinery's Data Mining Special Interest Group for the San Francisco Bay Area for 5 years, organizing monthly lectures and five data mining conferences with over 350 participants.

    Patricia has a long list of significant accomplishments. She developed the architecture and software development plan for a collaborative recommendation system while consulting as a data mining expert for Quantum Capital. While consulting for Revolution Analytics, she developed training materials for interfacing the R statistical language with IBM's Netezza data warehouse appliance.

    She has also set up the systems used for communication and software development along with technical coordination for GTECH, a medical device start-up.

    She has also technically directed, produced, and managed operations concepts and architecture analysis for hardware, software, and firmware. She has performed risk assessments and has written qualification letters, proposals, system specs, and interface control documents. She has also coordinated with subcontractors, associate contractors, and various Lockheed departments to produce analyses, documents, technology demonstrations, and integrated systems. She was the Chief Systems Engineer for a $12 million image processing workstation development, and received a 100 percent satisfaction score from the customer.

    Patricia's contributions to various publications are as follows:

    A unified view on the rotational symmetry of equilibria of nematic polymers, dipolar nematic polymers, and polymers in higher dimensional space, Communications in Mathematical Sciences, Volume 6, 949-974

    She worked as a technical editor on the book Machine Learning in Action, Peter Harrington, Manning Publications Co.

    A Distributed Architecture for the C3 I (Command, Control, Communications, and Intelligence) Collection Management Expert System, with Allen Rude, AIC Lockheed

    A book review of computer-supported cooperative work, ACM/SIGCHI Bulletin, Volume 21, Issue 2, pages 125-128, ISSN:0736-6906, 1989

    Md Zahidul Islam is a software developer working for HSI Health and lives in Concord, California, with his wife.

    He has a passion for functional programming, machine learning, and working with data. He is currently working with Scala, Apache Spark, MLlib, Ruby on Rails, ElasticSearch, MongoDB, and Backbone.js. Earlier in his career, he worked with C#, ASP.NET, and everything around the .NET ecosystem.

    I would like to thank my wife, Sandra, who lovingly supports me in everything I do. I'd also like to thank Packt Publishing and its staff for the opportunity to contribute to this book.

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Free access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

    To Jennifer, for her kindness and support throughout this long journey.

    Preface

    Not a single day passes by that we do not hear about Big Data in the news media, technical conferences, and even coffee shops. The ever-increasing amount of data collected in process monitoring, research, or simple human behavior becomes valuable only if you extract knowledge from it. Machine learning is the essential tool to mine data for gold (knowledge).

    This book covers the what, why, and how of machine learning:

    What are the objectives and the mathematical foundation of machine learning?

    Why is Scala the ideal programming language to implement machine learning algorithms?

    How can you apply machine learning to solve real-world problems?

    Throughout this book, machine learning algorithms are described with diagrams, mathematical formulation, and documented snippets of Scala code, allowing you to understand these key concepts in your own unique way.

    What this book covers

    Chapter 1, Getting Started, introduces the basic concepts of statistical analysis, classification, regression, prediction, clustering, and optimization. This chapter covers the Scala language's features and libraries, followed by the implementation of a simple application.

    Chapter 2, Hello World!, describes a typical workflow for classification, the concept of bias/variance trade-off, and validation using the Scala dependency injection applied to the technical analysis of financial markets.

    Chapter 3, Data Preprocessing, covers time series analyses and leverages Scala to implement data preprocessing and smoothing techniques such as moving averages, discrete Fourier transform, and the Kalman recursive filter.

    Chapter 4, Unsupervised Learning, focuses on the implementation of some of the most widely used clustering techniques, such as K-means, the expectation-maximization, and the principal component analysis as a dimension reduction method.

    Chapter 5, Naïve Bayes Classifiers, introduces probabilistic graphical models, and then describes the implementation of the Naïve Bayes and the multivariate Bernoulli classifiers in the context of text mining.

    Chapter 6, Regression and Regularization, covers a typical implementation of the linear and least squares regression, the ridge regression as a regularization technique, and finally, the logistic regression.

    Chapter 7, Sequential Data Models, introduces the Markov processes followed by a full implementation of the hidden Markov model, and conditional random fields applied to pattern recognition in financial market data.

    Chapter 8, Kernel Models and Support Vector Machines, covers the concept of kernel functions with implementation of support vector machine classification and regression, followed by the application of the one-class SVM to anomaly detection.

    Chapter 9, Artificial Neural Networks, describes feed-forward neural networks followed by a full implementation of the multilayer perceptron classifier.

    Chapter 10, Genetic Algorithms, covers the basics of evolutionary computing and the implementation of the different components of a multipurpose genetic algorithm.

    Chapter 11, Reinforcement Learning, introduces the concept of reinforcement learning with an implementation of the Q-learning algorithm followed by a template to build a learning classifier system.

    Chapter 12, Scalable Frameworks, covers some of the artifacts and frameworks to create scalable applications for machine learning such as Scala parallel collections, Akka, and the Apache Spark framework.

    Appendix A, Basic Concepts, covers the Scala constructs used throughout the book, elements of linear algebra, and an introduction to investment and trading strategies.

    Appendix B, References, provides a chapter-wise list of the references cited as [source entry] in the respective chapters. This appendix is available as an online chapter at https://www.packtpub.com/sites/default/files/downloads/8742OS_AppendixB_References.pdf.

    Short test applications using financial data illustrate the large variety of predictive, regression, and classification models.

    The interdependencies between chapters are kept to a minimum. You can easily delve into any chapter once you complete Chapter 1, Getting Started, and Chapter 2, Hello World!.

    What you need for this book

    A decent command of the Scala programming language is a prerequisite. Reading through a mathematical formulation, conveniently defined in an information box, is optional. However, some basic knowledge of mathematics and statistics might be helpful to understand the inner workings of some algorithms.

    The book uses the following libraries:

    Scala 2.10.3 or higher

    Java JDK 1.7.0_45 or 1.8.0_25

    SBT 0.13 or higher

    JFreeChart 1.0.1

    Apache Commons Math library 3.3 (Chapter 3, Data Preprocessing, Chapter 4, Unsupervised Learning, and Chapter 6, Regression and Regularization)

    Indian Institute of Technology Bombay CRF 0.2 (Chapter 7, Sequential Data Models)

    LIBSVM 0.1.6 (Chapter 8, Kernel Models and Support Vector Machines)

    Akka 2.2.4 or higher (or Typesafe activator 1.2.10 or higher) (Chapter 12, Scalable Frameworks)

    Apache Spark 1.0.2 or higher (Chapter 12, Scalable Frameworks)

    Note

    Understanding the mathematical formulation of a model is optional.

    Who this book is for

    This book is for software developers with a background in Scala programming who want to learn how to create, validate, and apply machine learning algorithms.

    The book is also beneficial to data scientists who want to explore functional programming or improve the scalability of their existing applications using Scala.

    This book is designed as a tutorial with comparative hands-on exercises using technical analysis of financial markets.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: Finally, the environment variables JAVA_HOME, PATH, and CLASSPATH have to be updated accordingly.

    A block of code is set as follows:

    [default]

    val lsp = builder.model(lrJacobian)

                    .weight(wMatrix)

                    .target(labels)

    When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

    [default]

    val lsp = builder.

    model(lrJacobian

    )

                    .weight(wMatrix)

                      .target(labels)

    The source code block is described using a reference number embedded as a code comment:

    [default]

    val lsp = builder.

    model(lrJacobian)  //1

     

                    .weight(wMatrix)

                      .target(labels)

    The reference number is used in the chapter as follows: The model instance is initialized with the Jacobian matrix, lrJacobian (line 1).

    Any command-line input or output is written as follows:

    sbt/sbt assembly

    New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: The loss function is then known as the hinge loss.

    Note

    Warnings or important notes appear in a box like this.

    Tip

    Tips and tricks appear like this.

    Note

    Mathematical formulas (optional to read) appear in a box like this.

    For the sake of readability, the elements of the Scala code that are not essential to the understanding of an algorithm, such as class, variable, and method qualifiers, validation of arguments, exceptions, or logging, are omitted. The convention for code snippets is detailed in the Format of code snippets section in Appendix A, Basic Concepts.

    You will be provided with in-text citations of papers, conferences, books, and instructional videos throughout the book. The sources are listed in Appendix B, References, using the following format:

    [In-text citation]

    For example, in the chapter, you will find an instance as follows:

    This time around, RSS increases with λ before reaching a maximum for λ > 60. This behavior is consistent with other findings [6:12].

    The respective [source entry] is mentioned in Appendix B, References, as follows:

    [6:12] Model selection and assessment, H. Bravo and R. Irizarry, 2010, available at http://www.cbcb.umd.edu/~hcorrada/PracticalML/pdf/lectures/selection.pdf.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Downloading the example code

    You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

    Chapter 1. Getting Started

    It is critical for any computer scientist to understand the different classes of machine learning algorithms and be able to select the ones that are relevant to the domain of their expertise and dataset. However, the application of these algorithms represents a small fraction of the overall effort needed to extract an accurate and performant model from input data. A common data mining workflow consists of the following sequential steps:

    Loading the data.

    Preprocessing, analyzing, and filtering the input data.

    Discovering patterns, affinities, clusters, and classes.

    Selecting the model features and the appropriate machine learning algorithm(s).

    Refining and validating the model.

    Improving the computational performance of the implementation.

    As we will emphasize throughout this book, each stage of the process is critical to build the right model.

    This first chapter introduces you to the taxonomy of machine learning algorithms, the tools and frameworks used in the book, and a simple application of logistic regression to get your feet wet.

    Mathematical notation for the curious

    Each chapter contains a small section dedicated to the formulation of the algorithms for those interested in the mathematical concepts behind the science and art of machine learning. These sections are optional and defined within a tip box. For example, the mathematical expression of the mean and the variance of a variable X mentioned in a tip box will be as follows:

    Tip

    Mean value of a variable X = {x_i} is defined as:

    mean(X) = (1/n) Σ x_i, for i = 1 to n

    The variance of a variable X = {x_i} is defined as:

    var(X) = (1/n) Σ (x_i − mean(X))², for i = 1 to n
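
    The following snippet is a minimal sketch of these two formulas in Scala; it is provided here purely for illustration and is not part of the book's library:

    def mean(x: Vector[Double]): Double = x.sum / x.size

    def variance(x: Vector[Double]): Double = {
      val mu = mean(x)                          // mean of the observations
      x.map(v => (v - mu) * (v - mu)).sum / x.size
    }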

    Why machine learning?

    The explosion in the number of digital devices generates an ever-increasing amount of data. The best analogy I can find to describe the need, desire, and urgency to extract knowledge from large datasets is the process of extracting a precious metal from a mine, and in some cases, extracting blood from a stone.

    Knowledge is quite often defined as a model that can be constantly updated or tweaked as new data comes into play. Models are obviously domain-specific ranging from credit risk assessment, face recognition, maximization of quality of service, classification of pathological symptoms of disease, optimization of computer networks, and security intrusion detection, to customers' online behavior and purchase history.

    Machine learning problems are categorized as classification, prediction, optimization, and regression.

    Classification

    The purpose of classification is to extract knowledge from historical data. For instance, a classifier can be built to identify a disease from a set of symptoms. The scientist collects information regarding the body temperature (a continuous variable), congestion (a discrete variable with the values HIGH, MEDIUM, and LOW), and the actual diagnostic (flu). This dataset is used to create a model such as IF temperature > 102 AND congestion = HIGH THEN patient has the flu (probability 0.72), which doctors can use in their diagnostic.
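
    As a purely illustrative sketch (the type and method names are hypothetical, not the book's API), such a rule can be encoded as a simple Scala predicate:

    case class Symptoms(temperature: Double, congestion: String)

    // IF temperature > 102 AND congestion = HIGH THEN flu
    def hasFlu(s: Symptoms): Boolean =
      s.temperature > 102.0 && s.congestion == "HIGH"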

    Prediction

    Once the model is extracted and validated against the past data, it can be used to draw inference from the future data. A doctor collects symptoms from a patient, such as body temperature and nasal congestion, and anticipates the state of his/her health.

    Optimization

    Some global optimization problems are intractable using traditional linear and non-linear optimization methods. Machine learning techniques improve the chances that the optimization method converges toward a solution (intelligent search). You can imagine that fighting the spread of a new virus requires optimizing a process that may evolve over time as more symptoms and cases are uncovered.

    Regression

    Regression is a classification technique that is particularly suitable for a continuous model. Linear (least squares), polynomial, and logistic regressions are among the most commonly used techniques to fit a parametric model, or function, y = f(x), to a dataset. Regression is sometimes regarded as a specialized case of classification for which the output variables are continuous instead of categorical.
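
    As a hedged illustration (the helper name is ours, not the book's), a one-variate least squares fit y = w0 + w1.x reduces to two closed-form expressions:

    // ordinary least squares for y = w0 + w1*x
    def fit(x: Array[Double], y: Array[Double]): (Double, Double) = {
      val (mx, my) = (x.sum / x.length, y.sum / y.length)
      val w1 = x.zip(y).map { case (u, v) => (u - mx) * (v - my) }.sum /
               x.map(u => (u - mx) * (u - mx)).sum
      (my - w1 * mx, w1)                        // (intercept w0, slope w1)
    }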

    Why Scala?

    Like most functional languages, Scala provides developers and scientists with a toolbox to implement iterative computations that can be easily woven dynamically into a coherent dataflow. To some extent, Scala can be regarded as an extension of the popular MapReduce model for distributed computation of large amounts of data. Among the capabilities of the language, the following features are deemed essential to machine learning and statistical analysis.

    Abstraction

    Monoids and monads are important concepts in functional programming. Monads are derived from category and group theory, allowing developers to create high-level abstractions, as illustrated in Twitter's Algebird (https://github.com/twitter/algebird) or ScalaNLP's Breeze (https://github.com/dlwh/breeze) libraries.

    A monoid defines a binary operation, op, on a dataset T with the properties of closure, identity, and associativity.

    Let's consider a + operation defined for a set T using the following monoidal representation:

    trait Monoid[T] {

      def zero: T

      def op(a: T, b: T): T

    }

    Monoids are associative operations. For instance, if ts1, ts2, and ts3 are three time series, then the property ts1 + (ts2 + ts3) = (ts1 + ts2) + ts3 holds. The associativity of a monoid operator is critical for the parallelization of computational workflows.
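
    As an illustration (a sketch, not the book's implementation), here is an instance of the Monoid trait for floating point addition; its associativity is precisely what allows a parallel fold to combine partial results computed on separate partitions:

    val addMonoid = new Monoid[Double] {
      def zero: Double = 0.0
      def op(a: Double, b: Double): Double = a + b
    }

    // a fold over a parallel collection requires op to be associative
    val total = Vector(1.0, 2.0, 3.0).par.fold(addMonoid.zero)(addMonoid.op)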

    Monads are structures that can be seen either as containers by programmers or as a generalization of Monoids. The collections bundled with the Scala standard library (list, map, and so on) are constructed as monads [1:1]. Monads provide the ability for those collections to perform the following functions:

    Create the collection.

    Transform the elements of the collection.

    Flatten nested collections.

    A common categorical representation of a monad in Scala is a trait, Monad, parameterized with a container type M:

    trait Monad[M[_]] {

      def apply[T](a: T): M[T]

      def flatMap[T, U](m: M[T])(f: T=>M[U]): M[U]

    }

    Monads allow those collections or containers to be chained to generate a workflow. This property is applicable to any scientific computation [1:2].
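
    For example, here is a sketch (not the book's implementation) of an instance of the Monad trait for Scala's Option container, showing how flatMap chains two steps of a workflow:

    val optionMonad = new Monad[Option] {
      def apply[T](a: T): Option[T] = Option(a)
      def flatMap[T, U](m: Option[T])(f: T => Option[U]): Option[U] =
        m.flatMap(f)
    }

    // two chained computations: wrap a value, then take its square root
    val r = optionMonad.flatMap(optionMonad(4.0))(x => Option(math.sqrt(x)))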

    Scalability

    As seen previously, monoids and monads enable parallelization and chaining of data processing functions by leveraging Scala's higher-order methods. In terms of implementation, actors are the core elements that make Scala scalable. Actors act as coroutines, managing the underlying thread pool. Actors communicate by passing asynchronous messages. Distributed computing Scala frameworks such as Akka and Spark extend the capabilities of the Scala standard library to support computation on very large datasets. Akka and Spark are described in detail in the last chapter of this book [1:3].

    In a nutshell, a workflow is implemented as a sequence of activities or computational tasks. Those tasks consist of higher-order Scala methods such as flatMap, map, fold, reduce, collect, join, or filter applied to a large collection of observations. Scala allows these observations to be partitioned by executing those tasks through a cluster of actors. Scala also supports message dispatching and routing of messages between local and remote actors. Engineers can decide to execute a workflow either locally or distributed across CPU cores and servers with no or very few code changes.

    Deployment of a workflow as a distributed computation

    In this diagram, a controller, that is, the master node, manages the sequence of tasks 1 to 4, similarly to a scheduler. These tasks are actually executed over multiple worker nodes that are implemented as Scala actors. The master node exchanges messages with the workers to manage the state of the workflow execution as well as its reliability. The high availability of these tasks is implemented through a hierarchy of supervising actors.
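
    To illustrate the very few code changes claim with a minimal sketch: switching a computation from a single thread to the available cores requires only the .par conversion of a standard collection:

    val volatility = (0 until 1000000).map(_ * 0.001)
    // the same higher-order methods, executed across the available cores
    val risk = volatility.par.map(x => x * x).reduce(_ + _)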

    Configurability

    Scala supports dependency injection using a combination of abstract variables, self-referenced composition, and stackable traits. One of the most commonly used dependency injection patterns, the cake pattern, is used throughout this book to create dynamic computation workflows and plots.
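
    Here is a minimal sketch of the cake pattern with hypothetical module names: the self-type declares the dependency, and a concrete trait is injected when the workflow is instantiated:

    trait Preprocessor {
      def preprocess(xs: Vector[Double]): Vector[Double]
    }

    trait MovingAverage extends Preprocessor {
      def preprocess(xs: Vector[Double]): Vector[Double] =
        xs.sliding(2).map(_.sum / 2).toVector
    }

    class Workflow { self: Preprocessor =>      // dependency declared, not hardwired
      def run(xs: Vector[Double]): Vector[Double] = preprocess(xs)
    }

    val workflow = new Workflow with MovingAverage  // injection at instantiation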

    Maintainability

    Scala natively embeds Domain Specific Languages (DSL). DSLs are syntactic layers built on top of Scala's native libraries. DSLs allow software developers to abstract computation in terms that are easily understood by scientists. The best-known application of a DSL is the emulation of the syntax of MATLAB, with which data scientists are familiar.

    Computation on demand

    Lazy methods and values allow developers to execute functions and allocate computing resources on demand. The Spark framework relies on lazy variables and methods to chain Resilient Distributed Datasets (RDD).
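
    A brief sketch of the principle:

    lazy val samples: Array[Double] = {
      println("allocating")                     // executed once, on first access only
      Array.fill(10000)(math.random)
    }
    // nothing has been allocated yet; it happens when samples is first read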

    Model categorization

    A model can be predictive, descriptive, or adaptive.

    Predictive models discover patterns in historical data and extract fundamental trends and relationships between factors. They are used to predict and classify future events or observations. Predictive analytics is used in a variety of fields such as marketing, insurance, and pharmaceuticals. Predictive models are created through supervised learning using a preselected training set.

    Descriptive models attempt to find unusual patterns or affinities in data by grouping observations into clusters with similar properties. These models define the first level in knowledge discovery. They are generated through unsupervised learning.

    A third category of models, known as adaptive modeling, is generated through reinforcement learning. Reinforcement learning consists of one or several decision-making agents that recommend, and possibly execute, actions in an attempt to solve a problem, optimize an objective function, or resolve constraints.

    Taxonomy of machine learning algorithms

    The purpose of machine learning is to teach computers to execute tasks without human intervention. An increasing number of applications such as genomics, social networking, advertising, or risk analysis generate a very large amount of data that can be analyzed or mined to extract knowledge or provide insight into a process, a customer, or an organization. Ultimately, machine learning algorithms consist of identifying and validating models to optimize a performance criterion using historical, present, and future data [1:4].

    Data mining is the process of extracting or identifying patterns in a dataset.

    Unsupervised learning

    The goal of unsupervised learning is to discover patterns of regularities and irregularities in a set of observations. The process, known as density estimation in statistics, is broken down into two categories: discovery of data clusters and discovery of latent factors. The methodology consists of processing input data to understand patterns similar to the natural learning process in infants or animals. Unsupervised learning does not require labeled data, and is therefore easy to implement and execute because no expertise is needed to validate the output. However, it is possible to label the output of a clustering algorithm and use it for future classification.

    Clustering

    The purpose of data clustering is to partition a collection of data into a number of clusters or data segments. Practically, a clustering algorithm is used to organize observations into clusters by minimizing the distance between observations within a cluster and maximizing the distance between observations of different clusters. A clustering algorithm consists of the following steps:

    Creating a model by making an assumption on the input data.

    Selecting the objective function or goal of the clustering.

    Evaluating one or more algorithms to optimize the objective function.

    Data clustering is also known as data segmentation or data partitioning.
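
    As a toy illustration of the assignment step (a sketch, not the book's K-means implementation), each observation joins the cluster whose centroid is nearest:

    // assign a one-dimensional observation to the closest centroid
    def assign(x: Double, centroids: Vector[Double]): Int =
      centroids.indices.minBy(i => math.abs(x - centroids(i)))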

    Dimension reduction

    Dimension reduction techniques aim at finding the smallest, but most relevant, set of features that models the dataset reliably. There are many reasons for reducing the number of features or parameters in a model, from avoiding overfitting to reducing computation costs.

    There are many ways to classify the different techniques used to extract knowledge from data using unsupervised learning. The following taxonomy breaks down these techniques according to their purpose, although the list is far from being exhaustive.

    Supervised learning

    The best analogy for supervised learning is function approximation or curve fitting. In its simplest form, supervised learning attempts to extract a relation or function f: x → y from a training set {x, y}. Supervised learning is far more accurate and reliable than any other learning strategy. However, a domain expert may be required to label (tag) data as a training set for certain types of problems.

    Supervised machine learning algorithms can be broken into two categories:

    Generative models

    Discriminative models

    Generative models

    In order to simplify the description of statistical formulas, we adopt the following simplification: the probability of an event X is the same as the probability of the discrete random variable X having a value x, p(X) = p(X=x). The notation for joint probability (resp. conditional probability) becomes p(X, Y) = p(X=x, Y=y) (resp. p(X|Y) = p(X=x|Y=y)).

    Generative models attempt to fit a joint probability distribution, p(X, Y), of two events or random variables, X and Y.
