Guerrilla Analytics: A Practical Approach to Working with Data

Ebook588 pages5 hours

Guerrilla Analytics: A Practical Approach to Working with Data

Name: Guerrilla Analytics: A Practical Approach to Working with Data
Brand: Elsevier Science
Rating: 4.5 (1 reviews)

By Enda Ridge

Rating: 4.5 out of 5 stars

4.5/5

()

Read preview

About this ebook

Doing data science is difficult. Projects are typically very dynamic with requirements that change as data understanding grows. The data itself arrives piecemeal, is added to, replaced, contains undiscovered flaws and comes from a variety of sources. Teams also have mixed skill sets and tooling is often limited. Despite these disruptions, a data science team must get off the ground fast and begin demonstrating value with traceable, tested work products. This is when you need Guerrilla Analytics.

In this book, you will learn about:

The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting.

Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny.

Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research.

Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions.

Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects

The Guerrilla Analytics Principles: simple rules of thumb for maintaining data provenance across the entire analytics life cycle from data extraction, through analysis to reporting
Reproducible, traceable analytics: how to design and implement work products that are reproducible, testable and stand up to external scrutiny
Practice tips and war stories: 90 practice tips and 16 war stories based on real-world project challenges encountered in consulting, pre-sales and research
Preparing for battle: how to set up your team's analytics environment in terms of tooling, skill sets, workflows and conventions
Data gymnastics: over a dozen analytics patterns that your team will encounter again and again in projects

Skip carousel

LanguageEnglish

PublisherElsevier Science

Release dateSep 25, 2014

ISBN9780128005033

Author

Enda Ridge

Enda Ridge is an accomplished data scientist whose experience spans consulting, pre-sales of analytics software and research in academia. He has consulted to clients in the public and private sectors including financial services, insurance, audit and IT security. Enda is an expert in agile analytics for real world projects where data and requirements change often, resources and tooling are sometimes very limited and results must be traceable and auditable for high profile stakeholders. His experience includes analytics to support the forensic investigation of a major US bankruptcy and the remediation a UK bank’s mis-selling of financial products. He has also applied machine learning and NoSQL approaches to problems in document classification, surveillance and IT access controls. His PhD used Design of Experiments techniques to methodically evaluate algorithm performance. Enda has authored or co-authored 12 academic research papers, is an invited contributor to edited books and has spoken at several analytics practitioner conferences. Enda holds a Bachelor’s degree in Mechanical Engineering and Master’s in Applied Computing from the National University of Ireland at Galway and was awarded the National University of Ireland’s Travelling Studentship in Engineering. His PhD was awarded by the University of York, UK.

Related authors

Skip carousel

Related to Guerrilla Analytics

Related ebooks

Skip carousel

Introducing Data Science: Big data, machine learning, and more, using Python tools
Ebook
Introducing Data Science: Big data, machine learning, and more, using Python tools
byDavy Cielen
Rating: 5 out of 5 stars
5/5
Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses
Ebook
Data Virtualization for Business Intelligence Systems: Revolutionizing Data Integration for Data Warehouses
byRick van der Lans
Rating: 4 out of 5 stars
4/5
Big Data: Principles and best practices of scalable realtime data systems
Ebook
Big Data: Principles and best practices of scalable realtime data systems
byJames Warren
Rating: 4 out of 5 stars
4/5
Data Mining Applications with R
Ebook
Data Mining Applications with R
byYanchang Zhao
Rating: 4 out of 5 stars
4/5
R and Data Mining: Examples and Case Studies
Ebook
R and Data Mining: Examples and Case Studies
byYanchang Zhao
Rating: 3 out of 5 stars
3/5
R in Action: Data analysis and graphics with R
Ebook
R in Action: Data analysis and graphics with R
byRobert I. Kabacoff
Rating: 4 out of 5 stars
4/5
Think Like a Data Scientist: Tackle the data science process step-by-step
Ebook
Think Like a Data Scientist: Tackle the data science process step-by-step
byBrian Godsey
Rating: 0 out of 5 stars
0 ratings
Machine Learning: Hands-On for Developers and Technical Professionals
Ebook
Machine Learning: Hands-On for Developers and Technical Professionals
byJason Bell
Rating: 0 out of 5 stars
0 ratings
Mastering Business Intelligence with MicroStrategy
Ebook
Mastering Business Intelligence with MicroStrategy
byNing Ma
Rating: 0 out of 5 stars
0 ratings
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
Ebook
Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners
byJared Dean
Rating: 3 out of 5 stars
3/5
The Data and Analytics Playbook: Proven Methods for Governed Data and Analytic Quality
Ebook
The Data and Analytics Playbook: Proven Methods for Governed Data and Analytic Quality
byLowell Fryman
Rating: 5 out of 5 stars
5/5
Business Intelligence Strategy and Big Data Analytics: A General Management Perspective
Ebook
Business Intelligence Strategy and Big Data Analytics: A General Management Perspective
bySteve Williams
Rating: 5 out of 5 stars
5/5
Implementing Analytics: A Blueprint for Design, Development, and Adoption
Ebook
Implementing Analytics: A Blueprint for Design, Development, and Adoption
byNauman Sheikh
Rating: 0 out of 5 stars
0 ratings
Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph
Ebook
Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL, and Graph
byDavid Loshin
Rating: 5 out of 5 stars
5/5
Behind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight
Ebook
Behind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight
byPiyanka Jain
Rating: 5 out of 5 stars
5/5
Data Insights: New Ways to Visualize and Make Sense of Data
Ebook
Data Insights: New Ways to Visualize and Make Sense of Data
byHunter Whitney
Rating: 2 out of 5 stars
2/5
Big Data Analytics with R
Ebook
Big Data Analytics with R
bySimon Walkowiak
Rating: 0 out of 5 stars
0 ratings
Predictive Business Analytics: Forward Looking Capabilities to Improve Business Performance
Ebook
Predictive Business Analytics: Forward Looking Capabilities to Improve Business Performance
byLawrence Maisel
Rating: 0 out of 5 stars
0 ratings
Data Visualization: Representing Information on Modern Web
Ebook
Data Visualization: Representing Information on Modern Web
byAndy Kirk
Rating: 5 out of 5 stars
5/5
Real-Time Big Data Analytics
Ebook
Real-Time Big Data Analytics
byShilpi
Rating: 5 out of 5 stars
5/5
Mastering Predictive Analytics with R
Ebook
Mastering Predictive Analytics with R
byRui Miguel Forte
Rating: 4 out of 5 stars
4/5
Business Intelligence Guidebook: From Data Integration to Analytics
Ebook
Business Intelligence Guidebook: From Data Integration to Analytics
byRick Sherman
Rating: 4 out of 5 stars
4/5
Practical Data Analysis Cookbook
Ebook
Practical Data Analysis Cookbook
byTomasz Drabas
Rating: 0 out of 5 stars
0 ratings
Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions
Ebook
Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions
byMatt Taddy
Rating: 5 out of 5 stars
5/5
Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance
Ebook
Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance
byBernard Marr
Rating: 4 out of 5 stars
4/5
Big Data Analytics
Ebook
Big Data Analytics
byVenkat Ankam
Rating: 0 out of 5 stars
0 ratings
Big Data Analytics: Disruptive Technologies for Changing the Game
Ebook
Big Data Analytics: Disruptive Technologies for Changing the Game
byArvind Sathi
Rating: 4 out of 5 stars
4/5
Data Science for Business: Data Mining, Data Warehousing, Data Analytics, Data Visualization, Data Modelling, Regression Analysis, Big Data and Machine Learning
Ebook
Data Science for Business: Data Mining, Data Warehousing, Data Analytics, Data Visualization, Data Modelling, Regression Analysis, Big Data and Machine Learning
byTravis Goleman
Rating: 0 out of 5 stars
0 ratings
Data Modeling Essentials
Ebook
Data Modeling Essentials
byGraeme Simsion
Rating: 4 out of 5 stars
4/5
Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses
Ebook
Big Data, Big Analytics: Emerging Business Intelligence and Analytic Trends for Today's Businesses
byMichele Chambers
Rating: 0 out of 5 stars
0 ratings

Enterprise Applications For You

Skip carousel

QuickBooks 2024 All-in-One For Dummies
Ebook
QuickBooks 2024 All-in-One For Dummies
byStephen L. Nelson
Rating: 0 out of 5 stars
0 ratings
50 Useful Excel Functions: Excel Essentials, #3
Ebook
50 Useful Excel Functions: Excel Essentials, #3
byM.L. Humphrey
Rating: 5 out of 5 stars
5/5
Bitcoin For Dummies
Ebook
Bitcoin For Dummies
byPrypto
Rating: 4 out of 5 stars
4/5
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1
Ebook
Excel : The Ultimate Comprehensive Step-By-Step Guide to the Basics of Excel Programming: 1
byKevin Clark
Rating: 5 out of 5 stars
5/5
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
Ebook
Creating Online Courses with ChatGPT | A Step-by-Step Guide with Prompt Templates
byCea West
Rating: 4 out of 5 stars
4/5
Mastering QuickBooks 2020: The ultimate guide to bookkeeping and QuickBooks Online
Ebook
Mastering QuickBooks 2020: The ultimate guide to bookkeeping and QuickBooks Online
byCrystalynn Shelton
Rating: 0 out of 5 stars
0 ratings
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
Ebook
ChatGPT Ultimate User Guide - How to Make Money Online Faster and More Precise Using AI Technology
byMaximus Wilson
Rating: 0 out of 5 stars
0 ratings
SharePoint 2016 For Dummies
Ebook
SharePoint 2016 For Dummies
byRosemarie Withee
Rating: 5 out of 5 stars
5/5
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
Ebook
CompTIA Certification: The Ultimate Guide To Discover CompTIA. Certified Quickly And Easily Passing The Certification Exam. Real Practice Test With Detailed Screenshots, Answers And Explanations
byDavid Mayer
Rating: 0 out of 5 stars
0 ratings
Scrivener For Dummies
Ebook
Scrivener For Dummies
byGwen Hernandez
Rating: 4 out of 5 stars
4/5
Create Income through Self-Publishing: An Author's Approach on Generating Wealth by Self-Publishing
Ebook
Create Income through Self-Publishing: An Author's Approach on Generating Wealth by Self-Publishing
bySimon Marlow
Rating: 5 out of 5 stars
5/5
QuickBooks 2021 For Dummies
Ebook
QuickBooks 2021 For Dummies
byStephen L. Nelson
Rating: 0 out of 5 stars
0 ratings
Excel Formulas and Functions 2020: Excel Academy, #1
Ebook
Excel Formulas and Functions 2020: Excel Academy, #1
byAdam Ramirez
Rating: 4 out of 5 stars
4/5
Notion for Beginners: Notion for Work, Play, and Productivity
Ebook
Notion for Beginners: Notion for Work, Play, and Productivity
byJill Hamilton
Rating: 4 out of 5 stars
4/5
101 Ready-to-Use Excel Formulas
Ebook
101 Ready-to-Use Excel Formulas
byMichael Alexander
Rating: 4 out of 5 stars
4/5
Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
Ebook
Systems Thinking: Managing Chaos and Complexity: A Platform for Designing Business Architecture
byJamshid Gharajedaghi
Rating: 4 out of 5 stars
4/5
Access 2019 For Dummies
Ebook
Access 2019 For Dummies
byLaurie A. Ulrich
Rating: 0 out of 5 stars
0 ratings
Learn Windows PowerShell in a Month of Lunches
Ebook
Learn Windows PowerShell in a Month of Lunches
byDon Jones
Rating: 0 out of 5 stars
0 ratings
Essential Office 365 Third Edition: The Illustrated Guide to Using Microsoft Office
Ebook
Essential Office 365 Third Edition: The Illustrated Guide to Using Microsoft Office
byKevin Wilson
Rating: 3 out of 5 stars
3/5
QuickBooks 2023 All-in-One For Dummies
Ebook
QuickBooks 2023 All-in-One For Dummies
byStephen L. Nelson
Rating: 0 out of 5 stars
0 ratings
Microsoft Office 365 Bible: 10:1 Mastery | Excel in Your Profession, Enhance Time Management, and Foster Exceptional Collaboration [III EDITION]: Career Elevator
Ebook
Microsoft Office 365 Bible: 10:1 Mastery | Excel in Your Profession, Enhance Time Management, and Foster Exceptional Collaboration [III EDITION]: Career Elevator
byKevin Pitch
Rating: 5 out of 5 stars
5/5
Mastering ChatGPT: Create Highly Effective Prompts, Strategies, and Best Practices to Go From Novice to Expert
Ebook
Mastering ChatGPT: Create Highly Effective Prompts, Strategies, and Best Practices to Go From Novice to Expert
byTJ Books
Rating: 3 out of 5 stars
3/5
Enterprise AI For Dummies
Ebook
Enterprise AI For Dummies
byZachary Jarvinen
Rating: 3 out of 5 stars
3/5
Excel Tips and Tricks
Ebook
Excel Tips and Tricks
byM.L. Humphrey
Rating: 0 out of 5 stars
0 ratings
ChatGPT Millionaire 2024 - Bot-Driven Side Hustles, Prompt Engineering Shortcut Secrets, and Automated Income Streams that Print Money While You Sleep. The Ultimate Beginner’s Guide for AI Business
Ebook
ChatGPT Millionaire 2024 - Bot-Driven Side Hustles, Prompt Engineering Shortcut Secrets, and Automated Income Streams that Print Money While You Sleep. The Ultimate Beginner’s Guide for AI Business
byAlec Rowe
Rating: 0 out of 5 stars
0 ratings
Excel 2023 for Beginners: A Complete Quick Reference Guide from Beginner to Advanced with Simple Tips and Tricks to Master All Essential Fundamentals, Formulas, Functions, Charts, Tools, & Shortcuts
Ebook
Excel 2023 for Beginners: A Complete Quick Reference Guide from Beginner to Advanced with Simple Tips and Tricks to Master All Essential Fundamentals, Formulas, Functions, Charts, Tools, & Shortcuts
byTerry R. Hoffmann
Rating: 0 out of 5 stars
0 ratings
Excel for Beginners 2023: A Step-by-Step and Quick Reference Guide to Master the Fundamentals, Formulas, Functions, & Charts in Excel with Practical Examples | A Complete Excel Shortcuts Cheat Sheet
Ebook
Excel for Beginners 2023: A Step-by-Step and Quick Reference Guide to Master the Fundamentals, Formulas, Functions, & Charts in Excel with Practical Examples | A Complete Excel Shortcuts Cheat Sheet
byJames H. Moyle
Rating: 0 out of 5 stars
0 ratings
ChatGPT Side Hustles 2024 - Unlock the Digital Goldmine and Get AI Working for You Fast with More Than 85 Side Hustle Ideas to Boost Passive Income, Create New Cash Flow, and Get Ahead of the Curve
Ebook
ChatGPT Side Hustles 2024 - Unlock the Digital Goldmine and Get AI Working for You Fast with More Than 85 Side Hustle Ideas to Boost Passive Income, Create New Cash Flow, and Get Ahead of the Curve
byAlec Rowe
Rating: 0 out of 5 stars
0 ratings
Excel 2016 For Dummies
Ebook
Excel 2016 For Dummies
byGreg Harvey
Rating: 4 out of 5 stars
4/5
PowerShell for SQL Server Essentials
Ebook
PowerShell for SQL Server Essentials
bySantos Donabel
Rating: 0 out of 5 stars
0 ratings

Related podcast episodes

Skip carousel

040: Graph Databases: Traditional relational databases like MySQL or Postgres are really good at providing many solutions to the problem of persisting state. But these types of database are really horrible at querying highly connected models in an efficient way. Graph datab...
Podcast episode
040: Graph Databases: Traditional relational databases like MySQL or Postgres are really good at providing many solutions to the problem of persisting state. But these types of database are really horrible at querying highly connected models in an efficient way. Graph datab...
byPHPRoundtable Podcast
0 ratings
0% found this document useful
Getting Technical about the Data Center Revolution with Jonathan Friedmann, CEO of Speedata
Podcast episode
Getting Technical about the Data Center Revolution with Jonathan Friedmann, CEO of Speedata
byMaking Data Simple
0 ratings
0% found this document useful
77: How to become a BI Data Journalist w/ Kimberly Herrington: Finding your dream job in the world of data and analytics might not be as hard you think! Our guest today, Kimberly Herrington stands as a testament to this idea and she joins us on AOF to talk about how you can go about identifying and capturing your...
Podcast episode
77: How to become a BI Data Journalist w/ Kimberly Herrington: Finding your dream job in the world of data and analytics might not be as hard you think! Our guest today, Kimberly Herrington stands as a testament to this idea and she joins us on AOF to talk about how you can go about identifying and capturing your...
byAnalytics on Fire
0 ratings
0% found this document useful
78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
Podcast episode
78: Mindset of a Rockstar Data Analyst w/ Trevor Tapscott: Our focus for this inspiring episode of AOF is mindset, especially if you want to be a standout data analyst! I have brought one of my first ever followers and day ones! Trevor Tapscott is a VP and Analytics Consultant at Wells Fargo and has been in...
byAnalytics on Fire
0 ratings
0% found this document useful
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
Podcast episode
[DataFramed Careers Series #2] What Makes a Great Data Science Portfolio
byDataFramed
0 ratings
0% found this document useful
Data Visualization with Manuel Lima: Gabi Ferrara and Jon Foust are back today and joined by fellow Googler Manuel Lima.
Podcast episode
Data Visualization with Manuel Lima: Gabi Ferrara and Jon Foust are back today and joined by fellow Googler Manuel Lima.
byGoogle Cloud Platform Podcast
0 ratings
0% found this document useful
62: Cracking the Data Code w/ Mike Bugembe: Mike Bugembe is a speaker, consultant, and Amazon best selling author of the book Cracking the Data Code. He joins today’s podcast to talk about the things that you can do that will help create successful analytics projects. After being the...
Podcast episode
62: Cracking the Data Code w/ Mike Bugembe: Mike Bugembe is a speaker, consultant, and Amazon best selling author of the book Cracking the Data Code. He joins today’s podcast to talk about the things that you can do that will help create successful analytics projects. After being the...
byAnalytics on Fire
0 ratings
0% found this document useful
126 | FlowingData with Nathan Yau
Podcast episode
126 | FlowingData with Nathan Yau
byData Stories
0 ratings
0% found this document useful
Delivering Data and Analytics Value: CEOs cite data and analytics as the top capability for enabling growth over the next two years. In this podcast, Gartner’s chief of research for data and analytics, Carlie Idoine, highlights the top issues facing chief data and analytics officers (CDAOs) and how to demonstrate value.
Podcast episode
Delivering Data and Analytics Value: CEOs cite data and analytics as the top capability for enabling growth over the next two years. In this podcast, Gartner’s chief of research for data and analytics, Carlie Idoine, highlights the top issues facing chief data and analytics officers (CDAOs) and how to demonstrate value.
byTechWave: A Gartner Podcast for IT Leaders
0 ratings
0% found this document useful
#122 How Organizations Can Bridge the Data Literacy Gap
Podcast episode
#122 How Organizations Can Bridge the Data Literacy Gap
byDataFramed
0 ratings
0% found this document useful
CFO lessons learned in planning and forecasting - with Dan Fletcher, CFO Planful
Podcast episode
CFO lessons learned in planning and forecasting - with Dan Fletcher, CFO Planful
byMetrics that Measure Up
0 ratings
0% found this document useful
Six Rules to Dominate the Decade of Data: Today’s digital economy is more competitive than ever, and making smart data decisions can be what sets the leaders apart from the rest of the field. With new technologies now democratizing data at an accelerated pace, how can companies ensure that their data strategy is helping them stay ahead of the curve? As our third season comes to a close, Cindi takes a look back at some of our most insightful conversations to lay out six essential rules that every Data Chief should follow to dominate the decade of data.
Podcast episode
Six Rules to Dominate the Decade of Data: Today’s digital economy is more competitive than ever, and making smart data decisions can be what sets the leaders apart from the rest of the field. With new technologies now democratizing data at an accelerated pace, how can companies ensure that their data strategy is helping them stay ahead of the curve? As our third season comes to a close, Cindi takes a look back at some of our most insightful conversations to lay out six essential rules that every Data Chief should follow to dominate the decade of data.
byThe Data Chief
0 ratings
0% found this document useful
23: Growing a data community to 200K Followers in 2 years w/ Mike Delgado of Experian: Have you ever thought about starting your own data community? Ever wondered what really goes into it? Data communities are hot and have become the number one source for learning and networking! Our guest today talks to us about exactly how he grew...
Podcast episode
23: Growing a data community to 200K Followers in 2 years w/ Mike Delgado of Experian: Have you ever thought about starting your own data community? Ever wondered what really goes into it? Data communities are hot and have become the number one source for learning and networking! Our guest today talks to us about exactly how he grew...
byAnalytics on Fire
0 ratings
0% found this document useful
DataFramed Careers Series Special Announcement!
Podcast episode
DataFramed Careers Series Special Announcement!
byDataFramed
0 ratings
0% found this document useful
[DataFramed Careers Series #3]: Accelerating Data Careers with Writing
Podcast episode
[DataFramed Careers Series #3]: Accelerating Data Careers with Writing
byDataFramed
0 ratings
0% found this document useful
State of Containers in the Public Cloud
Podcast episode
State of Containers in the Public Cloud
byThe Cloudcast
0 ratings
0% found this document useful
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
Podcast episode
Privacy-aware Data Pipelines with Skyflow’s Piper Keyes: A data analytics pipeline is important to modern businesses because it allows them to extract valuable insights from the large amounts of data they generate and collect on a daily basis. This leads to better decision making, improved efficiency, and ...
byPartially Redacted: Data Privacy, Security & Compliance
0 ratings
0% found this document useful
[Bite] Version Control for Data Scientists
Podcast episode
[Bite] Version Control for Data Scientists
byDataCafé
0 ratings
0% found this document useful
TestContainers to Reduce Developer Frustration
Podcast episode
TestContainers to Reduce Developer Frustration
byThe Cloudcast
0 ratings
0% found this document useful
Humans in the Loop - Lina Weichbrodt
Podcast episode
Humans in the Loop - Lina Weichbrodt
byDataTalks.Club
0 ratings
0% found this document useful
Episode 15: Nagios was the Original Call of Duty: Let’s chat about the Cloud and everything in between. The people in this world are pretty comfortable with not running physical servers on their own, but trusting someone else to run them. Yet, people suffer from the psychological barrier of thinking they
Podcast episode
Episode 15: Nagios was the Original Call of Duty: Let’s chat about the Cloud and everything in between. The people in this world are pretty comfortable with not running physical servers on their own, but trusting someone else to run them. Yet, people suffer from the psychological barrier of thinking they
byScreaming in the Cloud
0 ratings
0% found this document useful
Building ML Apps
Podcast episode
Building ML Apps
byThe Cloudcast
0 ratings
0% found this document useful
Defining Success: Metrics and KPIs - Adam Sroka
Podcast episode
Defining Success: Metrics and KPIs - Adam Sroka
byDataTalks.Club
0 ratings
0% found this document useful
129: How to Test Anything - David Lord: I asked people on twitter to fill in "How do I test _____?" to find out what people want to know how to test. Lots of responses. David Lord agreed to answer them with me. In the process, we come up with lots of great general advice on how to test just about anything.
Podcast episode
129: How to Test Anything - David Lord: I asked people on twitter to fill in "How do I test _____?" to find out what people want to know how to test. Lots of responses. David Lord agreed to answer them with me. In the process, we come up with lots of great general advice on how to test just about anything.
byTest and Code
0 ratings
0% found this document useful
Database Monitoring & Observability
Podcast episode
Database Monitoring & Observability
byThe Cloudcast
0 ratings
0% found this document useful
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning" // Part 2
Podcast episode
MLOps Coffee Sessions #10 Analyzing the Article “Continuous Delivery and Automation Pipelines in Machine Learning" // Part 2
byMLOps.community
0 ratings
0% found this document useful
Build Your Own Data Pipeline - Andreas Kretz
Podcast episode
Build Your Own Data Pipeline - Andreas Kretz
byDataTalks.Club
0 ratings
0% found this document useful
Developer Security
Podcast episode
Developer Security
byThe Cloudcast
0 ratings
0% found this document useful
From QAos to Chaos Engineering
Podcast episode
From QAos to Chaos Engineering
byThe Cloudcast
0 ratings
0% found this document useful
Automated Data Labeling for AI Apps
Podcast episode
Automated Data Labeling for AI Apps
byThe Cloudcast
0 ratings
0% found this document useful

Skip carousel

Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Chicago Tribune
Article
Want A Job In Data Science? You Might Have To Take A Standardized Test When Applying
Jul 10, 2018
3 min read
01 Giving Data Collectors—and Donors—a Real-Time Rush
Fast Company
Article
01 Giving Data Collectors—and Donors—a Real-Time Rush
Mar 20, 2017
7 min read
Understanding ELT & ETL
Techfastly
Article
Understanding ELT & ETL
Apr 1, 2021
8 min read
How to Make Predictive Analytics Work for Your Business
Entrepreneur
Article
How to Make Predictive Analytics Work for Your Business
Jul 1, 2014
1 min read
The Future Of The Database
Linux Format
Article
The Future Of The Database
Aug 27, 2019
7 min read
THREE TRENDS DRIVING THE geospatial AI REVOLUTION
The European Business Review
Article
THREE TRENDS DRIVING THE geospatial AI REVOLUTION
Oct 3, 2019
5 min read
Data-driven Decision Making That Uses Data, Mind And Heart
The European Business Review
Article
Data-driven Decision Making That Uses Data, Mind And Heart
Jan 31, 2020
14 min read
Data Analytics: From Bias to Better Decisions
Rotman Management
Article
Data Analytics: From Bias to Better Decisions
Sep 1, 2018
7 min read
Why Are Leaders Trusting Their Gut Instinct Over Analytics? AND WHAT TO DO ABOUT IT
NZBusiness and Management
Article
Why Are Leaders Trusting Their Gut Instinct Over Analytics? AND WHAT TO DO ABOUT IT
Mar 26, 2019
3 min read
Observability Of The Kernel And Containers
Linux Format
Article
Observability Of The Kernel And Containers
Apr 4, 2023
Mihalis Tsoukalos is currently working on Time Series. You can reach him at: @mactsouk. For our final delve into eBPF, we’re tackling applications, the kernel and Docker containers. At the end of the day, all Linux machines execute code for applicat
10 min read
Powering Costing With Artificial Intelligence: The Case Of Vodafone Procurement
The European Business Review
Article
Powering Costing With Artificial Intelligence: The Case Of Vodafone Procurement
May 25, 2021
8 min read
Build A Static Analysis Development Pipeline
Linux Format
Article
Build A Static Analysis Development Pipeline
Jul 27, 2021
9 min read
Discover Easy-to -build Desktop Apps
Linux Format
Article
Discover Easy-to -build Desktop Apps
Oct 22, 2019
Electron is actually a browser packaged with node.js and a few APIs. Because it’s built on top of the Chromium browser, you have everything available from there to add to your application. GitHub developed it as part of the Atom editor; it was open-s
7 min read
DJANGO Create A Database-driven Website
Linux Format
Article
DJANGO Create A Database-driven Website
Jun 4, 2019
The Django web framework was named after the famous guitarist Django Reinhardt and was first created by web developers at a small newspaper in Kansas. The main goals of Django is to enable fast development of complex websites with database needs. It
7 min read
Manipulate Data Like A Pro With Pandas
Linux Format
Article
Manipulate Data Like A Pro With Pandas
Jul 27, 2021
7 min read
Monitor Systems And Docker Deployments
Linux Format
Article
Monitor Systems And Docker Deployments
Jun 30, 2020
Welcome to Netdata, software for distributed real-time performance and health monitoring of UNIX machines. Don’t you dare turn that page! A key advantage of Netdata is that it collects all of its metrics without introducing too much load on to the Li
8 min read
Network-monitoring software 2024
PC Pro Magazine
Article
Network-monitoring software 2024
Feb 8, 2024
4 min read
Prototype Paves Way For ‘Computer-on-a-chip’
Futurity
Article
Prototype Paves Way For ‘Computer-on-a-chip’
Feb 22, 2019
2 min read
AI See You…
Linux Format
Article
AI See You…
Jun 27, 2023
5 min read
Ask
MacLife
Article
Ask
Dec 7, 2021
6 min read
Geekbench 5 Is Released With All-new Tests
Macworld UK
Article
Geekbench 5 Is Released With All-new Tests
Sep 20, 2019
2 min read
Geekbench 5 Is Released With All-new Tests
iPad & iPhone User
Article
Geekbench 5 Is Released With All-new Tests
Sep 20, 2019
2 min read
Cloud & Hybrid Backup 2022
PC Pro Magazine
Article
Cloud & Hybrid Backup 2022
Aug 7, 2022
4 min read
McAfee Total Protection
Macworld UK
Article
McAfee Total Protection
Aug 21, 2020
4 min read
Hybrid Backup For Business
PC Pro Magazine
Article
Hybrid Backup For Business
Apr 8, 2021
4 min read
Choose The Tech Kit And Services That Are Best For The Planet
PC Pro Magazine
Article
Choose The Tech Kit And Services That Are Best For The Planet
Apr 6, 2023
6 min read
Geekbench 5 Is Released With All-new Tests, Modes, And Scores
MacWorld
Article
Geekbench 5 Is Released With All-new Tests, Modes, And Scores
Oct 15, 2019
2 min read
How To Simulate Electronic Circuits
Linux Format
Article
How To Simulate Electronic Circuits
Jun 2, 2020
9 min read
Edge Computing In Europe: A Key Driver Of Business Innovation
The European Business Review
Article
Edge Computing In Europe: A Key Driver Of Business Innovation
Jan 26, 2024
1 83% of our survey respondents believe that edge computing will be essential to remaining competitive in the future but only 65% are using edge today. 2 Super Integrators — edge adopters that tie edge to business in transformation adoption — compris
8 min read
Experts Solve Your Computing Problems
APC
Article
Experts Solve Your Computing Problems
Nov 4, 2019
4 min read

Related categories

Skip carousel

Reviews for Guerrilla Analytics

Rating: 4.5 out of 5 stars

4.5/5

1 rating0 reviews

Book preview

Guerrilla Analytics - Enda Ridge

Guerrilla Analytics

A Practical Approach to Working with Data

Enda Ridge

Cover

Title page

Copyright Page

Preface

Part 1: Principles

Chapter 1: Introducing Guerrilla Analytics

Summary

1.1. What is data analytics?

1.2. Types of data analytics projects

1.3. Introducing Guerrilla Analytics projects

1.4. Guerrilla Analytics definition

1.5. Example Guerrilla Analytics projects

1.6. Some terminology

1.7. Wrap up

Chapter 2: Guerrilla Analytics: Challenges and Risks

Summary

2.1. The Guerrilla Analytics workflow

2.2. Challenges of managing analytics projects

2.3. Risks

2.4. Impact of failure to address analytics risks

2.5. Wrap up

Chapter 3: Guerrilla Analytics Principles

Summary

3.1. Maintain data provenance despite disruptions

3.2. The principles

3.3. Applying the principles

3.4. Wrap up

Part 2: Practice

Chapter 4: Stage 1: Data Extraction

Summary

4.1. Guerrilla Analytics workflow

4.2. Pitfalls and risks

4.3. Practice tip 1: freeze the source system during data extraction

4.4. Practice tip 2: extract data into an agreed file format

4.5. Practice tip 3: calculate checksums before data extraction

4.6. Practice tip 4: capture front-end reports

4.7. Practice tip 5: save raw copies of web pages

4.8. Practice tip 6: consistency check OCR data

4.9. Wrap up

Chapter 5: Stage 2: Data Receipt

Summary

5.1. Guerrilla Analytics workflow

5.2. Pitfalls and risks

5.3. Practice tip 7: have a single location for all data received

5.4. Practice tip 8: create unique identifiers for received data

5.5. Practice tip 9: store data tracking information in a data log

5.6. Practice tip 10: never modify raw data files

5.7. Practice tip 11: keep supporting material near the data

5.8. Practice tip 12: version-control data received

5.9. Bringing it all together

5.10. Wrap up

Chapter 6: Stage 3: Data Load

Summary

6.1. Guerrilla Analytics Workflow

6.2. Pitfalls and risks

6.3. Practice tip 13: minimize modifications to data before load

6.4. Practice tip 14: do data load preparations on a copy of raw data files

6.5. Practice tip 15: add identifiers to raw data before loading

6.6. Practice tip 16: prefer one-to-one Data Loads

6.7. Practice tip 17: preserve the raw file name and data UID

6.8. Practice tip 18: load data as plain text

6.9. Common challenges

6.10. Wrap up

Chapter 7: Stage 4: Analytics Coding for Ease of Review

Summary

7.1. Guerrilla Analytics workflow

7.2. Pitfalls and risks

7.3. Practice tip 19: use one code file per data output

7.4. Practice tip 20: produce clearly identifiable data outputs

7.5. Practice tip 21: write code that runs from start to finish

7.6. Practice tip 22: favor code that is not embedded in proprietary file formats

7.7. Practice tip 23: clearly label the running order of code files

7.8. Practice tip 24: drop all datasets at the start of code execution

7.9. Practice tip 25: break up data flows into data steps

7.10. Practice tip 26: don’t jump in and out of a code file

7.11. Practice tip 27: log code execution

7.12. Common Challenges

7.13. Wrap up

Chapter 8: Stage 4: Analytics Coding to Maintain Data Provenance

Summary

8.1. Guerrilla Analytics workflow

8.2. Examples

8.3. Pitfalls and risks

8.4. Practice tip 28: clean data at a minimum of locations in a data flow

8.5. Practice tip 29: when cleaning a data field, keep the original raw field

8.6. Practice tip 30: filter data with flags, not deletions

8.7. Practice tip 31: identify fields with metadata

8.8. Practice tip 32: create a unique identifier for DATA records

8.9. Practice tip 33: rename data fields with a field mapping

8.10. Wrap up

Chapter 9: Stage 6: Creating Work Products

Summary

9.1. Guerrilla Analytics workflow

9.2. Examples

9.3. The essence of a work product

9.4. Pitfalls and risks

9.5. Practice tip 34: track work products with a Unique Identifier (UID)

9.6. Practice tip 35: keep work product generators and outputs close together

9.7. Practice tip 36: avoid clutter in the file system

9.8. Practice tip 37: avoid clutter in the DME

9.9. Practice tip 38: give output data records a UID

9.10. Practice tip 39: version control work products

9.11. Practice tip 40: use a convention to name complex outputs

9.12. Practice tip 41: log all Work Products

9.13. Wrap up

Chapter 10: Stage 7: Reporting

Summary

10.1. Guerrilla Analytics workflow

10.2. What is a report?

10.3. Why reports are complicated

10.4. Report components

10.5. Pitfalls and risks

10.6. Practice tip 42: liaise with report writers

10.7. Practice tip 43: create one work product per report component

10.8. Practice tip 44: make presentation quality work products

10.9. Extreme reporting

10.10. Wrap up

Chapter 11: Stage 5: Consolidating Knowledge in Builds

Summary

11.1. Introduction

11.2. Pitfalls and risks

11.3. Example: the customer address problem

11.4. Sources of variation

11.5. Definition of a build

11.6. The customer address example using a Build

11.7. Data Builds

11.8. Service Builds

11.9. When to start a build

11.10. Wrap up

Part 3: Testing

Chapter 12: Introduction to Testing

Summary

12.1. Guerrilla Analytics workflow

12.2. What is testing?

12.3. Why do testing?

12.4. Areas of testing

12.5. Comparing expected and actual

12.6. The challenge of testing Guerrilla Analytics

12.7. Practice Tip 61: establish a testing culture

12.8. Practice Tip 62: test early

12.9. Practice Tip 63: test often

12.10. Practice Tip 64: give tests unique identifiers

12.11. Practice Tip 65: organize test data by test UID

12.12. Next chapters on testing

12.13. Wrap up

Chapter 13: Testing Data

Summary

13.1. Guerrilla Analytics workflow

13.2. The five C’s of testing data

13.3. Testing data completeness

13.4. Testing data correctness

13.5. Testing consistency

13.6. Testing data coherence

13.7. Testing accountability

13.8. Implementing data testing

13.9. Wrap up

Chapter 14: Testing Builds

Summary

14.1. Structure of a data build

14.2. An illustrative example

14.3. Types of build tests

14.4. Test code development

14.5. Organizing build test code

14.6. Organizing test data

14.7. Wrap up

Chapter 15: Testing Work Products

Summary

15.1. Types of testable work products

15.2. Ordinary work products

15.3. General tips on testing ordinary work products

15.4. Testing statistical models

15.5. General tips on testing models

15.6. Wrap up

Part 4: Building Guerrilla Analytics Capability

Introduction

Chapter 16: People

Summary

16.1. That question again – what is data analytics?

16.2. Guerrilla Analytics skills

16.3. Programming

16.4. Substantive expertise

16.5. Communication

16.6. Maths and stats

16.7. Visualization

16.8. Software engineering

16.9. Mindset

16.10. Wrap up

Chapter 17: Process

Summary

17.1. What is workflow management?

17.2. Workflows in Analytics

17.3. Levels of review

17.4. Linking work products

17.5. Classifying work products

17.6. Granularity

17.7. When to use workflow management

17.8. Wrap up

Chapter 18: Technology

Summary

18.1. Analytics capabilities

18.2. Data manipulation environment

18.3. Source code control

18.4. Access to the command line

18.5. High-level scripting language

18.6. Visualization

18.7. Build tool

18.8. Access to the internet

18.9. Encryption

18.10. Code libraries for data wrangling

18.11. Machine learning and statistics libraries

18.12. Centralized and controlled file system

18.13. Additional technology capabilities

18.14. Wrap up

Chapter 19: Closing Remarks

19.1. What was this book about?

19.2. Next steps for Guerrilla Analytics

19.3. Keep in touch

Acknowledgments

Appendix: Data Gymnastics

References

Index

Copyright Page

Acquiring Editor: Steve Elliot

Editorial Project Manager: Kaitlin Herbert

Project Manager: Priya Kumaraguruparan

Designer: Mark Rogers

Morgan Kaufmann is an imprint of Elsevier

225 Wyman Street, Waltham, MA 02451, USA

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notice

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

British Library Cataloguing-in-Publication Data

A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data

Application submitted

ISBN: 978-0-12-800218-6

For information on all MK publications visit our website at http://www.mkp.com

Preface

Why this book?

Data analytics involves taking some data and exploring and testing it to produce insights. You can put a variety of names on this process from Business Intelligence to Data Science but fundamentally the approach does not change. Understand a problem, identify the right data, prepare the data appropriately, and run the appropriate analysis on it to find insights and report on them. This is difficult. You are probably seeing this data for the first time. Worse still, the data usually has issues you will only uncover during your journey. Meanwhile, the problem domain must be understood so the data that represents it can be understood. But what is discovered in the data often helps define the problem domain itself.

Faced with this open-ended challenge, many analysts become lost in the data. They explore multiple lines of enquiry. One line of enquiry can invalidate or confirm a previous line. The structure and exceptions in the data are discovered during the process and must be accounted for. Many of the analyses themselves can be executed in a multitude of ways, none of which are categorically correct but instead must be interpreted and justified. Just when you thought you had a handle on the problem, new data arrives and everything you have already done is potentially invalidated. This makes planning, executing, and reproducing data analytics challenging.

If you have ever been in this situation then this book is for you.

What this book is and what it is not

First of all, let me cover what this book is not.

• This book is not a prescriptive guide to either specific technologies or analytics techniques. For that you will have to read widely in fields such as machine learning, statistics, database programming, scripting, web development, and data visualization. It is my belief that while technology continues to improve at pace, the fundamental principles of how to do data analytics change little.

• This is not a project management book. I certainly believe project management of analytics needs more attention. Analytics projects are complex and fast-paced and it seems that established project management techniques can struggle to cope with them. This book will help you in areas such as tracking of work but it does not take a project management focus in the presentation of any of its material.

• This book is not about Big Data. It is also not about little data or medium data. Debates about whether Big Data is something new or indeed something at all are left to others. As you will see, this book’s principles and its practice tips are applicable to all types of data analysis regardless of the scale.

• This book is not about how to build large data warehouses and web-based Business Intelligence platforms. These techniques are also well covered in the literature having been tackled in academia and the software development industry for several decades.

My goal in writing this book is to help people who have been in the same situation as me. I want them to benefit from my experiences and the lessons I have learned, very often the hard way. This book aims to help you in the following three ways.

• How to do: This book is a guiding reference for data analysts who must work in dynamic analytics projects. It will help them do high-quality work that is reproducible and testable despite the many disruptions in their project environment and the typically open-ended nature of analytics. It will guide them through each stage of a data analytics job with overarching principles and specific practice tips.

• How to manage: This book is a how-to for data analytics managers. It will help them put in place light weight workflows and team conventions that are easy to understand and implement. Teams managed with this book’s principles in mind will avoid many of the pain points of analytics. They will be well coordinated, their work will be easily reviewed and their knowledge will be easily shared. The team will become safely independent, freeing up the manager to communicate and sell the team’s work instead of being mired in trying to cover every detail of the team’s activities.

• How to build: Finally, this book is a guide for those with the strategic remit of building and growing an analytics team. Chapters describe the people, processes, and technology that need to be put in place to grow an agile and versatile analytics team.

Who should read this book?

Data analytics is a hugely diverse area. Nonetheless, the fundamentals of how to do data analytics, manage analytics teams, and build analytics capability do not change significantly. You will benefit from reading this book if you work in any of the following roles.

• Data Analyst or Data Scientist: You are somebody who works directly with data and needs guidance on best practice for doing that work in an agile, controlled, reproducible way. If you have ever experienced been lost in the data or losing track of your own analyses and data modifications then this book will help you. If you have ever been frustrated with repeated conversations with your colleagues about where data is stored, what it means, or how your colleague analyzed it then this book will help you both.

• Analytics Manager: You are somebody who has several direct reports and you are responsible for guiding and reviewing their analytics work. You have to jump into many different work products from different team members to review their correctness. You do not have time to waste on facing a different approach, coding convention, data location, or test structure every time you sit down to review a piece of work. Your project resources come and go and you want to facilitate fast transitions and handovers with minimal overhead. You need to be able to explain your team’s work with confidence to customers but do not have time to be down in all the details of that work.

• Senior Manager: You are somebody who is busy interfacing with a customer and perhaps architecting a high-level approach to a customer’s problems. You need to know that your team’s work products are reproducible, tested, and traceable. You need to sell the quality, versatility, and speed of mobilization of your team to your customers.

• Team Director/Chief Information Officer/Chief Data Officer: You are somebody who wants to build the best analytics team possible to solve customer problems and respond to a wide variety of analytics challenges. To support this ambition, you need a uniformity of skills and methods in your analytics teams for flexibility of resourcing and sharing of knowledge. You want your teams to produce to high standards without suffocating them with rules or requiring they use expensive niche tools. You want your teams to have the right training and toolsets at their fingertips so they can get on with the work they do best.

• Researcher: You gather and analyze experimental data for research and publication purposes. This could be algorithm design in computer science, instrumentation data in physics, or any field requiring gathering data to test hypotheses. With such an open-ended exploratory approach to your work, you may struggle to coordinate multiple parallel lines of inquiry, multiple versions of analyses, and experiment result data. This book helps you do all of that so you can focus on reproducible and repeatable publication of results.

• Research Director: You are somebody who runs a team of researchers working on multiple concurrent research projects. Your concern is that research is reproducible and sharable among your teams so that the body of knowledge of your team and lab grows over time. You do not have time to be down in the details but you want to know that your team’s work is of publication quality in an academic context or can be easily transferred into production in an industry context.

How this book is organized

I have designed the book so that each chapter is as self-contained as possible and chapters can be read in any order. The book is organized into four parts.

Part 1 Principles introduces Guerrilla Analytics and the Guerrilla Analytics Principles. Begin here if you need an introduction to why analytics is difficult, what can go wrong, and how to mitigate the risks of things going wrong.

• Introducing Guerrilla Analytics

• Guerrilla Analytics: Challenges and Risks

• Guerrilla Analytics Principles

Part 2 Practice covers how to apply the Guerrilla Analytics Principles across the entire analytics workflow. Read any of these chapters if you are working in a particular stage of the Data Analytics Workflow. For example, jump into Data Load if you have just received some data from a customer. Look through Creating Work Products if you are beginning a piece of work that you will deliver to your customer.

• Data Extraction

• Data Receipt

• Data Load

• Analytics Coding for Ease of Review

• Analytics Coding to Maintain Data Provenance

• Creating Work Products

• Reporting

• Consolidating Knowledge in Builds

Part 3 Testing discusses how to test analytics work to discover defects. Begin with the introduction, if testing is new to you.

• Introduction to Testing

• Testing Data

• Testing Builds

• Testing Work Products

Part 4 Building Guerrilla Analytics Capability is all about the people skills, technology, and processes you need to put in place to establish and grow a Guerrilla Analytics team. Pick up one of these chapters if you are setting up a Guerrilla Analytics environment or are looking to hire and train a team in this book’s techniques.

• People

• Process

• Technology

Throughout the book, many of the points will be illustrated with simple examples. War stories will describe instances of how things can go badly wrong without the Guerrilla Analytics Principles. The war stories cover a variety of domains to appeal to as many readers as possible.

Disclaimer

It is important to state that the examples and war stories from this book are fictional and based on a decade of experiences, conversations, projects, and study. While drawn from real-world experiences, they are not particular to any of my employers or clients, past or present, and should not be interpreted as such.

Part 1 Principles

Introducing Guerrilla Analytics

Guerrilla Analytics: Challenges and Risks

Guerrilla Analytics Principles

Chapter 1

Introducing Guerrilla Analytics

Summary

In this chapter, we begin by discussing the very broad field of data analytics and what it means to do data analytics. To help us frame the subject of this book and escape any marketing and hype terminology, we will define what data analytics means for us. We will then look at the various types of projects in which data analytics is performed. This will give us an understanding of the entire spectrum of data analytics projects. We describe the particular type of analytics project that are the subject of this book. Specifically, these projects are defined by being very dynamic and having many disruptions while also being subject to several constraints. These are Guerrilla Analytics projects.

Keywords

Guerrilla Analytics

Introduction

Terminology

Having read this chapter, you will understand

• what data analytics is in a very general sense

• the projects in which data analytics is applied

• the type of analytics that is Guerrilla Analytics

• examples of Guerrilla Analytics projects

1.1. What is data analytics?

The last decade has seen phenomenal growth in the creation of data and in the analysis of data to provide insight. Social media and search giants such as Facebook and Google probably spring to mind. These analytics innovators gather immense amounts of data to understand Internet search and social habits so that they can better target online advertising for their customers. Online digital media is generating hours of content and streaming it around the globe for major sporting events such as the FIFA World Cup and the Olympics. In the Financial Services industry, firms process and store billions of financial transactions every day and analyze those transactions to gain an edge in the market over their competitors. Ubiquitous Telco operators store data on our call patterns to analyze it for indicators of customer churn and up-selling opportunities. Every time you book a hotel, flight, or go to the supermarket, loyalty card data is analyzed to better understand customer-purchasing habits and to better target marketing opportunities.

And this growth in data and analytics is not restricted to businesses. Scientific research centers are also creating immense amounts of experimental data in fields such as particle physics, genetics, and pharmacology. Government departments too are not exempt from this trend.

The complexity and pace of change have created a market for data analytics teams in consulting services firms to help their clients both cope with and profit from new data-driven opportunities.

Unsurprisingly, given the growth in data generation, the last decade has also seen a proliferation of the skills and tools needed for extracting value from data. Names for this field include Data Analytics, Data Mining, Quantitative Analysis, Big Data, Machine Learning, Business Intelligence, Artificial Intelligence, and Data Science. Vendors are frantically racing to provide enterprise grade tools to support work in these fields and to distinguish their offerings from those of their competitors. Universities are trumpeting degree programs that will train a generation of graduates to be conversant in these new technologies and skills.

All of this marketing noise, vendor hype, and pace of change can be confusing and overwhelming for somebody who just wants to get started in data and answer questions to solve problems. Big Data, data velocity, unstructured data, NoSQL, key-value stores, predictive modeling, social network analysis – it is very hard to know where to begin.

Before we get into the details, it will be helpful to step back and think a little about what data analytics can mean and agree on what it means to us in this book. Wikipedia, for example, defines data analytics as:

... a process of inspecting, cleaning, transforming, and modelling data with the goal of discovering useful information, suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains. (Anon n.d.)

This definition acknowledges the wide range of activities encompassed by the term data analytics. Tom Davenport’s book Competing on Analytics offers the following definition.

By analytics we mean the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions. The analytics may be input for human decisions or may drive fully automated decisions. (Davenport, 2006)

Again this is a broad definition. Clearly there are many different opinions on what data analytics is and what it should be called. Let’s step back and define data analytics for the purposes of this book.

1.1.1. Data Analytics Definition

First and foremost, this book is a practitioner’s book. We, therefore, need a practical definition of data analytics, so we can agree on what is in scope for discussion and what should be left to academic debate.

Data analytics is any activity that involves applying an analytical process to data to derive insight from the data.

Figure 1 illustrates this definition. A customer and/or a third party provides raw data to an analytics team. Analysis is done on the data, producing some modified data output. This output is returned to the customer to provide the customer with insight.

Figure 1 Definition of data analytics

1.1.2. Examples of Data Analytics

Such a general definition of data analytics means we can recognize analytics in many scenarios. Here are just a small number of data analytics activities.

• A phone company’s customer complaints team keys in 500 poorly scanned customer complaint letters for their data team. The data team reports back on what the common complaint theme is in those letters. They have converted data that was difficult to access into usable data, which was then enriched with complaint keywords. They now have an insight into the common complaint themes from their customers.

• My dad gives me a spreadsheet of household purchases and I tell him how much he spends on groceries per month. I have taken data in the form of dates and purchases and summarized them by month to provide insight into spending patterns.

• Emma, the IT administrator, is concerned about user access controls. She gives Aaron, the data analyst, a year of system log activity. Aaron reports back how users can be grouped based on their activity and what the likely activity is at a particular time on a particular day of the week. Emma now has an insight into who is doing what on the systems she manages.

• Feargus is always looking for new indie bands. An online streaming music website trawls through its user data, mining song plays to make recommendations to Feargus on new artists that he might like.

• A utilities contractor receives its subcontractors’ expense claims in hundreds of spreadsheets every month. These spreadsheets are brought together in a database, cleaned, and used to report on subcontractor expenses and search for potentially fraudulent expense claims.

• A financial services firm called OlcBank, having mis-sold financial products to their customers, is tasked with reconstructing the history of its product sales for inspection by a third party and a government regulator.

• A manufacturing plant Widget Inc., suspicious of fraud in its material purchase approvals wants to search its financial and manufacturing data for evidence of fraud.

There are several points to note from these examples of data analytics activities.

• Technology agnostic: First, there is no mention of any specific technology involved in the data analytics process. The analyst may be dragging formulas in a spreadsheet. They may be pushing data through the latest parallel streaming data processor. They may be training a troop of analytical monkeys to manipulate the data as required. Our definition is independent of the technology used and should not be confounded with the latest technology trend.

• Activity agnostic: Second, there is no differentiation between different types of data analytics activities. Some work is descriptive analytics that creates a summary and profile of data. Other work is data mining that trawls through data looking for patterns. Some work is predictive analytics that builds a model of the data and uses it to make predictions about new data. Some work is combinations of these things. The details of what is done with the data do not matter as long as the data is used to produce insight at some level of sophistication.

• Scale agnostic: Third, there is no attempt to comment on the scale or type of the data being analyzed. The work can deal with 100 rows in a spreadsheet table, 10,000 text documents describing insurance claims or some social media data feed approaching scales currently called Big Data (Franks, 2012).

This book is aimed at people involved in taking a variety of types of data from a variety of sources, analyzing it with a variety of methods of varying sophistication and returning it to their customers with insight. This insight can be used to make recommendations and

Enjoying the preview?

Page 1 of 1

Guerrilla Analytics: A Practical Approach to Working with Data

About this ebook

Enda Ridge

Related authors

Related to Guerrilla Analytics

Related ebooks

Enterprise Applications For You

Related podcast episodes

Related articles

Related categories

Reviews for Guerrilla Analytics

What did you think?

Book preview

Guerrilla Analytics - Enda Ridge

Table of Contents

Cover

Title page

Copyright Page

Preface

Part 1: Principles

Chapter 1: Introducing Guerrilla Analytics

Chapter 2: Guerrilla Analytics: Challenges and Risks

Chapter 3: Guerrilla Analytics Principles

Part 2: Practice

Chapter 4: Stage 1: Data Extraction

Chapter 5: Stage 2: Data Receipt

Chapter 6: Stage 3: Data Load

Chapter 7: Stage 4: Analytics Coding for Ease of Review

Chapter 8: Stage 4: Analytics Coding to Maintain Data Provenance

Chapter 9: Stage 6: Creating Work Products

Chapter 10: Stage 7: Reporting

Chapter 11: Stage 5: Consolidating Knowledge in Builds

Part 3: Testing

Chapter 12: Introduction to Testing

Chapter 13: Testing Data

Chapter 14: Testing Builds

Chapter 15: Testing Work Products

Part 4: Building Guerrilla Analytics Capability

Introduction

Chapter 16: People

Chapter 17: Process

Chapter 18: Technology

Chapter 19: Closing Remarks

Appendix: Data Gymnastics

References

Index

Copyright Page

Notice

Preface

Why this book?

What this book is and what it is not

Who should read this book?

How this book is organized

Disclaimer

Part 1

Principles

Summary

Keywords

1.1. What is data analytics?

1.1.1. Data Analytics Definition

1.1.2. Examples of Data Analytics