Table of Contents
Getting Started
What is a data warehouse and why do you need one?
Evaluation Criteria
Data types
Scale
Maintenance
Performance
Cost
Community
Selecting the Right Cloud Data Warehouse for Analytics Table of Contents
Getting Started
With clean, accurate, and complete data, you’ll be prepared to answer business-critical questions: How much revenue is at stake from unanswered support tickets? What percent of customers renew after using a certain set of product features? How much revenue can we expect next quarter from a specific category or product?
Armed with the right insights, you will be able to more effectively drive product design and development, evaluate marketing campaign effectiveness, and spot potential issues in your user experience.
Evaluation Criteria for Your Data Warehouse

Once you’ve decided that a data warehouse is necessary for your team’s needs, there are a number of important factors to consider when making a selection:

Data types: What type of data you want your warehouse to store
Scale: The amount of data you plan to store
Maintenance: How much engineering effort you’re willing and able to dedicate to your warehouse
Performance: How quickly you need your data when you query it
Cost: How much you are willing to spend on your data warehouse
Community: How connected your warehouse is to other critical tools and services

It’s important to consider your use case when selecting a data warehouse. Your specific use case will determine the importance of each of these factors. You should also keep in mind that many of these factors directly influence one another, so tradeoffs may be necessary. For example, opting for less scale may decrease performance but will typically be more cost-effective. Throughout this selection guide, we’ll highlight use cases that will help you optimize for each factor.

Note: Segment supports the following cloud-based data warehouses: Amazon Redshift, Google BigQuery, IBM Db2 Warehouse, Postgres, and Snowflake.
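One way to make these tradeoffs explicit is a weighted scoring matrix: weight each of the six criteria according to your use case, score each candidate warehouse, and compare the weighted averages. The sketch below is illustrative only; the candidate names (`warehouse_a`, `warehouse_b`), the 1 to 5 scores, and the weights are all hypothetical placeholders, not assessments of any real product.

```python
from dataclasses import dataclass

# The six evaluation criteria from this guide.
CRITERIA = ("data_types", "scale", "maintenance", "performance", "cost", "community")


@dataclass
class Candidate:
    name: str
    # Hypothetical 1-5 score per criterion; higher is better
    # (so a high "cost" score means cheaper for your workload).
    scores: dict[str, int]


def weighted_score(candidate: Candidate, weights: dict[str, float]) -> float:
    """Weighted average of a candidate's scores; the weights encode your use case."""
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(weights[c] * candidate.scores[c] for c in CRITERIA) / total_weight


def rank(candidates: list[Candidate], weights: dict[str, float]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from best to worst."""
    scored = [(c.name, weighted_score(c, weights)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical small team that prioritizes low maintenance and low cost.
    weights = {"data_types": 1, "scale": 1, "maintenance": 3,
               "performance": 2, "cost": 3, "community": 1}
    candidates = [
        Candidate("warehouse_a", {"data_types": 4, "scale": 5, "maintenance": 2,
                                  "performance": 4, "cost": 3, "community": 4}),
        Candidate("warehouse_b", {"data_types": 4, "scale": 4, "maintenance": 5,
                                  "performance": 4, "cost": 4, "community": 3}),
    ]
    for name, score in rank(candidates, weights):
        print(f"{name}: {score:.2f}")  # warehouse_b: 4.18, then warehouse_a: 3.27
```

Changing the weights is how the same candidates can rank differently for different teams: a team with dedicated data engineers might lower the maintenance weight and raise scale or performance instead.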
Data types

A relational database works well with structured data, or data that fits neatly into rows and columns. If your data could be organized into one extra-large spreadsheet, then a relational data warehouse would be a good fit for your company.

A non-relational database excels with extremely large amounts of semi-structured data. Classic examples of semi-structured data are emails, books, social media posts, audio/visual data, and geographical data. You should consider a data lake over a classic data warehouse if you are working with purely unstructured data.

Relational database
Works well with data like: user data, inventory
For analysis like: user paths, funnel analysis
Can query with: SQL

Non-relational database
Works well with data like: email content, photos, videos
For analysis like: text mining, language processing
Can query with: MapReduce, Python

IDEAL USE CASES
Analyzing language, text, or images: If you’re doing large amounts of text mining, language processing, or image processing, you’ll need to consider a non-relational database. Depending on the type of data you are collecting, a data warehouse that supports semi-structured data, like Snowflake, would also work.
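To show what "funnel analysis, queried with SQL" looks like on structured data, here is a minimal sketch that assumes a hypothetical `events` table with one row per user action (`user_id`, `event`, `ts`). It uses Python’s built-in `sqlite3` module as a stand-in for a warehouse; a real warehouse would use its own SQL dialect and client library, and the event names are illustrative.

```python
import sqlite3


def build_sample_db() -> sqlite3.Connection:
    """In-memory stand-in for a warehouse table of hypothetical user events."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, ts INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            ("u1", "Viewed Product", 1), ("u1", "Added to Cart", 2), ("u1", "Order Completed", 3),
            ("u2", "Viewed Product", 1), ("u2", "Added to Cart", 5),
            ("u3", "Viewed Product", 2),
        ],
    )
    return conn


# Ordered funnel: a step only counts if it happened after the previous step.
FUNNEL_SQL = """
SELECT
  COUNT(DISTINCT v.user_id) AS viewed,
  COUNT(DISTINCT a.user_id) AS added_to_cart,
  COUNT(DISTINCT o.user_id) AS completed
FROM events AS v
LEFT JOIN events AS a
  ON a.user_id = v.user_id AND a.event = 'Added to Cart' AND a.ts > v.ts
LEFT JOIN events AS o
  ON o.user_id = a.user_id AND o.event = 'Order Completed' AND o.ts > a.ts
WHERE v.event = 'Viewed Product'
"""


def funnel_counts(conn: sqlite3.Connection) -> tuple[int, int, int]:
    """Distinct users reaching each funnel step, in order."""
    return conn.execute(FUNNEL_SQL).fetchone()


if __name__ == "__main__":
    print(funnel_counts(build_sample_db()))  # (3, 2, 1)
```

Because every event has the same columns, the whole analysis is one declarative query; `COUNT(DISTINCT ...)` ignores the NULLs produced by the `LEFT JOIN`s for users who dropped out.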
Scale

Snowflake, BigQuery, and Db2 Warehouse are able to store massive amounts of data without much overhead cost. Most companies won’t need scale beyond what those warehouses can deliver, especially if analytics is the primary use case. However, in cases where extreme scale is needed (greater than 2 terabytes of data), a non-relational warehouse will typically be a better fit because it won’t impose constraints on incoming data, allowing you to write faster.

There aren’t strict limits on how much data each warehouse can handle. However, we’ve found that each one excels within certain bands.

Database Options by Scale
Data size: < 1 TB | 2 - 64 TB | 64 TB - 2 PB+

You’ll also want to consider how a particular warehouse scales during times of demand. For example, Redshift can support massive amounts of data but requires you to manually add more nodes (for added storage and compute power). Snowflake, on the other hand, offers an auto-scale function that spins clusters up and down dynamically, as needed. BigQuery manages resources automatically, invisibly to the user, to meet additional needs, and it also offers a free batch ingest method that doesn’t compete with query capacity. Each offers impressive scale but works slightly differently.

IDEAL USE CASES
Collecting data from all sources: You should consider a warehouse specifically built for large scale if you have over a terabyte of data and multiple users querying your data simultaneously. Larger scale means that you won’t have any issues storing your data or keeping your queries fast.
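To illustrate the difference between manually adding nodes and auto-scaling, here is a toy auto-scaling policy that sizes the number of running clusters to the current query load, clamped between a minimum and a maximum. The knobs (`min_clusters`, `max_clusters`, `queries_per_cluster`) are hypothetical; this is a sketch of the general idea, not how Snowflake, Db2 Warehouse, or any other vendor actually makes scaling decisions.

```python
import math
from dataclasses import dataclass


@dataclass
class AutoScalePolicy:
    """Toy policy: run just enough clusters for the current concurrent queries.

    All three fields are hypothetical illustration knobs, not vendor settings.
    """
    min_clusters: int = 1          # never scale below this (keeps one cluster warm)
    max_clusters: int = 4          # cost ceiling: never scale above this
    queries_per_cluster: int = 8   # assumed concurrent queries one cluster handles well

    def target_clusters(self, concurrent_queries: int) -> int:
        # Clusters needed to serve the load, rounded up, then clamped to [min, max].
        needed = math.ceil(concurrent_queries / self.queries_per_cluster)
        return max(self.min_clusters, min(self.max_clusters, needed))


if __name__ == "__main__":
    policy = AutoScalePolicy()
    # Hypothetical load over a day: quiet, busy, a spike, then quiet again.
    for load in [0, 5, 20, 50, 3]:
        print(f"{load:>3} queries -> {policy.target_clusters(load)} cluster(s)")
```

With manual scaling, capacity stays at whatever was provisioned until someone resizes it; a policy like this follows demand up to its ceiling and back down, which is why auto-scaling tends to trade some cost predictability for less hands-on tuning.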
Maintenance

Working with structured data comes with another advantage: you can use SQL to query it. SQL is well known among analysts and engineers, and it’s easier to learn than most programming languages. Running analytics on semi-structured data generally requires an object-oriented programming background or a code-heavy data science background. Even with the emergence of analytics tools like Splunk for Hadoop or Slamdata for Mongo, analyzing these types of warehouses requires special skills and far more data maintenance.

IDEAL USE CASES
No maintenance required: These warehouses will be a good fit for a smaller team or a team that doesn’t want to dedicate time to tuning. What’s sacrificed in customization is gained in ease of use and consistent performance.
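Part of the extra maintenance with semi-structured data is that records don’t share a fixed set of columns, so someone has to write code that flattens nested fields and reconciles missing keys before the data can be analyzed like a table. A minimal sketch using Python’s standard `json` module, assuming hypothetical JSON-lines event payloads:

```python
import json
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested objects into dotted column names: {"a": {"b": 1}} -> {"a.b": 1}.

    Lists are left as-is; real pipelines also need a policy for arrays.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def to_rows(raw_lines: list[str]) -> tuple[list[str], list[list[Any]]]:
    """Parse JSON lines into a shared column list and rows, filling missing keys with None."""
    records = [flatten(json.loads(line)) for line in raw_lines]
    columns = sorted({col for rec in records for col in rec})
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


if __name__ == "__main__":
    raw = [
        '{"user": {"id": "u1", "plan": "pro"}, "event": "Signed Up"}',
        '{"user": {"id": "u2"}, "event": "Viewed Page", "page": {"path": "/pricing"}}',
    ]
    columns, rows = to_rows(raw)
    print(columns)  # ['event', 'page.path', 'user.id', 'user.plan']
    for row in rows:
        print(row)
```

Every new field that appears in the source data changes the column set, which is the kind of ongoing upkeep that fully managed, schema-handling pipelines aim to take off a small team’s plate.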
Cost

Cost structure:
Amazon Redshift: Pay per hour based on nodes, or per bytes scanned
Google BigQuery: Pay per query, or flat rate
Snowflake: Pay for storage and compute time
IBM Db2 Warehouse: Pay for storage and compute time
Postgres: Free
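Because each cost structure meters something different (cluster uptime, data scanned, or storage plus compute time), the cheapest option depends on your workload. The sketch below compares the three paid models for one hypothetical monthly workload. Every rate is a placeholder you would replace with the vendor’s current price list, and real bills include factors this ignores, such as minimums, discounts, free tiers, and flat-rate plans.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical monthly workload; replace with your own measurements."""
    tb_scanned: float     # terabytes scanned by queries per month
    tb_stored: float      # terabytes held in storage
    compute_hours: float  # hours of active compute per month


def per_node_hour(nodes: int, hours: float, rate_per_node_hour: float) -> float:
    # "Pay per hour based on nodes": billed for cluster size x uptime, busy or idle.
    return nodes * hours * rate_per_node_hour


def per_tb_scanned(w: Workload, rate_per_tb: float) -> float:
    # "Pay per query": billed for data scanned, regardless of uptime.
    return w.tb_scanned * rate_per_tb


def storage_plus_compute(w: Workload, storage_rate_per_tb: float,
                         compute_rate_per_hour: float) -> float:
    # "Pay for storage and compute time": two meters billed separately.
    return w.tb_stored * storage_rate_per_tb + w.compute_hours * compute_rate_per_hour


if __name__ == "__main__":
    # PLACEHOLDER rates for illustration only; these are not real vendor prices.
    w = Workload(tb_scanned=10, tb_stored=2, compute_hours=120)
    print("per node-hour:    ", per_node_hour(nodes=2, hours=730, rate_per_node_hour=0.25))
    print("per TB scanned:   ", per_tb_scanned(w, rate_per_tb=5.0))
    print("storage + compute:", storage_plus_compute(w, storage_rate_per_tb=20.0,
                                                     compute_rate_per_hour=2.0))
```

Here 730 hours approximates a cluster left running all month. The shape of the comparison matters more than the numbers: node-hour pricing rewards steady, heavy use of a right-sized cluster, while per-scan pricing rewards infrequent or well-filtered queries.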
Quick Reference Guide

Amazon Redshift
Scale: Scales horizontally by manually adding new nodes
Performance: You can manually add nodes to keep queries fast
Maintenance: Requires some manual maintenance
Community: AWS ecosystem

Google BigQuery
Scale: Automatically resizes your warehouse without storage limits
Performance: Uses available resources as needed; no configs to improve speed
Maintenance: Fully managed
Community: Google Cloud Platform ecosystem

Snowflake
Scale: Automatically scales to keep queries fast
Performance: Clusters automatically spin up and down depending on usage
Maintenance: Fully managed
Community: Enables data sharing across Snowflake warehouses

IBM Db2 Warehouse
Scale: Automatically scales by adding new nodes as needed
Performance: Auto-scales clusters when needed to keep queries fast
Maintenance: Fully managed
Community: IBM ecosystem

Postgres
Scale: Requires manual data partitioning to scale effectively
Performance: Generally runs on a single machine
Maintenance: Requires manual maintenance
Community: Large ecosystem of compatible products
How Segment Can Help

With Segment, you can bypass the hurdles of turning on a new data warehouse. Once integrated, Segment will automatically handle the ETL and set up the schema for you. Segment also provides tools that help prevent unwanted or “unclean” data from ever reaching your data warehouse in the first place. This way, your data warehouse stays fast and your analytics stay accurate.
However you do it, getting your data into a format where it can be holistically analyzed is essential. Only by having your raw user data in a flexible SQL format can you answer granular questions about what your customers are doing across all platforms, accurately measure attribution, and build company-specific dashboards.