
Data Science & Business Analytics

Data Analysis and Exploration
2023 | 2024

Alexandre Gomes Baptista


Course Objectives
This course unit aims to improve the effectiveness of decisions and actions by using
knowledge extracted from data sources, whether internal or external to the organization.
It is intended to give participants the skills to analyze and explore information using
tools that support the development of Business Intelligence solutions from a Corporate
BI perspective.
These tools are intended to:
• Transform data into information
• Automate the extraction of data from multiple sources
• Make information available through mechanisms that ease its reading and interpretation
(visual elements)
All of these mechanisms enable a faster, simpler and more effective decision-making
process.
Class Schedule
Class  Date        Time                      Topic                                     Lecturer
1      07/11/2023  18:00-20:00, 20:30-22:30  Corporate BI – Azure Analysis Services    Alexandre Baptista
2      08/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Azure Analysis Services   Alexandre Baptista
3      14/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Azure Analysis Services   Alexandre Baptista
4      15/11/2023  18:00-20:00, 20:30-22:30  DAX - Data Analysis Expressions           Alexandre Baptista
5      21/11/2023  18:00-20:00, 20:30-22:30  DAX - Data Analysis Expressions           Alexandre Baptista
6      22/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Azure Analysis Services   Alexandre Baptista
7      28/11/2023  18:00-20:00, 20:30-22:30  Lab Practice in Excel                     Alexandre Baptista
       12/12/2023  19:00-20:00               Exam (regular exam period)                Alexandre Baptista
                                             Exam (resit period)                       Alexandre Baptista


Assessment
1. Group Project
• Weighting: 60%
• Carried out in groups of 3 to 4 students

2. Final Individual Written Exam (regular exam period)
• Individual closed-book written test covering all course material
• Weighting: 40%
• Wrong answers are penalized by 100% / number of possible answers (25% if there are 4 options)
• Minimum test grade required to pass: 8.5 out of 20

3. Grade Improvement
• For grade improvement, the project is not taken into account
4. Oral Exams
• Students with a final grade above 17 out of 20 may be required to take an oral exam
Contents
• Data Modeling
• Historical Overview of OLAP
• What is OLAP and how it differs from OLTP
• Storage Modes
• BI Semantic Model
• Analysis Services
• Toolset required for the classes
Data Modeling

[Figure: Self-Service BI (business analyst) versus Corporate BI (IT professional)]
Historical Overview
Edgar Frank "Ted" Codd (19 August 1923 – 18 April 2003) was an
English computer scientist who, while working for IBM, invented the
relational model for database management, the theoretical basis for
relational databases and relational database management systems.

He made other valuable contributions to computer science, but the
relational model, a very influential general theory of data management,
remains his most mentioned, analyzed and celebrated achievement.
Online analytical processing (OLAP)
OLAP is an acronym for Online Analytical Processing. OLAP performs multidimensional
analysis of business data and provides the capability for complex calculations, trend
analysis, and sophisticated data modeling.

Knowledge is the foundation of all successful decisions. Successful businesses
continuously plan, analyze and report on sales and operational activities in order to
maximize efficiency, reduce expenditures and gain greater market share.

Statisticians will tell you that the more sample data you have, the more likely the resulting
statistic will be true. Naturally, the more data a company can access about a specific
activity, the more likely that the plan to improve that activity will be effective.
All businesses collect data using many different systems, and the challenge remains: how
to bring all that data together to create accurate, reliable, fast information about the
business. A company that can take advantage of its data and turn it into shared knowledge,
accurately and quickly, will be better positioned to make successful business
decisions and rise above the competition.

OLAP technology has been defined as the ability to achieve “fast access to shared
multidimensional information.” Given OLAP technology’s ability to create very fast
aggregations and calculations of underlying data sets, one can understand its usefulness in
helping business leaders make better, quicker “informed” decisions.
Online analytical processing (OLAP) is an approach to answering multi-dimensional analytical
queries. OLAP tools enable users to analyze multidimensional data interactively from multiple
perspectives.
It consists of numeric facts called measures that are categorized by
dimensions.
The measures are placed at the intersections of the cube, which is
spanned by the dimensions as a vector space. The usual interface
to manipulate an OLAP cube is a matrix interface, like Pivot tables in
a spreadsheet program, which performs projection operations
along the dimensions, such as aggregation or averaging.
The cube metadata is typically created from a star schema or
snowflake schema or fact constellation of tables in a relational
database. Measures are derived from the records in the fact table
and dimensions are derived from the dimension tables.
Each measure can be thought of as having a set of labels, or meta-
data associated with it. A dimension is what describes these labels;
it provides information about the measure.
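As a rough illustration of measures categorized by dimensions, the sketch below builds a tiny cube from fact rows and then projects it onto two dimensions, much like a pivot table. The fact rows, the dimension names (product, region, year) and the helper names build_cube and pivot are all hypothetical; this is not how any particular OLAP engine stores its data.

```python
from collections import defaultdict

# Hypothetical fact rows: each row carries dimension labels and one measure.
facts = [
    {"product": "Bike", "region": "North", "year": 2023, "sales": 100.0},
    {"product": "Bike", "region": "South", "year": 2023, "sales": 150.0},
    {"product": "Helmet", "region": "North", "year": 2023, "sales": 40.0},
    {"product": "Helmet", "region": "North", "year": 2024, "sales": 60.0},
]


def build_cube(rows, dimensions, measure):
    """Sum the measure into cells addressed by one member per dimension."""
    cube = defaultdict(float)
    for row in rows:
        cell = tuple(row[d] for d in dimensions)
        cube[cell] += row[measure]
    return dict(cube)


def pivot(cube, dimensions, row_dim, col_dim):
    """Project the cube onto two dimensions, summing over all the others."""
    r, c = dimensions.index(row_dim), dimensions.index(col_dim)
    table = defaultdict(float)
    for cell, value in cube.items():
        table[(cell[r], cell[c])] += value
    return dict(table)


dims = ["product", "region", "year"]
cube = build_cube(facts, dims, "sales")
by_product_region = pivot(cube, dims, "product", "region")
print(by_product_region)
```

Each cube cell is addressed by a tuple of dimension members, and the pivot simply aggregates away the dimension (here, year) that is not shown.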
OLAP consists of three basic analytical operations:
• Consolidation involves the aggregation* of data that can be
accumulated and computed in one or more dimensions.
• Drill-down is a technique that allows users to navigate through the
details.
• Slicing and dicing is a feature whereby users can take out (slicing) a
specific set of data of the OLAP cube and view (dicing) the slices
from different viewpoints. These viewpoints are sometimes called
dimensions (such as looking at the same sales by salesperson or by
date or by customer or by product or by region, etc.)
Databases configured for OLAP use a multidimensional data model,
allowing for complex analytical and ad hoc queries with a rapid
execution time.
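The three operations can be sketched on a small in-memory data set. The rows, the year > quarter hierarchy and the helper names aggregate and slice_ are assumptions made for illustration only; a real OLAP server would answer these questions through its own query language.

```python
from collections import defaultdict

# Hypothetical sales facts with a simple date hierarchy (year > quarter).
rows = [
    {"year": 2023, "quarter": "Q1", "region": "North", "product": "Bike", "sales": 100},
    {"year": 2023, "quarter": "Q2", "region": "North", "product": "Bike", "sales": 120},
    {"year": 2023, "quarter": "Q1", "region": "South", "product": "Helmet", "sales": 30},
    {"year": 2024, "quarter": "Q1", "region": "South", "product": "Bike", "sales": 90},
]


def aggregate(data, by):
    """Sum sales grouped by the given list of dimension names."""
    totals = defaultdict(int)
    for row in data:
        totals[tuple(row[d] for d in by)] += row["sales"]
    return dict(totals)


def slice_(data, **fixed):
    """Keep only the rows whose dimensions match the fixed members."""
    return [r for r in data if all(r[k] == v for k, v in fixed.items())]


# Consolidation: aggregate sales up to the year level.
by_year = aggregate(rows, ["year"])

# Drill-down: navigate from year into the quarter detail.
by_quarter = aggregate(rows, ["year", "quarter"])

# Slicing: take out the North region, then dice by viewing the
# same slice from two different viewpoints.
north = slice_(rows, region="North")
north_by_product = aggregate(north, ["product"])
north_by_quarter = aggregate(north, ["quarter"])

print(by_year, by_quarter, north_by_product, north_by_quarter)
```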
*Aggregations are built from the fact table by changing the granularity
on specific dimensions and aggregating up data along these
dimensions. The number of possible aggregations is determined by
every possible combination of dimension granularities.
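To make the footnote concrete, the following sketch enumerates every combination of granularities for three dimensions. The dimensions and their levels are hypothetical; the point is only that the count is the product of the number of levels per dimension.

```python
from itertools import product
from math import prod

# Hypothetical granularity levels per dimension, from finest to coarsest.
# "all" means the dimension is aggregated away completely.
levels = {
    "date": ["day", "month", "year", "all"],
    "product": ["product", "category", "all"],
    "customer": ["customer", "region", "all"],
}

# Every combination of one granularity per dimension is a candidate aggregation.
combinations = list(product(*levels.values()))
count = prod(len(v) for v in levels.values())

print(count)            # number of granularity combinations
print(combinations[0])  # the finest grain, i.e. the fact table itself
```

With 4, 3 and 3 levels there are 4 × 3 × 3 = 36 combinations, one of which is the base grain of the fact table. Adding a dimension or a level multiplies this number, which is why the choice of which aggregations to store matters.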

Three different Storage Modes:
• ROLAP – Relational Online Analytical Processing (also called DirectQuery)
• HOLAP – Hybrid Online Analytical Processing
• MOLAP – Multidimensional Online Analytical Processing
MULTIDIMENSIONAL OLAP (MOLAP)
MOLAP (multidimensional online analytical processing) is the classic form of
OLAP and is sometimes referred to as just OLAP. MOLAP stores data in
optimized multidimensional array storage, rather than in a relational
database.
Some MOLAP tools require the pre-computation and storage of derived
data, such as consolidations, an operation known as processing. Such
MOLAP tools generally use a pre-calculated data set referred to as a data
cube. The data cube contains all the possible answers to a given range of
questions. As a result, they respond very quickly to queries. On the
other hand, updating can take a long time depending on the degree of
pre-computation. Pre-computation can also lead to what is known as data
explosion, where the stored cube grows far larger than the source data.
The structure of a multidimensional model is not a series of tables but what is
generally referred to as a cube. Cubes modeled in a multidimensional database
extend the concept associated with spreadsheets: just as a cell in a
spreadsheet represents the intersection of two dimensions (sales of
product by region), a cell in a cube represents the intersection of any
number of dimension members, one from each dimension.
As in a spreadsheet, a cell might be calculated by formulas involving other cells.
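A minimal sketch of the pre-computation idea, assuming a two-dimensional cube and a hypothetical ALL member that stands for "aggregated over this dimension". The process function name mirrors the term used above but is otherwise invented; real MOLAP engines use far more compact array storage.

```python
from itertools import product

ALL = "ALL"  # hypothetical marker for "aggregated over this dimension"

# Hypothetical facts: (product, region, sales).
facts = [
    ("Bike", "North", 100),
    ("Bike", "South", 150),
    ("Helmet", "North", 40),
]


def process(rows):
    """Pre-compute every cell, including all roll-ups ("processing")."""
    cube = {}
    for item, region, sales in rows:
        # Each fact contributes to its own cell and to every roll-up cell.
        for key in product((item, ALL), (region, ALL)):
            cube[key] = cube.get(key, 0) + sales
    return cube


cube = process(facts)

# Queries become direct lookups by dimension member names.
print(cube[("Bike", ALL)])
print(cube[(ALL, "North")])
print(cube[(ALL, ALL)])
print(len(facts), "facts ->", len(cube), "stored cells")
```

Even in this toy case three facts produce eight stored cells, which hints at why heavy pre-computation can lead to data explosion as dimensions and members are added.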
The MOLAP cube structure allows for particularly fast, flexible data-modeling and
calculations. For one, locating cells is vastly simplified—an application can
identify a cell location by name (at the intersection of dimension members)
rather than by searching an index or the entire model as in a relational
database. Further, multidimensional models incorporate advanced array-
processing techniques and algorithms for managing data and calculations. As a
result, multidimensional databases can store data very efficiently and process
calculations in a fraction of the time required of relational-based products.

The drawback is that relevant data must be transferred from relational systems, which
is a potentially “redundant” re-creation of data in another (multidimensional) database.
Once data has been transferred, there may be no simple means of updating the
MOLAP “engine” as individual transactions are recorded by the RDBMS. For
some IT departments, introducing a new database system is anathema, even if
it means significantly greater productivity for the planning, analysis and
reporting that end users rely on the MOLAP solution to perform.
RELATIONAL OLAP (ROLAP)
ROLAP works directly with relational databases and does not require
pre-computation. The base data and the dimension tables are stored as relational
tables, and new tables are created to hold the aggregated information. It
depends on a specialized schema design. This methodology relies on
manipulating the data stored in the relational database to give the appearance
of traditional OLAP's slicing and dicing functionality.

In essence, each action of slicing and dicing is equivalent to adding a "WHERE"
clause to the SQL statement. ROLAP tools do not use pre-calculated data cubes
but instead pose the query to the standard relational database and its tables in
order to bring back the data required to answer the question. ROLAP tools
can ask any question because the methodology is not limited to
the contents of a cube. ROLAP can also drill down to the lowest level
of detail in the database.
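The "slice equals WHERE clause" idea can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_product), the sample rows and the query_sales helper are hypothetical, and the column names passed to the helper are assumed to be trusted, since they are interpolated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, region TEXT, year INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'Bike', 'Vehicles'), (2, 'Helmet', 'Accessories');
    INSERT INTO fact_sales VALUES
        (1, 'North', 2023, 100), (1, 'South', 2023, 150),
        (2, 'North', 2023, 40),  (2, 'North', 2024, 60);
""")


def query_sales(group_by, filters):
    """Build a GROUP BY query; each slice becomes one WHERE condition.

    Column names are trusted here (illustration only); values are bound
    as parameters.
    """
    where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
    sql = (
        f"SELECT {group_by}, SUM(f.amount) "
        "FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id "
        f"WHERE {where} GROUP BY {group_by} ORDER BY {group_by}"
    )
    return conn.execute(sql, list(filters.values())).fetchall()


# No slice: total sales per region.
all_regions = query_sales("f.region", {})

# Slice on region and year, viewed by product category.
north_2023 = query_sales("p.category", {"f.region": "North", "f.year": 2023})

print(all_regions)
print(north_2023)
```

Nothing is pre-calculated: every question is translated into SQL and answered from the base tables at query time.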
One advantage of ROLAP over the other styles of OLAP analytic tools is that
it is deemed to be more scalable in handling huge amounts of data. ROLAP
sits on top of relational databases therefore enabling it to leverage several
functionalities that a relational database is capable of. ROLAP products enable
organizations to leverage their existing investments in RDBMS (relational
database management system) software. ROLAP products access a relational
database by using SQL (structured query language), which is the standard
language that is used to define and manipulate data in an RDBMS.

Processing via SQL statements is also a drawback. SQL is the language of
relational tables. Its vocabulary is limited and its grammar often inflexible,
at least when it comes to the most sophisticated modeling required for
multidimensional analyses. Before end users can submit requests, the
relevant dimension data must be extracted and reformatted into denormalized
structures known as star or snowflake schemas. These tabular
structures are necessary to provide acceptable analytical performance.
HYBRID OLAP (HOLAP)
The undesirable trade-off between additional ETL cost and slow query
performance has ensured that most commercial OLAP tools now use a "Hybrid
OLAP" (HOLAP) approach, which allows the model designer to decide which
portion of the data will be stored in MOLAP and which portion in ROLAP.
There is no clear agreement across the industry as to what constitutes "Hybrid
OLAP", except that a database will divide data between relational and specialized
storage. For example, for some vendors, a HOLAP database will use relational
tables to hold the larger quantities of detailed data and use specialized storage
for at least some aspects of the smaller quantities of more-aggregate or less-
detailed data.
HOLAP addresses the shortcomings of MOLAP and ROLAP by combining the
capabilities of both approaches. HOLAP tools can utilize both pre-calculated
cubes and relational data sources.
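One way to picture the hybrid approach, under the assumption that summaries live in a pre-computed store while detail stays relational. The table, the summary_by_product cache and the sales function are hypothetical; as noted above, vendors differ on what exactly "hybrid" means.

```python
import sqlite3

# Relational part: detailed facts stay in relational tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (product TEXT, region TEXT, amount REAL);
    INSERT INTO fact_sales VALUES
        ('Bike', 'North', 100), ('Bike', 'South', 150), ('Helmet', 'North', 40);
""")

# Multidimensional part: summaries pre-computed once and kept outside
# the relational store (a plain dict stands in for the cube here).
summary_by_product = dict(
    conn.execute("SELECT product, SUM(amount) FROM fact_sales GROUP BY product")
)


def sales(product, region=None):
    """Answer summary questions from the cache, detail questions via SQL."""
    if region is None:
        return summary_by_product[product], "aggregate store"
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_sales "
        "WHERE product = ? AND region = ?",
        (product, region),
    ).fetchone()
    return row[0], "relational store"


print(sales("Bike"))           # served from the pre-computed summary
print(sales("Bike", "South"))  # served from the relational detail
```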
How is OLAP different from OLTP?
OLTP (Online Transaction Processing) supports day-to-day business operations. It is
characterized by frequent data updates and holds the most recent data along with
limited historical data, according to the retention policy driven by business needs.
OLAP (Online Analytical Processing) is used for analysis to support day-to-day
business decisions. It is characterized by less frequent data updates and
holds historical data.
OLAP is a capability or a set of tools that enables end users to access data warehouse
data easily and effectively using a wide range of business intelligence tools.
BI SEMANTIC MODEL ROLE
A link between the reporting tool interface and the physical data models.
The semantic model consists of a network of concepts and the relationships
between those concepts. A concept is a particular idea or topic with which
the user is concerned.
The semantic data model is a method of structuring data in order to represent
it in a specific logical way. It is a conceptual data model that includes
semantic information that adds a basic meaning to the data and the
relationships between them. This approach to data modeling and data
organization allows for the easy development of application programs and
for the easy maintenance of data consistency when data is updated.
• Transform complex data into business-friendly models
• Combine data from multiple data sources and apply business rules and security
• Match performance to the speed of business
• Explore models and gain instant insights using your favorite visualization tool
[Figure: data sources (SQL, Oracle, IBM, Teradata, Sybase, Azure Blob, data feeds, Excel files, others) feed the BI semantic model in SQL Server Analysis Services (data modeling, in-memory cache, security, business logic & metrics, lifecycle management), which is visualized in Excel, Power BI and third-party tools]

[Figure: layers of the BI Semantic Model, top to bottom]
• Client Tools: Analytics, Reports, Scorecards, Dashboards, Custom Apps
• BI Semantic Model: Data Model, Business Logic and Queries, Data Access
• Data Sources: Databases, LOB Applications, OData Feeds, Spreadsheets, Text Files
The BI Semantic Model is delivered in three ways:
• Personal BI: PowerPivot for Excel or Power BI
• Team BI: Power BI
• Corporate BI: Analysis Services
What is SQL Server Analysis Services
SQL Server Analysis Services (SSAS) is an enterprise-grade analytical data engine used in decision
support and business analytics. It provides the analytical data for business reports and client
applications such as Power BI, Excel, Reporting Services reports, and other data visualization tools.
A typical workflow includes authoring a multidimensional or tabular data model, deploying the model
as a database to an Azure Analysis Services server or to a SQL Server Analysis Services instance
(on-premises or in the cloud), setting up recurring data processing, and assigning permissions to
allow data access by end users. When it is ready to go, your semantic data model can be accessed by
any client application that supports Analysis Services as a data source.

Azure Analysis Services enables developers to create BI Semantic Models that can
power highly interactive and rich analytical experiences in BI tools (such as Power BI
and Excel) and custom applications.
When installing Analysis Services by using SQL Server Setup, during configuration you specify a server
mode for that instance. Each mode includes different features unique to a particular Analysis Services
solution.
Tabular Mode - Implement in-memory relational data modeling constructs (model, tables, columns,
measures, hierarchies).
Multidimensional and Data Mining Mode - Implement OLAP (On-Line Analytical Processing) modeling
constructs (cubes, dimensions, measures). SSAS allows you to build multidimensional structures
called Cubes to pre-calculate and store complex aggregations, and also to build mining models to
perform data analysis to identify valuable information like trends, patterns, relationships etc.

Visual Studio Data Tools is the authoring tool for SQL Server
Analysis Services. It provides a set of templates that
can be used when creating a new project.
For SQL Server Analysis Services it contains the following
templates:
[Figure: Analysis Services project templates in Visual Studio]
TABULAR vs MULTIDIMENSIONAL

Tabular models are Analysis Services databases that run in-memory or in DirectQuery mode, accessing
data directly from backend relational data sources.
• In-memory is the default. Using state-of-the-art compression algorithms and a multi-threaded query
processor, the in-memory analytics engine delivers fast access to tabular model objects and data
for reporting client applications such as Microsoft Excel and Microsoft Power BI.
• DirectQuery is an alternative query mode for models that are either too big to fit in memory, or
when data volatility precludes a reasonable processing strategy. In this release, DirectQuery
achieves greater parity with in-memory models through support for additional data sources,
ability to handle calculated tables and columns in a DirectQuery model, row level security via DAX
expressions that reach the backend database, and query optimizations that result in faster
throughput than in previous versions.

Multidimensional models are Analysis Services cube structures for analyzing business data across
multiple dimensions. Multidimensional mode includes a query and calculation engine for OLAP data,
with MOLAP (Multidimensional OLAP), ROLAP (Relational OLAP), and HOLAP (Hybrid OLAP) storage
modes to balance performance with scalable data requirements.
The storage mode of a partition affects the query and processing performance, storage requirements,
and storage locations of the partition and its parent measure group and cube. The choice of storage
mode also affects processing choices.
• For objects that use MOLAP storage, data is saved on disk in the database file folder.
• For ROLAP storage, processing occurs on demand, in response to an MDX query on an object.
• Partitions stored as HOLAP are smaller than the equivalent MOLAP partitions because they do not
contain source data and respond faster than ROLAP partitions for queries involving summary data.
HOLAP storage mode is generally suited for partitions in cubes that require rapid query response
for summaries based on a large amount of source data.
MICROSOFT SQL SERVER ANALYSIS SERVICES ON-PREMISES INSTALLATION
• All Analysis Services instances are installed with the SQL Server Installer
• During the installation, users need to choose the preferred Server Mode

AZURE ANALYSIS SERVICES SERVICE
• Log in to the Azure Portal
• Create -> New Resource
• Analysis Services
• Set the details
PLATFORM AS A SERVICE (PaaS)
• Supports both on-premises and Azure sources
• Easy integration with Power BI – externalizing compute
DATA SERVICE
• Managed service
• Accessed from the Azure Portal & Management Studio
[Figure: Azure Analysis Services server, showing connectivity, security, data modeling, in-memory cache, lifecycle management, and business logic & metrics]

Development Environment
Visual Studio 2022
• New Project -> Analysis Services Project
Analysis Services Tabular Project
• Introduced in SQL Server 2012, previously called “BISM” (BI Semantic Model)
• Based on the xVelocity in-memory analytic engine
• Uses column-store compression
• Calculations defined using DAX:
  • Calculated Columns
  • Calculated Metrics
  • Calculated Tables
Traditional Data Warehouse Architecture
[Figure: data flows from the sources through ETL into the Data Zone and on to the Information Zone]
• Sources: Operational Systems, Files (flat, XML, ...), Databases, Applications (ERP, CRM, ...), Other
• Data Integration: ETL Processes
• Data Zone, Processing Layer: Landing Layer (ODS or Staging), Data Warehouse Layer (Data Warehouse), Performance Layer, optional (Data Marts)
• Information Zone, Analytics Layer: Structured Analytical Models, Semantic Models, Advanced Analytics
• Information Zone, Visualization & Data Exploration: Corporate Reporting, Operational Reporting, Self Service BI, Data Discovery, Data as a Service
Complex data to one Version of the Truth
• Source data: complex raw data optimized for processing
• Semantic model: rich, business user friendly semantic model
[Figure: SQL Server Analysis Services turns raw source columns into business-friendly fields]
• Source columns: productidentifier, descriptionline1, descriptionline2, qtyafterqtysales, familyidentifier, numberofunitssoldtodate, remaningnumber, Numberofuntissoldtoday, receivedback, locatioidentifier, identifiertype, fieldidentifier
• Semantic model fields: Product Id, Product Name, Product Description, Category, Sub Category, Category Id, Category Name, Category Description, Shelf qty, Return qty, Order qty, Sale qty
Calculation groups: in Azure AS today!
• Reuse DAX calculations to reduce complexity
• Will ship in Azure AS, SSAS 2019, Power BI Premium (XMLA endpoint enablement initially)
• Require new 1470 compatibility level

Sales               Orders
Sales YTD           Orders YTD
Sales MTD           Orders MTD
Sales QTD           Orders QTD
Sales 3 Month Avg   Orders 3 Month Avg
Sales 12 Month Avg  Orders 12 Month Avg
Sales Prev Year     Orders Prev Year
Sales YoY %         Orders YoY %
…                   …
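The reuse idea behind calculation groups can be pictured with a conceptual Python analogue: each time calculation is written once and applied to any base measure, instead of defining "Sales YTD", "Orders YTD" and so on separately. This is not DAX and not an Analysis Services API; the data, the measure functions and the evaluate helper are all hypothetical.

```python
from datetime import date

# Hypothetical daily facts: (day, sales amount, order count).
facts = [
    (date(2023, 1, 15), 100.0, 2),
    (date(2023, 3, 10), 50.0, 1),
    (date(2024, 1, 20), 80.0, 1),
    (date(2024, 3, 5), 120.0, 3),
]


def total_sales(rows):
    return sum(r[1] for r in rows)


def total_orders(rows):
    return sum(r[2] for r in rows)


def ytd(rows, today):
    """Rows from the start of the current year up to today."""
    return [r for r in rows if r[0].year == today.year and r[0] <= today]


def mtd(rows, today):
    """Rows from the start of the current month up to today."""
    return [r for r in rows
            if (r[0].year, r[0].month) == (today.year, today.month) and r[0] <= today]


def prev_year(rows, today):
    """All rows from the previous calendar year."""
    return [r for r in rows if r[0].year == today.year - 1]


# Base measures and reusable calculation items are defined independently.
measures = {"Sales": total_sales, "Orders": total_orders}
calculation_items = {"YTD": ytd, "MTD": mtd, "Prev Year": prev_year}


def evaluate(measure, item, today):
    """Apply one calculation item to one base measure."""
    return measures[measure](calculation_items[item](facts, today))


today = date(2024, 3, 31)
for m in measures:
    for i in calculation_items:
        print(f"{m} {i}:", evaluate(m, i, today))
```

Two measures and three calculation items yield six results from five definitions; the saving grows as more measures and time calculations are added.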
Required Toolset for the classes
• SQL Server 2022 Management Studio
  Use: Query Database
  Download: Download SQL Server Management Studio (SSMS) - SQL Server Management Studio (SSMS) | Microsoft Learn
• Visual Studio Community Edition
  Use: Development Tools
  Download: Visual Studio 2022 Community Edition – Download Latest Free Version (microsoft.com)
• Microsoft AS project Template
  Use: Create Semantic Model
  Download: https://marketplace.visualstudio.com/items?itemName=ProBITools.MicrosoftAnalysisServicesModelingProjects2022
• Power BI Desktop
  Use: Create Power BI Report
  Download: https://powerbi.microsoft.com/en-us/desktop/
Access to Azure
• Access the URL
https://myapps.microsoft.com/?tenantid=bd913315-e9b7-4c21-bbf2-c927f71d5d22&login_hint=ixxxxx@students.isegexecutive.education
• Use your Power BI student account, e.g. ixxxxx@students.isegexecutive.education
• Accept the access to Azure Portal resources
• You won’t be able to access the resource group DAE
Alexandre Baptista
abaptista@iseg.ulisboa.pt (email)
alexandre.baptista@faculty.isegexecutive.education (PBI)

www.isegexecutive.education
