
BUSINESS STATISTICS

Chapter 1: Data and Decisions


1.1

What are Data?

Businesses have always relied on data for planning and to improve efficiency
and quality
Now they rely on data to compete in the global marketplace
Companies use data to make decisions about virtually all phases of their
business, from inventory to website design
DATA WAREHOUSES: where data is recorded and stored electronically
Now the info in data warehouses is accessible and used to help make
decisions
BIG DATA: data sets so large that traditional methods of storage and analysis
are inadequate
Data alone cannot help you make better business decisions! You must be able
to summarize, model and understand what the data tells you
STATISTICS: a way of reasoning, along with a collection of tools and methods,
designed to help us understand the world
What ARE statistics (plural)? Quantities calculated from data
What ARE data? Data is the plural form. The singular is datum. Data are
values along with their context
Whenever you have data and a need to understand the world or make an
informed decision, you need statistics
Statistics can help us make the leap from a smaller sample of data we have at
hand to an understanding of the world at large
VARIATION: the essence of statistics. The key to learning from data is
understanding the variation that is all around us
TRANSACTIONAL DATA: data collected for recording a company's
transactions
DATA MINING: also known as predictive analytics. The process of using data,
especially transactional data, to make decisions and predictions
BUSINESS ANALYTICS: a more general term than data mining. Describes any use
of data and statistical analysis to drive business decisions, whether the
purpose is predictive or simply descriptive
To make sense of data, we must understand its CONTEXT
Whether the data are numerical, alphabetic, or alphanumerical, we must
know what they represent
The 5 Ws (who, what, where, when, and why) and How can provide a context for
data values and make them meaningful
DATA TABLE: a table in which values are organized into rows (cases) and
columns (variables)
RESPONDENTS: individuals who answer a survey
SUBJECTS: people on whom we experiment. Also called participants
EXPERIMENTAL UNITS: animals, plants, websites and other inanimate subjects
RECORDS: what rows are called in a database.
The column titles (variable names) tell what has been recorded
A common place to find the WHO of the table is the leftmost column
The information about the data, called the METADATA, might have to come
from the company's database administrator or information technology
department

METADATA: typically contains information about how, when and where (and
possibly why) the data were collected; who each case represents; and the
definitions of all the variables
SPREADSHEET: a general term for a data table. Comes from bookkeeping
ledgers of financial information
These days it is common to keep modest-size datasets in a spreadsheet even
if no accounting is involved
Although data tables and spreadsheets are great for relatively small data
sets, they are cumbersome for the complex data sets that companies must
maintain on a day-to-day basis, which is why various other database
architectures are used to store data, e.g. relational databases
RELATIONAL DATABASE: two or more separate data tables are linked together
so that information can be merged across them. Each data table is a relation
because it is about a specific set of cases with information about each of
these cases for all (or at least most) of the variables (fields)
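
A minimal sketch of how two linked tables can be merged, using Python's built-in
sqlite3 module. The table names, column names and values are hypothetical,
invented only to illustrate the idea of a shared key.

```python
import sqlite3

# Hypothetical tables and values, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
)
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ana", "Boston"), (2, "Raj", "Austin")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(101, 1, 25.0), (102, 1, 40.0), (103, 2, 15.5)],
)

# Merge information across the two relations through the shared customer_id field.
merged = conn.execute(
    "SELECT o.order_id, c.name, c.city, o.amount "
    "FROM orders AS o JOIN customers AS c ON o.customer_id = c.customer_id "
    "ORDER BY o.order_id"
).fetchall()
conn.close()

for row in merged:
    print(row)
```

Each table is about its own set of cases (customers, orders); the join brings
the customer's name and city onto every order record.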

1.2

Variable types

CATEGORICAL VARIABLE: a variable whose values are simply the names of
categories
- Categorical responses could include: yes or no, very unsatisfied or unsatisfied
or satisfied, freshman or sophomore or junior or senior etc.
- QUANTITATIVE VARIABLE: a variable whose values are measured numerical
quantities with units
- In a purchase record, price, quantity and time spent on the website are all
quantitative values with units (dollars, count, and seconds)
- An essential part of a quantitative variable is its units
- DISTINCTIONS BETWEEN CATEGORICAL AND QUANTITATIVE VARIABLES SEEM
CLEAR BUT THERE ARE REASONS TO BE CAREFUL
- Some variables can be considered as either categorical or quantitative,
depending on the kind of questions we ask about them, e.g. the variable AGE
would be considered quantitative if the responses were numerical and had
units. On the other hand, a retailer might lump ages together into categories
like child, teen, adult or senior. Then age would be a categorical variable
- Area codes may look quantitative but are really categories.
- When the variable contains symbols other than numbers, the software will
correctly type the variable as categorical, but just because a variable has
numbers, it does not mean it is quantitative
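
The points above can be sketched in a few lines of Python. The ages, area codes
and the age cutoffs are assumptions made up for this illustration; they are not
from the chapter.

```python
from collections import Counter
from statistics import mean

# Hypothetical customer records, invented for illustration.
ages = [8, 15, 34, 41, 70]
area_codes = [212, 617, 212, 415, 617]

# AGE as a quantitative variable: it has units (years), so an average is meaningful.
average_age = mean(ages)


def age_group(age):
    """Lump an age in years into a category (cutoffs are assumed, not from the text)."""
    if age < 13:
        return "child"
    if age < 20:
        return "teen"
    if age < 65:
        return "adult"
    return "senior"


# AGE as a categorical variable: we count cases per category instead of averaging.
age_counts = Counter(age_group(a) for a in ages)

# Area codes are numbers but act as category labels: count them, do not average them.
area_code_counts = Counter(area_codes)

print(average_age)
print(dict(age_counts))
print(dict(area_code_counts))
```

Nothing stops software from averaging the area codes, which is why the type of
a variable has to come from thinking about what the values represent.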
Identifiers
- IDENTIFIER VARIABLES: categorical variables whose only purpose is to assign
a unique identifier code to each individual in the data set e.g. student ID
number, social security number and phone number
- Identifier variables make it possible to combine data from different sources,
protect confidentiality and provide unique labels
Other data types
- ORDINAL VARIABLES: categorical variables whose values have an intrinsic
order
- NOMINAL VARIABLES: categorical variables with unordered categories
- Ordering is not absolute; how the values are ordered depends on the purpose
of the ordering
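
A small Python sketch of why the order of an ordinal variable has to be stated
explicitly. The responses are hypothetical; the category labels are borrowed
from the satisfaction example earlier in these notes.

```python
# Hypothetical survey responses, invented for illustration.
responses = ["satisfied", "very unsatisfied", "unsatisfied", "satisfied"]

# An explicit order for the categories; software cannot infer it from the labels.
order = ["very unsatisfied", "unsatisfied", "satisfied"]
rank = {label: position for position, label in enumerate(order)}

# Default sorting treats the labels as plain text (alphabetical order).
alphabetical = sorted(responses)

# Sorting by rank respects the intrinsic order of the categories.
by_satisfaction = sorted(responses, key=lambda label: rank[label])

print(alphabetical)
print(by_satisfaction)
```

A different purpose could call for a different `order` list, which is the sense
in which ordering is not absolute.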
Cross-Sectional and Time Series Data
- TIME SERIES: an ordered sequence of values of a single quantitative variable
measured at regular intervals over time [common in business]
- Typical measuring points are months, quarters, or years, but virtually any
consistently-spaced time interval is possible
- CROSS-SECTIONAL DATA: where several variables are measured at the same
time point
E.g. if we collect data on sales revenue, number of customers, and expenses
for last month at EACH Starbucks at one point in time, this would be
cross-sectional data BUT it isn't a time series because it is measured at a
single point in time rather than at regular intervals over time
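
The two layouts can be contrasted with a short Python sketch. All store names
and figures are hypothetical, invented only to show the shape of each kind of
data.

```python
# Hypothetical figures, invented for illustration.
# Time series: ONE quantitative variable (revenue) for one store,
# measured at regular monthly intervals.
monthly_revenue = [
    ("2024-01", 41000),
    ("2024-02", 39500),
    ("2024-03", 43200),
]

# Cross-sectional data: SEVERAL variables for many stores at a single time point.
stores_last_month = [
    {"store": "A", "revenue": 41000, "customers": 5200, "expenses": 30500},
    {"store": "B", "revenue": 28750, "customers": 3900, "expenses": 22000},
]

# A time series supports questions about change over time.
month_over_month = [
    later[1] - earlier[1]
    for earlier, later in zip(monthly_revenue, monthly_revenue[1:])
]

# A cross-section supports comparisons across cases at one moment.
profit_by_store = {
    row["store"]: row["revenue"] - row["expenses"] for row in stores_last_month
}

print(month_over_month)
print(profit_by_store)
```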

1.3

Data Sources: Where, How and When

We MUST know who, what, and why to analyze data
If possible, we'd like to know the where, how and when of data as well
HOW the data are collected can make the difference between insight and
nonsense
To make inferences from the data you have at hand to the world at large, you
need to ensure that data you have are representative of a larger group
Another way to collect valid data is to perform an experiment in which you
actively manipulate variables (called factors) to see what happens
Sometimes, the answer to a question you may have can be found in data that
someone or some organization has already collected.
INTERNALLY: companies may analyze data from their own databases or data
warehouses. They may also supplement or rely entirely on data collected by
others e.g. via the internet
Unless the data were collected in a way that ensures that they are
representative of the population in which you are interested, you may be
misled
It is recommended to list some of the Ws of the data and offer a reference for
the source of the data
First step of data analysis: know WHY you are examining the data, WHOM
each row of your data table refers to, and WHAT the variables record

Tips:
- Don't label a variable as categorical or quantitative without thinking about
the data and what they represent
- Don't assume that a variable is quantitative just because its values are
numbers
- Always be skeptical
