Senior Project
Abstract
shares to online retail growth rates and then overall mean expenditures with online mean
expenditures across various demographic groups. Our results show that people with
lower incomes seem to be moving more online. We also see that older people have a
greater online presence than younger people. Interestingly, when we examine whether or
not people have broadband, older and richer demographics seem to have a greater online
presence.
Introduction
Prior to the commercialization of the internet in the late 90’s, U.S. retail
became more prominent, retail transactions began migrating online, giving birth to virtual
marketplaces where consumers could shop from the comfort of their web-browser.
According to the Census Bureau, the share of e-commerce sales in total U.S. retail
sales has risen by approximately 5 percentage points in the last 7 years (Figure 1). While
faster speeds and a more accessible internet may be the primary cause of this, we think
that there are other reasons for this growth. It is likely that not all industries are growing at
this rate, and that some industries are changing substantially as a result of this steady
growth.
Figure 1: This graph shows the growth of e-commerce sales as a share of total retail sales in the US. Although this graph is from
Statista, each quarterly share is from the Census Bureau.
In tandem with the steady growth of e-commerce over the past 7
years, Figure 2 shows that both online retail revenue and physical retail revenue have
steadily increased for automotive parts. Online retail also remains a small part of total sales.
Figure 2: This graph shows automotive parts industry’s physical retail revenue growth and online retail revenue growth. It seems that
online retail is still a small part of all sales.
However, in Figure 3, we see a totally different picture for men’s clothing. Although
there is almost no physical retail growth in this industry, the online retail growth is
substantial. Unlike just 5 years ago, online retail is now larger than brick-and-mortar retail
for men’s clothing. These two different trends show that in different industries, the impact
Figure 3: This graph shows men’s clothing industry’s physical retail revenue growth and online retail revenue growth. It seems that
online retail has overtaken physical retail.
We believe that e-commerce has varying impacts across industries because
further to say that whether or not a person has broadband internet has an impact on their
shopping behavior.
We see that connection speeds are growing in the US and believe that a person
with broadband will be more likely than a person without it to spend money online. It may
be the case that having broadband allows for faster speeds which makes shopping more
convenient or that people who have broadband spend more of their time online and end
up shopping online as well. Unfortunately, we could not find any data to directly support
these arguments.
Figure 4: Average internet connection speed in the US from 2007 to 2017
Research Question:
What demographic factors are driving e-commerce growth for different products? Are
people with broadband more likely to spend money online? Do people with broadband
Hypothesis:
We expect that age and income will have an effect on online spending behavior.
Specifically, we expect spending by younger and lower-income people to
have a positive relationship with the growth of e-commerce. We also expect to see that
younger and poorer people with stronger internet connections are the groups that spend
Data Collection
We gather data from three sources: IBISWorld (IBIS), the Consumer Expenditure
Survey (CE) from the Bureau of Labor Statistics, and the comScore Web Behavior Database.
IBIS provides industry market research with different specifications and multiple
datasets. For our purposes, we use the Online Retail dataset under US
Specialized Industry Reports. From this data set, we extract the annual online retail
From CE’s 2016 data, we use three datasets, all under the Average expenditure,
share, and standard error tables section: Age of reference person, Deciles of income
before taxes, and Population size of area of residence. Each of these datasets gives
the average amount a particular consumer group spent on a specific product
category. For example, the Age of reference person dataset shows that a person under
the age of 25 spent an average of 45 dollars on Medical Supplies in 2016.
In particular, we use the share of the demographic group’s total income that each of
these values represents. In this case, spending on Medical Supplies was 0.1% of the
total income of an average person under the age of 25.
From CE’s 2011 and 2012 data, we looked at Aggregate expenditure share tables
for two datasets: Age of reference person and Income before taxes. From each table we use
the proportion a demographic group spent on a specific product category. For example,
in the Age of reference person dataset for 2012, we see that in 2011, approximately $17
billion was spent on Medical Supplies and that people under 25 were responsible for
2.2% of this.
We also use expenditure mean data for Population size of area of residence for
online buying behavior of US households, which are represented by unique machine IDs.
We looked at comScore’s 2011 and 2012 Transactions data to use product total price and
product category. From Demographics data, we get age of oldest head of household,
household income, zip code and connection speed. We match these two datasets using
machine IDs included in each year. It should be noted that only approximately 42% and
47% of the machine IDs listed actually had any transaction in 2011 and 2012, respectively.
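The matching step can be illustrated with a minimal sketch. The field names (`machine_id`, `total_price`, `broadband`, and so on) and the records are hypothetical stand-ins, not comScore's actual column names or data; the sketch only shows the idea of joining transactions to demographics on a shared machine ID and counting how many machine IDs have at least one transaction.

```python
# Hypothetical demographic records keyed by machine ID (illustrative values only).
demographics = {
    "m1": {"age": 29, "income": 20000, "broadband": 1},
    "m2": {"age": 67, "income": 90000, "broadband": 0},
    "m3": {"age": 45, "income": 50000, "broadband": 1},  # never transacts
}

# Hypothetical transaction records (illustrative values only).
transactions = [
    {"machine_id": "m1", "category": "Books", "total_price": 12.50},
    {"machine_id": "m1", "category": "Shoes", "total_price": 60.00},
    {"machine_id": "m2", "category": "Books", "total_price": 20.00},
]


def join_on_machine_id(transactions, demographics):
    """Attach household demographics to each transaction via its machine ID."""
    joined = []
    for t in transactions:
        demo = demographics.get(t["machine_id"])
        if demo is not None:  # skip transactions with no demographic record
            joined.append({**t, **demo})
    return joined


def share_with_transactions(transactions, demographics):
    """Fraction of listed machine IDs that appear in at least one transaction."""
    active = {t["machine_id"] for t in transactions if t["machine_id"] in demographics}
    return len(active) / len(demographics)


merged = join_on_machine_id(transactions, demographics)
active_share = share_with_transactions(transactions, demographics)
```

In this toy data two of the three machine IDs transact, which mirrors the kind of coverage figure reported above for 2011 and 2012.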
Overall Expenditure Shares (CE) & Online Retail Growth Rates (IBIS)
Before examining actual online buying behavior potentially described by the
comScore data, we use CE’s average expenditure shares and IBIS from 2016 to see how
the consumer buying behavior and the growth of certain online retail industries are
related. First, we matched the industries from IBIS to the product categories in CE, as
seen in Table 1. We then decided to examine three demographic groups - Young vs. Old,
Poor vs. Rich, and Rural vs. Urban people - using the relevant Average expenditure,
Categories
IBISWorld (IBIS) | Consumer Expenditure Survey (CE)
Automotive Parts & Accessories | Maintenance and repairs
Baby & Infant Apparel | Children under 2
Beer, Wine, & Liquor | Alcoholic beverages
Camera & Camcorder | Other entertainment supplies, equipment, and services
Children's Toys | Toys, hobbies, and playground equipment
Eyeglasses & Contact Lens | Medical Supplies
Medical Supplies | Medical Supplies
Event Ticket | Fees and admissions
Computer & Tablet | Miscellaneous household equipment
Flower | Miscellaneous household equipment
Hardware & Tools | Miscellaneous household equipment
Grocery | Food at home
Home Furnishing | Household furnishings and equipment
Household Furniture | Furniture
Jewelry & Watch | Other apparel products and services
Large Kitchen Appliance | Major Appliances
Men's Clothes | Men, 16 and over
Perfume & Cosmetic | Personal care products and services
Pet food & Pet Supplies | Pets
Shoes | Footwear
Vitamin & Supplements | Drugs
Table 1: We match overlapping categories across both sources using the glossary in the CE and descriptions from IBISWorld. Some
CE categories are collections of miscellaneous goods grouped under a larger category. For example, ‘Miscellaneous household
equipment’ includes multiple products that one would buy for a home, but because it falls under the general ‘Housing’ category, it is
treated as miscellaneous. We bolded the CE categories that were matched to a collection of categories from IBISWorld.
For Young vs. Old, CE divides its consumers into different age brackets (e.g.,
under 25 years, 25 to 34 years, 65 years and older). We defined being under 35 years
old as ‘Young’ and being over 54 as ‘Old.’ To calculate the expenditure shares
for Young and Old people, we take weighted averages across the relevant brackets. For example, for Young
people, we first add the number of consumers (consumer units) under 25 and between 25 and 34:
$$\text{Units}_{\text{young}} = \text{Units}_{\text{under 25}} + \text{Units}_{\text{25 to 34}}$$
Then we proceed with the weighted average calculation. We’ll use alcoholic beverages
as our example:
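$$\text{Share}_{\text{young, alcohol}} = \frac{\text{Units}_{\text{under 25}}}{\text{Units}_{\text{young}}} \cdot \text{Share}_{\text{under 25, alcohol}} + \frac{\text{Units}_{\text{25 to 34}}}{\text{Units}_{\text{young}}} \cdot \text{Share}_{\text{25 to 34, alcohol}}$$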
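As a small illustration of this weighted average, the sketch below combines two age brackets into one ‘Young’ share. The unit counts and shares are made-up numbers chosen only to show the arithmetic, and the function name `weighted_share` is our own.

```python
def weighted_share(units, shares):
    """Weighted average of bracket-level expenditure shares.

    Each bracket's share is weighted by its number of consumer units
    divided by the total units of the combined group.
    """
    total_units = sum(units.values())
    return sum(units[group] / total_units * shares[group] for group in units)


# Hypothetical consumer-unit counts and alcohol expenditure shares (in percent).
units = {"under_25": 8000, "25_to_34": 21000}
shares = {"under_25": 1.0, "25_to_34": 1.2}

young_alcohol_share = weighted_share(units, shares)
```

Because the 25 to 34 bracket has more consumer units in this example, the combined share sits closer to 1.2 than to 1.0.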
After calculating the expenditure shares of each product category for Young
and Old people, we calculate the percent change in revenue from 2016 to 2017 for the
above categories. For example, online baby & infant apparel sales grew 7.64% from
After calculating revenue changes from 2016 to 2017 for all our product categories,
we compare mean expenditure shares for Young and Old people. First we
take the difference between the weighted-average expenditure shares of
Young and Old for each product category. Then we standardize the differences by
converting them to z-scores. From this, we create a scatterplot with online retail revenue growth rates
as the dependent variable and the z-scores as the independent variable. Finally, we
We repeat the aforementioned steps for Poor vs. Rich people, with ‘Poor’ being
those who are in the lowest three deciles and ‘Rich’ being those who are in the highest
three deciles. Once again, we repeat the same process for people in urban and rural
areas. ‘Urban’ is defined as someone who lives in a Metropolitan Statistical Area, or an
area with a population above 2,500. ‘Rural’ is defined as someone who lives in an area
that is not a Metropolitan Statistical Area and has a population below 2,500.
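The standardization and line-fitting steps described above can be sketched as follows. This is only an illustration under stated assumptions: the expenditure shares and growth rates are invented, the z-scores use the population standard deviation (the text does not say which was used), and the fit is a plain one-variable least-squares line.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def ols_slope_intercept(x, y):
    """Least-squares line y = intercept + slope * x for one regressor."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical expenditure shares (percent) and online growth rates (percent),
# one entry per product category.
young = [1.2, 0.8, 2.0, 0.5]
old = [0.9, 1.0, 1.1, 0.7]
growth = [7.6, 5.0, 9.1, 4.2]

diffs = [y - o for y, o in zip(young, old)]  # Young minus Old, per category
z = z_scores(diffs)
slope, intercept = ols_slope_intercept(z, growth)
```

A positive slope here would mean that categories on which the first group spends relatively more are also the ones growing faster online; the same code applies to the Poor vs. Rich and Rural vs. Urban differences.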
Now we use the comScore transaction data to calculate the average spending of
Young vs. Old and Poor vs. Rich on the relevant product categories in 2011 and 2012.
First we match the comScore product categories with the CE ones as seen below
in Table 2.
thresholds are slightly different across CE and comScore datasets. ‘Poor’ in CE is defined
as someone with an income of less than $20,000 per year, but in comScore it is defined as
someone with an income of less than $25,000 per year. ‘Rich’ is defined in CE as
someone with an income of more than $70,000 per year, but in comScore it is defined as
someone with an income of more than $75,000 per year. We believe the differences are
To calculate average spending, we put each person in their relevant group (i.e.,
Young or Old, Poor or Rich). We then sum the product total prices of every transaction in
the group for each product category and divide the sum by the number of unique
the percentage of expenditure accounted for by online purchases). First we need to
obtain average product category expenditures for our demographic groups above from
the relevant CE Aggregate expenditure share tables. Since we are using data from
aggregate rather than average expenditure share tables, we do the following: for ‘Young’
people, we sum the expenditure shares of a product category for those under 25 years
and those between 25 and 34; we also sum the consumer units for those age groups. Then we
multiply the combined share by the aggregate annual expenditures for the product
category. This gives us the amount ‘Young’ people spent on the product category.
Finally, we divide this amount by the number of consumer units who are ‘Young’ to obtain
the average amount spent on the product category by a ‘Young’ person. We repeat this
Finally we divide our comScore averages by our CE averages. This yields the
demographics, we graph e-commerce share of product category expenditures according
to years and contrasting demographics. With these proportions, we want to see visually
how e-commerce has grown over the two years, with the demographics split up by color,
and also how each demographic spends, with the years split up by color. The first graph
can show us how e-commerce has grown for each product category, and perhaps which
demographic spends more on the ones that are growing. The second graph is more
certain product online. We use a 45-degree line in each graph to see which side the
points lean towards, which tells us which demographic or year has more spending.
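The e-commerce share plotted in these graphs is the comScore average divided by the CE average. A minimal sketch of that calculation follows. The $17 billion aggregate and the 2.2% share for people under 25 echo the example given earlier; every other number is invented, and dividing online spending by the number of people in the group is our assumption, since the sentence describing that denominator is cut off in the text.

```python
def ce_group_average(aggregate_expenditure, bracket_shares, bracket_units):
    """Average overall spending per consumer unit for a combined group.

    Sums the brackets' aggregate expenditure shares and consumer units,
    converts the combined share to dollars, and divides by the units.
    """
    combined_share = sum(bracket_shares)
    combined_units = sum(bracket_units)
    return aggregate_expenditure * combined_share / combined_units


def online_group_average(total_prices, n_people):
    """Average online spending per person in a group (assumed denominator)."""
    return sum(total_prices) / n_people


# 'Young' = under 25 plus 25 to 34. The 10% share and both unit counts are hypothetical.
ce_avg = ce_group_average(17e9, [0.022, 0.10], [8.0e6, 20.0e6])

# Hypothetical comScore transaction totals for the same category and group.
online_avg = online_group_average([30.0, 45.0, 15.0], n_people=12)

ecommerce_share = online_avg / ce_avg
```

Plotting this share for one year or group against another, with a 45-degree line, then shows which side has the larger online share for each product category.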
In order to address the issue of broadband connection, we examine the impact of
broadband connection on online spending from two angles: 1) whether or not having
broadband makes a consumer more likely to spend online and 2) how having broadband
We first want to see how broadband affects the likelihood a person spends online.
To start, we determine whether or not a person spent a positive amount online by
creating an expenditure dummy variable T where T = 1 if a person spent more than $0 on
has broadband, 0 if not), young (1 if a person is younger than 35, 0 if older than 64), and
poor (1 if a person makes less than $25,000 per year, 0 if they make at least $75,000). Since our
dependent variable is a dummy variable, we utilize a logit model to estimate the
The logit model will give us parameters that maximize the probability of getting the data
we observed.
$$T_i = \beta_0 + \beta_1\,\text{connection\_speed}$$
$$T_i = \beta_0 + \beta_1\,\text{connection\_speed} + \beta_2\,\text{young}$$
$$T_i = \beta_0 + \beta_1\,\text{connection\_speed} + \beta_3\,\text{poor}$$
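To make the idea of maximum-likelihood estimation concrete, here is a minimal sketch that fits the first specification (an intercept plus a single broadband dummy) by Newton-Raphson. It is not the estimation routine used in this project, which the text does not name; the data are invented and the function name `fit_logit` is our own.

```python
import math


def fit_logit(x, t, iterations=25):
    """Fit Pr(T = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson.

    Each iteration moves (b0, b1) by the inverse information matrix times
    the gradient of the log-likelihood.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # information matrix (negative Hessian)
        for xi, ti in zip(x, t):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += ti - p
            g1 += (ti - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g and update the parameters.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: broadband dummy and whether the person spent online.
broadband = [0, 0, 0, 0, 1, 1, 1, 1]
spent = [0, 0, 0, 1, 1, 1, 0, 1]

b0, b1 = fit_logit(broadband, spent)
```

With a single dummy regressor the fitted probabilities simply reproduce the observed share of spenders in each group, which makes the result easy to check by hand.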
To help us form a hypothesis for Part 2 about whether or not having a broadband
connection causes people to spend more online, we first identify the people that made
transactions in both 2011 and 2012. Among those that spent money in both years, we
identify a total of 34 people that changed connection speeds from 2011 to 2012, whether
it be switching to or losing broadband, and assign them to the relevant group. Within the
two groups, we aggregate the group’s spending for each month and divide by the
number of people in the group. For example, we sum the January 2011 spending of every
person who switched to broadband and divide that sum by the number of people who
switched to broadband. We then plot average spending for each month, with t = 1 and
Figure 5: Average monthly expenses for those that changed connection speeds between 2011 and 2012. Vertical dotted line
represents t=13 (January 2012). No person changed connection speed within a year.
There does not appear to be any pattern before and after changing connection
month. Each person has at most 24 observations: their average expenses in January 2011,
February 2011, March 2011, and so on, up to and including December 2012.
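The group-level monthly averages plotted in Figure 5 can be sketched as below. The person IDs, amounts, and the tuple layout `(person_id, t, amount)` are hypothetical; t runs from 1 (January 2011) to 24 (December 2012), so t = 13 is January 2012. As described above, each month's total is divided by the number of people in the group, not by the number who happened to spend that month.

```python
from collections import defaultdict


def monthly_group_average(spending, group_members):
    """Average spending per group member for each month index t.

    spending: iterable of (person_id, t, amount) records.
    group_members: set of person IDs belonging to the group.
    """
    totals = defaultdict(float)
    for person, t, amount in spending:
        if person in group_members:
            totals[t] += amount
    n = len(group_members)
    return {t: totals[t] / n for t in sorted(totals)}


# Hypothetical group of people who switched to broadband, and their spending.
switched_to_broadband = {"a", "b"}
spending = [
    ("a", 1, 10.0),
    ("b", 1, 30.0),
    ("a", 13, 50.0),
    ("c", 1, 99.0),  # not in the group, so ignored
]

averages = monthly_group_average(spending, switched_to_broadband)
```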
We run the following regressions:
$$\text{mo\_avg\_exp}_i = \beta_0 + \beta_1\,\text{connection\_speed}$$
Empirical Results
The first part of our research looked at the relationship between overall
spending and online growth. After matching the categories above, we regressed
the growth rates of the product categories on the z-scores of the differences between
demographic groups. The following tables tell us the strength of the relationships
Each point pairs the difference between rural and urban spending
(the x-variable) with the growth rate (the y-variable) for one product. To
interpret this result, we can look at the graph. If the spending of
people living in rural areas had a relationship with e-commerce growth, we would see a
positive trend line, with rural people spending more on the products that have the higher
online retail growth rates. A negative trend line would tell us that people in urban areas are
spending more on those products. From this regression, however, it is clear spending
Figure 6: This is the result of our regression on growth rates with differences in spending by location (rural - urban). Each point on the
graph is defined by a product category.
Variable | Coefficient | SE | p-value
Rural - Urban z-score | -0.0005 | 0.00907 | 0.96027
Table 3: Regression coefficients with SE for difference in location.
Our Rural-Urban z-score coefficient is not statistically significant.
Overall spending based on location shows no relationship with growth rates. With a
very high p-value and a nearly horizontal line of best fit, there is no evidence of a correlation here. This
result does not mean that location has no relevance for the growth of e-commerce,
only that the overall expenditure patterns of rural and urban
demographics do not appear related to the growth of e-commerce. Overall, neither rural nor urban
consumers are spending more than the other on products that are showing relatively stronger
growth online.
Figure 7: This is the result of our regression on growth rates with differences in spending by age (young - old). Each point on the
graph is defined by a product category.
Variable | Coefficient | SE | p-value
Young - Old z-score | 0.00735 | 0.00929 | 0.4397
high. So how young or old consumers spend their income on products overall has little or
no relationship with how much those products are growing online. It may still be the case
that age is relevant to how people spend online. It is also important to point out that
we are comparing how young people spend different portions of their
income on certain products to how old people spend different portions of their income
on those same products. Later on we use shares of aggregate expenditure, and the
Figure 8: This is the result of our regression on growth rates with differences in spending by income (poor - rich).
Each point on the graph is defined by a product category.
can say that people of lower income brackets spend more on the products that are
growing in online retail than people of higher income brackets do. While it is no causal
Looking at growth rates with overall average expenditure shares only gives us a
vague idea of the relationship between consumer spending and the growth of
e-commerce. The age and location effects may show stronger relationships with online
growth of certain products when analyzed through different methods. The same
can be said for the strong relationship we found between income and e-commerce
growth, in that there may not be a strong relationship after all. For stronger evidence, we
looked at online expenditure and compared it with overall expenditure.
expenditures for both online (comScore) and overall (CE). With this we found the
share of average overall spending accounted for by average online spending. We can
look at two things here: 1) the contrast between years, to see which demographics are
driving the growth more, and 2) the contrast between demographics in both years, to
see which demographics are buying more of the same goods online. We do this by
drawing a 45-degree line and seeing which side each point leans towards. The following
graphs are grouped by the demographics we looked at. For age and income we were
able to show the differences across years, but for census region we did not group
Figure 9: This shows a comparison between years. The (x, y) is (old for 2011, old for 2012) and (young for 2011, young for 2012).
Figure 10: This shows a comparison between demographics. (x, y) is (young for 2011, old for 2011) and (young for 2012, old for 2012).
From Figure 9, we can tell that there has been growth for most of the categories
and that there is no clear difference in whether older or younger
people are driving the growth. In Figure 10, however, it is clear that older people spend
more online in almost all the categories. Other than books, older demographics spend a
Figure 11: This shows a comparison between years. The (x, y) is (poor for 2011, poor for 2012) and (rich for 2011, rich for 2012).
Figure 12: This shows a comparison between demographics. (x, y) is (poor for 2011, rich for 2011) and (poor for 2012, rich for 2012).
We see in Figure 11 that all products are growing for rich people from 2011 to
2012. Almost all products are growing for poor people as well, but what is interesting is
that even from this graph we can tell poorer people are spending more online, because
the red (rich) points gather near the origin while the green (poor)
points are clustered in the bottom corner. The second graph (Figure 12) shows this
clearly, as nearly all product categories lean towards the poor side.
From this analysis, we see again that poor people have a clear relationship with
online spending. With this, we can say that poor people spend more of their
income on the categories that are growing more strongly online and also spend more of their
total expenditures online. We also see that older people are spending more of their total
expenditures online. While this is only a simple graphical analysis, we have
enough evidence from this analysis and the first analysis of growth rates against overall expenditures
to believe that people of lower incomes have an effect on e-commerce growth.
Examining Broadband Connection
To examine broadband connection, we want to measure two effects: 1) whether having
broadband made people more likely to spend online and 2) how having broadband
affects online consumer expenditure. To see the first effect, we use a logit model
to find the probability a person may purchase online. Table 6 shows our result for each of
Table 6: Regression of expenditure_dummy on connection_speed, young, poor, and interaction between young and poor.
It seems that having broadband is significantly associated (p-value ≤ 0.001) with
making an online transaction. We first regress on broadband alone to capture its effect, and
then add variables such as age and income to see how the broadband effect changes with each
separately and with both together. It seems that with broadband, older people were more likely to
spend online, which is in line with the result of our previous analysis comparing online
to overall expenditure. However, it seems here that richer people, not poorer people, are
more likely to spend online with broadband. Until this point in our research, our analyses
had pointed heavily towards poor people spending more online. We will
$$\Pr(T_i = 1) = \frac{1}{1 + e^{-(X\beta)}}$$
where $X\beta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
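As a quick illustration of how this formula turns coefficients into a probability, the sketch below evaluates it for one person. The coefficient values are made up for the example and are not the estimates from Table 6; the function name `transaction_probability` is our own.

```python
import math


def transaction_probability(beta, x):
    """Pr(T = 1) = 1 / (1 + exp(-(X beta))).

    beta[0] is the intercept; beta[1:] pair with the regressors in x.
    """
    xb = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return 1.0 / (1.0 + math.exp(-xb))


# Made-up coefficients: intercept, connection_speed, young, poor.
beta = [-0.5, 1.0, -0.3, -0.4]

# An older (young = 0), richer (poor = 0) person with broadband (connection_speed = 1).
p_old_rich_broadband = transaction_probability(beta, [1, 0, 0])
```

Changing the dummy values in `x` gives the probability for each combination of demographic characteristics, which is how a table of predicted probabilities can be filled in.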
Probability of transacting
Table 7: Probability of spending money online for person with specific demographic characteristics.
Here we see that an older, richer person with broadband has nearly an 80%
probability of spending online. Each kind of consumer seems to be more likely to
spend money if they have broadband. Next we want to see how broadband changes
To measure the effect of broadband on online expenditure, we use a linear
regression model with the same variables as our analysis measuring the probability of
Table 8: Regression of monthly average expenses on connection_speed, young, poor, and interaction between young and poor.
It seems that having broadband is significantly associated (p-value ≤ 0.001) with
itself is associated, on average, with a person spending $27.24 more than someone without it, it is
quite striking how negative an effect being poor has. We see that older and richer
people with broadband spend the most online again, but from our previous analyses we
The way we analyzed the effects of broadband, on
both the probability of spending and monthly spending, most likely explains why poor people
appear to spend less here. In our previous analyses we calculated averages and found shares. We
captured only the expenditures relative to each demographic’s income, but not relative
to total spending. In this analysis, however, we measure both parts in reference to total
spending. It is likely that the average rich person spends more than the average poor
person, so it makes sense that here the rich show stronger spending characteristics.
_____________________________________________________________________
Conclusion
it was much more interesting to examine the comprehensive differences and effects of
each measure as the products different demographics were spending on seemed to tell
While we have no concrete evidence that e-commerce is being driven by
certain demographics, we have some data to show that it may be the case. In our first
analysis, we saw that poorer people were spending more of their income on the goods
that were growing in online retail. We also found that poorer people were
spending more online as a share of their total expenditure on nearly all products that we
analyzed, and that older people were spending more of their total
expenditure online. With this we were able to say that it is highly likely that poor people spend
more online and are to a certain extent driving the growth of e-commerce. In our last
analysis we saw that with broadband, older people were even more likely to spend
online and spent more on average than younger people. We also saw that rich people
were more likely to spend online and spent more on average than poor people. This
effect may be because older people are generally richer, but we still saw that all
demographics were more likely to spend online with a broadband connection. While we
have no further evidence to show that broadband definitely brings about more spending,
we can say that there is a strong correlation between online expenditure and broadband
connection.
We also wanted to include location in our final analysis. From the start we
believed that location may have an impact on online spending for multiple reasons. It
may be that rural people buy products online because stores are too far away, or that
urban people shop online because distribution centers are closer to urban areas. Given
that comScore provides zip codes for each machine ID, we attempted to include this in
our analysis. CE provides 2011 and 2012 annual expenditure means tables for urban and
rural spending on different product categories. We were also able to find a list of urban
areas and their five-digit zip codes from the US Census Bureau. However, we could not
match them with comScore, possibly because of privacy restrictions that resulted in
many four-digit zip codes in the comScore data. Because simply excluding the machine IDs without valid zip
codes would have compromised the accuracy of the research, we could not produce anything further from our urban
and rural demographics. Analysis on this may be enlightening for future research.
Works Cited
1, 2017.
http://clients1.ibisworld.com/reports/us/specializedreports/home.aspx, accessed
https://www.census.gov/retail/ecommerce/historic_releases.html, accessed
September 7, 2017.
● Quarterly share of e-commerce sales of total U.S. retail sales from 1st quarter 2010
https://www.statista.com/statistics/187439/share-of-e-commerce-sales-in-total-us-r