Project: International Expansion

Complete each section. When you are ready, save your file as a PDF document and submit it here:
https://classroom.udacity.com/nanodegrees/nd008/parts/91294931-aacb-4887-856f-fd19fe915795/project#

Step 1: Key Decisions


Briefly explain the key decisions and the type of data that you need to conduct this analysis (250
word limit).

Key Decisions:
Answer these three questions

1. What decisions need to be made?


Find a list of countries that are similar to the United States based on economic,
demographic, education, and environment data.

2. What data is needed to inform those decisions? Please include 2 examples in each of
the following categories: Economic, Environment, Education
Economic: tax rate, number of companies operating in the same industry
Environment: number of binding environmental agreements the country has signed,
level of bureaucracy involved in establishing a business
Education: average education level, expenditure on education, number of
graduate students

Step 2: Explore and Cleanup the Data


Explore and clean up your dataset. Data is provided in a CSV file for 215 countries with 77
variables (250 word limit)

Here are some guidelines to help you clean up your data:


1. Country records where most of the variables are missing might not be appropriate to be
included in the analysis. The lack of accurate reporting could indicate that these
countries are probably not similar to the United States. You should remove any country
with more than 25 missing data points. HINT: You should be left with 144 countries.
2. Some variables are closely related and may be candidates for variable reduction through
Principal Components Analysis.
3. Some variables seem irrelevant for the given analysis involving economy,
demographics, education, and environment. Which variables seem irrelevant?
Irrelevant variables are:

Answer these questions:


1. How many countries did you reduce your dataset to? Please include a bar chart of
number of non-null data points by country, sorted from most to least.

EA: 1. Required: A bar chart is included, but I couldn't find the answer.
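
The count and filter asked for in question 1 can be sketched in a few lines. This is a minimal illustration, not the project's actual workflow: the country codes and values are toy placeholders, `max_missing` is a hypothetical parameter, and the real cutoff should follow guideline 1 above.

```python
# Toy rows standing in for the project CSV; None marks a missing value.
# Codes and values are illustrative placeholders, not taken from the dataset.
rows = {
    "AAA": [1.0, 2.0, None, 4.0],
    "BBB": [None, None, None, 1.5],
    "CCC": [3.0, 1.0, 2.0, 0.5],
}

def non_null_counts(data):
    """Return (country, non-null count) pairs sorted from most to least."""
    counts = {c: sum(v is not None for v in vals) for c, vals in data.items()}
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

def drop_sparse(data, max_missing):
    """Keep countries whose number of missing values is at most max_missing."""
    return {
        c: vals for c, vals in data.items()
        if sum(v is None for v in vals) <= max_missing
    }

ranked = non_null_counts(rows)
kept = drop_sparse(rows, max_missing=2)  # hypothetical cutoff for the toy data

# Text stand-in for the bar chart: non-null data points by country, most to least.
for country, n in ranked:
    print(f"{country} {'#' * n} {n}")
print("countries kept:", len(kept))
```

Reporting `len(kept)` alongside the chart gives the explicit number of remaining countries that the reviewer could not find.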

2. Which data categories will be used for Principal Components Analysis (PCA)? There
should be three categories that are targeted for PCA.
Three categories for the PCA are
- Education (Education average years, Education PCT)
- Economies

EA: 2. Required: You found two of the topics, but the third is wrong. Education average
years and Education PCT are correct. The last topic/category is also education based.

3. Which variables did you decide to be irrelevant for this analysis? Only variables under
the education, economic, and environment categories should be included. Hint: There
should be a total of nine variables removed from the dataset.
No.  Variable to be deleted  Reason (irrelevant)
1    IC_FRM_ISOC_ZS          The percentage of firms with an internationally recognized quality certification is irrelevant to clustering areas for the purpose of establishing a new store.
2    SG_VAW_BURN_ZS          This variable is also irrelevant.
3    SH_DYN_AIDS_ZS          Irrelevant.
4    SH_DYN_MORT             Irrelevant.
5    SH_TBS_PREV             Tuberculosis is irrelevant here.
6    SH_XPD_PCAP             Total health expenditure is irrelevant.
7    EG_ELC_ACCS_ZS          The percentage of the population with access to electricity is not important here.
8    SE_XPD_TOTL_GD_ZS       Government expenditure on education is irrelevant.
9    SH_MED_PHYS_ZS          The number of physicians is irrelevant.

EA: 3. Required: IC_FRM_ISOC_ZS and other variables shouldn't be removed. We should only remove variables
that aren't from the Economy, Education or Environment category.
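
For the PCA step raised in guideline 2 and question 2 of this step, the idea of collapsing closely related variables into one component can be sketched as follows. This is a hedged, pure-Python illustration using power iteration on the covariance matrix. The two input columns are made-up numbers, not project data, and the function names are hypothetical.

```python
import math

def standardize(columns):
    """Scale each column to zero mean and unit (population) variance."""
    scaled = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        scaled.append([(x - mean) / std for x in col])  # assumes std > 0
    return scaled

def covariance_matrix(columns):
    """Population covariance matrix of columns that are already centered."""
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in columns]
            for ci in columns]

def first_component(cov, iterations=200):
    """Power iteration: dominant eigenvector (loadings) and its eigenvalue."""
    size = len(cov)
    vec = [1.0 / (i + 1) for i in range(size)]  # deliberately uneven start
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    cov_vec = [sum(cov[i][j] * vec[j] for j in range(size)) for i in range(size)]
    eigenvalue = sum(v * cv for v, cv in zip(vec, cov_vec))
    return vec, eigenvalue

# Two made-up, strongly related education-style variables (illustrative only).
avg_years = [6.0, 8.0, 10.0, 12.0, 14.0]
pct_enrolled = [40.0, 55.0, 65.0, 85.0, 95.0]

cols = standardize([avg_years, pct_enrolled])
cov = covariance_matrix(cols)
loadings, variance = first_component(cov)
explained = variance / sum(cov[i][i] for i in range(len(cov)))

# One score per country: the single component that replaces both variables.
scores = [sum(w * col[k] for w, col in zip(loadings, cols))
          for k in range(len(avg_years))]
print(f"share of variance explained by the first component: {explained:.3f}")
```

When the first component explains most of the variance, as it does for strongly correlated inputs, the group of variables can reasonably be replaced by that one score before clustering.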

Step 3: Determine Clusters and Methodology


Determine the optimal clustering method and create four clusters. (100 word limit)

Answer this question:

1. What clustering method did you decide to use? Please justify your answer.
Because the manager wants to see four clusters, the K-means method is the initial choice.
This method lets the user specify how many clusters to use.
K-means result

K-medians result

Neural gas result

Comparing the results of the three methods above, the neural gas method appears to be the best
one because:
- It gives the highest range of Adjusted Rand and Calinski-Harabasz (CH) indices.
- It has the most compact spread and the highest values in the box plots of both indices at 4
clusters.

EA: Awesome: Well done justifying the use of neural gas with the Calinski-Harabasz and Adjusted Rand
indices.
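
The two indices used in the comparison above can be computed directly from their standard definitions. The following is a minimal pure-Python sketch on toy points. It does not reproduce how the clustering tool generated its box plots (the source does not show that), and the points and labels are invented for illustration.

```python
from collections import Counter
from math import comb

def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion (higher is better)."""
    n, dim = len(points), len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum(
            (centroid[d] - overall[d]) ** 2 for d in range(dim))
        within += sum(
            (p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return (between / (k - 1)) / (within / (n - k))  # assumes within > 0

def adjusted_rand(labels_a, labels_b):
    """Agreement between two labelings, corrected for chance (1.0 = identical)."""
    n = len(labels_a)
    index = sum(comb(v, 2) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(v, 2) for v in Counter(labels_a).values())
    sum_b = sum(comb(v, 2) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:
        return 1.0
    return (index - expected) / (maximum - expected)

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]      # matches the visible grouping
bad = [0, 1, 0, 1, 0, 1]       # mixes the two groups
relabeled = [1, 1, 1, 0, 0, 0]  # same partition as `good`, different names

ch_good = calinski_harabasz(points, good)
ch_bad = calinski_harabasz(points, bad)
ari_same = adjusted_rand(good, relabeled)
ari_diff = adjusted_rand(good, bad)
print(ch_good > ch_bad, ari_same, ari_diff)
```

A method whose repeated runs give consistently high values on both indices at four clusters is producing compact, well-separated, and stable clusters, which is the argument made for neural gas above.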

Step 4: Run the Data and Visualize


Run the data through your clustering algorithm and visualize the clusters. (250 words limit)

Include at least 2 visualizations to show the clusters that you came up with. At least one of your
visualizations should be a Tableau map.

Answer this question.

1. Do the clusters make sense?


The clusters do make sense, as the image below shows. The USA's cluster (cluster 2) includes
countries with highly developed economies, high GDP, and high education levels. Cluster 1
includes Russia, China, Venezuela, and Ukraine, countries that are connected politically and
economically. Cluster 3 includes developing, emerging economies such as Brazil, Vietnam,
Mexico, and Chile. Cluster 4 includes countries mainly in the Middle East, Asia, and Africa,
such as India, Pakistan, Yemen, Egypt, Kenya, Indonesia, and Thailand.
EA: Suggestion: We should use a discrete color scale
for the clusters, not a continuous.

2. What are the four countries in the USA's cluster that are closest to the USA in terms of Total
Tax Rate by ATM Machines? Hint: Create a scatterplot to graph the relationship
between these two variables and color the markers by cluster.

According to the image below, the four countries in the USA's cluster that are closest to the
USA in terms of total tax rate and ATM machines are:
- Great Britain (GBR)
- Australia (AUS)
- Japan (JPN)
- Austria (AUT) or Germany (DEU)
EA: Suggestion: It's not obvious what
IC_TAX_TOTL_CP_ZS and the other label mean. We
should use better labels. Something like Total Tax Rate
would be easier to understand.

EA: Suggestion: We should use a discrete color scale for the clusters, not a continuous.
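
Reading the nearest neighbours off a scatterplot can be ambiguous, as the "or" in the answer above suggests. One way to make the ranking explicit is to compute distances on the two standardized variables within the target's cluster. The sketch below assumes Euclidean distance on z-scores, which the source does not specify. The record keys ("HOME", "A" to "G") and all values are hypothetical placeholders.

```python
import math

def closest_in_cluster(records, target, k=4):
    """records: {code: (cluster, x, y)}. Return the k codes nearest to target
    among countries in the same cluster, using z-scored Euclidean distance."""
    cluster = records[target][0]
    peers = {c: (x, y) for c, (cl, x, y) in records.items() if cl == cluster}
    xs = [v[0] for v in peers.values()]
    ys = [v[1] for v in peers.values()]

    def zscore(value, values):
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        return (value - mean) / std if std else 0.0

    tx, ty = peers[target]

    def distance(code):
        x, y = peers[code]
        return math.hypot(zscore(x, xs) - zscore(tx, xs),
                          zscore(y, ys) - zscore(ty, ys))

    others = [c for c in peers if c != target]
    return sorted(others, key=lambda c: (distance(c), c))[:k]

# Made-up (cluster, total tax rate, ATMs per 100,000 adults) triples.
records = {
    "HOME": (2, 40.0, 150.0),
    "A": (2, 42.0, 140.0),
    "B": (2, 35.0, 160.0),
    "C": (2, 60.0, 60.0),
    "D": (2, 41.0, 152.0),
    "E": (2, 20.0, 250.0),
    "F": (1, 40.0, 150.0),  # identical values but a different cluster
    "G": (2, 38.0, 120.0),
}

nearest = closest_in_cluster(records, "HOME")
print(nearest)
```

Standardizing first keeps the variable with the larger numeric range from dominating the distance. Country "F" is excluded despite matching values because it belongs to a different cluster.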

Step 5: Recommendation
Provide your recommended list of countries and justify your recommendation using data from
your analysis (250 words limit)

Please list out the country codes in this section here with this format in alphabetical order.

..

Australia
Belgium
Canada
..

Answer this question:

1. Why did you decide to choose these countries?
Based on the cluster analysis of education, demographic, and economic data, these countries are
worth considering more deeply when selecting the best countries for expansion, as they bear
the most similarity to the USA:
- Australia
- Canada
- Finland
- France
- Great Britain
- Holland
- Iceland
- Ireland
- New Zealand
- Norway
- Sweden

EA: Suggestion: We should include all countries from the same cluster as USA. This is why we
created the clustering model. It's up to the management to make the list shorter.

EA: Required: Some important countries are missing from the list.
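
Listing every country that shares the USA's cluster, in alphabetical order, is a one-line filter once the cluster assignments exist. The sketch below is illustrative only. The `assignments` mapping is a small subset based on the cluster descriptions in Step 4, not the full model output, and `same_cluster` is a hypothetical helper name.

```python
def same_cluster(assignments, target):
    """Return every name in target's cluster except target, sorted A to Z."""
    cluster = assignments[target]
    return sorted(name for name, c in assignments.items()
                  if c == cluster and name != target)

# Partial, illustrative mapping of country to cluster number.
assignments = {
    "United States": 2,
    "Japan": 2,
    "Great Britain": 2,
    "Australia": 2,
    "Russia": 1,
    "China": 1,
    "Brazil": 3,
    "India": 4,
}

members = same_cluster(assignments, "United States")
print(members)
```

Generating the list from the assignments, rather than typing it by hand, avoids leaving out members of the cluster.
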
Before you Submit

Please check your answers against the requirements of the project dictated by the rubric here.
Reviewers will use this rubric to grade your project.
