Complete each section. When you are ready, save your file as a PDF document and submit it
here: https://classroom.udacity.com/nanodegrees/nd008/parts/91294931-aacb-4887-856f-fd19fe915795/project#
Key Decisions:
Answer these three questions
2. What data is needed to inform those decisions? Please include 2 examples in each of
the following categories: Economic, Environment, Education
Economic: tax rate, number of companies operating in the same industry
Environment: number of binding environmental agreements the country has signed, level of bureaucracy involved in establishing a business
Education: average education level, expenditure on education, number of graduate students
2. Which data categories will be used for Principal Components Analysis (PCA)? There
should be three categories that are targeted for PCA.
Three categories for the PCA are
- Education (Education average years, Education PCT)
- Economies
EA: 2. Required: You found two of the topics, but the third is wrong. Education average years and Education PCT are correct. The last topic/category is also education based.
3. Which variables did you decide to be irrelevant for this analysis? Only variables under the education, economic, and environment categories should be included. Hint: There should be a total of nine variables removed from the dataset.
No.  Variable to be deleted  Reason: Irrelevant
1    IC_FRM_ISOC_ZS          The percentage of firms holding an internationally recognized quality certification is irrelevant to clustering areas for the purpose of establishing a new store.
2    SG_VAW_BURN_ZS          This variable is also irrelevant.
3    SH_DYN_AIDS_ZS          Irrelevant also.
4    SH_DYN_MORT             Irrelevant.
5    SH_TBS_PREV             Tuberculosis is irrelevant here.
6    SH_XPD_PCAP             Total health expenditure is irrelevant.
7    EG_ELC_ACCS_ZS          Percentage of population with access to electricity is not important here.
8    SE_XPD_TOTL_GD_ZS       Government expenditure on education is irrelevant.
9    SH_MED_PHYS_ZS          Number of physicians is irrelevant.
EA: 3. Required: IC_FRM_ISOC_ZS and other variables also shouldn't be removed. We should only remove variables that aren't from the Economy, Education or Environment category.
1. What clustering method did you decide to use? Please justify your answer.
Because the manager wants to see 4 clusters, the K-means method will be chosen. This
method enables the user to determine how many clusters to use.
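The idea that the analyst fixes the number of clusters up front can be sketched as follows. This is a minimal illustration of the standard K-means (Lloyd's) procedure, not the tool used in the project; the function name `kmeans`, the sample `points`, and the seed are assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels

# Synthetic 2-D points (made up for illustration, not project data).
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9),
          (0.0, 5.0), (0.1, 5.2), (5.0, 0.0), (4.9, 0.2)]
centroids, labels = kmeans(points, k=4)  # k=4 mirrors the manager's request
```

The result depends on the random starting centroids, so in practice the procedure is usually run several times and the best run is kept.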
K-means result
K-medians result
Neural gas result
By comparing the results from the 3 methods above, the neural gas method appears to be the best one because:
- It gives the highest range of Adjusted Rand and Calinski-Harabasz (CH) indices
- It has the most compact and highest values in the box plots of both indices at 4 clusters
EA: Awesome: Well done justifying the use of neural gas with the Calinski-Harabasz and Adjusted Rand indices.
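For reference, the Calinski-Harabasz index mentioned above is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom; higher values indicate more compact, better separated clusters. The sketch below computes it from the standard definition. The function name and the four sample points are assumptions for illustration only, not project data.

```python
def calinski_harabasz(points, labels):
    """CH = (B / (k - 1)) / (W / (n - k)) for a labelled set of points."""
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = 0.0  # B: size-weighted squared distance of cluster centers to overall mean
    within = 0.0   # W: squared distance of points to their own cluster center
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((center[d] - overall[d]) ** 2 for d in range(dim))
        within += sum(sum((p[d] - center[d]) ** 2 for d in range(dim)) for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Made-up points: two tight pairs far apart along the x axis.
sample = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = calinski_harabasz(sample, [0, 0, 1, 1])  # groups the tight pairs together
bad = calinski_harabasz(sample, [0, 1, 0, 1])   # splits each tight pair
```

Here the grouping that keeps the tight pairs together scores far higher than the one that splits them, which is the same kind of comparison the box plots make across clustering methods.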
Include at least 2 visualizations to show the clusters that you came up with. At least one of your
visualizations should be a Tableau map.
2. What are the four countries in the USA's cluster that are closest to the USA in terms of Total
Tax Rate by ATM Machines? Hint: Create a scatterplot to graph the relationship
between these two variables and color the markers by cluster.
According to the image below, the four countries in the USA's cluster that are closest to the USA in
terms of total tax rate and ATM machines are
- Great Britain (GBR)
- Australia (AUS)
- Japan (JPN)
- Austria (AUT) or DUE
EA: Suggestion: It's not obvious what
IC_TAX_TOTL_CP_ZS and the other label mean. We
should use better labels. Something like Total Tax Rate
would be easier to understand.
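Reading "closest" off a scatterplot can also be cross-checked numerically. The sketch below ranks countries in the target's cluster by Euclidean distance on the two variables after standardizing each one; standardizing across all rows is an assumption made here so that neither scale dominates, and it is not stated in the project. The country codes and values are placeholders, not real data.

```python
import math
import statistics

def nearest_in_cluster(rows, target, n=4):
    """rows maps code -> (cluster, tax_rate, atms). Returns the n closest codes."""
    # Standardize each variable over all rows (z-scores).
    stats = []
    for i in (1, 2):
        values = [v[i] for v in rows.values()]
        stats.append((statistics.mean(values), statistics.pstdev(values)))

    def z(code):
        return [(rows[code][i + 1] - m) / s for i, (m, s) in enumerate(stats)]

    cluster = rows[target][0]
    candidates = [c for c, v in rows.items() if v[0] == cluster and c != target]
    # Sort cluster peers by distance to the target in standardized space.
    candidates.sort(key=lambda c: math.dist(z(c), z(target)))
    return candidates[:n]

# Hypothetical rows: (cluster id, total tax rate, ATMs per 100,000 adults).
rows = {
    "TGT": (1, 40.0, 170.0),
    "AAA": (1, 41.0, 168.0),
    "BBB": (1, 45.0, 150.0),
    "CCC": (1, 60.0, 60.0),
    "DDD": (2, 40.5, 169.0),  # very close, but in a different cluster
    "EEE": (1, 38.0, 175.0),
    "FFF": (1, 30.0, 120.0),
}
closest = nearest_in_cluster(rows, "TGT", n=2)
```

Note that `DDD` is excluded even though its values nearly match the target, because only countries in the same cluster are considered.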
Step 5: Recommendation
Provide your recommended list of countries and justify your recommendation using data from
your analysis (250-word limit)
Please list out the country codes in this section here with this format in alphabetical order.
..
Australia
Belgium
Canada
..
1. Why did you decide to choose these countries?
Based on the cluster analysis of the education, demographic, and economic data, these countries are worth
considering more closely when selecting the best countries for expansion, as they bear the most
similarity to the USA:
- Australia
- Canada
- France
- Finland
- Great Britain
- Holland
- Iceland
- Ireland
- Norway
- New Zealand
- Sweden
EA: Suggestion: We should include all countries from the same cluster as USA. This is why we created the clustering model. It's up to the management to make the list shorter.
EA: Required: Some important countries are missing from the list.
Before you Submit
Please check your answers against the requirements of the project dictated by the rubric here.
Reviewers will use this rubric to grade your project.