Getting A Loan Approval

TOWSON UNIVERSITY
COSC757 SPRING 2001
PROJECT REPORT
ON
GETTING A LOAN APPROVAL
Instructor: Dr. Ramesh K. Karne
Prepared by :
Bohui Qi
Jianping Du
Jin Guo
Lanlan Wang
Yi Yu
Ying Zhang
CONTENTS
1 Data Mining Tool
1.1 How to Select Data Mining Tool?
1.2 Which Tool do we Select?
2 Application Example Chosen
2.1 Project Description
2.2 Project Implementation
3 Preparing Data
3.1 Select Appropriate Data for Mining
3.2 Perform Data Preprocessing
3.3 Perform Data Reduction and Projection
3.4 Data List
4 Mining Experiment
4.1 Conversion of Input Data
4.2 Algorithm for Data Mining
4.3 Procedures
5 Mining Results
6 Additional Input
6.1 Data Generalization
6.2 Model Build and Estimated with Holdout Method
6.3 Model Build and Estimated with Cross-Validation
7 Additional Mining Results
7.1 Test Decision Tree by New Data
7.2 Tree Pruning
8 Mining Technique Used in the Tool
9 Mining Technique Details and Information
10 Critical Evaluation of the Mining Technique Used
11 Visualization Technique Used in the Tool
12 Visualization Technique Details and Information
13 Setting Up Environment for the Tool -- Weka
13.1 Where to Download?
13.2 How to Set Up?
13.3 How to Use?
13.4 System Environment
13.5 Evaluation
13.6 Attachment
14 Conclusions
Appendix 1
Appendix 2
Appendix 3
Appendix 4
Appendix 5
Appendix 6
Appendix 7
Appendix 8 (project proposal)

Data Mining Project Report on Getting a Loan Approval
Bohui Qi, Jianping Du, Jin Guo, Lanlan Wang, Yi Yu, Ying Zhang
Department of Computer Science
Towson University
1. Data Mining Tool

1.1 How to Select Data Mining Tool?
With the proliferation of data warehouses, data mining tools are flooding the market. Their
objective is to discover the hidden gold in your data. Many traditional report and query
tools and statistical analysis systems use the term "data mining" in their product
descriptions. What is a data-mining tool?
The ultimate objective of data mining is knowledge discovery. Data mining methodology
extracts hidden predictive information from large databases. With such a broad definition,
however, an online analytical processing (OLAP) product or a statistical package could
qualify as a data-mining tool.
That's where the technology comes in: for true knowledge discovery a data mining tool
should unearth the hidden information automatically. By this definition data mining is
data-driven, not user-driven or verification-driven.
One way to identify a true data-mining tool is to evaluate how it operates on the data:
manually (top-down) or automatically (bottom-up)? In other words, who originates the
query, the user or the software?
Two concerns drive the selection of the appropriate data-mining tool: your
business objectives and your data structure. Both should guide you to the same tool.
Consider the following questions when evaluating a set of potential tools:
Is the data set heavily categorical?
What platforms do your candidate tools support?
Are the candidate tools ODBC-compliant?
What data format can the tools import?

No single tool is likely to provide the answer to your data mining project. Some tools
integrate several technologies into a suite of statistical analysis programs, a neural
network, and a symbolic classifier.
1.2 Which Tool do we Select?
We found many data mining tools on the web, but most of them are not free, and
we could not become familiar with them because no detailed information was available.
Weka is a good tool for us because it is easy to use and can be downloaded for free. We
also have a book/manual that explains how to use this tool.
2. Application Example Chosen

With the fast development of computers and networks, we have entered the Age of
Information. Various business, scientific, and governmental organizations around the
world generate an enormous volume of data every day. In order to analyze and discover
the hidden gold in such overwhelming amounts of data, scientists have developed
automated computer systems for intelligent data analysis, known as data mining.
The ultimate objective of data mining is knowledge discovery. Data mining methodology
extracts hidden predictive information from a large database. In addition, for true
knowledge discovery a data mining tool should unearth hidden information automatically.
Different data mining methods are best suited to different applications. There are now
many commercially available data mining tools. Thus, it is very important to choose the
right data mining tool to study the system of interest.
2.1 Project Description
The goal of our project is to display patterns in the amount of loan approved for different
groups by age, income, credit history, home ownership, etc. The objective of this
project is to use data mining tools to determine which factors matter and how they affect
the approval of a person's application for a certain loan amount. From the viewpoint of
business value, we'd like to set some rules for identifying the critical factors and the
extent to which they affect the amount of the credit line approved by credit card
companies, especially new startups. This is important and has commercial benefits in the
real business world today: it helps credit card companies attract more potential and
valuable customers, enlarge their market share in the industry, and minimize their
financial risks.
2.2 Project Implementation

For our project, we first organized the database and generated a set of data with the
attributes age, income, credit history, home ownership, etc. All of these data were
prepared in an Excel table, and their structure was assessed. Based on our objectives and
the data structure, we evaluated several data mining tools and chose the one best suited
(the Weka data mining tool set) to mine this application, getting a loan approval. Finally,
after observing and analyzing the new knowledge, we validated the findings. We determined
the roles of different intervals in each category in the amount of loan approved, which
interval in each category has the highest approved credit line amount, and the most
critical factors across all categories for the highest amount of credit line approval. We
discussed the results of the analysis with some experts to ensure that the findings are
correct and appropriate for the business objectives.
The following sections describe all the details of our project.
3. Preparing Data
3.1 Select Appropriate Data for Mining
Because the data set is large, it is more effective to choose meaningful data for data
mining. After several discussions among our group members, we chose the interesting data
mining topic of credit card application approval.
3.2 Perform Data Preprocessing
For this project, we took the raw data from the website
(ftp://ftp.ics.uci.edu/pub/machine-databases/credit-screening) that was introduced by our
instructor, Dr. Karne. It is a credit card application approval database in the UCI machine
learning center.
Data preprocessing is an important step in the data mining process. The preprocessing step
is necessary to resolve several types of problems that frequently occur in large data sets.
These problems include noisy data, redundant data, missing data values, etc.
Preprocessing consists of data cleaning and missing value resolution. Database records
often contain fields with bad or useless information. We cleaned the data by discarding
meaningless attributes and recoding some attribute values as clear numeric variables.
3.3 Perform Data Reduction and Projection
Determining the useful features in the dataset may further reduce the size of the selected
dataset. Large databases often contain huge amounts of duplicated values, which are not
what we're interested in and which slow down the mining process. So we removed some
sensor-recorded data that frequently contains long stretches of uninteresting data with no
exciting patterns. Reducing these data is more desirable and efficient.
Data projection determines the best means to represent discovered information. We
transformed some key attribute values so as to make them more reliable.
After the above steps, we had prepared the data for our project, data mining of credit card
application approval. We selected four attributes for the class application approval as
follows:
Credit history
Age
Income
House owner
Please refer to the data list in the appendix for details. For each application, a credit
loan is granted at one of the following five recommended levels:
$0
$5,000
$10,000
$20,000
$50,000
Using the method of attribute relevance analysis and our 87 pretest instances, we
calculated Gain(A) as follows:
Gain(History) = 0.5210; Gain(Age) = 0.2143; Gain(Income) = 0.1926;
Gain(House Owner) = 0.1085. We consider all of these attributes to be meaningful.
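As an illustration of how a Gain(A) value can be obtained, the sketch below computes
Gain(A) = Info(D) - Info_A(D) for a categorical attribute. The function names are our own
for this example, and the sample is only the nine records listed in Appendix 1, not the 87
pretest instances, so the values it prints will differ from the gains reported above.

```python
import math
from collections import Counter

def entropy(labels):
    """Expected information (in bits) needed to classify a sample."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(rows, attr_index, class_index=-1):
    """Gain(A) = Info(D) - sum over values v of |D_v|/|D| * Info(D_v)."""
    labels = [row[class_index] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

# Illustrative sample only: (history, house_owner, recommended) taken from
# the nine records shown in Appendix 1.
sample = [
    ("good", "yes", "10000"), ("good", "no", "10000"),
    ("none", "yes", "10000"), ("none", "no", "5000"),
    ("bad", "yes", "0"), ("bad", "no", "0"),
    ("good", "yes", "10000"), ("good", "no", "10000"),
    ("none", "yes", "10000"),
]
gain_history = information_gain(sample, 0)
gain_house = information_gain(sample, 1)
print(f"Gain(history) = {gain_history:.4f}")
print(f"Gain(house_owner) = {gain_house:.4f}")
```

Continuous attributes such as age and income would first have to be grouped into ranges
before this categorical form of the calculation applies.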
3.4 Data list
Please refer to the appendix.
4. Mining Experiment
4.1 Conversion of Input Data
Before mining the data, we had to convert the data file to ARFF format, since Weka
expects data only in that format. The data file we found was in Excel format, so
we followed the directions for converting data stored in Excel to ARFF format and
completed the conversion successfully.
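A minimal sketch of such a conversion is given below, assuming the Excel sheet has first
been saved as comma-separated text. The header is copied from Appendix 1; the function
name csv_to_arff and the two sample rows are only for illustration and are not the
procedure we actually followed.

```python
import csv
import io

# Header copied from Appendix 1 of this report.
ARFF_HEADER = """@relation credit
@attribute age real
@attribute income real
@attribute history {good, none, bad}
@attribute house_owner {yes, no}
@attribute recommended {20000, 10000, 5000, 0}
@data
"""

def csv_to_arff(csv_text, header=ARFF_HEADER):
    """Return ARFF text: the header followed by one comma-separated row per record."""
    rows = csv.reader(io.StringIO(csv_text))
    data_lines = [",".join(cell.strip() for cell in row) for row in rows if row]
    return header + "\n".join(data_lines) + "\n"

# Hypothetical CSV export of the first two records in Appendix 1.
csv_export = "18,22000,good,yes,10000\n19,21000,good,no,10000\n"
arff_text = csv_to_arff(csv_export)
print(arff_text)
```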
4.2 Algorithm for Data Mining
Before starting the experiment, we need to specify the knowledge we want to extract,
because the kind of knowledge determines which mining function to choose.
In our project, we want to learn what credit line should be recommended to a new
applicant by mining a set of classified examples found in the real world. This is a
classification problem; therefore, we decided to use a decision tree, one of the basic
techniques for data classification, to represent the knowledge to be mined.
4.3 Procedures
Classification is a form of data analysis that can be used to extract models describing
important data classes or to make future predictions. Through this mining experiment, we
built a decision tree in order to get some classification rules and use them to predict
what amount of credit line should be given to a new applicant.
Data classification is a two-step process. In the first step, a model is built by analyzing
a set of training data with the classification algorithm. This is a learning step because
the learned model is actually a set of classification rules, and people try to use these
rules to categorize new data. The second step is to estimate the predictive accuracy of the
model. If the accuracy is considered acceptable, the model can be used to classify future
objects whose class is unknown. We followed the procedures below for the project.
4.3.1 Model Building
In order to build a model to classify the data, we selected a set of training data. Five
attributes remained after data preparation. First, we chose the attribute named
"recommended" as the class label attribute, since we want to learn the proper credit line
to give to a customer. Second, we created a training data set by selecting 96 tuples
randomly based on customer age. After the tuples were selected, the training data were
analyzed by the decision tree mining algorithm. The learned model is presented in the form
of a decision tree shown in Appendix 4.
4.3.2 Increasing the Data Size
Generally speaking, we can get a better classifier as the size of the training data grows,
so we increased the total number of training samples to 811 and obtained the classification
accuracy shown in Appendix 5.
5. Mining Result
We examined our result and found several facts.
The first part of Appendix 5 is a decision tree in textual form. There are seven
levels in the tree. The first level is split on the history attribute, the second is
split on income and house-owner respectively, and so on. The bottom level is split on
the age attribute.
Under the tree structure, there are 37 leaf nodes, which represent class distributions,
and the size of the tree is 72, which is the total number of nodes in the tree.
Each node denotes a test on an attribute, and each branch represents an outcome of the
test.
The last section shows that 798 instances were classified correctly and 13
were misclassified. The percentage of correct classifications on the test data set is
98%.
The sum of the underlined numbers shown in the Confusion Matrix is equal to the
number of correctly classified instances, and the sum of the remaining numbers is the
total number of misclassified instances.
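To make the reading of a confusion matrix concrete, the short sketch below sums the
diagonal entries (the correctly classified instances) and the remaining entries (the
misclassified ones). It uses the small cross-validation matrix from the weather.arff run
in section 13.6 rather than the matrix of Appendix 5; the function name is ours.

```python
def accuracy_from_confusion(matrix):
    """Return (correct, misclassified, accuracy) for a square confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # diagonal entries
    total = sum(sum(row) for row in matrix)
    return correct, total - correct, correct / total

# Cross-validation confusion matrix from the weather.arff output in section 13.6.
weather = [[7, 2],
           [3, 2]]
correct, wrong, acc = accuracy_from_confusion(weather)
print(correct, wrong, f"{acc:.4%}")  # prints: 9 5 64.2857%
```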
To make analysis easier, a diagram of the decision tree was drawn as Appendix 8, based on
its textual format in Appendix 5. From the tree presentation, we noticed that it was really
hard to analyze the result even though the accuracy of the mining result was very high. The
tree is very deep, with 8 levels. There were 71 branches in the tree, and the test value
interval between two branches on the same attribute was too small. This resulted in the
tree being divided into so many parts. We realized that the value ranges analyzed by the
mining algorithm were too wide: for example, the applicant's age ranged from 18 to 80, and
income ranged from $20,000 to $120,000. The data need to be generalized before mining.
6. Additional Input
In this step, we used two methods to build the model and estimate its accuracy using the
generalized data. First, we chose the holdout method with its default size, so the input
data were randomly partitioned into two independent sets: 66% of the data were allocated to
the training set to derive the classifier, and the remaining 34% were used as test data to
estimate its accuracy. The result of this method is shown in Appendix 6. This method is
considered pessimistic, since only part of the initial data is used to build the model. So
we used 10-fold cross-validation as the second method for our project. In this method, the
algorithm partitioned the data into 10 mutually exclusive folds of approximately equal
size. Training and testing were performed 10 times. In iteration i, the subset Si was used
as test data and the remaining 9 subsets were used as training data for the classifier, so
the accuracy estimate is the overall number of correct classifications from the 10
iterations, divided by the total number of samples in the whole data set.
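The 10-fold procedure described above can be sketched as follows. This is a simplified
illustration and not Weka's implementation: the fold assignment, the function names, and
the stand-in majority-class learner are all assumptions made for the example.

```python
import random
from collections import Counter

def k_fold_accuracy(samples, train, predict, k=10, seed=0):
    """Total correct predictions over all k test folds, divided by total samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k mutually exclusive, near-equal folds
    correct = 0
    for i, test_fold in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train(training)
        correct += sum(predict(model, x) == y for x, y in test_fold)
    return correct / len(samples)

# Stand-in learner (an assumption for illustration, not J48): always predict
# the most common class seen in the training folds.
def train_majority(training):
    return Counter(y for _, y in training).most_common(1)[0][0]

def predict_majority(model, x):
    return model

# Toy data: 14 samples of class "10000" and 6 samples of class "0".
toy = [(("good",), "10000")] * 14 + [(("bad",), "0")] * 6
cv_accuracy = k_fold_accuracy(toy, train_majority, predict_majority)
print(f"10-fold CV accuracy: {cv_accuracy:.2f}")
```

Every sample is used for testing exactly once, which is why the estimate divides by the
size of the whole data set rather than by the size of one fold.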
Procedures
6.1 Data Generalization
We decided to transform the data for further input. We set 4 groups for the age attribute:
age1: <20; age2: 20-40; age3: 41-60; and age4: >60, so the raw values were replaced by
high-level concepts. For income, we also divided the values into 4 sets: income1: <$30,000;
income2: $30,000-$60,000; income3: $60,000-$90,000; and income4: >$90,000. The data were
thereby generalized from a low level to a high level. For the class label attribute,
recommend1, recommend2, recommend3, recommend4 and recommend5 represent the
credit lines $0, $5,000, $10,000, $20,000 and $50,000 respectively.
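The generalization can be sketched in a few lines. The function names are hypothetical,
and one detail is an assumption: the income ranges as stated share their boundary values
(for example, exactly $60,000), and this sketch assigns such a value to the lower group.

```python
def generalize_age(age):
    """Map a raw age to the groups age1 (<20), age2 (20-40), age3 (41-60), age4 (>60)."""
    if age < 20:
        return "age1"
    if age <= 40:
        return "age2"
    if age <= 60:
        return "age3"
    return "age4"

def generalize_income(income):
    """Map a raw income to income1..income4."""
    # Assumption: a shared boundary value (e.g. exactly 60000) goes to the lower group.
    if income < 30000:
        return "income1"
    if income <= 60000:
        return "income2"
    if income <= 90000:
        return "income3"
    return "income4"

CREDIT_LINE_TO_CLASS = {0: "recommend1", 5000: "recommend2", 10000: "recommend3",
                        20000: "recommend4", 50000: "recommend5"}

def generalize(record):
    """Generalize one (age, income, history, house_owner, credit_line) record."""
    age, income, history, house_owner, credit_line = record
    return (generalize_age(age), generalize_income(income), history,
            house_owner, CREDIT_LINE_TO_CLASS[credit_line])

# First record of Appendix 1.
print(generalize((18, 22000, "good", "yes", 10000)))
```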
6.2 Model Build and Estimated with Holdout Method
The result of this method is shown in Appendix 6. In this run there were 496 instances, and
only about 2/3 of them were selected as training data to build the model, so it was hard to
tell whether all samples of a certain class had been left out of the training set.
Sometimes the sample used for the training or test set might not be representative. We
therefore used another method to evaluate the built model.
6.3 Model Build and Estimated with Cross-Validation
After the data had been transformed, we put them into the mining algorithm again and got
the result in Appendix 7. In this step, the model was built using all of the input data as
training data, and the first set of measurements was derived from these data.
There are 780 instances classified correctly and 30 instances misclassified. The accuracy
is 96%. Such a measurement is optimistic, since the classifier was learned from the
very same training data.
In this step, we did not specify test data; we typed the statement:
java weka.classifiers.j48.J48 -t credit.arff. The algorithm therefore automatically
performed a ten-fold cross-validation to evaluate the model. The final section of
Appendix 7 presents the result obtained using this method.
7. Additional Mining Result

Appendix 7 shows a four-level tree. The first level is split on the history attribute,
the second is split on income and house-owner respectively, the third level is divided
into age and income, and the bottom level is split on the age and house-owner
attributes.
Under the tree structure, there are 60 leaf nodes, which represent class distributions,
and the size of the tree is 80, which is the total number of nodes in the tree.
The last section shows that 764 instances are classified correctly and 46 are
misclassified. The percentage of correct classifications on the test data set is 94%.
We can see that the result shown in Appendix 7 is easier to analyze than that in
Appendix 5, since the tree is shallower than the previous one. The accuracy is a little
lower than for the first input; however, 94% is still high enough to be considered
acceptable, so the knowledge mined by the decision tree algorithm can be used to predict
future data samples and provide a better understanding of the data contents.
7.1 Test Decision Tree by New Data
After the decision tree was built, we took another 15 new instances, which are different
from all the training and test data of our experiment, and used them to test the
accuracy of the classification rules.
The test data are shown below:
16, 78560, none, no, 10000

43, 89630, none, yes, 10000
44, 88888, none, no, 5000
19, 100045, none, no, 5000
19, 112480, bad, yes, 5000
19, 426900, bad, no, 0
20, 22000, good, yes, 10000
30, 21000, good, no, 10000
32, 26580, none, yes, 5000
23, 28000, none, no, 5000
21, 29650, bad, yes, 5000
22, 28500, bad, no, 0
28, 45600, none, yes, 10000
36, 39520, none, no, 5000
52, 36540, good, yes, 0
Among these data, we found that 14 new samples fit the rules, but the instance (52, 36540,
good, yes, 0) is classified incorrectly. The accuracy on the new data is 93%.
7.2 Tree Pruning

The last step of our experiment is tree pruning. We drew the tree based on Appendix 7, as
shown in Appendix 8. We found that some leaves representing different groups of a
certain attribute belong to the same class, so we tied these branches together and built a
simpler tree, shown in Appendix 10.
We know that the knowledge learned from decision trees can be extracted and presented as
If-Then rules. To make analysis easier, we converted the tree to classification rules by
tracing the path from the root node to each leaf node. Here we list only part of the rules
extracted from Appendix 7. With these classification rules, we are now able to
determine an appropriate credit line for individual credit card applicants.
IF history=Good AND income=income1, Then recommended=recommend3
IF history=Good AND income=income2 AND (age=age1 OR age=age3 OR age=age4), Then
recommended=recommend3
IF history=Good, income=income2, age=age2 AND house-owner=yes, Then
recommended=recommend4
IF history=Good, income=income2, age=age2 AND house-owner=no, Then
IF history=Bad, house-owner=yes, income=income2, AND age=age2 Then
IF history=Bad, house-owner=no, Then recommended=recommend1
IF history=None, house-owner=yes, income=income2, AND age=age2 Then
IF history=None, house-owner=no, income=income1, Then
IF history=None, house-owner=no, income=income3, AND age=age3, Then
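As a sketch of how such rules can be applied to a new applicant, the function below
encodes only the rules above whose conclusion is stated in full and returns None for every
other case. The function name and argument order are our own choices for illustration.

```python
def recommend(history, income, age, house_owner):
    """Apply the fully stated If-Then rules; return None when no listed rule applies."""
    if history == "Good":
        if income == "income1":
            return "recommend3"
        if income == "income2":
            if age in ("age1", "age3", "age4"):
                return "recommend3"
            if age == "age2" and house_owner == "yes":
                return "recommend4"
    if history == "Bad" and house_owner == "no":
        return "recommend1"
    return None  # rule not listed, or its conclusion is not given in the text

print(recommend("Good", "income2", "age2", "yes"))  # prints: recommend4
print(recommend("Bad", "income1", "age2", "no"))    # prints: recommend1
```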
8. Mining Technique Used in the Tool

In the Weka 3.0 software suite, we chose J48 as our mining tool. The decision tree is the
main mining technique used in the J48 algorithm. In order to estimate the classifier
accuracy, we used both the holdout and 10-fold cross-validation methods.
9. Mining Technique Details and Information

A decision tree divides the data into groups based on the values of the variables. The main
methodology is to use a hierarchy of if-then statements to classify the data. This
structure has the form of a tree. Following this procedure, one eventually reaches a
conclusion about the class to which the object under consideration should be assigned.
There has been a surge of interest in decision tree-based products, primarily because they
are faster than neural networks for many business problems and easier for users to
understand. However, this method can be applied to classification tasks only,
and it may not work with some types of data, such as continuous data like age or
sales, which must be grouped into ranges. This limits the applicability of the
decision tree method in many fields. The way a range is selected can inadvertently hide
patterns. For instance, if age is broken into a 25 to 34-year-old group, the fact that
there is a significant break at 30 may be concealed. This problem can be avoided by
assigning values to groups in a fuzzy way -- each instance of the same value may be
assigned to a different group.
Holdout and k-fold cross-validation are two common methods for estimating classifier
accuracy. For holdout, two independent sets of data, a training set and a test set, are
generated. The training set uses 2/3 of the data, while the other 1/3 of the data is
allocated to the test set. Then the classifier is derived from the training set and its
accuracy is estimated with the test set. For 10-fold cross-validation, 10 equal-size
subsets S1, S2, ..., S10 are generated by random partition. In iteration i, subset Si is
used for testing and the remaining 9 subsets are used to train the classifier. After
performing training and testing 10 times, the accuracy estimate is the overall number of
correct classifications from the 10 iterations, divided by the total number of samples in
the initial data.
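The holdout split can be sketched as below. This is an illustration under our own
assumptions (function name, fixed random seed, rounding of the split point), not the
partitioning code used inside Weka.

```python
import random

def holdout_split(samples, train_fraction=2/3, seed=0):
    """Randomly partition samples into independent training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 30 placeholder record ids stand in for real instances.
records = list(range(30))
train_set, test_set = holdout_split(records)
print(len(train_set), len(test_set))  # prints: 20 10
```

The classifier would then be derived from train_set only, and its accuracy estimated on
test_set, which it has never seen.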
10. Critical Evaluation of the Mining Technique Used

In order to solve business problems, data mining tools seek to address two key business
requirements:
Description -- discovering patterns, associations and clusters of information.
Prediction -- using those patterns to predict future trends and behaviors.
Different data mining tools can help business in different ways. It is very important to
differentiate among these tools by their technologies. Our project addresses which
factors affect the approval of a person's application for a certain loan amount, and how.
The decision tree is a better mining technique for this project. The reasons are as follows:
In the data preparation phase of the project, we carried out data description, data
cleaning, data selection and data transformation. This is crucial for the
development of our model and important for selecting the data mining tool. After
considering the goals of the project and the data warehouse to be used, we decided
that the decision tree technique is a better tool for our project.
The decision tree technique provides a classification model. In this project, we
separated the intervals in the age group, credit history group, income group, and home
ownership group, and classified them by the amount of loan provided.
Although some patterns might be hidden by breaking down the continuous data, such
as age and income, the other two groups, credit history and home ownership, fit
nicely into distinct categories. We also paid special attention to setting the
intervals of the two continuous attributes, age and income. Therefore, we believe
that the decision tree is a better mining tool for this project.
The problem here is not a very complex system. The size of the data is relatively
small. The level of interaction is low. Only a few variables are present, and their
non-linearity is low too. Therefore, a decision tree can give us a pretty good
picture of the patterns.
The decision tree provides a good user interface to facilitate model building and
pattern recognition. After applying the decision tree analysis, the results are
relatively easy to visualize. Therefore, it is easy and reliable for us to build a
model and explore the patterns generated by data mining tools.
Data preparation and access for the decision tree are easy. The database is small and
the data are all in intervals. Therefore, the decision tree performs well, with high
speed and accuracy.
The model generated by the decision tree is relatively easy to understand and
interpret. In addition, it has interfaces to many tools that can further help the
knowledge discovery process.

11. Visualization Technique Used in the Tool
We chose Microsoft Excel as an existing visualization tool for the segmentations of this
project, and for the decision tree we used Witten's data mining machine learning tool.
12. Visualization Technique Details and Information

The J48 pruned tree gives us a clear decision tree result; we used Microsoft Word to draw
the graph.
Since the multi-dimensional data of this project are mapped into 2-D space, an overview of
the entire segmentation is logically a 2-D map. Major search topics are plotted in the
decision tree. The visualization developed for the decision tree is shown in Figure 1,
Figure 2, and Figure 3.
Figure 4 shows the number of instances that belong to each class. recommend3 ($10,000)
has the most instances, followed in decreasing order by recommend2,
recommend1, recommend4, and recommend5. There are only a few instances in
recommend5 ($50,000).
Figure 5 shows the number of last-step decisions (leaf nodes) decided by house
owner level. Most of them were "no" (not a house owner). That means more final decisions
are reached on the no-house branch than on the house-owner branch, because we know from
the decision tree that an applicant with a bad credit history and no house goes directly
to recommend1 ($0).
Figure 6 shows the number of last-step decisions (leaf nodes) decided by age level.
The groups are very similar to one another.
Figure 7 shows the number of last-step decisions (leaf nodes) decided by
income level. Most of them were income1 (<$30,000), because low-salary applicants are
given only recommend1 ($0) or recommend2 ($5,000). Given a high enough salary, income
is not a key factor in the recommend decision, but income3 sometimes helps raise the
recommend decision to a higher level.

Figure 4 (chart "Recommend"; x-axis: Type, r1 to r5; y-axis: Instances)
The number of instances that belong to each type of the class
Figure 5 (chart "Decided by House Owner"; categories: yes, no)
The number of the last step decisions (leaf node) decided by house owner level
Figure 6 (chart "Decided by Age"; categories: age1, age2, age3, age4)
The number of the last step decisions (leaf node) decided by age level
Figure 7 (chart "Decided by Income"; categories: income1, income2, income3, income4)
The number of the last step decisions (leaf node) decided by income levels
13. Setting Up Environment for the Tool -- Weka

13.1 Where to Download?
Weka can be downloaded from the web site http://www.cs.waikato.ac.nz/ml/weka/.
On this web page you can see different versions for downloading. For
example, you may select the stable GUI version, which includes visualization tools and
lots of other improvements (weka-3-2.jar, 3,669,565 bytes, screenshots). But this version
requires Swing, so if you do not have the Java 1.3 JDK installed on your computer, we
suggest you download another version, the stable book version (weka-3-0-4.jar, 1,576,597
bytes). This version requires unzipping the jar file. If you are in a Windows environment,
it's easier to download the self-extracting executable for installing the GUI version of
Weka under Windows, weka-3-2.exe (3,874,492 bytes). The author also provides a joint
version combining both the Weka package and the Java runtime, weka-3-2jre.exe
(11,496,646 bytes, includes the Java Runtime Environment).
For our project, we selected weka-3-2.exe to download. It takes about 27 minutes with a 56K
modem to get the executable file weka-3-2.exe. If you download it in the Towson University
Computer Science Lab, it takes only about 4 minutes.
13.2 How to Set Up?
Suppose we have downloaded the executable file to the directory
C:\WINDOWS\Desktop\temp. Now we are ready to set up the software.
Double-click the executable file. A pop-up window will remind you: "This will
install WEKA, do you want to continue". Just press the "Yes" button.
Another page will prompt you to close all other applications you are running;
click the "Next" button to continue.
A window appears asking you to read the license for using Weka. From the
license description, we know this version of the license was released in June 1991.
Press the "Yes" button to go on.
Now you need to select an installation directory. We select
C:\WINDOWS\Desktop\temp.
After you select the directory, Windows will remind you that WEKA will be added to
the start menu group. Click the "Next" button to continue.
Now setup is ready to install WEKA on your computer; click the "Install" button.
It takes about 10 seconds to finish installation. Just click "Finish" to complete this
step.
Open the installation directory to view its contents; there are 18 files in total.
13.3 How to Use?
After finishing the WEKA installation, the next important thing is to learn how to
use this package. We go through it step by step below:
Enter your installation directory. In our case, it is C:\WINDOWS\Desktop\temp.
Double-click the file weka.jar. This is a jar file containing 628 classes.
Because it contains a file named manifest.mf, it executes when you double-click
it, just as if it were an .exe file.
You then get a GUI window with three buttons on it.
Click "Simple CLI" first to open another window. It has two parts: one is the
help description, and the other, smaller part at the bottom is where you enter
commands. In the command text field, type: java weka.classifiers.j48.J48 -t
C:\WINDOWS\Desktop\temp\data\weather.arff. Press ENTER and you get your
decision tree output. The attachment in 13.6 is the output when the command is
java weka.classifiers.j48.J48 -t c:\student\5\Weka-3-2\data\weather.arff. It
works in the Towson University Computer Science Lab. Note: this directory is
different from the one we discussed above. For output analysis, please refer to
chapter 5.
If you click the "Explorer" button, you get more features for many kinds of data
mining analysis. Click the "Open file" button to locate your file for mining; e.g.
we select a data file from our directory:
C:\WINDOWS\Desktop\temp\data\weather.arff. Then you can select which
class you want to mine. Now look at the first line: there are 6 buttons
you can click. If you want to use the "Classify" method to analyze data, just click
it. Then you can select the "Test options". Suppose we use the default,
"Cross-validation". Now click the "Start" button to get the output you want.
You can also use other methods such as "Cluster" and "Associate" to get different
output. If you want a visualized result, just click the "Visualize" button, and a
visual picture is created.
Using the Explorer is more convenient than the Simple CLI because it has more
functions, but it is more difficult to learn and analyze. For the beginning stage of
study, we suggest using the Simple CLI.
For details regarding how to use WEKA, please visit the following web page:
http://www.cs.waikato.ac.nz/~ml/weka/Experiments.pdf.
13.4 System Environment
The WEKA package requires at least 7.6MB of hard disk space, and you need to have a Java
Virtual Machine installed on your computer.
13.5 Evaluation
Through using this software, we found that WEKA provides a lot of functions for the user.
For example, its implemented schemes for classification include:
decision tree inducers
rule learners
naive Bayes
decision tables
For more detailed WEKA functions, please visit the website:
http://www.cs.waikato.ac.nz/ml/weka/. It's easy to use for beginners and it is free to
download.
13.6 Attachment:
Welcome to the WEKA SimpleCLI
Enter commands in the text field at the bottom of the window. Use the up and down
arrows to move through previous commands.
> help
Command must be one of:

java <classname> <args>
break
kill
cls
exit
help <command>
> java weka.classifiers.j48.J48 -t c:\student\5\Weka-3-2\data\weather.arff

J48 pruned tree
-------------------
outlook = sunny
| humidity <= 75: yes (2.0)
| humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
| windy = TRUE: no (2.0)
| windy = FALSE: yes (3.0)
Number of Leaves : 5
Size of the tree : 8
=== Error on training data ===

Correctly Classified Instances 14 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 14
=== Confusion Matrix ===

a b classified as
9 0 | a = yes
0 5 | b = no
=== Stratified cross-validation ===

Correctly Classified Instances 9 64.2857 %
Incorrectly Classified Instances 5 35.7143 %

Kappa statistic 0.186
Mean absolute error 0.3036
Root mean squared error 0.4813
Relative absolute error 63.75 %
Root relative squared error 97.5542 %

a b classified as
7 2 | a = yes
3 2 | b = no
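
The accuracy and Kappa statistic reported above can be checked by hand from a confusion matrix. The sketch below assumes the standard definition of Cohen's kappa, (observed agreement - chance agreement) / (1 - chance agreement); the function name is illustrative and is not part of WEKA.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, kappa) for a square confusion matrix.

    Rows are the actual classes and columns are the predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Cross-validation matrix from the output above.
accuracy, kappa = accuracy_and_kappa([[7, 2], [3, 2]])
```

For the cross-validation matrix this gives an accuracy of 9/14 (about 64.29 %) and a kappa of about 0.186, which agrees with the figures in the output. A matrix with all instances on the diagonal, as on the training data, gives a kappa of 1.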
14. Conclusions
This project shows that we can use a data mining and machine learning tool to discover useful
knowledge, such as rules for granting credit lines to credit card applicants. Data mining can also
address the question of how best to use historical data to discover general regularities and
improve the process of decision-making.
During the implementation of this project, we learned all of the material that was included
in our project proposal. The project was very interesting, though it took hard work to finish.
It was truly a team effort: each of our group members understands the project and contributed
to it.
We thank Dr. Karne for giving us this opportunity to practice and for many valuable ideas and
directions.

APPENDIX 1
@relation credit
@attribute age real

@attribute income real
@attribute history {good, none, bad}
@attribute house_owner {yes, no}
@attribute recommended {20000, 10000, 5000, 0}
@data
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
19,79465,bad,yes,0
19,88240,bad,no,0
18,96300,good,yes,10000
19,99860,good,no,10000
19,95680,none,yes,10000
19,100045,none,no,5000
19,112480,bad,yes,5000
19,426900,bad,no,0
20,22000,good,yes,10000
30,21000,good,no,10000
32,26580,none,yes,5000
23,28000,none,no,5000

21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,5000
38,99860,good,no,5000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
56,59530,none,no,10000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,5000
42,99860,good,no,20000

44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
64,45600,none,yes,5000
61,59580,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000
63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
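
The listing above follows the ARFF layout: a relation name, attribute declarations, and comma-separated data rows. As a rough illustration of that layout, the following sketch reads such a file into Python structures. It is a simplified reader written for this example, not WEKA's parser: it ignores quoted values and sparse rows, and it keeps all values as strings.

```python
def parse_arff(text):
    """Parse a simple ARFF document into (relation, attributes, rows)."""
    relation = None
    attributes = []  # list of (name, type string or list of nominal values)
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                # Nominal attribute: turn "{a, b, c}" into ["a", "b", "c"].
                spec = [value.strip() for value in spec.strip("{}").split(",")]
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows
```

Because blank lines are skipped, the page breaks inside the data listing do not affect the result.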

APPENDIX 2
@relation credit
@attribute age real

@attribute income real
@attribute history {good, none, bad}
@attribute house_owner {yes, no}
@attribute recommended {50000, 20000, 10000, 5000, 0}
@data
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
19,79465,bad,yes,0
19,88240,bad,no,0
18,96300,good,yes,10000
19,99860,good,no,10000
19,95680,none,yes,10000
19,100045,none,no,5000
19,112480,bad,yes,5000
19,426900,bad,no,0
20,22000,good,yes,10000
30,21000,good,no,10000
32,26580,none,yes,5000
23,28000,none,no,5000

21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
56,59530,none,no,10000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000

44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
64,45600,none,yes,5000
61,59580,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000
63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0

18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
19,79465,bad,yes,0
19,88240,bad,no,0
18,96300,good,yes,10000
19,99860,good,no,10000
19,95680,none,yes,10000
19,100045,none,no,5000
19,112480,bad,yes,5000
19,426900,bad,no,0
20,22000,good,yes,10000
30,21000,good,no,10000
32,26580,none,yes,5000
23,28000,none,no,5000
21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000

59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
56,59530,none,no,10000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000
44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
64,45600,none,yes,5000
61,59580,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000

63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
56,962450,good,yes,50000
61,864500,good,no,20000
20,362450,good,no,5000
23,356280,good,no,5000
26,356280,good,no,5000
29,289645,good,no,5000
36,295631,good,no,5000
37,423560,none,no,5000
32,365698,none,no,5000
23,295632,none,no,5000
23,469250,bad,yes,0
22,569840,bad,no,0
29,362350,bad,no,0
42,236589,bad,no,0
68,256398,bad,no,0
45,689532,bad,yes,5000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000
44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
35,89630,none,yes,10000

32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000

44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
19,79465,bad,yes,0
19,88240,bad,no,0
18,96300,good,yes,10000
19,99860,good,no,10000
19,95680,none,yes,10000
19,100045,none,no,5000
19,112480,bad,yes,5000
19,426900,bad,no,0
20,22000,good,yes,10000
30,21000,good,no,10000
32,26580,none,yes,5000
23,28000,none,no,5000

21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
56,59530,none,no,10000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000

44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
64,45600,none,yes,5000
61,59580,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000
63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0

18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
19,79465,bad,yes,0
19,88240,bad,no,0
18,96300,good,yes,10000
19,99860,good,no,10000
19,95680,none,yes,10000
19,100045,none,no,5000
19,112480,bad,yes,5000
19,426900,bad,no,0
20,22000,good,yes,10000
30,21000,good,no,10000
32,26580,none,yes,5000
23,28000,none,no,5000
21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000

59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
56,59530,none,no,10000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000
44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
64,45600,none,yes,5000
61,59580,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000

63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
56,962450,good,yes,50000
61,864500,good,no,20000
20,362450,good,no,5000
23,356280,good,no,5000
26,356280,good,no,5000
29,289645,good,no,5000
36,295631,good,no,5000
37,423560,none,no,5000
32,365698,none,no,5000
23,295632,none,no,5000
23,469250,bad,yes,0
22,569840,bad,no,0
29,362350,bad,no,0
42,236589,bad,no,0
68,256398,bad,no,0
45,689532,bad,yes,5000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000
44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
35,89630,none,yes,10000

32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000

44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
64,28500,bad,no,0
62,52600,good,yes,10000
66,38620,good,no,10000
42,99860,good,no,20000
44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000

19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
43,89630,none,yes,10000
32,26580,none,yes,5000
23,28000,none,no,5000
21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
61,59580,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000

63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
56,962450,good,yes,50000
61,864500,good,no,20000
20,362450,good,no,5000
23,356280,good,no,5000
26,356280,good,no,5000
29,289645,good,no,5000
36,295631,good,no,5000
37,423560,none,no,5000
32,365698,none,no,5000
23,295632,none,no,5000
23,469250,bad,yes,0
22,569840,bad,no,0
29,362350,bad,no,0
42,236589,bad,no,0
68,256398,bad,no,0
45,689532,bad,yes,5000
54,59300,bad,yes,5000
57,54280,bad,no,0
42,68420,good,yes,20000
41,70510,good,no,20000
43,89630,none,yes,10000
44,88888,none,no,5000
46,79465,bad,yes,5000
60,88240,bad,no,0
43,96300,good,yes,50000
42,99860,good,no,20000
44,326000,none,yes,20000
52,100045,none,no,10000
53,242560,bad,yes,5000
58,426900,bad,no,0
69,22000,good,yes,10000
70,28620,good,no,10000
76,29630,none,yes,5000
72,28000,none,no,5000
80,29650,bad,yes,0
35,89630,none,yes,10000

32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
59,29650,bad,yes,0
59,28500,bad,no,0
44,32600,good,yes,10000
59,38620,good,no,10000
55,45600,none,yes,10000
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
18,70510,good,no,10000
17,89630,none,yes,10000
16,78560,none,no,10000
43,89630,none,yes,10000
44,88888,none,no,5000
19,100045,none,no,5000
19,112480,bad,yes,5000
19,426900,bad,no,0
20,22000,good,yes,10000

30,21000,good,no,10000
32,26580,none,yes,5000
23,28000,none,no,5000
21,29650,bad,yes,5000
22,28500,bad,no,0
25,38240,good,yes,20000
26,38620,good,no,10000
28,45600,none,yes,10000
36,39520,none,no,5000
39,59300,bad,yes,5000
33,64280,bad,no,0
29,68420,good,yes,20000
34,70510,good,no,20000
35,89630,none,yes,10000
32,78560,none,no,5000
33,79465,bad,yes,5000
36,88240,bad,no,0
39,96300,good,yes,50000
38,99860,good,no,50000
38,95680,none,yes,20000
39,100045,none,no,10000
36,112480,bad,yes,10000
25,426900,bad,no,0
42,28260,good,yes,10000
43,27560,good,no,10000
44,29000,none,yes,5000
58,28000,none,no,5000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000
63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000

63,326900,bad,no,0
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0
18,68420,good,yes,10000
62,59300,bad,yes,0
65,54280,bad,no,0
63,68420,good,yes,20000
69,70510,good,no,10000
64,89630,none,yes,10000
69,78560,none,no,5000
71,79465,bad,yes,5000
76,88240,bad,no,0
61,96300,good,yes,20000
62,423060,good,no,20000
63,95680,none,yes,10000
64,100045,none,no,5000
64,112480,bad,yes,5000
63,326900,bad,no,0
18,22000,good,yes,10000
19,21000,good,no,10000
18,18000,none,yes,10000
19,28000,none,no,5000
19,29650,bad,yes,0
18,28500,bad,no,0
17,32600,good,yes,10000
18,38620,good,no,10000
19,45600,none,yes,10000
19,39520,none,no,5000
19,59300,bad,yes,0
18,54280,bad,no,0

18,68420,good,yes,10000
63,326900,bad,no,0
56,962450,good,yes,50000
61,864500,good,no,20000
20,362450,good,no,5000
23,356280,good,no,5000
26,356280,good,no,5000
29,289645,good,no,5000
36,295631,good,no,5000
37,423560,none,no,5000
32,365698,none,no,5000
23,295632,none,no,5000
23,469250,bad,yes,0
22,569840,bad,no,0
29,362350,bad,no,0
42,236589,bad,no,0
20,6790,none,no,10000
23,15689,none,yes,20000
45,36500,bad,no,10000
62,63530,bad,no,10000
36,85640,good,yes,5000
52,36540,good,yes,0
46,63520,none,yes,0

APPENDIX 3
@relation credit
@attribute age {age1, age2, age3,age4}

@attribute income {income1, income2, income3, income4}
@attribute history {good, none, bad}
@attribute house_owner {yes, no}
@attribute recommend {recommend1, recommend2, recommend3, recommend4, recommend5}
@data
age1,income1,good,yes,recommend3
age1,income1,good,no,recommend3
age1,income1,none,yes,recommend3
age1,income1,none,no,recommend2
age1,income1,bad,yes,recommend1
age1,income1,bad,no,recommend1
age1,?,good,yes,recommend3
age1,income2,?,no,recommend3
age1,income3,?,yes,recommend3
?,income4,none,yes,recommend3

age2,income4,good,no,?
age3,income3,bad,yes,recommend2
age3,income3,bad,no,recommend1

age3,income4,good,no,recommend4
age3,income4,none,no,recommend3

age2,income1,none,no,recommend3
age2,income1,none,yes,recommend4
age3,income2,bad,no,recommend3
age4,income3,bad,no,recommend3
age2,income3,good,yes,recommend2
age3,income2,good,yes,recommend1
age3,income3,none,yes,recommend1

APPENDIX 4
Decision rules:
J48 pruned tree
------------------
history = good
| income <= 59580: 10000 (16.0/1.0)
| income > 59580
| | age <= 23: 10000 (4.0)
| | age > 23
| | | income <= 79465: 20000 (6.0/1.0)
| | | income > 79465
| | | | age <= 39: 50000 (2.0)
| | | | age > 39: 20000 (4.0/1.0)
history = none
| house_owner = yes
| | income <= 64280
| | | age <= 30: 10000 (3.0)
| | | age > 30: 5000 (5.0/1.0)
| | income > 64280: 10000 (8.0/2.0)
| house_owner = no: 5000 (16.0/4.0)
history = bad
| house_owner = yes
| | income <= 95680
| | | age <= 20: 0 (3.0)
| | | age > 20
| | | | age <= 56: 5000 (5.0)
| | | | age > 56: 0 (4.0/1.0)
| | income > 95680: 5000 (4.0/1.0)
| house_owner = no: 0 (16.0)

a b c d e <-- classified as
2 1 0 0 0 | a = 50000
0 8 3 0 0 | b = 20000
0 1 28 6 0 | c = 10000
0 0 0 24 1 | d = 5000
0 0 0 0 22 | e = 0
0 3 0 0 0 | a = 50000
3 5 3 0 0 | b = 20000
0 6 21 8 0 | c = 10000
0 0 5 17 3 | d = 5000
0 0 0 4 18 | e = 0
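
A pruned tree such as the one above can be read as nested if-then rules. As an illustration, the sketch below transcribes the thresholds and leaf labels of this tree into a single function; the function name and the argument encoding (strings for history and house_owner) are assumptions made for this example.

```python
def recommend_credit(age, income, history, house_owner):
    """Return the credit amount predicted by the pruned tree listed above."""
    if history == "good":
        if income <= 59580:
            return 10000
        if age <= 23:
            return 10000
        if income <= 79465:
            return 20000
        return 50000 if age <= 39 else 20000
    if history == "none":
        if house_owner == "yes":
            if income <= 64280:
                return 10000 if age <= 30 else 5000
            return 10000
        return 5000
    if history == "bad":
        if house_owner == "yes":
            if income <= 95680:
                if age <= 20:
                    return 0
                return 5000 if age <= 56 else 0
            return 5000
        return 0
    raise ValueError("unknown credit history: %r" % (history,))
```

For example, an applicant aged 39 with an income of 96300, a good history and a house reaches the leaf labelled 50000, while any applicant with a bad history and no house reaches the leaf labelled 0.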

APPENDIX 5
J48 pruned tree

------------------
history = good
| income <= 70510
| | income <= 59580
| | | income <= 32600: 10000 (82.0)
| | | income > 32600
| | | | income <= 38240: 20000 (7.0/1.0)
| | | | income > 38240: 10000 (35.0)
| | income > 59580
| | | age <= 23: 10000 (20.0)
| | | age > 23
| | | | age <= 66: 20000 (33.0)
| | | | age > 66: 10000 (7.0)
| income > 70510
| | age <= 37
| | | age <= 19: 10000 (8.0)
| | | age > 19: 5000 (21.0)
| | age > 37
| | | age <= 58
| | | | house_owner = yes: 50000 (22.0)
| | | | house_owner = no
| | | | | age <= 39: 50000 (9.0)
| | | | | age > 39: 20000 (10.0)
| | | age > 58: 20000 (18.0)
history = none
| house_owner = yes
| | income <= 89630
| | | income <= 36540
| | | | income <= 22000: 10000 (11.0/1.0)
| | | | income > 22000: 5000 (24.0)
| | | income > 36540
| | | | age <= 59: 10000 (55.0/1.0)
| | | | age > 59
| | | | | income <= 64280: 5000 (4.0)
| | | | | income > 64280: 10000 (7.0)
| | income > 89630

| | | age <= 53
| | | | age <= 28: 10000 (4.0)
| | | | age > 28: 20000 (19.0)
| | | age > 53: 10000 (7.0)
| house_owner = no
| | income <= 45600: 5000 (52.0/1.0)
| | income > 45600
| | | age <= 58
| | | | age <= 46
| | | | | age <= 17: 10000 (9.0)
| | | | | age > 17
| | | | | | age <= 38: 5000 (26.0)
| | | | | | age > 38
| | | | | | | age <= 41: 10000 (9.0)
| | | | | | | age > 41: 5000 (11.0)
| | | | age > 46: 10000 (14.0)
| | | age > 58: 5000 (19.0)
history = bad
| house_owner = yes
| | income <= 68420
| | | age <= 56
| | | | age <= 20: 0 (21.0)
| | | | age > 20: 5000 (19.0)
| | | age > 56: 0 (23.0)
| | income > 68420
| | | age <= 39
| | | | age <= 34
| | | | | age <= 28
| | | | | | age <= 21
| | | | | | | income <= 95680: 0 (4.0)
| | | | | | | income > 95680: 5000 (5.0)
| | | | | | age > 21: 0 (4.0)
| | | | | age > 28: 5000 (9.0)
| | | | age > 34: 10000 (9.0)
| | | age > 39: 5000 (37.0)
| house_owner = no: 0 (137.0/2.0)

31 0 0 0 0 | a = 50000
0 86 1 0 0 | b = 20000
0 0 275 1 2 | c = 10000
0 0 0 226 0 | d = 5000
0 1 1 0 187 | e = 0
31 0 0 0 0 | a = 50000
0 86 1 0 0 | b = 20000
0 0 272 4 2 | c = 10000
0 0 4 222 0 | d = 5000
0 1 1 0 187 | e = 0

APPENDIX 6
J48 pruned tree

------------------
history = good
| income = income1: recommend3 (43.26)
| income = income2: recommend3 (33.53/5.0)
| income = income3
| | age = age1: recommend3 (13.22)
| | age = age2: recommend4 (8.0/2.0)
| | age = age4
| | | house_owner = yes: recommend4 (4.0)
| | | house_owner = no: recommend3 (4.0)
| income = income4
| | age = age2
| | | house_owner = no: recommend2 (20.0/5.0)
| | age = age3
history = none
| house_owner = yes
| | income = income1
| | | age = age1: recommend3 (8.0)
| | | age = age2: recommend2 (6.0/1.0)
| | income = income2: recommend3 (18.0/2.0)
| house_owner = no
| | age = age1
| | | income = income1: recommend2 (8.0)

| | | income = income2: recommend2 (7.34/0.34)
| | age = age3
history = bad
| house_owner = yes
| house_owner = no: recommend1 (84.33/2.33)
109 1 5 0 0 | a = recommend1

0 142 2 1 0 | b = recommend2
3 6 164 0 0 | c = recommend3
0 1 4 41 0 | d = recommend4
0 5 0 1 11 | e = recommend5
=== Error on test data ===
Ignored Class Unknown Instances 1
58 3 1 0 0 | a = recommend1
0 67 1 0 0 | b = recommend2
0 4 82 1 0 | c = recommend3
0 0 2 31 0 | d = recommend4
0 3 0 0 9 | e = recommend5

APPENDIX 7
J48 pruned tree

------------------
history = good
| income = income1: recommend3 (65.24)
| income = income2
| | age = age2
| income = income3
| | age = age4
| income = income4
| | age = age2
| | | house_owner = no: recommend2 (28.0/8.0)
| | age = age3
history = none
| house_owner = yes

| house_owner = no
history = bad
| house_owner = yes

| house_owner = no: recommend1 (137.33/2.33)
183 0 6 0 0 | a = recommend1
0 225 0 1 0 | b = recommend2
2 10 265 1 0 | c = recommend3
0 1 0 86 0 | d = recommend4
0 8 0 1 21 | e = recommend5
179 4 6 0 0 | a = recommend1
0 221 4 1 0 | b = recommend2
2 14 261 1 0 | c = recommend3
0 1 4 82 0 | d = recommend4
0 8 0 1 21 | e = recommend5

APPENDIX 8
Data Mining Project Proposal

for
Getting a Loan Approval
Bohui Qi, Yi Yu, Jianping Du, Lanlan Wang, Ying Zhang, Jin Guo
Department of Computer Science
Towson University
Project Objective
The objective of this project is to use data mining tools to determine which factors affect
the approval of a person's application for a loan of a certain amount, and how they do so.
The detailed objectives are listed below:
Understanding how a machine-learning package works and what it does.
Knowing how to choose the best-suited data mining method from the different kinds available for
a certain application.
Understanding, through this project, that data mining is actually knowledge
discovery: data mining methodology extracts hidden predictive information from
various enormous databases.
Learning how to prepare data for our mining tool and translate the input data into its
required format.
Understanding and analyzing the new knowledge mined from the
application.
Learning the mining technique used in the tool.
Project Description
By using data mining tools, we determine which factors affect the
approval of a person's application for a loan of a certain amount, and how. We organize the
database, then evaluate several data mining tools and choose one that is suitable for
mining this application. Finally, after observing and analyzing the new
knowledge, we will validate the findings.

Project Implementation
Object Identification
The goal of this project is to reveal patterns in the amount of loan approved across different
groups by age, income, credit history, home ownership, etc. We will identify the
critical factors and the extent to which they affect the amount of the loan approved.
Data Selection, Preparation and Audit
We will generate a set of data with the above attributes. These data sets will be prepared
in an Excel spreadsheet and in Word, in ARFF format. For age, we will set four
intervals: <20, 20-40, 40-60, and >60 years old. For income, the intervals are
<$30,000, $30,000-60,000, $60,000-90,000, and >$90,000. For credit history, the categories are good,
bad, and none. The house-owner categories are yes and no. The approved loan amounts are
$0K, $5K, $10K, $20K, and $50K. We will evaluate the nature and the structure of the
database in order to determine the appropriate tools.
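
The interval scheme described above can be sketched as a small generalization step. The labels age1 to age4, income1 to income4 and recommend1 to recommend5 follow the generalized data listing in Appendix 3. The treatment of boundary values is an assumption of this sketch, since the intervals do not say which group a value of exactly 20, 40 or 60 belongs to: here a boundary value is placed in the higher interval.

```python
import bisect

# Cut points taken from the intervals in the text.
AGE_CUTS = [20, 40, 60]
INCOME_CUTS = [30000, 60000, 90000]

# Loan amounts in ascending order, matching recommend1..recommend5.
LOAN_LABELS = {
    0: "recommend1",
    5000: "recommend2",
    10000: "recommend3",
    20000: "recommend4",
    50000: "recommend5",
}


def bucket(value, cuts, prefix):
    """Map a number to a label such as 'age2' (assumed: boundaries go up)."""
    return "%s%d" % (prefix, bisect.bisect_right(cuts, value) + 1)


def generalize_row(age, income, history, house_owner, loan):
    """Convert one raw record into its generalized (nominal) form."""
    return (
        bucket(age, AGE_CUTS, "age"),
        bucket(income, INCOME_CUTS, "income"),
        history,
        house_owner,
        LOAN_LABELS[loan],
    )
```

For example, the raw record 18,22000,good,yes,10000 becomes age1,income1,good,yes,recommend3.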
Tools Selection
Based on the objectives and the data structure, we will select an appropriate data mining tool.
For this project, we will use the Weka data mining tool set.
Solution Formation
The format of the solution is determined by the data audit, the business objective and the
mining tool. In this project, the report will consist of the amount of loan approved as a
function of different intervals in each of the 4 categories.
Expected Output
By analyzing this project, we expect to obtain association rules such as the following:
If a person's credit history is bad and he/she is not a house-owner, the application
will be denied.
If a person has no previous credit history, a loan of $5K will be granted for a first-time
application.
If a person's credit history is bad, but he/she is a house-owner and his/her annual
income is more than $90K, or his/her annual income is more than $60K and
his/her age is between 40 and 60, a loan of $5K will be approved.
Different loan amounts will be approved based on annual income,
age, etc.
We also expect to get the classification rules and decision trees.
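
The three concrete rules expected above can be written down as a small function, which makes it easy to compare them later with the mined rules. This is only a sketch of the expectation, not a mined result. The function name is hypothetical, "between 40 and 60" is assumed to include both ends, and cases not covered by the three rules return None.

```python
def expected_loan(age, income, history, house_owner):
    """Return the expected loan amount in dollars, or None if no rule applies.

    history is one of "good", "none", "bad"; house_owner is a boolean.
    """
    # Rule 1: bad history and not a house-owner -> application denied.
    if history == "bad" and not house_owner:
        return 0
    # Rule 2: no credit history -> $5K for a first-time application.
    if history == "none":
        return 5000
    # Rule 3: bad history, house-owner, and sufficient income (and age).
    if history == "bad" and house_owner:
        if income > 90000 or (income > 60000 and 40 <= age <= 60):
            return 5000
    # The remaining cases depend on income, age, etc. and are left open.
    return None
```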
Model Construction

We will use a training set and a test set of data for the mining test. Based on the test
results, we will construct and evaluate a model. This stage will help generate
classification rules, decision trees, clustering sub-groups, scores, and evaluation data/error
rates. We will determine the role of the different intervals in each category in the amount of
loan approved, which interval in each category has the highest approved loan amount, and
the most critical factor across all categories for the highest amount of loan approval.
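
The use of a training set and a test set can be sketched as a simple holdout split followed by an error-rate calculation. The one-third test fraction, the fixed seed and the function names are assumptions of this example and are not settings taken from the project.

```python
import random


def holdout_split(rows, test_fraction=1 / 3, seed=42):
    """Shuffle the rows and return (training_rows, test_rows)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def error_rate(predict, test_rows):
    """Fraction of test rows whose last field differs from the prediction.

    Each row is a sequence whose last element is the class label; the
    other elements are passed to predict() as positional arguments.
    """
    wrong = 0
    for *features, label in test_rows:
        if predict(*features) != label:
            wrong += 1
    return wrong / len(test_rows)
```

The model is built from the training rows only, and the error rate on the held-out test rows estimates how well it will classify new applicants.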
Findings Validation and Delivering
We will discuss the results of the analysis with some experts to ensure that the findings are
correct and appropriate for the business objectives. Then a final report will be delivered with
documentation of the entire data mining process, including data preparation, tools used,
mining techniques used in the tools and their details, test results, visualization
techniques used and their details, source code, and rules.
Operating System Environment
Windows 95, 98 and Windows 2000
Project Schedule
We will hold weekly team meetings. Our schedule is as follows:
02/01/01 --- 02/15/01 Search for information and choose the project topic
02/16/01 --- 02/25/01 Choose data mining tools, prepare data and proposal
02/26/01 --- 03/20/01 Fully understand the tool and implement the application
03/21/01 --- 04/01/01 Test the output data
04/02/01 --- 04/14/01 Analyze the results
04/15/01 --- 04/30/01 Prepare the project report
05/01/01 --- 05/07/01 Prepare presentation materials
