
Project for Technological Innovations in Management

Group 08
Lloyd Fernandes 2017026
Nikhil K. G. Dessai 2017033
Tejas Bhat 2017060
Akshay Nadkarni 2017195
Contents
Blockchain
  Business process on blockchain
  Smart contract Possibilities:
  Codes from Hyperledger
    Participants:
    Sales Agreement
    Land Title (Before):
    Land Title (After):
    Transaction
AI and Machine learning
  Data preparation
  Important predictors
  Logistic Regression
    Predictive model
    Web hosting of evaluated model
    Business Implications
  Classification Tree
    Predictive model
    Web hosting of evaluated model
    Business context:
  K-NN Algorithm
    Predictive model
    Web hosting of evaluated model
    Business implications
Conclusion

Blockchain
Business process on blockchain

Smart contract Possibilities:

1. Type of smart contract: Digital signature verification
   Trigger: Entry of land credentials
   Steps in smart contract:
   • Retrieve encrypted details
   • Use private key to verify details

2. Type of smart contract: Upload land for sale
   Trigger: Seller enters his private key to set the land up for sale
   Steps in smart contract:
   • Verification of ownership using digital key
   • Change land status to for sale
   • Notify the Registrar
   • Notify relevant sellers
   • Upload the relevant documents

3. Type of smart contract: Buyer bids for the land
   Trigger: Buyer bids the posted price or above
   Steps in smart contract:
   • Buyer request is sent to the seller
   • Buyer information is sent to the seller
   • Registrar receives notification
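
The three contract types above can be read as a small state machine. The following Python sketch is only an illustration of that flow, not the Hyperledger code used in the project. The names `LandRegistry`, `LandTitle`, `list_for_sale` and `place_bid` are hypothetical, and an HMAC over a shared secret stands in for real private-key signatures. Notifying other parties and uploading documents are not modelled.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def sign(secret: bytes, message: str) -> str:
    """Stand-in for a digital signature: HMAC-SHA256 of the message."""
    return hmac.new(secret, message.encode(), hashlib.sha256).hexdigest()


@dataclass
class LandTitle:
    land_id: str
    owner: str
    for_sale: bool = False
    asking_price: float = 0.0


@dataclass
class LandRegistry:
    keys: dict                                   # user name -> secret key (hypothetical key store)
    titles: dict = field(default_factory=dict)   # land id -> LandTitle
    notifications: list = field(default_factory=list)

    def verify(self, user: str, message: str, signature: str) -> bool:
        # Contract 1: check the submitted details against the user's key.
        expected = sign(self.keys[user], message)
        return hmac.compare_digest(expected, signature)

    def list_for_sale(self, seller: str, land_id: str, price: float, signature: str) -> None:
        # Contract 2: verify ownership, change the land status, notify the registrar.
        title = self.titles[land_id]
        if title.owner != seller or not self.verify(seller, land_id, signature):
            raise PermissionError("ownership could not be verified")
        title.for_sale = True
        title.asking_price = price
        self.notifications.append(("registrar", f"{land_id} listed for sale"))

    def place_bid(self, buyer: str, land_id: str, amount: float) -> bool:
        # Contract 3: a bid at or above the posted price is forwarded to the seller.
        title = self.titles[land_id]
        if not title.for_sale or amount < title.asking_price:
            return False
        self.notifications.append((title.owner, f"bid of {amount} from {buyer}"))
        self.notifications.append(("registrar", f"bid received on {land_id}"))
        return True


registry = LandRegistry(keys={"alice": b"alice-secret"})
registry.titles["PLOT-1"] = LandTitle("PLOT-1", owner="alice")
registry.list_for_sale("alice", "PLOT-1", 100.0, sign(b"alice-secret", "PLOT-1"))
accepted = registry.place_bid("bob", "PLOT-1", 120.0)
```

A bid below the asking price returns `False` and triggers no notification, which mirrors the "posted price or above" trigger in the third contract.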

Codes from Hyperledger
Participants:

Sales Agreement

Land Title (Before):

Land Title (After):

Transaction

AI and Machine learning

On reviewing the predictor variables against their qualitative definitions, we expect the following
variables to play a greater role in the credit decision:

Variable | Correlation w.r.t. response | Reasoning
CHK_ACCT | Positive correlation, except for the missing/ambiguous value | A greater bank balance gives better credit.
DURATION | Negative correlation | The longer the duration of the credit, the higher the chance of bad credit.
HISTORY | Negative correlation | From the coding used, it is evident that a higher score relates to a higher risk profile, which would contribute to bad credit.
NEW_CAR | N/A | N/A
USED_CAR | N/A | N/A
FURNITURE | N/A | N/A
RADIO/TV | N/A | N/A
EDUCATION | N/A | According to the group, the level of education and the institution (which is not mentioned) could have an effect on credit, because higher education from a reputed institute enhances earning potential.
RETRAINING | N/A | N/A
AMOUNT | Negative correlation | As the amount of credit taken increases, the probability of defaulting rises.
SAV_ACCT | Positive correlation | The reasoning is the same as for CHK_ACCT.
EMPLOYMENT | Negative correlation | We believe a positive correlation should exist. The negative correlation could be attributed to the fact that unemployed people aren't given loans and hence are perceived as good creditors in the data.
INSTALL_RATE | Negative correlation | The higher the instalment rate relative to disposable income, the higher the probability of defaulting.
MALE_DIV | Negative correlation | A divorced male is likely to pay alimony; in addition, depression and other factors could also lead to defaulting.
MALE_SINGLE | N/A | N/A
MALE_MAR_WID | Positive correlation | The sentiment is the opposite of that observed for MALE_DIV.
CO-APPLICANT | Positive correlation | A co-applicant provides added surety, so the chance of defaulting reduces.
GUARANTOR | Positive correlation | A guarantor makes defaulting less likely because, if the applicant defaults, the guarantor is held responsible.
PRESENT_RESIDENT | Positive correlation | A longer time in the present residence is indicative of financial stability and reduces the probability of defaulting.
REAL_ESTATE | Positive correlation | Real estate can be leveraged through a mortgage or as security, reducing the probability of defaulting.
PROP_UNKN_NONE | Negative correlation | The property is unknown, so there is no guarantee.
AGE | Positive correlation | The higher the age, the greater the savings.
OTHER_INSTALL | Positive correlation | Other instalment plans mean more repayment plans and more experience in paying.
RENT | No correlation | Rent would depend on the region.
OWN_RES | Positive correlation | People who own assets pose very little risk.
NUM_CREDITS | Negative correlation | A high number of existing credits means there will be less provision for the new credit.
JOB | Negative correlation | A manager would take a lower loan amount than a worker, which would decrease the credit rating.
NUM_DEPENDENTS | Negative correlation | A high number of dependents means more money is needed and a higher repayment risk.
TELEPHONE | No correlation | N/A
FOREIGN | Error in coding | N/A

Data preparation

• There was no missing data, so no data cleaning was required on that account.
• The clean data module was used wherever required.
• Since the data types were not all the same, the data normalization module was used to normalize the
columns Duration, Age and Amount.
• For models whose accuracy decreased after normalization, the normalization modules were removed.
• Filter-based feature selection was also tried to improve the outputs. This affected the output
variable, and hence the module was removed from all models.
• The foreigner variable had errors and a 100% correlation with the output variable (response); hence
including or eliminating it would not affect the model.
• The company would need accurate data for this variable, as it could be an important predictor of
whether the customer will have good credit or bad credit, which would help it to reduce the
uncertainty using this ML/AI technique.
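
To illustrate the normalization step, here is a minimal Python sketch. It assumes min-max scaling to the range [0, 1]; the report does not state which transformation was selected in the normalization module. The helper name `min_max_normalize` and the sample rows are made up; only the column names come from the report.

```python
def min_max_normalize(rows, columns):
    """Return a copy of rows with the given columns rescaled to [0, 1]."""
    result = [dict(row) for row in rows]
    for col in columns:
        values = [row[col] for row in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for row in result:
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            row[col] = (row[col] - lo) / span if span else 0.0
    return result


# Made-up example rows for illustration only.
applicants = [
    {"DURATION": 6, "AGE": 67, "AMOUNT": 1169},
    {"DURATION": 48, "AGE": 22, "AMOUNT": 5951},
    {"DURATION": 12, "AGE": 49, "AMOUNT": 2096},
]
normalized = min_max_normalize(applicants, ["DURATION", "AGE", "AMOUNT"])
```

After scaling, the three columns share the same range, so a large raw value such as the loan amount no longer dominates distance-based or gradient-based methods.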

Important predictors

• Duration was found to be an important predictor.
• The amount of the loan was also important.

Logistic Regression

Predictive model

Fig 1.1

The model for estimating whether the applicant possesses good credit (1) or bad credit (0) was built
using logistic regression on Microsoft Azure. Fig 1.1 is a snapshot of the process undertaken, which
included normalizing the data set as well as obtaining descriptive statistics.
The data was divided in a ratio of 60% for training and 40% for testing using the split data function,
and a two-class logistic regression model with binary outcomes was used to train the predictive model,
based on supervised learning using classification.
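
The 60/40 split can be sketched in plain Python as follows. This is an illustration under assumptions, not the Azure module itself: the function name `split_data`, the fixed seed and the use of random shuffling are choices made here, and the 1000 stand-in records are an assumed data set size.

```python
import random


def split_data(rows, train_fraction=0.6, seed=42):
    """Shuffle rows reproducibly and split them into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(1000))   # stand-in for the applicant records (assumed size)
train, test = split_data(records)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so that different models can be compared on the same test set.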

The ROC graph plots sensitivity against 1 - specificity. The area under the curve can be read as a
probability: if we take a random pair of observations, one with Y=1 and one with Y=0, it is the
probability that the observation with Y=1 has the higher predicted probability.
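
The pairwise reading of the ROC curve can be checked directly: the area under the curve equals the fraction of (Y=1, Y=0) pairs in which the Y=1 observation receives the higher score. The sketch below uses made-up labels and scores, and the function name `auc_by_pairs` is hypothetical.

```python
def auc_by_pairs(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correctly ranked pair
    return wins / (len(positives) * len(negatives))


# Made-up example: 3 good-credit and 2 bad-credit applicants.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.2]
auc = auc_by_pairs(labels, scores)
```

Here five of the six possible pairs are ranked correctly, so the area is 5/6. A value of 0.5 would mean the scores are no better than random guessing.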

Fig 1.2

In this case the area under the curve is clearly greater than 50%, indicating that the model predicts
better than random guessing. The area under the curve needs to be compared with that of the other
methods; the method that yields the greatest area under the curve is likely to be the better model.

Fig 1.3

After several iterations, a threshold of 0.33 was found to give the highest accuracy while keeping
other aspects, such as precision and true positives, at reasonable levels. As can be seen from Fig 1.3,
at this threshold we get a sensitivity of 92.7% and a specificity of 35.7%.
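
The effect of the threshold on sensitivity, specificity and accuracy can be sketched as follows. The labels and scores are made up, the function names are hypothetical, and the rule that a score at or above the threshold counts as good credit (1) is an assumption about how the cut-off is applied.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted as 1."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(labels, scores, threshold):
    """Sensitivity, specificity and accuracy at one threshold."""
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Made-up scored test set: 4 good-credit and 3 bad-credit applicants.
labels = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.35, 0.2, 0.5, 0.3, 0.1]

# Try a few candidate thresholds and keep the one with the highest accuracy.
candidates = [0.33, 0.5, 0.7]
best = max(candidates, key=lambda t: summarize(labels, scores, t)["accuracy"])
```

Lowering the threshold classifies more applicants as good credit, which raises sensitivity at the expense of specificity. This is the trade-off visible in the reported figures.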

Fig 1.4

Fig 1.4 is a snapshot of the result of logistic regression with filter-based feature selection.

We observe that the positive and negative labels change from 1 and 0 to 0.654 and -1.527 respectively.
For this reason that model was dropped, and the original model without filter-based feature selection
was used.

Web hosting of evaluated model

Fig 1.5

Fig 1.5 and Fig 1.6 are snapshots that show the hosting of the evaluated model on the Azure cloud. The
details are mentioned above.

Business Implications

The predictive modelling, which allows us to determine the credit rating (good credit or bad credit),
has direct business implications.
1. When banks and non-banking financial institutions loan out money, a credit rating helps
distinguish good creditors from bad creditors based on the 30 variables mentioned above. As a
result the rate of defaulting falls, and the institutions lower their risk and improve
overall performance.
2. From the applicant's perspective, understanding the factors that contribute to the overall credit
rating makes it possible to improve on certain aspects and so enhance the chance of
getting a loan in a time of need.

Classification Tree

Predictive model

The decision tree model creates a binary classifier using a boosted decision tree algorithm. When
properly configured, boosted decision trees are the easiest methods with which to get top performance
on a wide variety of machine learning tasks. For the decision tree, we used the two-class boosted
decision tree function to train on the data set, which is used to determine a person's credit rating
based on several factors. A boosted decision tree is an ensemble in which the second tree corrects for
the errors of the first tree, the third tree corrects for the errors of the first and second trees, and
so forth, so that the output is fine-tuned and more precise. Thus it helps to reduce the credit risk.
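
The idea that each tree corrects the errors of the trees before it can be sketched with a toy example. This is a simplified illustration and not the Azure two-class boosted decision tree. It assumes squared-error residuals, one-split trees (stumps) on a single feature, and a learning rate of 0.5. The data and the function names are made up.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for split in sorted(set(xs))[1:]:   # every split leaves both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x < split]
        right = [r for x, r in zip(xs, residuals) if x >= split]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, split, left_mean, right_mean)
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x < split else right_mean


def boost(xs, ys, rounds, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by the previous ones."""
    stumps = []
    predictions = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: sum(learning_rate * s(x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data: loan duration in months and whether credit was good (1) or bad (0).
durations = [6, 12, 18, 24, 36, 48, 60]
good_credit = [1, 1, 1, 0, 1, 0, 0]

one_tree = boost(durations, good_credit, rounds=1)
many_trees = boost(durations, good_credit, rounds=30)
print(mean_squared_error(one_tree, durations, good_credit))
print(mean_squared_error(many_trees, durations, good_credit))
```

The training error with 30 stumps is lower than with one, because every added stump is fitted to what the earlier ones still get wrong. Training error alone says nothing about overfitting, so a held-out test set remains necessary.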

The graph of true positive rate versus false positive rate shows that the curve for the scored dataset
lies above the threshold line, which signifies that the decision is correct for most of the loan
applicants. The output also shows that about one-fourth of the outputs are wrongly classified, as can
be seen.

Confusion Matrix and Accuracy when the threshold is 0.13

Confusion Matrix

The confusion matrix gives an accuracy of 0.748 and a precision of 0.763 at the threshold of 0.13.
Accuracy and precision are highest at this threshold. Hence about 75% of the outputs were correctly
predicted based on demographics.

Once the prediction results are available, the experiment can be published as a web service, which can
be deployed in various applications and called to obtain class predictions for new applicants.

Web hosting of evaluated model


Business context:
When a bank receives a loan application, it has to decide, based on the applicant's profile, whether to
accept or reject it. The risks associated with the decision are:

1. If the applicant is likely to repay the loan, then not approving the loan results in a loss
of business to the bank.
2. If the applicant is not likely to repay the loan, then approving the loan results in a
financial loss to the bank.

The second risk is the greater of the two, as lending money to a fraudulent party has a larger effect
than not extending the credit. This model would greatly help to evaluate or verify the company's
credit decisions.

According to this model, 299 loan applicants (both accepted and rejected) have been correctly
identified as eligible or not.

According to the model, 79 loan applicants whose loans were approved are at risk of not repaying, and
the bank should recheck their applications.

Meanwhile, 22 applications rejected by the bank had the potential to repay, so the bank loses out on
some amount of profit.

Thus the bank would be at risk of financial loss in the future.
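
The three counts above can be combined into a quick check. The counts come from the report. The two cost weights are purely hypothetical and only illustrate why the second risk weighs more heavily.

```python
correct = 299           # applicants identified correctly (from the report)
risky_approvals = 79    # approved, but at risk of not repaying (from the report)
missed_good = 22        # rejected, but with the potential to repay (from the report)

total = correct + risky_approvals + missed_good
accuracy = correct / total

# Hypothetical cost units: a bad loan is assumed to cost five times as much
# as the business lost by rejecting a good applicant.
COST_BAD_LOAN = 5
COST_LOST_BUSINESS = 1
weighted_cost = risky_approvals * COST_BAD_LOAN + missed_good * COST_LOST_BUSINESS

print(total, accuracy, weighted_cost)
```

The counts add up to 400 applicants, and 299/400 = 0.7475 agrees with the accuracy of 0.748 reported for the classification tree. Under the assumed weights, risky approvals account for most of the cost.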

K-NN Algorithm
Predictive model

K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data (i.e.,
data without defined categories or groups). The goal of this algorithm is to find groups in the data, with
the number of groups represented by the variable K. The algorithm works iteratively to assign each data
point to one of K groups based on the features that are provided. Data points are clustered based on
feature similarity. The results of the K-means clustering algorithm are:

• The centroids of the K clusters, which can be used to label new data
• Labels for the training data (each data point is assigned to a single cluster)

Since our model requires a binary output, K is set to 2.
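
A minimal sketch of the iterative procedure with K=2 follows. It is a plain implementation of the standard K-means steps and not the Azure module. The points are made up, and taking the first K points as the starting centroids is a simplifying assumption made here.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def kmeans(points, k=2, max_iterations=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centroids = [points[i] for i in range(k)]   # naive start: first k points
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Made-up, already normalized two-feature points.
points = [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.9), (0.15, 0.25), (0.85, 0.75)]
centroids, labels = kmeans(points, k=2)

# The centroids can then label a new observation.
new_label = nearest((0.3, 0.3), centroids)
```

The cluster numbers 0 and 1 are arbitrary. Which cluster corresponds to good or bad credit has to be worked out afterwards by inspecting the members of each cluster.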

The output of this model is shown in the above figure.

We can see that cluster one holds approximately 700 observations and cluster zero approximately 300.
This matches the initial information known about the dataset, so we consider the model accurate. It
can be used to find a pattern of customers in each cluster and predict whether they will default on
the loan.

The above figure shows the clustering on the normalized, feature-selected model. The output of the same
can be seen in the figure below.

From this we can see that the allocation to cluster 0 is approximately 550 and to cluster 1
approximately 450. This is a large deviation from the actual nature of the data; hence filter-based
feature selection is omitted in the web hosting.

The above graph shows the 3D distributions of the two clusters on a 2D graph. The size of each cluster
is indicative of the number of observations in it.
Web hosting of evaluated model

The link is shown in the above dashboard screenshot.

Business implications
The clustering exercise is done to identify similar observations, that is, to group the data into
clusters with similar characteristics.

The current model has divided the data into two distinct and exhaustive clusters. By studying the
characteristics of each cluster, the company can categorize its prospective clients or loan applicants
and pre-determine whether they will have a good score or a bad score.

Conclusion
• The project exposed us to real-life applications of the theoretical aspects learned in class.
• Transparency in land dealings can be accomplished using blockchain.
• In the credit-scoring part, the bank can reduce the financial losses associated with giving
loans.
