
K-Means Clustering

Clustering is used to group similar data points together. It is an unsupervised technique: only the input data are given to the algorithm, without any labels.
I plot the clusters in two dimensions using the scatter command; data with more dimensions would need a different plotting command in MATLAB. I run a fixed total of 6 iterations. If the clustering converges before that, the remaining iterations simply repeat the same plot, which is almost always what happens.
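The MATLAB script below always runs all 6 iterations. For comparison, here is a minimal Python sketch (standard library only, Python 3.8+ for `math.dist`) of the same assign/update loop. Unlike the script, it stops as soon as no point changes cluster. The function name `kmeans`, the `max_iter` parameter and the choice to leave an empty cluster's mean unchanged are illustrative assumptions, not part of the original code.

```python
import math
import random


def kmeans(points, means, max_iter=6):
    """Minimal K-means (Lloyd's algorithm) on lists of coordinate tuples.

    Returns (assignments, means, passes), where passes is the number of
    assignment passes run before stopping.
    """
    means = [tuple(m) for m in means]
    assign = [None] * len(points)
    for it in range(1, max_iter + 1):
        # Assignment step: index of the nearest mean by Euclidean distance.
        new_assign = [min(range(len(means)), key=lambda j: math.dist(p, means[j]))
                      for p in points]
        if new_assign == assign:
            return assign, means, it  # no point changed cluster: converged
        assign = new_assign
        # Update step: move each mean to the centroid of its assigned points.
        for j in range(len(means)):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # assumption: an empty cluster keeps its old mean
                means[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assign, means, max_iter


if __name__ == "__main__":
    random.seed(0)
    D, N, K = 2, 20, 3
    # Integer coordinates 0..254, mirroring floor(rand(D,N)*255) in the MATLAB script.
    data = [tuple(random.randrange(255) for _ in range(D)) for _ in range(N)]
    init = [tuple(random.randrange(255) for _ in range(D)) for _ in range(K)]
    labels, centres, passes = kmeans(data, init)
    print(passes, labels)
```

Like MATLAB's `min`, Python's `min` returns the first index on ties, so a point equidistant from two means goes to the lower-numbered cluster.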

Code
clear all;
close all;
clc

[D,N,K] = Input_Data();
% Take the input parameters from the user:
% D is the dimension of the data (2-D or higher)
% N is the total number of data points
% K is the total number of clusters
% Specifying K does not make K-means a supervised method;
% clustering can be supervised only in special cases.
% Randomly generate the data
xn = floor(rand(D,N)*255); % integer coordinates 0..254 (floor drops the fractional part)
u = floor(rand(D,K)*255);  % u holds the K cluster means (centroids), initially placed at random
c = zeros(1,N);            % c(i) is the cluster assigned to point i; 0 means not yet assigned

show_data(xn,K,u,c,1) % figure 1: the user data and the random initial means

for ite = 1:6 % fixed 6 iterations; if the clustering converges earlier,
              % the remaining iterations just repeat the same plot
    % Cluster assignment step
    dist = zeros(1,K);
    for i = 1:N     % loop over all N data points
        for j = 1:K % loop over all K clusters
            % norm computes the Euclidean (2-)norm by default
            dist(j) = norm(xn(:,i) - u(:,j)); % distance between point xn(:,i) and mean u(:,j)
        end
        [~, c(i)] = min(dist); % keep the index of the nearest mean (the argmin), not the distance
    end
    % Once every point is assigned, recompute each mean from its points;
    % repeat until convergence (or until the iterations run out).
    % Update step: compute the new cluster means
    for j = 1:K                 % loop over the K clusters
        c_sum = zeros(D,1);     % running sum of the points in cluster j (D rows, one column)
        n = 0;                  % number of points in cluster j
        for i = 1:N             % loop over all N data points
            if c(i) == j
                c_sum = c_sum + xn(:,i); % numerator of the mean
                n = n + 1;               % denominator of the mean
            end
        end
        if n > 0                % leave an empty cluster's mean unchanged (avoids 0/0 = NaN)
            u(:,j) = c_sum / n; % new mean u_j
        end
    end
    show_data(xn,K,u,c,ite+1) % figures 2 to 7: the clusters after each iteration
end

show_data(xn,K,u,c,ite+2) % figure 8: the final clustered data
Function
function [] = show_data(xn,K,u,c,ite)
% Plot the data points coloured by cluster ('o') and the cluster means ('x').
% scatter is 2-D only, so just the first two dimensions are shown (requires D >= 2).
colors = 'rgbymc';  % red, green, blue, yellow, magenta, cyan
[~,N] = size(xn);   % N = number of data points
figure(ite)         % one figure per call
hold on
for data = 1:N
    if c(data) == 0
        color = 'k';  % black: point not yet assigned to a cluster
    else
        color = colors(mod(c(data)-1, numel(colors)) + 1); % colours repeat if K > 6
    end
    scatter(xn(1,data), xn(2,data), 36, color, 'o') % 36 is the default marker size
end
for points = 1:K
    color = colors(mod(points-1, numel(colors)) + 1);
    scatter(u(1,points), u(2,points), 36, color, 'x') % cluster mean
end
hold off
end
Function
function [D,N,K] = Input_Data()
% Ask the user for the dimension, number of points and number of clusters
D = input('Enter Dimension of Data: ');
N = input('Enter Number of Points: ');
K = input('Enter Number of Clusters: ');

end
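Input_Data accepts any values, but the rest of the script implicitly relies on a few constraints. show_data reads rows 1 and 2 of the data, so D must be at least 2. With more clusters than points, at least K - N clusters must stay empty. A hypothetical Python helper, `validate_inputs`, sketches those checks. The name and the messages are illustrative assumptions; in MATLAB the same tests could go inside Input_Data before it returns.

```python
def validate_inputs(D, N, K):
    """Hypothetical check of the values Input_Data reads; returns a list of problems."""
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in (D, N, K)):
        return ["D, N and K must be whole numbers"]
    problems = []
    if D < 2:
        problems.append("D must be at least 2: show_data plots rows 1 and 2 of the data")
    if N < 1:
        problems.append("N must be at least 1")
    if not 1 <= K <= N:
        problems.append("K must satisfy 1 <= K <= N")
    if K > 6:
        problems.append("K > 6: show_data has only six distinct colours")
    return problems


print(validate_inputs(2, 100, 3))  # []
print(validate_inputs(1, 10, 8))
```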
Plots
[Figure] User input: random data with random centroids.
[Figure] First iteration: Euclidean distances are computed and each point is assigned to the nearest centroid (mean).
[Figure] After the 4th iteration, the clustering starts to converge.
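The convergence visible in the plots can also be checked numerically by tracking the within-cluster sum of squared distances after each iteration. This is not part of the original script. For standard K-means this value never increases from one iteration to the next, so a value that stops changing is a practical sign of convergence. The helper name `within_cluster_sse` and the toy points below are illustrative assumptions.

```python
def within_cluster_sse(points, means, assign):
    """Sum of squared Euclidean distances from each point to its assigned mean."""
    total = 0.0
    for p, a in zip(points, assign):
        total += sum((x - m) ** 2 for x, m in zip(p, means[a]))
    return total


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels = [0, 0, 1, 1]
print(within_cluster_sse(pts, [(0, 0), (10, 10)], labels))      # 2.0 with the initial means
print(within_cluster_sse(pts, [(0, 0.5), (10, 10.5)], labels))  # 1.0 after one update step
```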
