
Data Mining

Assignment No. 1
Title : Data Warehouse Schema Design and ETL
Aim : For an organization of your choice, choose a set of business processes. Design star /
snowflake schemas for analyzing these processes. Create a fact constellation schema by
combining them. Extract data from different data sources, apply suitable transformations,
and load the data into destination tables using an ETL tool. For example, a business
organization with sales, order, and marketing processes.

Theory :
Multidimensional schema:
A multidimensional schema is especially designed to model data warehouse
systems. These schemas address the unique needs of very large databases designed for
analytical purposes (OLAP).
The following are the three chief types of multidimensional schema:
1. Star schema-
The star schema is the simplest type of data warehouse schema. It is known as a star
schema because its structure resembles a star. In the star schema, the center of the star has
one fact table and a number of associated dimension tables. It is also known as a Star Join
Schema and is optimized for querying large data sets.
2. Snowflake schema-
A snowflake schema is an extension of a star schema that adds further levels of
dimension tables. It is called a snowflake schema because its diagram resembles a snowflake.
The dimension tables are normalized, which splits the data into additional tables.
3. Constellation schema-
A galaxy schema contains two or more fact tables that share dimension tables. It is
also called a fact constellation schema. The schema is viewed as a collection of stars, hence
the name galaxy schema.
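
As a concrete illustration, a star schema for a sales process can also be declared directly as
tables. The following is a minimal sketch using SQLite from Python; the table and column
names mirror the DMQL definitions in the program section below and are otherwise
illustrative.

import sqlite3

conn = sqlite3.connect('sales_dw.db')
cur = conn.cursor()

# dimension tables: one per axis of analysis
cur.execute('''CREATE TABLE IF NOT EXISTS dim_time (
    time_key INTEGER PRIMARY KEY, day INTEGER, day_of_week TEXT,
    month INTEGER, quarter INTEGER, year INTEGER)''')
cur.execute('''CREATE TABLE IF NOT EXISTS dim_item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
    type TEXT, supplier_type TEXT)''')
cur.execute('''CREATE TABLE IF NOT EXISTS dim_branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT)''')
cur.execute('''CREATE TABLE IF NOT EXISTS dim_location (
    location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
    province_or_state TEXT, country TEXT)''')

# fact table: a foreign key per dimension plus the numeric measures
cur.execute('''CREATE TABLE IF NOT EXISTS fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    branch_key INTEGER REFERENCES dim_branch(branch_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    dollars_sold REAL, units_sold INTEGER)''')
conn.commit()
conn.close()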

Program: The programs are implemented in DMQL (Data Mining Query Language)


1. Program for Star schema-

define cube sales star [time, item, branch, location]:

dollars sold = sum(sales in dollars), units sold = count(*)

define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)

2. Program for Snowflake schema-


define cube sales snowflake [time, item, branch, location]:

dollars sold = sum(sales in dollars), units sold = count(*)


define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier (supplier key, supplier type))
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city (city key, city, province or state, country))

3. Program for constellation schema-


define cube sales [time, item, branch, location]:

dollars sold = sum(sales in dollars), units sold = count(*)

define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
define cube shipping [time, item, shipper, from location, to location]:

dollars cost = sum(cost in dollars), units shipped = count(*)

define dimension time as time in cube sales


define dimension item as item in cube sales
define dimension shipper as (shipper key, shipper name, location as location in cube sales, shipper type)
define dimension from location as location in cube sales
define dimension to location as location in cube sales
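
The ETL step itself can be prototyped in a few lines of pandas. The sketch below is
illustrative rather than tied to a particular ETL tool: the source file, column names, and
SQLite destination are assumptions. It extracts raw sales records from a CSV source, applies
simple transformations, and loads the result into the fact table.

import pandas as pd
import sqlite3

# Extract: read raw records from a source file (path is illustrative)
raw = pd.read_csv('raw_sales.csv')

# Transform: drop incomplete rows and derive the dollars_sold measure
raw = raw.dropna(subset=['time_key', 'item_key'])
raw['dollars_sold'] = raw['unit_price'] * raw['units_sold']

# Load: append the transformed rows into the destination table
conn = sqlite3.connect('sales_dw.db')
raw[['time_key', 'item_key', 'branch_key', 'location_key',
     'dollars_sold', 'units_sold']].to_sql('fact_sales', conn,
                                           if_exists='append', index=False)
conn.close()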

Conclusion :
Thus, we have designed star, snowflake, and fact constellation schemas and
extracted, transformed, and loaded information using an ETL tool.
Assignment No.2
Title : Clustering
Aim : Consider a suitable dataset. For clustering of data instances in different groups, apply
different clustering techniques (minimum 2). Visualize the clusters using a suitable tool.

Theory :
Cluster analysis or clustering is the task of grouping a set of objects in such a way
that objects in the same group (called a cluster) are more similar (in some sense) to each
other than to those in other groups (clusters). It is a main task of exploratory data mining,
and a common technique for statistical data analysis, used in many fields, including machine
learning, pattern recognition, image analysis, information retrieval, bioinformatics, data
compression, and computer graphics.
K-means clustering:-
1. Clusters the data into k groups where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance
function.
4. Calculate the centroid or mean of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in
consecutive rounds.
Program:-
# importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.cluster import KMeans

# importing the Iris dataset with pandas (path is environment-specific)
dataset = pd.read_csv('../input/Iris.csv')
x = dataset.iloc[:, [1, 2, 3, 4]].values   # the four measurement columns

# Applying k-means to the dataset / creating the k-means classifier
kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300,
                n_init = 10, random_state = 0)
y_kmeans = kmeans.fit_predict(x)

# Visualising the clusters on the first two features (sepal length/width);
# the cluster-to-species labels assume the assignment produced by this seed
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 100, c = 'green',
            label = 'Iris-versicolour')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 100, c = 'red',
            label = 'Iris-setosa')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 100, c = 'blue',
            label = 'Iris-virginica')

# Plotting the centroids of the clusters
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s = 100, c = 'yellow', label = 'Centroids')
plt.legend()
plt.show()
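
Since k must be fixed in advance, a common heuristic for choosing it is the elbow method:
run k-means for a range of k and plot the within-cluster sum of squares (scikit-learn
exposes it as inertia_), looking for the point where the curve flattens. A minimal sketch,
reusing x from the program above:

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0)
    km.fit(x)
    wcss.append(km.inertia_)  # within-cluster sum of squares for this k
plt.plot(range(1, 11), wcss, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('WCSS (inertia)')
plt.show()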
Hierarchical clustering:-
Let X = {x1, x2, x3, ..., xn} be the set of data points.
1) Begin with the disjoint clustering having level L(0) = 0 and sequence number m =
0.
2) Find the least distance pair of clusters in the current clustering, say pair (r), (s),
according to d[(r),(s)] = min d[(i),(j)] where the minimum is over all pairs of clusters
in the current clustering.
3) Increment the sequence number: m = m +1.Merge clusters (r) and (s) into a single
cluster to form the next clustering m. Set the level of this clustering to L(m) = d[(r),
(s)].
4) Update the distance matrix, D, by deleting the rows and columns corresponding to
clusters (r) and (s) and adding a row and column corresponding to the newly formed
cluster. The distance between the new cluster, denoted (r,s) and old cluster(k) is
defined in this way: d[(k), (r,s)] = min (d[(k),(r)], d[(k),(s)]).
5) If all the data points are in one cluster then stop, else repeat from step 2).
Program:-
import sys
import math
import os
import heapq

# Note: load_data, add_heap_entry, evaluate and display below are minimal
# implementations assuming an iris.dat-style input file, so that the program
# runs end to end.
class Hierarchical_Clustering:
    def __init__(self, ipt_data, ipt_k):
        self.input_file_name = ipt_data
        self.k = ipt_k
        self.dataset = None
        self.dataset_size = 0
        self.dimension = 0
        self.heap = []
        self.clusters = {}
        self.gold_standard = {}

    def quit(self, msg):
        print(msg)
        sys.exit(1)

    # Assumed input format: one point per line, comma-separated features with
    # a trailing class label, e.g. "5.1,3.5,1.4,0.2,Iris-setosa".
    def load_data(self, file_name):
        dataset = []
        clusters = {}
        gold_standard = {}
        with open(file_name) as f:
            idx = 0
            for line in f:
                line = line.strip()
                if not line:
                    continue
                fields = line.split(",")
                data = [float(v) for v in fields[:-1]]
                label = fields[-1]
                dataset.append({"data": data, "label": label})
                # every point starts out as its own singleton cluster, keyed
                # by the string form of its index list, e.g. "[0]"
                clusters[str([idx])] = {"centroid": data, "elements": [idx]}
                gold_standard.setdefault(label, set()).add(idx)
                idx += 1
        return dataset, clusters, gold_standard

    def initialize(self):
        if not os.path.isfile(self.input_file_name):
            self.quit("Input file doesn't exist or it's not a file")
        self.dataset, self.clusters, self.gold_standard = \
            self.load_data(self.input_file_name)
        self.dataset_size = len(self.dataset)
        if self.dataset_size == 0:
            self.quit("Input file doesn't include any data")
        if self.k == 0:
            self.quit("k = 0, no cluster will be generated")
        if self.k > self.dataset_size:
            self.quit("k is larger than the number of existing clusters")
        self.dimension = len(self.dataset[0]["data"])
        if self.dimension == 0:
            self.quit("dimension for dataset cannot be zero")

    def euclidean_distance(self, data_point_one, data_point_two):
        size = len(data_point_one)
        result = 0.0
        for i in range(size):
            f1 = float(data_point_one[i])  # feature i of data point one
            f2 = float(data_point_two[i])  # feature i of data point two
            result += pow(f1 - f2, 2)
        return math.sqrt(result)

    def compute_pairwise_distance(self, dataset):
        result = []
        dataset_size = len(dataset)
        for i in range(dataset_size - 1):          # ignore last i
            for j in range(i + 1, dataset_size):   # ignore duplication
                dist = self.euclidean_distance(dataset[i]["data"],
                                               dataset[j]["data"])
                # heap entry layout: (distance, [distance, [[i], [j]]])
                result.append((dist, [dist, [[i], [j]]]))
        return result

    def build_priority_queue(self, distance_list):
        heapq.heapify(distance_list)
        self.heap = distance_list
        return self.heap

    def compute_centroid_two_clusters(self, current_clusters, data_points_index):
        size = len(data_points_index)
        dim = self.dimension
        centroid = [0.0] * dim
        for index in data_points_index:
            dim_data = current_clusters[str(index)]["centroid"]
            for i in range(dim):
                centroid[i] += float(dim_data[i])
        for i in range(dim):
            centroid[i] /= size
        return centroid

    def compute_centroid(self, dataset, data_points_index):
        size = len(data_points_index)
        dim = self.dimension
        centroid = [0.0] * dim
        for idx in data_points_index:
            dim_data = dataset[idx]["data"]
            for i in range(dim):
                centroid[i] += float(dim_data[i])
        for i in range(dim):
            centroid[i] /= size
        return centroid

    # Push the distance from the newly merged cluster to every other cluster.
    def add_heap_entry(self, heap, new_cluster, current_clusters):
        for key in current_clusters:
            dist = self.euclidean_distance(new_cluster["centroid"],
                                           current_clusters[key]["centroid"])
            heapq.heappush(heap, (dist, [dist, [new_cluster["elements"],
                                                current_clusters[key]["elements"]]]))

    def valid_heap_node(self, heap_node, old_clusters):
        pair_data = heap_node[1]
        for old_cluster in old_clusters:
            if old_cluster in pair_data:
                return False
        return True

    def hierarchical_clustering(self):
        dataset = self.dataset
        current_clusters = self.clusters
        old_clusters = []
        heap = self.compute_pairwise_distance(dataset)
        heap = self.build_priority_queue(heap)
        while len(current_clusters) > self.k:
            dist, min_item = heapq.heappop(heap)
            pair_data = min_item[1]
            # skip stale heap entries that mention already-merged clusters
            if not self.valid_heap_node(min_item, old_clusters):
                continue
            new_cluster = {}
            new_cluster_elements = sum(pair_data, [])  # flatten the index pair
            new_cluster_centroid = self.compute_centroid(dataset,
                                                         new_cluster_elements)
            new_cluster_elements.sort()
            new_cluster.setdefault("centroid", new_cluster_centroid)
            new_cluster.setdefault("elements", new_cluster_elements)
            for pair_item in pair_data:
                old_clusters.append(pair_item)
                del current_clusters[str(pair_item)]
            self.add_heap_entry(heap, new_cluster, current_clusters)
            current_clusters[str(new_cluster_elements)] = new_cluster
        return current_clusters

    # Pairwise precision/recall against the gold-standard labels: a pair of
    # points counts as positive when both points land in the same cluster.
    def evaluate(self, current_clusters):
        gold_pairs = set()
        for members in self.gold_standard.values():
            for i in members:
                for j in members:
                    if i < j:
                        gold_pairs.add((i, j))
        found_pairs = set()
        for cluster in current_clusters.values():
            elements = cluster["elements"]
            for a in range(len(elements) - 1):
                for b in range(a + 1, len(elements)):
                    found_pairs.add((elements[a], elements[b]))
        true_positives = len(gold_pairs & found_pairs)
        precision = true_positives / len(found_pairs) if found_pairs else 0.0
        recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
        return precision, recall

    def display(self, current_clusters, precision, recall):
        for key, cluster in current_clusters.items():
            print("cluster %s" % str(cluster["elements"]))
        print("precision = %.4f, recall = %.4f" % (precision, recall))

if __name__ == '__main__':
    ipt_data = sys.argv[1]    # input data file, e.g. iris.dat
    ipt_k = int(sys.argv[2])  # number of clusters, e.g. 3
    hc = Hierarchical_Clustering(ipt_data, ipt_k)
    hc.initialize()
    current_clusters = hc.hierarchical_clustering()
    precision, recall = hc.evaluate(current_clusters)
    hc.display(current_clusters, precision, recall)
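
For visualization, the same agglomerative clustering can be drawn as a dendrogram with
SciPy; a minimal sketch, assuming a numeric feature matrix x such as the Iris features
loaded in the k-means program above:

from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# 'ward' merges the pair of clusters with the smallest variance increase
Z = linkage(x, method='ward')
dendrogram(Z)
plt.xlabel('Data point index')
plt.ylabel('Merge distance')
plt.show()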

Conclusion : Hence, we implemented hierarchical clustering and k-means clustering
algorithms in Python.
Assignment No.3
Title : Apriori algorithm
Aim : Apply the Apriori algorithm to find frequently occurring itemsets in the given data
and generate strong association rules using support and confidence thresholds. For
example: market basket analysis.

Theory :
With the quick growth of e-commerce applications, vast quantities of data accumulate
in months rather than years. Data mining, also known as Knowledge Discovery in
Databases (KDD), is used to find anomalies, correlations, patterns, and trends in such data
and to predict outcomes. The Apriori algorithm is a classical algorithm in data mining. It is
used for mining frequent itemsets and the association rules derived from them, and it is
devised to operate on a database containing a large number of transactions, for instance,
items bought by customers in a store. Two measures drive the algorithm: the support of an
itemset is the fraction of transactions that contain it, and the confidence of a rule X => Y
is support(X ∪ Y) / support(X). A rule is "strong" if both values meet user-given thresholds.
Algorithm:-
1. Scan the transaction database to count the support of every 1-itemset; those meeting
the minimum support form the frequent 1-itemsets L1.
2. Join Lk-1 with itself to generate candidate k-itemsets Ck, pruning any candidate with
an infrequent (k-1)-subset.
3. Scan the database to count the support of each candidate in Ck; the candidates
meeting the minimum support form Lk.
4. Repeat steps 2-3 until no further frequent itemsets are found.
5. For every frequent itemset I and every non-empty proper subset S of I, output the
rule S => (I - S) if its confidence meets the minimum confidence.

Program:-
import sys
from itertools import chain, combinations
from collections import defaultdict
from optparse import OptionParser


def subsets(arr):
    """Returns the non-empty subsets of arr of every length."""
    return chain(*[combinations(arr, i + 1) for i, a in enumerate(arr)])


def returnItemsWithMinSupport(itemSet, transactionList, minSupport, freqSet):
    """Returns the items of itemSet whose support >= minSupport."""
    _itemSet = set()
    localSet = defaultdict(int)
    for item in itemSet:
        for transaction in transactionList:
            if item.issubset(transaction):
                freqSet[item] += 1
                localSet[item] += 1
    for item, count in localSet.items():
        support = float(count) / len(transactionList)
        if support >= minSupport:
            _itemSet.add(item)
    return _itemSet


def joinSet(itemSet, length):
    """Joins a set with itself, keeping only unions of the given length."""
    return set([i.union(j) for i in itemSet for j in itemSet
                if len(i.union(j)) == length])


def getItemSetTransactionList(data_iterator):
    transactionList = list()
    itemSet = set()
    for record in data_iterator:
        transaction = frozenset(record)
        transactionList.append(transaction)
        for item in transaction:
            itemSet.add(frozenset([item]))  # generate 1-itemsets
    return itemSet, transactionList


def runApriori(data_iter, minSupport, minConfidence):
    itemSet, transactionList = getItemSetTransactionList(data_iter)
    freqSet = defaultdict(int)
    largeSet = dict()
    # level 1: the frequent 1-itemsets
    oneCSet = returnItemsWithMinSupport(itemSet, transactionList,
                                        minSupport, freqSet)
    currentLSet = oneCSet
    k = 2
    while currentLSet != set():
        largeSet[k - 1] = currentLSet
        currentLSet = joinSet(currentLSet, k)
        currentCSet = returnItemsWithMinSupport(currentLSet,
                                                transactionList,
                                                minSupport,
                                                freqSet)
        currentLSet = currentCSet
        k = k + 1

    def getSupport(item):
        """Local function which returns the support of an item."""
        return float(freqSet[item]) / len(transactionList)

    toRetItems = []
    for key, value in largeSet.items():
        toRetItems.extend([(tuple(item), getSupport(item))
                           for item in value])
    toRetRules = []
    for key, value in list(largeSet.items())[1:]:  # skip the 1-itemsets
        for item in value:
            _subsets = map(frozenset, [x for x in subsets(item)])
            for element in _subsets:
                remain = item.difference(element)
                if len(remain) > 0:
                    confidence = getSupport(item) / getSupport(element)
                    if confidence >= minConfidence:
                        toRetRules.append(((tuple(element), tuple(remain)),
                                           confidence))
    return toRetItems, toRetRules


def printResults(items, rules):
    for item, support in sorted(items, key=lambda pair: pair[1]):
        print("item: %s , %.3f" % (str(item), support))
    print("\n------------------------ RULES:")
    for rule, confidence in sorted(rules, key=lambda pair: pair[1]):
        pre, post = rule
        print("Rule: %s ==> %s , %.3f" % (str(pre), str(post), confidence))


def dataFromFile(fname):
    """Function which reads from the file and yields a generator."""
    with open(fname) as file_iter:
        for line in file_iter:
            line = line.strip().rstrip(',')
            record = frozenset(line.split(','))
            yield record


if __name__ == "__main__":
    optparser = OptionParser()
    optparser.add_option('-f', '--inputFile',
                         dest='input',
                         help='filename containing csv transactions',
                         default=None)
    optparser.add_option('-s', '--minSupport',
                         dest='minS',
                         help='minimum support value',
                         default=0.15,
                         type='float')
    optparser.add_option('-c', '--minConfidence',
                         dest='minC',
                         help='minimum confidence value',
                         default=0.6,
                         type='float')
    (options, args) = optparser.parse_args()
    if options.input is None:
        print('No dataset filename specified, system will exit\n')
        sys.exit('System will exit')
    inFile = dataFromFile(options.input)
    minSupport = options.minS
    minConfidence = options.minC
    items, rules = runApriori(inFile, minSupport, minConfidence)
    printResults(items, rules)
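
Assuming the listing above is saved as apriori.py, it can be run from the command line on a
CSV file in which each line is one transaction, i.e. a comma-separated list of items (the file
name is illustrative):

python apriori.py -f transactions.csv -s 0.15 -c 0.6

For instance, a transactions.csv containing the three lines

bread,milk,butter
bread,milk
milk,sugar

gives the itemset (bread, milk) a support of 2/3 and the rule bread ==> milk a confidence of
(2/3) / (2/3) = 1.0, so both would be reported at the thresholds above.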

Conclusion : Thus, we implemented the Apriori algorithm in Python for market basket
analysis.
Assignment No.4
Title : Text Classification
Aim : Consider a suitable text dataset. Remove stop words, apply stemming and feature
selection techniques to represent documents as vectors. Classify documents and evaluate
precision, recall.
Theory :
● Feature selection-
Feature selection is the process of selecting a subset of the terms occurring
in the training set and using only this subset as features in text classification. Feature
selection serves two main purposes. First, it makes training and applying a classifier
more efficient by decreasing the size of the effective vocabulary. This is of particular
importance for classifiers that, unlike NB, are expensive to train. Second, feature
selection often increases classification accuracy by eliminating noise features. A
noise feature is one that, when added to the document representation, increases the
classification error on new data. Suppose a rare term, say arachnocentric, has no
information about a class, say China, but all instances of arachnocentric happen to
occur in China documents in our training set. Then the learning method might
produce a classifier that misassigns test documents containing arachnocentric to
China. Such an incorrect generalization from an accidental property of the training
set is called overfitting.

● Stemming-
For grammatical reasons, documents are going to use different forms of a
word, such as organize, organizes, and organizing. Additionally, there are families of
derivationally related words with similar meanings, such as democracy, democratic,
and democratization. In many situations, it seems as if it would be useful for a search
for one of these words to return documents that contain another word in the set.
The goal of both stemming and lemmatization is to reduce inflectional forms
and sometimes derivationally related forms of a word to a common base form. For
instance:
am, are, is -> be
car, cars, car's, cars' -> car

Program:

from sklearn.feature_extraction.text import CountVectorizer


# list of text documents
text = ["The quick brown fox jumped over the lazy dog."]
# create the transform
vectorizer = CountVectorizer()
# tokenize and build vocab
vectorizer.fit(text)
# summarize
print(vectorizer.vocabulary_)
# encode document
vector = vectorizer.transform(text)
# summarize encoded vector
print(vector.shape)
print(type(vector))
print(vector.toarray())

from sklearn.feature_extraction.text import TfidfVectorizer

# list of text documents
text = ["The quick brown fox jumped over the lazy dog.",
"The dog.",
"The fox"]
# create the transform
vectorizer = TfidfVectorizer()
# tokenize and build vocab
vectorizer.fit(text)
# summarize
print(vectorizer.vocabulary_)
print(vectorizer.idf_)
# encode document
vector = vectorizer.transform([text[0]])
# summarize encoded vector
print(vector.shape)
print(vector.toarray())

from sklearn.feature_extraction.text import HashingVectorizer


# list of text documents
text = ["The quick brown fox jumped over the lazy dog."]
# create the transform
vectorizer = HashingVectorizer(n_features=20)
# encode document
vector = vectorizer.transform(text)
# summarize encoded vector
print(vector.shape)
print(vector.toarray())
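
The programs above only build document vectors; the aim also calls for stop-word removal,
stemming, and an evaluation of precision and recall. The following is a minimal end-to-end
sketch under stated assumptions: it uses NLTK's stop-word list and Porter stemmer
(nltk.download('stopwords') may be required on first use) and the 20 Newsgroups dataset
bundled with scikit-learn; the two categories chosen are illustrative.

import nltk  # run nltk.download('stopwords') on first use
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import precision_score, recall_score

stemmer = PorterStemmer()
stop_words = set(stopwords.words('english'))

def preprocess(doc):
    # drop stop words and non-alphabetic tokens, then stem what remains
    tokens = [w for w in doc.lower().split()
              if w.isalpha() and w not in stop_words]
    return ' '.join(stemmer.stem(w) for w in tokens)

# a two-class subset of 20 Newsgroups (category choice is illustrative)
cats = ['sci.space', 'rec.autos']
train = fetch_20newsgroups(subset='train', categories=cats)
test = fetch_20newsgroups(subset='test', categories=cats)

# max_features keeps only the most frequent terms: a simple feature selection
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform([preprocess(d) for d in train.data])
X_test = vectorizer.transform([preprocess(d) for d in test.data])

clf = MultinomialNB().fit(X_train, train.target)
pred = clf.predict(X_test)
print('precision:', precision_score(test.target, pred))
print('recall:', recall_score(test.target, pred))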

Conclusion :
Thus, we have performed this assignment using various stemming and feature
selection techniques.
Distributed Systems
Assignment No. 5
Title : Ricart-Agrawala Algorithm
Aim : Implement the Ricart-Agrawala mutual exclusion algorithm for a prototype
distributed system.
Theory : The Ricart-Agrawala algorithm is an algorithm for mutual exclusion on a
distributed system. It is an extension and optimization of Lamport's distributed mutual
exclusion algorithm, merging Lamport's immediate acknowledgement (ack) and release
messages into a single reply: a site replies to a request at once only if that request has
priority over its own, and otherwise defers the reply until it leaves the critical section.
Requests are ordered by Lamport timestamps, with process ids breaking ties, so each
critical section entry costs 2(N - 1) messages for N sites. It was developed by Glenn Ricart
and Ashok Agrawala.

Program :
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.net.Socket;
import java.sql.Timestamp;
import java.util.List;

// MessageSender, DeferredRequests, Message, CriticalSectionRequests and TimeStamp
// are helper classes of the project, assumed to be defined elsewhere, as are the
// accessors isRequestingCS()/setRequestingCS() and sendReplyToDeferredRequests().
public class RicartAgrawala {
    private static boolean requestingCS;
    private static Timestamp myTimeStamp;
    MessageSender msgSender;
    Timestamp guestTimestamp;
    static int permitreplies;
    static int cscount;
    static int noofrequests;
    static int id;

    public static Timestamp getMyTimeStamp() {
        return myTimeStamp;
    }

    public static void setMyTimeStamp(Timestamp myTimeStamp) {
        RicartAgrawala.myTimeStamp = myTimeStamp;
    }

    // Handles an incoming CS request message from the site guestid.
    public synchronized void determineCriticalSectionEntry(Socket s, List message,
                                                           int id, int guestid) {
        this.id = id;
        this.guestTimestamp = Timestamp.valueOf((String) message.get(1));
        // If I am also requesting the CS, compare timestamps to decide whether
        // to permit the guest or defer the reply.
        if (RicartAgrawala.isRequestingCS()) {
            msgSender = new MessageSender();
            if (myTimeStamp.compareTo(guestTimestamp) > 0) {
                // the guest's request is older, so it has priority: permit it
                msgSender.sendMessage(s, id, Message.PERMIT);
            } else if (myTimeStamp.compareTo(guestTimestamp) < 0) {
                // my own request has priority: defer the reply until I exit
                DeferredRequests.add(guestid, s);
                System.out.println("<<<<< Request deferred for " + guestid);
            } else if (guestid < id) {
                // equal timestamps: break the tie with process ids
                msgSender.sendMessage(s, id, Message.PERMIT);
            } else {
                DeferredRequests.add(guestid, s);
            }
        } else {
            // I am not competing for the CS, so always permit
            msgSender = new MessageSender();
            msgSender.sendMessage(s, id, Message.PERMIT);
        }
    }

    // Entered once PERMIT replies have arrived from every other site.
    public void enterCriticalSection() {
        System.out.println("******** Entered critical section!!!!!! ********* " + ++cscount);
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        writeToFile();
        exitCriticalSection();
    }

    public static void exitCriticalSection() {
        CriticalSectionRequests csr = new CriticalSectionRequests();
        RicartAgrawala.setRequestingCS(false);
        TimeStamp.setInstancetoNull();
        sendReplyToDeferredRequests(); // permit every request deferred above
        Long timeElapsed = TimeStamp.getEndTime() - TimeStamp.getStartTime();
        System.out.println("Time exited " + TimeStamp.getTime());
        ++noofrequests;
        if (noofrequests < 20) {
            csr.sendCSRequests(id); // issue the next CS request
        }
    }

    public void writeToFile() {
        try {
            File file = new File("/home/rohit/Desktop/ricartoutput.txt");
            if (!file.exists()) {
                file.createNewFile();
            }
            FileWriter fw = new FileWriter(file.getAbsoluteFile(), true);
            BufferedWriter bw = new BufferedWriter(fw);
            synchronized (this) {
                bw.write("Node no. " + id + " Entered critical section..." + "\n");
            }
            bw.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Conclusion : Hence, we implemented the Ricart-Agrawala mutual exclusion algorithm for a
prototype distributed system.
Assignment No.6
Title : RPC / RMI mechanism
Aim : Design and implement client server application using RPC/ RMI mechanism (Java)
Theory :
RPC and RMI are mechanisms that enable a client to invoke a procedure or method on a
server by establishing communication between the client and the server. The key difference
between RPC and RMI is that RPC supports only procedural programming, whereas RMI
supports object-oriented programming.
Program :
Server
import java.rmi.*;
public interface ChatInterface extends Remote{
public String getName() throws RemoteException;
public void send(String msg) throws RemoteException;
public void setClient(ChatInterface c)throws RemoteException;
public ChatInterface getClient() throws RemoteException; }
import java.rmi.*;
import java.rmi.server.*;
public class Chat extends UnicastRemoteObject implements ChatInterface {
public String name;
public ChatInterface client=null;
public Chat(String n) throws RemoteException {
this.name=n; }
public String getName() throws RemoteException {
return this.name;}
public void setClient(ChatInterface c){
client=c;}
public ChatInterface getClient(){
return client;}
public void send(String s) throws RemoteException{
System.out.println(s);} }
import java.rmi.*;
import java.rmi.server.*;
import java.util.*;
public class ChatServer {
public static void main (String[] argv) {
try {
System.setSecurityManager(new RMISecurityManager());
Scanner s=new Scanner(System.in);
System.out.println("Enter Your name and press Enter:");
String name=s.nextLine().trim();
Chat server = new Chat(name);
Naming.rebind("rmi://localhost/ABC", server);
System.out.println("[System] Chat Remote Object is ready:");
while(true){
String msg=s.nextLine().trim();
if (server.getClient()!=null){
ChatInterface client=server.getClient();
msg="["+server.getName()+"] "+msg;
client.send(msg); } }
}catch (Exception e) {
System.out.println("[System] Server failed: " + e); } } }

Client
import java.rmi.*;
import java.rmi.server.*;
import java.util.*;
public class ChatClient {
public static void main (String[] argv) {
try {
System.setSecurityManager(new RMISecurityManager());
Scanner s=new Scanner(System.in);
System.out.println("Enter Your name and press Enter:");
String name=s.nextLine().trim();
ChatInterface client = new Chat(name);
ChatInterface server =
(ChatInterface)Naming.lookup("rmi://localhost/ABC");
String msg="["+client.getName()+"] got connected";
server.send(msg);
System.out.println("[System] Chat Remote ready:");
server.setClient(client);
while(true){
msg=s.nextLine().trim();
msg="["+client.getName()+"] "+msg;
server.send(msg); }
}catch (Exception e) {
System.out.println("[System] Server failed: " + e); } }

Conclusion : Therefore, we designed and implemented a client-server application using the
RPC/RMI mechanism in Java.
Assignment No.7
Title : Lamport's Logical Clock
Aim : Design and implement a clock synchronization algorithm for a prototype DS.

Theory :
Lamport clocks are a simple technique used for determining the order of events in a
distributed system. First proposed by Leslie Lamport, a Lamport clock maintains the order
of operations by incrementing a counter contained in the events. Each process keeps a
counter C and follows three rules: it increments C before each local event; it attaches C to
every message it sends; and on receiving a message carrying timestamp T, it sets
C = max(C, T) + 1. In this way Lamport clocks provide a partial ordering of events,
specifically the "happened-before" ordering.
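
The three rules can be captured in a few lines of Python before the full Java program below
(a minimal sketch; the class and method names are illustrative):

class LamportClock:
    def __init__(self):
        self.counter = 0

    def local_event(self):        # rule 1: tick before each local event
        self.counter += 1
        return self.counter

    def send(self):               # rule 2: tick and timestamp the message
        self.counter += 1
        return self.counter

    def receive(self, msg_time):  # rule 3: jump past the sender's clock
        self.counter = max(self.counter, msg_time) + 1
        return self.counter

a, b = LamportClock(), LamportClock()
t = a.send()         # a's clock becomes 1; the message carries timestamp 1
print(b.receive(t))  # b's clock becomes max(0, 1) + 1 = 2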
Program:
import java.util.*;

public class Lamport {


class Message {
int timestamp;
int from;
int to;
public Message(int timestamp, int from, int to) {
this.timestamp = timestamp;
this.from = from;
this.to = to;
}
public int getTimestamp() {
return timestamp;
}
public int getFrom() {
return from;
}
public int getTo() {
return to;
}
}

class MessageMonitor {
Queue< Message> message_queue;
public MessageMonitor() {
message_queue = new LinkedList();
}
Message getMessage(int to) {
System.out.println(message_queue.size());
for (Message tmp : message_queue) {
System.out.println(tmp.getTo());
if (tmp.getTo() == to) {
System.out.println("Recieved by" + to + " from " + tmp.from);
message_queue.remove(tmp);
return tmp;
}
}
return null;
}
Queue< Message> getInstanse() {
return message_queue;
}
void sendMessage(Message msg) {
System.out.println("Added");
message_queue.add(msg);
}
}
class Process extends Thread {
int noevent;
int timestamp;
int processno;
int incomingtimestamp;
int internalcounter;
MessageMonitor mm;
ArrayList<event> einfo;
Queue< Message> lock;
ArrayList<Integer> ans;
public Process(int a, ArrayList<event> b, int c, MessageMonitor d, Queue<Message> l) {
this.noevent = a;
this.einfo = b;
this.processno = c;
timestamp = 0;
this.mm = d;
internalcounter = 1;
this.lock = l;
ans = new ArrayList<>();
}
@Override
public void run() {
while (internalcounter <= noevent) {
boolean stop = false;
for (int i = 0; i < einfo.size(); i++) {
if (einfo.get(i).pend == processno && einfo.get(i).end == internalcounter) {
stop = true;
break;
}
}
if (stop) {
Message tmp;
synchronized (lock) {
while (true) {
tmp = mm.getMessage(processno);
if (tmp == null) {
try {
System.out.println("Thread no " + processno + "will wait");
lock.wait();
System.out.println("Thread no " + processno + "woke");
} catch (InterruptedException ex) {
System.out.println(ex.getMessage());
}
} else {
break;
}
}
}
if (tmp.getTimestamp() > timestamp + 1) {
timestamp = tmp.getTimestamp();
} else {
timestamp += 1;
}
} else {
timestamp += 1;
}
System.out.println("Process " + processno + " " + internalcounter + " ->" +
timestamp);
for (int i = 0; i < einfo.size(); i++) {
if (einfo.get(i).pstart == processno && einfo.get(i).start == internalcounter) {
System.out.println("Sending message to Thread no to " + einfo.get(i).pend + "
by " + einfo.get(i).pstart);
synchronized (lock) {
mm.sendMessage(new Message(timestamp + 1, processno, einfo.get(i).pend));
lock.notifyAll();
}
}
}
internalcounter++;
ans.add(timestamp);
}
printTimestamp();
}
void printTimestamp() {
System.out.println("Thread no : " + processno + " Completed" + " [ TimeStamp " +
Arrays.toString(ans.toArray()) + " ]");
}
}
class event {
int pstart;
int pend;
int start;
int end;
public event(int pstart, int start, int pend, int end) {
this.pstart = pstart;
this.pend = pend;
this.start = start;
this.end = end;
}
}
int nodes;
int eevent;
ArrayList<Integer> size;
ArrayList<event> einfo;
int[][] store;
public static void main(String[] args) {
new Lamport().runner();
}
void runner() {
Scanner sc = new Scanner(System.in);
System.out.println("Enter number nodes : ");
nodes = sc.nextInt();
size = new ArrayList<>();
store = new int[10][10];
System.out.println("Enter each node size : ");
for (int i = 0; i < nodes; i++) {
size.add(sc.nextInt());
}
System.out.println("Enter number of external event : ");
eevent = sc.nextInt();
einfo = new ArrayList<>();
for (int i = 0; i < eevent; i++) {
System.out.print("P(a) event || P(b) event : ");
einfo.add(new event(sc.nextInt(), sc.nextInt(), sc.nextInt(), sc.nextInt()));
}
processthread();
}
void processthread() {
MessageMonitor mm = new MessageMonitor();
Queue< Message> ins = mm.getInstanse();
for (int x = 1; x <= nodes; x++) {
Process tmp = new Process(size.get(x - 1), einfo, x, mm, ins);
Thread a = new Thread(tmp);
a.start();
}
}
void process() {
for (int x = 0; x < nodes; x++) {
for (int y = 0; y < size.get(x); y++) {
if (y == 0) {
store[x][y] = 1;
} else {
store[x][y] = store[x][y - 1] + 1;
}
}
}
System.out.println("");
//printall();
solve();
printall();
}
void printall() {
for (int x = 0; x < nodes; x++) {
System.out.print("Node " + x + " : ");

for (int y = 0; y < size.get(x); y++) {


System.out.print(store[x][y] + " ");
}
System.out.println("");
}
}

void solve() {
for (int count = 0; count < einfo.size(); count++) {
int endbefore = einfo.get(count).end - 2;
if (einfo.get(count).end - 2 < 0) {
endbefore = 0;
}
//System.out.println((store[einfo.get(count).pstart - 1][einfo.get(count).start - 1] + 1) + " " + (store[einfo.get(count).pend - 1][endbefore] + 1));
if (store[einfo.get(count).pstart - 1][einfo.get(count).start - 1] + 1 >= store[einfo.get(count).pend - 1][endbefore] + 1) {
store[einfo.get(count).pend - 1][einfo.get(count).end - 1] = store[einfo.get(count).pstart - 1][einfo.get(count).start - 1] + 1;
for (int i = einfo.get(count).end; i < size.get(einfo.get(count).pend - 1); i++) {
store[einfo.get(count).pend - 1][i] = store[einfo.get(count).pend - 1][i - 1] + 1;
}
}
}
}
}
Conclusion : Therefore, we designed and implemented a clock synchronization algorithm
for a prototype DS.
Assignment No.8
Title : Bully Algorithm
Aim : Implement Ring or Bully election algorithm for prototype DS.
Theory :
In distributed computing, the bully algorithm is a method for dynamically electing a
coordinator or leader from a group of distributed computer processes. The process with the
highest process ID number from amongst the non-failed processes is selected as the
coordinator.
Algorithm
When a process P recovers from failure, or the failure detector indicates that the current
coordinator has failed, P performs the following actions:
1. If P has the highest process id, it sends a Victory message to all other processes and
becomes the new Coordinator. Otherwise, P broadcasts an Election message to all
other processes with higher process IDs than itself.
2. If P receives no Answer after sending an Election message, then it broadcasts a
Victory message to all other processes and becomes the Coordinator.
3. If P receives an Answer from a process with a higher ID, it sends no further
messages for this election and waits for a Victory message. (If there is no Victory
message after a period of time, it restarts the process at the beginning.)
4. If P receives an Election message from another process with a lower ID it sends an
Answer message back and starts the election process at the beginning, by sending
an Election message to higher-numbered processes.
5. If P receives a Coordinator message, it treats the sender as the coordinator.

Program :
import java.io.IOException;
import java.util.Scanner;

class Anele {
    static int n;
    static int pro[] = new int[100];
    static int sta[] = new int[100];
    static int co;

    public static void main(String args[]) throws IOException {
        System.out.println("Enter the number of process");
        Scanner in = new Scanner(System.in);
        n = in.nextInt();
        for (int i = 0; i < n; i++) {
            System.out.println("For process " + (i + 1) + ":");
            System.out.println("Status:");        // 1 = up, 0 = failed
            sta[i] = in.nextInt();
            System.out.println("Priority");
            pro[i] = in.nextInt();
        }
        System.out.println("Which process will initiate election?");
        int ele = in.nextInt();
        elect(ele);
        System.out.println("Final coordinator is " + co);
    }

    static void elect(int ele) {
        ele = ele - 1;
        co = ele + 1;  // assume the initiator wins until someone higher answers
        for (int i = 0; i < n; i++) {
            if (pro[ele] < pro[i]) {
                System.out.println("Election message is sent from " + (ele + 1) + " to " + (i + 1));
                if (sta[i] == 1)
                    elect(i + 1);  // a live higher-priority process takes over
            }
        }
    }
}

/* output
Enter the number of process 7
For process 1:
Status: 1
Priority 1
For process 2:
Status: 1
Priority 2
For process 3:
Status: 1
Priority 3
For process 4:
Status: 1
Priority 4
For process 5:
Status: 1
Priority 5
For process 6:
Status: 1
Priority 6
For process 7:
Status: 0
Priority 7
Which process will initiate election? 4
Election message is sent from 4 to 5
Election message is sent from 5 to 6
Election message is sent from 6 to 7
Election message is sent from 5 to 7
Election message is sent from 4 to 6
Election message is sent from 6 to 7
Election message is sent from 4 to 7
Final coordinator is 6
*/

Conclusion : Therefore, we implemented the Ring or Bully election algorithm for a
prototype DS.
