Available ONLINE www.vsrdjournals.com

VSRD-IJCSIT, Vol. 2 (4), 2012, 285-295

RESEARCH ARTICLE

Data Centric Knowledge Management System Using Post-Clustering Technique

Asadi Srinivasulu1*, Ch.D.V. Subba Rao2 and M. Sreedevi3

ABSTRACT
The purpose of the Data Centric Knowledge Management System (DCKMS) is to centralize the knowledge generated
by employees working within and across functional areas, and to organize that knowledge so that it can be easily
accessed, searched, browsed and navigated. It is a one-stop shop for finding solutions to problems. It provides a
facility for employees to register themselves as experts as well as to search for other experts in case of any
problem or requirement in their project. The system design is modularized into various categories. The system has
an enriched UI so that a novice user does not face any operational difficulties. The system mainly concentrates on
designing the various reports requested by users as well as by higher-level management, with options to export them
to Excel. This paper addresses the expectations, organizational implications, and information processing
requirements of the emerging knowledge management paradigm. A brief discussion of the enablement of the individual
through the widespread availability of computer and communication facilities is followed by a description of the
structural evolution of organizations and the architecture of a computer-based knowledge management system. The
authors discuss two trends that are driven by the treatment of information and knowledge as a commodity: increased
concern for the management and exploitation of knowledge within organizations, and the creation of an
organizational environment that facilitates the acquisition, sharing and application of knowledge.
Keywords : Data, Data-Centric, Data Mart, Data Portal, Data Warehouse, Enabled Individual, Information,
Information-Centric, Information Management, Knowledge, Knowledge Management, Ontology, Organizational Structure,
Clustering, Data Mining, Fuzzy C-Means Clustering Algorithm, K-Means Clustering Algorithm.

1. INTRODUCTION
The Data Centric Knowledge Management System is a web-based application that allows employees of a company to
share their knowledge with others in the company. It also allows them to search for knowledge assets when needed.
It provides a facility for employees to register themselves as experts as well as to search for other experts in
case of any problem or requirement in their project. It is a one-stop shop for finding solutions to their problems.

As information technology begins to permeate all aspects of life and the economy turns decidedly
information-centric, wealth is increasingly defined in terms of information-related products and the availability
of knowledge. Under these conditions, employment, whether self-employment or organizational employment, is
becoming singularly focused on the skills and capabilities of the individual. In other words, knowledge has become
a commodity whose value far exceeds that of the manufactured products that represented the yardstick of wealth
during the industrial age. How this new form of human wealth should be effectively utilized and nurtured in
commercial and government organizations has in recent times become a major preoccupation of management. Two
parallel and related trends have emerged. The first trend is related to the management and exploitation of
knowledge. The question being asked is: how can we capture and utilize the potentially available knowledge for
the benefit of the organization? The phrase "potentially available" is appropriate, because much of the knowledge
is hidden in an overwhelming volume of computer-based data. What is not commonly understood is that the
overwhelming nature of the stored data is due to current processing methods rather than volume. These processing
methods have to rely largely on manual work because only the human user can provide the necessary context for
interpreting the computer-stored data as information and knowledge. If it were possible to capture information
(i.e., data with relationships), rather than data, at the point of entry into the computer, then there would be
sufficient context for computer software to process the information automatically into knowledge. This is not just
a desirable

1,3 Associate Professor, 2 Professor, 1,2,3 Department of Information Technology, Sree Vidyanikethan Engineering
College, Tirupathi, Andhra Pradesh, INDIA. *Correspondence : srinu_asadi@yahoo.com

Asadi Srinivasulu et al / VSRD International Journal of CS & IT Vol. 2 (4), 2012

2. RELATED WORK
The main purpose of the functional requirements within the requirements specification document is to define all
the activities or operations that take place in the system. These requirements are derived through interactions
with the users of the system. Since the requirements specification is a comprehensive document containing a lot of
data, it has been broken down into different chapters in this report: the design of the system in UML is presented
in a separate chapter, and the Data Dictionary is presented in the Appendix. The general functional requirements
arrived at after the interaction with the users are listed below; a more detailed discussion is given in the
chapters on the analysis and design of the system.

The administrator of this system can add a new employee or delete an existing employee, and can view all the
existing users of the system.

The administrator can create and delete user logins for different employees.

The administrator can view different reports (My Submission report, Ratings report, Document Status report, etc.).

A K-User/K-Team Member/Reviewer can search for a document based on criteria such as author or technology.

A K-User/ K-Team Member/Reviewer can download a document.

A K-User/ K-Team Member/Reviewer can rate a document.

A K-User/ K-Team Member/Reviewer can submit a document.

A K-User/ K-Team Member/Reviewer can register as an expert.

A K-User/ K-Team Member/Reviewer can search for an expert.

A K-Team Member/Reviewer can evaluate submitted documents for initial screening.

A K-Team Member can manage the reviewers list.

A K-Team Member can assign a document to a particular reviewer.

A Reviewer can view the list of documents forwarded to him.

A Reviewer can publish or reject a document.

Fig. 1 : Context Level Diagram
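The role-based access implied by the requirements above can be expressed as a permission table. The following is a minimal sketch, assuming one set of allowed actions per role; the `Role` values follow the roles named in the list, but every action name (e.g. `assign_reviewer`) is a hypothetical label, not an identifier from the authors' system.

```python
from enum import Enum, auto


class Role(Enum):
    ADMINISTRATOR = auto()
    K_USER = auto()
    K_TEAM_MEMBER = auto()
    REVIEWER = auto()


# Actions every K-User, K-Team Member and Reviewer may perform.
# The action names are hypothetical labels for the requirements listed above.
COMMON_ACTIONS = frozenset({
    "search_document", "download_document", "rate_document",
    "submit_document", "register_expert", "search_expert",
})

PERMISSIONS = {
    Role.ADMINISTRATOR: frozenset({
        "add_employee", "delete_employee", "view_users",
        "create_login", "delete_login", "view_reports",
    }),
    Role.K_USER: COMMON_ACTIONS,
    # K-Team Members additionally screen documents and manage reviewers.
    Role.K_TEAM_MEMBER: COMMON_ACTIONS | {
        "screen_document", "manage_reviewers", "assign_reviewer",
    },
    # Reviewers additionally see forwarded documents and decide on them.
    Role.REVIEWER: COMMON_ACTIONS | {
        "screen_document", "view_forwarded_documents",
        "publish_document", "reject_document",
    },
}


def is_allowed(role: Role, action: str) -> bool:
    """Return True if users with `role` may perform `action`."""
    return action in PERMISSIONS[role]


print(is_allowed(Role.REVIEWER, "publish_document"))  # True
print(is_allowed(Role.K_USER, "assign_reviewer"))     # False
```

Keeping the permissions in one table means a new role or action only changes data, not the checking logic.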

3. EXISTING ALGORITHM
In the existing system, the company maintains all knowledge-based documents on a separate system that is
accessible to all employees through the LAN; employees can post their new documents to it and access earlier
documents. Searching for related documents based on author, technology, etc. is a time-consuming process.
Managing the documents by category and restricting access to them based on the user type becomes complicated.
This system does not prevent unnecessary documents from being posted.
DRAWBACKS:

Difficulty in maintaining security levels for the documents.

Difficulty in browsing, navigating and searching for required document.

Difficulty in giving ratings for the documents.

Information stored in this manner is vulnerable to damage.

Difficulty in preventing employees from updating the documents.

Difficulty in generating different reports.
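The slow search by author or technology described above is the kind of lookup that a metadata index makes cheap. The sketch below is a hypothetical in-memory index, assuming each document carries `author` and `technology` fields; the class names and sample documents are illustrative only, and a deployed system would more likely use indexed database columns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: int
    title: str
    author: str
    technology: str


class DocumentIndex:
    """Hypothetical in-memory index over document metadata."""

    def __init__(self) -> None:
        # (field, lower-cased value) -> ids of documents with that value
        self._index: dict[tuple[str, str], set[int]] = defaultdict(set)
        self._docs: dict[int, Document] = {}

    def add(self, doc: Document) -> None:
        self._docs[doc.doc_id] = doc
        self._index[("author", doc.author.lower())].add(doc.doc_id)
        self._index[("technology", doc.technology.lower())].add(doc.doc_id)

    def search(self, **criteria: str) -> list[Document]:
        """Return documents matching ALL criteria, e.g. author="Ravi".

        Only `author` and `technology` are indexed; other fields match nothing.
        """
        if not criteria:
            return []
        id_sets = [self._index.get((field, value.lower()), set())
                   for field, value in criteria.items()]
        matching = set.intersection(*id_sets)
        return [self._docs[i] for i in sorted(matching)]


# Hypothetical sample documents.
idx = DocumentIndex()
idx.add(Document(1, "JDBC tips", "Ravi", "Java"))
idx.add(Document(2, "Servlet basics", "Ravi", "Java"))
idx.add(Document(3, "Query tuning", "Anu", "MySQL"))
print([d.doc_id for d in idx.search(author="ravi", technology="java")])  # [1, 2]
```

Each lookup touches only the matching id sets instead of scanning every stored file, which is the difference between the existing and the proposed search.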

4. PROPOSED SYSTEM
The proposed system is fully computerized and is designed to remove the drawbacks of the existing system listed
above. It allows different employees of the company to upload their knowledge documents, which are verified by
next-level users to filter out unnecessary documents. It also allows employees to search for knowledge assets
easily when needed. It provides a facility for employees to register themselves as experts as well as to search
for other experts in case of any problem or requirement in their project. It also provides a facility for the
evaluator to rate the documents posted by the employees.
ADVANTAGES:

It provides a facility to share knowledge documents across the company.

It allows the employees to upload and download documents from their systems.

It makes browsing, navigating and searching for required documents easy.

It provides a facility to prevent unnecessary documents from being posted.

It provides a flexible way of generating different reports.

By following the new approach, information can be accessed from anywhere with just a mouse click. This saves
users a lot of time and provides them with up-to-date information. A centralized database helps in avoiding
conflicts.

This project provides a rich user interface (look and feel) so that users can access information with the least
effort.

It allows documents to be rated at different levels.

It allows publishing or rejecting the documents.
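The verification and rating process described for the proposed system can be read as a small state machine. This is a minimal sketch under stated assumptions: the states (submitted, screened, assigned, published, rejected) and the allowed transitions are one hypothetical reading of the text, and the 1 to 5 rating scale is likewise assumed rather than taken from the paper.

```python
from enum import Enum
from typing import List, Optional


class Status(Enum):
    SUBMITTED = "submitted"
    SCREENED = "screened"
    ASSIGNED = "assigned"
    PUBLISHED = "published"
    REJECTED = "rejected"


# Hypothetical transitions: screening may reject early; the assigned
# reviewer makes the final publish/reject decision.
TRANSITIONS = {
    Status.SUBMITTED: {Status.SCREENED, Status.REJECTED},
    Status.SCREENED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.PUBLISHED, Status.REJECTED},
    Status.PUBLISHED: set(),
    Status.REJECTED: set(),
}


class KnowledgeDocument:
    def __init__(self, title: str) -> None:
        self.title = title
        self.status = Status.SUBMITTED
        self.reviewer: Optional[str] = None
        self.ratings: List[int] = []

    def move_to(self, new_status: Status) -> None:
        """Change status, refusing transitions the workflow does not allow."""
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status

    def assign(self, reviewer: str) -> None:
        self.move_to(Status.ASSIGNED)
        self.reviewer = reviewer

    def rate(self, score: int) -> None:
        if not 1 <= score <= 5:  # assumed rating scale
            raise ValueError("score must be between 1 and 5")
        self.ratings.append(score)

    def average_rating(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


doc = KnowledgeDocument("Hypothetical JDBC guide")
doc.move_to(Status.SCREENED)
doc.assign("reviewer_a")
doc.move_to(Status.PUBLISHED)
doc.rate(4)
doc.rate(5)
print(doc.status.value, doc.average_rating())  # published 4.5
```

Rejecting illegal transitions in one place is what keeps unscreened documents from being published directly.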

4.1. K-MEANS ALGORITHM


Step 1) Take the first K feature vectors as the initial centers.
Step 2) Assign each sample vector to the cluster whose center is nearest (minimum-distance assignment principle).
Step 3) Compute the mean of each cluster as its new center.
Step 4) If any center has changed, go to Step 2; otherwise, terminate.
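The four steps above map directly onto a short loop. The following is a plain-Python sketch of that procedure, assuming Euclidean distance; the `max_iter` safety cap and the rule that an emptied cluster keeps its previous center are additions not stated in the steps.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _assign(samples: Sequence[Vector], centers: List[Vector]) -> List[int]:
    """Step 2: give each sample the index of its nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))
            for x in samples]


def kmeans(samples: Sequence[Vector], k: int,
           max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    if not 0 < k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    # Step 1: the first k feature vectors are the initial centers.
    centers: List[Vector] = [tuple(float(v) for v in s) for s in samples[:k]]
    for _ in range(max_iter):
        labels = _assign(samples, centers)
        # Step 3: the mean of each cluster becomes its new center.
        new_centers: List[Vector] = []
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an emptied cluster's center
        # Step 4: stop as soon as no center has changed.
        if new_centers == centers:
            break
        centers = new_centers
    else:
        # max_iter reached: re-assign so labels match the final centers.
        labels = _assign(samples, centers)
    return centers, labels


centers, labels = kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], k=2)
print(centers, labels)  # [(0.0, 0.5), (10.0, 10.5)] [0, 1, 0, 1]
```

Because Step 1 uses the first K vectors, reordering the input can change the result; this is the initialization sensitivity the text notes below.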

4.2. K-MEANS

Fig. 5 : Applying Clustering Technique Similarity Weight and Filter Method

Fig. 6 : Results of Clustering Showing Groups Divided Into Clusters

Fig. 7 : Initialization and Input

Fig. 8 : Final EMST Edges Path

Fig. 1 : Graph for K-Means


K-means is one of the simplest unsupervised learning algorithms that solve the well-known clustering problem. It
is a popular clustering method that uses prototypes (centroids) to represent clusters by minimizing within-cluster
errors. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully,
because different initial locations lead to different results. The next step is to take each point belonging to
the data set and associate it with the nearest centroid. Once every point has been assigned, k new centroids are
recomputed as the means of the resulting clusters. With these k new centroids, a new binding is made between the
same data set points and the nearest new centroid, and the two steps repeat until the centroids no longer move.
In this way, the algorithm aims at minimizing an objective function.
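The objective function is the within-cluster sum of squared errors:

$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^{2}$$

where $C_j$ is the set of data points assigned to cluster $j$, $c_j$ is its centroid, and $\lVert x_i - c_j \rVert^{2}$
is the squared Euclidean distance between a point and its cluster centroid.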
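As a worked check of the K-means objective, the within-cluster sum of squared distances from each point to its assigned centroid, the sketch below evaluates it on four hypothetical points. It shows that placing each center at its cluster mean gives a lower value than an arbitrary placement with the same assignment.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans_objective(samples: Sequence[Vector],
                     centers: Sequence[Vector],
                     labels: Sequence[int]) -> float:
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(math.dist(x, centers[lab]) ** 2 for x, lab in zip(samples, labels))


samples = [(0, 0), (0, 2), (10, 0), (10, 2)]   # hypothetical points
labels = [0, 0, 1, 1]                          # two clusters of two points
at_means = [(0, 1), (10, 1)]                   # centers at the cluster means
off_means = [(0, 0), (10, 2)]                  # same clusters, worse centers
print(kmeans_objective(samples, at_means, labels))   # 4.0
print(kmeans_objective(samples, off_means, labels))  # 8.0
```

Moving a center to the mean of its members never increases this sum, which is why the update step of K-means drives the objective down.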

We apply the above algorithm in our project by taking input attributes such as the number of assignments
submitted, the number of tasks completed successfully, and the number of face-to-face interactions among team
members. Applying the algorithm divides the groups into k clusters. The groups in each cluster show nearly similar
behavior and are therefore placed in the same cluster. It then becomes easy for the facilitator to give feedback,
because he can address an entire cluster instead of each and every group.
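The application above, clustering groups on three activity counts so that feedback can be given per cluster, can be sketched as follows. The group names and counts are hypothetical, `k = 2` is chosen only for illustration, and the min-max scaling step is an assumption added so that no single count dominates the distance; the paper does not say whether its attributes were normalized.

```python
import math
from collections import defaultdict

# Hypothetical groups: (assignments submitted, tasks completed, face-to-face meetings).
groups = {
    "G1": (10, 9, 12),
    "G2": (9, 8, 11),
    "G3": (3, 2, 1),
    "G4": (4, 3, 2),
    "G5": (10, 10, 13),
}


def min_max_scale(rows):
    """Scale every attribute to [0, 1] (an assumed preprocessing step)."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi)) for row in rows]


def kmeans_labels(points, k, max_iter=100):
    """Compact K-means: first k points as centers; returns the last assignment."""
    centers = list(points[:k])
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centers[j])
        if new == centers:
            break
        centers = new
    return labels


names = list(groups)
scaled = min_max_scale([groups[n] for n in names])
labels = kmeans_labels(scaled, k=2)

# One feedback list per cluster instead of one per group.
clusters = defaultdict(list)
for name, lab in zip(names, labels):
    clusters[lab].append(name)
print(dict(clusters))  # {0: ['G1', 'G2', 'G5'], 1: ['G3', 'G4']}
```

With this toy data the facilitator would prepare two pieces of feedback, one for the highly active groups and one for the less active ones, rather than five.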

5. RESULTS

Fig. : This Screen Is the Login Page for All Users and the Administrator

Fig. : Administrator Can Find the Experts for Getting the Assistance

Fig. : Administrator Can Register As Experts

Fig. : This Screen Shows the K Team Actions

6. CONCLUSION
The new system, the Data Centric Knowledge Management System, has been implemented to cater to the needs of
company employees in sharing different knowledge assets effectively with role-based access. The present system has
been integrated with the already existing system. The database was put into a MySQL server and is connected
through JDBC; it is accessible through the intranet from any location. The system has been found to meet the
requirements of the users and departments satisfactorily. The database system must provide for the safety of the
information stored, despite system crashes or attempts at unauthorized access. If data are to be shared among
several users, the system must avoid possible anomalous results. As a future enhancement, the system is designed
for a high level of extensibility: it provides all the basic features and allows them to be extended easily
without disturbing the existing code. The application can be turned into an Internet application if desired, and
it can be adapted to other deployments just by changing the deployment files. Further features could also be added,
such as giving Internet users access so that they can take part in this process.
