
Mining Data and Modelling Social Capital in Virtual Learning Communities
Ben K. DANIEL1, Gordon I. McCALLA1, Richard A. SCHWIER2
ARIES Research Laboratory1
Department of Computer Science, University of Saskatchewan
Educational Communication and Technology2, University of
Saskatchewan
3 Campus Drive, S7N 5A4, Saskatoon, Canada

Abstract. This paper describes the use of content analysis and Bayesian Belief Network
(BBN) techniques aimed at modelling social capital (SC) in virtual learning communities
(VLCs). An initial BBN model of online SC based on previous work is presented.
Transcripts drawn from two VLCs were analysed and inferences were drawn to build
scenarios to train and update the model. The paper presents three main contributions.
First, it extends the understanding of SC to VLCs. Second, it offers a methodology for
studying SC in VLCs. Third, the paper presents a computational model of SC that can be
used in the future to understand various social issues critical to effective interactions in
VLCs.

1. Introduction
Social capital (SC) has recently emerged as an important interdisciplinary
research area. SC is frequently used as a framework for understanding
various social networking issues in physical communities and distributed
groups. Researchers in the social sciences and humanities have used SC to
understand trust, shared understanding, reciprocal relationships, social
network structures, etc. Despite such research, little has been done to
investigate SC in virtual learning communities (VLCs).
SC in VLCs can be defined as a web of positive or negative
relationships within a group. Research into SC in physical communities
shows that SC allows people to cooperate and resolve shared problems
more easily [19]. Putnam [14] has pointed out that SC greases the wheel
that allows communities to advance smoothly. Prusak and Cohen [13]
have further suggested that when people maintain continuous interaction,
they can sustain SC, which can in turn enable them to develop trusting
relationships. Further, in VLCs, SC can enable people to make
connections with other individuals in other communities [14]. SC also
helps individuals manage and filter relevant information and can enable
people in a community to effectively communicate with each other and
share knowledge [3].
This paper describes the use of content analysis and Bayesian
Belief Network (BBN) techniques to develop a model of SC in VLCs. An
initial BBN model for SC based on previous work [4] is presented.
Transcripts of interaction drawn from two VLCs were used to train and
validate the model. Changes in the model were observed and results are
discussed.

2. Content Analysis
The goal of content analysis is to determine the presence of words,
concepts, and patterns within a large body of text or sets of texts [17].
Content analysis involves the application of systematic and replicable
techniques for compressing a large body of text into fewer categories based
on explicit rules of coding [6] [16]. Researchers have used content
analysis to understand data generated from interaction in computer-
mediated collaborative learning environments [2] [15] [18]. Themes,
sentences, paragraphs, messages, and propositions are normally used for
categorizing texts and they are treated as the basic units of analysis [16].
In addition, the various units of analysis can serve as coding schemes
enabling researchers to break down dialogues into meaningful concepts
that can be further studied.
The variations in coding schemes and levels of analysis often create
reliability and validity problems. Furthermore, content analysis
approaches are generally cumbersome and labour intensive. However, a
combination of content analysis and machine learning techniques can help
to model dependency relationships and causal relationships among data.

2.1. Using Bayesian Belief Networks to Build Models

In artificial intelligence in education (AIED), models are used for
diagnosing learners to enable the building of tools to support learning [9].
Models can also be used to represent various educational systems. Baker
[1] summarized three uses of models within AIED: models as scientific
tools for understanding learning problems; models as components of
educational systems; and models as educational artefacts.
A Bayesian Belief Network (BBN) is one of the techniques for
building models. BBNs are directed acyclic graphs (DAGs) composed of
nodes and directed arrows [12]. Nodes in BBNs represent random
variables and the directed edges (arrows) between pairs of nodes indicate
relationships among the variables. BBNs can be used for making
qualitative inferences without the computational inefficiencies of
traditional joint probability determinations [13]. Researchers have used
BBN techniques for various purposes. For example, BBNs have been used
for student modelling [20] and user modelling [21]. We have begun to
investigate how BBNs can model SC in virtual communities [4].
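
The efficiency argument above can be illustrated with a minimal sketch. The
parent lists below are hypothetical and cover only a few binary variables;
they are not the full model of Figure 1. The sketch checks that the graph
is acyclic and compares the number of probabilities a BBN needs with the
number needed for an explicit joint distribution.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical parent lists for five binary variables (illustration only,
# not the complete set of edges in the paper's model).
PARENTS = {
    "Interaction": [],
    "Attitudes": [],
    "SharedUnderstanding": ["Interaction", "Attitudes"],
    "Trust": ["SharedUnderstanding"],
    "SocialCapital": ["SharedUnderstanding", "Trust"],
}

# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so a successful call confirms that the structure is a DAG.
order = list(TopologicalSorter(PARENTS).static_order())

# A binary node needs one free probability per configuration of its parents.
bbn_parameters = sum(2 ** len(parents) for parents in PARENTS.values())

# An explicit joint distribution over n binary variables needs 2^n - 1 values.
full_joint_parameters = 2 ** len(PARENTS) - 1

if __name__ == "__main__":
    print("Topological order:", order)
    print("BBN parameters:", bbn_parameters)
    print("Full joint parameters:", full_joint_parameters)
```

Even in this small example the factored representation needs fewer numbers
than the full joint table, and the gap widens as variables are added.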

3. Modelling Social Capital in Virtual Learning Communities
The procedure for examining SC in VLCs first involved a synthesis of
previous and current research on SC in physical communities, singling out
the most important variables and establishing logical relationships among
them. The main variables include the type of community,
attitudes, interaction, shared understanding, demographic cultural
awareness, professional cultural awareness, task knowledge awareness,
individual capability awareness, norms, and trust. We represented
the degrees of influence by the letters S (strong), M (medium), and W
(weak). The signs + and - represent positive and negative relationships.
The relationships among the variables were mapped into a BBN for SC
(see Figure 1).

Table 1. Key variables of SC and their definitions

| Variable Name | Variable Definition | Variable States |
|---|---|---|
| Interaction | A mutual or reciprocal action between two or more agents, determined by the number of messages sent and received | Present/Absent |
| Attitudes | Individuals' general perception about each other and others' actions | Positive/Negative |
| Community Type | The type of environment, tools, goals, and tasks that define the group | Virtual learning community (VLC) and Distributed community of practice (DCoP) |
| Shared Understanding | A mutual agreement/consensus between two or more agents about the meaning of an object | High/Low |
| Awareness | Knowledge of people, tasks, or environment, or all of the above | Present/Absent |
| Demographic Awareness | Knowledge of an individual: country of origin, language and location | Present/Absent |
| Professional Cultural Awareness | Knowledge of people's background training, affiliation etc. | Present/Absent |
| Competence Awareness | Knowledge about an individual's capabilities, competencies, and skills | Present/Absent |
| Capability Awareness | Knowledge of people's competences and skills in regard to performing a particular task | Present/Absent |
| Social Protocols/Norms | The mutually agreed upon, acceptable and unacceptable ways of behaviour in a community | Present/Absent |
| Trust | A particular level of certainty or confidence with which an agent assesses the action of another agent | High/Low |

Figure 1. The Initial Model of Social Capital in Virtual Learning Communities [4]

3.1. Computing the Initial Probability Values

The probability values were obtained by adding weights to the values
of the variables, depending on the number of parents and the strength of
the relationship between particular parents and children. For example, if
there are positive relationships between two variables, the weights
are determined by establishing a threshold value for each degree of
influence. The threshold
values correspond to the highest probability value that a child could reach
under a certain degree of influence from its parents. For instance, if Attitudes
and Interactions have positive and strong (S+) relationships with
Knowledge Awareness, evidence of positive interactions and positive
attitudes will produce a conditional probability value for Knowledge
Awareness of 0.98 (threshold value for strong = 0.98).
The weights were obtained by subtracting a base value (1 / number
of parents, 0.5 in this case) from the threshold value associated with the
degree of influence and dividing the result by the number of parents (i.e.
(0.98 - 0.5) / 2 = 0.48 / 2 = 0.24). Table 2 shows the threshold values and
weights used in this example. Because some uncertainty is likely to
remain even when the evidence comes from positive and strong
relationships, the strong threshold is set to 1 - α, with α = 0.02.
Table 2. Threshold values and weights with two parents

| Degree of influence | Threshold | Weight |
|---|---|---|
| Strong | 1 - α = 1 - 0.02 = 0.98 | (0.98 - 0.5) / 2 = 0.48 / 2 = 0.24 |
| Medium | 0.8 | (0.8 - 0.5) / 2 = 0.3 / 2 = 0.15 |
| Weak | 0.6 | (0.6 - 0.5) / 2 = 0.1 / 2 = 0.05 |
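
The weight calculation can be sketched as follows. The thresholds and the
value of α come from the text; the function names are ours, and the
`child_probability` helper reflects an assumed additive reading (base value
plus one weight per parent in a positive state) that the text implies but
does not spell out for partial evidence.

```python
ALPHA = 0.02  # uncertainty margin for strong relationships, as in the text

# Threshold per degree of influence (values from Table 2).
THRESHOLDS = {"strong": 1 - ALPHA, "medium": 0.8, "weak": 0.6}


def weight(threshold: float, n_parents: int) -> float:
    """Weight contributed by one parent: (threshold - base) / n_parents."""
    base = 1 / n_parents
    return (threshold - base) / n_parents


def child_probability(threshold: float, n_parents: int, n_positive: int) -> float:
    """Assumed additive reading: base value plus one weight per positive parent."""
    base = 1 / n_parents
    return base + n_positive * weight(threshold, n_parents)


if __name__ == "__main__":
    for degree, threshold in THRESHOLDS.items():
        print(degree, round(threshold, 2), round(weight(threshold, 2), 2))
    # With both parents positive, the child reaches the threshold itself.
    print(round(child_probability(THRESHOLDS["strong"], 2, 2), 2))
```

For two parents this gives weights of 0.24, 0.15 and 0.05, and two positive
strong parents yield 0.5 + 2 × 0.24 = 0.98, the strong threshold.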

3.2. Testing the Bayesian Belief Network Model

In order to experiment with the model developed in [4], further scenarios
were developed based on results obtained from studying two different
virtual communities. One community, see you see me (CUCME),
involved a group of individuals who regularly interacted on various issues
using textual and visual technologies (video-cams). In the CUCME
community there were no explicit goals; instead, individuals were
drawn together on a regular basis to interact socially. Themes that
emerged from the analysis of the transcripts included economics, social
issues, politics, food, religion, and technology. Table 3 shows the
number of messages in each category found in the transcripts and their
percentage of the whole.

Table 3. Frequency of messages observed in relation to each variable in the CUCME VC

| Variable Name | Frequency | Percentage |
|---|---|---|
| Demographic Awareness | 17 | 2.77 |
| Economic | 14 | 2.28 |
| Food | 12 | 1.96 |
| Information Exchange | 7 | 1.14 |
| Social | 45 | 7.35 |
| Technology | 7 | 1.14 |
| Community Language | 50 | 8.16 |
| Hospitality | 33 | 5.39 |
| Use of Simile | 21 | 3.43 |
| Interaction | 406 | 66.33 |
| Total | 612 | 99.95 |

The second community we studied consisted of graduate students
learning theories and philosophies of educational technology. Unlike the
first community, students in this community occasionally met face-to-face
and they had explicit learning goals (complete course requirements) and
protocols (set by the instructor of the course) of interactions. Bulletin
boards and email were also used for interaction. The results of the analysis
of the transcripts of this community’s interactions were broken down into
themes and are summarised in Table 4.

Table 4. Frequency of messages observed in relation to each variable in the VLC

| Variable Name | Frequency | Percentage |
|---|---|---|
| Interaction | 100 | 9.12 |
| Professional Awareness | 15 | 1.36 |
| Knowledge Awareness | 8 | 0.72 |
| Sociocultural Awareness | 14 | 1.27 |
| Technology | 15 | 1.36 |
| Hospitality | 59 | 5.38 |
| Shared Understanding | 117 | 10.67 |
| Information exchange | 656 | 59.85 |
| Social Protocols | 112 | 10.21 |
| Total | 1096 | 99.94 |
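
The percentage columns in Tables 3 and 4 can be recomputed from the
frequencies. The sketch below assumes that percentages were truncated
(not rounded) to two decimals; this is inferred from the reported figures
(for example, 17 / 612 = 2.777...% is reported as 2.77) and would also
explain why the totals fall just short of 100. The function name is ours.

```python
def truncated_percentages(frequencies):
    """Return (total, shares) with shares in integer hundredths of a percent.

    Integer arithmetic keeps the truncation exact (no floating-point drift).
    """
    total = sum(frequencies.values())
    shares = {name: count * 10000 // total for name, count in frequencies.items()}
    return total, shares


# Frequencies as reported in Table 3 (CUCME) and Table 4 (graduate course VLC).
CUCME = {
    "Demographic Awareness": 17, "Economic": 14, "Food": 12,
    "Information Exchange": 7, "Social": 45, "Technology": 7,
    "Community Language": 50, "Hospitality": 33, "Use of Simile": 21,
    "Interaction": 406,
}
VLC = {
    "Interaction": 100, "Professional Awareness": 15, "Knowledge Awareness": 8,
    "Sociocultural Awareness": 14, "Technology": 15, "Hospitality": 59,
    "Shared Understanding": 117, "Information exchange": 656,
    "Social Protocols": 112,
}

if __name__ == "__main__":
    for label, table in (("CUCME", CUCME), ("VLC", VLC)):
        total, shares = truncated_percentages(table)
        for name, value in shares.items():
            print(f"{label:6} {name:24} {table[name]:5d} {value / 100:6.2f}")
        print(f"{label:6} {'Total':24} {total:5d} {sum(shares.values()) / 100:6.2f}")
```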

4. Results and Discussion


The various themes that emerged from the analysis of the transcripts taken
from the interactions were used to develop a number of scenarios, which in
turn were used to tweak the probability values in the model. A scenario
refers to a written synopsis of inferences drawn from the results of the
transcripts. A scenario was developed from the CUCME findings based on the
following observations: a high level of interaction and a high value of
demographic awareness. The values of interaction and demographic awareness
were tweaked in the initial model to reflect the positive state and the
present state, respectively. Our
goal was to observe the level of shared understanding in the BBN model
using the scenario described above.
After tweaking the variables based on the scenario, the model was
updated. The results showed an increase in the posterior probability
of shared understanding, i.e. P(shared understanding) = 0.915. Since
shared understanding is also a parent of trust and SC, the probabilities of
trust and SC correspondingly increased to P(trust) = 0.92 and P(SC) =
0.75. Similarly, evidence of negative interaction and negative attitudes in
the CUCME community decreased the probabilities to P(shared
understanding) = 0.639, P(trust) = 0.548 and P(SC) = 0.311. The results
demonstrate dependency between the three variables.
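
The way evidence on parent variables shifts the posteriors of their
descendants can be sketched with inference by enumeration on a toy network.
The structure (Interaction and Attitudes as parents of Shared
Understanding, which is a parent of Trust) is a simplification, and every
probability value below is hypothetical; the sketch does not reproduce the
figures reported above and is not the inference engine used in the study.

```python
from itertools import product

VARIABLES = ["Interaction", "Attitudes", "SharedUnderstanding", "Trust"]

# Hypothetical CPTs (True = positive/high). Not taken from the paper's model.
PRIOR_POSITIVE = 0.5
P_SHARED = {(True, True): 0.9, (True, False): 0.6,
            (False, True): 0.6, (False, False): 0.2}
P_TRUST = {True: 0.9, False: 0.3}


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of True."""
    return p_true if value else 1.0 - p_true


def joint(a):
    """Joint probability of a full assignment, using the DAG factorization."""
    return (bernoulli(PRIOR_POSITIVE, a["Interaction"])
            * bernoulli(PRIOR_POSITIVE, a["Attitudes"])
            * bernoulli(P_SHARED[(a["Interaction"], a["Attitudes"])],
                        a["SharedUnderstanding"])
            * bernoulli(P_TRUST[a["SharedUnderstanding"]], a["Trust"]))


def posterior(query, evidence):
    """P(query = True | evidence), summing the joint over all assignments."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(VARIABLES)):
        assignment = dict(zip(VARIABLES, values))
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint(assignment)
        denominator += p
        if assignment[query]:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    positive = {"Interaction": True, "Attitudes": True}
    negative = {"Interaction": False, "Attitudes": False}
    for label, evidence in (("none", {}), ("positive", positive), ("negative", negative)):
        print(label,
              round(posterior("SharedUnderstanding", evidence), 3),
              round(posterior("Trust", evidence), 3))
```

Enumeration is exponential in the number of variables, so it is only
suitable for small illustrations like this one.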
In the second VLC (the graduate course), only five variables that
were dominant in the BBN model (interaction, professional awareness,
knowledge awareness, shared understanding and social protocols) were
inferred from the results, and scenarios were developed around those
variables. For example, we wanted to examine the level of SC in a
community with a high level of interaction (meaning that interaction is
positive), and where individuals are exposed to each other well enough to
know who knows what and who works where, but are not aware of the
demographic backgrounds of participants (various forms of awareness).
Setting these variables in the model, we obtained P(shared
understanding) = 0.758, P(trust) = 0.899 and P(SC) = 0.717. The increase
in the probabilities of shared understanding, trust and SC in this
community given various kinds of awareness, but not demographic
awareness, can be explained by the fact that this community has explicit
learning goals, and that individuals are able to develop trusting
relationships based on information about what individuals know and are
capable of doing rather than on demographic information (where an
individual is from etc.).

5. Conclusion
Using content analysis and BBN techniques, we have demonstrated how
to model SC in VLCs. We have also shown how to update the model using
scenarios that can be developed from the results obtained from natural
interactions in virtual communities. Inferences from the posterior
probabilities obtained from the scenarios suggest that within a specific type
of virtual community, the level of SC can vary according to the level of
shared understanding. Further, different forms of awareness seem to have
different degrees of influence on SC. For example, in the CUCME
community, demographic awareness seems to be an influential factor in the
variation of SC. In contrast, in the graduate course VLC, where there are
explicit goals and limited time to achieve those goals, members can be
motivated to participate and engage in knowledge sharing activities, and so
demographic awareness may have little influence on SC.
The Bayesian model presented in this paper adequately represented
the scenarios developed from the results obtained from the two data sets.
We are continuing to analyse interaction patterns in other VLCs, and will
develop more scenarios to refine our model.

Acknowledgement
We would like to thank the Natural Sciences and Engineering Research
Council of Canada (NSERC) as well as the Social Sciences and Humanities
Research Council of Canada (SSHRC) for their financial support for this
research.

References
[1] M. Baker (2000). The roles of models in Artificial Intelligence and Education
research: A prospective view. International Journal of Artificial Intelligence in
Education, 11, 123-143.
[2] B. Barros & F. Verdejo (2000). Analysing students interaction processes in order to
improve collaboration. The DEGREE approach. International Journal of Artificial
Intelligence in Education, 11, 221-241.
[3] B.K. Daniel, R.A. Schwier & G. I. McCalla (2003). Social capital in virtual learning
communities and distributed communities of practice. Canadian Journal of
Learning and Technology, 29(3), 113-139.
[4] B.K. Daniel, J. D. Zapata-Rivera & G. I. McCalla (2003). A Bayesian computational
model of social capital in virtual communities. In M. Huysman, E. Wenger and V.
Wulf (Eds.), Communities and Technologies, pp. 287-305. London: Kluwer Publishers.
[5] L. C. Freeman (2000). Visualizing social networks. Journal of Social Structure.
Available: [http://zeeb.library.cmu.edu: 7850/JoSS/article.html]
[6] K. Krippendorff (1980). Content analysis: An introduction to its methodology. Beverly
Hills, CA: Sage Publications.
[7] C. Lacave and F. J. Diez (2002). Explanation for causal Bayesian networks in Elvira.
In Proceedings of the Workshop on Intelligent Data Analysis in Medicine and
Pharmacology (IDAMAP-2002), Lyon, France.
[8] K. Laskey and S. Mahoney (1997). Network Fragments: Representing Knowledge for
Constructing Probabilistic Models. Uncertainty in Artificial Intelligence:
Proceedings of the Thirteenth Conference.
[9] G. I. McCalla (2000). The fragmentation of culture, learning, teaching and
technology: Implications for artificial intelligence in education research.
International Journal of Artificial Intelligence in Education, 11(2), 177-196.
[10] J. Nahapiet & S. Ghoshal (1998). Social capital, intellectual capital and the
organizational advantage. Academy of Management Review, 23(2), 242-266.
[11] D. Niedermayer (1998). An Introduction to Bayesian Networks and their
Contemporary Applications. Retrieved May 6, 2004 from
[http://www.niedermayer.ca/papers/bayesian/bayes.html]
[12] J. Pearl (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. San Mateo, CA: Morgan Kaufmann Publishers.
[13] L. Prusak & D. Cohen (2001). In good company: How social capital makes
organizations work. Boston, MA: Harvard Business School Press.
[14] R. Putnam (2000). Bowling alone: The collapse and revival of American community.
New York: Simon & Schuster.
[15] A. Ravenscroft & R. Pilkington (2000). Investigation by design: developing
dialogue models to support reasoning and conceptual change.
International Journal of Artificial Intelligence in Education, 11, 273-298.
[16] L. Rourke, T. Anderson, D.R. Garrison and W. Archer (2001). Methodological
issues in the analysis of computer conference transcripts. International Journal of
Artificial Intelligence in Education, 12, 8-22.
[17] S. Stemler (2001). An overview of content analysis. Practical Assessment, Research
& Evaluation, 7(17). Retrieved October 19, 2003 from
[http://edresearch.org/pare/getvn.asp?v=7&n=17].
[18] A. Soller and A. Lesgold (2003). A computational approach to analyzing online
knowledge sharing interaction. Proceedings of Artificial Intelligence in Education,
pp. 253-260, Sydney, Australia.
[19] World Bank (1999). Social capital research group. Retrieved May 29, 2003, from
http://www.worldbank.org/poverty/scapital/.
[20] J.D. Zapata-Rivera (2002). cbCPT: Knowledge Engineering Support for CPTs in
Bayesian Networks. Canadian Conference on AI 2002: 368-370.
[21] I. Zukerman, D. W. Albrecht and A.E. Nicholson (1999). Predicting users’ requests
on the WWW. In UM 1999, Proceedings of the 7th International Conference on
User Modelling, Banff, Canada, pp. 275-284.
