Author(s): Wong, On-wing
Issued Date: 2013
URL: http://hdl.handle.net/10722/188758

by
WONG On Wing

Degree of Master of Philosophy
January 2013

Submitted by
WONG On Wing

in Computer-Supported Collaborative Learning (CSCL)
One Chinese and one English dataset are collected from an online discussion platform. These datasets are selected for comparing the performance of question detection and classification in the two languages, and a sentence is defined as the unit of analysis. Question detection is the process of distinguishing questions from other types of discourse act. A hybrid method is proposed that combines the rule-based question mark method and the machine-learning-based syntax method for question detection. This method achieves a 94.8% f1-score and 98.9% accuracy in English question detection, and a 94.8% f1-score and 93.9% accuracy in Chinese question detection.

While question detection focuses mainly on the identification of questions, question classification concentrates on the categorization of questions. The literature shows that the tree kernel method is close to a standard method for question classification. The classification of English verification and reason questions using the tree kernel method can both attain f1-scores above 80%. Though the precision of Chinese question classification using the same settings remains at a similar level, the recall drops greatly. This result indicates that the syntax-based tree kernel method may not be appropriate for classifying questions in Chinese. To improve the Chinese question classification result, Case-Based Reasoning (CBR) is introduced. CBR is a method that retrieves from a database the example case(s) sharing the maximum percentage of similarity with the test case. In this study, similarity is measured by the lexemes that compose a question. Although the implementation of the CBR method improves the recall, it also causes a great drop in precision. Considering the high precision of the tree kernel method and the wide coverage of the CBR method, a hybrid method is proposed to combine the two. The experimental result shows that the f1-score of the hybrid method for multi-class classification surpasses those of the tree kernel and CBR methods. This indicates that the implementation of the hybrid method can generally improve the result of Chinese question classification. (Word Count: 477)
by
WONG On Wing
B.Sc. H.K.B.U.; M.Sc. H.K.U.
Declaration
I declare that this thesis represents my own work, except where due
acknowledgement is made, and that it has not been previously included in a
thesis, dissertation or report submitted to this University or to any other
institution for a degree, diploma or other qualification.
Signed ___________________________
WONG On Wing
Acknowledgements
Last but not least, I must thank all my friends and colleagues for
their support. All of you are the angels of my life. I am especially indebted
to Dr. Emily Oon for her encouragement and sincerity throughout my
research. No matter how things go, I will always remember all of your help.
Table of Contents

Declaration
Acknowledgements
Table of Contents
Lists of Figures
Lists of Tables
Chapter 1  INTRODUCTION (1.1, 1.2, 1.3, 1.4, 1.7)
Chapter 2  LITERATURE REVIEW (2.1, 2.2, 2.3, 2.4)
Chapter 3  METHOD (3.1, 3.2, 3.3, 3.4, 3.5, 3.5.1, 3.5.2, 3.5.3)
Chapter 4  RESULTS (4.2.1, 4.2.2, 4.3, 4.3.1)
Chapter 5  DISCUSSION
  5.1, 5.1.1, 5.1.3, 5.2, 5.2.1
  5.3  Lexeme-based case-based reasoning for Chinese question classification
  5.4
  5.5  Question taxonomy
  5.5.1  The reliability of question taxonomy for automatic question classification
  5.5.2
Chapter 6  IMPLICATIONS, RECOMMENDATIONS, LIMITATION AND CONCLUSION
  6.1, 6.2, 6.3, 6.4
  6.5  Conclusion
  6.6, 6.7, 6.8
  6.9  Conclusion
Appendix I
  List of lexemes with Information Gain higher than or equal to 0.011
References
List of Figures

Figure 1. The procedure of the hybrid method for question classification
Figure 2. Parse tree of question "Why?"
Figure 3. Parse tree of question "Teapot?"
Figure 4. Parse tree of question "CFC lead to global warming?"
Figure 5. Parse tree of question "money comes from where?"
Figure 6. Parse tree of attribute question Can u tell me where have many wind?
Figure 7. Parse tree of attribute question Where are the rubbish?
Figure 8. Common subset trees of attribute questions Can u tell me where have many wind? and Where are the rubbish?
Figure 9. Parse tree of others question Can u explain?
Figure 10. Common subset trees of attribute question Can u tell me where have many wind? and elaboration question Can u explain?
Figure 11. Parse tree of procedure question How can we police ?
Figure 12. The correct parse tree of question How can we police ?
Figure 13. Parse tree of procedure question "How we protect the environment?"
Figure 14. The correct parse tree of question How can we police ?
Figure 15. Direct translation of the question Hong Kong police ?
Figure 16. Parse tree of reason question ?
Figure 17. Parse tree of reason question ?
Figure 18. Parse tree of sentence
Figure 19. Syntactic structure of question ?
Figure 20. Parse tree of attribute question ?
Figure 21. Parse tree of attribute question ?
List of Tables

Table 1. Patterns for question types (Lee et al., 2000)
Table 2. Categories of Question (Graesser & Person, 1994, p. 111)
Table 3. Question categorization used in the present study based on the purpose of the questions in inquiry
Table 4. Distribution of Questions in English dataset
Table 5. Distribution of Questions in Chinese dataset
Table 6. English question detection results with QM, Syntax and QM+Syntax methods
Table 7. Chinese question detection results with QM, Syntax and QM+Syntax methods
Table 8. Ten-fold stratified cross-validation result of English verification question using tree kernel method
Table 9. Ten-fold stratified cross-validation result of English reason question using tree kernel method
Table 10. Ten-fold stratified cross-validation result of English attribute question using tree kernel method
Table 11. Ten-fold stratified cross-validation result of Chinese verification question using tree kernel method
Table 12. Ten-fold stratified cross-validation result of Chinese reason question using tree kernel method
Table 13. Ten-fold stratified cross-validation result of Chinese attribute question using tree kernel method
Table 14. Average of the ten-fold stratified cross-validation result of multi-class classification using tree kernel method
Table 15. Ten-fold stratified cross-validation result of Chinese verification question using CBR method
Table 16. Ten-fold stratified cross-validation result of Chinese reason question using CBR method
Table 17. Ten-fold stratified cross-validation result of Chinese attribute question using CBR method
Table 18.
Table 19. Ten-fold stratified cross-validation results of Chinese verification question with tree+CBR method
Table 20. Ten-fold stratified cross-validation results of Chinese reason question with tree+CBR method
Table 21. Ten-fold stratified cross-validation results of Chinese attribute question with tree+CBR method
Chapter 1
INTRODUCTION
Quite a number of tools have been developed in recent years to monitor students' progression in online discussion. The Analytic Toolkit (ATK) (Burtis, 1998) is one example of a tool for tracking students' progress in online discussion. ATK contains a set of measurements developed by the Knowledge Building team at the University of Toronto for capturing the level of participants' engagement through their contributions (i.e. notes, keywords, scaffolds, references, etc.) on Knowledge Forum (Burtis, 1998). PolyCAFe (Rebedea, Dascalu, & Trausan-Matu, 2011) is another system, equipped with a social network analysis module that allows tutors to assess the social perspective of an online conversation. These two examples make use of participatory statistics for the assessment of students' progression. However, research on tool development in CSCL is not limited to quantitative analysis.

The tag approach does not rely on the linguistic characteristics of questions, but on
behavioral patterns of the different user groups making the postings. As
shown in some research (Hong & Davison, 2009), people who newly join a
group have a high tendency to ask questions than other types of group
members. This approach is quite attractive, as it requires the least amount of
effort of natural language processing, but research on the relationship
between students behavior to the tendency to ask question is relatively
limited in the CSCL context. Hence, it is yet uncertain whether the
interaction-based approach can be implemented for analyzing CSCL
discourse.
& Choi, 2000; Pasca & Harabagiu, 2001; Prager, Radev, Brown, Coden, & Samn, 1999). This approach categorizes questions based on rules set by human experts. However, natural language is highly ambiguous, and it is quite unlikely, even for experts, that all classification rules can be generated exhaustively in advance. This limitation has shifted research on question classification from rule-based approaches to machine learning approaches (Bloehdorn & Moschitti, 2007; Bu, Zhu, Hao, & Zhu, 2010; Carlson, Cumby, Rosen, & Roth, 1999; Day, Ong, & Hsu, 2007; Hakan, 2007; X. Li & Roth, 2002; Suzuki, Taira, Sasaki, & Maeda, 2003; D. Zhang & Lee, 2003).
Chapter 2
LITERATURE REVIEW
One of these is to annotate each non-terminal node with its parent node, to address the problem of over-generalized part-of-speech tags found in the Penn Treebank. The second is to implement horizontal and vertical Markovization to break down a complex parse and better represent the probability of the trees, tackling the uneven distribution of trees in the Penn Treebank. The experimental result of Klein & Manning (2003) shows that the Stanford parser can achieve an f1-score of 86.3%. Since the Stanford parser achieves a satisfactory performance, it should also be able to generate the syntactic structure of questions with the same high f1-score; it is therefore chosen as the recognizer of the syntactic patterns of questions in online discussions for this study. This result indicates that the performance of machine learning using syntactic patterns is comparable to that of the rule-based question mark method for question detection.
The above investigation indicates that the rule-based question mark and machine-learning-based syntax methods are reliable methods for detecting English questions in online discussion. However, there has been little investigation of the performance of the two approaches for detecting questions generated by students who may not be proficient in the language used. This study aims to verify the effectiveness of these two approaches for question detection in online discussion in both English and Chinese by Hong Kong students, and to test whether a hybrid method combining the two can improve the performance of question detection.
| Question Type | Patterns |
| Person | |
| Location | |
| Organization | |
| Time | |
| Currency | |
| Measure | 1000. |
SVM is a binary linear classifier that uses a maximum-margin hyperplane to separate data points in feature space. However, most situations in real-world contexts are not linearly separable, so the input space needs to be converted into a linearly separable high-dimensional feature space with a mapping function $\phi$. When the feature space is very high-dimensional, it is not feasible to generate the feature vector explicitly (Suzuki et al., 2003). A kernel function, as defined in equation (1), can be used to avoid this problem:

$K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$   (1)

A kernel function allows the incorporation of prior knowledge into the metric for measuring the similarity between two data points without creating an explicit numeric feature space. Among the different types of kernel functions, the tree kernel is the one that yields the best result in question classification. A comparison by Zhang & Lee (2003) shows that the tree kernel gives better question classification performance than bag-of-words and n-gram linear kernels.
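To make the kernel trick in equation (1) concrete, the sketch below (an illustration, not code from this study) checks that the degree-2 polynomial kernel $K(x,z)=(x \cdot z)^2$ equals the inner product of an explicit feature map $\phi$, so the similarity can be computed without ever building the feature space:

```python
from math import sqrt

# Illustrative sketch of the kernel trick. For 2-D inputs, the degree-2
# polynomial kernel K(x, z) = (x . z)^2 has the known explicit feature map
# phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2).
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def phi(x):
    x1, x2 = x
    return (x1 * x1, x2 * x2, sqrt(2) * x1 * x2)

def K(x, z):
    # Kernel evaluation: no explicit feature vectors are constructed.
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 1.0)
print(K(x, z))              # 25.0
print(dot(phi(x), phi(z)))  # 25.0 (up to floating-point error)
```

The same equivalence is what makes structured kernels such as the tree kernel practical: the feature space of all tree fragments is enormous, but the kernel value can still be computed directly.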
saturated with 4000 training data, while there is no sign of saturation for the classification task with the fine-grained question category definition. It is assumed that the same trend of improvement would apply to the fine-grained question categories, but that saturation would be reached at a later stage. This result indicates that the coarse-grained question category definition would be a better choice for a question classification task with an insufficient amount of training data. Third, CSCL discourse question data generated from authentic learning contexts is not easily available in large quantities for training the algorithm. It is a concern of this study to strike a balance between the size of the training data and the performance of the algorithm. Fourth, the results reviewed above are based on experiments with large English datasets. It is yet unknown whether the same results can be attained in the classification of Chinese questions. Most Chinese questions are wh-in-situ questions and do not show the wh-movement found in English questions. This means that the wh-element in a Chinese question does not undergo overt movement.
approaches for Chinese question classification. Since the bag-of-words linear kernel method has been shown to be less effective than the tree kernel method (D. Zhang & Lee, 2003), this study explores the use of case-based reasoning, a direct retrieval approach without any generalization of the training data, for Chinese question classification.
Aamodt & Plaza (1994) defined case-based reasoning as a paradigm to solve a new problem by remembering a previous similar situation and by reusing information and knowledge of that situation. The CBR cycle comprises four steps:
1. Case retrieval
2. Case reuse
3. Case revision
4. Case retention
The past case with the smallest distance to the new case is retrieved from the case base.
After the retrieval of a past case from the case base, the solution of that past case is retrieved as the solution for the test case. The solution is then applied in the real environment, and feedback is collected from the environment or users for the revision of the past case. Based on the problem descriptor of each algorithm, different information is retained for future retrieval. An advantage of using CBR for question classification over other machine learning approaches is that it requires no generalization of the past cases. Therefore, cases without any common characteristics can still be grouped into the same category. This feature is especially important for processing natural language, with its wide range of syntactic, lexical and semantic variations.
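The retrieval-and-reuse steps can be sketched as a toy classifier. The case base, tokenizer and overlap measure below are illustrative stand-ins for the lexeme-based similarity actually used in this study:

```python
# Minimal case-based reasoning sketch for question classification.
def lexemes(question):
    # Stand-in tokenizer; the study would use proper word segmentation.
    return set(question.lower().rstrip("?").split())

def similarity(q_lex, case_question):
    # Fraction of shared lexemes (Jaccard overlap) between two questions.
    c_lex = lexemes(case_question)
    union = q_lex | c_lex
    return len(q_lex & c_lex) / len(union) if union else 0.0

def classify(question, case_base):
    q_lex = lexemes(question)
    # 1. Case retrieval: the stored case sharing the largest percentage of
    #    lexemes with the test case.
    best_question, best_category = max(
        case_base, key=lambda case: similarity(q_lex, case[0]))
    # 2. Case reuse: the retrieved case's solution (its question category)
    #    is returned; revision and retention would use later feedback.
    return best_category

case_base = [
    ("why does ice melt", "reason"),
    ("is the answer 5", "verification"),
    ("where does the money come from", "attribute"),
]
print(classify("why does snow melt?", case_base))  # reason
```

Note that the retrieved case needs no structural resemblance to the query; any stored case with high lexeme overlap can supply the category, which is the coverage property exploited later in the hybrid method.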
Participants take responsibility for their own learning, and questioning is a way for them to make the gaps in their understanding explicit. Since deep-reasoning questions are highly correlated with deeper levels of cognition (Graesser & Person, 1994), the GPH scheme, a question taxonomy proposed by Graesser & Person (1994) for studying the depth of reasoning, is chosen as the question taxonomy of this study. Table 2 lists the question categories in the GPH scheme.

The shortcoming of this scheme is that the question categories are not mutually exclusive (Graesser & Person, 1994). A question may be assigned to one or more question categories. This feature violates the basic assumption in the computation of Cohen's Kappa coefficient that all categories are mutually exclusive. Hence, a refinement of the scheme is needed to eliminate the overlap between the different categories for it to be useful for the purpose of the present study. One possible refinement is to group question categories with similar functions into one category. The interpretation, causal antecedent, causal consequence, goal orientation, expectational and enablement questions in the GPH scheme are all explanation-driven questions and are thus grouped into the question category reason. The concept completion, feature specification and quantification questions all solicit the property of a phenomenon and are grouped into the question category attribute. The question category request in the GPH scheme and some other question categories, such as task-oriented questions (Hmelo-Silver & Barrows, 2008), are not directly related to the progression of inquiry. These types of questions are put into the question category others and are not considered in this present study. Finally, the names of some question categories have also been changed for ease of understanding. Table 3 shows the question taxonomy proposed in this study. It is based on the function of questions in CSCL inquiry instead of the factual information that a question solicits. This type of taxonomy is quite different from the ones used in the question classification literature. It awaits validation whether the question classification techniques derived from the classification of factual questions are suitable for the present classification of questions based on their functions in inquiries.
Table 2. Categories of Question in the GPH scheme (Graesser & Person, 1994)

| Question category | Abstract specification | Example |
| Verification | | Is the answer 5? |
| Disjunctive | | |
| Concept completion | | |
| Feature specification | | |
| Quantification | | |
| Definition | | What is a t test? |
| Example | | |
| Comparison | | |
| Interpretation | | pattern of data? |
| Causal antecedent | | |
| Causal consequence | | |
| Goal orientation | | |
| Instrumental/procedural | | |
| Enablement | | |
| Expectational | | |
| Judgmental | | |
| Assertion | | |
Table 3. Question categorization used in the present study based on the purpose of the questions in inquiry

| Question type | GPH categories | Requesting for | Example |
| Definition | Definition | Meaning of a word/phrase/event | |
| Option | Disjunctive | | |
| Attribute | Concept completion, feature specification, quantification | | |
| Clarification | Assertion | | |
| Comparison | Comparison | | games? |
| Reason | Interpretation, causal antecedent, causal consequence, goal orientation, expectational, enablement | | |
| Procedure | Instrumental/procedural | event | |
| Opinion | Judgment | | power? |
| Verification | Verification | | at it? |
| Others | | | |
Chapter 3
METHOD
Kong citizens may be some distance from that of native English speakers. The 2013 Business English Index & Globalization of English Research (Global English, 2013) shows that the level of business English in Hong Kong ranks 21st. This report reflects that the level of English in Hong Kong is some distance from the proficiency of the top-ranked regions, and hence it is predictable that the language in the dataset may not be fully grammatical.
The first step of the data preparation is to convert the database from tuplestore format to relational database format. A main reason for this transformation is that a relational database provides a more flexible and optimized way to query data than a tuplestore. During the conversion, all personal identifiers of the authors were removed. The resulting relational database contains only the identifier of the discussion
perfect agreement. The fifth step is to "code all the text". The remaining data in the datasets were coded by the two coders, and each piece of data was coded only once, by a single coder. The sixth step is to "assess your coding consistency". Since no new category of question emerged in steps four and five, the process of re-calculating the inter-coder reliability can be omitted. The seventh and eighth steps are to "draw conclusions from the coded data" and "report your methods and findings". Given that the objective of the manual content analysis in this study is to prepare training and testing data for the question detection and classification algorithms, the last two steps are irrelevant to the present study and are therefore omitted.
The Stanford parser is packaged with a language model trained on the Penn Treebank. The Penn Treebank contains three million words of parsed text from a wide range of genres, such as IBM computer manuals, nursing notes, Wall Street Journal articles and transcribed telephone conversations (Taylor et al., 2003). The Penn Treebank contains two question-related tags, namely SQ and SBARQ. SBARQ marks a direct question introduced by a wh-word or wh-phrase, and SQ marks the subconstituent of SBARQ excluding the wh-word or wh-phrase (Marcus, 1993, p. 321). Since the Penn Treebank is commonly used in natural language processing research, it is selected as the treebank for training the Stanford parser to analyze the syntactic structure of sentences in the English dataset.
While the Penn Treebank is used for English syntactic analysis, the
Sinica Treebank (F. Chen, Tsai, Chen, & Huang, 1999) is used for training
the Stanford Parser for analyzing sentences in the Chinese dataset.
(1)
S(agent:NP(Head:Naeb:)|negation:Dc:|epistemics:Dbaa:|Head:VE2:|goal:VP(deontics:Dbab:|Head:VJ2:|goal:NP(Head:Nac:))|particle:Td:)# (QUESTIONCATEGORY)
In order to use the Sinica Treebank for training the Stanford Parser, the Sinica Treebank must first be converted into the format of the Penn Treebank. (2) shows the tree structure of (1) after the conversion.
(2)
(S (agent:NP (Head:Naeb:))
(negation:Dc:)
(epistemics:Dbaa:)
(Head:VE2:)
(goal:VP (deontics:Dbab:)
(Head:VJ2:)
(goal:NP (Head:Nac:)))
(particle:Td:)
?
)
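The conversion from the Sinica format to Penn-style bracketing can be sketched as a short recursive rewrite, under the assumption that top-level children are separated by `|` and each tree ends with `#`. This is an illustrative sketch, not the converter actually used in this study:

```python
# Sketch: convert a Sinica Treebank tree such as (1) into Penn-style
# bracketing as in (2).
def sinica_to_penn(tree):
    return _convert(tree.rstrip("#").strip())

def _convert(node):
    # A node is either "label(child|child|...)" or a leaf like "Head:Naeb:".
    if "(" not in node:
        return "(" + node + ")"
    label, rest = node.split("(", 1)
    body = rest.rsplit(")", 1)[0]          # drop the matching closing paren
    children = _split_top_level(body)
    return "(" + label + " " + " ".join(_convert(c) for c in children) + ")"

def _split_top_level(s):
    # Split on "|" only at bracket depth 0, so nested children stay intact.
    parts, cur, depth = [], [], 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "|" and depth == 0:
            parts.append("".join(cur))
            cur = []
        else:
            cur.append(ch)
    parts.append("".join(cur))
    return parts

print(sinica_to_penn("S(agent:NP(Head:Naeb:)|negation:Dc:)#"))
# (S (agent:NP (Head:Naeb:)) (negation:Dc:))
```

Applied to the full tree in (1), this rewrite reproduces the bracketing shown in (2), up to whitespace and line breaks.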
Besides the difference in bracketing format, the absence of a tag for questions in the Sinica Treebank tagset would also cause the proposed method, which detects a question by the occurrence of a question subtree structure, to fail. (2), for example, is a question, but no tag exists to indicate that the constituent is a question. One way to tackle this problem is to create a question tag SQ and use it to replace the root node S in (2). (3) shows the syntactic structure of (2) after the substitution.
(3)
(goal:NP (Head:Nac:)))
(particle:Td:)
?
)
Syntax method: A statement which contains one or more question tags, SQ or SBARQ, in its syntactic structure is marked as a question.
The hybrid method is a combination of the QM and Syntax methods. All sentences are first processed by the QM method, and all negative returns from the QM method are then passed to the Syntax method for further processing. Any sentence marked as positive by either the QM or the Syntax method is regarded as a question by the hybrid method.
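The decision rule just described can be sketched as follows. The bracketed parse strings are assumed to come from a parser such as the Stanford parser; the example sentences and parses are hypothetical:

```python
import re

# Sketch of the hybrid QM + Syntax question detector.
def qm_method(sentence):
    # Rule-based: the sentence ends with a question mark.
    return sentence.rstrip().endswith(("?", "\uff1f"))

def syntax_method(parse_tree):
    # Machine-learning-based: the parse contains an SQ or SBARQ constituent.
    return re.search(r"\((?:SQ|SBARQ)[\s(]", parse_tree) is not None

def hybrid_method(sentence, parse_tree):
    # QM first; only QM-negative sentences fall through to the syntax check.
    return qm_method(sentence) or syntax_method(parse_tree)

print(hybrid_method("Is it true",
                    "(ROOT (SQ (VBZ Is) (NP (PRP it)) (ADJP (JJ true))))"))  # True
print(hybrid_method("It is true.",
                    "(ROOT (S (NP (PRP It)) (VP (VBZ is) (ADJP (JJ true)))))"))  # False
```

Because the QM check is cheap and precise, running it first means the parser only has to decide the cases the question mark misses, which is where the recall gains reported below come from.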
| Case | Method |
| 1 | QM method |
| 2 | Syntax method |
| 4 | QM + Syntax method |
where $h_i(T)$ counts the occurrences of the $i$-th tree fragment in tree $T$. The tree kernel counts the common subtrees between the two input trees. Below is the definition of the tree kernel function and the corresponding conditions proposed by Bloehdorn & Moschitti (2007, p. 862).

Given two trees $T_1$ and $T_2$, the tree kernel $K$ is defined as:

$K(T_1, T_2) = \sum_{n_1 \in N_{T_1}} \sum_{n_2 \in N_{T_2}} \Delta(n_1, n_2)$   (1)

where $N_{T_1}$ and $N_{T_2}$ are the sets of nodes in $T_1$ and $T_2$. $\Delta(n_1, n_2)$ can be computed recursively:

1. If the productions at $n_1$ and $n_2$ are different, then $\Delta(n_1, n_2) = 0$.
2. If the productions at $n_1$ and $n_2$ are the same, and $n_1$ and $n_2$ are pre-terminals, then $\Delta(n_1, n_2) = \lambda$.
3. If the productions at $n_1$ and $n_2$ are the same, and $n_1$ and $n_2$ are not pre-terminals, then

$\Delta(n_1, n_2) = \lambda \prod_{j=1}^{nc(n_1)} \left(1 + \Delta(ch(n_1, j), ch(n_2, j))\right)$   (2)

where $nc(n_1)$ is the number of children of $n_1$, $ch(n, j)$ is the $j$-th child of node $n$, and $\lambda$ is the decay parameter.
Questions with negative scores from all SVM models are sent to the case-based reasoning algorithm for further processing. The case-based reasoning algorithm then determines the question category for those questions.
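The $\Delta$ recursion above can be sketched in a few lines. Trees are represented as `(label, children)` tuples with word tokens as childless leaves; the decay value and the example tree are illustrative, not the settings tuned in this study:

```python
from itertools import product
from math import prod

# Sketch of the subset-tree kernel recursion defined above.
def production(node):
    label, children = node
    return (label, tuple(child[0] for child in children))

def nodes(tree):
    yield tree
    for child in tree[1]:
        yield from nodes(child)

def delta(n1, n2, lam):
    if not n1[1] or not n2[1]:
        return 0.0            # word tokens carry no production
    if production(n1) != production(n2):
        return 0.0            # condition 1: different productions
    if all(not grandchild[1] for grandchild in n1[1]):
        return lam            # condition 2: pre-terminal nodes
    # condition 3: same production, recurse over aligned children
    return lam * prod(1 + delta(a, b, lam) for a, b in zip(n1[1], n2[1]))

def tree_kernel(t1, t2, lam=0.4):
    return sum(delta(n1, n2, lam) for n1, n2 in product(nodes(t1), nodes(t2)))

np_tree = ("NP", [("DT", [("the", [])]), ("NN", [("dog", [])])])
print(tree_kernel(np_tree, np_tree, lam=1.0))  # 6.0
```

With $\lambda = 1$ the kernel simply counts common subset trees; the two-word noun phrase compared with itself yields six, matching a hand enumeration of its fragments.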
Chapter 4
RESULTS
In the second stage, after the identification of questions, the two coders categorized the questions in accordance with the question taxonomy described in section 2.4. The Cohen's Kappa coefficients for Chinese and English question classification are 0.89 and 0.9 respectively, for all question categories combined. These values indicate that the inter-rater reliability is almost perfect (Landis & Koch, 1977) for both the Chinese and English datasets. Besides, the similarity of the Cohen's Kappa coefficients also informs us that language is not a factor affecting the reliability of the question taxonomy. Table 4 and Table 5 show the distribution of questions in the two datasets, English and Chinese respectively.
Table 4. Distribution of Questions in the English dataset

| Question Type | Frequency | % |
| Verification | 318 | 38.3% |
| Reason | 172 | 20.7% |
| Attribute | 116 | 14.0% |
| Procedural | 116 | 14.0% |
| Definition | 32 | 3.9% |
| Others | 31 | 3.7% |
| Clarification | 23 | 2.8% |
| Opinion | 11 | 1.3% |
| Example | | 0.6% |
| Option | | 0.5% |
| Comparison | | 0.2% |
Table 5. Distribution of Questions in the Chinese dataset

| Question types | Count | % of different types of questions |
| Verification | 434 | 45.8% |
| Attribute | 150 | 15.8% |
| Reason | 147 | 15.5% |
| Procedural | 77 | 8.1% |
| Opinion | 45 | 4.8% |
| Others | 29 | 3.1% |
| Option | 21 | 2.2% |
| Clarification | 17 | 1.8% |
| Definition | 14 | 1.5% |
| Example | 10 | 1.1% |
| Comparison | | 0.4% |
This category includes questions from some meaningful categories that were missed out in the question taxonomy of this study. Those questions might serve the purpose of monitoring the discussion, providing a suggestion, asking for elaboration or facilitating a consensus between different members of the discussion. It is believed that these question categories would inform us about the students' agency to regulate their own inquiry. However, this is out of the scope of this study, and how these questions relate to the inquiry has not been further analyzed.
Table 6. English question detection results with QM, Syntax and QM+Syntax methods

| Method | Precision | Recall | F1-score | Accuracy |
| QM | 96.8% | 87.7% | 92.0% | 98.5% |
| Syntax | 96.6% | 59.6% | 73.8% | 95.9% |
| QM + Syntax | 96.0% | 93.5% | 94.8% | 98.9% |

Table 6 shows the English question detection results with the three methods. All three methods have precision and accuracy above 95%. There are three findings which can help us better understand the nature
The results show that neither the rule-based QM nor

Table 7. Chinese question detection results with QM, Syntax and QM+Syntax methods

| Method | Precision | Recall | F1-score | Accuracy |
| QM | 98.3% | 91.5% | 94.8% | 93.9% |
| Syntax | 0% | 0% | 0% | 89.5% |
| QM + Syntax | 98.3% | 91.5% | 94.8% | 93.9% |

The precision, recall and f1-score of the syntax method in Chinese question detection are all 0%. This result informs us that the syntax
method, using the Sinica Treebank as training data, fails to detect any questions from the dataset. There are two possible causes of this failure. The first is the low percentage of questions in the Sinica Treebank. Since the treebank contains only around 1.3% questions, this low frequency would affect the performance of the statistics-based Stanford parser used in the syntax method. Another possible cause is the absence of a tag for questions in the tagset of the Sinica Treebank. Although a question tag has been constructed in this study to replace the root node of the questions in the treebank, the poor performance clearly indicates that this way of substitution may not be appropriate.
In addition to the poor precision and recall of the syntax method, the
high accuracy of the syntax method also informs us that accuracy may not be
a suitable metric for measuring the performance of question detection. The
main reason is that the distribution of questions and non-questions is
highly skewed, and most of the correct detections are contributed by the
correct classification of non-questions. A solution to this problem would be
a balanced corpus. However, questions normally occur less frequently than
non-questions in real-life contexts, and it is quite unlikely to obtain a
corpus with equal amounts of questions and non-questions. Hence, accuracy
should not be considered a suitable metric for question detection.
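The effect of the skewed distribution can be made concrete with a small sketch: a trivial classifier that labels every sentence as a non-question attains high accuracy while detecting nothing. The counts below are illustrative only, not taken from the thesis datasets.

```python
# Why accuracy misleads on skewed question/non-question data: a trivial
# classifier that labels every sentence "non-question" scores high
# accuracy but zero recall.

def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# 1000 sentences, only 100 of them questions (skewed distribution).
# The "always non-question" baseline: tp=0, fp=0, fn=100, tn=900.
p, r, f1, acc = metrics(tp=0, fp=0, fn=100, tn=900)
print(f"precision={p:.1%} recall={r:.1%} f1={f1:.1%} accuracy={acc:.1%}")
# accuracy is 90% even though not a single question was detected
```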
Firstly, the tree kernel method has only been well tested for classifying
questions in English-speaking environments. Those questions are more likely
to be grammatical than the questions generated by non-native English
speakers. However, the majority of students in Hong Kong have Chinese as
their first language, and English is seldom used in their daily
communication. Their English proficiency is likely to be lower than that of
native English speakers, and it is unrealistic to expect the language
generated by Hong Kong students to be fully grammatical. It is not yet
known whether the existence of ungrammatical forms of questions would
affect the performance of the tree kernel method.
would highly affect the result of question classification. Hence, only those
question types with more than 100 counts are selected for the analysis. The
three question types selected are attribute, reason and verification questions.
Table 8, Table 9 and Table 10 show the ten-fold stratified cross-validation
results for these three types of questions using tree kernel method.
Table 8. Ten-fold stratified cross-validation result of English verification question using tree kernel method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   --      --      81.3%   87.5%   90.3%   93.6%   78.8%   90.0%   90.3%   87.5%   89.0%
Recall      80.7%   90.3%   83.9%   90.3%   90.3%   93.6%   86.7%   90.0%   93.3%   93.3%   89.0%
f1-score    89.3%   88.9%   82.5%   88.9%   90.3%   93.6%   82.5%   90.0%   91.8%   90.3%   89.0%
(Rx: the x-th round of validation)
Table 9. Ten-fold stratified cross-validation result of English reason question using tree kernel method
            R1       R2       R3      R4       R5       R6      R7       R8       R9       R10      Avg.
Precision   100.0%   100.0%   93.3%   100.0%   100.0%   92.3%   100.0%   100.0%   100.0%   100.0%   99.0%
Recall      82.4%    47.1%    82.4%   70.6%    68.8%    75.0%   87.5%    68.8%    87.5%    75.0%    74.0%
f1-score    90.3%    64.0%    87.5%   82.8%    81.5%    82.8%   93.3%    81.5%    93.3%    85.7%    84.7%
(Rx: the x-th round of validation)
Table 10. Ten-fold stratified cross-validation result of English attribute question using tree kernel method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   --      --      --      --      --      85.7%   --      --      --      85.7%   78.0%
Recall      58.3%   41.7%   33.3%   54.6%   63.6%   54.6%   18.2%   81.8%   36.4%   54.4%   50.0%
f1-score    73.7%   45.5%   44.4%   70.6%   63.6%   66.7%   26.7%   90.0%   50.0%   66.6%   60.9%
(Rx: the x-th round of validation)
questions are both above 70%. The only unsatisfactory result is the recall
rate of attribute questions. The average recall of attribute questions is
just 50%, which means that the algorithm missed half of the attribute
questions in the dataset. The wide variation of syntactic structures of
attribute questions may be a cause of the low recall rate. An attribute
question is a type of question inquiring about the property of an object or
event. For example, "What is the colour of lion?" and "Which part of this
vehicle is broken?" are both attribute questions, but their syntactic
structures are quite different. Hence, if the training data cannot cover
the syntactic structures of all questions, the algorithm may fail to detect
the uncovered ones. This hypothesis is supported by the result at round 8,
where the recall is 31.2% higher than the average recall. This result
shows that if the rare cases do not occur in the testing data set, the
algorithm performs quite well.
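The ten-fold stratified splits behind these tables can be sketched in a few lines: each fold keeps roughly the same class proportion as the whole dataset, so a rare category such as attribute questions is not concentrated in one fold. This is an illustrative reimplementation, not the evaluation tool actually used in the thesis.

```python
# Minimal stratified k-fold split: deal the indices of each class
# round-robin across folds so per-class proportions are preserved.
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Return k lists of indices, preserving per-class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)   # deal each class round-robin
    return folds

# 40 positive questions (1) among 200 sentences: each of the 10 folds
# should contain 4 positives and 16 negatives.
labels = [1] * 40 + [0] * 160
folds = stratified_folds(labels, k=10)
print([sum(labels[i] for i in f) for f in folds])
```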
Table 11. Ten-fold stratified cross-validation result of Chinese verification question using tree kernel method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   67.9%   74.5%   78.6%   66.7%   73.8%   66.7%   67.9%   71.4%   76.7%   69.8%   71.4%
Recall      83.7%   81.4%   76.7%   83.7%   72.1%   74.4%   83.7%   83.3%   78.6%   71.4%   78.9%
f1-score    75.0%   77.8%   77.6%   74.2%   72.9%   70.3%   75.0%   76.9%   77.6%   70.6%   74.8%
(Rx: the x-th round of validation)
Table 12. Ten-fold stratified cross-validation result of Chinese reason question using tree kernel method
            R1       R2       R3       R4       R5       R6       R7       R8       R9      R10      Avg.
Precision   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   91.7%   100.0%   99.2%
Recall      73.3%    60.0%    60.0%    46.7%    33.3%    57.1%    42.9%    71.4%    78.6%   57.1%    58.0%
f1-score    84.6%    75.0%    75.0%    63.6%    50.0%    72.7%    60.0%    83.3%    84.6%   72.7%    72.2%
(Rx: the x-th round of validation)
Table 13. Ten-fold stratified cross-validation result of Chinese attribute question using tree kernel method
            R1       R2       R3       R4       R5       R6       R7       R8       R9       R10     Avg.
Precision   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   100.0%   66.7%   96.7%
Recall      18.8%    25.0%    43.8%    20.0%    40.0%    6.7%     13.3%    20.0%    20.0%    26.7%   23.4%
f1-score    31.6%    40.0%    60.9%    33.3%    57.1%    12.5%    23.5%    33.3%    33.3%    38.1%   36.4%
(Rx: the x-th round of validation)
Table 14. Ten-fold stratified cross-validation result of Chinese multi-class question classification using tree kernel method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   89.3%   91.5%   92.9%   88.9%   91.3%   88.9%   89.3%   90.5%   89.5%   78.8%   89.1%
Recall      58.6%   55.5%   60.2%   50.1%   48.5%   46.1%   46.6%   58.3%   59.0%   51.7%   53.5%
f1-score    63.7%   64.3%   71.2%   57.1%   60.0%   51.9%   52.8%   64.5%   65.2%   60.5%   61.1%
The average recall of the Chinese attribute question is just 23.4%. A
reason for this drop may be that Chinese is more ambiguous and the
syntactic variation is larger in the Chinese attribute questions than in
the English attribute questions. It is highly probable that some variations
of syntactic structure found in the testing data are not available in the
training data, and as a result the tree kernel approach fails to correctly
identify the attribute questions in the testing data.
The CBR method allows one to plug in any type of features for the
analysis. A detailed description of the CBR method is illustrated in section
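The retrieval step of CBR as used here can be sketched as follows: the test question is compared against every stored case by the percentage of shared lexemes, and the label of the most similar case is reused. The function names, toy case base, and the Jaccard-style ratio are illustrative assumptions, not the thesis implementation.

```python
# Case-Based Reasoning retrieval sketch: find the stored case sharing the
# largest percentage of lexemes with the query and reuse its label.

def lexeme_similarity(a, b):
    """Fraction of shared lexemes between two tokenized questions."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def cbr_classify(case_base, query):
    """Retrieve the most similar stored case and reuse its label."""
    best_label, best_sim = None, -1.0
    for lexemes, label in case_base:
        sim = lexeme_similarity(lexemes, query)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

case_base = [
    (["why", "do", "plants", "need", "light"], "reason"),
    (["is", "it", "cold", "today"], "verification"),
    (["what", "is", "the", "colour", "of", "lion"], "attribute"),
]
print(cbr_classify(case_base, ["why", "do", "we", "need", "water"]))
# most lexemes overlap with the first case -> "reason"
```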
Table 15. Ten-fold stratified cross-validation result of Chinese verification question using CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   70.4%   58.2%   73.5%   58.8%   63.8%   62.5%   71.7%   68.1%   66.7%   63.8%   65.7%
Recall      88.4%   74.4%   83.7%   69.8%   86.0%   81.4%   88.4%   76.2%   71.4%   71.4%   79.1%
f1-score    78.4%   65.3%   78.3%   63.8%   73.3%   70.7%   79.2%   71.9%   69.0%   67.4%   71.7%
Table 16. Ten-fold stratified cross-validation result of Chinese reason question using CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   90.9%   66.7%   71.4%   53.8%   80.0%   76.9%   83.3%   71.4%   71.4%   70.0%   73.6%
Recall      66.7%   53.3%   66.7%   46.7%   26.7%   71.4%   71.4%   71.4%   71.4%   50.0%   59.6%
f1-score    76.9%   59.3%   69.0%   50.0%   40.0%   74.1%   76.9%   71.4%   71.4%   58.3%   64.7%
Table 17. Ten-fold stratified cross-validation result of Chinese attribute question using CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   44.4%   35.3%   64.3%   53.3%   56.3%   36.4%   46.2%   50.0%   50.0%   38.5%   47.5%
Recall      50.0%   37.5%   56.3%   53.3%   60.0%   26.7%   40.0%   46.7%   33.3%   33.3%   43.7%
f1-score    47.1%   36.4%   60.0%   53.3%   58.1%   30.8%   42.9%   48.3%   40.0%   35.7%   45.2%
Table 18. Ten-fold stratified cross-validation result of Chinese multi-class question classification using CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   68.6%   53.4%   69.7%   55.3%   66.7%   58.6%   67.1%   63.2%   62.7%   57.4%   62.3%
Recall      68.4%   55.1%   68.9%   56.6%   57.6%   59.8%   66.6%   64.8%   58.7%   51.6%   60.8%
f1-score    67.5%   53.7%   69.1%   55.7%   57.1%   58.5%   66.3%   63.9%   60.1%   53.8%   60.6%
The recalls of all three question types are better than those of the tree
kernel method. The largest improvement was found in the attribute
questions, while only little improvement can be observed for the
verification and reason questions. The average recall of the attribute
question increases from 25.1% to 43.7%. This result informs us that some
question types may have more distinctive syntactic structures than others.
For those question types with a wide variation of syntactic structures,
classification based on lexemes might improve the result.
Table 19. Ten-fold stratified cross-validation results of Chinese verification question with tree+CBR method
            R1       R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   65.2%    58.5%   70.2%   58.3%   60.3%   59.4%   60.6%   63.8%   64.9%   60.0%   62.1%
Recall      100.0%   88.4%   93.0%   97.7%   95.3%   88.4%   93.0%   88.1%   88.1%   85.7%   91.8%
f1-score    78.9%    70.4%   80.0%   73.0%   73.9%   71.0%   73.4%   74.0%   74.7%   70.6%   74.0%
Table 20. Ten-fold stratified cross-validation results of Chinese reason question with tree+CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   92.9%   71.4%   75.0%   53.8%   87.5%   80.0%   83.3%   73.3%   85.7%   76.9%   78.0%
Recall      86.7%   66.7%   80.0%   46.7%   46.7%   85.7%   71.4%   78.6%   85.7%   71.4%   72.0%
f1-score    89.7%   69.0%   77.4%   50.0%   60.9%   82.8%   76.9%   75.9%   85.7%   74.1%   74.2%
Table 21. Ten-fold stratified cross-validation results of Chinese attribute question with tree+CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   44.4%   35.3%   64.3%   65.0%   56.3%   36.4%   46.2%   50.0%   50.0%   46.7%   49.4%
Recall      50.0%   37.5%   56.3%   68.4%   60.0%   26.7%   40.0%   46.7%   33.3%   46.7%   46.6%
f1-score    47.1%   36.4%   60.0%   66.7%   58.1%   30.8%   42.9%   48.3%   40.0%   46.7%   47.7%
Table 22. Ten-fold stratified cross-validation results of Chinese multi-class question classification with tree+CBR method
            R1      R2      R3      R4      R5      R6      R7      R8      R9      R10     Avg.
Precision   72.8%   63.0%   77.1%   74.2%   78.3%   60.4%   72.1%   63.9%   71.5%   65.3%   69.8%
Recall      72.5%   64.2%   74.3%   61.4%   62.9%   59.9%   61.2%   64.3%   66.7%   61.1%   64.9%
f1-score    70.5%   62.2%   74.6%   62.2%   64.6%   57.9%   62.6%   62.9%   67.1%   61.7%   64.6%
(Rx: the x-th round of validation)
Chapter 5
DISCUSSION
It is found from both the Chinese and English datasets that students may
put different questions into one sentence. For example, "Why do we always
think of buy electricity from other places, Don't Hong Kong have potential
to develop reusable energy?" can actually be divided into two questions.
The correct forms should be "Why do we always think of buy (consider to
buy) electricity from other places?" and "Don't Hong Kong have potential to
develop reusable energy?" This kind of sentence tokenization requires the
sentence tokenizer to be able to analyze the semantics of sentences, which
is out of the scope of this study. Hence, in the present study, the
sentences were tokenized based on end punctuation.
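The punctuation-based tokenization just described can be sketched with a simple regular expression that splits on end punctuation, including the full-width marks used in Chinese text. The exact pattern is an illustrative assumption; the thesis does not specify its implementation.

```python
# Split discussion notes into sentences on end punctuation only
# (semantic splitting of run-together questions is out of scope).
import re

# ? ! . plus the full-width marks used in Chinese text
END_PUNCT = re.compile(r'[^.?!\u3002\uff1f\uff01]+[.?!\u3002\uff1f\uff01]?')

def split_sentences(text):
    return [m.group().strip() for m in END_PUNCT.finditer(text)
            if m.group().strip()]

note = "Why don't they have some clean water? I think it is unfair."
print(split_sentences(note))
# ["Why don't they have some clean water?", 'I think it is unfair.']
```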
The experiment result shows that the Syntax method can detect
questions which cannot be identified by the QM method, such as "What will
happen if the gouverment (government) close down to (too) many schools
and it is not enough for the pupils to study.", "Why don't they have some
clean water.", "What is greenhouse gas" and "Can you write it again,
please...". Although these questions do not end with question marks, all of
them can be detected by the Syntax method because they contain
syntactically identifiable question substructures. However, the low recall
also reveals the problem of the Syntax method. There are two possible
explanations for this low recall. One possible explanation is that the Penn
Treebank does not cover the syntactic structures of questions used by Hong
Kong students. Another possible explanation is that the missing questions
are ungrammatical, so that they cannot be detected by the Stanford Parser.
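The complementary behaviour of the two methods suggests the OR-combination used by the hybrid detector, which can be sketched as follows. The parse lookup is stubbed here with a list of clause labels; in the thesis the labels come from the Stanford parser, and SBARQ/SQ are the Penn Treebank question clause labels.

```python
# Hybrid question detection sketch: a sentence is a question if it ends
# with a question mark (QM method) OR its parse contains a question
# clause label (syntax method).

QUESTION_LABELS = {"SBARQ", "SQ"}  # Penn Treebank question clause labels

def syntax_says_question(parse_labels):
    """Stub: in practice these labels come from a syntactic parser."""
    return bool(QUESTION_LABELS & set(parse_labels))

def is_question(sentence, parse_labels):
    qm = sentence.rstrip().endswith(("?", "\uff1f"))  # rule-based QM method
    return qm or syntax_says_question(parse_labels)   # OR-combination

# "Why don't they have some clean water." lacks a question mark, but its
# parse would contain SBARQ, so the hybrid method still detects it.
print(is_question("Why don't they have some clean water.",
                  ["ROOT", "SBARQ", "VP"]))
```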
QM and Syntax methods. "My new title is How can we use it?", for
example, is a sentence which will be wrongly identified as a question by
both the QM and Syntax methods. The purpose of this sentence is to
introduce the new title of the speaker's message rather than to ask a
question. Both the QM and Syntax methods fail to identify this example as a
non-question. It is quite difficult to tackle this problem without
understanding the semantics of statements. A counter-example is "My
question is how can we use it?". This counter-example has the same
syntactic constituents as the original example, but it is obviously a
question. The two examples can only be distinguished if our algorithm can
differentiate the meaning between "new title" and "question". Another
example is the question "Look carefuly (carefully) what did I write?" This
sentence can be interpreted as "Look carefully! What did I write?", a
combination of an exclamatory and an interrogative sentence. However, it
can also be interpreted as "Look carefully at what I wrote." Both
interpretations require alteration of the original sentence, and any
alteration might cause deviation from the original meaning. This kind of
ambiguity can hardly be resolved without consulting the author about the
original meaning of the sentence.
Concerning the exceptionally good question detection
The Sinica Treebank has a tagset with a different focus from that of the
Penn Treebank. The Sinica Treebank has a strong focus on lexical content,
while the Penn Treebank was built on an assumption of context-freedom.
This difference can be seen in the way the part-of-speech tags for nouns
are defined in the two treebanks. The Penn Treebank differentiates nouns by
two criteria: 1) singular or plural, and 2) common or proper noun, while
the Sinica Treebank derives eleven part-of-speech tags for nouns, including
tags for location, direction, time, quantifier, etc. From the way nouns are
classified in the Sinica Treebank, we can see that the Sinica Treebank
emphasizes the meaning of words in Chinese, and this characteristic leads
to a wider variation of syntactic structures in the treebank. Since the
Stanford parser is an unlexicalized parser, the inclusion of lexical
content in the Sinica Treebank violates the theoretical assumption of the
Stanford parser and as a result affects the performance of syntactic
analysis. A lexicalized parser, such as the one proposed by Chen (1996),
could be adopted in a future study to test whether it improves the result
of Chinese question detection.
Word segmentation is another factor that might affect the accuracy of
syntactic analysis. Appendix I shows a list of lexemes segmented by the
Stanford Segmenter. (do you agree), for example, is a lexeme selected from
Appendix I. This lexeme is actually composed of two lexemes, 1) (agree)
and 2) (ma), but the segmenter wrongly tokenized it as a single lexeme. An
investigation of the segmentation of the training questions in the Chinese
dataset shows that around 45.5% of the segmented questions contain
segmentation errors. It is found from the segmentation errors that lexemes
are mostly segmented as morphologically complex words, such as (can
bring). Tseng et al. (2005) report that the preference for morphologically
complex words is one of the causes of segmentation error. Since those
wrongly segmented lexemes are not included in the corpus from which the
treebank was generated, the syntactic parser may fail to identify their
part-of-speech and can only rely on the other lexemes in the sentence to
determine its syntactic structure. In a future study, we might remove the
segmenter's preference for morphologically complex words and test whether
the segmentation results improve.
different kinds of lexical items such as adjectives, noun phrases, verb
phrases or even sentences.
(1)
[ADJ]
Hard-working?
(2)
[NP]
The function of Chlorophyll?
(3)
[VP]
(4)
[SUBJ+Verb]
You take care (of it)?
(6)
58
(8)
There should be a deeper meaning and propose for education?
(9)
What is my chase?
Besides the question markers, the A-not-A question is another unique
characteristic of Chinese questions. A-not-A questions have the format
A + NEG + A, where A can be an adjective, verb or noun (Huang & Chen,
2008). (10) and (11) show examples of A-not-A questions.
(10)
Is it cold today?
(11)
Will you visit Japan?
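A toy sketch of detecting the A + NEG + A pattern is given below: a character repeated around a negator such as 不 (bu), as in 冷不冷 ("cold-not-cold"). Real detection would operate on segmented lexemes and cover more negators; this character-level check and its negator set are illustrative assumptions only.

```python
# Toy A-not-A detector: look for a character A immediately repeated
# around a negation character, e.g. the pattern 冷不冷 in 今天冷不冷.

NEGATORS = {"\u4e0d", "\u5514"}  # 不 (bu) and its Cantonese counterpart 唔

def has_a_not_a(sentence):
    for i in range(len(sentence) - 2):
        if sentence[i + 1] in NEGATORS and sentence[i] == sentence[i + 2]:
            return True
    return False

print(has_a_not_a("\u4eca\u5929\u51b7\u4e0d\u51b7"))  # 今天冷不冷 -> True
print(has_a_not_a("\u4eca\u5929\u5f88\u51b7"))        # 今天很冷 -> False
```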
In addition to the characteristics of Chinese questions, the questions in
the Chinese dataset also demonstrate a characteristic which is unique to
the Cantonese-speaking environment: the direct translation of the oral
language into the written format. Below are a few examples of such
questions:
(12)
What is your strength?
(13)
Do you know that?
(14)
What do you want to say?
(15)
Could you stop initiating some nonsense topics?
(16)
What is the relationship?
(17)
(Do you think) my intention is to make the earth even worse?
Two types of translation are shown in the above examples. One type is the
use of a Chinese word with the same pronunciation as a replacement for a
Cantonese word. One example is the Chinese word (mei); mei is an
onomatopoeic word in Chinese representing the sound of a sheep, but it is
used in (13) and (15) as a replacement for the interrogative word (what)
and in (17) as a substitution for the question marker (ma). Another example
is the morpheme (ng); ng is frequently used in A-not-A questions as a
replacement for the word (bu). The other type of translation is the use of
English characters to replace Cantonese words: the "d" and "ge" in (15) and
(17), respectively, are substitutions for the Chinese auxiliary word (de).
The above discussion illustrates that Chinese and English questions have
quite a few different characteristics. However, we still found in the
datasets a common characteristic of questions: the occurrence of
code-mixing. Code-mixing refers to the situation in which multiple
languages are used in the same sentence during communication. (18) is an
example found in the Chinese dataset, while (19) is an example retrieved
from the English dataset. Since most natural language processing methods
are designed to handle a single language, code-mixing increases the
difficulty of language processing and may affect the correctness of
syntactic analysis.
(18)
DNA
Does plant have DNA?
police
It is often claimed that natural language processing techniques developed
for one language may not be totally applicable to another language. This
claim is correct to some extent, especially in the processing of the
syntactic structures of the two languages. However, the two languages found
in our datasets also share some similarities. Firstly, the question mark is
used in both languages as the punctuation for questions, and satisfactory
results were attained by using only the QM method to detect questions in
both languages. Secondly, the phenomenon of code-mixing is found in both
the Chinese and English datasets. Questions with code-mixing may contain
lexemes from more than one language, and this poses a great challenge to
the corpus-based Syntax method. Since a corpus normally includes lexemes of
a single language, an extra step to translate the text into a single
language might be needed in order to process questions with code-mixing.
Finally, the Chinese language used in Hong Kong is also a big concern of
this experiment. First of all, the Chinese language in Hong Kong has some
unique characteristics which make it different from modern Chinese. Also,
Chinese and English belong to different language families. It remains to be
validated whether the tree kernel method derived from the classification of
English questions is suitable for classifying Chinese questions generated
by Hong Kong students.
Table 23. Classification result of English verification, reason and attribute questions
               Precision   Recall   f1-score
Verification     89.0%     89.0%     89.0%
Reason           99.0%     74.0%     84.7%
Attribute        78.0%     50.0%     60.9%
Figure 6. Parse tree of attribute question "Can u tell me where have many wind?"
Figure 8. Common subset trees of attribute questions "Can u tell me where have many wind?" and "Where are the rubbish?"
Figure 10. Common subset trees of attribute question "Can u tell me where have many wind?" and elaboration question "Can u explain?"
These examples reveal the problem of the current tree kernel method in
handling indirect questions. One possible solution to this problem is to
have a mechanism that identifies the embedded clause "where have many wind"
in Figure 6 and gives a higher weighting to the subset trees within that
clause.
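The subset-tree comparison underlying these figures can be sketched with the Collins-Duffy recursion: the kernel value of two parse trees is the number of common subset trees they share. Trees are written as nested tuples here; this is an illustrative reimplementation, not the thesis code, and it also counts matching leaf words as fragments.

```python
# Subset-tree kernel sketch (Collins-Duffy recursion). A tree is a tuple
# (label, child, child, ...); a leaf is a 1-tuple like ("is",).

def delta(n1, n2, lam=1.0):
    """Number of common subset trees rooted at nodes n1 and n2."""
    prod1 = (n1[0],) + tuple(c[0] for c in n1[1:])
    prod2 = (n2[0],) + tuple(c[0] for c in n2[1:])
    if prod1 != prod2:          # different productions share nothing
        return 0.0
    if len(n1) == 1:            # matching leaves
        return lam
    out = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        out *= 1.0 + delta(c1, c2, lam)
    return out

def nodes(t):
    yield t
    for c in t[1:]:
        yield from nodes(c)

def tree_kernel(t1, t2, lam=1.0):
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

q1 = ("SQ", ("VBZ", ("is",)), ("NP", ("PRP", ("it",))))
q2 = ("SQ", ("VBZ", ("is",)), ("NP", ("NN", ("rain",))))
print(tree_kernel(q1, q2))  # -> 6.0: fragments under the shared SQ production
```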
In this example, the lexical items for "improve" and "efficiency" are
substituted by the Chinese terms (improve) and (efficiency). This
substitution leads to a wrong parse, as shown in Figure 11; Figure 12 shows
the correct syntactic structure for comparison. To deal with this type of
code-mixing, we can first translate all Chinese lexical items back to
English and then pass the translated question to the syntactic parser for
further analysis. Figure 13 shows the syntactic structure of the same
question as in Figure 11 after the translation; it is essentially the same
as the one shown in Figure 12. However, this proposed solution may fail if
code-mixing replaces a phrase instead of an individual lexical item. An
example is "Hong Kong police ?" (How could we improve the efficiency of
the Hong Kong police?). There are two main problems for the translation.
First of all, the Chinese lexical items are grouped into a clause, and word
alignment is needed to map the Chinese lexical items onto English ones.
Besides, a grammatical English question may not be formed by direct mapping
of the Chinese lexical items onto English ones; Figure 15 shows the result
of a direct translation of the question, and it is obvious that the
translation is not grammatical. One way to handle this problem is to use
example-based machine translation (Nagao, 1984). Since most machine
translation methods are designed to translate text from one single language
to another, those methods may not fit the need for translation with
code-mixing. The example-based machine translation method leverages
experiences from previous translations as examples for future translation,
and it is believed to be the most suitable method for handling the case of
code-mixing in the context of this study.
Figure 12. The correct parse tree of question "How can we police ?"
Figure 13. Parse tree of procedure question "How we protect the environment?"
Figure 14. The correct parse tree of question "How can we police ?"
Table 24. Classification result of Chinese verification, reason and attribute questions with tree kernel method
               Precision   Recall   f1-score
Verification     71.2%     78.9%     74.7%
Reason           99.2%     58.0%     72.2%
Attribute        96.7%     25.1%     38.4%
Multi-class      89.1%     53.5%     61.1%
The variation of syntactic structure is one of the main causes of the
unsatisfactory performance of Chinese question classification. Figure 16
and Figure 17 show the syntactic structures of two reason questions. Both
questions have the form (why) [A] (have) [B]? However, the syntactic
structures in the two figures are completely different. One main cause of
this problem is the fact that the Sinica Treebank, the treebank used to
train the parser for Chinese syntactic analysis, is highly lexicalized. The
lexemes (Hong Kong), (kids in Hong Kong), (plant) and (venom) should all
have the part-of-speech of noun; however, they are tagged with three
different types of part-of-speech tag, as shown in the figures below. The
lexicalized nature of the Sinica Treebank has introduced a wider variation
of syntactic structures. It is believed that by
Table 25. A comparison of the question classification result by CBR and tree kernel method
            Verification       Reason             Attribute
            CBR      Tree      CBR      Tree      CBR      Tree
Precision   65.7%    71.2%     73.6%    99.2%     47.5%    96.7%
Recall      79.1%    78.9%     59.6%    58.0%     43.7%    25.1%
f1-score    71.7%    74.7%     64.7%    72.2%     45.2%    38.4%
Table 26. A comparison of the tree kernel, CBR and tree kernel + CBR method for question classification
            Verification                 Reason                       Attribute
            Tree    CBR     Tree+CBR     Tree    CBR     Tree+CBR     Tree    CBR     Tree+CBR
Precision   71.2%   65.7%   62.1%        99.2%   73.6%   78.0%        96.7%   47.5%   49.4%
Recall      78.9%   79.1%   91.8%        58.0%   59.6%   72.0%        25.1%   43.7%   46.6%
f1-score    74.7%   71.7%   74.0%        72.2%   64.7%   74.2%        38.4%   45.2%   47.7%
Table 26 shows that the hybrid method achieved the highest f1-score
among the three methods in classifying reason and attribute questions. Even
though the f1-score for verification questions is not the highest among the
three methods, the result is very close to that of the tree kernel method.
This indicates that the hybrid method in general has a better performance
than the other two methods.
Table 27 summarizes the multi-class classification results from Table 14,
Table 18 and Table 22. The results show that the f1-score of the hybrid
method for multi-class classification is the highest among the three
methods, indicating that the general performance of the hybrid method for
multi-class classification is the best of the three. Besides, based on the
fact that the tree kernel method gives the highest precision of question
classification, only the negative returns of all the tree kernels are
passed to the CBR as input. This eliminates unnecessary input to the CBR,
and the results show that by using this method both the precision and the
recall of the CBR method have been improved.
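The cascade just described can be sketched as a two-stage classifier: the high-precision tree kernel stage answers first, and only its negative returns fall through to the lexeme-based CBR stage. The two stage functions below are stand-ins for the real SVM-with-tree-kernel and CBR components.

```python
# Hybrid tree+CBR cascade sketch: trust the precise first stage and fall
# back to CBR only when every tree kernel classifier returns negative.

def hybrid_classify(question, tree_kernel_stage, cbr_stage):
    label = tree_kernel_stage(question)
    if label is not None:          # positive return from a tree kernel
        return label
    return cbr_stage(question)     # negative returns go to CBR

# Illustrative stand-ins for the two stages.
tree_stage = lambda q: "reason" if q.startswith("why") else None
cbr_stage = lambda q: "verification" if q.endswith("right") else "attribute"

print(hybrid_classify("why is the sky blue", tree_stage, cbr_stage))
print(hybrid_classify("this is correct right", tree_stage, cbr_stage))
```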
Table 27. A comparison of the multi-class classification performance with tree kernel, CBR and tree+CBR method
            Tree     CBR      Tree+CBR
Precision   89.1%    62.3%    69.8%
Recall      53.5%    60.8%    64.9%
f1-score    61.1%    60.6%    64.6%
Besides the differences shown above, there are two phenomena which we
find particularly interesting. The first phenomenon is the high occurrence
of verification questions. The results in section 4.1 show that
verification questions contribute 38.3% and 45.8% of the total number of
questions in the English and Chinese datasets, respectively. The
percentages of verification questions in both datasets are higher than the
total percentage of all explanatory-oriented questions. This result is
quite similar to that of van Boxtel's study (2000), in which 59% of the
total number of questions were verification questions.
Chapter 6
IMPLICATIONS,
RECOMMENDATIONS, LIMITATION
AND CONCLUSION
6.1 Main Findings
The findings of this study are presented according to the four research
questions listed in Chapter one.
1.
2.
3.
4.
Thirdly, the learning module of the CBR has not yet been implemented
in this study. The learning module is important to the mechanism of CBR:
the performance of CBR can be improved through case revision and case
retention. It is believed that the incorporation of the learning module
could improve the result of question classification.
Moreover, the analyses in sections 5.2.3 and 5.3 show that the current
syntax-based or lexeme-based Chinese question classification methods took
into consideration some irrelevant linguistic information.
6.5 Conclusion
It is suggested for future research that automated question
classification might not need a large amount of training data. Thirdly, the
state-of-the-art methods for automated analysis of CSCL discourse mainly
focus on the replication of human judgement based on a well-established
CSCL framework. There has been no exploration of how the results of
computer algorithms can feed back to CSCL researchers on the validity of
the CSCL framework. As illustrated in the previous section, the attribute
questions exhibit a wide range of syntactic and lexical variation; the
researchers may therefore consider whether this question category is
distinctive enough or whether it should be re-organized into different
question categories. Lastly, most of the automated discourse analysis
research in CSCL still focuses on assessment based on the quantity of a
particular discourse act. However, as discussed in section 5.5.2, quality
is also an important consideration in the assessment of the discussion.
This insight may form a cornerstone for the integration of computational
methods into the assessment of CSCL discourse.
Furthermore, it is believed that the tool for automatic question
identification and classification described in this study would have an
impact on computer-supported collaborative learning if it were available to
students and teachers in their daily teaching and learning. A basic usage
is instant filtering of questions. A discussion can span a few months, and
it is difficult for both teachers and students to trace its development;
this tool enables them to quickly identify all questions found in the
discussion. Besides, this tool may also serve as a dashboard reflecting the
healthiness of a discussion through a report of the quantity and quality of
its questions. Questions are the cornerstone of inquiry-based learning. If
the quantity of questions in a discussion is small, or the students are
focusing on fact-based questions, it may be a sign that inquiry is lacking
or that the students are only exchanging information. The teacher of such a
discussion, even a less experienced teacher, would understand that there is
a need to intervene in the discussion and encourage the students to raise
other questions which are fruitful for the discussion.
6.9 Conclusion
Automated question detection and classification of CSCL discourse are
challenging topics; the question taxonomy and the question analysis
technology are two big challenges in their own right. First of all, there
exists no standardized framework for analyzing questions in CSCL discourse.
Actually, standardization would never happen in the context of education,
since there is not yet any agreement on the core question in education:
what is learning? If one tries to come up with a standard method for
analyzing all questions in CSCL discourse, I would say he/she is heading to
a dead end. So, what is the point of this study? I must restate here that
the purpose of this study is not to establish any standard for the
classification of inquiry in CSCL discourse. Rather, the aim of this study
is to explore the
Appendix I
List of lexemes with Information Gain higher than or equal to 0.011
References
Aamodt, A., & Plaza, E. (1994). Case-Based Reasoning: Foundational
Issues, Methodological Variations, and System Approaches. AI
Communications, 7(1), 39–59.
Aoun, J., & Li, Y. A. (1993). Wh-Elements in Situ: Syntax or LF?
Linguistic Inquiry, 24(2), 199–238.
Bereiter, C. (1994). Implications of Postmodernism for Science, or, Science
as Progressive Discourse. Educational Psychologist, 29(1), 3–12.
Bloehdorn, S., & Moschitti, A. (2007). Structure and semantics for
expressive text kernels. Proceedings of the Sixteenth ACM Conference on
Information and Knowledge Management (CIKM '07), 861.
doi:10.1145/1321440.1321561
Bu, F., Zhu, X., Hao, Y., & Zhu, X. (2010). Function-based question
classification for general QA, 1119–1128.
Burbules, N. C. (1993). Dialogue in teaching: Theory and practice. New
York: Teachers College Press.
Burtis, J. (1998). Analytic Toolkit for Knowledge Forum. Centre for
Applied Cognitive Science, The Ontario Institute for Studies in
Education/University of Toronto.
Carlson, A. J., Cumby, C. M., Rosen, J. L., & Roth, D. (1999). The SNoW
learning architecture. Technical Report UIUCDCS-R-99-2101 (pp. 1–14).
Census and Statistics Department. (2012). Population Aged 5 and Over by
Usual Language, 2001, 2006 and 2011 (A107).
http://www.census2011.gov.hk/en/main-table/A107.html.
Chan, C. K. K., Lee, E. Y. C., & Van Aalst, J. (2001). Assessing and
Fostering Knowledge Building Inquiry and Discourse. Paper presented
at the IKIT Summer Institute 2001. Toronto, ON.
Chen, F., Tsai, P.-F., Chen, K., & Huang, C. (1999). Sinica Treebank.
Computational Linguistics and Chinese Language Processing, 4(2), 97–104.
Chen, K. (1996). A Model for Robust Chinese Parser. Computational
Linguistics and Chinese Language Processing, 1(1), 183–204.
Cohen, J. (1960). A Coefficient of Agreement for Nominal Scales.
Educational and Psychological Measurement, 20(1), 37–46.
doi:10.1177/001316446002000104
Cong, G., Wang, L., Lin, C.-Y., Song, Y.-I., & Sun, Y. (2008). Finding
question-answer pairs from online forums. Proceedings of the 31st
annual international ACM SIGIR conference on Research and
development in information retrieval - SIGIR 08, 467.
doi:10.1145/1390334.1390415
Cui, H., Kan, M., Chua, T., & Xiao, J. (2004). A Comparative Study on
Sentence Retrieval for Definitional Question Answering. SIGIR
Workshop on Information Retrieval for Question Answering.
Day, M.-Y., Ong, C.-S., & Hsu, W.-L. (2007). Question Classification in
English-Chinese Cross-Language Question Answering: An Integrated
Genetic Algorithm and Machine Learning Approach. 2007 IEEE
International Conference on Information Reuse and Integration,
203-208. doi:10.1109/IRI.2007.4296621
Emoticon [Def. 1]. (n.d.). Merriam-Webster Online. Retrieved January 29,
2013, from http://www.easybib.com/reference/guide/apa/dictionary
Enfield, N. J., Stivers, T., & Levinson, S. C. (2010). Question-response
sequences in conversation across ten languages: An introduction.
Journal of Pragmatics, 42(10), 2615-2619.
doi:10.1016/j.pragma.2010.04.001
Forman, G. (2003). An Extensive Empirical Study of Feature Selection
Metrics for Text Classification, 3, 1289-1305.
Foster, J., & Vogel, C. (2004). Parsing Ill-formed Text using an Error
Grammar. Artif. Intell. Rev. Special AICS2003 (pp. 1-24).
Li, B., Si, X., Lyu, M. R., King, I., & Chang, E. Y. (2011). Question
identification on twitter. Proceedings of the 20th ACM international
conference on Information and knowledge management - CIKM '11,
2477. doi:10.1145/2063576.2063996
Li, D. C. S. (2008). Understanding mixed code and classroom
code-switching. New Horizons in Education, 56(3), 75-87.
Li, X., & Roth, D. (2002). Learning question classifiers: the role of semantic
information. Natural Language Engineering (Vol. 12, pp. 229-249).
doi:10.1017/S1351324905003955
Man, S. S. (2006). First Language Influencing Hong Kong Students'
English Learning. (Master of Arts dissertation). Retrieved from the
HKU Scholars Hub.
Marcus, M. P. (1993). Building a Large Annotated Corpus of English: The
Penn Treebank. Computational Linguistics.
Miles, M., & Huberman, A. M. (1994). Qualitative Data Analysis.
Qualitative Data Analysis. Thousand Oaks, CA: Sage Publications.
Mitchell, T. M. (1997). Machine Learning. New York: McGraw-Hill.
Nagao, M. (1984). A framework of a mechanical translation between
Japanese and English by analogy principle. In A. Elithorn & R. Banerji
(Eds.), Artificial and Human Intelligence. Elsevier B.V.
Pasca, M., & Harabagiu, S. M. (2001). High Performance Question/
Answering. Proceedings of the 24th annual international ACM SIGIR
conference on Research and development in information retrieval
(pp. 366-374). New Orleans, LA.
Prager, J., Chu-Carroll, J., & Czuba, K. (2002). Statistical answer-type
identification in open-domain question answering. Proceedings of the
second international conference on Human Language Technology
Research (p. 150). Morristown, NJ, USA: Association for
Computational Linguistics. doi:10.3115/1289189.1289276
Prager, J., Radev, D., Brown, E., Coden, A., & Samn, V. (1999). The Use of
Predictive Annotation for Question Answering in TREC8. Proceedings
of the 8th Text Retrieval Conference (TREC-8) (pp. 399-410).
Rebedea, T., Dascalu, M., & Trausan-Matu, S. (2011). Automatic
Assessment of Collaborative Chat Conversations with PolyCAFe. In F.
Wild, M. Wolpers, C. D. Kloos, D. Gillet, & R. M. C. García (Eds.),
Towards Ubiquitous Learning, 6th European Conference on
Technology Enhanced Learning, EC-TEL 2011 (pp. 299-312). Palermo,
Italy: Springer.
Rosé, C., Wang, Y.-C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A.,
& Fischer, F. (2008). Analyzing collaborative learning processes
automatically: Exploiting the advances of computational linguistics in
computer-supported collaborative learning. International Journal of
Computer-Supported Collaborative Learning, 3(3), 237-271.
doi:10.1007/s11412-007-9034-0
Santorini, B. (1990). Part-of-Speech Tagging Guidelines for the Penn
Treebank Project.
Scardamalia, M. (2002). CSILE/Knowledge Forum. Education technology:
An encyclopedia.
Scardamalia, M., & Bereiter, C. (1991). Higher Levels of Agency for
Children in Knowledge Building: A Challenge for the Design of New
Knowledge Media. The Journal of the Learning Sciences, 1(1), 37-68.
doi:10.1207/s15327809jls0101_3
Schmitt, S., & Bergmann, R. (1999). Applying Case-Based Reasoning
Technology for Product Selection and Customization in Electronic
Commerce Environments, (27068), 1-15.
Simmons, R. F. (1965). Answering English Questions by Computer: A
Survey. Communications of the ACM, 8(1), 53-70.
Suzuki, J., Taira, H., Sasaki, Y., & Maeda, E. (2003). Question
classification using HDAG kernel. Proceedings of the ACL 2003
workshop on Multilingual summarization and question answering,
61-68. doi:10.3115/1119312.1119320
Taylor, A., Marcus, M. P., & Santorini, B. (2003). The Penn Treebank: an
overview. In A. Abeillé (Ed.), Treebanks: building and using parsed
corpora (pp. 5-22). Dordrecht, Netherlands: Kluwer Academic
Publishers.
Tseng, H., Chang, P., Andrew, G., Jurafsky, D., & Manning, C. (2005). A
Conditional Random Field Word Segmenter for Sighan Bakeoff 2005.
Fourth SIGHAN Workshop on Chinese Language Processing.
Van Boxtel, C. (2000). Collaborative concept learning: Collaborative
learning tasks, student interaction, and the learning of physics
concepts. Unpublished Doctoral thesis, Utrecht University, Utrecht,
The Netherlands.
Wang, K., & Chau, T.-S. (2010). Exploiting Salient Patterns for Question
Detection and Question Retrieval in Community-based Question
Answering. Proceedings of the 23rd International Conference on
Computational Linguistics (pp. 1155-1163).
Weber, R. P. (1990). Basic Content Analysis. Newbury Park, CA: Sage
Publications.
Yuan, J., & Jurafsky, D. (2005). Detection of Questions in Chinese
Conversational Speech. Automatic Speech Recognition and
Understanding, 2005 (pp. 47-52).
Zhang, D., & Lee, W. S. (2003). Question classification using support
vector machines. Proceedings of the 26th annual international ACM
SIGIR conference on Research and development in information
retrieval - SIGIR '03, 26. doi:10.1145/860442.860443
Zhang, H., Yu, H.-K., Xiong, D., & Liu, Q. (2003). HHMM-based
Chinese Lexical Analyzer ICTCLAS. Proc. of SIGHAN Workshop.
Zhang, K., & Shasha, D. (1989). Simple Fast Algorithms for the Editing
Distance between Trees and Related Problems. SIAM Journal on
Computing, 18(6), 1245-1262. doi:10.1137/0218082
Zhang, Y., & Wildemuth, B. M. (2009). Qualitative Analysis of Content.
Applications of Social Research Methods to Questions in Information
and Library Science.