Mining Problem-Solving Strategies from HCI Data

XIAOLI FERN, CHAITANYA KOMIREDDY, VALENTINA GRIGOREANU, and
MARGARET BURNETT
Oregon State University

Can we learn about users’ problem-solving strategies by observing their actions? This article in-
troduces a data mining system that extracts complex behavioral patterns from logged user actions
to discover users’ high-level strategies. Our application domain is an HCI study aimed at reveal-
ing users’ strategies in an end-user debugging task and understanding how the strategies relate
to gender and to success. We cast this problem as a sequential pattern discovery problem, where
user strategies are manifested as sequential behavior patterns. Problematically, we found that the
patterns discovered by standard data mining algorithms were difficult to interpret and provided
limited information about high-level strategies. To help interpret the patterns as strategies, we
examined multiple ways of clustering the patterns into meaningful groups. This collectively led
to interesting findings about users’ behavior in terms of both gender differences and debugging
success. These common behavioral patterns were novel HCI findings about differences in males’
and females’ behavior with software, and were verified by a parallel study with an independent
data set on strategies. As a research endeavor into the interpretability issues faced by data mining
techniques, our work also highlights important research directions for making data mining more
accessible to non-data-mining experts.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications—
Data mining; I.2.6 [Artificial Intelligence]: Learning—Knowledge acquisition; H.1.2 [Informa-
tion Systems]: User/Machine Systems—Human factors
General Terms: Algorithms, Human factors
Additional Key Words and Phrases: Clustering, human-computer interaction, sequential patterns
ACM Reference Format:
Fern, X., Komireddy, C., Grigoreanu, V., and Burnett, M. 2010. Mining problem-solving strategies
from HCI data. ACM Trans. Comput.-Hum. Interact. 17, 1, Article 3 (March 2010), 22 pages.
DOI = 10.1145/1721831.1721834 http://doi.acm.org/10.1145/1721831.1721834

1. INTRODUCTION
In attempting to understand how humans interact with computer systems, re-
searchers in the Human-Computer Interaction (HCI) field often collect log data

of user actions performed using software. To analyze such data, a common prac-
tice is to observe some specific behaviors that appear to be interesting based
on HCI experts’ opinions and then go through the log data to make quantita-
tive assessments of such behaviors and draw conclusions from them. While this
method has been fruitful in researching how best to support the needs of differ-
ent users, it is limited because human experts’ judgments of what is interesting
may not be available in some cases, and more importantly, can be biased due to
human subjectivity. In this paper, we investigate how data mining can best be
applied to HCI log data to help understand humans’ behaviors when interact-
ing with software systems. In particular, we seek to extract complex behavioral
patterns from HCI log data that provide insights about users’ problem-solving
strategies.
Human-computer interaction data pose interesting challenges to data min-
ing efforts. One difficulty arises from the fact that humans are inconsistent in
their ways of approaching a task, and often introduce extraneous and irrele-
vant actions, resulting in data with a large amount of noise and large variations.
Also, HCI data often contain rich contextual information but, as suggested by
Hilbert and Redmiles [2000], to analyze such data automatically, it is generally
necessary to transform the data and abstract away detailed contextual infor-
mation in order to obtain general trends. This brings up two problems—first,
finding the right level of abstraction can be difficult, and second, interpreting
data mining results without proper contextual information can be problematic.
In this article, we describe a data mining system to mine a set of HCI log data
collected in a particular problem-solving setting: end users debugging spread-
sheet formulas. Our system aims to automatically extract high-level behavioral
patterns that contribute evidence of the users’ problem-solving strategies. The
system consists of two major parts: the first part mines basic low-level behav-
ioral patterns from the log data, and the second part organizes these patterns
into groups and extracts high-level strategies from them.
The system was successful at discovering interesting strategic patterns.
Among its successes were: (1) discovery of patterns that matched the verbaliza-
tions of users regarding strategies in a separate user study with an independent
data set [Subrahmaniyan et al. 2008], (2) discovery of a strategic phenomenon
that had been hypothesized but not yet statistically verified by HCI researchers
in more than three years of empirical work, and (3) discovery of two new strate-
gic patterns that were highly correlated with user success and had not been
revealed in more than nine years of empirical work.
Fundamental to the success of our system was the consideration of a di-
versity of grouping mechanisms for the low-level patterns we found. In our
application, this provided interesting insights that were not available from any
single grouping. However, the process of selecting the grouping mechanisms
was largely human-directed and quite tedious. This suggests that automated
techniques for generating a set of diverse and potentially interesting groupings
of low-level patterns using available contextual information is a key direction
toward making data mining tools more accessible and easier to apply.
This article makes the following contributions. First, we present a data min-
ing system for extracting high-level user behavioral patterns (e.g., strategies)
from HCI log data. This provides a general framework for similar HCI log
data mining applications, especially when the interpretability of the results
is important. Second, we present interesting HCI findings that resulted from
our system, illuminating marked behavioral differences between males and fe-
males and between successful and unsuccessful users. Finally, as a case study of
applying data mining to HCI applications, our work reveals important research
directions for improving the usability of data mining tools for researchers who
are not data mining experts.

2. BACKGROUND: THE HCI APPLICATION


Our work is situated in an HCI research area we term “Gender HCI” [Beckwith
and Burnett 2004]. Gender HCI is the domain that seeks to answer whether
males and females problem-solve differently when using purportedly gender-
neutral software and, if they do, how best to design software features to support
both genders equally. While individual differences, such as experience, cognitive
style, and spatial ability, are likely to vary more than differences between gen-
der groups, research from several domains has shown gender differences that
are relevant to computer usage. Further, Czerwinski et al. [2002] and Tan et al.
[2003] have already shown that important gender differences can exist that can
be accommodated within computer systems—but only if we know about them.
Research has indeed begun to reveal gender differences pertaining directly
to software, such as in programming environment appeal, playful tinkering
with features, attitudes toward and usage of end-user programming features
[Beckwith and Burnett 2004; Rode et al. 2004; Beckwith et al. 2005; Beckwith
et al. 2006a; Lorigo et al. 2006; Brewer and Bassoli 2006; Beckwith et al. 2007;
Rosson et al. 2007; Kelleher et al. 2007; Rode 2008]. The overall goal of Gender
HCI is to identify problem-solving strategies and behaviors used by males and
females and to understand how these strategies and behaviors relate to gender
and problem-solving success. Such understanding is aimed at ultimately im-
proving the design of software to encourage successful strategies and behaviors
by users of both genders.
From a data mining perspective, the project is a challenging real-world
application: it is real-world in the sense that its processes, data collection, spec-
ifications, and goals were all established by HCI researchers independently of
any data mining considerations and without regard to data mining suitability.
We use this HCI project to investigate how best to mine and interpret HCI log
data of human behaviors. The goals of this work were therefore twofold: (1) to
automatically extract high-level user strategies from the log data to remove
human-related limitations, such that the result can yield better understanding
of user behavior; and (2) to examine the applicability of data mining to this
challenging problem, with a special focus on the interpretability of the mined
results.
In this research, we used the log data set that was collected in a previous
experiment on tinkering behaviors by males and females. The study’s design
has been described elsewhere [Beckwith et al. 2006b], but we briefly describe
it here for convenience.

Participants. We used the logs of 39 participants (16 males and 23 females)
who performed spreadsheet debugging tasks. Note that although 39 participants
is not a particularly large number, their logs contained adequate data for
a data mining effort: the log file of each participant contained over 400 actions,
totaling over 16,000 action entries, each identified by participant ID.
The participants were all college students. Five of the participants were en-
gineering majors (four males and one female); the remaining participants came
from a wide range of majors, such as fisheries science, food science, accounting,
finance, liberal studies, and education. (Computer science, computer engineer-
ing, and electrical engineering majors were not allowed to participate.) Four
participants were younger than 20, and four were older than 30 (split equally
between males and females); the remaining participants were between the ages
of 20 and 29. Only two males and one female had done any programming be-
yond the introductory programming classes required of many majors. All par-
ticipants had at least some spreadsheet experience. There were no statistically
significant differences between the males and females in any of the background
factors.

Experiment procedure. First, a pre-session questionnaire collected each
participant's gender and background data; this was followed by a hands-on tutorial
to familiarize participants with the software. In the tutorial, participants
performed actions on their own machines with guidance at each step. The tutorial
deliberately left some of the features unexplained, leaving the understanding
of those features to the software’s built-in help system and to the participants’
choices as to whether to explore features that had not been taught.
A Payroll spreadsheet, which was used in earlier studies [Beckwith et al.
2005], was also used in this data mining study. This spreadsheet was derived
from real end-user spreadsheets and seeded with formula errors that had been
made by the end users who originally created those spreadsheets. The partic-
ipants were instructed, “Test the spreadsheet to see if it works correctly and
correct any errors you find.” Their final spreadsheets were saved to evaluate
how successful their debugging efforts had been.

Software. The problem-solving software used in the study was Forms/3,
a research prototype extension of spreadsheets [Burnett et al. 2001, 2004].
Figure 1 shows a snapshot of this prototype. This software is designed to aid
users in debugging spreadsheets, providing functionalities for systematically
testing a spreadsheet and giving feedback to help identify the bugs.
The main user actions these features make possible include incrementally
“checking off” (Checkmark) or “X-ing out” (Xmark) values that are correct or
incorrect, respectively. In response to these actions, the software tracks
the testing progress made by a user, which is displayed using varying cell border
colors such that, as more testing is done, the border color of a cell incremen-
tally changes from red (light gray in Figure 1) to blue (dark gray in Figure 1).
The software also indicates the fault likelihood of each cell, with darker inte-
rior color indicating higher likelihood of containing a bug. The visual feedback
also includes a progress bar at the top to show the overall testedness of the
spreadsheet.
Fig. 1. A snapshot of the software. The user previously noticed cells with correct values, such as
the one in SingleWithHold, and placed a Checkmark in them. The user also noticed an incorrect
value in SocSec and placed an Xmark. As a result, one cell has been identified (highlighted) as
being the only possible source for the incorrect value. (If multiple cells had been identified, the
highlight's darkness would have depicted the system's certainty of that cell's formula being
erroneous.) The user is now preparing to place another mark in MarriedWithHold.

Table I. Common Actions and Their Meanings

Action Name        Explanation
PostFormula (PF)   Post (add to the display) a cell's formula
HideFormula (HF)   Hide a cell's formula
EditValue (EV)     Edit a cell's value
EditFormula (EF)   Edit a formula
CheckMark (CM)     Place a CheckMark on a cell to mark its value as correct
XMark (XM)         Place an XMark on a cell to mark its value as incorrect
ArrowOn (AON)      Toggle an arrow on to show the dataflow dependency
ArrowOff (AOF)     Toggle an arrow off to hide the dataflow dependency

Fig. 2. An excerpt from the user action logs.

Users can toggle arrows (not shown in the figure) on and off that
depict not only the dataflow relationships among cells but also the testedness
status of these relationships. Finally, users can select a cell and push the “Help
Me Test” button (upper left in the figure) to have the system provide a suitable
test value to try in the selected cell.
These features are all supported with a built-in help system. Help is pro-
vided through tool tips, which are as short as possible to keep users’ reading
cost low. Additional information was available to the 39 participants via a “Tips”
expander for each tool tip, which could be expanded (as in Figure 1) and dis-
missed on user demand. The expanded tips included further information on
why the object was in its current state and possible actions to take next. Once
expanded, the tip would stay visible until the user dismissed it, allowing mul-
tiple tips to stay on the screen simultaneously to support non-linear problem
solving and to reduce the need for the user to memorize the information.
The software recorded user actions in log files as follows. An individual user
action is defined as a user’s physical interaction with a debugging feature, such
as placing an Xmark in a cell (XMark). In total, there were 19 actions available
and Table I shows a set of commonly used actions and their meanings. The
log files contain detailed information about every user action, including a time
stamp for when it was taken, on which cell it operated, and various related
parameters. Figure 2 shows an excerpt of a log file. Here, we show only the
time stamp, the name of the action, and the cell ID, omitting other information
not used in our data mining procedure.
The HCI researchers chose to use this software for the experiment because
they had access to its source code, allowing them to control every aspect of its
features. These HCI researchers later performed a related experiment using
Excel, with results consistent with those seen with this software [Beckwith
et al. 2007].

Fig. 3. Our data mining process has four steps: (1) preprocessing; (2) frequent pattern mining;
(3) pattern clustering; and (4) statistical analysis of pattern groups. Arrows represent the
information flow.


3. THE DATA MINING SYSTEM


To find strategies from log data, we first needed to decide what constitutes a
strategy. Typically a strategy refers to a reasoned plan for achieving a specific
goal. Note that such plans are not observable from users’ behaviors. What we
can observe are sequences of actions that a user takes to implement his or her
strategies. Thus, we consider these complex behavioral patterns as a form of
surrogate for strategies. That is, we consider sequences of actions that collec-
tively achieve a specific goal (such as deciding if a particular value is correct
or not) to be evidence pointing toward an underlying strategy. Such consid-
erations naturally led us to cast this as a sequential pattern mining problem
[Agrawal and Srikant 1995; Hatonen et al. 1996]. As shown in Figure 3, the
first part of our system applies traditional sequential pattern mining to the log
data to identify basic behavioral patterns that were frequently employed by the
participants.
As we will demonstrate in Section 3.1.2, however, the patterns we found with
sequential pattern mining alone were not interpretable as meaningful strate-
gies. To address this issue, the second part of our system is devoted to extracting
interpretable strategies from these basic patterns by organizing them into co-
herent groups. It also contains an additional step of statistical data analysis
to relate the found strategies to factors that are important to the gender HCI
researchers, namely, gender and debugging success. Below, we introduce these
two parts in turn.

3.1 Mining Sequential Patterns


Below we describe our methods for preprocessing and mining the HCI log data.
We will also show the outputs of these methods and illustrate that they alone
were not able to produce satisfactorily useful results on the HCI log data of
human behaviors.

3.1.1 Preprocessing. Recall that the log files contain detailed contextual in-
formation about each action the users took. In our preprocessing step, all of the
contextual information was removed and only the action names were retained
to form the basic event sequences. This preprocessing step provided abstraction
of the data and allowed us to detect general behavioral trends that were not
specific to any particular context such as a set of particular cells. As an exam-
ple, the log excerpt in Figure 2 translates into the simple sequence of events:
(Tooltip, Checkmark, Checkmark). Although the contextual information was
removed for the sequential pattern mining step, we will show in Section 3.1.2
that it was useful to include such information for the pattern interpretation
step.
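To make this abstraction concrete, the following minimal Python sketch performs the
translation. The whitespace-separated field layout (time stamp, action name, cell ID) and
the example entries are assumptions based on the Figure 2 excerpt, not the actual Forms/3
log format; real entries carry further parameters that are likewise discarded.

    # Minimal preprocessing sketch: reduce each user's log to action names.
    from collections import defaultdict

    def load_action_sequences(log_lines_by_user):
        """Map each participant ID to the bare sequence of action names,
        abstracting away time stamps, cell IDs, and other context."""
        sequences = defaultdict(list)
        for user_id, lines in log_lines_by_user.items():
            for line in lines:
                fields = line.split()   # assumed: time stamp, action, cell ID
                if len(fields) >= 2:
                    sequences[user_id].append(fields[1])  # keep the action name
        return dict(sequences)

    # An excerpt like Figure 2 reduces to (Tooltip, Checkmark, Checkmark):
    logs = {"P01": ["10:05:30 Tooltip SocSec",
                    "10:05:32 Checkmark SocSec",
                    "10:05:40 Checkmark NetPay"]}
    print(load_action_sequences(logs))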

3.1.2 Mining Sequential Patterns. Sequential Pattern Mining was first in-
troduced in the context of retail data analysis [Agrawal and Srikant 1995]
and network alarm pattern analysis [Hatonen et al. 1996; Mannila et al. 1997].
Over the years, many different sequential pattern mining algorithms have been
developed using different mining procedures for different types of sequential
data [Garofalakis et al. 1999; Pei et al. 2001; Seno and Karypis 2002; Ayres
et al. 2002; El-Ramly et al. 2002; Yan et al. 2003]. Many of these techniques dif-
fer in how they search for sequential patterns, but they output similar patterns
when given the same inputs and pattern requirements. From these techniques,
we chose IPM2 [El-Ramly et al. 2002], a method developed for mining inter-
action patterns. We chose IPM2 because it is representative of many related
techniques and because it is suited to our data type, as our HCI log data share
similar characteristics with the interaction trace data targeted by IPM2. In the
following, we briefly describe the IPM2 algorithm.
Given a set of action sequences, the IPM2 algorithm incrementally searches
for fully ordered action sequences that satisfy some user prespecified minimum
support and maximum error criteria. The minimum support criterion specifies
the minimum number of times an action sequence has to be observed in the
log files; those exceeding the threshold are considered frequent. The maximum
error criterion specifies the maximum number of insertion errors allowed at
each location for pattern matching. For example, the pattern (A, B, C) is only
considered to be observed in the sequence [A, E, D, B, C] if the maximum error
criterion is set to 2 or larger.
In our experiments, we set the minimum support threshold to be 20, which
requires a pattern to be observed at least 20 times to be considered frequent.
The maximum error threshold was set to 1 to allow a single insertion or deletion
in pattern matching. The maximum error threshold of one is conservative, but
still allows the system some flexibility in pattern matching.
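To illustrate these two criteria, the following simplified Python sketch implements
the error-tolerant matching just described and counts a pattern's support. It
illustrates the matching criterion only; it is not the actual IPM2 algorithm, whose
incremental search procedure is considerably more involved.

    # Sketch of error-tolerant pattern matching: at most max_err extraneous
    # actions may be inserted at each location between pattern elements.
    def matches_at(pattern, seq, start, max_err):
        """True if `pattern` is observed in `seq` beginning at `start`."""
        i = start
        for k, action in enumerate(pattern):
            errors = 0
            limit = 0 if k == 0 else max_err  # anchor the first action exactly
            while i < len(seq) and seq[i] != action and errors < limit:
                errors += 1                   # skip one extraneous action
                i += 1
            if i >= len(seq) or seq[i] != action:
                return False
            i += 1
        return True

    def support(pattern, sequences, max_err=1):
        """Count occurrences of `pattern` across all users' sequences."""
        return sum(matches_at(pattern, seq, s, max_err)
                   for seq in sequences
                   for s in range(len(seq) - len(pattern) + 1))

    # With max_err=2, pattern (A, B, C) is observed in [A, E, D, B, C];
    # with max_err=1 it is not, matching the example above.
    print(support(["A", "B", "C"], [["A", "E", "D", "B", "C"]], max_err=2))  # 1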

Table II. A Random Subsample of the Found Patterns
(See Table I for Action Meanings)

PID   Pattern
P58   HF, CM, CM, CM, PF, HF
P149  PF, HF, CM, CM, CM, PF
P179  AON, AOF, PF, HF, PF, HF
P206  HF, CM, CM, PF, HF, PF
P273  HF, PF, EF, HF, PF, EF, HF

Using larger error thresholds would have allowed for more human varia-
tions. However, the number of sequences that a pattern could match grows ex-
ponentially in the number of allowed errors, leading to potentially inaccurate
matches. This would cause many patterns to be considered “frequent” because
they could be inaccurately matched to a large number of sequences. To avoid
this problem, we chose to be conservative, so as to be sure of the matches and
of the quality of the patterns that were considered frequent.
We further limited the pattern mining algorithm to output only those pat-
terns that were no shorter than 5 actions. This limit was set to ensure that
the output patterns would be sufficiently long to provide enough context to in-
terpret the patterns. Finally, we removed the patterns that were not maximal.
(Maximal patterns are those that have no frequent superpattern; that is, they
are not contained in any other frequent pattern.)
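The sketch below shows this maximality filter in simplified form: containment is
tested as exact order-preserving (subsequence) containment, whereas in the full
system containment would also have to respect the error-tolerant matching above.

    # Sketch of the maximality filter over the mined frequent patterns.
    def is_subpattern(p, q):
        """True if pattern p occurs, in order, within a longer pattern q."""
        if len(p) >= len(q):
            return False
        it = iter(q)
        return all(action in it for action in p)  # greedy subsequence test

    def maximal_patterns(patterns):
        return [p for p in patterns
                if not any(is_subpattern(p, q) for q in patterns)]

    # (PF, CM) is pruned because the frequent pattern (PF, CM, CM) contains it:
    print(maximal_patterns([["PF", "CM"], ["PF", "CM", "CM"], ["AON", "AOF"]]))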
We applied this procedure to the HCI log data and found a total of 289
maximal patterns of length 5 or longer (average length = 7.38, maximum = 12).
In Table II, we show five randomly selected patterns from these 289 patterns.
Examining these patterns individually and collectively, we made the following
observations.
(1) There are many highly similar patterns. For example, P58 and P149 are
almost the same arrangement of posting and hiding formulas and using
checkmarks; they differ by only two actions. Note that there is no super-
pattern or subpattern relationship between these two patterns, so we cannot
consider them fully redundant and prune them using concepts such as maximal
[Gouda and Zaki 2001] and closed [Pasquier et al. 1999] patterns.
A key question then is whether such patterns should be considered to be
equivalent. In particular, we would like to know whether participants used
them for the same purpose, with the minor differences simply reflecting
random human variations among or within users. Now, if we do consider
P58 and P149 to represent the same general behavior, how about P206? It
differs from P58 by only two actions as well. We needed a principled way to
address this issue.
(2) Individual patterns carry limited information. For instance, P179 describes
the behavior of toggling on an arrow closely followed by toggling off an arrow,
followed by some post- and hide-formula operations. What does this specific
sequence of actions tell us about the user’s behavior? Hardly anything. It is
difficult to reach any general understanding of user behavior from a single
specific pattern like this. Again, we needed a principled way to help us go
beyond the specifics of any individual pattern and detect general trends.

The previous observations led us to investigate possible ways of clustering
patterns into meaningful groups, so as to jointly consider a group of similar
patterns and detect from them the general behavioral trends that correspond
to high-level strategies.
Note that the parameter choices we made above (including the maximum er-
ror and minimum support thresholds, etc.) may appear to be highly important,
because different choices lead to an increased or decreased number of basic
patterns to be output by IPM2. However, the impact of such differences is
limited to part one of the data mining process. The second part, which we de-
scribe next, automatically groups similar patterns into clusters. An increased
or decreased number of basic patterns would not cause the underlying clusters
to differ. Rather, it would only cause some clusters to contain more or fewer
similar patterns than before. Thus, the success of the system is not particularly
sensitive to the parameter choices for IPM2.

3.2 Pattern Interpretation via Clustering


The frequent pattern mining community has long recognized that pattern in-
terpretability (or lack thereof) is a major bottleneck when applying pattern
finding algorithms. Standard algorithms can output hundreds or thousands of
patterns, prohibiting their detailed examination. Concepts such as maximal
patterns [Gouda and Zaki 2001] and closed patterns [Pasquier et al. 1999; Yan
et al. 2003] have been introduced to reduce pattern redundancy. However, the
quantity of the patterns is only part of the story. In many applications, individ-
ual patterns often carry limited information about the general phenomenon.
For example, in our application, two different action sequences might imple-
ment exactly the same general strategy, or two separate action sequences might
instead jointly implement some strategy. Simply removing redundant pat-
terns will not help in such situations. We need to extract general phenomena,
whereas individual patterns are often single instantiations of such phenomena.
More recently, new techniques have emerged to compress the found patterns
[Xin et al. 2005], to group the patterns to find representative ones [Yan et al.
2005], to rank and select the top-k patterns according to their significance and
redundancy [Xin et al. 2006], and to provide semantic annotations of the pat-
terns using limited contextual information [Mei et al. 2006]. Such techniques
are more appropriate for dealing with the above-mentioned problems. Still, they
seek to find a set of individual patterns that are of the most significance, where
each pattern still needs to be interpreted individually. This can be problematic
because, as mentioned previously, individual patterns carry limited informa-
tion for interpretation. In our system, we seek to cluster the patterns such that
the patterns in each group could collectively provide some high-level under-
standing of user strategies. Toward this goal, we examined different ways to
group the 289 sequential patterns and evaluated their results based on inter-
pretability. We will describe two different approaches for clustering patterns
into strategy-corresponding groups.

3.2.1 Supervised Clustering of Patterns. An important aspect of our data
mining goal was to understand the relationships of strategies we might find to
gender and to problem-solving success. In other words, we were interested in
identifying strategies that are favored by certain user groups: in particular, fe-
male users versus male users, and successful users versus unsuccessful users.
One possible approach to achieving this special goal is to use supervised clus-
tering [Dettling and Buehlmann 2002; Slonim and Tishby 2001; Dhillon et al.
2003].
Supervised clustering includes additional supervised information (such as
class labels) into the clustering procedure to produce clusters that best distin-
guish among different classes. Successful applications of supervised clustering
include learning word clusters that are indicative of document classes [Slonim
and Tishby 2001; Dhillon et al. 2003] and extracting gene groups that distin-
guish different tissue types [Dettling and Buehlmann 2002]. Applying super-
vised learning to our pattern clustering problem could therefore seek pattern
clusters that would separate female users from male users or successful users
from unsuccessful users.
To apply supervised clustering, we collected for each pattern the number of
times each user used it. This gave us a 39-dimensional representation of each
pattern describing its usage frequency among all users. Each user was further
assigned a class label. For gender analysis, we assigned the class labels to be
female or male, based on the participants' background information. For success
analysis, we assigned users to be successful or unsuccessful depending on the
number of bugs they ultimately fixed (defined as being above or below the
median, respectively). The supervised clustering technique named Divisive Information
Theoretic Clustering (DITC) [Dhillon et al. 2003] was then applied to find pat-
tern groups that best differentiated between the classes.
In DITC, we considered the pattern-user usage data described above as a
two-dimensional contingency table, where the columns of the table correspond
to the users and the rows correspond to the patterns. This contingency table
is then transformed into an empirical joint distribution of two discrete random
variables: a multinomial random variable representing the choice of pattern,
and a binary variable representing the class label. The clustering objective is
to find a grouping of the patterns such that the mutual information between
the two random variables is best preserved. Conceptu-
ally this means that DITC looks for pattern groups whose usage frequencies
can separate female users from male users in gender analysis and can sepa-
rate successful users from unsuccessful users in success analysis. Note that the
clustering is performed with a supervised classification goal in mind; hence the
name supervised clustering.
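As an illustration of this objective, the Python sketch below (assuming NumPy is
available) collapses a toy pattern-user contingency table into a pattern-class table
and computes the mutual information I(Pattern; Class) that DITC tries to preserve when
merging patterns into groups. All counts are invented; in our study the table had 289
pattern rows and 39 user columns.

    import numpy as np

    def pattern_class_table(usage, labels):
        """usage[i, u] = times pattern i was used by user u;
        labels[u] is 0 or 1 (e.g., 0 = female, 1 = male)."""
        labels = np.asarray(labels)
        return np.stack([usage[:, labels == c].sum(axis=1) for c in (0, 1)],
                        axis=1)

    def mutual_information(counts):
        """I(Pattern; Class) of the empirical joint distribution, in bits."""
        p = counts / counts.sum()             # empirical joint distribution
        p_row = p.sum(axis=1, keepdims=True)  # marginal over patterns
        p_col = p.sum(axis=0, keepdims=True)  # marginal over classes
        nz = p > 0
        return float((p[nz] * np.log2(p[nz] / (p_row @ p_col)[nz])).sum())

    usage = np.array([[9.0, 7.0, 1.0, 0.0],   # 3 patterns x 4 users (toy)
                      [1.0, 2.0, 8.0, 6.0],
                      [3.0, 4.0, 3.0, 4.0]])
    labels = [0, 0, 1, 1]                     # the users' class labels
    print(mutual_information(pattern_class_table(usage, labels)))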

3.2.2 Unsupervised Clustering of Patterns. In supervised pattern clustering,
patterns are grouped together based on how well they collectively separate
one class of users from the other. In contrast, unsupervised clustering does
not use external class information for guidance and only relies on similarities
among the patterns to group them. A critical question is thus how to best cap-
ture the similarity among patterns. It is important to realize that there may
exist many different ways for the action sequences of the same strategy to differ
from or resemble one another. It is thus unlikely that one can design a single

similarity measure that will capture all of the different possibilities. In fact,
we argue that there is no reason to limit ourselves to one particular similar-
ity measure. Different measures may reveal different underlying connections
among patterns. Based on this philosophy, we examined three different ways
to capture the similarity among patterns.

(1) Pattern clustering based on edit distance. In this approach, we considered
the syntactic similarity among patterns (a Python sketch of this and the
following methods appears after this list). Patterns of similar action
sequences are deemed to represent the same general behavior, only per-
turbed by limited amounts of extraneous and irrelevant actions. Such syn-
tactic similarity can be captured by the edit distance measure between the
two patterns, which is defined as the minimum number of action insertions,
deletions, or substitutions required to match one pattern with another.
We computed the pairwise edit distance measure among all 289 patterns,
producing a 289 × 289 distance matrix. We then applied a hierarchical
average link clustering algorithm [Jain et al. 1999] to produce a dendrogram
representing a hierarchy of clustering solutions. Visually inspecting the
dendrogram, we found 37 pattern groups. In the remainder of this paper, we
will refer to this method as the edit distance method for pattern clustering.
(2) Pattern clustering based on usage profiles. Another way of judging the con-
nection between a pair of patterns is to look into the contextual information
about how different users use them. In particular, in this approach, we cre-
ated a usage profile for each pattern by looking into how frequently each
pattern was used by the 39 users. Patterns sharing similar usage profiles
were then considered to be related to one another.
More specifically, similar to the supervised case, we created a 39-
dimensional usage profile to represent each pattern. Each dimension is
simply the number of times that the pattern was used by a particular user.
We then applied K-means [Jain et al. 1999] to the resulting 39-dimensional
dataset to group patterns that shared similar usage profiles. To determine
the number of clusters, we visually inspected the plot of the GAP statistic
[Hastie et al. 2001], which computes the within-cluster variance normalized
using a random reference distribution. Twenty clusters were found.
In the remainder of this paper, we will refer to this method as the usage
profile method. Note that if two patterns A and B are grouped together
under the usage profile method, it suggests that users who use A a lot tend
to use B a lot as well or vice versa.
(3) Pattern clustering based on cell frequency. Finally, we looked into another
source of contextual information concerning how patterns were used. In this
case, we inspected the spreadsheet cells upon which each pattern operated.
In particular, given a pattern we found the cells that it operated on each time
it was used. For instance, if a pattern consisted of five actions, every time we
observed this pattern in action, the counts of the five cells that it operated on
would be incremented accordingly. If a cell was operated on multiple times
within these five actions, its count was incremented multiple times. In the
end, we obtained a cell-frequency distribution for each pattern describing
how many times this pattern operated on every cell of the spreadsheet. In
total, the spreadsheet contains 24 cells. This results in a 24-dimensional
representation of the patterns.
Similarly, we applied K-means to the 24-dimensional data and found 20
clusters. Note that if two patterns A and B are grouped together under this
method, it suggests that cells that are touched frequently by A are also
touched frequently by B and vice versa. Below we will refer to this method
as the cell frequency method.
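The Python sketch below (referenced from item (1) above, and assuming SciPy and
scikit-learn are available) illustrates the first two methods; the cell frequency
method is the same K-means step applied to the 24-dimensional cell-count data. The
cluster counts (37 and 20) came from our dendrogram and GAP-statistic inspections
and are simply hard-coded here.

    import numpy as np
    from scipy.spatial.distance import squareform
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.cluster import KMeans

    def edit_distance(a, b):
        """Minimum number of action insertions, deletions, or substitutions."""
        d = np.zeros((len(a) + 1, len(b) + 1), dtype=int)
        d[:, 0] = np.arange(len(a) + 1)
        d[0, :] = np.arange(len(b) + 1)
        for i in range(1, len(a) + 1):
            for j in range(1, len(b) + 1):
                d[i, j] = min(d[i - 1, j] + 1,                           # deletion
                              d[i, j - 1] + 1,                           # insertion
                              d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))  # substitution
        return d[-1, -1]

    def cluster_by_edit_distance(patterns, n_groups=37):
        """Hierarchical average-link clustering of the pairwise distances."""
        n = len(patterns)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = edit_distance(patterns[i], patterns[j])
        tree = linkage(squareform(dist), method='average')
        return fcluster(tree, t=n_groups, criterion='maxclust')

    def cluster_by_usage_profile(usage_matrix, n_groups=20):
        """usage_matrix[i, u] = times pattern i was used by user u."""
        return KMeans(n_clusters=n_groups, n_init=10).fit_predict(usage_matrix)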

3.3 Statistical Testing


Of course, not all pattern groups found by the above methods necessarily cor-
respond to interesting user strategies. To find those that are interesting to our
application, that is, the gender HCI study, we would like to relate these pattern
groups to users’ gender and problem-solving success. In this study, we used two-
sample t-tests [Casella and Berger 1990] to identify a subset of pattern groups
(strategies) whose usages showed significant differences between female and
male users, and/or between successful and unsuccessful users.
Taking gender analysis as an example, we separated the users according to
their gender. Given a particular pattern group in consideration, we counted
how many times each user used the patterns from that group. This gave us a
count for each user. We then normalized the counts by the total number of uses,
resulting in a relative usage frequency by each user for that pattern group.
For example, a relative frequency of 20% for user A means that 20% of all the
pattern instances (usages) in the group were user A’s.
We considered the relative usage frequencies by the female users as one sam-
ple X (sample size = 23, that is, the number of female users), and the relative
usage frequencies by the male users as another sample Y (sample size = 16,
that is, the number of male users). The two-sample t-test determines whether
X and Y could have the same mean assuming they are both generated from
normal distributions that share the same variance. If we failed to reject the
null hypothesis (X and Y have the same mean), then we considered this pat-
tern group to be uninteresting for our study because it showed no statistically
significant difference between males and females.
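The following sketch shows this selection test, assuming SciPy is available. The
frequencies below are invented toy values; in the study, sample X held 23 values
(female users) and sample Y held 16 (male users).

    import numpy as np
    from scipy import stats

    def group_is_interesting(usage_x, usage_y, alpha=0.05):
        """Two-sample t-test (equal variances assumed, as described above);
        keep the pattern group for inspection if the p-value is below alpha."""
        t, p = stats.ttest_ind(usage_x, usage_y, equal_var=True)
        return p < alpha, p

    usage_female = np.array([0.21, 0.30, 0.18, 0.25, 0.28])  # toy sample X
    usage_male = np.array([0.10, 0.12, 0.15, 0.09])          # toy sample Y
    print(group_is_interesting(usage_female, usage_male))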
We tested each pattern group with respect to both gender and success, se-
lecting only those groups that were significant according to our tests for further
inspection and interpretation. This allowed us to quickly zoom in on the pattern
groups of interest to the gender HCI research.
Note that we did not perform statistical testing on the pattern groups found
by supervised clustering due to the bias introduced in the supervised cluster-
ing step. Because supervised clustering intentionally searches for patterns to
group together to achieve distinctions between males and females or between
successful and unsuccessful users, pattern groups found this way will likely
be judged as significant by statistical tests but such significance is “cheating,”
because using the supervised information during clustering prearranges such
results.
We further point out that we do not intend to use these results as evidence of
statistical differences between user groups. One obvious reason against doing
so is that we were doing multiple hypothesis testing and made no correction for
it. Instead, we use the “significance” results here to zoom in on interesting
pattern groups for further examination, which can then generate hypotheses
to be further tested in new experiments.

4. RESULTS
We examined the results from both supervised and unsupervised clustering,
focusing on the interpretability of the resulting pattern groups.

4.1 Pattern Interpretation Results with Supervised Clustering


We applied the divisive information theoretic clustering method [Dhillon et al.
2003] for supervised clustering. As we mentioned in Section 3.2.1, we repre-
sented each pattern by its usage profile over all 39 users.
Supervised clustering was performed in two different ways. In the first case,
users were classified into female or male. In the second case, users were classi-
fied as successful or unsuccessful. In both cases, we clustered the patterns into
20 groups.1
Our results for supervised clustering were disappointing—examining the
resulting clusters did not reveal any general trends. The clusters contained
a set of highly random patterns that did not seem to relate to one another.
This was due to the fact that supervised clustering is geared toward correctly
classifying users rather than forming coherent clusters. Patterns A and B are
clustered together because the total number of times Patterns A and B were
used happens to be indicative of the users’ gender or success. However, these
two patterns could be completely unrelated to the strategies users
were implementing.
Also note that, as mentioned in Section 3.3, because the supervised infor-
mation was introduced in the clustering process, we consider it statistically
unsound to use the same data to examine the statistical differences between
how female and male (or successful and unsuccessful) users use a particular
pattern group.
In summary, while supervised clustering has been shown to be effective in
generating clusters for the purpose of classification, our results did not indicate
it to be an appropriate approach for forming meaningful pattern groups that
describe some general behavioral trends because the pattern groups it found
lacked coherency and were difficult to interpret.

4.2 Pattern Interpretation Results with Unsupervised Clustering


In contrast to the supervised results, our unsupervised clustering methods pro-
duced a number of highly interesting clusters, which collectively led to insights
about user strategies, relating both to the users’ gender and to problem-solving
success. Table III presents five interesting pattern groups discovered by our

1 Varying the cluster number did not produce any noticeable difference in the quality of the
resulting clusters.


Table III. A Summary of the Pattern Groups Found by the Unsupervised Clustering
Methods. For each group, we show a set of representative patterns; the action
sequences that highlight the general trend behind each group are italicized for
emphasis. See Table I for action meanings.

Method                   Group  Representative Patterns
Edit Dist.               1      ⟨HF, PF, HF, CM, CM, CM, CM, CM⟩
                                ⟨PF, CM, CM, CM, CM, CM⟩
                                ⟨PF, HF, CM, CM, CM, CM, CM, CM⟩
Edit Dist.               2      ⟨CM, CM, CM, CM, CM, PF, HF⟩
                                ⟨CM, CM, CM, XM, XM⟩
                                ⟨CM, CM, CM, CM, HF⟩
Edit Dist., Cell Freq.   3      ⟨HF, HF, PF, HF, PF, EF, HF⟩
                                ⟨HF, PF, HF, PF, PF, EF, HF⟩
                                ⟨HF, EF, HF, PF, EF⟩
Usage Freq., Cell Freq.  4      ⟨HF, PF, HF, PF, HF, HF, CM⟩
                                ⟨PF, PF, HF, PF, HF, CM, PF⟩
                                ⟨HF, PF, HF, PF, HF, PF, HF, PF, HF, CM⟩
Usage Freq., Cell Freq.  5      ⟨EV, HF, PF, EV, HF, CM, CM, CM⟩
                                ⟨PF, EV, HF, PF, EV, CM⟩
                                ⟨HF, PF, EV, HF, CM, CM, CM, CM⟩
                                ⟨CM, CM, CM, XM, XM⟩

Table IV. A Summary of Statistical Testing Results

Group  Strategy         Statistical Testing
1      Batch-checking   Favored by successful users (p-value = 0.032)
2      Batch-checking   Favored by successful users (p-value = 0.003)
3      Code inspection  Favored by female users (p-value = 0.016)
4      To-check-list    Favored by unsuccessful users (p-value = 0.017)
5      Testing          Favored by successful users (p-value = 0.007)

unsupervised approaches and their representative patterns. Table IV presents
the statistical testing results. We now examine them individually.
(1) Pattern Groups 1 & 2 (“batch checking”): Pattern groups 1 and 2 were both
identified by the edit distance approach. We combine the discussion of these
two pattern groups because their patterns are similar. In particular, their
patterns can all be characterized by the behavior of consecutively checking
off cells as being correct (CM) or incorrect (XM), that is, a “batch” of checks
made in a row (termed here the “batch-checking” strategy). As indicated in
Table IV, the statistical tests indicated both pattern groups to have a dif-
ference in usage between the successful and unsuccessful user groups, with
the batch-checking strategy used more by successful users. See Figure 4(a)
for the box-plot of the pattern group 2 usage frequencies by the success-
ful and unsuccessful users, respectively. The box-plot for pattern group 1 is
highly similar and is thus omitted.
(2) Pattern Group 3 (“code inspection”): Pattern group 3 was identified by
the edit distance method as well as by the cell frequency method. This
strongly suggests that this cluster is real and not a random artifact cre-
ated by the clustering algorithms. Patterns in this group are characterized

Fig. 4. The usage frequency box-plots for different pattern groups and user groups:
(a) pattern group 2, successful vs. unsuccessful users; (b) pattern group 3, females
vs. males; (c) pattern group 4, successful vs. unsuccessful users; (d) pattern group 5,
successful vs. unsuccessful users.

by inspecting formula cells—PostFormula (PF) and HideFormula (HF)—
followed by one or more EditFormula (EF) operations. We further inspected
the cells upon which these patterns operated, and found that 98% of the cells
touched by these patterns contained formulas (e.g., cell EmployeeTaxes
in Figure 1) as opposed to simply constant values (e.g., cell Allowances).
This suggests a strategy we call “code inspection,” which involves posting
and hiding formulas to inspect the “code” (formulas) and making formula
changes based on the inspection results. The statistical testing results sug-
gest that this strategy was favored by the female users. See Figure 4(b) for
the differences between the two user groups.
Interestingly, unbeknownst to us, in a separate user study in which
the participants were asked to describe their strategies for debugging
[Subrahmaniyan et al. 2008], “code inspection” was one of the top strategies
described by female participants, but not by the males. This finding, external
to our study, provides further evidence of the validity of the cluster. It is
especially interesting for gender HCI research because today’s spreadsheet
environments do not support code inspection well. This finding indicates
that an important direction for future work in end-user programming
environment design is to provide features for code inspection so that females’
debugging strategies can be better supported.
(3) Pattern Group 4 (“to-check-list behavior”): Pattern group 4 was again iden-
tified by two methods—the usage profile method and the cell frequency
method. The patterns in this group differ subtly from the patterns of group
3. In particular, although the patterns in group 4 also had a number of for-
mula manipulations (e.g., PF, HF), these manipulations were followed by
one or more CheckMark (CM) operations, as opposed to EditFormula (EF)
operations. This distinction is important. In fact, this group of patterns
suggests a different strategy we named “to-check-list behavior”: visually in-
specting the formulas and then making a mark on the cells, seeming to
indicate that they were marking off the list of formulas needing checking.
An external verification regarding this cluster’s validity is that this “to-
check-list” strategy was explicitly mentioned by several participants in the
separate user study. Again, this information was not available to us during
our analysis. Statistical testing shows that this pattern group was used
more frequently by the unsuccessful users, as indicated by Table IV and
Figure 4(c).
(4) Pattern Group 5 (“testing”): This pattern group was again identified by two
methods—the usage profile method and the cell frequency method. We refer
to this strategy as the “testing” strategy because the patterns in this group
describe the behavior of testing formulas by varying the input values. Note
that testing is different from code inspection—in the former, the user evalu-
ates values and in the latter the user evaluates the code (formulas) directly.
The testing nature of this pattern is suggested by the repeated EditValue
(EV) operations accompanied by a set of CheckMark (CM) operations. Inter-
estingly, in the separate user study, many participants explicitly described
testing as their strategy.
Statistical testing indicated that group 5 was favored by the successful
users (see Table IV and Figure 4(d)). Comparing this with the “to-check-list
behavior” suggests that when the Checkmark was used for marking testing
results (the usage intended by the designers), users saw more success. This
is consistent with previous HCI findings tying use of the CheckMark with
successfully testing and debugging spreadsheet formulas [Burnett et al.
2004].
It is interesting to note that some patterns much like “batch-checking”
(pattern groups 1 and 2) appeared in group 5 (e.g., the last pattern listed in
Table III). Recall that if patterns A and B were grouped together by the us-
age profile method, it suggests users who used pattern A frequently tended
to use pattern B frequently as well. This indicates that “batch-checking”
was often used in combination with “testing.” This relationship provides a
possible partial explanation as to why the batch-checking behavior, like the
testing behavior, was related to debugging success.

From these results, we see that, using unsupervised clustering, our system
was able to extract meaningful pattern groups. Examining patterns within
each group collectively allowed us to identify the general trend shared by the
patterns and made it possible to interpret the patterns as high-level strategies.
There are three main points to note about the importance of these results.
First, the match with the verbalizations in a separate user study strongly sug-
gests that the findings of our interpretation method are not only real but also
are at an appropriate level of abstraction.
Second, one of the results, namely the code inspection result (pattern
group 3), concerned a phenomenon that was not yet proven. HCI researchers
had begun to suspect its presence,
but at the time of this study, they had not been able to statistically verify this
phenomenon in more than three years of manual empirical work in the context
of gender HCI [Beckwith et al. 2006a]. The fact that we were able to extract
this strategy from the data with no supervision contributed strong triangulated
evidence for its presence.
Third, two of the results are new, namely the beneficial effects of batch check-
ing (Pattern Groups 1 and 2) and the detrimental effects of using the debugging
features (CheckMarks and XMarks) for to-do list purposes (Pattern Group 4).
These results had not been revealed in more than nine years of manual empir-
ical work studying uses of these features as problem-solving devices [Burnett
et al. 2004].

4.3 Data Mining Lessons


In addition to the above HCI findings, as a case study, our investigation led to
the following understanding about applying frequent pattern mining to extract
interpretable trends from data.

(1) Individual patterns found by standard algorithms are highly redundant and
difficult to interpret. This was demonstrated in our study, in which these
patterns carried little information about the general trend. This is because
an individual pattern is often just one instance of a general phenomenon. To
understand the general phenomenon often requires seeing many instances
to capture general trends beyond the specifics of individual patterns. This
suggests that, when appropriately done, grouping patterns into meaningful
groups can increase the interpretability and the generality of the findings,
as was demonstrated in our study.
(2) To group patterns appropriately, special care must be taken to avoid in-
troducing bias into the grouping, which is exactly what happened when
we applied supervised clustering to group the patterns. Our fundamen-
tal purpose was to understand female (male) users’ behavior rather than
to predict the users’ gender from their actions. Although supervised clus-
tering has been shown to be effective at producing clusters for prediction
purposes, in our study it led to incoherent pattern groups that were not
interpretable as general strategies, and therefore did not produce useful
results for understanding.
(3) For unsupervised pattern clustering, there often exists a variety of con-
textual information that can be used to capture the relationship among
patterns and help discern the general trend behind a set of patterns. Using
one type of contextual information (or criterion function) for clustering the
patterns should not exclude the possibility of using other information for
clustering as well. For example, in our study, we considered two types of
context information, the user usage profile and cell frequencies, in addition
to using the syntactic similarity for clustering (the edit-distance approach).
In fact, we consider it advisable to leverage different ways to group the
patterns because of the following potential benefits.
First, oftentimes different methods of grouping reach consensus about
some clusters, providing strong support to the validity of the results. Second,
different groupings collectively may reveal insights that are not available
from any single grouping. In our work, the different clustering methods
were manually selected, which required data mining expertise, was time
consuming, and may have been incomplete. Similar problems will be encountered
when using commercially available or open source data mining software
packages. This suggests that an important research direction is to develop
automated or semi-automated approaches to producing a diversity of low-
level pattern groupings that are potentially of interest based on all of the
contextual information (user usage profiles, cell frequencies, etc.).

5. RELATED WORK
In this section, we review the literature on sequential pattern mining and its
applications to HCI data, with a focus on how our work differs from existing
work in this area.
Sequential Pattern Mining was first introduced in the context of retail data
analysis for identifying customers’ buying habits [Agrawal and Srikant 1995]
and finding telecommunication network alarm patterns [Hatonen et al. 1996;
Mannila et al. 1997]. It has since been successfully applied to many domains
including some HCI-related applications, such as Web access pattern mining
for finding effective logical structure for Web spaces [Perkowitz and Etzioni
1998], for automatic Web personalization [Mobasher et al. 2000], for mining
Windows processes data to detect unauthorized computer users [Cervone and
Michalski 2002], and for mining user-computer interaction patterns for finding
functional usage scenarios of legacy software [El-Ramly et al. 2002].
This article differs from existing work in a number of ways. First, the goal
of our study was to use mined patterns to help understand users’ behaviors,
rather than to predict them, the latter of which is the goal for much of the ex-
isting work [Srivastava et al. 2000; Perkowitz and Etzioni 1998; Mobasher et al.
2000; Cervone and Michalski 2002] in this area. Second, the focus of our work
was not on how to best find behavioral patterns (we used an existing technique
for this purpose). Instead, our focus was on how to improve the interpretabil-
ity of mined patterns using different contextual information. Within the data
mining community, there has been some recent work devoted to interpretabil-
ity issues [Yan et al. 2003, 2005; Xin et al. 2005, 2006; Mei et al. 2006; Wang
and Parthasarathy 2006]. Our work differs from these in that our data mining
system was applied and evaluated in a challenging application that was de-
fined independently of any data mining considerations and without regard to
data mining suitability; further, the mined results were separately verified in a
parallel gender HCI study. Finally, most existing work on mining HCI data has
focused on either finding characteristic patterns of individual users [Mobasher
et al. 2000; Cervone and Michalski 2002] or finding patterns that are common
to the entire population [Perkowitz and Etzioni 1998; El-Ramly et al. 2002].
In contrast, our goal was to find patterns that were linked to subgroups of
users, that is, female versus male users, and successful versus unsuccessful
users.

6. CONCLUSION
This article makes the following contributions. First, we presented and eval-
uated a data mining system that successfully extracted high-level user be-
havioral patterns from HCI log data. The main data mining challenges the
system overcame were the noisiness of human behavioral data and the need
to capture high-level trends beyond variations among and within individuals.
Our system builds upon standard
frequent sequential pattern mining and combats the interpretability issues
faced by standard techniques by clustering the found patterns into meaning-
ful groups. This system provides a general framework for similar HCI log
data mining applications, especially when interpretability of the results is
important.
Second, our results revealed interesting behavioral differences between
males and females and between successful and unsuccessful users. Our in-
terpretations of these behaviors in terms of strategies have since been verified;
they turned out to correspond well to the strategies mentioned in participants'
verbalizations in a separate user study.
Finally, our work can serve as a case study of applying data mining to a pre-
existing, ongoing HCI project run by seasoned HCI researchers, whose research
questions and data collection procedures were not tailored to the needs of data
mining research. Our case study showed where and why a data mining re-
searcher’s expertise was needed, and the lessons we learned are therefore of
practical value to future applications of frequent pattern mining. In particu-
lar, our work suggests that developing automated techniques for generating a
set of diverse and potentially interesting groupings of low-level patterns using
available contextual information is a key direction toward making data mining,
and frequent pattern mining in particular, more accessible to researchers who
are not data mining experts and easier for them to apply.
For future work, we would like to enrich our general framework by consid-
ering a much richer representation of the basic patterns. Currently the basic
patterns are simple sequential patterns that lack the ability to capture con-
textual information about the current state of the user and the system. We
will consider using relational representations such as first-order Horn rules
[Khardon 1999] to represent the basic patterns. We believe the use of context-
sensitive patterns will enable our system to capture more complex behavioral
trends from noisy HCI log data and reveal more knowledge about users’ strate-
gies and needs when interacting with computer software.
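As a purely hypothetical illustration of what such context-sensitive patterns could add, the following sketch contrasts a flat sequential pattern with a Horn-rule-style pattern that also conditions on system state; the event fields and predicates are invented for this example and do not come from our log format.

    def context_sensitive_match(log):
        # Horn-rule-style pattern: edit_formula(c, t1), has_error(c, t1),
        # checkmark(c, t2), t1 < t2. It fires only when a formula edit on a
        # cell that currently shows an error is later followed by a checkmark
        # on the same cell; a flat sequential pattern ("edit_formula",
        # "checkmark") cannot express the error-state or same-cell conditions.
        for i, e in enumerate(log):
            if e["action"] == "edit_formula" and e["cell_has_error"]:
                if any(f["action"] == "checkmark" and f["cell"] == e["cell"]
                       for f in log[i + 1:]):
                    return True
        return False

    log = [
        {"action": "edit_formula", "cell": "B2", "cell_has_error": True},
        {"action": "tooltip_on", "cell": "B2", "cell_has_error": True},
        {"action": "checkmark", "cell": "B2", "cell_has_error": False},
    ]
    print(context_sensitive_match(log))  # True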

REFERENCES

AGRAWAL, R. AND SRIKANT, R. 1995. Mining sequential patterns. In Proceedings of the 11th Inter-
national Conference on Data Engineering. 3–14.
AYRES, J., FLANNICK, J., GEHRKE, J., AND YIU, T. 2002. Sequential pattern mining using a bitmap
representation. In Proceedings of the 8th ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining. 429–435.
BECKWITH, L. AND BURNETT, M. 2004. Gender: An important factor in end-user programming en-
vironments? In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric
Computing. 107–114.
BECKWITH, L., BURNETT, M., GRIGOREANU, V., AND WIEDENBECK, S. 2006a. Gender HCI: What about
the software? Computer, 83–87.
BECKWITH, L., BURNETT, M., WIEDENBECK, S., COOK, C., SORTE, S., AND HASTINGS, M. 2005. Effective-
ness of end-user debugging software features: Are there gender issues? In Proceedings of the
ACM Conference on Human Factors in Computing Systems. 869–878.
BECKWITH, L., INMAN, D., RECTOR, K., AND BURNETT, M. 2007. On to the real world: Gender and
self-efficacy in Excel. In Proceedings of the IEEE Symposium on Visual Languages and Human-
Centric Computing. 119–126.
BECKWITH, L., KISSINGER, C., BURNETT, M., WIEDENBECK, S., LAWRANCE, J., BLACKWELL, A., AND COOK, C.
2006b. Tinkering and gender in end-user programmers’ debugging. In Proceedings of the ACM
Conference on Human Factors in Computing Systems. 231–240.
BREWER, J. AND BASSOLI, A. 2006. Reflections of gender, reflections on gender: Designing ubiqui-
tous computing technologies. In Proceedings of Gender & Interaction: Real and Virtual Women
in a Male World, (Workshop at AVI). 9–12.
BURNETT, M., ATWOOD, J., DJANG, R., GOTTFRIED, H., REICHWEIN, J., AND YANG, S. 2001. Forms/3:
A first-order visual language to explore the boundaries of the spreadsheet paradigm. J. Funct.
Program. 11, 155–206.
BURNETT, M., COOK, C., AND ROTHERMEL, G. 2004. End-user software engineering. Comm. ACM,
53–58.
CASELLA, G. AND BERGER, R. L. 1990. Statistical Inference. Duxbury Press.
CERVONE, G. AND MICHALSKI, R. 2002. Modeling user behavior by integrating AQ learning with a
database: Initial results. Intell. Inform. Syst., 43–56.
CZERWINSKI, M., TAN, D. S., AND ROBERTSON, G. G. 2002. Women take a wider view. In Proceedings
of the ACM Conference on Human Factors in Computing Systems. ACM Press, 195–202.
DETTLING, M. AND BUEHLMANN, P. 2002. Supervised clustering of genes. Genome Biol. 3.
DHILLON, I. S., MALLELA, S., AND KUMAR, R. 2003. A divisive information-theoretic feature cluster-
ing algorithm for text classification. J. Mach. Learn. Res. 3, 1265–1287.
EL-RAMLY, M., STROULIA, E., AND SORENSON, P. 2002. Interaction-pattern mining: Extracting usage
scenarios from run-time behavior traces. In Proceedings of the 8th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD’02).
GAROFALAKIS, M. N., RASTOGI, R., AND SHIM, K. 1999. SPIRIT: Sequential pattern mining with
regular expression constraints. In Proceedings of the 25th International Conference on Very
Large Data Bases. 223–234.
GOUDA, K. AND ZAKI, M. 2001. Efficiently mining maximal frequent itemsets. In Proceedings of
the International Conference on Data Mining.
HASTIE, T., TIBSHIRANI, R., AND FRIEDMAN, J. 2001. The Elements of Statistical Learning: Data Mining,
Inference and Prediction. Springer-Verlag.
HATONEN, K., KLEMETTINEN, M., RONKAINEN, P., AND TOIVONEN, H. 1996. Knowledge discovery from
telecommunication network alarm databases. In Proceedings of the 12th International Confer-
ence on Data Engineering. 115–122.
HILBERT, D. AND REDMILES, D. 2000. Extracting usability information from user interface events.
ACM Comput. Surv. 32, 4, 384–421.
JAIN, A. K., MURTY, M. N., AND FLYNN, P. 1999. Data clustering: A review. ACM Comput. Surv. 31, 3.
KELLEHER, C., PAUSCH, R., AND KIESLER, S. 2007. Storytelling Alice motivates middle school girls
to learn computer programming. In Proceedings of the ACM Conference on Human Factors in Computing
Systems. 1455–1464.
KHARDON, R. 1999. Learning action strategies for planning domains. Artif. Intell. 113, 125–148.
LORIGO, L., PAN, B., HEMBROOKE, H., JOACHIMS, T., GRANKA, L., AND GAY, G. 2006. The influence of
task and gender on search and evaluation behavior using Google. Information Processing and
Management, 1123–1131.
MANNILA, H., TOIVONEN, H., AND VERKAMO, A. 1997. Discovery of frequent episodes in event se-
quences. Data Min. Knowl. Discov. 1, 3, 259–289.
MEI, Q., XIN, D., CHENG, H., HAN, J., AND ZHAI, C. 2006. Generating semantic annotations for
frequent patterns with context analysis. In Proceedings of the 12th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD’06).
MOBASHER, B., COOLEY, R., AND SRIVASTAVA, J. 2000. Automatic personalization based on Web usage
mining. Comm. ACM 43, 8, 142–151.
PASQUIER, N., BASTIDE, Y., TAOUIL, R., AND LAKHAL, L. 1999. Discovering frequent closed itemsets
for association rules. In Proceedings of the 7th International Conference on Database Theory.
PEI, J., HAN, J., MORTAZAVI-ASL, B., AND PINTO, H. 2001. PrefixSpan: Mining sequential patterns
efficiently by prefix-projected pattern growth. In Proceedings of the International Conference on Data
Engineering.
PERKOWITZ, M. AND ETZIONI, O. 1998. Adaptive Web sites: Automatically synthesizing Web pages.
In Proceedings of the 15th National Conference on Artificial Intelligence.
RODE, J. 2008. An ethnographic examination of the relationship of gender & end-user program-
ming. Ph.D. thesis, University of California Irvine.
RODE, J. A., TOYE, E. F., AND BLACKWELL, A. F. 2004. The fuzzy felt ethnography—understanding
the programming patterns of domestic appliances. Person. Ubiq. Comput. 8, 161–176.
ROSSON, M., SINHA, H., BHATTACHARYA, M., AND ZHAO, D. 2007. Design planning in end-user web
development. In Proceedings of the IEEE Symposium on Visual Languages and Human-Centric
Computing. 189–196.
SENO, M. AND KARYPIS, G. 2002. SLPMiner: An algorithm for finding frequent sequential patterns
using length decreasing support constraint. In Proceedings of the 2nd IEEE International Con-
ference on Data Mining. 418–425.
SLONIM, N. AND TISHBY, N. 2001. The power of word clusters for text classification. In Proceedings
of the 23rd European Colloquium on Information Retrieval Research.
SRIVASTAVA, J., COOLEY, R., DESHPANDE, M., AND TAN, P.-N. 2000. Web usage mining: Discovery and
applications of usage patterns from Web data. SIGKDD Explor. 1, 2, 12–23.
SUBRAHMANIYAN, N., BECKWITH, L., GRIGOREANU, V., BURNETT, M., WIEDENBECK, S., NARAYANAN, V., BUCHT,
K., DRUMMOND, R., AND FERN, X. 2008. Testing vs. code inspection vs. . . . what else? Male and
female end users’ debugging strategies. In Proceedings of the ACM Conference on Human Factors in
Computing Systems. 617–626.
TAN, D. S., CZERWINSKI, M., AND ROBERTSON, G. G. 2003. Women go with the (optical) flow. In
Proceedings of the ACM Conference on Human Factors in Computing Systems. ACM Press, 209–215.
WANG, C. AND PARTHASARATHY, S. 2006. Summarizing itemset patterns using probabilistic models.
In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining. ACM, New York, NY, 730–735.
XIN, D., CHENG, H., YAN, X., AND HAN, J. 2006. Extracting redundancy-aware top-k patterns. In
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and
Data Mining (KDD’06).
XIN, D., HAN, J., YAN, X., AND CHENG, H. 2005. Mining compressed frequent-pattern sets. In Pro-
ceedings of the International Conference on Very Large Data Bases.
YAN, X., CHENG, H., HAN, J., AND XIN, D. 2005. Summarizing itemset patterns: A profile-based
approach. In Proceedings of 11th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining.
YAN, X., HAN, J., AND AFSHAR, R. 2003. CloSpan: Mining closed sequential patterns in large
datasets. In Proceedings of the 3rd SIAM International Conference on Data Mining.

Received December 2007; revised August 2008, March 2009; accepted November 2009
