
CSCI 591

April 28th, 2011

Using Conditional Random Fields for Clickstream Analysis


Jeffrey Kyle Elser
Computer Science Department Montana State University 357 EPS Building Bozeman, MT 59715, USA
JELSER@RIGHTNOW.COM

Abstract
Conditional Random Fields are used to model clickstream data and predict the next page in the stream.

Introduction

RightNow Technologies' products include a knowledgebase that is presented to end users via web pages (reference needed). This presentation relies on various technologies, including search engines (reference needed), clustering (reference needed), and an ant colony optimization approach to displaying the data (reference needed). To date, most of RightNow's research has been devoted to tuning and enhancing these technologies.

The goal of RightNow's previous research, and of the research in this paper, is to present pertinent information to end users as quickly as possible. If end users can't find the information they are looking for quickly, they will submit an incident (support ticket) that a support representative must answer manually. To reduce call center costs, it is important to reduce the number of incidents that end users submit. Improving end users' ability to find relevant information before they submit an incident is viewed as increasing the self-service rate.

RightNow's end-user pages have been made increasingly modular and customizable so that our customers can maintain their own corporate web design and present data to their customers in the way that best suits their business cases. This increased flexibility has led to a variety of approaches to displaying data and organizing the content that RightNow presents. Some of these approaches are clearly suboptimal. Our approach to increasing the self-service rate differs from previous approaches in that we attempt to improve the end user's navigation through the web pages rather than just improve some of the technology used.

There have been several attempts at clickstream modeling using probabilistic graphical models (reference needed). I continue that line of research and use conditional random fields to model the clickstream data from several of RightNow's customer sites. Using those models, I predict the next action a user will take, given their previous actions.

Background

In this section I describe RightNow Technologies' knowledgebase presentation and review previous work on analyzing clickstream data. I also describe the structure and underlying theory of the three conditional random field models that I will investigate in Section 3.


2.1 Knowledgebase Presentation

RightNow Technologies' products include a knowledgebase that is presented to end users via a highly modular and customizable set of web pages. Figure 1 shows a typical page layout. The end user enters the knowledgebase by clicking a link on the main company website. From there they can navigate through a series of searches, answer list pages, and answer detail pages. Answers are collections of FAQ-like information, answer list pages are web pages that contain links to individual answer detail pages, and answer detail pages contain the full text of an answer. The main support page includes a small answer list page with the site's most accessed answers.

If the end user is unable to find the information they are looking for, they will submit an incident. Before the incident is entered into the database, SmartAssistant presents the user with one more answer list page. SmartAssistant is a technology that runs a search based on the contents of the incident the end user is about to submit. Until the incident is confirmed, the end user can always click any combination of pages in the blue shaded section of Figure 1 and effectively abort the submission.

RightNow can also offer proactive chat to end users. Currently this is done via a link on the end-user pages. One possible application of this research is to offer the proactive chat via a popup when the CRF predicts that the current clickstream will terminate in an incident.
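To make the navigation described above concrete, the following is a minimal sketch of how one session's clickstream might be represented as a sequence of page-type labels. The label names, the `Clickstream` class, and the `next_page_examples` helper are all hypothetical and chosen for illustration; they are not RightNow's actual logging schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical page-type labels, loosely based on the page kinds described
# in the text. The real site logs may use different names and granularity.
PAGE_TYPES = (
    "support_home",        # main support page
    "search",              # a knowledgebase search
    "answer_list",         # page linking to individual answers
    "answer_detail",       # full text of one answer
    "smart_assistant",     # answer list shown before an incident is stored
    "incident_confirmed",  # the end user confirmed the incident
)


@dataclass
class Clickstream:
    """One end-user session as an ordered list of page-type labels."""
    session_id: str
    pages: List[str]

    def __post_init__(self) -> None:
        # Reject labels outside the assumed vocabulary.
        unknown = [p for p in self.pages if p not in PAGE_TYPES]
        if unknown:
            raise ValueError(f"unknown page types: {unknown}")

    def ends_in_incident(self) -> bool:
        """True if the session terminated with a confirmed incident."""
        return bool(self.pages) and self.pages[-1] == "incident_confirmed"


def next_page_examples(stream: Clickstream) -> List[Tuple[Tuple[str, ...], str]]:
    """Turn a session into (history, next_page) pairs for next-page prediction."""
    return [
        (tuple(stream.pages[:i]), stream.pages[i])
        for i in range(1, len(stream.pages))
    ]
```

Each `(history, next_page)` pair is one prediction problem of the kind the paper targets: given the pages visited so far, predict the next one. The `ends_in_incident` flag corresponds to the outcome that would trigger the proactive chat popup in the application mentioned above.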

Figure 1. Example self-service page layout and interaction paths. The stream starts when an end user clicks into the main support page from the company's main website. The stream ends in one of two possible ways: the end user finds the information they need and leaves, or they continue on to confirm their incident submission.

2.2 Clickstream Analysis

RightNow Technologies' products include a knowledgebase that is presented to end users via web pages (reference needed). This presentation relies on various technologies, including search engines (reference needed), clustering (reference needed), and an ant colony optimization approach to displaying the data (reference needed). To date, most of RightNow's research has been devoted to tuning and enhancing these technologies.


2.3 Conditional Random Fields

In contrast to many other graphical models, which are meant to describe the joint distribution p(x,y), conditional random fields (CRFs) describe the conditional probability distribution p(y|x). This is important because modeling p(x,y) requires modeling p(x), whereas a CRF does not, since y is conditioned on x. (reference needed) shows how to derive the linear-chain CRF shown in Figure 2 from a hidden Markov model (HMM), such that if the HMM represents p(x,y) then the linear-chain CRF represents p(y|x). The result is Equations 1 and 2.
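As a brief sketch of why an HMM leads to a model of this kind, the steps below use only the definition of conditional probability and the standard first-order HMM factorization. The symbols \lambda_k and f_k are assumed notation, and this outline is not taken from the cited derivation.

```latex
% Conditional distribution obtained from any joint model:
\[
  p(y \mid x) = \frac{p(x, y)}{\sum_{y'} p(x, y')}
\]
% Standard first-order HMM factorization, rewritten as an exponential:
\[
  p(x, y) = \prod_{t=1}^{T} p(y_t \mid y_{t-1})\, p(x_t \mid y_t)
          = \exp\Big\{ \sum_{t=1}^{T} \big[ \log p(y_t \mid y_{t-1}) + \log p(x_t \mid y_t) \big] \Big\}
\]
% Writing each log-probability as a weight \lambda_k on an indicator
% feature f_k(y_t, y_{t-1}, x_t), and the denominator as Z(x):
\[
  p(y \mid x) = \frac{1}{Z(x)} \exp\Big\{ \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_t, y_{t-1}, x_t) \Big\}
\]
```

Once the weights are no longer required to be log-probabilities and the features are no longer required to be indicators, the last expression describes a family of conditional models that is more general than the HMM it came from.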

Figure 2. Linear-chain conditional random field. Nodes labeled with an X denote observed variables. Nodes labeled with a Y denote unobserved variables.

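\[
  p(y \mid x) = \frac{1}{Z(x)} \exp\left\{ \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k f_k(y_t, y_{t-1}, x_t) \right\} \tag{1}
\]

\[
  Z(x) = \sum_{y'} \exp\left\{ \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k f_k(y'_t, y'_{t-1}, x_t) \right\} \tag{2}
\]

Equations 1 and 2. The equations for the linear-chain conditional random field shown in Figure 2, where x and y are random variables, \lambda_k denotes a parameter, f_k() is a feature function, and Z(x) is the normalization function.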
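The roles of the feature functions, the parameters, and Z(x) in a linear-chain CRF can be illustrated with a small brute-force sketch. It assumes feature functions of the form f(y_t, y_{t-1}, x_t) with one weight each, and a hypothetical `START` symbol standing in for the label before the first position. It enumerates every label sequence to compute Z(x), which is only feasible for toy inputs; practical implementations use the forward algorithm instead.

```python
import itertools
import math
from typing import Callable, Sequence

# A feature function takes (y_t, y_prev, x_t) and returns a real value.
Feature = Callable[[str, str, str], float]

# Hypothetical placeholder for the label preceding the first position.
START = "<start>"


def score(y: Sequence[str], x: Sequence[str],
          features: Sequence[Feature], weights: Sequence[float]) -> float:
    """Unnormalized log-score: sum over positions t and features k of w_k * f_k."""
    total = 0.0
    prev = START
    for y_t, x_t in zip(y, x):
        for f, w in zip(features, weights):
            total += w * f(y_t, prev, x_t)
        prev = y_t
    return total


def partition(x: Sequence[str], labels: Sequence[str],
              features: Sequence[Feature], weights: Sequence[float]) -> float:
    """Z(x): sum of exp(score) over every possible label sequence (brute force)."""
    return sum(
        math.exp(score(candidate, x, features, weights))
        for candidate in itertools.product(labels, repeat=len(x))
    )


def conditional_prob(y: Sequence[str], x: Sequence[str], labels: Sequence[str],
                     features: Sequence[Feature], weights: Sequence[float]) -> float:
    """p(y | x) = exp(score(y, x)) / Z(x)."""
    return math.exp(score(y, x, features, weights)) / partition(x, labels, features, weights)
```

For a clickstream, x could hold the observed page at each step and y the label to be predicted, with one indicator feature per (observed page, label) pair and per label transition. With all weights set to zero, every label sequence receives the same probability, which is a quick sanity check on the normalization.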

Figure 3. Linear-chain conditional random field with additional evidence nodes. Nodes labeled with an X denote observed variables. Nodes labeled with a Y denote unobserved variables.


Equations 3 and 4. The equations for the linear-chain conditional random field with additional evidence nodes shown in Figure 3, where x and y are random variables, each feature function f() has an associated parameter, and Z(x) is the normalization function.

Figure 4. Linear-chain conditional random field with additional evidence nodes and higher-order Markov assumption. Nodes labeled with an X denote observed variables. Nodes labeled with a Y denote unobserved variables.

Equations 5 and 6. The equations for the general conditional random field shown in Figure 4, where x and y are random variables, each feature function f() has an associated parameter, and Z(x) is the normalization function.
