
Modern Foreign Languages (Quarterly) Vol. 26 No. 1, January 2003: 76-84. Guangzhou, China

The Theory of Latent Semantic Analysis and its Application

GUI Shi-chun

Plato's problem (how do people know as much as they do with as little information as they get?), also known as the poverty of the stimulus, the problem of negative evidence, or the logical problem of language acquisition, has aroused the interest of many philosophers, psychologists, linguists, and computational scientists. Nativism is the answer provided by Chomsky, but psychologists like MacWhinney and computational linguists like Sampson offer different explanations. Quine calls the problem the scandal of induction, whereas Shepard maintains that a general theory of generalization and similarity is as necessary to psychology as Newton's laws are to physics. Accepting the hereditary nature of the language propensity does not, however, solve the general problem of generalization and similarity, that is, the problem of categorization. Many models have been suggested to find a mechanism by which a set of stimuli, words, or concepts comes to be treated as similar; they attempt to postulate constraints that can narrow the solution space of the problem that is to be solved by induction. Latent semantic analysis (LSA), put forth by Landauer et al., is a high-dimensional linear associative model that embodies no human knowledge beyond its general learning mechanism; it analyzes a large corpus of natural text and generates a representation that captures the similarity of words and text passages. The model employs a statistical technique of linear algebra known as singular value decomposition (SVD). The input to LSA is a matrix {A} whose rows represent unitary event types and whose columns represent the contexts in which instances of the event types appear. SVD decomposes the matrix into three matrices, {A} = {U}{w}{V}^T, and reduction of dimensionality is carried out in the reconstruction of the original matrix. To illustrate the power of dimensionality reduction, two examples are given. In the example given by Landauer, the text input is the titles of nine technical articles, five about human-computer interaction and four about mathematical graph theory. LSA shows how two words that were essentially uncorrelated in the original matrix become quite strongly correlated (r = .9) in the two-dimensionally reconstructed approximation. The other example is the use of SVD in a preliminary study of the relationships among the errors made by Chinese learners of English; reduction of dimensionality offers a better account of the developmental trends of spelling errors, misuse of words, and syntactic construction errors among five different types of learners. LSA has a wide range of applications in connection with text processing.
Key words: Plato's problem, similarity, induction, latent semantic analysis, singular value decomposition

Correspondence: National Research Centre for Linguistics and Applied Linguistics, Guangdong University of Foreign Studies, Guangzhou 510420, P. R. China




[CLC number] H195  [Document code] A  [Article ID] 1003-6105(2003)01-0076-9

Plato's problem¹, the question of how people come to know as much as they do from the little information they receive, is also known as the poverty of the stimulus, the problem of negative evidence, or the logical problem of language acquisition. Chomsky's answer is nativism², but psychologists such as MacWhinney³ and computational linguists such as Sampson⁴ offer different explanations. Quine illustrates the difficulty of induction with his famous gavagai example: a field linguist who hears a native speaker say gavagai as a rabbit runs past cannot tell from the evidence alone whether the word means the rabbit, a part of it, or something else altogether; Quine speaks of the scandal of induction. Shepard (1987), for his part, maintains that a general theory of generalization and similarity is as necessary to psychology as Newton's laws are to physics. Accepting that the propensity for language is hereditary does not, however, solve the general problem of generalization and similarity, that is, the problem of categorization. Many models have been proposed to explain how a set of stimuli, words, or concepts comes to be treated as similar; what they share is the attempt to postulate constraints that narrow the solution space of the problem that induction has to solve. Latent semantic analysis, proposed by Landauer and Dumais (1997), is one such attempt.
Notes:
1. Plato raises the question of how we can know what we have never been taught in several dialogues, notably the Meno, the Phaedo, and the Cratylus, the last of which debates whether names hold by nature (physis) or by convention (nomos).
2. See Chomsky (1965, 1986, 2000); Pinker (1994) presents the nativist position in popular form.
3. Brian MacWhinney argues that language can be learned from the input by general cognitive mechanisms, without recourse to an innate language faculty.
4. Geoffrey Sampson attacks nativism in Educating Eve (1997) and argues for corpus-based, empirical methods in Empirical Linguistics (2001).

LSA is the joint work of a group of researchers including Landauer, Foltz, Dumais, Deerwester and Furnas (Deerwester et al. 1990), with Kintsch contributing the link to theories of comprehension; they call the approach latent semantic analysis (Latent Semantic Analysis, LSA).

The central idea of LSA is reduction of dimensionality. Landauer illustrates it with a simple spatial example. Suppose the distances among three points A, B, and C are measured with some error, say A-B = 5 and A-C = 8; if the points are then forced into a space of lower dimensionality than the measurements suggest, the distances have to be re-estimated (for instance as 4.5 and 9), and the re-estimated values can be closer to the true ones than the original measurements. Fitting data into fewer dimensions than they were observed in can thus improve, rather than degrade, the estimates.

The idea of representing meaning in a space is not new. Osgood (1971) had already attempted to map words into a multidimensional semantic space with the semantic differential, and Kintsch's (1988, 1998) construction-integration model represents the meaning of a text as a network of propositions: 'the red rose' and 'the rose is red' express the same underlying proposition.

A proposition consists of a predicate and its arguments, which refer to entities (referents) in the discourse; propositional analyses of this kind have been widely used in the study of prose comprehension (Graesser 1981).

LSA itself grew out of Latent Semantic Indexing (LSI), a technique originally developed for information retrieval, and shares its basic machinery. What LSA can and cannot capture is best seen from two pairs of sentences.

(1) The U. S. S. Nashville arrived in Colon harbor with 42 marines.

(2) With the warship in Colon harbor, the Colombian troops withdrew.

Although the word warship never occurs in (1), its LSA vector is close to the vectors of Nashville, Colon, and harbor, so LSA can detect that (2) is related to (1). In this respect LSA captures similarity that goes beyond direct co-occurrence.

(3) John is Bob's brother and Mary is Anne's mother.

(4) Mary is Bob's mother.

On the other hand, LSA ignores word order and syntax. Kinship terms such as sister, daughter, father, and son are all close to one another in the LSA space, and (4) would be judged highly similar to (3); but LSA cannot tell whether (4) follows from (3), whether Bob and Anne are related, or who in (3) is whose mother. Relational and logical information of this kind lies outside what the model represents.

How, then, does LSA work? LSA treats a corpus as a matrix of event types by contexts and analyzes it with singular value decomposition (Singular Value Decomposition, SVD), a standard technique of linear algebra. Given an m × n matrix {A} (with m > n), SVD decomposes it into the product of an m × n matrix {U}, an n × n diagonal matrix {w} whose diagonal entries are the singular values, and the transpose of an n × n matrix {V}:

{A} = {U}{w}{V}^T

Dimensionality is reduced at the reconstruction stage: only the largest singular values are retained, and the matrix is rebuilt from the truncated factors, so that the reconstructed matrix is the best approximation to {A} of the chosen lower rank.
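To make the decomposition and the truncation step concrete, here is a minimal sketch in Python with numpy; the matrix and its values are invented for illustration and are not data from the paper.

import numpy as np

# A toy term-by-context matrix: rows are event types (words),
# columns are contexts (passages). Values are illustrative only.
A = np.array([
    [1., 0., 1., 0.],
    [1., 1., 0., 0.],
    [0., 1., 1., 1.],
    [0., 0., 1., 1.],
    [0., 0., 0., 1.],
])

# Full singular value decomposition: A = U * diag(w) * V^T
U, w, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular values and rebuild the matrix.
k = 2
A_k = U[:, :k] @ np.diag(w[:k]) @ Vt[:k, :]

print("singular values:", np.round(w, 3))
print("rank-%d reconstruction:" % k)
print(np.round(A_k, 2))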

The power of dimensionality reduction can be illustrated with the small example used by Deerwester et al. (1990) and by Landauer and Dumais (1997). The input is the titles of nine technical articles, five about human-computer interaction (c1-c5) and four about mathematical graph theory (m1-m4). The twelve content words that occur in more than one title form the rows, and the nine titles the columns, of a 12 × 9 term-by-title matrix.

c1: Human machine interface for ABC computer applications


c2: A survey of user opinion of computer system response time
c3: The EPS user interface management system
c4: System and human system engineering testing of EPS

c5: Relation of user perceived response time to error measurement


m1: The generation of random, binary, ordered trees
m2: The intersection graph of paths in trees
m3: Graph minors IV: Widths of trees and well-quasi-ordering
m4: Graph minors: A survey

SVD decomposes this matrix into three component matrices, and the matrix is then rebuilt from the two largest singular values alone, giving a two-dimensional reconstruction.

Table 2: The two-dimensionally reconstructed term-by-title matrix.

The reconstruction changes the cell values in an instructive way. The word survey occurs once in m4 ('Graph minors: A survey'), yet its reconstructed value there falls from 1 to 0.42; the word trees does not occur in m4 at all, yet its value rises from 0 to 0.66. The reason is that trees co-occurs with graph and minors in m1-m3, and the reduced representation infers that m4, which is also about graph minors, should be associated with trees. Correlations between words change in the same way: human and user, which correlate -.38 in the raw matrix, correlate .94 in the reconstruction, whereas the correlation between human and minors drops from -.29 to -.83. Dimensionality reduction thus pulls together words that occur in similar contexts (human, user) and pushes apart words that do not (human, minors), even when the words in question never co-occur directly.
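The changes just described can be checked directly. The sketch below rebuilds the term-by-title matrix of the Deerwester et al. (1990) example (the cell values follow that published example), performs the two-dimensional reconstruction, and prints the raw and reconstructed correlations for human-user and human-minors, which should come out close to the values quoted above.

import numpy as np

# Term-by-title matrix from the Deerwester et al. (1990) example:
# columns are titles c1-c5 (human-computer interaction) and m1-m4
# (graph theory); rows are the 12 terms occurring in more than one title.
terms = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],   # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],   # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],   # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],   # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],   # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],   # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],   # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],   # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],   # minors
], dtype=float)

U, w, Vt = np.linalg.svd(A, full_matrices=False)
A2 = U[:, :2] @ np.diag(w[:2]) @ Vt[:2, :]   # two-dimensional reconstruction

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

h, u, m = terms.index("human"), terms.index("user"), terms.index("minors")
# Reported values: roughly -.38 / -.29 before and .94 / -.83 after reduction.
print("human-user:   raw %.2f  reconstructed %.2f" % (corr(A[h], A[u]), corr(A2[h], A2[u])))
print("human-minors: raw %.2f  reconstructed %.2f" % (corr(A[h], A[m]), corr(A2[h], A2[m])))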
The second example is a preliminary application of SVD to the errors made by Chinese learners of English. The data come from the Chinese Learner English Corpus (2002), a corpus of about one million words produced by five types of learners: St2 (senior middle school students), St3 (College English Test Band 4), St4 (College English Test Band 6), St5 (English majors, years 1-2) and St6 (English majors, years 3-4); the errors in the corpus are tagged with 61 error types. Following Landauer's procedure, the raw frequencies were transformed before SVD: each cell frequency f was replaced by log(f + 1), and the result was divided by the entropy of that error type over the sub-corpora, -Σ p log p (cf. Maletic et al. 1999). This log/entropy weighting reduces the influence of frequent but undiscriminating entries.
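A minimal sketch of this weighting step, assuming the standard LSA log/entropy transform described above; the epsilon guard against division by zero is an implementation detail added here, not part of the original procedure.

import numpy as np

def log_entropy_weight(A, eps=1e-12):
    """Apply the log/entropy weighting used before SVD in LSA.

    A : term-by-context matrix of raw frequencies (2-D numpy array).
    Each cell becomes log(freq + 1) divided by the term's entropy,
    -sum(p * log p), over contexts.  Terms occurring in a single
    context have zero entropy and are normally removed beforehand;
    eps only prevents division by zero in this sketch.
    """
    A = np.asarray(A, dtype=float)
    local = np.log(A + 1.0)                        # log(freq + 1)
    row_sums = A.sum(axis=1, keepdims=True)
    p = np.where(row_sums > 0, A / np.maximum(row_sums, eps), 0.0)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    entropy = -plogp.sum(axis=1, keepdims=True)    # -sum p log p per term
    return local / np.maximum(entropy, eps)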

The weighted matrix was then decomposed by SVD and reconstructed in reduced dimensionality.⁶ The results for fm1 errors (spelling errors) are shown below: the raw frequencies in each sub-corpus, the observed (transformed) values entered into the analysis, and the values after reconstruction.

fm1 (spelling) errors by learner type

                  St2    St3    St4    St5    St6
Raw frequency     1929   2877   2113   1827   1687
Observed value    3.30   3.47   3.34   3.27   3.24
Reconstructed     3.52   3.46   3.36   3.25   3.00

The observed values do not fall steadily across the five groups: the CET Band 4 and Band 6 students (St3 and St4) show more spelling errors than the overall trend would lead one to expect. After reconstruction, however, the values decline monotonically from St2 to St6, which accords with the expectation that spelling errors decrease as proficiency rises.

Note 6: The SVD computations were carried out in Excel with Greg Hood's PopTools 2.4 add-in.
82

The same procedure was applied to wd3 errors (misuse of words), with the results shown below:

wd3 (misuse of words) errors by learner type

                  St2    St3    St4    St5    St6
Raw frequency     1102   1635   1815   757    360
Observed value    3.27   3.45   3.50   3.09   2.75
Reconstructed     3.30   3.49   3.44   2.99   2.84

Here the picture is different from spelling: misuse of words rises from St2 to a peak around St3 and St4 and then falls towards St6. In the reconstructed values the difference between St5 (2.99) and St6 (2.84) is only 0.15, suggesting that this error type levels off among the more advanced learners. The third error type examined, sn8 (errors of syntactic construction), was analyzed with SVD in the same way:

sn8 (syntactic construction) errors by learner type

                  St2    St3    St4    St5    St6
Raw frequency     1104   446    862    493    232
Observed value    3.27   2.85   3.16   2.90   2.55
Reconstructed     3.14   3.06   2.97   2.90   2.68

In the observed values St3 is anomalously low, falling below both St2 and St4; after reconstruction the values decline smoothly from St2 through St5 to St6, again giving a more interpretable picture of development.


Reduction of dimensionality also affects the similarity between the profiles themselves. Measured by the cosine,⁷ similarity rises after SVD reconstruction, for example from 0.604 to 0.728 in one comparison and from 0.614 to 0.817 in another, indicating that the reduced representation brings related patterns of errors closer together.
What, then, can LSA do? The SVD-based procedure just illustrated on a small scale has been applied to very large bodies of text, and the resulting representations support a surprising range of tasks.

The best-known demonstration concerns vocabulary. Landauer and Dumais trained LSA on text from Grolier's Academic American Encyclopedia and, after SVD, had the model take the 80-item synonym section of the TOEFL: for each stem word, the model chose the alternative whose vector lay closest to it. LSA answered about 65% of the items correctly, roughly the score obtained by the average non-native applicant to American universities who takes the test. Landauer argues that this shows how a simple associative mechanism, given enough text, can acquire vocabulary knowledge of a realistic order.
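As an illustration of how such a synonym item can be scored, the sketch below simply picks the alternative whose vector has the highest cosine with the stem word; the lookup table and the example item are hypothetical, since the trained LSA space itself is not reproduced here.

import numpy as np

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def answer_synonym_item(stem, alternatives, vectors):
    """Pick the alternative closest to the stem in the LSA space.

    vectors : dict mapping a word to its (already trained) LSA vector.
    Returns the alternative with the highest cosine to the stem.
    """
    stem_vec = vectors[stem]
    return max(alternatives, key=lambda w: cosine(stem_vec, vectors[w]))

# Hypothetical example with made-up 3-dimensional vectors.
vectors = {
    "enormous": np.array([0.9, 0.1, 0.2]),
    "huge":     np.array([0.8, 0.2, 0.1]),
    "tiny":     np.array([-0.7, 0.3, 0.1]),
    "quick":    np.array([0.1, 0.9, 0.0]),
    "unusual":  np.array([0.2, 0.1, 0.9]),
}
print(answer_synonym_item("enormous", ["huge", "tiny", "quick", "unusual"], vectors))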

Using word-frequency data from Carroll et al.'s Word Frequency Book (1971), Landauer and Dumais also compared the model's rate of vocabulary growth with that of schoolchildren, who add new words to their vocabulary at a rate of the order of ten a day; the simulations suggest that a large part of this growth can come indirectly, from passages that do not themselves contain the word being learned.

The other main line of application is the original one, information retrieval. LSI was developed for this purpose in the 1980s (Dumais et al. 1982). In a typical test, a collection of 1,033 medical abstracts indexed by 5,823 terms was decomposed by SVD and the first 100 dimensions were retained.

Note 7: The cosine of the angle between two vectors X and Y is their inner product divided by the product of their lengths; it ranges from -1 to 1 and equals 1 when the two vectors point in the same direction.

In evaluations of this kind LSI has outperformed ordinary keyword matching, improving retrieval performance by some 13% in one set of comparisons (Berry et al. 1994) and by as much as 30% in others (Deerwester et al. 1990). The advantage comes from the fact that LSI captures term dependency instead of treating index terms as independent, so that a query can retrieve relevant documents that share no words with it; results on the large TREC collections have been more mixed (Rosario 2000). Beyond retrieval, Foltz, Kintsch and Landauer (1993) used LSA to measure the coherence of texts by computing the similarity between successive sentences, with correlations against comprehension measures as high as .90. An interactive demonstration of these analyses is available at http://LSA.colorado.edu, where any two texts can be compared; the two passages below, for example, can be submitted there:

These findings indicate a considerable degree of functional equivalence of perception and imagery. However, it is
possible that subjects in the imagery condition merely made plausible guesses about the fields of resolution, and did
not actually rely on imagery at all.
While it is very straightforward to see that previous learning can facilitate problem solving by supplying
well-practiced skills and strategies, it is perhaps less obvious that knowledge acquired in the past can sometimes
disrupt, and interfere with, subsequent attempts to solve problems.
The similarity the site computes between these two passages is .82.
Similarity values of this kind underlie the use of LSA for grading essays. Landauer, Laham and Foltz (1998) had LSA score the conceptual content of student essays by comparing each essay with texts of known quality; the scores LSA assigned agreed with those of human graders about as closely as the human graders agreed with one another.

LSA has also been related to psychological accounts of comprehension. Landauer points to priming evidence such as Till, Mross and Kintsch (1988), and Kintsch et al. (1999) connect LSA with the theory of long-term working memory (Long-term Working Memory, LTWM), on which skilled comprehenders retrieve relevant knowledge from long-term memory rapidly and reliably; LSA offers a way of modelling the knowledge that LTWM draws on. In the LSA space, morphologically related forms such as mountain and mountains are treated as separate words, yet their vectors are highly similar (a cosine of .81): the nearest neighbours of mountain include peaks, rugged, ridges and climber, while those of mountains include peaks, rugged, plateaus and foothills. Connections of a less direct kind are also captured: the sentences The band played a waltz. and Mary loved to dance. have a cosine of .45 even though they share no content words, which is just the sort of knowledge-based link that LTWM invokes to explain how readers integrate a text with what they know.
A further educational application of LSA is the teaching of summary writing. Kintsch, Steinhart, Stahl and the LSA Research Group (2000) and Steinhart (2001) developed Summary Street, a tutoring system in which schoolchildren write summaries of texts and receive LSA-based feedback on how well their summaries cover the content of the source and whether they are of appropriate length. In trials in schools in Boulder, Colorado, students who used Summary Street wrote better summaries than students who received only conventional feedback, and the benefit was greatest for the most demanding texts.


In sum, LSA embodies no knowledge beyond a general learning mechanism, yet from nothing more than the co-occurrence of words in contexts it builds a representation that captures a great deal of the similarity among words and passages. It thus offers one answer to Plato's problem, and it has a wide range of applications wherever texts have to be compared, retrieved, graded or summarized.

Berry, M., S. Dumais & G. O'Brien. 1994. Using linear algebra for intelligent information retrieval [M]. Boston: Houghton Mifflin Company.
Carroll, J., et al. 1971. Word Frequency Book. Houghton Mifflin Company & American Heritage Publishing Co., Inc.
Chomsky, N. 1965. Aspects of the Theory of Syntax [M]. Cambridge, MA: MIT Press.
Chomsky, N. 1986. Knowledge of language: Its nature, origin, and use [M]. Westport: Greenwood Publishing Group.
Chomsky, N. 2000. New horizons in the study of language and mind [M]. Cambridge: Cambridge University Press.
Deerwester, S., S. Dumais, G. Furnas, T. Landauer & R. Harshman. 1990. Indexing by latent semantic analysis [J]. Journal of the American Society for Information Science 41: 391-407.
Dumais, S. et al. 1982. Using semantic analysis to improve access to textual information [J]. Machine Studies 17: 87-107.
Foltz, P. W., W. Kintsch & T. K. Landauer. 1993 (Jan). An analysis of textual coherence using Latent Semantic Indexing [A]. Paper presented at the
meeting of the Society for Text and Discourse, Jackson, WY.
Sampson, G. 2001. Empirical Linguistics [M]. London: Continuum.
Graesser, A. 1981. Prose Comprehension beyond the word [M]. New York: Springer.
Kintsch, W., D. Steinhart, G. Stahl & LSA Research Group. 2000. Developing summarization skills through the use of LSA-Based Feedback [J].
Interactive learning environments 8 (2): 87-109.
Kintsch, W. 1988. The role of knowledge in discourse comprehension: A construction-integration model [J]. Psychological Review 95: 163-182.
Kintsch, W. 1998. Comprehension [M]. Cambridge University Press. 86-91.
Kintsch, W., V. L. Patel & K. A. Ericsson. 1999. The role of long-term working memory in text comprehension [J]. Psychologia 42: 186-198.
Landauer, T. & S. Dumais. 1997. A solution to Plato's problem: The Latent Semantic Analysis theory of the acquisition, induction, and representation
of knowledge [J]. Psychological Review 104: 211-240.
Landauer, T. K., D. Laham & P. W. Foltz. 1998. Computer-based grading of the conceptual content of essays. Unpublished manuscript.
Landauer, T., P. W. Foltz & D. Laham. 1998. An introduction to latent semantic analysis [J]. Discourse Processes 25: 259-284.
Maletic, J. et al. 1999. In Proceedings of the 14th IEEE International Conference on Automated Software Engineering (ASE'99) [C]. Cocoa Beach, FL, 12-15 October: 251-254.
Osgood, C. 1971. Exploration in semantic space: A personal diary [J]. Journal of Social Issues 27: 5-64.
Pinker, S. 1994. The Language Instinct [M]. New York: William Morrow and Company, Inc.
Rosario, B. 2000. Latent Semantic Indexing: An overview [A]. INFOSYS 240, Spring 2000.
Shepard, R. 1987. Toward a universal law of generalization for psychological science [J]. Science 237: 1317-1323.
Steinhart, D. 2001. Summary Street: an intelligent tutoring system for improving student writing through the use of latent semantic analysis [D].
Unpublished doctoral dissertation, Institute of Cognitive Science, University of Colorado, Boulder.
Till, R., E. Mross & W. Kintsch. 1988. Time course of priming for associate and inference words in discourse context [J]. Memory and Cognition 16:
283-299.
van Dijk, T., & W. Kintsch. 1983. Strategies of discourse comprehension [M]. New York: Academic Press.
2000[M]308-329
