
Computer Authorship Analysis for Stratemeyer Syndicate Series Volumes


by James D. Keeline

Background on the Stratemeyer Syndicate and Series Book Authorship

The Stratemeyer Syndicate produced more than 1,600 volumes of fiction for young people between 1905 and 1985. To accomplish this, they
hired writers to develop book-length manuscripts from outlines. The writers were paid a flat fee in exchange for their work instead of
royalties. The Syndicate edited the manuscripts submitted by the ghostwriters as needed and sent them to the company who had agreed to
publish the series.


This procedure of producing books differs from the traditional model where an author creates a manuscript which is sent to a publisher,
edited, and then published. In the case of the Stratemeyer Syndicate, the stories were published under house pseudonyms owned by the
company. The ghostwriters were not normally credited with their work in the eyes of the public.

The challenge for the historian or literature critic is to determine who wrote a given story. After all, although the stories were produced from
outlines, most writers could not help but put part of themselves in their writing. Their life experiences could influence descriptions and word
choice.

Clearly, the Syndicate tried to maintain as much consistency as possible for the books in a given series. However, after a given writer had
worked on a series for a long period, a new writer, less familiar with the style and content of the previous volumes, would almost certainly do
their work somewhat differently.

Edward Stratemeyer established his Syndicate in 1905 and he continued to manage it alone from his home until the Fall of 1914 when he
established an office in New York City so that he could be closer to most of the publishing companies who handled his books.

As he was opening his office, he was also planning to expand the scope of the Syndicate. To accomplish this, he needed help. His eldest
daughter, Harriet Stratemeyer, had edited some manuscripts at home after her graduation from Wellesley College. However, once she
married Russell Vroom Adams and Edward established his New York office, she did not do any more work for her father until 1930, when she
and her sister, Edna Camilla Stratemeyer Squier, took over the operation of the company.

Stratemeyer hired Harriet Otis Smith as an assistant in his office. At first her work was described as taking dictation for letters and stories.
However, she also read the manuscripts submitted by ghostwriters and provided Edward with valuable insight about their quality and
their consistency with other volumes in the series and other submissions by that writer.

Smith continued to work for the Syndicate after Edward Stratemeyer's death on May 10, 1930. She ran the office for the months that
Stratemeyer's daughters settled his estate and failed to find a group to buy the company. After settling the estate, Harriet Adams and Edna
Stratemeyer moved the Syndicate offices to East Orange, New Jersey, near their homes in Newark. However, Harriet Otis Smith did not care
to move or commute each day so she left in October 1930 after sixteen years of work for the Syndicate.

When considering the authorship for a given series volume, I believe there can be several people involved. In addition to the principal
ghostwriter and editors who may rewrite portions of the manuscript, the person who created the outline is also a factor since phrases
from the outlines may be used directly in the text.

Fortunately, business records for the Stratemeyer Syndicate are available at NYPL and Yale which answer most of the authorship questions.
This computer analysis offers an opportunity to detect which portions of a story were by a given writer or editor, if it works. However,
which portions of a given book may be attributed to the outliner, writer, or editor remains to be seen.

Background on QSUM

The concept of trying to determine authorship based on numerical analysis of portions of the text is not a new one. The Cumulative Sum
(also referred to as CuSum or QSum) technique is controversial because it does not have a theoretical foundation and parts of it seem
contrary to "common sense." However, it has been demonstrated to the satisfaction of British courts such that QSum analysis is considered
to be admissible evidence. The purpose of this paper is not to argue for or against the technique but merely to apply it to the series book
authorship question to see if anything interesting can be detected with it.

Prior to being used in literary analysis, cumulative sums were used in manufacturing processes for quality assurance. The goal was to see if
minor variations around an average value cancelled each other out or if they deviated significantly.
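
As a minimal sketch of the quality-control idea just described, the cumulative sum adds up each value's deviation from the average. The sketch is in Python for illustration only; the function name cumulative_sum and the sample measurements are hypothetical and are not taken from the program discussed in this paper. If the deviations cancel out, the running total stays near zero; a sustained drift shows up as a steady climb or fall.

```python
def cumulative_sum(values):
    """Return the running total of each value's deviation from the mean."""
    mean = sum(values) / len(values)
    running = 0.0
    result = []
    for value in values:
        running += value - mean  # deviation of this value from the average
        result.append(running)
    return result


# Made-up measurements whose deviations cancel out around a mean of 10.
measurements = [10, 12, 8, 11, 9]
print(cumulative_sum(measurements))  # [0.0, 2.0, 0.0, 1.0, 0.0]
```

By construction the last value of the series is always zero (up to rounding), so it is the shape of the line between the first and last points that carries the information.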

Cumulative sums were first applied to literary analysis in the 1960s by Rev. Andrew Q. Morton and the technique was further refined in the
1980s. The technique has been used on many literary cases, including the Shakespeare authorship question.

Outline Basic QSUM Tests

The purpose of QSUM is to look at certain unconscious habits of writing and display the results as a pair of superimposed line graphs. The
first test involves counting the length of the sentences. The second test comes from a tool box of nine tests which have proven useful
according to Morton and his team. These additional tests involve counting certain kinds of words based on their length and composition. The
following table describes them:

No.  Abbreviation  Description
0    sld           Sentence length distribution (all other tests compare with this)
1    ivw           Words which begin with a vowel (a, e, i, o, u)
2    23lw          Words with 2 or 3 letters
3    23lw+ivw      Words with 2 or 3 letters or initial vowel words not yet counted
4    234lw         Words with 2, 3, or 4 letters
5    234lw+ivw     Words with 2, 3, or 4 letters or initial vowel words not yet counted
6    34lw          Words with 3 or 4 letters
7    34lw+ivw      Words with 3 or 4 letters or initial vowel words not yet counted
8    not23lw       Words which are not 2 or 3 letters in length
9    hifw          Words which are among the most frequent words from the Brown corpus

It is important to select tests for which a given author is consistent and which help to distinguish that author from a second author. The program
developed by this researcher applies all nine tests against the sentence length distribution to facilitate test selection.
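
The counting behind these tests can be sketched as follows. This is an illustrative Python sketch, not the PHP program described in this paper: the sentence splitter and the word pattern are simplifying assumptions, all function names are hypothetical, and the hifw test is left out because it needs a word list drawn from the Brown corpus. For each sentence it returns the number of words (the sentence length) and the number of words matching the chosen test.

```python
import re

VOWELS = "aeiou"

# Each test is a yes/no rule applied to one lowercase word. For the "+ivw"
# tests a word is counted once if it matches either the length rule or the
# initial-vowel rule.
TESTS = {
    "ivw": lambda w: w[0] in VOWELS,
    "23lw": lambda w: len(w) in (2, 3),
    "23lw+ivw": lambda w: len(w) in (2, 3) or w[0] in VOWELS,
    "234lw": lambda w: len(w) in (2, 3, 4),
    "234lw+ivw": lambda w: len(w) in (2, 3, 4) or w[0] in VOWELS,
    "34lw": lambda w: len(w) in (3, 4),
    "34lw+ivw": lambda w: len(w) in (3, 4) or w[0] in VOWELS,
    "not23lw": lambda w: len(w) not in (2, 3),
}


def split_sentences(text):
    """Naive splitter (assumption): a sentence ends at '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_words(sentence):
    """Lowercase words made of letters, allowing internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", sentence.lower())


def count_per_sentence(text, test):
    """Return (sentence_lengths, habit_counts) for one test name."""
    rule = TESTS[test]
    lengths, counts = [], []
    for sentence in split_sentences(text):
        words = split_words(sentence)
        if not words:
            continue
        lengths.append(len(words))
        counts.append(sum(1 for w in words if rule(w)))
    return lengths, counts


# Made-up sample text for illustration.
sample = "Tom ran to the old barn. It was empty! He looked up at an owl."
print(count_per_sentence(sample, "23lw+ivw"))  # ([6, 3, 6], [5, 3, 5])
```

The two lists are the raw material for the pair of superimposed graphs: each would be converted to a cumulative sum of deviations from its own average before plotting.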

QSUM attempts to measure unconscious habits of writers. Content words are specifically chosen by the writer so they are not considered.
Rather, the words which lie between the content words, the function words, are the ones which we attempt to measure. These linking words
act as glue to hold the sentences together and they are frequently short words. This is part of the rationale for tests which count short words
of 2, 3, or 4 letters.

The proponents of QSUM cannot offer a strong theoretical basis for counting words which begin with vowels, especially when those words
are not short ones. One possibility involves Latin prefixes. The founders of the method suggest that repeated success on their part justifies the
tests. Critics argue that this aspect is contrary to "common sense". Of course, some people looking at calculus mathematics might claim the
same thing. However, most areas of math are founded on proofs which go from simple postulates to explain more complex things. With
living systems and their products, like writing, it may take some time for linguists to explain what is observed in QSUM analysis.

Recall that the classic example of the "scientific method" involved observations of the movements of the planets which did not follow the
predictions offered by the accepted models and theories. New models were developed which more closely matched the observed phenomena.

The proponents of QSUM make a number of recommendations for preprocessing texts and analysis of the resulting graphs.

• Caution should be applied when an author quotes or closely paraphrases another writer, for the text measured will be that of the second
writer rather than the first. This can manifest itself in series book texts because an outline may be quoted directly in the submitted
manuscript. In some cases the Syndicate gave articles with factual material such as articles on stampedes for the X-Bar-X Boys or
electric locomotives for the Tom Swift series. If these are transcribed by the writer, it creates a case of mixed authorship.

• The optimum length of text to analyze is 50-100 sentences. With more sentences, the data points are so close together horizontally that
it becomes difficult to determine if the spacing between the two graphs is changing, either diverging or converging, which can
indicate mixed authorship. In most series books, this places the sample size at a chapter or less.

• Since significantly edited or rewritten texts are a case of mixed authorship, it is important to use unedited samples of an author's
writing when determining which tests identify consistent habits. In this case, letters personally typed by the author in question
were carefully transcribed to serve as control texts. According to the proponents, natural language such as prose, speech, and epistles
can include the content which is exhibited with QSUM tests.

• To distinguish between authors, it is necessary to find tests where their individual writing is homogeneous but which serve to
differentiate between them. Some frequently cited tests such as 23lw+ivw work for so many writers that they may not distinguish
between two writers for whom the same habit applies.

• Graphs may be plotted featuring portions of writings from several samples and multiple authors to illustrate the distinctions. This
method of juxtaposition can be time-consuming considering the many combinations possible.

• When a graph set generated from the writings of two authors is consistent, it basically means that for the tests used and the
texts chosen, the writers cannot be distinguished. It is generally better to find cases where there are visual deviations and
corresponding explanations for the results.

• Proper names and dates and times which span multiple words each really describe a single entity so they should be edited to form a
generic token [name] or [time] or the spaces between the parts should be removed so they are counted as a single word.
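
Two of the recommendations above, treating multi-word names and dates as single words and keeping each sample to at most 100 sentences, can be sketched as follows. This is a hypothetical Python illustration rather than the preprocessing actually used for this paper: the list of names must be supplied by hand, the date pattern only covers dates written like "May 10, 1930", and the function names are invented for the example.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)
# Assumed date format: month name, day, comma, four-digit year.
DATE_RE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")


def join_names(text, names):
    """Remove the spaces inside each known multi-word name."""
    # Longest names first so a short name never splits a longer one.
    for name in sorted(names, key=len, reverse=True):
        text = text.replace(name, name.replace(" ", ""))
    return text


def tokenize_dates(text):
    """Replace each recognised date with the generic token [time]."""
    return DATE_RE.sub("[time]", text)


def make_samples(sentences, size=100):
    """Split a list of sentences into consecutive samples of at most `size`."""
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


# Made-up sentence for illustration.
text = "Tom Swift arrived on May 10, 1930."
text = tokenize_dates(join_names(text, ["Tom Swift"]))
print(text)  # TomSwift arrived on [time].
```

Either form (joined word or generic token) makes the entity count as one word when the tests are applied.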

Methods for Experiments

The first step involved creating a program capable of following the prescribed procedure for QSUM analysis. A thorough reading of
Analysing for Authorship: A Guide to the Cusum Technique (Univ. of Wales, 1996) and correspondence with its author, Jill Farringdon,
provided valuable insights. Their program used a largely discontinued programming language known as Spitbol, a variation of
Snobol which had good text processing capabilities.

My program was developed with the PHP programming language. It is a modern server-side scripting language which is frequently used for
web applications, especially database-driven web sites. By making the program a dynamic web page, it could be accessed from any computer
connected to the Internet with a graphical web browser. The graphs are generated by a module known as JpGraph which had sufficient
controls for the display of the data.

With a tested working program, the next step involved collecting electronic texts which might prove interesting. Fortunately, there has been a
large expansion in this area in the past couple of years. While there was recently just one text personally written by Edward Stratemeyer, The
Rover Boys in Business on Project Gutenberg, today there are several other Rover Boys volumes and some earlier stories published under
Stratemeyer's name. Similarly, other series books produced by the Stratemeyer Syndicate and others have been published on the web either
as HTML documents or plain text files.

It was also necessary to obtain unedited texts which could serve as control texts to select the tests for which a given author was consistent.
These were sampled from long letters on the microfilm from the Stratemeyer Syndicate Records collection at NYPL.

Time does not permit us to go through many examples but we can consider a couple which show interesting characteristics.

Can QSUM distinguish between two writers?

[Handwritten notes and a table of QSUM test results by text sample and test; contents not legible in the scan.]
