book
2004/5/7
page 85
Glossary
activation function a function used as the basis set for a neural network; historically
these were indicator functions, but now they are sigmoidal functions
artificial neural network a neural network used as a statistical model, as distinguished from a model of a biological process for thought
backpropagation a gradient descent algorithm for fitting neural network parameters
that takes advantage of the structure of the model
bias an intercept parameter for a neural network (of course, it can also mean the
difference between the expected value of an estimator and the true value of what it is
trying to estimate)
feature explanatory variable
feature selection variable selection
feed forward predicted values are an explicit function of the explanatory variables, as opposed to a network with cycles (feedback), which allows only an implicit representation
of predictions
Heaviside function an indicator function that takes value one when its argument is
true and value zero when its argument is false
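This definition can be sketched in a few lines of Python (illustrative code, not from the book):

```python
def heaviside(condition):
    # Indicator function: one when its argument is true, zero otherwise.
    return 1.0 if condition else 0.0

# A threshold unit applies the indicator to a comparison, e.g. x > 0:
print(heaviside(2.5 > 0))   # 1.0
print(heaviside(-1.0 > 0))  # 0.0
```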
hidden node logistic basis function
input explanatory variable
neural network a statistical model that uses logistic basis functions
output fitted value for a response variable
perceptron a hidden node in a neural network; historically these were indicator
functions used as basis functions, but the term is now sometimes applied to logistic
basis functions, usually in the phrase multilayer perceptron (which is just a standard
neural network)
radial basis network a mixture model, typically of normals with equal variance,
interpreted as a network structure analogous to a neural network
Downloaded 03/09/15 to 69.91.134.103. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
sigmoidal function a monotone function with lower and upper asymptotes and a
smooth rise between, which looks vaguely S-shaped; for a neural network, this is
typically either a logistic or a hyperbolic tangent
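The two common choices can be sketched as follows (illustrative code, not from the book; the tanh-to-logistic identity is standard):

```python
import math

def logistic(x):
    # Logistic sigmoid: monotone, asymptotes 0 and 1, smooth S-shaped rise.
    return 1.0 / (1.0 + math.exp(-x))

# The hyperbolic tangent has asymptotes -1 and 1 and is a rescaled
# logistic: tanh(x) = 2*logistic(2*x) - 1.
for x in (-5.0, 0.0, 5.0):
    print(x, logistic(x), math.tanh(x))
```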
softmax function transforms a set of real numbers to probabilities by exponentiating
each of them and then dividing by the sum of all of the exponentiated values
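A minimal sketch of this transformation (illustrative code, not from the book; subtracting the maximum is a standard numerical-stability trick that leaves the result unchanged):

```python
import math

def softmax(xs):
    # Exponentiate each value, then divide by the sum of the exponentials.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # three probabilities summing to one
print(sum(probs))
```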
supervised learning training a model or algorithm to fit one or more response
variables using a set of explanatory variables
test dataset a subset of the data not used during model fitting and then used to validate
predicted values, a sort of simplified cross-validation; cf. training dataset
threshold function indicator function; cf. Heaviside function
training dataset a subset of the data used to fit the parameters of the model; cf. test
dataset
unsupervised learning fitting a model or algorithm without an observed response
variable, for example clustering
weights the (nonvariance) parameters of a neural network
Bibliography
Abramowitz, M. and Stegun, I. A., eds. (1965). Handbook of Mathematical Functions. New
York: Dover Publications.
Akaike, H. (1974). A New Look at Statistical Model Identification. IEEE Transactions
on Automatic Control, AC-19, 716–722.
Anderson, J. A. (1982). Logistic Discrimination. In Classification, Pattern Recognition
and Reduction of Dimensionality, eds. P. R. Krishnaiah and L. N. Kanal, Vol. 2 of
Handbook of Statistics, 169–191. Amsterdam: North-Holland.
Andrieu, C., de Freitas, J. F. G., and Doucet, A. (2001). Robust Full Bayesian Learning for
Radial Basis Networks. Neural Computation, 13, 2359–2407.
Barbieri, M. M. and Berger, J. O. (2002). Optimal Predictive Model Selection. Tech.
Rep. 02-02, Duke University, ISDS.
Barron, A., Schervish, M. J., and Wasserman, L. (1999). The Consistency of Posterior
Distributions in Nonparametric Problems. Annals of Statistics, 27, 536–561.
Berger, J. O. and Bernardo, J. M. (1992). On the Development of Reference Priors. In
Bayesian Statistics 4, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M.
Smith, 35–60. New York: Oxford University Press.
Berger, J. O., De Oliveira, V., and Sansó, B. (2001). Objective Bayesian Analysis of
Spatially Correlated Data. Journal of the American Statistical Association, 96,
1361–1374.
Berger, J. O. and Pericchi, L. R. (1996). The Intrinsic Bayes Factor for Model Selection
and Prediction. Journal of the American Statistical Association, 91, 109–122.
Bernardo, J. M. (1979). Reference Posterior Distributions for Bayesian Inference (with
discussion). Journal of the Royal Statistical Society Series B, 41, 113–147.
Bernardo, J. M. and Smith, A. F. M. (1994). Bayesian Theory. Chichester: John Wiley &
Sons.
Bishop, C. M. (1995). Neural Networks for Pattern Recognition. Oxford: Clarendon Press.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (1995). Bayesian Data Analysis.
London: Chapman and Hall.
Geman, S. and Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions and the
Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 6, 721–741.
Gentle, J. E. (2002). Elements of Computational Statistics. New York: Springer-Verlag.
Geweke, J. (1989). Bayesian Inference in Econometric Models Using Monte Carlo Integration. Econometrica, 57, 1317–1340.
Gibbs, M. N. and MacKay, D. J. C. (1997). Efficient Implementation of Gaussian Processes. Tech. Rep., Cavendish Laboratory, Cambridge.
Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo
in Practice. London: Chapman and Hall.
Gordon, A. D. (1999). Classification. Monogr. Statist. Appl. Probab. 82, 2nd ed. Boca
Raton, FL: Chapman & Hall/CRC.
Green, P. J. (1995). Reversible Jump Markov Chain Monte Carlo Computation and
Bayesian Model Determination. Biometrika, 82, 711–732.
Green, P. J. and Silverman, B. W. (1994). Nonparametric Regression and Generalized
Linear Models: A Roughness Penalty Approach. London: Chapman and Hall.
Grenander, U. (1981). Abstract Inference. New York: Wiley.
Härdle, W. (1990). Applied Nonparametric Regression. Cambridge, UK: Cambridge University Press.
Härdle, W. (1991). Smoothing Techniques with Implementation in S. New York: Springer-Verlag.
Hartigan, J. A. (1964). Invariant Prior Distributions. Annals of Mathematical Statistics,
35, 836–845.
Hartigan, J. A. (1983). Bayes Theory. New York: Springer-Verlag.
Hastie, T. and Tibshirani, R. (1984). Generalized Additive Models. Tech. Rep. 98,
Stanford University, Department of Statistics.
Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. London: Chapman and
Hall.
Hastie, T., Tibshirani, R., and Friedman, J. (2001). The Elements of Statistical Learning.
New York: Springer-Verlag.
Hastings, W. K. (1970). Monte Carlo Sampling Methods Using Markov Chains and Their
Applications. Biometrika, 57, 97–109.
Hawkins, D. (1989). Discussion of Flexible Parsimonious Smoothing and Additive Modelling by J. Friedman and B. Silverman. Technometrics, 31, 3–39.
Hjort, N. L. and Omre, H. (1994). Topics in Spatial Statistics. Scandinavian Journal of
Statistics, 21, 289–357.
Hoeting, J. A., Madigan, D., Raftery, A. E., and Volinsky, C. T. (1999). Bayesian Model
Averaging: A Tutorial (with discussion). Statistical Science, 14, 382–417.
Holmes, C. C. and Adams, N. M. (2002). A Probabilistic Nearest Neighbour Method for
Statistical Pattern Recognition. Journal of the Royal Statistical Society Series B, 64,
295–306.
Holmes, C. C. and Mallick, B. K. (1998). Bayesian Radial Basis Functions of Variable
Dimension. Neural Computation, 10, 1217–1233.
Holmes, C. C. and Mallick, B. K. (2000). Bayesian Wavelet Networks for Nonparametric
Regression. IEEE Transactions on Neural Networks, 11, 27–35.
Holmes, C. C. and Mallick, B. K. (2001). Bayesian Regression with Multivariate Linear
Splines. Journal of the Royal Statistical Society Series B, 63, 3–18.
Hornik, K., Stinchcombe, M., and White, H. (1989). Multilayer Feedforward Networks
are Universal Approximators. Neural Networks, 2, 359–366.
Jeffreys, H. (1946). An Invariant Form for the Prior Probability in Estimation Problems.
Proceedings of the Royal Society London A, 186, 453–461.
Jeffreys, H. (1961). Theory of Probability. 3rd ed. New York: Oxford University Press.
Kass, R. E. and Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical
Association, 90, 773–795.
Kass, R. E. and Wasserman, L. (1995). A Reference Bayesian Test for Nested Hypotheses
and Its Relationship to the Schwarz Criterion. Journal of the American Statistical
Association, 90, 928–934.
Kass, R. E. and Wasserman, L. (1996). The Selection of Prior Distributions by Formal
Rules. Journal of the American Statistical Association, 91, 1343–1370.
Keribin, C. (2000). Consistent Estimation of the Order of Mixture Models. Sankhyā, 62,
49–66.
Leamer, E. E. (1978). Specification Searches: Ad Hoc Inference with Nonexperimental
Data. New York: Wiley.
Lee, H. K. H. (2000). Consistency of Posterior Distributions for Neural Networks. Neural
Networks, 13, 629–642.
Lee, H. K. H. (2001). Model Selection for Neural Network Classification. Journal of
Classification, 18, 227–243.
Press, S. J. (1989). Bayesian Statistics: Principles, Models, and Applications. New York:
John Wiley & Sons.
Raftery, A. E. (1996). Approximate Bayes Factors and Accounting for Model Uncertainty
in Generalized Linear Models. Biometrika, 83, 251–266.
Raftery, A. E. and Madigan, D. (1994). Model Selection and Accounting for Model
Uncertainty in Graphical Models Using Occam's Window. Journal of the American
Statistical Association, 89, 1535–1546.
Raftery, A. E., Madigan, D., and Hoeting, J. A. (1997). Bayesian Model Averaging for
Linear Regression Models. Journal of the American Statistical Association, 92,
179–191.
Ríos Insua, D. and Müller, P. (1998). Feedforward Neural Networks for Nonparametric
Regression. In Practical Nonparametric and Semiparametric Bayesian Statistics,
eds. D. Dey, P. Müller, and D. Sinha, 181–193. New York: Springer-Verlag.
Ripley, B. D. (1996). Pattern Recognition and Neural Networks. Cambridge, UK: Cambridge University Press.
Robert, C. P. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to
Computational Implementation. New York: Springer-Verlag.
Robert, C. P. and Casella, G. (2000). Monte Carlo Statistical Methods. New York: Springer-Verlag.
Robinson, M. (2001a). Priors for Bayesian Neural Networks. Master's thesis, University
of British Columbia, Department of Statistics.
Robinson, M. (2001b). Priors for Bayesian Neural Networks. In Computing Science
and Statistics, eds. E. J. Wegman, A. Braverman, A. Goodman, and P. Smyth, Vol. 33,
122–127.
Rosenblatt, F. (1962). Principles of Neurodynamics: Perceptrons and the Theory of Brain
Mechanisms. Washington, D.C.: Spartan.
Rubin, D. B. (1981). The Bayesian Bootstrap. Annals of Statistics, 9, 130134.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning Internal Representations by Error Propagation. In Parallel Distributed Processing: Explorations in
the Microstructure of Cognition, eds. D. E. Rumelhart, J. L. McClelland, and the PDP
Research Group, Vol. 1, 318–362. Cambridge, MA: MIT Press.
Sarle, W. S. (1994). Neural Network Implementation in SAS Software. In Proceedings
of the Nineteenth Annual SAS Users Group International Conference, 38–51. SAS
Institute, Cary, NC.
Schervish, M. J. (1995). Theory of Statistics. New York: Springer-Verlag.
Index
Fisher information, see information,
Fisher
Fourier series regression, 16, 18
full conditional distribution, see complete
conditional distribution
Laplace approximation, 61
loan data, 3–8, 25–27, 59, 73–77
locally weighted scatterplot smoothing
(LOWESS), 13, 18
logistic regression, 11, 21, 67, 76
Markov chain Monte Carlo model composition, 69–71
MARS, see multivariate adaptive regression splines
MC³, see Markov chain Monte Carlo
model composition
MCMC, see Markov chain Monte Carlo
median probability model, 61
Metropolis–Hastings sampling, 49–51,
70–71
mixture model, 16–18, 28, 32, 40, 62
model averaging, 3, 30, 62–63, 65, 66,
68, 70, 71
motorcycle data, 33
moving average estimator, 12–13, 18–20
multicollinearity, 3, 5, 7, 40, 58, 63, 71,
73, 74
multimodality, 48, 67
multinomial likelihood, 23–27, 65
multivariate adaptive regression splines,
15, 18
nearest neighbor classification, 19
network information criterion (NIC), 62
Occam's Window, 68–71
ordinal response, 25
orthogonal polynomial, see polynomial
approximator
overfitting, 3, 11, 30, 46, 57–59, 63–65,
71
ozone data, 3, 8, 13, 15, 51, 58, 59, 65,
71–73
partition model, 13–15, 18–20
perceptron, 22, 85
polynomial approximator, 16, 18
PPR, see projection pursuit regression
prior
asymptotic equivalence, 42
BergerBernardo, see prior, reference
conjugate, see conjugacy
flat, 38–42, 45, 46, 48–52, 58, 65,
71, 73
Jeffreys, 37, 43–46, 51–52
Müller and Ríos Insua, 34–35, 51–52
MacKay, 48
model space, 60–61, 71
Neal, 36–37, 51–52, 63, 77
reference, 46, 51–52, 81–83
Robinson, 47, 63
weight decay, see weight decay
projection pursuit regression, 15–16, 18,
19, 21
quadrature, 61
quasi-Newton methods, 48, 49
radial basis network, 17, 18, 28, 32, 85
reciprocal importance sampling, 61
recursive partitioning regression, 13, see
tree
reversible jump Markov chain Monte
Carlo (RJMCMC), 66, 70–71
RPR, see recursive partitioning regression
Shannon information, see information,
Shannon
sieve asymptotics, 53–54
sigmoidal function, 22, 27–28, 85, 86
simulated annealing, 48
softmax, 24, 86
spline, 14–16, 18–19, 72
stepwise searching, 67–73
tree, 13–14, 18, 19, 65, 70, 77, 80
TURBO, 72–75
variable selection, 3, 5, 30, 57–63, 65–74,
85
Voronoi tessellation, 14
wavelet, 17–18
weight decay, 46–47, 51–52, 59, 63