Synopses & Reviews
The forward search provides a method of revealing the structure of data through a mixture of model fitting and informative plots. The continuous multivariate data that are the subject of this book are often analyzed as if they come from one or more normal distributions. Such analyses, including the need for transformation, may be distorted by the presence of unidentified subsets and outliers, both individual and clustered. These important features are disguised by the standard procedures of multivariate analysis. The book introduces methods that reveal the effect of each observation on fitted models and inferences. The powerful methods of data analysis will be of importance to scientists and statisticians. Although the emphasis is on the analysis of data, theoretical developments make the book suitable for a graduate statistical course on multivariate analysis. Topics covered include principal components analysis, discriminant analysis, cluster analysis and the analysis of spatial data. S-Plus programs for the forward search are available on a web site. This book is a companion to Atkinson and Riani's Robust Diagnostic Regression Analysis of which the reviewer for The Journal of the Royal Statistical Society wrote "I read this book, compulsive reading such as it was, in three sittings." Anthony Atkinson is Emeritus Professor of Statistics at the London School of Economics. He is also the author of Plots, Transformations, and Regression and coauthor of Optimum Experimental Designs. Professor Atkinson has served as Editor of The Journal of the Royal Statistical Society, Series B.
Review
From the reviews:
"The book requires knowledge of multivariate statistical methods, because it provides only basic background information on the methods considered (although with excellent references for further reading at the end of each chapter). Each chapter also includes exercises with solutions...This book could serve as an excellent text for an advanced course on modern multivariate statistics, as it is intended." Technometrics, November 2004
"This book is full of interest for anyone undertaking multivariate analyses, clearly emphasizing that uncritical use of standard methods can be misleading." Short Book Reviews of the International Statistical Institute, December 2004
"This book is an interesting complement to various textbooks on multivariate statistics." Biometrics, December 2005
"This book discusses multivariate data from a different perspective. ... it is an excellent book for researchers with interests in multivariate data and cluster analysis. It may also be a good reference for students of advanced statistics and practitioners working with large volumes of data ... ." (Kassim S. Mwitondi, Journal of Applied Statistics, Vol. 32 (4), 2005)
"This is a companion to an earlier book ... both of which feature many informative graphs. Here, the forward search has been applied in detail to classical multivariate approaches used with Gaussian data. ... One valuable feature of the book is the way that the illustrations concentrate on a relatively small number ... . This makes it easy to concentrate on the application ... . The implications of this book also strengthen the importance of data visualization, as well as providing a valuable approach to visualization." (Paul Hewson, Journal of the Royal Statistical Society Series A, Vol. 168 (2), 2005)
"This book is a companion to Atkinson ... . The objective is to identify outliers, appreciate their influence ... which would result in an overall improvement. ... Graphical tools are widely used, resulting in three hundred and ninety figures. Each chapter is followed by extensive exercises and their solutions, and the book could be used as an advanced textbook for multivariate analysis courses. Web-sites provide the relevant software ... . This book is full of interest for anyone undertaking multivariate analyses ... ." (B.J.T. Morgan, Short Book Reviews International Statistical Institute, Vol. 24 (3), 2004)
"This book discusses forward search (FS), a method using graphs to explore and model continuous multivariate data ... . Its viewpoint is toward applications, and it demonstrates the merits of FS using a variety of examples, with a thorough discussion of statistical issues and interpretation of results. ... This book could serve as an excellent text for an advanced course on modern multivariate statistics, as it is intended." (Tena Ipsilantis Katsaounis, Technometrics, Vol. 46 (4), November, 2004)
"The theoretical exercises with detailed solutions at the end of each chapter are extremely useful. I would recommend this book to practitioners who analyze moderately sized multivariate data. Of course, anyone associated with the application of statistics should find the book interesting to read." (Tathgata Banerjee, Journal of the American Statistical Association, March 2006)
"This book deals with procedures to analyze sets of multivariate observations of continuous variables. ... The procedures are described in general, and then applied to a collection of data sets. ... The data are shown through tables and figures, the latter numbering 390, which means that illustrations are abundant. ... This is an attractive and useful book. It has a wealth of suggestions. ... The results of the analysis are usually presented in graphical form, and as said, the book has 390 graphs." (Raúl Mentz, Zentralblatt MATH, Vol. 1049, 2004)
"This is a valuable book on explorative multivariate data analysis which makes particular use of plots. ... The book is intended to serve as a text for a postgraduate course on modern multivariate statistics. The theoretical material is complemented by exercises with detailed solutions. ... This is a ... carefully written book on explorative multivariate data analysis, with a lot of details and examples especially on effects caused by outliers and departures from normality. I recommend this book for any applied statistician." (Wolfgang Näther, Metrika, Vol. 64, 2006)
Synopsis
This book is concerned with data in which the observations are independent and in which the response is multivariate. Anthony Atkinson has been Professor of Statistics at the London School of Economics since 1989. Before that he was a Professor at Imperial College, London. He is the author of Plots, Transformations, and Regression, co-author of Optimum Experimental Designs, and joint editor of The Fascination of Statistics, a volume celebrating the centenary of the International Statistical Institute. Professor Atkinson has served as editor of The Journal of the Royal Statistical Society, Series B and as associate editor of Biometrika and Technometrics. He has published well over 100 articles in these and other journals including The Annals of Statistics, Biometrics, The Journal of the American Statistical Association, and Statistics and Computing. Marco Riani, after receiving his Ph.D. in Statistics in 1995 from the University of Florence, joined the Faculty of Economics at Parma University as postdoctoral fellow. In 1997 he won the prize for the best Italian Ph.D. thesis in Statistics. He is currently Associate Professor of Statistics in the University of Parma. He has published in Technometrics, The Journal of Computational and Graphical Statistics, The Journal of Business and Economic Statistics, The Journal of Forecasting, Environmetrics, Computational Statistics and Data Analysis, Metron, and other journals.
Synopsis
Why We Wrote This Book
This book is about using graphs to explore and model continuous multivariate data. Such data are often modelled using the multivariate normal distribution and, indeed, there is a literature of weighty statistical tomes presenting the mathematical theory of this activity. Our book is very different. Although we use the methods described in these books, we focus on ways of exploring whether the data do indeed have a normal distribution. We emphasize outlier detection, transformations to normality and the detection of clusters and unsuspected influential subsets. We then quantify the effect of these departures from normality on procedures such as discrimination and cluster analysis. The normal distribution is central to our book because, subject to our exploration of departures, it provides useful models for many sets of data. However, the standard estimates of the parameters, especially the covariance matrix of the observations, are highly sensitive to the presence of outliers. This is both a blessing and a curse. It is a blessing because, if we estimate the parameters with the outliers excluded, their effect is appreciable and apparent if we then include them for estimation. It is however a curse because it can be hard to detect which observations are outliers. We use the forward search for this purpose.
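The idea in the passage above can be illustrated with a small sketch. It is not the authors' S-Plus software: it handles only bivariate data, it starts from the units nearest the coordinatewise median (an assumed starting rule chosen for brevity), and the names `mean_cov`, `sq_mahalanobis` and `forward_search` are hypothetical. At each step the mean and covariance are estimated from the current subset, squared Mahalanobis distances are computed for all units, and the subset grows by one; the smallest distance among units still outside the subset is recorded for monitoring.

```python
import random

def mean_cov(pts):
    """Sample mean and unbiased 2x2 covariance (sxx, sxy, syy) of bivariate points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def sq_mahalanobis(p, mean, cov):
    """Squared Mahalanobis distance of p, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def forward_search(pts, m0):
    """Return [(m, subset_indices, min_sq_distance_outside)] for m = m0 .. n-1."""
    n = len(pts)
    # Assumed starting rule: the m0 units nearest the coordinatewise median.
    med = [sorted(p[j] for p in pts)[n // 2] for j in (0, 1)]
    order = sorted(range(n),
                   key=lambda i: (pts[i][0] - med[0]) ** 2 + (pts[i][1] - med[1]) ** 2)
    subset = order[:m0]
    trace = []
    for m in range(m0, n):
        # Fit on the current subset only, then measure every unit against that fit.
        mean, cov = mean_cov([pts[i] for i in subset])
        d2 = [sq_mahalanobis(p, mean, cov) for p in pts]
        inside = set(subset)
        gap = min(d2[i] for i in range(n) if i not in inside)
        trace.append((m, sorted(subset), gap))
        # The next subset is the m + 1 units with the smallest distances.
        subset = sorted(range(n), key=d2.__getitem__)[:m + 1]
    return trace

# Illustrative data (made up): 40 standard normal units plus 3 planted outliers.
random.seed(1)
clean = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
outliers = [(50.0, 50.0), (51.0, 49.0), (49.0, 51.0)]
data = clean + outliers

trace = forward_search(data, m0=10)
for m, _, gap in trace[-5:]:
    print(m, round(gap, 1))
```

In this made-up example the monitored distance should jump sharply once only the three planted outliers remain outside the subset, which is the kind of signal a forward plot is meant to expose.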
Synopsis
This book is concerned with data in which the observations are independent and in which the response is multivariate. Companion book to Robust Diagnostic Regression Analysis (ISBN 0-387-95017) published by Springer in 2000.
Table of Contents
Preface
Notation
1 Examples of Multivariate Data
1.1 Influence, Outliers and Distances
1.2 A Sketch of the Forward Search
1.3 Multivariate Normality and our Examples
1.4 Swiss Heads
1.5 National Track Records for Women
1.6 Municipalities in Emilia-Romagna
1.7 Swiss Bank Notes
1.8 Plan of the Book
2 Multivariate Data and the Forward Search
2.1 The Univariate Normal Distribution
2.1.1 Estimation
2.1.2 Distribution of Estimators
2.2 Estimation and the Multivariate Normal Distribution
2.2.1 The Multivariate Normal Distribution
2.2.2 The Wishart Distribution
2.2.3 Estimation of Σ
2.3 Hypothesis Testing
2.3.1 Hypotheses About the Mean
2.3.2 Hypotheses About the Variance
2.4 The Mahalanobis Distance
2.5 Some Deletion Results
2.5.1 The Deletion Mahalanobis Distance
2.5.2 The (Bartlett)-Sherman-Morrison-Woodbury Formula
2.5.3 Deletion Relationships Among Distances
2.6 Distribution of the Squared Mahalanobis Distance
2.7 Determinants of Dispersion Matrices and the Squared Mahalanobis Distance
2.8 Regression
2.9 Added Variables in Regression
2.10 The Mean Shift Outlier Model
2.11 Seemingly Unrelated Regression
2.12 The Forward Search
2.13 Starting the Search
2.13.1 The Babyfood Data
2.13.2 Robust Bivariate Boxplots from Peeling
2.13.3 Bivariate Boxplots from Ellipses
2.13.4 The Initial Subset
2.14 Monitoring the Search
2.15 The Forward Search for Regression Data
2.15.1 Univariate Regression
2.15.2 Multivariate Regression
2.16 Further Reading
2.17 Exercises
2.18 Solutions
3 Data from One Multivariate Distribution
3.1 Swiss Heads
3.2 National Track Records for Women
3.3 Municipalities in Emilia-Romagna
3.4 Swiss Bank Notes
3.5 What Have We Seen?
3.6 Exercises
3.7 Solutions
4 Multivariate Transformations to Normality
4.1 Background
4.2 An Introductory Example: the Babyfood Data
4.3 Power Transformations to Approximate Normality
4.3.1 Transformation of the Response in Regression
4.3.2 Multivariate Transformations to Normality
4.4 Score Tests for Transformations
4.5 Graphics for Transformations
4.6 Finding a Multivariate Transformation with the Forward Search
4.7 Babyfood Data
4.8 Swiss Heads
4.9 Horse Mussels
4.10 Municipalities in Emilia-Romagna
4.10.1 Demographic Variables
4.10.2 Wealth Variables
4.10.3 Work Variables
4.10.4 A Combined Analysis
4.11 National Track Records for Women
4.12 Dyestuff Data
4.13 Babyfood Data and Variable Selection
4.14 Suggestions for Further Reading
4.15 Exercises
4.16 Solutions
5 Principal Components Analysis
5.1 Background
5.2 Principal Components and Eigenvectors
5.2.1 Linear Transformations and Principal Components
5.2.2 Lack of Scale Invariance and Standardized Variables
5.2.3 The Number of Components
5.3 Monitoring the Forward Search
5.3.1 Principal Components and Variances
5.3.2 Principal Component Scores
5.3.3 Correlations Between Variables and Principal Components
5.3.4 Elements of the Eigenvectors
5.4 The Biplot and the Singular Value Decomposition
5.5 Swiss Heads
5.6 Milk Data
5.7 Quality of Life
5.8 Swiss Bank Notes
5.8.1 Forgeries and Genuine Notes
5.8.2 Forgeries Alone
5.9 Municipalities in Emilia-Romagna
5.10 Further Reading
5.11 Exercises
5.12 Solutions
6 Discriminant Analysis
6.1 Background
6.2 An Outline of Discriminant Analysis
6.2.1 Bayesian Discrimination
6.2.2 Quadratic Discriminant Analysis
6.2.3 Linear Discriminant Analysis
6.2.4 Estimation of Means and Variances
6.2.5 Canonical Variates
6.2.6 Assessment of Discriminant Rules
6.3 The Forward Search
6.3.1 Step 1: Choice of the Initial Subset
6.3.2 Step 2: Adding Observations During the Forward Search
6.3.3 Mahalanobis Distances and Discriminant Analysis in Step 2
6.4 Monitoring the Search
6.5 Transformations to Normality in Discriminant Analysis
6.6 Iris Data
6.7 Electrodes Data
6.8 Transformed Iris Data
6.9 Swiss Bank Notes
6.10 Importance of Transformations in Discriminant Analysis: A Simulated Example
6.10.1 A Deletion Analysis
6.10.2 Finding a Transformation with the Forward Search
6.10.3 Discriminant Analysis and Confirmation of the Transformation
6.11 Muscular Dystrophy Data
6.11.1 The Data
6.11.2 Finding the Transformation
6.11.3 Outliers and Discriminant Analysis
6.11.4 More Data
6.12 Further Reading
6.13 Exercises
6.14 Solutions
7 Cluster Analysis
7.1 Introduction
7.2 Clustering and the Forward Search
7.2.1 Three Steps in Finding Clusters
7.2.2 Standardized Mahalanobis Distances and Analysis with Many Clusters
7.2.3 Forward Searches in Cluster Analysis
7.3 The 60:80 Data
7.3.1 Failure of a Very Robust Statistical Method
7.3.2 The Forward Search
7.3.3 Further Plots for the 60:80 Data
7.4 Three Clusters, Two Outliers: A Second Synthetic Example
7.4.1 A Forward Analysis
7.4.2 A Very Robust Analysis
7.5 Data with a Bridge
7.5.1 Preliminary Analysis
7.5.2 Further Preliminary Analysis: Mahalanobis Distances for Groups and Individual Units
7.5.3 Exploratory Analysis: Single Clusters for the Bridge Data
7.5.4 Confirmatory Analysis: Three Clusters for the Bridge Data
7.6 Financial Data
7.6.1 Preliminary Analysis
7.6.2 Exploratory Analysis
7.6.3 Confirmatory Analysis
7.7 Diabetes Data
7.7.1 Preliminary Analysis
7.7.2 Exploratory Analysis
7.7.3 Confirmatory Analysis
7.8 Discussion
7.8.1 Agglomerative Hierarchical Clustering
7.8.2 Partitioning Methods
7.8.3 Some Examples from Traditional Cluster Analysis
7.8.4 Model-Based Clustering
7.8.5 Further Reading
7.9 Exercises
7.10 Solutions
8 Spatial Linear Models
8.1 Introduction
8.2 Background on Kriging
8.2.1 Ordinary Kriging
8.2.2 Isotropic Semivariogram Models
8.2.3 Spatial Outliers
8.2.4 Kriging Diagnostics
8.2.5 Robust Estimation of the Variogram
8.3 The Forward Search for Ordinary Kriging
8.3.1 Choice of the Initial Subset
8.3.2 Progressing in the Search
8.3.3 Monitoring the Search
8.4 Contaminated Kriging Examples
8.4.1 Multiple Spatial Outliers
8.4.2 Pocket of Nonstationarity
8.5 Wheat Yield Data
8.6 Reflectance Data
8.7 Background on Spatial Autoregression
8.7.1 Neighbourhood Structure and Edge Correction
8.7.2 Simultaneous Spatial Autoregression (SAR) Models
8.7.3 Spatial Outliers Under the SAR Model
8.7.4 High Leverage Sites
8.8 The Block Forward Search for Spatial Autoregression
8.8.1 Subset Likelihood
8.8.2 Defining the Blocks
8.8.3 Choice of the Initial Subset
8.8.4 Progressing in the Search
8.8.5 Monitoring the Search
8.9 SAR Examples With Multiple Contamination
8.9.1 Masked Spatial Outliers
8.9.2 Estimation of ρ
8.9.3 Multiple High Leverage Sites
8.10 Wheat Yield Data Revisited
8.11 Further Reading
8.12 Exercises
8.13 Solutions
Appendix: Tables of Data
Bibliography
Author Index
Subject Index