Synopses & Reviews
The recent dramatic rise in the number of public datasets available free on the Internet, coupled with the evolution of the Open Source software movement, which makes powerful analysis packages like R freely available, has greatly increased both the range of opportunities for exploratory data analysis and the variety of tools that support this type of analysis.
This book provides a thorough introduction to a useful subset of these analysis tools, illustrating what they are, what they do, and when and how they fail. Specific topics covered include descriptive characterizations like summary statistics (mean, median, standard deviation, MAD scale estimate), graphical techniques like boxplots and nonparametric density estimates, various forms of regression modeling (standard linear regression models, logistic regression, and highly robust techniques like least trimmed squares), and the recognition and treatment of important data anomalies like outliers and missing data. The unique combination of topics presented in this book separates it from any other book of its kind.
Intended for use as an introductory textbook for an exploratory data analysis course or as a self-study companion for professionals and graduate students, this book assumes familiarity with calculus and linear algebra, though no previous exposure to probability or statistics is required. Both simulation-based and real-data examples are included, as are end-of-chapter exercises, R code, and datasets.
About the Author
Ronald Pearson has held a wide variety of technical positions in both academia and industry, including the DuPont Company, the Swiss Federal Institute of Technology (ETH, Zurich), the Tampere University of Technology in Tampere, Finland, and most recently, the Travelers Companies. Dr. Pearson's experience has included the analysis and modeling of industrial process operating data, the design of nonlinear digital filters for data cleaning applications, and the analysis of historical clinical data. He is currently involved in developing models for predictive analytics applied to large business datasets. His research interests include model structure selection for nonlinear discrete-time dynamic models of empirical data, the algebraic characterization and design of nonlinear digital filters, and the development of exploratory data analysis techniques for large datasets involving mixed data types.
Table of Contents
1. The Art of Analyzing Data
2. Data: Types, Uncertainty and Quality
3. Characterizing Categorical Variables
4. Uncertainty in Real Variables
5. Fitting Straight Lines
6. A Brief Introduction to Estimation Theory
7. Outliers: Distributional Monsters (?) That Lurk in Data
8. Characterizing a Dataset
9. Confidence Intervals and Hypothesis Testing
10. Relations among Variables
11. Regression Models I: Real Data
12. Reexpression: Data Transformations
13. Regression Models II: Mixed Data Types
14. Characterizing Analysis Results
15. Regression Models III: Diagnostics and Refinements
16. Dealing with Missing Data