Synopses & Reviews
Modeling with Data fully explains how to execute computationally intensive analyses on very large data sets, showing readers how to determine the best methods for solving a variety of different problems, how to create and debug statistical models, and how to run an analysis and evaluate the results.
Ben Klemens introduces a set of open and unlimited tools, and uses them to demonstrate data management, analysis, and simulation techniques essential for dealing with large data sets and computationally intensive procedures. He then demonstrates how to easily apply these tools to the many threads of statistical technique, including classical, Bayesian, maximum likelihood, and Monte Carlo methods. Klemens's accessible survey describes these models in a unified and nontraditional manner, providing alternative ways of looking at statistical concepts that often befuddle students. The book includes nearly one hundred sample programs of all kinds. Links to these programs will be available on this page at a later date.
Modeling with Data will interest anyone looking for a comprehensive guide to these powerful statistical tools, including researchers and graduate students in the social sciences, biology, engineering, economics, and applied mathematics.
Review
"This book presents an original, cheap and powerful solution to the problem of analysis of large data sets. . . . The book is devoted mainly to the practitioner of Statistics, but is also useful to mathematicians, computer scientists, researchers and students in the biology, economics and social sciences."--Radu Trimbitas, StudiaUBB
Synopsis
"I am a psychiatric geneticist but my degree is in neuroscience, which means that I now do far more statistics than I have been trained for. I cannot overstate to you the magnitude of the change in my productivity since finding this book. Even after reading the first few chapters, which explain why data analysis is painful and how one can implement a long-term solution, my research moved forward greatly."
--Amber Baum, National Institute of Mental Health
"I enjoyed reading this book and learned a great deal from it. Modeling with Data filled in a lot of holes in my knowledge, and I think that will be true in general for other readers as well. There is a lot of high-quality and interesting material here."
--Brendan Halpin, University of Limerick
About the Author
Ben Klemens is a senior statistician at the National Institute of Mental Health. He is also a guest scholar at the Center on Social and Economic Dynamics at the Brookings Institution.
Table of Contents
Preface xi
Chapter 1. Statistics in the modern day 1
PART I COMPUTING 15
Chapter 2. C 17
2.1 Lines 18
2.2 Variables and their declarations 28
2.3 Functions 34
2.4 The debugger 43
2.5 Compiling and running 48
2.6 Pointers 53
2.7 Arrays and other pointer tricks 59
2.8 Strings 65
2.9 *Errors 69
Chapter 3. Databases 74
3.1 Basic queries 76
3.2 *Doing more with queries 80
3.3 Joins and subqueries 87
3.4 On database design 94
3.5 Folding queries into C code 98
3.6 Maddening details 103
3.7 Some examples 108
Chapter 4. Matrices and models 113
4.1 The GSL's matrices and vectors 114
4.2 apop_data 120
4.3 Shunting data 123
4.4 Linear algebra 129
4.5 Numbers 135
4.6 *gsl_matrix and gsl_vector internals 140
4.7 Models 143
Chapter 5. Graphics 157
5.1 plot 160
5.2 *Some common settings 163
5.3 From arrays to plots 166
5.4 A sampling of special plots 171
5.5 Animation 177
5.6 On producing good plots 180
5.7 *Graphs--nodes and flowcharts 182
5.8 Printing and LaTeX 185
Chapter 6. *More coding tools 189
6.1 Function pointers 190
6.2 Data structures 193
6.3 Parameters 203
6.4 *Syntactic sugar 210
6.5 More tools 214
PART II STATISTICS 217
Chapter 7. Distributions for description 219
7.1 Moments 219
7.2 Sample distributions 235
7.3 Using the sample distributions 252
7.4 Non-parametric description 261
Chapter 8. Linear projections 264
8.1 *Principal component analysis 265
8.2 OLS and friends 270
8.3 Discrete variables 280
8.4 Multilevel modeling 288
Chapter 9. Hypothesis testing with the CLT 295
9.1 The Central Limit Theorem 297
9.2 Meet the Gaussian family 301
9.3 Testing a hypothesis 307
9.4 ANOVA 312
9.5 Regression 315
9.6 Goodness of fit 319
Chapter 10. Maximum likelihood estimation 325
10.1 Log likelihood and friends 326
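10.2 Maximum likelihood estimators 337
10.3 Missing data 345
10.4 Testing with likelihoods 348
Chapter 11. Monte Carlo 356
11.1 Random number generation 357
11.2 Finding statistics for a distribution 364
11.3 Inference: Finding statistics for a parameter 367
11.4 Drawing a distribution 371
11.5 Non-parametric testing 375
Appendix A: Environments and makefiles 381
A.1 Environment variables 381
A.2 Paths 385
A.3 Make 387
Appendix B: Text processing 392
B.1 Shell scripts 393
B.2 Some tools for scripting 398
B.3 Regular expressions 403
B.4 Adding and deleting 413
B.5 More examples 415
Appendix C: Glossary 419
Bibliography 435
Index 443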