Synopses & Reviews
Why learn R? Because it's rapidly becoming the standard for developing statistical software. R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.
The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.
- Understand the basics of the language, including the nature of R objects
- Learn how to write R functions and build your own packages
- Work with data through visualization, statistical analysis, and other methods
- Explore the wealth of packages contributed by the R community
- Become familiar with the lattice graphics package for high-level data visualization
- Learn about bioinformatics packages provided by Bioconductor
"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."
--Martin Schultz, Arthur K. Watson Professor of Computer Science, Yale University
Synopsis
If you're considering R for statistical computing and data visualization, this book provides a quick and practical guide to just about everything you can do with the open source R language and software environment. You'll learn how to write R functions and use R packages to help you prepare, visualize, and analyze data. Author Joseph Adler illustrates each process with a wealth of examples from medicine, business, and sports.
Updated for R 2.14 and 2.15, this second edition includes new and expanded chapters on R performance, the ggplot2 data visualization package, and parallel R computing with Hadoop.
- Get started quickly with an R tutorial and hundreds of examples
- Explore R syntax, objects, and other language details
- Find thousands of user-contributed R packages online, including Bioconductor
- Learn how to use R to prepare data for analysis
- Visualize your data with R's graphics, lattice, and ggplot2 packages
- Use R to calculate statistical tests, fit models, and compute probability distributions
- Speed up intensive computations by writing parallel R programs for Hadoop
- Get a complete desktop reference to R
About the Author
Joseph Adler has years of experience working with many popular data mining packages, including databases (Oracle, PostgreSQL, and MS Access), statistical analysis tools (SAS, SPSS, S-Plus, and R), and data mining tools (SAS Enterprise Miner, Insightful Miner, Oracle Data Mining, Weka, and SPSS Clementine). He is currently leading a project at Verisign to select a data mining package for enterprise deployment.
Table of Contents
Preface; Why I Wrote This Book; When Should You Use R?; What's New in the Second Edition?; R License Terms; Examples; How This Book Is Organized; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments;
R Basics; Chapter 1: Getting and Installing R; 1.1 R Versions; 1.2 Getting and Installing Interactive R Binaries; Chapter 2: The R User Interface; 2.1 The R Graphical User Interface; 2.2 The R Console; 2.3 Batch Mode; 2.4 Using R Inside Microsoft Excel; 2.5 RStudio; 2.6 Other Ways to Run R; Chapter 3: A Short R Tutorial; 3.1 Basic Operations in R; 3.2 Functions; 3.3 Variables; 3.4 Introduction to Data Structures; 3.5 Objects and Classes; 3.6 Models and Formulas; 3.7 Charts and Graphics; 3.8 Getting Help; Chapter 4: R Packages; 4.1 An Overview of Packages; 4.2 Listing Packages in Local Libraries; 4.3 Loading Packages; 4.4 Exploring Package Repositories; 4.5 Installing Packages From Other Repositories; 4.6 Custom Packages;
The R Language; Chapter 5: An Overview of the R Language; 5.1 Expressions; 5.2 Objects; 5.3 Symbols; 5.4 Functions; 5.5 Objects Are Copied in Assignment Statements; 5.6 Everything in R Is an Object; 5.7 Special Values; 5.8 Coercion; 5.9 The R Interpreter; 5.10 Seeing How R Works; Chapter 6: R Syntax; 6.1 Constants; 6.2 Operators; 6.3 Expressions; 6.4 Control Structures; 6.5 Accessing Data Structures; 6.6 R Code Style Standards; Chapter 7: R Objects; 7.1 Primitive Object Types; 7.2 Vectors; 7.3 Lists; 7.4 Other Objects; 7.5 Attributes; Chapter 8: Symbols and Environments; 8.1 Symbols; 8.2 Working with Environments; 8.3 The Global Environment; 8.4 Environments and Functions; 8.5 Exceptions; Chapter 9: Functions; 9.1 The Function Keyword; 9.2 Arguments; 9.3 Return Values; 9.4 Functions as Arguments; 9.5 Argument Order and Named Arguments; 9.6 Side Effects; Chapter 10: Object-Oriented Programming; 10.1 Overview of Object-Oriented Programming in R; 10.2 Object-Oriented Programming in R: S4 Classes; 10.3 Old-School OOP in R: S3;
Working with Data; Chapter 11: Saving, Loading, and Editing Data; 11.1 Entering Data Within R; 11.2 Saving and Loading R Objects; 11.3 Importing Data from External Files; 11.4 Exporting Data; 11.5 Importing Data From Databases; 11.6 Getting Data from Hadoop; Chapter 12: Preparing Data; 12.1 Combining Data Sets; 12.2 Transformations; 12.3 Binning Data; 12.4 Subsets; 12.5 Summarizing Functions; 12.6 Data Cleaning; 12.7 Finding and Removing Duplicates; 12.8 Sorting;
Data Visualization; Chapter 13: Graphics; 13.1 An Overview of R Graphics; 13.2 Graphics Devices; 13.3 Customizing Charts; Chapter 14: Lattice Graphics; 14.1 History; 14.2 An Overview of the Lattice Package; 14.3 High-Level Lattice Plotting Functions; 14.4 Customizing Lattice Graphics; 14.5 Low-Level Functions; Chapter 15: ggplot2; 15.1 A Short Introduction; 15.2 The Grammar of Graphics; 15.3 A More Complex Example: Medicare Data; 15.4 Quick Plot; 15.5 Creating Graphics with ggplot2; 15.6 Learning More;
Statistics with R; Chapter 16: Analyzing Data; 16.1 Summary Statistics; 16.2 Correlation and Covariance; 16.3 Principal Components Analysis; 16.4 Factor Analysis; 16.5 Bootstrap Resampling; Chapter 17: Probability Distributions; 17.1 Normal Distribution; 17.2 Common Distribution-Type Arguments; 17.3 Distribution Function Families; Chapter 18: Statistical Tests; 18.1 Continuous Data; 18.2 Discrete Data; Chapter 19: Power Tests; 19.1 Experimental Design Example; 19.2 t-Test Design; 19.3 Proportion Test Design; 19.4 ANOVA Test Design; Chapter 20: Regression Models; 20.1 Example: A Simple Linear Model; 20.2 Details About the lm Function; 20.3 Subset Selection and Shrinkage Methods; 20.4 Nonlinear Models; 20.5 Survival Models; 20.6 Smoothing; 20.7 Machine Learning Algorithms for Regression; Chapter 21: Classification Models; 21.1 Linear Classification Models; 21.2 Machine Learning Algorithms for Classification; Chapter 22: Machine Learning; 22.1 Market Basket Analysis; 22.2 Clustering; Chapter 23: Time Series Analysis; 23.1 Autocorrelation Functions; 23.2 Time Series Models;
Additional Topics; Chapter 24: Optimizing R Programs; 24.1 Measuring R Program Performance; 24.2 Optimizing Your R Code; 24.3 Other Ways to Speed Up R; Chapter 25: Bioconductor; 25.1 An Example; 25.2 Key Bioconductor Packages; 25.3 Data Structures; 25.4 Where to Go Next; Chapter 26: R and Hadoop; 26.1 R and Hadoop; 26.2 Other Packages for Parallel Computation with R; 26.3 Where to Learn More;
R Reference; base; boot; class; cluster; codetools; foreign; grDevices; graphics; grid; KernSmooth; lattice; MASS; methods; mgcv; nlme; nnet; rpart; spatial; splines; stats; stats4; survival; tcltk; tools; utils;
Bibliography; Colophon;