Synopses & Reviews
What people are saying about R in a Nutshell
"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."
--Martin Schultz, Arthur K. Watson Professor of Computer Science, Yale University
"R in a Nutshell is an ideal book for getting started with R. Newcomers will find the fundamentals for performing statistical analysis and graphics, all illustrated with practical examples. This book is an invaluable reference for anyone who wants to learn what R is and what it can do, even for longtime R users looking for new tips and tricks."
--David M. Smith, Editor of the "Revolutions" blog at REvolution Computing
Why learn R? Because it's rapidly becoming the standard for developing statistical software. R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.
The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.
- Understand the basics of the language, including the nature of R objects
- Learn how to write R functions and build your own packages
- Work with data through visualization, statistical analysis, and other methods
- Explore the wealth of packages contributed by the R community
- Become familiar with the lattice graphics package for high-level data visualization
- Learn about bioinformatics packages provided by Bioconductor
Perform data analysis with R quickly and efficiently with the task-oriented recipes in this cookbook. Although the R language and environment include everything you need to perform statistical work right out of the box, its structure can often be difficult to master. R Cookbook will help both beginners and experienced data programmers unlock and use the power of R.
This practical book provides a collection of concise recipes that will help you be productive with R immediately. You'll get the job done faster and learn more about R in the process.
Key topics include:
- Getting started with R
- Data structures
- Basic numerical calculations
- Basic probability calculations
- Basic statistical calculations and tests
- Regression and ANOVA
- Advanced statistical techniques
- Handy tips, techniques, and hacks that everyone can use
With more than 200 practical recipes, this book helps you perform data analysis with R quickly and efficiently. The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression.
Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process.
- Create vectors, handle variables, and perform other basic functions
- Input and output data
- Tackle data structures such as matrices, lists, factors, and data frames
- Work with probability, probability distributions, and random variables
- Calculate statistics and confidence intervals, and perform statistical tests
- Create a variety of graphic displays
- Build statistical models with linear regressions and analysis of variance (ANOVA)
- Explore advanced statistical techniques, such as finding clusters in your data
"Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language—one practical example at a time."—Jeffrey Ryan, software consultant and R package author
About the Author
Joseph Adler has many years of experience in data mining and data analysis at companies including DoubleClick, American Express, and VeriSign. He graduated from MIT with an Sc.B and M.Eng in Computer Science and Electrical Engineering. He is the inventor of several patents for computer security and cryptography, and the author of Baseball Hacks and R in a Nutshell.
Table of Contents
Preface; The Recipes; A Note on Terminology; Software and Platform Notes; Other Resources; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments;
Chapter 1: Getting Started and Getting Help; 1.1 Introduction; 1.2 Downloading and Installing R; 1.3 Starting R; 1.4 Entering Commands; 1.5 Exiting from R; 1.6 Interrupting R; 1.7 Viewing the Supplied Documentation; 1.8 Getting Help on a Function; 1.9 Searching the Supplied Documentation; 1.10 Getting Help on a Package; 1.11 Searching the Web for Help; 1.12 Finding Relevant Functions and Packages; 1.13 Searching the Mailing Lists; 1.14 Submitting Questions to the Mailing Lists;
Chapter 2: Some Basics; 2.1 Introduction; 2.2 Printing Something; 2.3 Setting Variables; 2.4 Listing Variables; 2.5 Deleting Variables; 2.6 Creating a Vector; 2.7 Computing Basic Statistics; 2.8 Creating Sequences; 2.9 Comparing Vectors; 2.10 Selecting Vector Elements; 2.11 Performing Vector Arithmetic; 2.12 Getting Operator Precedence Right; 2.13 Defining a Function; 2.14 Typing Less and Accomplishing More; 2.15 Avoiding Some Common Mistakes;
Chapter 3: Navigating the Software; 3.1 Introduction; 3.2 Getting and Setting the Working Directory; 3.3 Saving Your Workspace; 3.4 Viewing Your Command History; 3.5 Saving the Result of the Previous Command; 3.6 Displaying the Search Path; 3.7 Accessing the Functions in a Package; 3.8 Accessing Built-in Datasets; 3.9 Viewing the List of Installed Packages; 3.10 Installing Packages from CRAN; 3.11 Setting a Default CRAN Mirror; 3.12 Suppressing the Startup Message; 3.13 Running a Script; 3.14 Running a Batch Script; 3.15 Getting and Setting Environment Variables; 3.16 Locating the R Home Directory; 3.17 Customizing R;
Chapter 4: Input and Output; 4.1 Introduction; 4.2 Entering Data from the Keyboard; 4.3 Printing Fewer Digits (or More Digits); 4.4 Redirecting Output to a File; 4.5 Listing Files; 4.6 Dealing with "Cannot Open File" in Windows; 4.7 Reading Fixed-Width Records; 4.8 Reading Tabular Data Files; 4.9 Reading from CSV Files; 4.10 Writing to CSV Files; 4.11 Reading Tabular or CSV Data from the Web; 4.12 Reading Data from HTML Tables; 4.13 Reading Files with a Complex Structure; 4.14 Reading from MySQL Databases; 4.15 Saving and Transporting Objects;
Chapter 5: Data Structures; 5.1 Introduction; 5.2 Appending Data to a Vector; 5.3 Inserting Data into a Vector; 5.4 Understanding the Recycling Rule; 5.5 Creating a Factor (Categorical Variable); 5.6 Combining Multiple Vectors into One Vector and a Factor; 5.7 Creating a List; 5.8 Selecting List Elements by Position; 5.9 Selecting List Elements by Name; 5.10 Building a Name/Value Association List; 5.11 Removing an Element from a List; 5.12 Flatten a List into a Vector; 5.13 Removing NULL Elements from a List; 5.14 Removing List Elements Using a Condition; 5.15 Initializing a Matrix; 5.16 Performing Matrix Operations; 5.17 Giving Descriptive Names to the Rows and Columns of a Matrix; 5.18 Selecting One Row or Column from a Matrix; 5.19 Initializing a Data Frame from Column Data; 5.20 Initializing a Data Frame from Row Data; 5.21 Appending Rows to a Data Frame; 5.22 Preallocating a Data Frame; 5.23 Selecting Data Frame Columns by Position; 5.24 Selecting Data Frame Columns by Name; 5.25 Selecting Rows and Columns More Easily; 5.26 Changing the Names of Data Frame Columns; 5.27 Editing a Data Frame; 5.28 Removing NAs from a Data Frame; 5.29 Excluding Columns by Name; 5.30 Combining Two Data Frames; 5.31 Merging Data Frames by Common Column; 5.32 Accessing Data Frame Contents More Easily; 5.33 Converting One Atomic Value into Another; 5.34 Converting One Structured Data Type into Another;
Chapter 6: Data Transformations; 6.1 Introduction; 6.2 Splitting a Vector into Groups; 6.3 Applying a Function to Each List Element; 6.4 Applying a Function to Every Row; 6.5 Applying a Function to Every Column; 6.6 Applying a Function to Groups of Data; 6.7 Applying a Function to Groups of Rows; 6.8 Applying a Function to Parallel Vectors or Lists;
Chapter 7: Strings and Dates; 7.1 Introduction; 7.2 Getting the Length of a String; 7.3 Concatenating Strings; 7.4 Extracting Substrings; 7.5 Splitting a String According to a Delimiter; 7.6 Replacing Substrings; 7.7 Seeing the Special Characters in a String; 7.8 Generating All Pairwise Combinations of Strings; 7.9 Getting the Current Date; 7.10 Converting a String into a Date; 7.11 Converting a Date into a String; 7.12 Converting Year, Month, and Day into a Date; 7.13 Getting the Julian Date; 7.14 Extracting the Parts of a Date; 7.15 Creating a Sequence of Dates;
Chapter 8: Probability; 8.1 Introduction; 8.2 Counting the Number of Combinations; 8.3 Generating Combinations; 8.4 Generating Random Numbers; 8.5 Generating Reproducible Random Numbers; 8.6 Generating a Random Sample; 8.7 Generating Random Sequences; 8.8 Randomly Permuting a Vector; 8.9 Calculating Probabilities for Discrete Distributions; 8.10 Calculating Probabilities for Continuous Distributions; 8.11 Converting Probabilities to Quantiles; 8.12 Plotting a Density Function;
Chapter 9: General Statistics; 9.1 Introduction; 9.2 Summarizing Your Data; 9.3 Calculating Relative Frequencies; 9.4 Tabulating Factors and Creating Contingency Tables; 9.5 Testing Categorical Variables for Independence; 9.6 Calculating Quantiles (and Quartiles) of a Dataset; 9.7 Inverting a Quantile; 9.8 Converting Data to Z-Scores; 9.9 Testing the Mean of a Sample (t Test); 9.10 Forming a Confidence Interval for a Mean; 9.11 Forming a Confidence Interval for a Median; 9.12 Testing a Sample Proportion; 9.13 Forming a Confidence Interval for a Proportion; 9.14 Testing for Normality; 9.15 Testing for Runs; 9.16 Comparing the Means of Two Samples; 9.17 Comparing the Locations of Two Samples Nonparametrically; 9.18 Testing a Correlation for Significance; 9.19 Testing Groups for Equal Proportions; 9.20 Performing Pairwise Comparisons Between Group Means; 9.21 Testing Two Samples for the Same Distribution;
Chapter 10: Graphics; 10.1 Introduction; 10.2 Creating a Scatter Plot; 10.3 Adding a Title and Labels; 10.4 Adding a Grid; 10.5 Creating a Scatter Plot of Multiple Groups; 10.6 Adding a Legend; 10.7 Plotting the Regression Line of a Scatter Plot; 10.8 Plotting All Variables Against All Other Variables; 10.9 Creating One Scatter Plot for Each Factor Level; 10.10 Creating a Bar Chart; 10.11 Adding Confidence Intervals to a Bar Chart; 10.12 Coloring a Bar Chart; 10.13 Plotting a Line from x and y Points; 10.14 Changing the Type, Width, or Color of a Line; 10.15 Plotting Multiple Datasets; 10.16 Adding Vertical or Horizontal Lines; 10.17 Creating a Box Plot; 10.18 Creating One Box Plot for Each Factor Level; 10.19 Creating a Histogram; 10.20 Adding a Density Estimate to a Histogram; 10.21 Creating a Discrete Histogram; 10.22 Creating a Normal Quantile-Quantile (Q-Q) Plot; 10.23 Creating Other Quantile-Quantile Plots; 10.24 Plotting a Variable in Multiple Colors; 10.25 Graphing a Function; 10.26 Pausing Between Plots; 10.27 Displaying Several Figures on One Page; 10.28 Opening Additional Graphics Windows; 10.29 Writing Your Plot to a File; 10.30 Changing Graphical Parameters;
Chapter 11: Linear Regression and ANOVA; 11.1 Introduction; 11.2 Performing Simple Linear Regression; 11.3 Performing Multiple Linear Regression; 11.4 Getting Regression Statistics; 11.5 Understanding the Regression Summary; 11.6 Performing Linear Regression Without an Intercept; 11.7 Performing Linear Regression with Interaction Terms; 11.8 Selecting the Best Regression Variables; 11.9 Regressing on a Subset of Your Data; 11.10 Using an Expression Inside a Regression Formula; 11.11 Regressing on a Polynomial; 11.12 Regressing on Transformed Data; 11.13 Finding the Best Power Transformation (Box-Cox Procedure); 11.14 Forming Confidence Intervals for Regression Coefficients; 11.15 Plotting Regression Residuals; 11.16 Diagnosing a Linear Regression; 11.17 Identifying Influential Observations; 11.18 Testing Residuals for Autocorrelation (Durbin-Watson Test); 11.19 Predicting New Values; 11.20 Forming Prediction Intervals; 11.21 Performing One-Way ANOVA; 11.22 Creating an Interaction Plot; 11.23 Finding Differences Between Means of Groups; 11.24 Performing Robust ANOVA (Kruskal-Wallis Test); 11.25 Comparing Models by Using ANOVA;
Chapter 12: Useful Tricks; 12.1 Introduction; 12.2 Peeking at Your Data; 12.3 Widen Your Output; 12.4 Printing the Result of an Assignment; 12.5 Summing Rows and Columns; 12.6 Printing Data in Columns; 12.7 Binning Your Data; 12.8 Finding the Position of a Particular Value; 12.9 Selecting Every nth Element of a Vector; 12.10 Finding Pairwise Minimums or Maximums; 12.11 Generating All Combinations of Several Factors; 12.12 Flatten a Data Frame; 12.13 Sorting a Data Frame; 12.14 Sorting by Two Columns; 12.15 Stripping Attributes from a Variable; 12.16 Revealing the Structure of an Object; 12.17 Timing Your Code; 12.18 Suppressing Warnings and Error Messages; 12.19 Taking Function Arguments from a List; 12.20 Defining Your Own Binary Operators;
Chapter 13: Beyond Basic Numerics and Statistics; 13.1 Introduction; 13.2 Minimizing or Maximizing a Single-Parameter Function; 13.3 Minimizing or Maximizing a Multiparameter Function; 13.4 Calculating Eigenvalues and Eigenvectors; 13.5 Performing Principal Component Analysis; 13.6 Performing Simple Orthogonal Regression; 13.7 Finding Clusters in Your Data; 13.8 Predicting a Binary-Valued Variable (Logistic Regression); 13.9 Bootstrapping a Statistic; 13.10 Factor Analysis;
Chapter 14: Time Series Analysis; 14.1 Introduction; 14.2 Representing Time Series Data; 14.3 Plotting Time Series Data; 14.4 Extracting the Oldest or Newest Observations; 14.5 Subsetting a Time Series; 14.6 Merging Several Time Series; 14.7 Filling or Padding a Time Series; 14.8 Lagging a Time Series; 14.9 Computing Successive Differences; 14.10 Performing Calculations on Time Series; 14.11 Computing a Moving Average; 14.12 Applying a Function by Calendar Period; 14.13 Applying a Rolling Function; 14.14 Plotting the Autocorrelation Function; 14.15 Testing a Time Series for Autocorrelation; 14.16 Plotting the Partial Autocorrelation Function; 14.17 Finding Lagged Correlations Between Two Time Series; 14.18 Detrending a Time Series; 14.19 Fitting an ARIMA Model; 14.20 Removing Insignificant ARIMA Coefficients; 14.21 Running Diagnostics on an ARIMA Model; 14.22 Making Forecasts from an ARIMA Model; 14.23 Testing for Mean Reversion; 14.24 Smoothing a Time Series;
Colophon;