Synopses & Reviews
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.
- Use graphics to describe data with one, two, or dozens of variables
- Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
- Mine data with computationally intensive methods such as simulation and clustering
- Make your conclusions understandable through reports, dashboards, and other metrics programs
- Understand financial calculations, including the time-value of money
- Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
- Become familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla
"An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora
Why learn R? Because it's rapidly becoming the standard for developing statistical software. R in a Nutshell provides a quick and practical way to learn this increasingly popular open source language and environment. You'll not only learn how to program in R, but also how to find the right user-contributed R packages for statistical modeling, visualization, and bioinformatics.
The author introduces you to the R environment, including the R graphical user interface and console, and takes you through the fundamentals of the object-oriented R language. Then, through a variety of practical examples from medicine, business, and sports, you'll learn how you can use this remarkable tool to solve your own data analysis problems.
- Understand the basics of the language, including the nature of R objects
- Learn how to write R functions and build your own packages
- Work with data through visualization, statistical analysis, and other methods
- Explore the wealth of packages contributed by the R community
- Become familiar with the lattice graphics package for high-level data visualization
- Learn about bioinformatics packages provided by Bioconductor
"I am excited about this book. R in a Nutshell is a great introduction to R, as well as a comprehensive reference for using R in data analytics and visualization. Adler provides 'real world' examples, practical advice, and scripts, making it accessible to anyone working with data, not just professional statisticians."
The ability to interpret and act on the massive amounts of information locked in web and enterprise systems is critical to success in the modern business economy. R, a free software environment for statistical computing and graphics, is a comprehensive package that empowers developers and analysts to capture, process, and respond intelligently to statistical information.
R in Action is the first book to present both the R system and the use cases that make it such a compelling package for business developers. The book begins by introducing the R language, and then moves on to various examples illustrating R's features. Coverage includes data mining methodologies, approaches to messy data, R's extensive graphical environment, useful add-on modules, and how to interface R with other software platforms and data management systems.
These days it seems like everyone is collecting data. But all of that data is just raw information -- to make that information meaningful, it has to be organized, filtered, and analyzed. Anyone can apply data analysis tools and get results, but without the right approach those results may be useless.
In Real World Data Analysis, author Philipp Janert teaches you how to think about data: how to effectively approach data analysis problems, and how to extract all of the available information from your data. Janert covers univariate data, data in multiple dimensions, time series data, graphical techniques, data mining, machine learning, and many other topics. He also reveals how seat-of-the-pants knowledge can lead you to the best approach right from the start, and how to assess results to determine if they're meaningful.
Perform data analysis with R quickly and efficiently with the task-oriented recipes in this cookbook. Although the R language and environment include everything you need to perform statistical work right out of the box, its structure can often be difficult to master. R Cookbook will help both beginners and experienced data programmers unlock and use the power of R.
This practical book provides a collection of concise recipes that will help you be productive with R immediately. You'll get the job done faster and learn more about R in the process.
Key topics include:
- Getting started with R
- Data structures
- Basic numerical calculations
- Basic probability calculations
- Basic statistical calculations and tests
- Regression and ANOVA
- Advanced statistical techniques
- Handy tips, techniques, and hacks that everyone can use
About the Author
After previous careers in physics and software development, Philipp K. Janert currently provides consulting services for data analysis, algorithm development, and mathematical modeling. He has worked for small start-ups and in large corporate environments, both in the U.S. and overseas. He prefers simple solutions that work to complicated ones that don't, and thinks that purpose is more important than process. Philipp is the author of "Gnuplot in Action - Understanding Data with Graphs" (Manning Publications), and has written for the O'Reilly Network, IBM developerWorks, and IEEE Software. He is named inventor on a handful of patents, and is an occasional contributor to CPAN. He holds a Ph.D. in theoretical physics from the University of Washington. Visit his company website at www.principal-value.com.
Table of Contents
Preface; Before We Begin; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments
Chapter 1: Introduction; 1.1 Data Analysis; 1.2 What's in This Book; 1.3 What's with the Workshops?; 1.4 What's with the Math?; 1.5 What You'll Need; 1.6 What's Missing
Graphics: Looking at Data
Chapter 2: A Single Variable: Shape and Distribution; 2.1 Dot and Jitter Plots; 2.2 Histograms and Kernel Density Estimates; 2.3 The Cumulative Distribution Function; 2.4 Rank-Order Plots and Lift Charts; 2.5 Only When Appropriate: Summary Statistics and Box Plots; 2.6 Workshop: NumPy; 2.7 Further Reading
Chapter 3: Two Variables: Establishing Relationships; 3.1 Scatter Plots; 3.2 Conquering Noise: Smoothing; 3.3 Logarithmic Plots; 3.4 Banking; 3.5 Linear Regression and All That; 3.6 Showing What's Important; 3.7 Graphical Analysis and Presentation Graphics; 3.8 Workshop: matplotlib; 3.9 Further Reading
Chapter 4: Time As a Variable: Time-Series Analysis; 4.1 Examples; 4.2 The Task; 4.3 Smoothing; 4.4 Don't Overlook the Obvious!; 4.5 The Correlation Function; 4.6 Optional: Filters and Convolutions; 4.7 Workshop: scipy.signal; 4.8 Further Reading
Chapter 5: More Than Two Variables: Graphical Multivariate Analysis; 5.1 False-Color Plots; 5.2 A Lot at a Glance: Multiplots; 5.3 Composition Problems; 5.4 Novel Plot Types; 5.5 Interactive Explorations; 5.6 Workshop: Tools for Multivariate Graphics; 5.7 Further Reading
Chapter 6: Intermezzo: A Data Analysis Session; 6.1 A Data Analysis Session; 6.2 Workshop: gnuplot; 6.3 Further Reading
Analytics: Modeling Data
Chapter 7: Guesstimation and the Back of the Envelope; 7.1 Principles of Guesstimation; 7.2 How Good Are Those Numbers?; 7.3 Optional: A Closer Look at Perturbation Theory and Error Propagation; 7.4 Workshop: The Gnu Scientific Library (GSL); 7.5 Further Reading
Chapter 8: Models from Scaling Arguments; 8.1 Models; 8.2 Arguments from Scale; 8.3 Mean-Field Approximations; 8.4 Common Time-Evolution Scenarios; 8.5 Case Study: How Many Servers Are Best?; 8.6 Why Modeling?; 8.7 Workshop: Sage; 8.8 Further Reading
Chapter 9: Arguments from Probability Models; 9.1 The Binomial Distribution and Bernoulli Trials; 9.2 The Gaussian Distribution and the Central Limit Theorem; 9.3 Power-Law Distributions and Non-Normal Statistics; 9.4 Other Distributions; 9.5 Optional: Case Study -- Unique Visitors over Time; 9.6 Workshop: Power-Law Distributions; 9.7 Further Reading
Chapter 10: What You Really Need to Know About Classical Statistics; 10.1 Genesis; 10.2 Statistics Defined; 10.3 Statistics Explained; 10.4 Controlled Experiments Versus Observational Studies; 10.5 Optional: Bayesian Statistics -- The Other Point of View; 10.6 Workshop: R; 10.7 Further Reading
Chapter 11: Intermezzo: Mythbusting -- Bigfoot, Least Squares, and All That; 11.1 How to Average Averages; 11.2 The Standard Deviation; 11.3 Least Squares; 11.4 Further Reading
Computation: Mining Data
Chapter 12: Simulations; 12.1 A Warm-Up Question; 12.2 Monte Carlo Simulations; 12.3 Resampling Methods; 12.4 Workshop: Discrete Event Simulations with SimPy; 12.5 Further Reading
Chapter 13: Finding Clusters; 13.1 What Constitutes a Cluster?; 13.2 Distance and Similarity Measures; 13.3 Clustering Methods; 13.4 Pre- and Postprocessing; 13.5 Other Thoughts; 13.6 A Special Case: Market Basket Analysis; 13.7 A Word of Warning; 13.8 Workshop: Pycluster and the C Clustering Library; 13.9 Further Reading
Chapter 14: Seeing the Forest for the Trees: Finding Important Attributes; 14.1 Principal Component Analysis; 14.2 Visual Techniques; 14.3 Kohonen Maps; 14.4 Workshop: PCA with R; 14.5 Further Reading
Chapter 15: Intermezzo: When More Is Different; 15.1 A Horror Story; 15.2 Some Suggestions; 15.3 What About Map/Reduce?; 15.4 Workshop: Generating Permutations; 15.5 Further Reading
Applications: Using Data
Chapter 16: Reporting, Business Intelligence, and Dashboards; 16.1 Business Intelligence; 16.2 Corporate Metrics and Dashboards; 16.3 Data Quality Issues; 16.4 Workshop: Berkeley DB and SQLite; 16.5 Further Reading
Chapter 17: Financial Calculations and Modeling; 17.1 The Time Value of Money; 17.2 Uncertainty in Planning and Opportunity Costs; 17.3 Cost Concepts and Depreciation; 17.4 Should You Care?; 17.5 Is This All That Matters?; 17.6 Workshop: The Newsvendor Problem; 17.7 Further Reading
Chapter 18: Predictive Analytics; 18.1 Topics in Predictive Analytics; 18.2 Some Classification Terminology; 18.3 Algorithms for Classification; 18.4 The Process; 18.5 The Secret Sauce; 18.6 The Nature of Statistical Learning; 18.7 Workshop: Two Do-It-Yourself Classifiers; 18.8 Further Reading
Chapter 19: Epilogue: Facts Are Not Reality
Programming Environments for Scientific Computation and Data Analysis; Software Tools; A Catalog of Scientific Software; Writing Your Own; Further Reading
Results from Calculus; Common Functions; Calculus; Useful Tricks; Notation and Basic Math; Where to Go from Here; Further Reading
Working with Data; Sources for Data; Cleaning and Conditioning; Sampling; Data File Formats; The Care and Feeding of Your Data Zoo; Skills; Terminology; Further Reading
About the Author; Colophon