Synopses & Reviews
Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language.
Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing.
- Use the IPython interactive shell as your primary development environment
- Learn basic and advanced NumPy (Numerical Python) features
- Get started with data analysis tools in the pandas library
- Use high-performance tools to load, clean, transform, merge, and reshape data
- Create scatter plots and static or interactive visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Measure data by points in time, whether it's specific instances, fixed periods, or intervals
- Learn how to solve problems in web analytics, social sciences, finance, and economics, through detailed examples
Finding great data analysts is difficult. Despite the explosive growth of data in industries ranging from manufacturing and retail to high technology, finance, and healthcare, learning and accessing data analysis tools has remained a challenge. This pragmatic guide will help train you in one of the most important tools in the field—Python.
Filled with practical case studies, Python for Data Analysis demonstrates the nuts and bolts of manipulating, processing, cleaning, and crunching data with Python. It also serves as a modern introduction to scientific computing in Python for data-intensive applications. Learn about the growing field of data analysis from an expert in the community.
- Learn everything you need to start doing real data analysis work with Python
- Get the most complete instruction on the basics of the “modern scientific Python platform”
- Learn from an insider who builds tools for the scientific stack
- Get an excellent introduction for novices and a wealth of advanced methods for experienced analysts
About the Author
Wes McKinney is the main author of pandas, the popular open source Python library for data analysis. Wes is an active speaker and participant in the Python and open source communities. He worked as a quantitative analyst at AQR Capital Management and Python consultant before founding DataPad, a data analytics company, in 2013. He graduated from MIT with an S.B. in Mathematics.
Table of Contents
Preface; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us
Chapter 1: Preliminaries; 1.1 What Is This Book About?; 1.2 Why Python for Data Analysis?; 1.3 Essential Python Libraries; 1.4 Installation and Setup; 1.5 Community and Conferences; 1.6 Navigating This Book; 1.7 Acknowledgements
Chapter 2: Introductory Examples; 2.1 1.usa.gov data from bit.ly; 2.2 MovieLens 1M Data Set; 2.3 US Baby Names 1880-2010; 2.4 Conclusions and The Path Ahead
Chapter 3: IPython: An Interactive Computing and Development Environment; 3.1 IPython Basics; 3.2 Using the Command History; 3.3 Interacting with the Operating System; 3.4 Software Development Tools; 3.5 IPython HTML Notebook; 3.6 Tips for Productive Code Development Using IPython; 3.7 Advanced IPython Features; 3.8 Credits
Chapter 4: NumPy Basics: Arrays and Vectorized Computation; 4.1 The NumPy ndarray: A Multidimensional Array Object; 4.2 Universal Functions: Fast Element-wise Array Functions; 4.3 Data Processing Using Arrays; 4.4 File Input and Output with Arrays; 4.5 Linear Algebra; 4.6 Random Number Generation; 4.7 Example: Random Walks
Chapter 5: Getting Started with pandas; 5.1 Introduction to pandas Data Structures; 5.2 Essential Functionality; 5.3 Summarizing and Computing Descriptive Statistics; 5.4 Handling Missing Data; 5.5 Hierarchical Indexing; 5.6 Other pandas Topics
Chapter 6: Data Loading, Storage, and File Formats; 6.1 Reading and Writing Data in Text Format; 6.2 Binary Data Formats; 6.3 Interacting with HTML and Web APIs; 6.4 Interacting with Databases
Chapter 7: Data Wrangling: Clean, Transform, Merge, Reshape; 7.1 Combining and Merging Data Sets; 7.2 Reshaping and Pivoting; 7.3 Data Transformation; 7.4 String Manipulation; 7.5 Example: USDA Food Database
Chapter 8: Plotting and Visualization; 8.1 A Brief matplotlib API Primer; 8.2 Plotting Functions in pandas; 8.3 Plotting Maps: Visualizing Haiti Earthquake Crisis Data; 8.4 Python Visualization Tool Ecosystem
Chapter 9: Data Aggregation and Group Operations; 9.1 GroupBy Mechanics; 9.2 Data Aggregation; 9.3 Group-wise Operations and Transformations; 9.4 Pivot Tables and Cross-Tabulation; 9.5 Example: 2012 Federal Election Commission Database
Chapter 10: Time Series; 10.1 Date and Time Data Types and Tools; 10.2 Time Series Basics; 10.3 Date Ranges, Frequencies, and Shifting; 10.4 Time Zone Handling; 10.5 Periods and Period Arithmetic; 10.6 Resampling and Frequency Conversion; 10.7 Time Series Plotting; 10.8 Moving Window Functions; 10.9 Performance and Memory Usage Notes
Chapter 11: Financial and Economic Data Applications; 11.1 Data Munging Topics; 11.2 Group Transforms and Analysis; 11.3 More Example Applications
Chapter 12: Advanced NumPy; 12.1 ndarray Object Internals; 12.2 Advanced Array Manipulation; 12.3 Broadcasting; 12.4 Advanced ufunc Usage; 12.5 Structured and Record Arrays; 12.6 More About Sorting; 12.7 NumPy Matrix Class; 12.8 Advanced Array Input and Output; 12.9 Performance Tips
Python Language Essentials; The Python Interpreter; The Basics; Data Structures and Sequences; Functions; Files and the operating system
Colophon