Synopses & Reviews
Mining big data requires a deep investment in people and time. How can you be sure you're building the right models? With this hands-on book, you'll learn a flexible toolset and methodology for building effective analytics applications with Hadoop.
Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You'll learn an iterative approach that enables you to quickly change the kind of analysis you're doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps.
- Create analytics applications by using the agile big data development methodology
- Build value from your data in a series of agile sprints, using the data-value stack
- Gain insight by using several data structures to extract multiple features from a single dataset
- Visualize data with charts, and expose different aspects through interactive reports
- Use historical data to predict the future, and translate predictions into action
- Get feedback from users after each sprint to keep your project on track
Jurney shares insights, especially cautions, that he gained building analytics applications at two Hadoop shops. He provides a how-to guide for building analytics applications with big data using Hadoop, helps teams collaborate on big data projects in an agile manner, and gives structure to the practice of applying agile big data analytics in a way that advances the field. He writes for programmers with some exposure to developing software and working with data, and who are running some version of Unix. Annotation ©2014 Ringgold, Inc., Portland, OR (protoview.com)
Mining data requires a deep investment in people and time. How can you be sure you're building the right models? What tools help you connect with the customer's needs? With this hands-on book, you'll learn a flexible toolset and methodology for building effective analytics applications.
- Build an application to mine your own email inbox
- Use several data structures to extract multiple features from a single dataset, and learn how different perspectives can yield insight
- Rapidly boot your applications as simple front-ends to key/value stores
- Add features driven by descriptive and inferential statistics, machine learning, and data visualization
- Gather usage data and talk to real users to help guide your data-driven exploration
About the Author
Russell Jurney cut his data teeth in casino gaming, building web apps to analyze the performance of slot machines in the US and Mexico. After dabbling in entrepreneurship, interactive media, and journalism, he moved to Silicon Valley to build analytics applications at scale at Ning and LinkedIn. He lives on the ocean in Pacifica, California, with his wife Kate and two fuzzy dogs.
Table of Contents
Preface; Who This Book Is For; How This Book Is Organized; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us
Setup
Chapter 1: Theory; 1.1 Agile Big Data; 1.2 Big Words Defined; 1.3 Agile Big Data Teams; 1.4 Agile Big Data Process; 1.5 Code Review and Pair Programming; 1.6 Agile Environments: Engineering Productivity; 1.7 Realizing Ideas with Large-Format Printing
Chapter 2: Data; 2.1 Email; 2.2 Working with Raw Data; 2.3 SQL; 2.4 NoSQL; 2.5 Data Perspectives
Chapter 3: Agile Tools; 3.1 Scalability = Simplicity; 3.2 Agile Big Data Processing; 3.3 Setting Up a Virtual Environment for Python; 3.4 Serializing Events with Avro; 3.5 Collecting Data; 3.6 Data Processing with Pig; 3.7 Publishing Data with MongoDB; 3.8 Searching Data with ElasticSearch; 3.9 Reflecting on our Workflow; 3.10 Lightweight Web Applications; 3.11 Presenting Our Data; 3.12 Conclusion
Chapter 4: To the Cloud!; 4.1 Introduction; 4.2 GitHub; 4.3 dotCloud; 4.4 Amazon Web Services; 4.5 Instrumentation
Climbing the Pyramid
Chapter 5: Collecting and Displaying Records; 5.1 Putting It All Together; 5.2 Collect and Serialize Our Inbox; 5.3 Process and Publish Our Emails; 5.4 Presenting Emails in a Browser; 5.5 Agile Checkpoint; 5.6 Listing Emails; 5.7 Searching Our Email; 5.8 Conclusion
Chapter 6: Visualizing Data with Charts; 6.1 Good Charts; 6.2 Extracting Entities: Email Addresses; 6.3 Visualizing Time; 6.4 Conclusion
Chapter 7: Exploring Data with Reports; 7.1 Building Reports with Multiple Charts; 7.2 Linking Records; 7.3 Extracting Keywords from Emails with TF-IDF; 7.4 Conclusion
Chapter 8: Making Predictions; 8.1 Predicting Response Rates to Emails; 8.2 Personalization; 8.3 Conclusion
Chapter 9: Driving Actions; 9.1 Properties of Successful Emails; 9.2 Better Predictions with Naive Bayes; 9.3 P(Reply | From and To); 9.4 P(Reply | Token); 9.5 Making Predictions in Real Time; 9.6 Logging Events; 9.7 Conclusion
Colophon