Synopses & Reviews
Written by renowned data science experts Foster Provost and Tom Fawcett, Data Science for Business introduces the fundamental principles of data science, and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. This guide also helps you understand the many data-mining techniques in use today.
Based on an MBA course Provost has taught at New York University over the past ten years, Data Science for Business provides examples of real-world business problems to illustrate these principles. You'll not only learn how to improve communication between business stakeholders and data scientists, but also how to participate intelligently in your company's data science projects. You'll also discover how to think data-analytically, and fully appreciate how data science methods can support business decision-making.
- Understand how data science fits in your organization—and how you can use it for competitive advantage
- Treat data as a business asset that requires careful investment if you're to gain real value
- Approach business problems data-analytically, using the data-mining process to gather good data in the most appropriate way
- Learn general concepts for actually extracting knowledge from data
- Apply data science principles when interviewing data science job candidates
Data Science for Business, by highly cited authors Foster Provost and Tom Fawcett, is intended for (i) those who need to understand data science/data mining broadly and (ii) those who want to develop their skill at data-analytic thinking. It is not a book about algorithms. Instead it presents a set of fundamental principles for getting business value by extracting useful knowledge from data. These fundamental principles are the foundation for many data mining techniques, but they also are the basis for frameworks for approaching business problems data-analytically, evaluating data science solutions, and evaluating general plans for data analytics.
After reading the book, the reader should be able to:
- Envision data science opportunities
- Discuss data science intelligently with data scientists and with other stakeholders
- Better understand proposals for data science projects and investments
- Participate integrally in data science projects.
This broad, deep, but not-too-technical guide introduces you to the fundamental principles of data science and walks you through the "data-analytic thinking" necessary for extracting useful knowledge and business value from the data you collect. By learning data science principles, you will understand the many data-mining techniques in use today. More importantly, these principles underpin the processes and strategies necessary to solve business problems through data mining techniques.
About the Author
Foster Provost is a Professor and NEC Faculty Fellow at the NYU Stern School of Business, where he has taught data science to MBAs for 15 years. Previously, he worked as a data scientist for what's now Verizon for five years, winning a President's Award for his work there. Professor Provost's research and teaching focus on data science, machine learning, business analytics, (social) network data, and crowd-sourcing for data analytics. He was Editor-in-Chief of the journal Machine Learning from 2004 to 2010 and was Program Chair of the premier data science conference in 2001. Professor Provost has worked with companies large and small on improving their data science capabilities. He has collaborated with AT&T, IBM, and others, and he has founded several data-science-based companies focusing on modeling consumer behavior data, especially for marketing and advertising applications. His prior work applied and extended data science methods to business applications including fraud detection, counterterrorism, network diagnosis, and more. Professor Provost's work has won (among others) IBM Faculty Awards, the aforementioned President's Award, Best Paper awards at KDD, including the 2012 Best Industry Paper, and the INFORMS Design Science Award.
Tom Fawcett is an active member of the machine learning and data mining communities. He has a Ph.D. in machine learning from UMass-Amherst and has worked in industrial research (GTE Laboratories, NYNEX/Verizon Labs, HP Labs, etc.). In his career he has published numerous conference and journal papers in machine learning. He has just completed a five-year term as action editor of the Machine Learning journal, before which he was an editorial board member. In 2003 he co-chaired the program of the premier machine learning conference (ICML), and he has organized many workshops and journal special issues. He received a Best Paper Award from KDD, a SCOPUS Award (most cited paper) from Pattern Recognition Letters, and a President's Award from Verizon.
Table of Contents
Praise; Preface; Our Conceptual Approach to Data Science; To the Instructor; Other Skills and Concepts; Sections and Notation; Using Examples; Safari® Books Online; How to Contact Us; Acknowledgments
Chapter 1: Introduction: Data-Analytic Thinking; 1.1 The Ubiquity of Data Opportunities; 1.2 Example: Hurricane Frances; 1.3 Example: Predicting Customer Churn; 1.4 Data Science, Engineering, and Data-Driven Decision Making; 1.5 Data Processing and "Big Data"; 1.6 From Big Data 1.0 to Big Data 2.0; 1.7 Data and Data Science Capability as a Strategic Asset; 1.8 Data-Analytic Thinking; 1.9 This Book; 1.10 Data Mining and Data Science, Revisited; 1.11 Chemistry Is Not About Test Tubes: Data Science Versus the Work of the Data Scientist; 1.12 Summary
Chapter 2: Business Problems and Data Science Solutions; 2.1 From Business Problems to Data Mining Tasks; 2.2 Supervised Versus Unsupervised Methods; 2.3 Data Mining and Its Results; 2.4 The Data Mining Process; 2.5 Implications for Managing the Data Science Team; 2.6 Other Analytics Techniques and Technologies; 2.7 Summary
Chapter 3: Introduction to Predictive Modeling: From Correlation to Supervised Segmentation; 3.1 Models, Induction, and Prediction; 3.2 Supervised Segmentation; 3.3 Visualizing Segmentations; 3.4 Trees as Sets of Rules; 3.5 Probability Estimation; 3.6 Example: Addressing the Churn Problem with Tree Induction; 3.7 Summary
Chapter 4: Fitting a Model to Data; 4.1 Classification via Mathematical Functions; 4.2 Regression via Mathematical Functions; 4.3 Class Probability Estimation and Logistic "Regression"; 4.4 Example: Logistic Regression versus Tree Induction; 4.5 Nonlinear Functions, Support Vector Machines, and Neural Networks; 4.6 Summary
Chapter 5: Overfitting and Its Avoidance; 5.1 Generalization; 5.2 Overfitting; 5.3 Overfitting Examined; 5.4 Example: Overfitting Linear Functions; 5.5 * Example: Why Is Overfitting Bad?; 5.6 From Holdout Evaluation to Cross-Validation; 5.7 The Churn Dataset Revisited; 5.8 Learning Curves; 5.9 Overfitting Avoidance and Complexity Control; 5.10 Summary
Chapter 6: Similarity, Neighbors, and Clusters; 6.1 Similarity and Distance; 6.2 Nearest-Neighbor Reasoning; 6.3 Some Important Technical Details Relating to Similarities and Neighbors; 6.4 Clustering; 6.5 Stepping Back: Solving a Business Problem Versus Data Exploration; 6.6 Summary
Chapter 7: Decision Analytic Thinking I: What Is a Good Model?; 7.1 Evaluating Classifiers; 7.2 Generalizing Beyond Classification; 7.3 A Key Analytical Framework: Expected Value; 7.4 Evaluation, Baseline Performance, and Implications for Investments in Data; 7.5 Summary
Chapter 8: Visualizing Model Performance; 8.1 Ranking Instead of Classifying; 8.2 Profit Curves; 8.3 ROC Graphs and Curves; 8.4 The Area Under the ROC Curve (AUC); 8.5 Cumulative Response and Lift Curves; 8.6 Example: Performance Analytics for Churn Modeling; 8.7 Summary
Chapter 9: Evidence and Probabilities; 9.1 Example: Targeting Online Consumers With Advertisements; 9.2 Combining Evidence Probabilistically; 9.3 Applying Bayes' Rule to Data Science; 9.4 A Model of Evidence "Lift"; 9.5 Example: Evidence Lifts from Facebook "Likes"; 9.6 Summary
Chapter 10: Representing and Mining Text; 10.1 Why Text Is Important; 10.2 Why Text Is Difficult; 10.3 Representation; 10.4 Example: Jazz Musicians; 10.5 * The Relationship of IDF to Entropy; 10.6 Beyond Bag of Words; 10.7 Example: Mining News Stories to Predict Stock Price Movement; 10.8 Summary
Chapter 11: Decision Analytic Thinking II: Toward Analytical Engineering; 11.1 Targeting the Best Prospects for a Charity Mailing; 11.2 Our Churn Example Revisited with Even More Sophistication
Chapter 12: Other Data Science Tasks and Techniques; 12.1 Co-occurrences and Associations: Finding Items That Go Together; 12.2 Profiling: Finding Typical Behavior; 12.3 Link Prediction and Social Recommendation; 12.4 Data Reduction, Latent Information, and Movie Recommendation; 12.5 Bias, Variance, and Ensemble Methods; 12.6 Data-Driven Causal Explanation and a Viral Marketing Example; 12.7 Summary
Chapter 13: Data Science and Business Strategy; 13.1 Thinking Data-Analytically, Redux; 13.2 Achieving Competitive Advantage with Data Science; 13.3 Sustaining Competitive Advantage with Data Science; 13.4 Attracting and Nurturing Data Scientists and Their Teams; 13.5 Examine Data Science Case Studies; 13.6 Be Ready to Accept Creative Ideas from Any Source; 13.7 Be Ready to Evaluate Proposals for Data Science Projects; 13.8 A Firm's Data Science Maturity
Chapter 14: Conclusion; 14.1 The Fundamental Concepts of Data Science; 14.2 What Data Can't Do: Humans in the Loop, Revisited; 14.3 Privacy, Ethics, and Mining Data About Individuals; 14.4 Is There More to Data Science?; 14.5 Final Example: From Crowd-Sourcing to Cloud-Sourcing; 14.6 Final Words
Proposal Review Guide; Business and Data Understanding; Data Preparation; Modeling; Evaluation and Deployment
Another Sample Proposal; Scenario and Proposal
Glossary; Bibliography; Index; Colophon