Synopses & Reviews
If you're an experienced programmer interested in crunching data, this book will get you started with machine learning, a toolkit of algorithms that enables computers to train themselves to automate useful tasks. Authors Drew Conway and John Myles White help you understand machine learning and statistics tools through a series of hands-on case studies, instead of a traditional math-heavy presentation.
Each chapter focuses on a specific problem in machine learning, such as classification, prediction, optimization, and recommendation. Using the R programming language, you'll learn how to analyze sample datasets and write simple machine learning algorithms. Machine Learning for Hackers is ideal for programmers from any background, including business, government, and academic research.
- Develop a naïve Bayesian classifier to determine if an email is spam, based only on its text
- Use linear regression to predict the number of page views for the top 1,000 websites
- Learn optimization techniques by attempting to break a simple letter cipher
- Compare and contrast U.S. Senators statistically, based on their voting records
- Build a “whom to follow” recommendation system from Twitter data
Synopsis
Now that storage and collection technologies are cheaper and more precise, methods for extracting relevant information from large datasets are within reach of any experienced programmer willing to crunch data. With this book, you'll learn machine learning and statistics tools in a practical fashion, using black-box solutions and case studies instead of a traditional math-heavy presentation.
By exploring each problem in this book in depth, including both viable and hopeless approaches, you'll learn to recognize when your situation closely matches traditional problems. Then you'll discover how to apply classical statistics tools to your problem. Machine Learning for Hackers is ideal for programmers from the private, public, and academic sectors.
About the Author
Drew Conway is a PhD candidate in Politics at NYU. He studies international relations, conflict, and terrorism using the tools of mathematics, statistics, and computer science in an attempt to gain a deeper understanding of these phenomena. His academic curiosity is informed by his years as an analyst in the U.S. intelligence and defense communities.
John Myles White is a PhD candidate in Psychology at Princeton. He studies pattern recognition, decision-making, and economic behavior using behavioral methods and fMRI. He is particularly interested in anomalies of value assessment.
Table of Contents
- Preface (Machine Learning for Hackers; How This Book Is Organized; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgements)
- Chapter 1: Using R (1.1 R for Machine Learning)
- Chapter 2: Data Exploration (2.1 Exploration versus Confirmation; 2.2 What Is Data?; 2.3 Inferring the Types of Columns in Your Data; 2.4 Inferring Meaning; 2.5 Numeric Summaries; 2.6 Means, Medians, and Modes; 2.7 Quantiles; 2.8 Standard Deviations and Variances; 2.9 Exploratory Data Visualization; 2.10 Visualizing the Relationships Between Columns)
- Chapter 3: Classification: Spam Filtering (3.1 This or That: Binary Classification; 3.2 Moving Gently into Conditional Probability; 3.3 Writing Our First Bayesian Spam Classifier)
- Chapter 4: Ranking: Priority Inbox (4.1 How Do You Sort Something When You Don't Know the Order?; 4.2 Ordering Email Messages by Priority; 4.3 Writing a Priority Inbox)
- Chapter 5: Regression: Predicting Page Views (5.1 Introducing Regression; 5.2 Predicting Web Traffic; 5.3 Defining Correlation)
- Chapter 6: Regularization: Text Regression (6.1 Nonlinear Relationships Between Columns: Beyond Straight Lines; 6.2 Methods for Preventing Overfitting; 6.3 Text Regression)
- Chapter 7: Optimization: Breaking Codes (7.1 Introduction to Optimization; 7.2 Ridge Regression; 7.3 Code Breaking as Optimization)
- Chapter 8: PCA: Building a Market Index (8.1 Unsupervised Learning)
- Chapter 9: MDS: Visually Exploring US Senator Similarity (9.1 Clustering Based on Similarity; 9.2 How Do US Senators Cluster?)
- Chapter 10: kNN: Recommendation Systems (10.1 The k-Nearest Neighbors Algorithm; 10.2 R Package Installation Data)
- Chapter 11: Analyzing Social Graphs (11.1 Social Network Analysis; 11.2 Hacking Twitter Social Graph Data; 11.3 Analyzing Twitter Networks)
- Chapter 12: Model Comparison (12.1 SVMs: The Support Vector Machine; 12.2 Comparing Algorithms)
- Works Cited
- Colophon