Synopses & Reviews
Facebook, Twitter, LinkedIn, Google+, and other social web properties generate a wealth of valuable social data, but how can you tap into this data and discover who's connecting with whom, which insights are lurking just beneath the surface, and what people are talking about? This book shows you how to answer these questions and many more. Each chapter combines popular and useful social web data with analysis techniques and visualization to help you find the needles in the social haystack that you've been looking for, as well as many you probably didn't even know existed.
In this expanded and thoroughly revised second edition, you'll learn how to:
- Navigate the most popular social web APIs to access, collect, analyze, and visualize social web data
- Employ IPython Notebook and other easy-to-use Python packages, such as the Natural Language Toolkit, NetworkX, and Matplotlib, to sift efficiently through social web data as part of an experimentally driven approach to discovering insights
- Apply advanced text-mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection to human language data that you'll encounter all over the web
- Bootstrap interest graphs by discovering latent affinities between people, programming languages, and coding projects from GitHub data
The book's source code is kept in a GitHub repository maintained by the author and can be deployed as a turnkey virtual machine, with each chapter's source code presented in an interactive, easy-to-use IPython Notebook format. No complex third-party installations or advanced Python knowledge is required to get the most out of this book.
All the code and the most recent updates to it can be found at GitHub:
Russell explains approaches and techniques for data mining Facebook, Twitter, LinkedIn, Google+, GitHub, and other social web sites. He uses the Python programming language to show how to find out such information as who knows whom and which people are common to their social network, how frequently particular people are communicating with one another, which social network connections generate the most value for a particular niche, the most influential or popular people in a network, and what people are interested in based on the human language that they use in a digital world. Annotation ©2014 Ringgold, Inc., Portland, OR (protoview.com)
Facebook, Twitter, LinkedIn, and Google+ generate a tremendous amount of valuable social data, but how can you find out who's connecting with whom, what they're talking about, what friends they have in common, or where they're located? This book shows you how to answer these questions and more.
You'll learn how to combine social web data, analysis techniques, and visualization to help you find what you've been looking for in the social haystack, as well as useful information you didn't know existed. Each standalone chapter introduces techniques for mining data in different areas of the social web, including blogs and email.
In this expanded second edition, you'll learn how to:
- Create interest graphs for people by using GitHub's rich set of APIs to mine social networks
- Explore Wikipedia contributions to build a social network of people who are interested in (or have expertise with) certain topics
- Learn how to employ easy-to-use Python tools to slice and dice the data you collect
- Get a straightforward synopsis of the social web landscape
- Explore social connections in microformats with the XHTML Friends Network
- Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection
All you need to get started is a programming background and a willingness to learn basic Python tools.
How can you tap into the wealth of social web data to discover who's making connections with whom, what they're talking about, and where they're located? With this expanded and thoroughly revised edition, you'll learn how to acquire, analyze, and summarize data from all corners of the social web, including Facebook, Twitter, LinkedIn, Google+, GitHub, email, websites, and blogs.
- Employ the Natural Language Toolkit, NetworkX, and other scientific computing tools to mine popular social web sites
- Apply advanced text-mining techniques, such as clustering and TF-IDF, to extract meaning from human language data
- Bootstrap interest graphs from GitHub by discovering affinities among people, programming languages, and coding projects
- Take advantage of more than two dozen Twitter recipes, presented in O'Reilly's popular "problem/solution/discussion" cookbook format
The example code for this unique data science book is maintained in a public GitHub repository. It's designed to be easily accessible through a turnkey virtual machine that facilitates interactive learning with an easy-to-use collection of IPython Notebooks.
About the Author
Matthew Russell is Chief Technology Officer at Digital Reasoning, Principal at Zaffra, and the author of several books on technology, including Mining the Social Web (O'Reilly, 2013), now in its second edition. He is passionate about open source software development, data mining, and creating technology to amplify human intelligence. Matthew studied computer science and jumped out of airplanes at the United States Air Force Academy. When not solving hard problems, he enjoys practicing Bikram Hot Yoga, CrossFitting, and participating in triathlons.
Table of Contents
Preface; README.1st; Managing Your Expectations; Python-Centric Technology; Improvements Specific to the Second Edition; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments for the Second Edition; Acknowledgments from the First Edition;
A Guided Tour of the Social Web; Prelude;
Chapter 1: Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More; 1.1 Overview; 1.2 Why Is Twitter All the Rage?; 1.3 Exploring Twitter's API; 1.4 Analyzing the 140 Characters; 1.5 Closing Remarks; 1.6 Recommended Exercises; 1.7 Online Resources;
Chapter 2: Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More; 2.1 Overview; 2.2 Exploring Facebook's Social Graph API; 2.3 Analyzing Social Graph Connections; 2.4 Closing Remarks; 2.5 Recommended Exercises; 2.6 Online Resources;
Chapter 3: Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More; 3.1 Overview; 3.2 Exploring the LinkedIn API; 3.3 Crash Course on Clustering Data; 3.4 Closing Remarks; 3.5 Recommended Exercises; 3.6 Online Resources;
Chapter 4: Mining Google+: Computing Document Similarity, Extracting Collocations, and More; 4.1 Overview; 4.2 Exploring the Google+ API; 4.3 A Whiz-Bang Introduction to TF-IDF; 4.4 Querying Human Language Data with TF-IDF; 4.5 Closing Remarks; 4.6 Recommended Exercises; 4.7 Online Resources;
Chapter 5: Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More; 5.1 Overview; 5.2 Scraping, Parsing, and Crawling the Web; 5.3 Discovering Semantics by Decoding Syntax; 5.4 Entity-Centric Analysis: A Paradigm Shift; 5.5 Quality of Analytics for Processing Human Language Data; 5.6 Closing Remarks; 5.7 Recommended Exercises; 5.8 Online Resources;
Chapter 6: Mining Mailboxes: Analyzing Who's Talking to Whom About What, How Often, and More; 6.1 Overview; 6.2 Obtaining and Processing a Mail Corpus; 6.3 Analyzing the Enron Corpus; 6.4 Discovering and Visualizing Time-Series Trends; 6.5 Analyzing Your Own Mail Data; 6.6 Closing Remarks; 6.7 Recommended Exercises; 6.8 Online Resources;
Chapter 7: Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More; 7.1 Overview; 7.2 Exploring GitHub's API; 7.3 Modeling Data with Property Graphs; 7.4 Analyzing GitHub Interest Graphs; 7.5 Closing Remarks; 7.6 Recommended Exercises; 7.7 Online Resources;
Chapter 8: Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More; 8.1 Overview; 8.2 Microformats: Easy-to-Implement Metadata; 8.3 From Semantic Markup to Semantic Web: A Brief Interlude; 8.4 The Semantic Web: An Evolutionary Revolution; 8.5 Closing Remarks; 8.6 Recommended Exercises; 8.7 Online Resources;
Twitter Cookbook;
Chapter 9: Twitter Cookbook; 9.1 Accessing Twitter's API for Development Purposes; 9.2 Doing the OAuth Dance to Access Twitter's API for Production Purposes; 9.3 Discovering the Trending Topics; 9.4 Searching for Tweets; 9.5 Constructing Convenient Function Calls; 9.6 Saving and Restoring JSON Data with Text Files; 9.7 Saving and Accessing JSON Data with MongoDB; 9.8 Sampling the Twitter Firehose with the Streaming API; 9.9 Collecting Time-Series Data; 9.10 Extracting Tweet Entities; 9.11 Finding the Most Popular Tweets in a Collection of Tweets; 9.12 Finding the Most Popular Tweet Entities in a Collection of Tweets; 9.13 Tabulating Frequency Analysis; 9.14 Finding Users Who Have Retweeted a Status; 9.15 Extracting a Retweet's Attribution; 9.16 Making Robust Twitter Requests; 9.17 Resolving User Profile Information; 9.18 Extracting Tweet Entities from Arbitrary Text; 9.19 Getting All Friends or Followers for a User; 9.20 Analyzing a User's Friends and Followers; 9.21 Harvesting a User's Tweets; 9.22 Crawling a Friendship Graph; 9.23 Analyzing Tweet Content; 9.24 Summarizing Link Targets; 9.25 Analyzing a User's Favorite Tweets; 9.26 Closing Remarks; 9.27 Recommended Exercises; 9.28 Online Resources;
Appendixes; Information About This Book's Virtual Machine Experience; OAuth Primer; Overview; Python and IPython Notebook Tips and Tricks;
Colophon;