Synopses & Reviews
Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of web page rankings, Google's PageRank and Beyond supplies the answers to these questions and more.
The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While the later chapters are considerably more mathematical, each one contains something for both audiences. For example, the authors include entertaining asides on topics such as how search engines make money and how the Great Firewall of China influences research.
The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB programs and links to sample web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text.
Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided.
- Many illustrative examples and entertaining asides
- MATLAB code
- Accessible and informal style
- Complete and self-contained section for mathematics review
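To give a flavor of the kind of experiment the book encourages, here is a minimal PageRank sketch in Python. It is not the book's MATLAB code; it assumes the common power-method formulation with a damping factor α = 0.85 (the `alpha` parameter), uniform teleportation, and rank held by dangling pages (pages with no out-links) spread evenly over all pages. The function name `pagerank` and the list-of-out-links input format are illustrative choices, not taken from the book.

```python
def pagerank(out_links, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power-method PageRank sketch (illustrative, not the book's code).

    out_links[i] is the list of pages that page i links to (pages are 0..n-1).
    Assumptions: uniform teleportation vector, and dangling pages
    (empty out-link lists) spread their rank uniformly over all pages.
    """
    n = len(out_links)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Total rank currently sitting on dangling pages.
        dangling = sum(pi[i] for i in range(n) if not out_links[i])
        # Teleportation plus the dangling share: the same amount for every page.
        base = (alpha * dangling + 1.0 - alpha) / n
        new = [base] * n
        # Each non-dangling page splits alpha * its rank evenly among its out-links.
        for i in range(n):
            links = out_links[i]
            if links:
                share = alpha * pi[i] / len(links)
                for j in links:
                    new[j] += share
        # Stop when the 1-norm change between iterates is below tol.
        delta = sum(abs(new[k] - pi[k]) for k in range(n))
        pi = new
        if delta < tol:
            break
    return pi


if __name__ == "__main__":
    # Hypothetical 4-page web: page 3 has no out-links (a dangling node).
    web = [[1, 2], [2], [0], []]
    print([round(r, 4) for r in pagerank(web)])
```

Because each iteration only walks the existing link lists, the sketch never forms the dense n × n Google matrix; the dangling-node and teleportation terms are folded into one value added to every page, and the ranks keep summing to 1.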
Review
"This is a worthwhile book. It offers a comprehensive and erudite presentation of PageRank and related search-engine algorithms, and it is written in an approachable way, given the mathematical foundations involved."--Jonathan Bowen, Times Higher Education Supplement
Review
"Langville and Meyer present the mathematics in all its detail. . . . But they vary the math with discussions of the many issues involved in building search engines, the 'wars' between search engine developers and those trying to artificially inflate the position of their pages, and the future of search-engine development. . . . Google's PageRank and Beyond makes good reading for anyone, student or professional, who wants to understand the details of search engines."--Ed Gerstner, Nature Physics
Review
"This book should be at the top of anyone's list as a must-read for those interested in how search engines work and, more specifically, how Google is [able] to meet the needs of so many people in so many ways."--Jonathan Bowen, Times Higher Education Supplement
Review
"If I were taking, or teaching, a course in linear algebra today, this book would be a godsend."--Ed Gerstner, Nature Physics
Review
"Amy N. Langville and Carl D. Meyer examine the logic, mathematics, and sophistication behind Google's PageRank and other Internet search engine ranking programs. . . . It is an excellent work."--Michael W. Berry, SIAM Review
Review
"[F]or anyone who wants to delve deeply into just how Google's PageRank works, I recommend Google's PageRank and Beyond."--Stephen H. Wildstrom, BusinessWeek
Review
"This book is written for people who are curious about new science and technology as well as for those with more advanced background in matrix theory.... Much of the book can be easily followed by general readers, while understanding the remaining part requires only a good first course in linear algebra. It can be a reference book for people who want to know more about the ideas behind the currently popular search engines, and it provides an introductory text for beginning researchers in the area of information retrieval."--James Hendler, Physics Today
Review
"The book is very attractively and clearly written. The authors succeed to manage in an optimal way the presentation of both basic and more sophisticated concepts involved in the analysis of Google's PageRank, such that the book serves both audiences: the general and the technical scientific public."--Jiu Ding, Mathematical Reviews
Review
"The book under review is excellently written, with a fresh and engaging style. The reader will particularly enjoy the 'Asides' interspersed throughout the text. They contain all kind of entertaining stories, practical tips, and amusing quotes. . . . The book also contains some useful resources for computation."--Constantin Popa, Zentralblatt MATH
Review
"Google's PageRank and Beyond describes the link analysis tool called PageRank, puts it in the context of web search engines and information retrieval, and describes competing methods for ranking webpages. It is an utterly engaging book."--Pablo Fernández, Mathematical Intelligencer
Review
"If I were taking, or teaching, a course in linear algebra today, this book would be a godsend."--Ian D. Gordon, Library Journal
Review
Honorable Mention for the 2006 Award for Best Professional/Scholarly Book in Computer and Information Science, Association of American Publishers
Synopsis
"Comprehensive and engagingly written. This book should become an important resource for many audiences: applied mathematicians, search industry professionals, and anyone who wants to learn more about how search engines work."--Jon Kleinberg, Cornell University
"I don't think there are any competitive books in print with the same depth and breadth on the topic of search engine ranking. The content is well-organized and well-written."--Michael Berry, University of Tennessee
About the Author
Amy N. Langville is Assistant Professor of Mathematics at the College of Charleston in Charleston, South Carolina. She studies mathematical algorithms for information retrieval and text and data mining applications. Carl D. Meyer is Professor of Mathematics at North Carolina State University. In addition to information retrieval, his research areas include numerical analysis, linear algebra, and Markov chains. He is the author of Matrix Analysis and Applied Linear Algebra.
Table of Contents
Preface
Chapter 1: Introduction to Web Search Engines
1.1 A Short History of Information Retrieval
1.2 An Overview of Traditional Information Retrieval
1.3 Web Information Retrieval
Chapter 2: Crawling, Indexing, and Query Processing
2.1 Crawling
2.2 The Content Index
2.3 Query Processing
Chapter 3: Ranking Webpages by Popularity
3.1 The Scene in 1998
3.2 Two Theses
3.3 Query-Independence
Chapter 4: The Mathematics of Google's PageRank
4.1 The Original Summation Formula for PageRank
4.2 Matrix Representation of the Summation Equations
4.3 Problems with the Iterative Process
4.4 A Little Markov Chain Theory
4.5 Early Adjustments to the Basic Model
4.6 Computation of the PageRank Vector
4.7 Theorem and Proof for Spectrum of the Google Matrix
Chapter 5: Parameters in the PageRank Model
5.1 The α Factor
5.2 The Hyperlink Matrix H
5.3 The Teleportation Matrix E
Chapter 6: The Sensitivity of PageRank
6.1 Sensitivity with respect to α
6.2 Sensitivity with respect to H
6.3 Sensitivity with respect to vᵀ
6.4 Other Analyses of Sensitivity
6.5 Sensitivity Theorems and Proofs
Chapter 7: The PageRank Problem as a Linear System
7.1 Properties of (I − αS)
7.2 Properties of (I − αH)
7.3 Proof of the PageRank Sparse Linear System
Chapter 8: Issues in Large-Scale Implementation of PageRank
8.1 Storage Issues
8.2 Convergence Criterion
8.3 Accuracy
8.4 Dangling Nodes
8.5 Back Button Modeling
Chapter 9: Accelerating the Computation of PageRank
9.1 An Adaptive Power Method
9.2 Extrapolation
9.3 Aggregation
9.4 Other Numerical Methods
Chapter 10: Updating the PageRank Vector
10.1 The Two Updating Problems and their History
10.2 Restarting the Power Method
10.3 Approximate Updating Using Approximate Aggregation
10.4 Exact Aggregation
10.5 Exact vs. Approximate Aggregation
10.6 Updating with Iterative Aggregation
10.7 Determining the Partition
10.8 Conclusions
Chapter 11: The HITS Method for Ranking Webpages
11.1 The HITS Algorithm
11.2 HITS Implementation
11.3 HITS Convergence
11.4 HITS Example
11.5 Strengths and Weaknesses of HITS
11.6 HITS's Relationship to Bibliometrics
11.7 Query-Independent HITS
11.8 Accelerating HITS
11.9 HITS Sensitivity
Chapter 12: Other Link Methods for Ranking Webpages
12.1 SALSA
12.2 Hybrid Ranking Methods
12.3 Rankings based on Traffic Flow
Chapter 13: The Future of Web Information Retrieval
13.1 Spam
13.2 Personalization
13.3 Clustering
13.4 Intelligent Agents
13.5 Trends and Time-Sensitive Search
13.6 Privacy and Censorship
13.7 Library Classification Schemes
13.8 Data Fusion
Chapter 14: Resources for Web Information Retrieval
14.1 Resources for Getting Started
14.2 Resources for Serious Study
Chapter 15: The Mathematics Guide
15.1 Linear Algebra
15.2 Perron-Frobenius Theory
15.3 Markov Chains
15.4 Perron Complementation
15.5 Stochastic Complementation
15.6 Censoring
15.7 Aggregation
15.8 Disaggregation
Chapter 16: Glossary
Bibliography
Index