# On the Efficient Determination of Most Near Neighbors

## Horseshoes, Hand Grenades, Web Search and Other Situations When Close Is Close Enough

The time-worn aphorism "close only counts in horseshoes and hand grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents.

This book is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages, and to a few other situations in which we have found that inexact matching is good enough: where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested.

In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes and starts to put them on. The second points out that even with running shoes, they cannot hope to outrun a bear, to which the first replies that the bear will most likely be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

Foreword
Foreword to the First Edition
Acknowledgments
Introduction
Comparing Web Pages for Similarity: An Overview
A Personal History of Web Search
Uniform Sampling after Alta Vista
Why Weight (and How)?
A Few Applications
Forks in the Road: Flajolet and Slightly Biased Sampling
Author's Biography

Mark Manasse was a Principal Researcher at Microsoft Research, which he joined in 2001 while writing the first edition of this book, and where he performed the research presented in the additional chapters of this second edition.

From 1985 until he joined Microsoft, Mark was a researcher at Compaq's Systems Research Center in Palo Alto, California (previously Digital Equipment Corporation, subsequently Hewlett-Packard and now extinct).

Mark worked at Microsoft until late 2014. He is now a Principal Architect (working on infrastructure security) at Salesforce®, which he thanks for their support while writing the final chapter of this second edition.

Mark Manasse works in a variety of theory-related areas of distributed computer systems research. He was the inventor of MilliCent; as such, Wired Magazine dubbed him the guru of micropayments, and he was co-chair of the microcommerce working group for the World Wide Web Consortium. Mark has worked on Web search technologies; his work with Andrei Broder, Steve Glassman, and Geoff Zweig on syntactic similarity was awarded best paper at the Sixth International World Wide Web Conference. Mark was a member of the design committee for the Inter-Client Communication Conventions Manual for the X Window System. Mark's work on online algorithms helped to establish the field, and his papers in this area remain among his most often cited. Mark organized, ran, and developed much of the code for some of the earliest uses of the Internet in distributed computations, when he and Arjen Lenstra factored many large integers, the most noteworthy being the first factorization of a hard 100-digit number and the factorization of the ninth Fermat number. For several years thereafter, Mark's license plate read IDIDF9, leaving most other drivers puzzled.

Mark holds U.S. patents in three of the previously mentioned areas. He earned his doctorate in Mathematical Logic at the University of Wisconsin in 1982, and he spent the following three years at Bell Labs and the University of Chicago.

Mark's projects after joining Microsof