20th
Susan Dumais -- Salton Award Talk
As you might have heard, Sue Dumais was awarded the Salton Award this year. My notes from her talk:
An Interdisciplinary Perspective on IR
- Awards are not won by the individual, but by the team of colleagues we work with
- Tag cloud of collaborator names (I’m in there somewhere)
- Sue’s Salton number is 2 or 3
- Background in mathematics & psychology, studying vision & perception and developing quantitative models of those
- After PhD, started in the HCI group at Bell Labs (1979), and has been in industrial research since. This was the first HCI research group.
From Verbal Disagreement to LSI: Mismatch between how people organize information and how they want to retrieve it, e.g. Unix command names “grep”, “ls”, “tr”
- Tremendous diversity across users to describe objects or actions
- “repeat rate” (Zipf) generally 5-20%; “the long tail”
- need to recognize the fact that there is a long tail in the way people want to refer to an object
- CHI ’82 paper: “How can a computer use what people name things to guess what things people mean when they name things?”
- soon became interested in applying retrieval technologies to this problem, and full text indexing
- “Rich Aliasing” — multiple names for the same object
- “Adaptive Indexing” — associate failed queries to destination objects, basically as new fields to the document objects
- “Latent Semantic Indexing” — model relationships among words, using dimension reduction, esp. useful for short documents
- Rich aliasing & adaptive indexing are still here today: full text index (rich aliases from the author); anchor text/tags (rich aliases from other users); query-click data (adaptive indexing with implicit measures)
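The aliasing and adaptive indexing ideas above can be illustrated with a minimal sketch. This is not the system described in the talk: the `AdaptiveIndex` class, its method names, and the sample commands and queries are all hypothetical, chosen only to show a failed query being attached to the object the user eventually reached.

```python
from collections import defaultdict


class AdaptiveIndex:
    """Toy index (hypothetical): objects are reachable via several names."""

    def __init__(self):
        # term -> set of object ids that can be reached with that term
        self.postings = defaultdict(set)

    def add_object(self, obj_id, names):
        # Rich aliasing: store several names for the same object up front.
        for name in names:
            self.postings[name.lower()].add(obj_id)

    def search(self, query):
        # Return every object that shares at least one term with the query.
        hits = set()
        for term in query.lower().split():
            hits |= self.postings.get(term, set())
        return sorted(hits)

    def record_failure(self, query, obj_id):
        # Adaptive indexing: the query missed, but the user eventually
        # reached obj_id, so the query terms become new aliases for it.
        for term in query.lower().split():
            self.postings[term].add(obj_id)


index = AdaptiveIndex()
index.add_object("grep", ["grep", "search", "match"])
index.add_object("ls", ["ls", "list", "directory"])

first_try = index.search("find text")      # neither term is indexed yet
index.record_failure("find text", "grep")  # user ended up at grep
second_try = index.search("find text")     # the learned aliases now match
```

In this sketch the learned terms behave like extra fields on the object, which is roughly how query-click data extends a document's description with words its author never used.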
Common Themes
- The last 10-20 years have been amazing for IR; search is everywhere
- Lots of progress, but some tasks are still really hard. How can we improve quality of search systems?
Web search @ age 15:
- pages indexed: Lycos, 7/1994: 54,000 pages indexed, only the first few hundred words from each document; now, >10^10 pages?
- Many types of content
- how is it accessed? basically the SAME search box over the years; same ranked list (title, summary, url)
Support for searchers
- spelling, query suggestions, auto-complete, inline answers, rich summaries (deep links)
- but much more can be done by understanding context
- great quote from a NYTimes article about Sue getting fired if search still has the same interface in 10 years
Search and Context
- Query context: where do the queries come from? pay attention to information interactions, past queries
- Document context: documents aren’t independent from each other
- Task/Use context: we don’t say “I want to search”; we want to solve a problem. We need to understand the problem.
Re-finding on the desktop: “Stuff I’ve Seen”
- People don’t use query operators, but do use UI elements to express more sophisticated queries
- Date is by far the most common document attribute for sorting results, especially for re-finding settings like the desktop
- This paper fell between CHI and SIGIR, combining interface design, user studies, and ranking algorithms; it drew interesting reviews
Re-finding on the Web:
- see SIGIR 07 Teevan & Jones — HUGE number of repeat queries & page visits
- there’s not much work on algorithms for integrating re-finding & re-visitation into ranking
Personalization
- results are typically independent of recent behavior (see PSearch papers, SIGIR 05, SIGIR 07)
- Works well for some queries, awful for others — when does it wor