window office

sporadic ramblings of a comp sci grad student studying information retrieval

Mon, Jul 20th

Susan Dumais -- Salton Award Talk

As you might have heard, Sue Dumais was awarded the Salton Award this year. My notes from her talk:

An Interdisciplinary Perspective on IR

- Awards are not won by the individual, but by the team of colleagues we work with

- Tag cloud of collaborator names (I’m in there somewhere)

- Sue’s Salton number is 2 or 3

- Background in mathematics & psychology, studying vision & perception & developing quantitative models of those

- After PhD, started in the HCI group at Bell Labs (1979), and has been in industrial research since.  This was the first HCI research group.

From Verbal Disagreement to LSI: Mismatch between how people organize information and how they want to retrieve it, e.g. Unix command names “grep”, “ls”, “tr”

- Tremendous diversity across users to describe objects or actions

- “repeat rate” (Zipf) generally 5-20%; “the long tail”

- need to recognize the fact that there is a long tail in the way people want to refer to an object

- CHI ’82 paper: “How can a computer use what people name things to guess what things people mean when they name things?”

- soon became interested in applying retrieval technologies to this problem, and full text indexing

- “Rich Aliasing” — multiple names for the same object

- “Adaptive Indexing” — associate failed queries to destination objects, basically as new fields to the document objects

- “Latent Semantic Indexing” — model relationships among words, using dimension reduction, esp. useful for short documents

- Rich aliasing & adaptive indexing are still here today: full text index (rich aliases from the author); anchor text/tags (rich aliases from other users); query-click data (adaptive indexing with implicit measures)
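
To make the LSI bullet above concrete, here is a minimal sketch of the idea using a truncated SVD in NumPy. It is not from the talk: the vocabulary, the four toy documents, the choice of k=2, and the helper names (lsi_fit, fold_in_query, cosine) are all assumptions made up for illustration.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# Vocabulary and documents are invented for illustration only.
terms = ["car", "automobile", "engine", "flower", "petal"]
docs = ["d0", "d1", "d2", "d3"]
A = np.array([
    [1, 0, 0, 0],  # "car" appears only in d0
    [0, 1, 0, 0],  # "automobile" appears only in d1
    [1, 1, 0, 0],  # "engine" appears in d0 and d1
    [0, 0, 1, 1],  # "flower" appears in d2 and d3
    [0, 0, 1, 1],  # "petal" appears in d2 and d3
], dtype=float)


def lsi_fit(matrix, k):
    """Keep the top-k singular triplets of the term-document matrix."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in_query(q, U_k, s_k):
    """Project a term-space query into the k-dimensional latent space."""
    return (q @ U_k) / s_k


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


U_k, s_k, Vt_k = lsi_fit(A, k=2)

# Query containing only the word "car".
q = np.array([1, 0, 0, 0, 0], dtype=float)
q_hat = fold_in_query(q, U_k, s_k)

# Plain term matching: number of query terms each document contains.
raw_overlap = q @ A

# Latent-space matching: cosine between the folded-in query and each document.
lsi_scores = np.array([cosine(q_hat, Vt_k[:, j]) for j in range(A.shape[1])])

for name, raw, lsi in zip(docs, raw_overlap, lsi_scores):
    print(f"{name}: raw overlap = {raw:.0f}, LSI cosine = {lsi:.2f}")
```

In this toy setup the query "car" shares no terms with d1 (which only says "automobile" and "engine"), so plain term matching misses it, while the reduced space places d0 and d1 in the same latent direction because both co-occur with "engine". That is the vocabulary-mismatch problem from the bullets above in miniature.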

Common Themes

- The last 10-20 years have been amazing for IR; search is everywhere

- Lots of progress, but some tasks are still really hard.  How can we improve quality of search systems?

Web search @ age 15:

- pages indexed: Lycos, 7/1994: 54,000 pages, with only the first few hundred words from each document; now, >10^10 pages?

- Many types of content

- how is it accessed?  basically SAME search box over the years; same ranked list (title, summary, url)

Support for searchers

- spelling, query suggestions, auto-complete, inline answers, rich summaries (deep links)

- but much more can be done by understanding context

- great quote from a NYTimes article about Sue getting fired if search still has the same interface in 10 years

Search and Context

- Query context: where do the queries come from?  pay attention to information interactions, past queries

- Document context: documents aren’t independent from each other

- Task/Use context: we don’t say “I want to search” we want to solve a problem.  We need to understand the problem.

Re-finding on the desktop: “Stuff I’ve Seen”

- People don’t use query operators, but do use UI elements to express more sophisticated queries

- Date is by far the most common document attribute for sorting results, especially for re-finding settings like the desktop

- This paper was between CHI and SIGIR, with interface, user studies, and ranking algorithms.  interesting reviews

Re-finding on the Web:

- see SIGIR 07 Teevan & Jones — HUGE number of repeat queries & page visits

- there’s not much work on algorithms for integrating re-finding & re-visitation into ranking

Personalization

- results are typically independent of recent behavior (see PSearch papers, SIGIR 05, SIGIR 07)

- Works well for some queries, awful for others -- when does it work?
