window office

sporadic ramblings of a comp sci grad student studying information retrieval
Me @ CMU


Thu, Oct 1st
Mon, Sep 28th
Thu, Sep 3rd
computer science is mathematical engineering

Computer science is not real science «  IREvalEtAl

Interesting post from Will Webber. Not sure I agree completely: much of what computer scientists do now involves studying user behavior, or other observations of the “natural world”, and describing, modeling, and learning from them in order to improve system performance in some way.

Mon, Jul 20th

The network connection here at SIGIR is so bad, it might be hard to upload longer posts.

Sue talking about the work we did together while I was at MSR


Susan Dumais -- Salton Award Talk

As you might have heard, Sue Dumais was awarded the Salton award this year.  My notes from her talk:

An Interdisciplinary Perspective on IR

- Awards are not won by the individual, but by the team of colleagues we work with

- Tag cloud of collaborator names (I’m in there somewhere)

- Sue’s Salton number is 2 or 3

- Background in mathematics & psychology, studying vision & perception & developing quantitative models of those

- After PhD, started in the HCI group at Bell Labs (1979), and has been in industrial research since.  This was the first HCI research group.

From Verbal Disagreement to LSI

- Mismatch between how people organize information and how they want to retrieve it, e.g., Unix command names “grep”, “ls”, “tr”

- Tremendous diversity in how users describe objects or actions

- “repeat rate” (Zipf) generally 5-20%; “the long tail”

- need to recognize that there is a long tail in the ways people refer to an object

- CHI ’82 paper: “How can a computer use what people name things to guess what things people mean when they name things?”

- soon became interested in applying retrieval technologies, including full-text indexing, to this problem

- “Rich Aliasing” — multiple names for the same object

- “Adaptive Indexing” — associate failed queries with their destination objects, essentially adding them as new fields on the document objects

- “Latent Semantic Indexing” — model relationships among words, using dimension reduction, esp. useful for short documents

- Rich aliasing & adaptive indexing are still here today: full text index (rich aliases from the author); anchor text/tags (rich aliases from other users); query-click data (adaptive indexing with implicit measures)
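To make the LSI bullet a bit more concrete, here is a minimal sketch of the core dimension-reduction step in Python/NumPy, assuming a toy term-by-document count matrix I made up: take a truncated SVD, project the query and each document into the top-k latent dimensions, and compare them with cosine similarity. It skips term weighting and the other details of the original LSI work, so treat it as an illustration of the idea, not the published method.

```python
import numpy as np

def lsi_basis(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return U_k, the top-k left singular vectors of a term-by-document matrix."""
    # NumPy returns singular values in descending order, so the first k columns are the top k.
    u, _s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k]

def project(vec: np.ndarray, u_k: np.ndarray) -> np.ndarray:
    """Map a term-space vector (a document column or a query) into the k-dim latent space."""
    return u_k.T @ vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical vocabulary (rows): car, automobile, engine, flower
# Columns are documents: d0 = "car engine", d1 = "automobile engine", d2 = "flower flower"
X = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 2]], dtype=float)

query = np.array([1, 0, 0, 0], dtype=float)  # the one-word query "car"
u_k = lsi_basis(X, k=2)

raw_sim = cosine(query, X[:, 1])  # "car" never appears in d1, so this is 0
lsi_sims = [cosine(project(query, u_k), project(X[:, j], u_k)) for j in range(X.shape[1])]
print(raw_sim, lsi_sims)
```

In the toy data, “car” and “automobile” never co-occur, but both co-occur with “engine”, so the 2-dimensional projection puts the query on the same latent axis as d1 (cosine close to 1), while the unrelated “flower” document stays near 0. That is the sense in which LSI helps most with short documents: it can match words that never literally overlap.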

Common Themes

- The last 10-20 years have been amazing for IR; search is everywhere

- Lots of progress, but some tasks are still really hard.  How can we improve quality of search systems?

Web search @ age 15:

- pages indexed: Lycos (7/1994) indexed 54,000 pages, using only the first few hundred words of each document; now >10^10 pages?

- Many types of content

- how is it accessed? Basically the SAME search box over the years, with the same ranked list (title, summary, URL)

Support for searchers

- spelling correction, query suggestions, auto-complete, inline answers, rich summaries (deep links)

- but much more can be done by understanding context

- great quote from a NYTimes article about Sue getting fired if search still has the same interface in 10 years
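Auto-complete is the simplest of these searcher-support features to sketch. A minimal version, assuming nothing more than a log of past query strings (the log below is invented for illustration), suggests the most frequent previously seen queries that start with the typed prefix. Note that it ignores context entirely, which is the kind of limitation the “understanding context” point is about.

```python
from collections import Counter
from typing import Callable, Iterable, List

def build_completer(query_log: Iterable[str]) -> Callable[..., List[str]]:
    """Return a prefix completer ranked by how often each query appeared in the log."""
    # Normalize case and surrounding whitespace so "SIGIR" and "sigir " count together.
    counts = Counter(q.strip().lower() for q in query_log)

    def complete(prefix: str, n: int = 3) -> List[str]:
        p = prefix.lower()
        matches = [(q, c) for q, c in counts.items() if q.startswith(p)]
        # Most frequent first; break ties alphabetically so results are deterministic.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:n]]

    return complete

log = ["sigir 2009", "SIGIR", "sigir 2009", "salton award", "sigir boston", "sigir "]
complete = build_completer(log)
print(complete("sig"))  # ['sigir', 'sigir 2009', 'sigir boston']
print(complete("sa"))   # ['salton award']
```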

Search and Context

- Query context: where do the queries come from?  pay attention to information interactions, past queries

- Document context: documents aren’t independent of each other

- Task/Use context: we don’t say “I want to search”; we want to solve a problem. We need to understand the problem.

Re-finding on the desktop: “Stuff I’ve Seen”

- People don’t use query operators, but do use UI elements to express more sophisticated queries

- Date is by far the most common document attribute for sorting results, especially for re-finding settings like the desktop

- This paper fell between CHI and SIGIR, combining interface design, user studies, and ranking algorithms; it got interesting reviews
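The “sort by date” observation is easy to sketch. Assuming each item in a personal index carries a last-seen date (the `Item` class, its fields, and the data are hypothetical, not the Stuff I’ve Seen implementation), a bare-bones re-finding query keeps items that contain every query term and then orders them newest first instead of by a relevance score:

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Item:
    title: str
    text: str
    last_seen: date  # hypothetical attribute: when the user last opened or saw the item

def refind(items: List[Item], query: str) -> List[Item]:
    """Return items containing every query term, most recently seen first."""
    terms = query.lower().split()
    hits = []
    for it in items:
        words = set(f"{it.title} {it.text}".lower().split())
        if all(t in words for t in terms):
            hits.append(it)
    # Sort on date rather than on a relevance score.
    return sorted(hits, key=lambda it: it.last_seen, reverse=True)

items = [
    Item("LSI notes", "latent semantic indexing draft", date(2009, 3, 2)),
    Item("SIGIR trip", "salton award talk notes", date(2009, 7, 20)),
    Item("Old LSI paper", "latent semantic indexing 1990", date(1990, 9, 1)),
]
print([it.title for it in refind(items, "latent semantic")])  # ['LSI notes', 'Old LSI paper']
```

In a real desktop UI the date ordering would be one of several sort options the user can pick, which fits the observation that people express richer queries through UI elements rather than query operators.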

Re-finding on the Web:

- see Teevan & Jones, SIGIR ’07: HUGE number of repeat queries & page visits

- there’s not much work on algorithms for integrating re-finding & re-visitation into ranking
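One simple way to measure how common repeats are (in the spirit of the log analysis mentioned above, but not its actual methodology) is to count how often a user issues a query they have already issued. This sketch assumes a time-ordered log of (user, query) pairs and treats two queries as the same after lowercasing and collapsing whitespace; the function name and the log are made up for illustration.

```python
from typing import Iterable, Tuple

def repeat_query_rate(log: Iterable[Tuple[str, str]]) -> float:
    """Fraction of queries that exactly repeat an earlier query by the same user."""
    seen = set()
    repeats = total = 0
    for user, query in log:
        # Normalize: lowercase and collapse runs of whitespace.
        key = (user, " ".join(query.lower().split()))
        total += 1
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / total if total else 0.0

log = [
    ("u1", "sigir 2009"),
    ("u2", "salton award"),
    ("u1", "SIGIR  2009"),   # same as u1's first query after normalization
    ("u1", "lsi"),
    ("u2", "salton award"),  # u2 repeats
]
print(repeat_query_rate(log))  # 0.4
```

Measuring repeats is the easy part; actually using that signal inside a ranking function is the piece the talk flagged as under-explored.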

Personalization

- results are typically independent of recent behavior (see PSearch papers, SIGIR 05, SIGIR 07)

- Works well for some queries, awful for others — when does it wor

Sue’s award talk

Susan Dumais receiving the Salton award.

Wed, Jul 15th
Fri, Jul 10th