window office

Month

August 2008

4 posts

SCOPE & massive data at MSFT → glinden.blogspot.com

A paper on the data processing platform used internally at MS has recently been made public, and will appear at VLDB 2008.

Jul 31, 2008

July 2008

6 posts

Microsoft shows off search product... → seattletimes.nwsource.com

Mention in this article of some of the interesting things relating to personalized search going on in my group here at MSR. 

Jul 30, 2008
Learning to rank at SIGIR 2008

I got back from Singapore a few days ago, and have been working on a couple posts.  Here’s the first, on the learning to rank sessions & workshop

In LETOR 1 session:

Directly Optimizing Evaluation Measures in Learning to Rank by Xu et. al.

Seems like a reasonable framework for studying what is becoming the “holy grail” in LETOR for the web.  Their framework seems to be a useful way to think about loss functions & bounds in while learning to rank.  Unfortunately from the talk, it wasn’t clear if they had a novel approach for  learning — they focused on a different loss function that seemed to be just a re-scaling of the RankSVM-style hinge-loss.  But, from a quick glance at the paper, it appears there’s more to the algorithm than that.  

Query Dependent Ranking Using K-Nearest Neighbor by Geng et. al.

A clever idea, and in my opinion a good step towards understanding query-level adaptability in LETOR.  The authors use KNN in the “query space” to find related queries to a given test query.  They then use a ranking model trained from those neighbor queries to rank documents for the given query.

Of course, this method is extremely sensitive to the feature space (and distance measure).   The authors used some features that had been previously applied to query classification and don’t really go beyond that.  I would certainly be nice to see some deeper analysis of the effect of different features on this task — seems like this is a ripe area for research.  

Their method is simple, and nice for this reason.  But, performance gains are smaller than I would’ve hoped — the third decimal point in NDCG.  This level of flexibility in the ranking model has a huge potential, and this seems like really just a first step.  Other areas that I’d like to see explored are ways to combine previously learned models (eg. smoothing a specific model with a general one), and unsupervised approaches to adjusting feature weights.  I’m also curious to what degree the more complex nonlinear ranking models (RankNet, LambdaRank) are doing this implicitly.  With query features in their input, these models could be adapting per query & doing it more intelligently than this simple 2-stage (hard) KNN approach.

(One pet peeve in both of these papers — there’s loads of bar charts showing performance, but very few real numbers.  Please show the data!  Charts are OK to show general trends, but a full page of charts without the data behind them is somewhat annoying.)

…

I unfortunately missed the LETOR-2 session, and looks like there were some great papers there.   Two of the papers (Duh & Kirchhoff  and Veloso et. al.) address exactly my comments on query time adaptability.  The Duh paper looks particularly interesteing, and I haven’t had time to give it the attention it needs.

…

LR4IR workshop keynote from Stephen Robertson

Stephen Robertson, predictably, was terrific at the learning to rank workshop.  His keynote focused on two “misconceptions” in current rank learning research.  The first is that directly optimizing some target metric (like NDCG or MAP) on the training set is not always what you want to do.  He implied that it may be better to somehow model the underlying “ranking distribution” from the training set and then make predictions that maximize your target metric on your model rather than on the training set.  Very similar reasoning to the generative vs. discriminative learning debate.  He gave some compelling (toy) examples of situations where predicting the statistic that minimizes the loss function on training data doesn’t yield the most accurate predictions on unseen data when you know the form of the underlying data distribution.

The second misconception is that many metrics like NDCG@k are not informative enough to train your model parameters against.  A similar argument has been echoed in Taylor et. al. at WSDM2008, their point being that you’re throwing away too much information by strictly optimizing on the top ranks when learning.  Recent work by Yilmaz and Aslam also sheds some light in this direction.

Although somewhat vague, both of these points echo some suspicions I’ve had when working in this area.  Again and again, it seems that the selecting the model that maximizes a target metric on the training set doesn’t seem to generalize to the test set.  I’ve blamed this on the small-ish datasets I’ve been using, but I do believe there’s more to it.

Jul 29, 2008
Play
Jul 27, 2008
Slides from SIGIR

I’ve posted my slides from SIGIR on slideshare.  Enjoy!

Retrieval and Feedback Models for Blog Feed Search

view presentation (tags: sigir2008)

Jul 23, 2008
Jul 18, 2008
Next page →
2010 2011
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October 1
  • November
  • December
2009 2010 2011
  • January 1
  • February 4
  • March 6
  • April 8
  • May 6
  • June 3
  • July 4
  • August 3
  • September
  • October
  • November
  • December
2008 2009 2010
  • January 12
  • February 20
  • March 16
  • April 19
  • May 5
  • June 1
  • July 7
  • August
  • September 2
  • October 9
  • November 4
  • December 3
2007 2008 2009
  • January 8
  • February 10
  • March 16
  • April 31
  • May 18
  • June 8
  • July 6
  • August 4
  • September 5
  • October 13
  • November 8
  • December 5
2007 2008
  • January
  • February
  • March
  • April
  • May
  • June
  • July
  • August
  • September
  • October
  • November 13
  • December 9