window office

(this space left intentionally blank) Lijit Search

Learning to rank at SIGIR 2008

I got back from Singapore a few days ago, and have been working on a couple posts.  Here’s the first, on the learning to rank sessions & workshop

In LETOR 1 session:

Directly Optimizing Evaluation Measures in Learning to Rank by Xu et. al.

Seems like a reasonable framework for studying what is becoming the “holy grail” in LETOR for the web.  Their framework seems to be a useful way to think about loss functions & bounds in while learning to rank.  Unfortunately from the talk, it wasn’t clear if they had a novel approach for  learning — they focused on a different loss function that seemed to be just a re-scaling of the RankSVM-style hinge-loss.  But, from a quick glance at the paper, it appears there’s more to the algorithm than that.  

Query Dependent Ranking Using K-Nearest Neighbor by Geng et. al.

A clever idea, and in my opinion a good step towards understanding query-level adaptability in LETOR.  The authors use KNN in the “query space” to find related queries to a given test query.  They then use a ranking model trained from those neighbor queries to rank documents for the given query.

Of course, this method is extremely sensitive to the feature space (and distance measure).   The authors used some features that had been previously applied to query classification and don’t really go beyond that.  I would certainly be nice to see some deeper analysis of the effect of different features on this task — seems like this is a ripe area for research.  

Their method is simple, and nice for this reason.  But, performance gains are smaller than I would’ve hoped — the third decimal point in NDCG.  This level of flexibility in the ranking model has a huge potential, and this seems like really just a first step.  Other areas that I’d like to see explored are ways to combine previously learned models (eg. smoothing a specific model with a general one), and unsupervised approaches to adjusting feature weights.  I’m also curious to what degree the more complex nonlinear ranking models (RankNet, LambdaRank) are doing this implicitly.  With query features in their input, these models could be adapting per query & doing it more intelligently than this simple 2-stage (hard) KNN approach.

(One pet peeve in both of these papers — there’s loads of bar charts showing performance, but very few real numbers.  Please show the data!  Charts are OK to show general trends, but a full page of charts without the data behind them is somewhat annoying.)

I unfortunately missed the LETOR-2 session, and looks like there were some great papers there.   Two of the papers (Duh & Kirchhoff  and Veloso et. al.) address exactly my comments on query time adaptability.  The Duh paper looks particularly interesteing, and I haven’t had time to give it the attention it needs.

LR4IR workshop keynote from Stephen Robertson

Stephen Robertson, predictably, was terrific at the learning to rank workshop.  His keynote focused on two “misconceptions” in current rank learning research.  The first is that directly optimizing some target metric (like NDCG or MAP) on the training set is not always what you want to do.  He implied that it may be better to somehow model the underlying “ranking distribution” from the training set and then make predictions that maximize your target metric on your model rather than on the training set.  Very similar reasoning to the generative vs. discriminative learning debate.  He gave some compelling (toy) examples of situations where predicting the statistic that minimizes the loss function on training data doesn’t yield the most accurate predictions on unseen data when you know the form of the underlying data distribution.

The second misconception is that many metrics like NDCG@k are not informative enough to train your model parameters against.  A similar argument has been echoed in Taylor et. al. at WSDM2008, their point being that you’re throwing away too much information by strictly optimizing on the top ranks when learning.  Recent work by Yilmaz and Aslam also sheds some light in this direction.

Although somewhat vague, both of these points echo some suspicions I’ve had when working in this area.  Again and again, it seems that the selecting the model that maximizes a target metric on the training set doesn’t seem to generalize to the test set.  I’ve blamed this on the small-ish datasets I’ve been using, but I do believe there’s more to it.

YouTube talk from Anthony Tomasic from CMU.  He gets into some ongoing work with the RADAR project at the LTI, particularly the interaction between the interface design and machine learning back-end.  Intelligent Assistance for Desktop User Tasks (via googletechtalks)

Slides from SIGIR

I’ve posted my slides from SIGIR on slideshare.  Enjoy!

Retrieval and Feedback Models for Blog Feed Search view presentation (tags: sigir2008)

Satay, my first meal in Singapore.  
(Apologies for the poor framing… our camera broke in Yellowstone & the screen is completely blank.  I’m shooting blind.)
I arrived a few hours ago, had a short nap, and then headed to the Lau Pa Sat food center.  Its awesome, and unfortunately all my other pictures are too blurry to post.

Satay, my first meal in Singapore.  

(Apologies for the poor framing… our camera broke in Yellowstone & the screen is completely blank.  I’m shooting blind.)

I arrived a few hours ago, had a short nap, and then headed to the Lau Pa Sat food center.  Its awesome, and unfortunately all my other pictures are too blurry to post.

Road Trip

I’ll be heading across the country in a couple days, with a stop in Northfield, MN for my college reunion, and then off to Seattle/Redmond to start an internship at MSR.  I’ll be working with Susan Dumais in the Adaptive Systems & Interaction Group until the end of September.

I’m thrilled to be heading to Seattle for the rest of the summer and working with such a fantastic group.  

Figuring out how to represent the problem is 95% of the work. By the time you have the representation right, the tool that you use to finish the remaining 5% is not terribly important.

Ockham’s Razor is Dull « Apperceptual

Another terrific post at Apperceptual.