It's probably no surprise to anyone who's paid attention to the work on statAP sampling, but it is somewhat disconcerting that AP estimates produced in this way can be greater than 1.0.
For example, consider a document which is sampled with probability 0.1 and is found to be relevant. A system ranking this document at position 1, which should get a precision @ 1 value of 1.0, gets 10.0 instead, because each sampled judgment is weighted by the inverse of its inclusion probability: 1/0.1 = 10. See the statAP paper for details on estimating precision at cutoffs. In this example, we could compute an exact P@1 value, since the only document of interest has actually been judged.
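To make the arithmetic concrete, here is a minimal sketch of an inverse-propensity precision-at-k estimate in the statAP spirit. The function name and the `judged` data structure are my own illustrative assumptions, not the paper's API; the point is only that weighting by 1/p reproduces the 10.0 above.

```python
def sampled_precision_at_k(ranking, judged, k):
    """Sketch of an inverse-propensity estimate of precision@k.

    ranking: list of doc ids in ranked order
    judged:  dict mapping doc id -> (is_relevant, inclusion_probability)
             for the sampled documents; unsampled docs contribute 0
    """
    total = 0.0
    for doc in ranking[:k]:
        if doc in judged:
            is_relevant, p = judged[doc]
            if is_relevant:
                # Weight each sampled relevant doc by 1/p so the
                # estimate is unbiased over the sampling distribution
                total += 1.0 / p
    return total / k

# One relevant document, sampled with probability 0.1, ranked first:
est = sampled_precision_at_k(["d1"], {"d1": (True, 0.1)}, 1)
# est is 10.0, even though true P@1 here is 1.0
```

The estimator is unbiased in expectation (the document enters the sample only 10% of the time), but any single realization can blow well past the 1.0 ceiling, which is exactly the behavior described above.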
I'm a little uneasy basing my analysis on AP estimates that don't really resemble the AP values I'm used to seeing. In some cases, particularly for "easy" queries with lots of relevant documents, I'm commonly seeing statAP estimates well over 5. Is statAP broken?