This page lists a small selection of my favourite papers (which isn't quite the same as the list of "best papers" that I would include in a grant application). It's a lagged indicator of what I'm actually doing with my time, since the earliest anything can end up here is after the paper is accepted for publication, and it's a biased indicator, since I focus mostly on the papers that are central to my research interests, which sometimes differ from those of my collaborators. The full list of my papers is here.
    Much like the paper in the MDL book (see Lee & Navarro 2005 below), the goal behind this paper is to improve the performance of "additive clustering" methods for extracting latent features from similarity matrices. It does so by making use of the Indian Buffet Process, a method for placing priors over binary feature matrices when the number of features is not known in advance. Besides the technical advances, what I'm most fond of in this paper is the literature review -- I think it provides a nice survey of the state of the additive clustering literature.
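To make the two ingredients concrete, here is a minimal sketch, assuming the standard "customers and dishes" construction of the Indian Buffet Process prior and the usual additive clustering similarity s_ij = sum_k w_k f_ik f_jk + c. The function names, weights and constant are hypothetical illustrations. This shows only the prior and the similarity function, not the inference procedure used in the paper.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_objects, alpha, seed=0):
    """Draw a binary feature matrix from the Indian Buffet Process prior.

    Object i (1-indexed) takes each existing feature k with probability
    m_k / i, where m_k counts earlier objects with that feature, and then
    adds Poisson(alpha / i) brand-new features. The number of columns is
    therefore not fixed in advance.
    """
    rng = random.Random(seed)
    counts = []  # m_k for every feature created so far
    rows = []    # set of feature indices owned by each object
    for i in range(1, num_objects + 1):
        row = set()
        for k, m_k in enumerate(counts):
            if rng.random() < m_k / i:
                row.add(k)
        for _ in range(sample_poisson(alpha / i, rng)):
            counts.append(0)
            row.add(len(counts) - 1)
        for k in row:
            counts[k] += 1
        rows.append(row)
    # Dense 0/1 matrix with one column per feature.
    return [[1 if k in row else 0 for k in range(len(counts))] for row in rows]


def additive_similarity(features, weights, constant=0.0):
    """Additive clustering similarity: s_ij = sum_k w_k * f_ik * f_jk + c."""
    n = len(features)
    return [
        [sum(w * fi * fj for w, fi, fj in zip(weights, features[i], features[j])) + constant
         for j in range(n)]
        for i in range(n)
    ]


F = sample_ibp(num_objects=6, alpha=2.0, seed=1)
w = [0.5] * len(F[0])  # hypothetical equal feature weights
S = additive_similarity(F, w, constant=0.1)
print(S)
```

Objects that share more (and heavier) features come out as more similar, which is the structure that additive clustering tries to recover in reverse from an observed similarity matrix.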
    You wouldn't know it from looking at how it's cited, but this paper isn't called "An introduction to the Dirichlet process". It does in fact include a tutorial on Dirichlet processes, but that's only because way back in 2006 there weren't many people, even in mathematical psychology, who'd heard of the DP. There's actually some psychology in this paper, and some psychology that I'm kind of fond of. The idea was to provide a formal framework in which we could talk about individual differences in cognitive processes, and specifically, a framework that allows us to look for qualitatively different "groups" of people. The reason for using the DP to do this is that we almost never know in advance how many groups of people are "out there". Worse yet, it's probably fair to say that there is "really" a very large number of qualitatively distinct cognitive strategies that people might use, so the true number of groups in the world is probably arbitrarily large. However, some of these strategies are likely to be very common, while others are quite rare, so we still expect to see a lot of similarity between people, even in a small sample. This is exactly the situation that the DP is built for... thus the actual title, "Modeling individual differences using Dirichlet processes". As time goes by, I'm kind of impressed that I was able to spend so long reading up on the DP and writing the tutorial part of the paper (one of the joys of being a postdoc, I guess), but it's the psychology in the paper that I think is the more interesting thing. No-one seems to agree with me though...
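The "unboundedly many groups, but a few common ones" intuition can be seen directly in the Chinese restaurant process, the sequential view of the DP prior over group assignments. The sketch below is a minimal illustration under that assumption; the names and parameter values are hypothetical, and it covers only the prior over who-belongs-with-whom, not the full individual-differences model from the paper.

```python
import random
from collections import Counter


def sample_crp(num_people, alpha, seed=0):
    """Chinese restaurant process: a group label for each person in turn.

    Person i (0-indexed) joins existing group g with probability
    n_g / (i + alpha), or starts a new group with probability
    alpha / (i + alpha).
    """
    rng = random.Random(seed)
    sizes = []   # number of people in each group so far
    labels = []  # group label assigned to each person
    for i in range(num_people):
        # Existing groups are weighted by size, a new group by alpha;
        # the weights sum to i + alpha, matching the CRP probabilities.
        weights = sizes + [alpha]
        g = rng.choices(range(len(weights)), weights=weights)[0]
        if g == len(sizes):
            sizes.append(0)
        sizes[g] += 1
        labels.append(g)
    return labels


def expected_groups(num_people, alpha):
    """Expected number of distinct groups among num_people under the CRP."""
    return sum(alpha / (alpha + i) for i in range(num_people))


labels = sample_crp(50, alpha=1.0, seed=2)
print(Counter(labels).most_common())   # (group, size) pairs, largest first
print(expected_groups(50, alpha=1.0))  # grows roughly like alpha * ln(N)
```

Because big groups attract new members in proportion to their size, a small sample tends to be dominated by a handful of groups, while the number of groups keeps growing (slowly) as more people are observed.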
    This paper was the last major project that I worked on during my first postdoc, at Ohio State. The key idea here is that for lots of real-world modelling problems, the researcher's goal isn't to characterise some probability distribution over low-level observables precisely. Rather, there is typically some qualitative property of the data that he or she believes is of theoretical importance, and what matters is that the model can be said to "capture" this property in some sense. Most statistical methods don't really accommodate this idea: typically the researcher fits the model in a standard way, but then goes on to interpret the fit in terms of the key qualitative effect. To a first approximation this is fine, but it has all the usual problems associated with judging models only in terms of data fit -- some models are more "qualitatively complex", able to reproduce all sorts of patterns in hindsight. In this paper, we proposed what I think is a nice method for dealing with this problem, by trying to enumerate all the "qualitative patterns" a model can produce.
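As a toy illustration of what "enumerating qualitative patterns" means, the sketch below defines a pattern as the ordinal relation (less, equal, greater) between every pair of predicted condition means, then sweeps a two-parameter model over a grid and records which patterns occur and how much of the parameter space each occupies. The brute-force grid search is a swapped-in simplification, not the search procedure from the paper, and both models are hypothetical.

```python
import itertools


def qualitative_pattern(predictions):
    """Ordinal pattern: sign of the difference for every pair of conditions."""
    def sign(x):
        return (x > 0) - (x < 0)
    return tuple(sign(a - b) for a, b in itertools.combinations(predictions, 2))


def enumerate_patterns(model, resolution=20):
    """Grid-search a two-parameter model over (0, 1)^2 and return each distinct
    pattern with the fraction of grid points that produce it."""
    grid = [(i + 0.5) / resolution for i in range(resolution)]
    counts = {}
    for t1, t2 in itertools.product(grid, repeat=2):
        p = qualitative_pattern(model(t1, t2))
        counts[p] = counts.get(p, 0) + 1
    total = resolution ** 2
    return {p: c / total for p, c in counts.items()}


# Two hypothetical models predicting mean performance in conditions A, B, C.
def constrained_model(t1, t2):
    return [t1, t2, t1 * t2]  # C can never exceed A or B


def flexible_model(t1, t2):
    return [t1, t2, 0.5]      # C can fall anywhere relative to A and B


for name, m in [("constrained", constrained_model), ("flexible", flexible_model)]:
    patterns = enumerate_patterns(m)
    print(name, len(patterns), "patterns")
```

With this grid the constrained model produces 3 distinct patterns and the flexible one 8; the flexible model can "explain" more orderings in hindsight, which is the kind of qualitative complexity that data fit alone does not reveal.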
    I really like this chapter -- it summarises a fairly broad research project on similarity modelling that Michael Lee started before I even began my Ph.D., and that we later worked on together for some years. What I like most about the chapter is that it talks not only about the statistical problem of learning good representations from data, but also about what kinds of psychological assumptions should underlie the statistics.
    Although this paper was the main publication to come out of my Ph.D. thesis, I have somewhat mixed feelings about it. On the plus side, it makes a number of important extensions to the "additive clustering" model for learning stimulus representations, and in doing so opens up some questions about the differences between "representation" and "decision process" that I think are often overlooked when people think about similarity models (specifically, I suspect that some of the things that people call "decisional processes" are actually "representational structure"). However, I'm not completely convinced by the evidence provided by the data, and I can't even say why -- the data analysis is solid, I was scrupulously even-handed as regards the models, the experimental design wasn't rigged in favour of any model, etc. Perhaps I'm a little dissatisfied because I think that "Tversky's contrast model" and the "modified contrast model" in this paper are both special cases of a more general framework that I never bothered to build, so the work feels half-finished to me. Even so, I think it's a nice little paper.
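For reference, Tversky's (1977) contrast model scores similarity as S(a, b) = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A), where A and B are the feature sets of the two stimuli. The sketch below assumes an additive salience function f (a sum of feature weights), which is a common choice but only one option; the paper's "modified contrast model" is not reproduced here, and the feature names and weights are hypothetical.

```python
def contrast_similarity(features_a, features_b, salience, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model with an additive salience function:
    S(a, b) = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A),
    where f(X) is the sum of the salience weights of the features in X.
    """
    def f(feature_set):
        return sum(salience[k] for k in feature_set)
    a, b = set(features_a), set(features_b)
    return theta * f(a & b) - alpha * f(a - b) - beta * f(b - a)


salience = {"red": 1.0, "round": 2.0, "small": 0.5}  # hypothetical weights
apple, ball = {"red", "round"}, {"round", "small"}
print(contrast_similarity(apple, ball, salience))                        # 1.25
# With alpha = beta = 0 only the common-features term remains, which has the
# same form as additive clustering (minus its additive constant).
print(contrast_similarity(apple, ball, salience, alpha=0.0, beta=0.0))  # 2.0
```

When alpha differs from beta the measure is asymmetric, S(a, b) != S(b, a), which is one of the properties that distinguishes the contrast model from a purely common-features model.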
    Some people might say that I'm inordinately fond of this paper, but it's my one genuine contribution to information theory. When people in psychology talk about MDL, they're usually taking ideas from information theory and statistics and applying them to psychological problems. There's nothing wrong with this (in fact, it's an admirable thing to do), but it's not often that we get to make a contribution in the other direction. The idea behind this paper isn't complex -- I was playing around with a very widely used "approximation" to a codelength function for some fairly simple models, and realised that it didn't work for any practical sample size. The approximation error for this method is only of order o(1) (and there are cases where it's o(1/N)), but it still breaks sometimes for real problems, and breaks in a really fundamental way (by treating a nested model as more complex than the full model!). As far as I know, no-one had pointed out that it could break that badly before this paper. Not exactly Shannon-esque stuff, but still kind of cute.
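To show what kind of quantity is being approximated, the sketch below assumes, for illustration, that the approximation in question is the Fisher information approximation (FIA) to parametric complexity, (k/2) ln(N / 2pi) + ln of the integral of sqrt(det I(theta)), and compares it against the exact normalized maximum likelihood complexity for the simplest case, a Bernoulli model (k = 1, Fisher information 1/(theta(1 - theta)), integral equal to pi). This textbook case is well behaved and does not reproduce the nested-model failure described above; it only shows the o(1) error term at work at small N. Function names are hypothetical.

```python
import math


def fia_complexity_bernoulli(n):
    """Fisher information approximation to the Bernoulli model's parametric
    complexity: (1/2) ln(n / 2pi) + ln(pi), since the integral of
    theta^(-1/2) (1 - theta)^(-1/2) over (0, 1) equals pi."""
    return 0.5 * math.log(n / (2 * math.pi)) + math.log(math.pi)


def exact_complexity_bernoulli(n):
    """Exact NML parametric complexity for n >= 1:
    ln sum_h C(n, h) (h/n)^h (1 - h/n)^(n - h), computed in log space
    (terms of the form 0 * log 0 are taken as 0)."""
    log_terms = []
    for h in range(n + 1):
        term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            term += h * math.log(h / n)
        if h < n:
            term += (n - h) * math.log((n - h) / n)
        log_terms.append(term)
    top = max(log_terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in log_terms))


for n in (1, 10, 100, 1000):
    exact, approx = exact_complexity_bernoulli(n), fia_complexity_bernoulli(n)
    print(n, round(exact, 4), round(approx, 4), round(exact - approx, 4))
```

The gap between the two columns shrinks as N grows, which is what an o(1) error guarantees; nothing in that guarantee says how large the gap is at the sample sizes actually used in practice.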
    Years later, I still really like this paper, and for the life of me I can't figure out why we never wrote it up as a journal paper. The key idea is that for a great many domains, people possess multiple mental representations (e.g., numbers can be organised mathematically and by magnitude), and these representations can have different forms (e.g., spatial, featural). This paper presents a method for simultaneously learning multiple types of representation, which in hindsight I think actually pre-dates a lot of the (admittedly much cooler and more rigorous) Bayesian models that now do this sort of thing.
    Apart from my honours thesis, this was the very first research project I worked on (back in the summer of '98/'99), as a summer student working with Michael Lee. The basic point of this paper was to do exactly what the title suggests -- alter the formalism in the highly successful ALCOVE model of category learning so that it can handle a broader range of stimulus representations. However, there are a few other things it did well. It presents an interesting variation on the well-known Shepard, Hovland & Jenkins "category types" approach to categorisation. It points out that things that look like perfectly good representations in one context (e.g., modelling similarity data) aren't necessarily good mental representations in another (e.g., modelling category learning), though they often are. And from a purely personal point of view, it introduced me to the world of mathematical psychology and computational cognitive science.