The Foursquare blog has an interesting post about some of the math they use to evaluate and verify the massive amount of user-generated data that enters their database. They need to figure out the likelihood that any given data point accurately represents reality, so they’ve worked out a complicated formula that will minimize abuse. Quoting: ‘By choosing the points based on a user’s accuracy, we can intelligently accrue certainty about a proposed update and stop the voting process as soon as the math guarantees the required certainty. … The parameters are automatically trained and can adapt to changes in the behavior of the userbase. No more long meetings debating how many points to grant to a narrow use case. So far, we’ve taken a very user-centric view of p-sub-k (this is the accuracy of user k). But we can go well beyond that. For example, p-sub-k could be “the accuracy of user k’s vote given that they have been to the venue three times before and work nearby.” These clauses can be arbitrarily complicated and estimated from a (logistic) regression of the honeypot performance. The point is that these changes will be based on data and not subjective judgments of how many “points” a user or situation should get.’
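The quote does not give the formula itself, so the following is only a rough Python sketch of how accuracy-based points and early stopping could fit together. It assumes votes are independent, that a user with accuracy p-sub-k is equally reliable on correct and incorrect proposals, and that each vote is worth the log-odds of that accuracy. The names `vote_points` and `tally`, the 0.5 prior, and the 0.99 certainty target are all hypothetical and not taken from Foursquare's post.

```python
import math


def vote_points(accuracy):
    """Points for one vote: the log-odds of the voter's estimated accuracy.

    Assumption for illustration: 0 < accuracy < 1, and a voter at 0.5
    (a coin flip) contributes zero points.
    """
    return math.log(accuracy / (1.0 - accuracy))


def tally(votes, prior=0.5, required_certainty=0.99):
    """Accumulate votes in order and stop once the certainty target is met.

    votes: iterable of (accuracy, approves) pairs.
    Returns (decision, posterior, votes_used), where decision is
    "accept", "reject", or None if the votes ran out first.
    """
    threshold = math.log(required_certainty / (1.0 - required_certainty))
    score = math.log(prior / (1.0 - prior))  # prior log-odds that the update is correct
    used = 0
    for accuracy, approves in votes:
        used += 1
        points = vote_points(accuracy)
        score += points if approves else -points
        if score >= threshold:
            return "accept", 1.0 / (1.0 + math.exp(-score)), used
        if score <= -threshold:
            return "reject", 1.0 / (1.0 + math.exp(-score)), used
    return None, 1.0 / (1.0 + math.exp(-score)), used


# Three approvals from users who are each right 90% of the time are enough
# to pass a 99% target, so the fourth vote is never consulted.
decision, posterior, used = tally([(0.9, True), (0.9, True), (0.9, True), (0.6, False)])
print(decision, round(posterior, 4), used)
```

Under these assumptions, summing log-odds is equivalent to a naive Bayes update, which is why more accurate users move the total further per vote. The conditional version of p-sub-k described in the quote would replace the fixed `accuracy` value with the output of a fitted regression model.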