How Not To Sort By Average Rating

Naive ways of sorting user-rated content, such as net positive votes or average rating, break down when some items have only a few ratings. The article instead sorts by the lower bound of the Wilson score confidence interval, which balances the observed proportion of positive ratings against the uncertainty of a small sample.

Visit evanmiller.org →

Questions & Answers

What is the Wilson score confidence interval used for in sorting?
The Wilson score confidence interval is a statistical method used to calculate a lower bound for the true proportion of positive ratings. When applied to sorting, this lower bound serves as a score that balances the observed positive ratings with the statistical uncertainty inherent in a small number of observations, providing a more reliable ranking.
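For reference, the lower bound of the Wilson score interval is commonly written as shown here. The symbols are a notational choice made for this sketch: \hat{p} is the observed fraction of positive ratings, n is the total number of ratings, and z is the standard normal quantile for the chosen confidence level.

```latex
\text{lower bound} =
\frac{\hat{p} + \dfrac{z^{2}}{2n}
      - z\sqrt{\dfrac{\hat{p}\,(1-\hat{p})}{n} + \dfrac{z^{2}}{4n^{2}}}}
     {1 + \dfrac{z^{2}}{n}}
```

When n is small the z-dependent terms dominate and pull the bound well below \hat{p}; as n grows they shrink and the bound approaches \hat{p}.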
Who would benefit from using the Wilson score for ranking items?
Web programmers and data analysts working with user-generated ratings on platforms such as e-commerce sites, content aggregators, or social media will benefit. It is especially useful for anyone who needs to display "best of" or "highest-rated" lists in which items have very different numbers of reviews.
How does the Wilson score method improve upon sorting by average rating or net positive ratings?
Unlike a simple average rating, the Wilson score accounts for sample size, so an item with a handful of perfect ratings cannot outrank a well-rated item with many more reviews. Unlike net positive ratings (positive minus negative), it is based on the proportion of positive ratings while still taking the total number of ratings into account, which avoids favoring heavily voted items whose share of positive ratings is actually lower.
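The difference between the three orderings can be sketched with a small comparison. The item names and vote counts below are hypothetical, chosen only for illustration, and `wilson_lower_bound` is a helper written for this example rather than code taken from the article.

```python
from math import sqrt


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 is roughly 95% confidence)."""
    if total == 0:
        return 0.0  # assumption: unrated items get the lowest possible score
    phat = positive / total
    centre = phat + z * z / (2 * total)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)


# Hypothetical items: name -> (positive votes, negative votes)
items = {
    "A": (2, 0),       # perfect score, almost no data
    "B": (90, 10),     # 90% positive, moderate data
    "C": (600, 400),   # 60% positive, lots of data
}


def rank(score):
    """Return item names sorted from best to worst under the given scoring rule."""
    return sorted(items, key=lambda name: score(*items[name]), reverse=True)


by_net = rank(lambda pos, neg: pos - neg)
by_average = rank(lambda pos, neg: pos / (pos + neg))
by_wilson = rank(lambda pos, neg: wilson_lower_bound(pos, pos + neg))

print("net positive:", by_net)      # ['C', 'B', 'A']
print("average:     ", by_average)  # ['A', 'B', 'C']
print("Wilson bound:", by_wilson)   # ['B', 'C', 'A']
```

Net positive puts the most heavily voted item first despite its 60% approval, the average puts the two-vote item first, and the Wilson lower bound ranks the item with both a high share of positive votes and enough votes to support it at the top.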
When should one apply the Wilson score confidence interval for sorting purposes?
The Wilson score should be applied when sorting items by binary ratings (e.g., up/down votes, positive/negative) and when the total number of ratings varies widely between items. It is particularly useful for new items or items with sparse data, where average ratings would be misleading.
What is a key technical detail or implementation example for the Wilson score?
The formula takes three inputs: the number of positive ratings, the total number of ratings, and a z-score corresponding to the chosen confidence level (e.g., 1.96 for 95% confidence). The article provides SQL and Excel implementations of this formula, so the sorting score can be computed directly in database queries or spreadsheets.
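As a complement to the SQL and Excel versions mentioned above, here is a minimal Python sketch of the same calculation. It is not the article's code: the function name `wilson_score`, the `confidence` parameter, and the choice to return 0.0 for unrated items are assumptions made for illustration. The z-score is derived from a two-sided confidence level using the standard library's `statistics.NormalDist` (Python 3.8+), which gives about 1.96 for 95%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_score(positive, total, confidence=0.95):
    """Lower bound of the Wilson score interval for a binary rating.

    positive   -- number of positive ratings
    total      -- total number of ratings
    confidence -- two-sided confidence level, e.g. 0.95
    """
    if total == 0:
        return 0.0  # assumption: no ratings means no evidence, so rank last

    # Two-sided level -> standard normal quantile (0.95 -> about 1.96).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

    phat = positive / total  # observed fraction of positive ratings
    centre = phat + z * z / (2 * total)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)


if __name__ == "__main__":
    print(round(wilson_score(90, 100), 4))
    print(round(wilson_score(2, 2), 4))
```

Raising `confidence` lowers the score for every item, and lowers it most for items with few ratings, so it acts as a knob for how strongly sparse data is penalized.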