Model Transparency

We believe predictions are only valuable when you can trust them. This page shows how well-calibrated our AI models are: when we say 70%, it should happen roughly 70% of the time.

What Is Calibration?

A well-calibrated model produces probabilities that match real-world frequencies. For example, if our model assigns a 60% win probability to 100 different matches, roughly 60 of those matches should end as predicted. The reliability diagrams below plot predicted probabilities against observed outcomes on a held-out test set that the models never saw during training.

How to read the charts: In each chart, the dashed diagonal line represents perfect calibration. Points close to the diagonal indicate reliable probabilities. The bar chart below each curve shows how many predictions fall in each probability range, giving context on where the model is most confident.
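The points of such a chart can be computed directly from predictions and outcomes. A minimal sketch of the binning step (the function name and the ten-bin layout are illustrative, not our production code):

```python
import numpy as np

def reliability_table(probs, outcomes, n_bins=10):
    """For each probability bin, pair the mean predicted probability with
    the observed outcome frequency -- the points of a reliability diagram."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to a bin; clip so p == 1.0 lands in the last bin.
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    return [(float(probs[idx == b].mean()),     # x: mean predicted probability
             float(outcomes[idx == b].mean()),  # y: observed frequency
             int((idx == b).sum()))             # bar chart: bin count
            for b in range(n_bins) if (idx == b).any()]
```

A perfectly calibrated bin produces a point on the diagonal: ten predictions of 60% with six positive outcomes yield the point (0.6, 0.6).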

Example: What Good and Bad Calibration Looks Like

An illustrative example of good and bad calibration. The green point is perfectly calibrated: the model predicted 60% and the event occurred 60% of the time. The red point is overconfident: it predicted 80% but events occurred only 55% of the time. The orange point is underconfident: it predicted 30% but events occurred 50% of the time.

The calibration results shown below are based on our latest deployed models. We continuously train, evaluate, and release new model versions; each release is validated on a held-out test set before deployment. The charts on this page always reflect the models currently serving predictions on the platform.

Our NBA prediction system uses neural networks trained on over 40,000 historical games. We predict match winners, total points, point spreads, and individual player statistics including points, rebounds, assists, three-pointers made, steals, and blocks.

Evaluated on 2024-2025 season test set

Match Winner

Binary classifier predicting the probability of each team winning. The model processes ELO ratings, team statistics, injury data, and starting lineup information.

Reliability diagram for the Match Winner model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.
Number of predictions in each probability bin. More samples in a bin means the calibration measurement there is more reliable.

Total Points

Regression model generating a full probability distribution over possible total points scored. Provides expected value and confidence intervals.
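To illustrate how a full distribution yields an expected value, confidence intervals, and over/under probabilities, here is a sketch using a made-up bell-shaped distribution over totals (the support, mean, and spread are invented for the example and do not come from our models):

```python
import numpy as np

# Hypothetical discretised probability mass function over total points.
support = np.arange(180, 261)                       # possible total points
pmf = np.exp(-0.5 * ((support - 221) / 12.0) ** 2)  # bell-shaped sketch
pmf /= pmf.sum()                                    # normalise to a distribution

expected_total = float(np.dot(support, pmf))        # expected value
cdf = np.cumsum(pmf)
lower = support[np.searchsorted(cdf, 0.05)]         # 5th percentile
upper = support[np.searchsorted(cdf, 0.95)]         # 95th percentile -> 90% interval
p_over = float(pmf[support > 220.5].sum())          # P(total > 220.5)
```

The same distribution therefore answers every question about the total at once: its mean, any prediction interval, and the probability of clearing any line.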

Reliability diagram for the Total Points model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Point Spread

Regression model predicting the point differential between teams, with a complete probability distribution for spread outcomes.

Reliability diagram for the Point Spread model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Player Points

Per-player points projection with full probability distribution.

Reliability diagram for the Player Points model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Player Rebounds

Per-player rebounds projection with full probability distribution.

Reliability diagram for the Player Rebounds model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Player Assists

Per-player assists projection with full probability distribution.

Reliability diagram for the Player Assists model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Player 3-Pointers Made

Per-player three-pointers made projection with full probability distribution.

Reliability diagram for the Player 3-Pointers Made model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Our football models cover five major European leagues: Serie A, La Liga, Bundesliga, Premier League, and Ligue 1. We predict match results, under/over 2.5 goals, goal/no goal, expected corners, and expected shots.

Evaluated on 2024-2025 season test set

Match Result (1X2)

Multiclass classifier predicting home win, draw, and away win probabilities. Calibration is shown per class: Home Win, Draw, and Away Win.
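Per-class calibration can be computed one-vs-rest: each class's predicted probabilities are compared with a binary indicator for that class. A minimal sketch (function name and ten-bin layout are illustrative):

```python
import numpy as np

def per_class_calibration(probs, labels, n_bins=10):
    """One-vs-rest reliability tables for a multiclass forecaster.
    probs: (n, 3) array of [home, draw, away] probabilities;
    labels: integer class indices (0=home, 1=draw, 2=away)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    tables = []
    for k in range(probs.shape[1]):
        p = probs[:, k]
        y = (labels == k).astype(float)  # did class k actually occur?
        idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
        tables.append([(float(p[idx == b].mean()),
                        float(y[idx == b].mean()),
                        int((idx == b).sum()))
                       for b in range(n_bins) if (idx == b).any()])
    return tables
```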

Reliability diagram for the Match Result (1X2) model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.
Number of predictions in each probability bin. More samples in a bin means the calibration measurement there is more reliable.

Under/Over 2.5 Goals

Binary classifier predicting the probability of under or over 2.5 total goals in a match.

Reliability diagram for the Under/Over 2.5 Goals model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.
Number of predictions in each probability bin. More samples in a bin means the calibration measurement there is more reliable.

Goal / No Goal

Binary classifier predicting whether both teams will score at least one goal.

Reliability diagram for the Goal / No Goal model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.
Number of predictions in each probability bin. More samples in a bin means the calibration measurement there is more reliable.

Expected Corners

Poisson regression model generating a full probability distribution for the total number of corner kicks in a match.
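For intuition: a Poisson distribution with mean lam assigns probability lam^k * e^(-lam) / k! to a count of k, so over/under probabilities for any corner line follow directly from the predicted mean. A sketch with an invented mean (the deployed model's per-match mean differs):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(N = k) for a Poisson-distributed count with mean lam."""
    return lam ** k * exp(-lam) / factorial(k)

lam = 10.2    # hypothetical expected corner count for one match
line = 10.5
# P(corners > 10.5) = 1 - P(corners <= 10)
p_over_line = 1.0 - sum(poisson_pmf(k, lam) for k in range(int(line) + 1))
```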

Reliability diagram for the Expected Corners model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Expected Shots

Poisson regression model generating a full probability distribution for the total number of shots in a match.

Reliability diagram for the Expected Shots model (2024-2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Our tennis prediction system covers the full ATP Tour, from Grand Slams to ATP 250 events. Models are trained on 30,000+ ATP matches and leverage surface-specific indices, ELO/Glicko-2 ratings, and over 500 features.

Evaluated on 2025 season test set

Match Winner

Binary classifier predicting the probability of each player winning. The model incorporates surface-specific form indices, ELO ratings, Glicko-2 ratings, and head-to-head statistics.

Reliability diagram for the Match Winner model (2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.
Number of predictions in each probability bin. More samples in a bin means the calibration measurement there is more reliable.

Total Games

Regression model predicting the total number of games played in a match, with a full probability distribution.

Reliability diagram for the Total Games model (2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Game Spread

Regression model predicting the game differential between players.

Reliability diagram for the Game Spread model (2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Aces

Regression model predicting total aces in a match with full probability distribution.

Reliability diagram for the Aces model (2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Double Faults

Regression model predicting total double faults with full probability distribution.

Reliability diagram for the Double Faults model (2025 season). Points closer to the dashed diagonal indicate better-calibrated probabilities.

Understanding the Metrics

ECE (Expected Calibration Error)

The weighted average gap between predicted probabilities and actual outcomes across all bins. Lower is better. Values below 0.05 indicate good calibration.
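A minimal sketch of how ECE can be computed from binned predictions (ten equal-width bins are assumed here; binning choices vary):

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean |predicted - observed| over probability bins."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            weight = mask.mean()  # fraction of all samples in this bin
            gap = abs(probs[mask].mean() - outcomes[mask].mean())
            ece += weight * gap
    return float(ece)
```

A perfectly calibrated model scores 0; a model that says 80% when events happen 50% of the time scores 0.30.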

Brier Score

Measures both calibration and sharpness of probability forecasts. Ranges from 0 (perfect) to 1 (worst). Lower is better.
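For binary outcomes the Brier score is simply the mean squared difference between the forecast probability and the 0/1 outcome:

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error of probability forecasts against 0/1 outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))
```

A forecaster that always says 50% scores 0.25 regardless of outcomes, which is why sharpness (confident, correct predictions) drives the score below that baseline.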

Calibration MAE

Mean Absolute Error between predicted and observed probabilities across quantile thresholds. Used for regression models. Lower values indicate better calibration.
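One common way to compute this for a distributional forecaster: for each nominal quantile level, count how often the actual outcome falls at or below the predicted quantile, then average the absolute gaps between nominal and observed levels. A sketch (the array layout is illustrative):

```python
import numpy as np

def calibration_mae(quantile_preds, actuals, taus):
    """Mean absolute gap between nominal quantile levels `taus` and the
    observed fraction of actuals falling below each predicted quantile.
    quantile_preds: (n_samples, len(taus)) predicted quantiles."""
    quantile_preds = np.asarray(quantile_preds, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    # observed[j] = fraction of outcomes at or below the tau_j quantile
    observed = (actuals[:, None] <= quantile_preds).mean(axis=0)
    return float(np.abs(observed - np.asarray(taus)).mean())
```

If the median prediction really splits outcomes 50/50, the gap at tau = 0.5 is zero; systematic bias in the predicted quantiles inflates the MAE.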

90% Coverage

The fraction of actual outcomes that fall within the model's 90% prediction interval. A well-calibrated model should achieve approximately 90% coverage.
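Checking coverage is a one-line computation once per-match interval bounds are available (the names here are illustrative):

```python
import numpy as np

def interval_coverage(lower, upper, actuals):
    """Fraction of actual outcomes that fall inside [lower, upper]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    return float(np.mean((actuals >= lower) & (actuals <= upper)))
```

Coverage well below 90% means the intervals are too narrow (overconfident); coverage well above means they are too wide (underconfident).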

See Our Models in Action

Explore today's predictions powered by these calibrated models.