1. Purpose
This note explains what the public dataset is, how to interpret its reliability metrics, and how coverage should be read.
It focuses on the parts that matter to a technical buyer: product scope, coverage, market surface, and reliability signals.
2. What the product is
Vector Neutral publishes one official daily Excel workbook for the active UTC+0 date.
The product is not a pick sheet, not a betting advisory product, and not an automated execution layer. It is a structured pre-match probability surface intended for independent analysis, modeling, and downstream system building.
3. Coverage model
Coverage is not defined by a raw upstream feed alone. The public universe starts from the daily pre-match fixture intake and then passes through product-date filtering in UTC+0, semantic deduplication, probabilistic consistency checks, and publication quality control.
Reference coverage: the cleaned reference coverage file currently contains 183 geography or competition-group rows and 643 unique league or competition references across domestic, regional, and international groups.
Important boundary: this reference describes the upstream coverage universe, not a publication guarantee. The public dataset includes only matches that pass the product-date filter, deduplication, probabilistic consistency checks, and publication QC before release.
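For buyers who reconcile the workbook against their own fixture lists, the product-date step can be sketched as below. This is a minimal illustration, not the publisher's implementation: it assumes the product date is the UTC calendar date of kick-off, and the fixture dictionaries and the function names `utc_product_date` and `filter_by_product_date` are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone


def utc_product_date(kickoff: datetime) -> date:
    """Return the UTC calendar date of a timezone-aware kick-off time."""
    if kickoff.tzinfo is None:
        # Naive timestamps are ambiguous, so refuse them rather than guess.
        raise ValueError("kickoff must be timezone-aware")
    return kickoff.astimezone(timezone.utc).date()


def filter_by_product_date(fixtures, product_date):
    """Keep fixtures whose kick-off falls on product_date in UTC.

    Assumption: each fixture is a dict with a timezone-aware 'kickoff' value.
    """
    return [f for f in fixtures if utc_product_date(f["kickoff"]) == product_date]


# A late-evening local kick-off at UTC-3 belongs to the next UTC date.
local_tz = timezone(timedelta(hours=-3))
fixtures = [
    {"id": "A", "kickoff": datetime(2026, 4, 25, 23, 30, tzinfo=local_tz)},
    {"id": "B", "kickoff": datetime(2026, 4, 25, 15, 0, tzinfo=timezone.utc)},
]
kept = filter_by_product_date(fixtures, date(2026, 4, 25))
```

Here only fixture "B" is kept for 2026-04-25, because fixture "A" kicks off at 02:30 UTC on 2026-04-26.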
4. Current public reliability snapshot
Latest audited reliability snapshot: 2026-04-25. Counts are shown as operational references, not perpetual guarantees.
This section tracks the latest public reliability evidence. It is separate from the commercial active date shown at checkout, which controls the workbook a buyer receives.
5. Public market surface
The public workbook is designed around a coherent market surface per match.
- Visible scope: 1X2, DC, OU, BTS, AH, Binary Bundle
- Core families: 1X2, DC, BTS
- Complementary families: OU, AH
- Curated layer: Binary Bundle is a curated public layer, not an exhaustive combinatorial universe
6. How probabilities are synthesized
At a high level, the public probabilities are produced through four stages:
- pre-match probabilistic inference
- calibration
- structural synthesis across related market families
- coherence checks before publication
This means the workbook should be read as a unified probabilistic surface by fixture, not as a pile of unrelated scores.
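One way a buyer might verify this coherence on their side is to check that Double Chance (DC) probabilities agree with the 1X2 probabilities of the same fixture. The sketch below relies only on the standard market definitions (1X = home or draw, X2 = draw or away, 12 = home or away). The dictionary layout, the tolerance, and the function name `check_1x2_dc_coherence` are assumptions for illustration and do not describe the publisher's internal checks or the workbook's column names.

```python
import math


def check_1x2_dc_coherence(p_1x2, p_dc, tol=1e-6):
    """Return a list of human-readable coherence issues (empty if none).

    p_1x2: dict with keys "1", "X", "2" (home, draw, away probabilities).
    p_dc:  dict with keys "1X", "X2", "12" (double-chance probabilities).
    """
    issues = []

    # The three 1X2 outcomes are mutually exclusive and exhaustive.
    total = p_1x2["1"] + p_1x2["X"] + p_1x2["2"]
    if not math.isclose(total, 1.0, abs_tol=tol):
        issues.append(f"1X2 probabilities sum to {total:.6f}, expected 1")

    # Each double-chance outcome is the union of two 1X2 outcomes.
    expected = {
        "1X": p_1x2["1"] + p_1x2["X"],
        "X2": p_1x2["X"] + p_1x2["2"],
        "12": p_1x2["1"] + p_1x2["2"],
    }
    for key, value in expected.items():
        if not math.isclose(p_dc[key], value, abs_tol=tol):
            issues.append(f"DC {key} is {p_dc[key]:.6f}, expected {value:.6f}")

    return issues


coherent = check_1x2_dc_coherence(
    {"1": 0.5, "X": 0.3, "2": 0.2},
    {"1X": 0.8, "X2": 0.5, "12": 0.7},
)
incoherent = check_1x2_dc_coherence(
    {"1": 0.5, "X": 0.3, "2": 0.2},
    {"1X": 0.9, "X2": 0.5, "12": 0.7},
)
```

In this example `coherent` is an empty list, while `incoherent` contains a single issue for the 1X entry.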
7. Why Brier, Log-loss, and ECE are shown together
Each metric captures a different failure mode:
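- Brier Score: mean squared error between predicted probabilities and realized outcomes
- Log-loss: penalty for assigning confidence to outcomes that do not occur, growing sharply as a wrong prediction approaches certainty
- ECE (Expected Calibration Error): average gap between predicted confidence and realized frequency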
Lower is better in all three. Showing the three together is more informative than showing only one, because a model can look acceptable on one metric while still being poorly calibrated or overconfident.
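For readers who want to reproduce these metrics on their own settled outcomes, the following is a minimal sketch using the common textbook definitions for binary events. The note does not state how the published figures are computed, so the binary framing, the clipping constant `eps`, the 10 equal-width bins for ECE, and all function names are assumptions.

```python
import math


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs, outcomes, eps=1e-15):
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average of |mean predicted probability - observed frequency| per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_p - freq)
    return ece


# Four predictions at 0.9 where three events occurred and one did not.
probs = [0.9, 0.9, 0.9, 0.9]
outcomes = [1, 1, 1, 0]
brier = brier_score(probs, outcomes)
ece = expected_calibration_error(probs, outcomes)
ll = log_loss(probs, outcomes)
```

With these inputs the Brier score is about 0.21 and the ECE is about 0.15, since the predicted confidence of 0.9 exceeds the realized frequency of 0.75. Log-loss reacts most strongly to the single confident miss, which illustrates why the three figures are read together.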
8. What the metrics do not mean
- They do not imply guaranteed profitability.
- They do not imply guaranteed downstream edge.
- They do not imply guaranteed future performance.
- They do not imply guaranteed suitability for a specific staking or trading method.
They are validation signals about probability quality, not commercial promises.
9. Publication discipline
- One official workbook per UTC+0 date
- All buyers of that date receive the same official edition
- The public workbook excludes internal technical metadata
- The public workbook is separated from internal control artifacts
10. Method boundary
This public note explains product behavior, coverage, market definitions, and reliability signals.
It does not document implementation internals, operational controls, or proprietary decision rules.