The real score behind the stars
We match NC health inspection records with Google ratings to surface restaurants where public perception diverges from ground-truth safety.
What's driving this prediction (SHAP)
What people are saying (Google)
Traceability
Model selection
| Model | Macro F1 | Flagged Recall |
|---|---|---|
| Naive Baseline | 0.50 | 0% |
| Random Forest + SHAP | 0.57 | 28% |
| DistilBERT (unweighted loss) | 0.50 | 0% |
Random Forest is the only model that outperforms the naive baseline on this dataset. DistilBERT was trained with unweighted cross-entropy on a dataset with a 68:1 class imbalance, so it predicts every restaurant as safe and scores no better than the baseline.
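To show why an all-safe predictor lands at a macro F1 of about 0.50, and what a weighted loss would change, the sketch below scores an all-safe prediction on a hypothetical 69-restaurant sample with the stated 68:1 ratio and computes inverse-frequency class weights. It uses only the standard library; the sample and the helper names are illustrative assumptions, not the project's code.

```python
from collections import Counter


def f1(y_true, y_pred, positive):
    """F1 score treating `positive` as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1(y_true, y_pred, c) for c in classes) / len(classes)


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Hypothetical 68:1 sample: 68 "safe" restaurants for every 1 "flagged".
y_true = ["safe"] * 68 + ["flagged"]
all_safe = ["safe"] * len(y_true)

print(round(macro_f1(y_true, all_safe), 3))
print(inverse_frequency_weights(y_true))
```

The safe class scores an F1 of 136/137 while the flagged class scores 0, so the macro average is about 0.496. The weight formula matches scikit-learn's `class_weight="balanced"`; in PyTorch, weights like these could be passed through the `weight` argument of `torch.nn.CrossEntropyLoss`, which is one way a DistilBERT run might avoid collapsing to all-safe.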
Last prediction
| Feature | Value |
|---|---|

| Class | Probability |
|---|---|
Run manually
| Feature | Value |
|---|---|
| Google rating | |
| log(review count) | |
| Review word count | |
| Avg word length | |
| Safety keyword hits | |
| Negative phrase hits | |
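The manual-run inputs above can be derived from a listing and its review text roughly as follows. This is a minimal sketch: the keyword and phrase lists are hypothetical placeholders, the tokenizer is a simple regular expression, and using `log1p` (so a listing with zero reviews stays finite) is an assumption about how log(review count) is computed.

```python
import math
import re

# Hypothetical vocabularies; the app's actual keyword lists are not given here.
SAFETY_KEYWORDS = {"dirty", "roach", "roaches", "mold", "sick", "hair", "bug", "bugs"}
NEGATIVE_PHRASES = ["food poisoning", "never again", "got sick", "health code"]


def extract_features(rating, review_count, review_text):
    """Build the six manual-run features from one listing and its review text."""
    text = review_text.lower()
    words = re.findall(r"[a-z']+", text)  # crude word tokenizer
    return {
        "google_rating": rating,
        # log1p(0) == 0, so listings with no reviews do not produce -inf.
        "log_review_count": math.log1p(review_count),
        "review_word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "safety_keyword_hits": sum(1 for w in words if w in SAFETY_KEYWORDS),
        "negative_phrase_hits": sum(text.count(p) for p in NEGATIVE_PHRASES),
    }


features = extract_features(4.6, 120, "Great tacos, but I got sick and saw a roach.")
print(features)
```

Counting whole-word keyword hits and substring phrase hits separately keeps multi-word phrases such as "food poisoning" from being split into individually innocuous words.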
How it works
NC health records
We collected inspection scores and violation data for every licensed food establishment in North Carolina.
Google reviews
Each establishment is matched to its Google listing using fuzzy matching on both name and address.
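A minimal sketch of name-plus-address linking, using the standard library's `difflib.SequenceMatcher` as a stand-in for whatever fuzzy matcher the pipeline actually uses. The normalization rules, the equal 50/50 weighting of name and address, the 0.8 acceptance threshold, and the sample records are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(record, listings, threshold=0.8):
    """Return (listing, score) for the best name+address match, or (None, score) below threshold."""
    best, best_score = None, 0.0
    for listing in listings:
        # Hypothetical equal weighting of name and address similarity.
        score = 0.5 * similarity(record["name"], listing["name"]) + 0.5 * similarity(
            record["address"], listing["address"]
        )
        if score > best_score:
            best, best_score = listing, score
    if best_score < threshold:
        return None, best_score
    return best, best_score


# Hypothetical inspection record and candidate Google listings.
record = {"name": "JOE'S BBQ #2", "address": "123 MAIN ST, RALEIGH NC"}
listings = [
    {"name": "Joe's BBQ", "address": "123 Main St, Raleigh, NC"},
    {"name": "Joe's Diner", "address": "450 Oak Ave, Durham, NC"},
]
match, score = best_match(record, listings)
print(match["name"], round(score, 2))
```

Requiring both fields to agree guards against chains with many similarly named locations; a larger linker could also restrict candidates by city or ZIP code before scoring rather than comparing every listing.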
ML prediction
A Random Forest trained on the merged dataset flags restaurants whose reviews may not reflect their inspection record, with SHAP-backed explanations.
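SHAP attributions are Shapley values: each feature's share of the gap between the model's output for one restaurant and its output at a reference point. The sketch below computes them exactly by enumerating feature subsets, filling "absent" features from a baseline vector. The three-feature `toy_risk` model and the baseline values are hypothetical stand-ins for the trained Random Forest; the `shap` library's `TreeExplainer` computes these values for tree ensembles without brute-force enumeration and may define absent features differently.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x); features outside a coalition take baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real values; the rest fall back to the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score over (rating, safety keyword hits, negative phrase hits)."""
    rating, safety_hits, negative_hits = z
    return 0.3 - 0.05 * rating + 0.04 * safety_hits + 0.02 * safety_hits * negative_hits


x = [4.6, 2, 1]  # hypothetical restaurant
baseline = [4.2, 0, 0]  # hypothetical reference point
phi = shapley_values(toy_risk, x, baseline)
print([round(p, 4) for p in phi])
```

Here the attributions come out to -0.02 for rating, 0.10 for safety keyword hits, and 0.02 for negative phrase hits; the interaction term's 0.04 is split evenly between the two review features. They sum to toy_risk(x) - toy_risk(baseline) = 0.10, the additivity property that lets a SHAP panel explain a single prediction. Enumeration costs 2^n model calls per feature, which stays cheap for the six features used here.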