The real score behind the stars

We match NC health inspection records with Google ratings to surface restaurants where public perception diverges from ground-truth safety.

Traceability

Model selection

Model	Macro F1	Flagged-class recall
Naive Baseline	0.50	0%
Random Forest + SHAP	0.57	28%
DistilBERT (unweighted loss)	0.50	0%

Random Forest is the only model that outperforms the naive baseline on this dataset. DistilBERT was trained with unweighted cross-entropy on data with a 68:1 class imbalance, so it collapses to predicting every restaurant as safe and scores identically to the baseline.
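The baseline's 0.50 macro F1 follows directly from the imbalance. The short arithmetic sketch below assumes a toy split of exactly 68 safe restaurants per flagged one and shows why an all-safe predictor lands at that score with 0% flagged recall.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 score from raw counts; returns 0.0 when it is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Assumed toy split mirroring the 68:1 ratio: 68 safe restaurants, 1 flagged.
n_safe, n_flagged = 68, 1

# All-safe predictor, scored on the "safe" class: every safe restaurant is a
# true positive and the single flagged restaurant is a false positive.
f1_safe = f1(tp=n_safe, fp=n_flagged, fn=0)

# Scored on the "flagged" class: it never predicts flagged, so the one
# flagged restaurant is a false negative and F1 is 0.
f1_flagged = f1(tp=0, fp=0, fn=n_flagged)

macro_f1 = (f1_safe + f1_flagged) / 2  # unweighted mean over both classes
flagged_recall = 0 / n_flagged         # no flagged restaurant is ever caught

print(f"macro F1 = {macro_f1:.2f}, flagged recall = {flagged_recall:.0%}")
# prints: macro F1 = 0.50, flagged recall = 0%
```

Because macro F1 averages the two classes equally, a near-perfect score on the majority class can never lift the total much above 0.50 while the minority class scores zero.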

Run manually

The form accepts the model's six input features:

Google rating
log(review count)
Review word count
Average word length
Safety keyword hits
Negative phrase hits
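A minimal sketch of how these six features could be computed from a listing's rating, review count, and review text. The keyword and phrase lists are hypothetical placeholders (the project's actual lists are not shown here), and using log(1 + count) rather than log(count) is an assumption made so that listings with zero reviews stay defined.

```python
import math
import re

# Hypothetical keyword lists; placeholders, not the project's real lists.
SAFETY_KEYWORDS = {"dirty", "roach", "roaches", "sick", "mold", "hair", "raw", "undercooked"}
NEGATIVE_PHRASES = ["never again", "food poisoning", "got sick", "health code"]

def extract_features(rating: float, review_count: int, review_text: str) -> dict:
    """Build the six model inputs for one restaurant (illustrative sketch)."""
    text = review_text.lower()
    words = re.findall(r"[a-z']+", text)  # simple word tokenizer
    return {
        "google_rating": rating,
        # Assumption: log1p so that a count of 0 does not produce -inf.
        "log_review_count": math.log1p(review_count),
        "review_word_count": len(words),
        "avg_word_length": sum(map(len, words)) / len(words) if words else 0.0,
        # Count every token that appears in the safety keyword set.
        "safety_keyword_hits": sum(w in SAFETY_KEYWORDS for w in words),
        # Count occurrences of each multi-word negative phrase.
        "negative_phrase_hits": sum(text.count(p) for p in NEGATIVE_PHRASES),
    }

features = extract_features(4.5, 120, "Got sick after the raw oysters. Never again!")
print(features)
```

Here "sick" and "raw" count as safety keyword hits, while "got sick" and "never again" count as negative phrase hits.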

How it works

NC health records

We collected inspection scores and violation data for every licensed food establishment in North Carolina.

Google reviews

Each establishment is matched to its Google listing by fuzzy matching on name and address.
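Below is a minimal sketch of name + address fuzzy linking using Python's standard-library difflib. The normalization, the equal 50/50 weighting of name and address, and the 0.85 acceptance threshold are all illustrative assumptions, not the project's actual matcher.

```python
import re
from difflib import SequenceMatcher

def normalize(s: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    s = re.sub(r"[^a-z0-9 ]", " ", s.lower())
    return " ".join(s.split())

def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def best_match(inspection: dict, listings: list, threshold: float = 0.85):
    """Return the listing that best matches an inspection record, or None.

    Assumed scheme: average of name and address similarity, accepted only
    if it clears `threshold` (a hypothetical cutoff).
    """
    best, best_score = None, 0.0
    for listing in listings:
        score = (0.5 * similarity(inspection["name"], listing["name"])
                 + 0.5 * similarity(inspection["address"], listing["address"]))
        if score > best_score:
            best, best_score = listing, score
    return best if best_score >= threshold else None

inspection = {"name": "Joe's Diner", "address": "123 Main St, Raleigh"}
listings = [
    {"name": "Joes Diner", "address": "123 Main St Raleigh"},
    {"name": "Main Street Pizza", "address": "400 Main St, Raleigh"},
]
match = best_match(inspection, listings)
print(match)
```

Requiring a minimum combined score means a restaurant with no convincing Google counterpart stays unlinked instead of being paired with its nearest neighbor on the same street.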

ML prediction

A Random Forest trained on the merged dataset flags restaurants whose Google reviews may not reflect their inspection record, and explains each prediction with SHAP feature attributions.
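SHAP attributions are Shapley values computed over the input features. For intuition, the brute-force sketch below computes exact Shapley values for a single input, treating an "absent" feature as taking a baseline value. It is not the shap library, which uses faster estimators such as TreeSHAP for tree ensembles. `toy_risk` is a hypothetical stand-in for the trained Random Forest, not the project's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

def shapley_values(predict: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values for input x; absent features take baseline values."""
    n = len(x)

    def value(coalition: set) -> float:
        # Features in the coalition keep their real value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            for subset in combinations(others, k):
                s = set(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical risk score over (safety_keyword_hits, negative_phrase_hits, google_rating),
# including one interaction term so the attribution split is non-trivial.
def toy_risk(z: Sequence[float]) -> float:
    return 0.10 * z[0] + 0.05 * z[1] + 0.02 * z[0] * z[1] - 0.03 * (z[2] - 4.0)

x = [3, 2, 4.5]           # restaurant being explained
baseline = [0, 0, 4.0]    # assumed reference restaurant
phi = shapley_values(toy_risk, x, baseline)
print(phi)  # ≈ [0.36, 0.16, -0.015]
```

The attributions sum to `toy_risk(x) - toy_risk(baseline)`, and the interaction term's contribution is split evenly between the two keyword features. Enumerating every coalition is exponential in the number of features, which is manageable for six inputs but is the reason production tools rely on model-specific algorithms.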