Project 02 · Case study
Yelp AI Review Classifier
Fine-tuned DistilBERT plus local Qwen3-8B for zero-cost sentiment classification. 81% accuracy with a domain-shift analysis that mattered more than the headline number.
The problem
I wanted to see how far you could push sentiment classification without touching a paid API. Fine-tune a small model, run everything locally, measure honestly.
The approach
Fine-tuned DistilBERT on Yelp reviews with Hugging Face Transformers. Paired it with a local Qwen3-8B for the prompt-engineered baseline. Ran domain-shift analysis on out-of-distribution reviews (non-restaurant categories) to see where the fine-tuned model's confidence actually held up.
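For the record, the fine-tuning side is a stock Transformers setup. Here's a minimal sketch, assuming the binary yelp_polarity dataset from the Hugging Face Hub; the hyperparameters, label names, and subsample sizes are illustrative, not the project's actual config.

```python
# Minimal fine-tuning sketch, assuming the binary yelp_polarity dataset;
# the write-up doesn't pin down the exact dataset or label scheme.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},  # readable labels for later analysis
)

ds = load_dataset("yelp_polarity")  # columns: text (str), label (0/1)
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
            batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-yelp",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    # Subsampled so the run fits a single local GPU; sizes are illustrative.
    train_dataset=ds["train"].shuffle(seed=42).select(range(20_000)),
    eval_dataset=ds["test"].select(range(2_000)),
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # accuracy on the held-out slice

# Save so later analysis code can reload the checkpoint by path.
trainer.save_model("distilbert-yelp")
tokenizer.save_pretrained("distilbert-yelp")
```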
What worked
81% accuracy on the held-out test set, zero operational cost, all inference local. The domain-shift analysis turned out to be the most useful part: the model's accuracy dropped predictably on non-restaurant reviews, which told me exactly where a production deployment would need fresh training data.
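The per-category breakdown behind that analysis doesn't need much machinery: bucket held-out predictions by business category and compare accuracy and mean confidence. A sketch, assuming the checkpoint saved above; the `records` list and category names are made-up placeholders, not project data.

```python
# Sketch of the per-category accuracy/confidence breakdown used for the
# domain-shift analysis. The records and categories here are hypothetical.
from collections import defaultdict
from transformers import pipeline

clf = pipeline("text-classification", model="distilbert-yelp")  # checkpoint from the sketch above

records = [  # (review text, gold label, business category)
    ("Great pasta, terrible wait.", "negative", "restaurant"),
    ("My mechanic fixed it same day, fair price.", "positive", "auto_repair"),
    # ... real held-out and out-of-distribution reviews go here
]

hits = defaultdict(int)
conf = defaultdict(float)
totals = defaultdict(int)
for text, gold, category in records:
    pred = clf(text, truncation=True)[0]  # {"label": "positive"/"negative", "score": float}
    totals[category] += 1
    hits[category] += int(pred["label"] == gold)
    conf[category] += pred["score"]

for cat, n in totals.items():
    print(f"{cat}: acc {hits[cat] / n:.2%}, mean confidence {conf[cat] / n:.2f} (n={n})")
```

Grouping confidence alongside accuracy is what surfaces the interesting failure mode: categories where the model stays confident while being wrong.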
What I'd do differently
I overspent on prompt iteration for the Qwen baseline before I had a good eval harness. Next time, the eval harness comes first.
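What "harness first" would have looked like is small: one fixed labeled set, one scoring function, and every classifier, whether a prompt variant or the fine-tuned model, behind the same interface. A sketch; `local_qwen_chat` and the two-example eval set are hypothetical placeholders, since the write-up doesn't say which runtime served Qwen3-8B.

```python
# Minimal eval-harness sketch: a fixed labeled set, one scoring function,
# every classifier behind the same str -> str interface. Names are illustrative.
from typing import Callable, Sequence

def evaluate(predict: Callable[[str], str],
             texts: Sequence[str], labels: Sequence[str]) -> float:
    """Accuracy of `predict` over the fixed eval set."""
    return sum(predict(t) == y for t, y in zip(texts, labels)) / len(texts)

def local_qwen_chat(prompt: str) -> str:
    # Hypothetical stand-in for however Qwen3-8B 
    # is served locally (llama.cpp, Ollama, vLLM, ...).
    # Replace with the real client call.
    raise NotImplementedError("wire up the local Qwen3-8B runtime here")

PROMPT = ("Classify the sentiment of this review. "
          "Answer with exactly one word: positive or negative.\n\n{review}")

def qwen_predict(text: str) -> str:
    return local_qwen_chat(PROMPT.format(review=text)).strip().lower()

eval_texts = ["Best tacos in town!", "Waited an hour and the food was cold."]
eval_labels = ["positive", "negative"]

print(f"always-positive floor: {evaluate(lambda t: 'positive', eval_texts, eval_labels):.0%}")
# print(f"qwen baseline: {evaluate(qwen_predict, eval_texts, eval_labels):.0%}")
```

With this in place, each prompt variant is one scored run against the same set, so "is this prompt better?" stops being a judgment call.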