Project 02 · Case study
Yelp AI Review Classifier
Fine-tuned DistilBERT plus local Qwen3-8B for zero-cost sentiment classification. 81% accuracy with a domain-shift analysis that mattered more than the headline number.
The problem
I wanted to see how far you could push sentiment classification without touching a paid API. Fine-tune a small model, run everything locally, measure honestly.
The approach
Fine-tuned DistilBERT on Yelp reviews with Hugging Face Transformers. Paired it with a local Qwen3-8B for the prompt-engineered baseline. Ran domain-shift analysis on out-of-distribution reviews (non-restaurant categories) to see where the fine-tuned model's confidence actually held up.
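For the record, the fine-tuning side is a stock Transformers setup. Here's a minimal sketch, assuming the binary yelp_polarity dataset from the Hugging Face Hub; the hyperparameters, label names, and subsample sizes are illustrative, not the project's actual config.

```python
# Minimal fine-tuning sketch, assuming the binary yelp_polarity dataset;
# the write-up doesn't pin down the exact dataset or label scheme.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2,
    id2label={0: "negative", 1: "positive"},  # readable labels for later analysis
)

ds = load_dataset("yelp_polarity")  # columns: text (str), label (0/1)
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=256),
            batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-yelp",
                           per_device_train_batch_size=16,
                           num_train_epochs=1),
    # Subsampled so the run fits a single local GPU; sizes are illustrative.
    train_dataset=ds["train"].shuffle(seed=42).select(range(20_000)),
    eval_dataset=ds["test"].select(range(2_000)),
    data_collator=DataCollatorWithPadding(tokenizer),
    compute_metrics=accuracy,
)
trainer.train()
print(trainer.evaluate())  # accuracy on the held-out slice

# Save so later analysis code can reload the checkpoint by path.
trainer.save_model("distilbert-yelp")
tokenizer.save_pretrained("distilbert-yelp")
```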
What worked
81% accuracy on the held-out test set, zero operational cost, all inference local. The domain-shift analysis turned out to be the most useful part: the model's accuracy dropped predictably on non-restaurant reviews, which told me exactly where a production deployment would need fresh training data.
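The per-category breakdown behind that analysis doesn't need much machinery: bucket held-out predictions by business category and compare accuracy and mean confidence. A sketch, assuming the checkpoint saved above; the `records` list and category names are made-up placeholders, not project data.

```python
# Sketch of the per-category accuracy/confidence breakdown used for the
# domain-shift analysis. The records and categories here are hypothetical.
from collections import defaultdict
from transformers import pipeline

clf = pipeline("text-classification", model="distilbert-yelp")  # checkpoint from the sketch above

records = [  # (review text, gold label, business category)
    ("Great pasta, terrible wait.", "negative", "restaurant"),
    ("My mechanic fixed it same day, fair price.", "positive", "auto_repair"),
    # ... real held-out and out-of-distribution reviews go here
]

hits = defaultdict(int)
conf = defaultdict(float)
totals = defaultdict(int)
for text, gold, category in records:
    pred = clf(text, truncation=True)[0]  # {"label": "positive"/"negative", "score": float}
    totals[category] += 1
    hits[category] += int(pred["label"] == gold)
    conf[category] += pred["score"]

for cat, n in totals.items():
    print(f"{cat}: acc {hits[cat] / n:.2%}, mean confidence {conf[cat] / n:.2f} (n={n})")
```

Grouping confidence alongside accuracy is what surfaces the interesting failure mode: categories where the model stays confident while being wrong.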
What I'd do differently
I overspent on prompt iteration for the Qwen baseline before I had a good eval harness. Next time, the eval harness comes first.
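What "harness first" would have looked like is small: one fixed labeled set, one scoring function, and every classifier, whether a prompt variant or the fine-tuned model, behind the same interface. A sketch; `local_qwen_chat` and the two-example eval set are hypothetical placeholders, since the write-up doesn't say which runtime served Qwen3-8B.

```python
# Minimal eval-harness sketch: a fixed labeled set, one scoring function,
# every classifier behind the same str -> str interface. Names are illustrative.
from typing import Callable, Sequence

def evaluate(predict: Callable[[str], str],
             texts: Sequence[str], labels: Sequence[str]) -> float:
    """Accuracy of `predict` over the fixed eval set."""
    return sum(predict(t) == y for t, y in zip(texts, labels)) / len(texts)

def local_qwen_chat(prompt: str) -> str:
    # Hypothetical stand-in for however Qwen3-8B 
    # is served locally (llama.cpp, Ollama, vLLM, ...).
    # Replace with the real client call.
    raise NotImplementedError("wire up the local Qwen3-8B runtime here")

PROMPT = ("Classify the sentiment of this review. "
          "Answer with exactly one word: positive or negative.\n\n{review}")

def qwen_predict(text: str) -> str:
    return local_qwen_chat(PROMPT.format(review=text)).strip().lower()

eval_texts = ["Best tacos in town!", "Waited an hour and the food was cold."]
eval_labels = ["positive", "negative"]

print(f"always-positive floor: {evaluate(lambda t: 'positive', eval_texts, eval_labels):.0%}")
# print(f"qwen baseline: {evaluate(qwen_predict, eval_texts, eval_labels):.0%}")
```

With this in place, each prompt variant is one scored run against the same set, so "is this prompt better?" stops being a judgment call.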