AI without the hype: using LLMs to reduce noise, not replace thinking
This is part 4 of my series on AppReviews.
Part 1 is available here
Part 2 is available here
Part 3 is available here
At some point while building AppReviews, I started thinking about AI.
Not in a “this needs AI” way. More in a quiet, slightly reluctant way.
I already had a system that fetched reviews, pushed them to Slack, and made sure feedback was not missed. That alone solved the core problem. But once reviews are always visible, a new issue appears.
There are a lot of them.
Some are useful.
Some are vague.
Some are emotional.
Some are duplicates of the same issue written slightly differently.
Reading every single review works, up to a point. After that, you are back to scanning, skimming, and mentally filtering noise.
That is when I started asking myself a different question:
What if the system helped me understand reviews faster, without pretending to think for me?
What I did not want
Before adding anything AI-related, I was very clear about what I did not want to build.
I did not want:
- AI-generated summaries pretending to be insights
- magic scores without explanation
- a chatbot answering users on my behalf
- another dashboard full of charts that look smart but do not help decisions
Most of all, I did not want AI to become the product.
AppReviews exists to shorten the feedback loop between users and product teams.
AI should support that goal, not distract from it.
So the bar was high.
If AI did not reduce cognitive load in a very concrete way, it did not belong.
The actual problem AI helps with
The real problem is not understanding one review.
It is understanding many reviews over time.
When ten users describe the same issue in ten different ways, humans are great at seeing the pattern. But only after reading all ten. That does not scale well when reviews keep coming in.
What I wanted help with was:
- grouping similar feedback
- spotting recurring topics
- getting a rough sense of sentiment trends
- surfacing urgency when something suddenly spikes
Not answers. Signals.
Why embeddings first, not prompts everywhere
The first building block I added was embeddings.
Every review can be turned into a vector that captures its meaning. Once you have that, you can compare reviews semantically instead of relying on keywords or star ratings.
That immediately unlocks useful things:
- similar reviews can be grouped together
- topics emerge naturally instead of being predefined
- you can detect “this feels like the same problem again”
For embeddings, I use nomic-embed-text. It is fast, local, and good enough for this use case. Each review becomes a 768-dimensional vector stored alongside the raw text.
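Here is a rough sketch of what that looks like. It is illustrative, not the actual AppReviews code, and it assumes a local Ollama instance on its default port with nomic-embed-text pulled:

```python
# Illustrative sketch, not the actual AppReviews code.
# Assumes a local Ollama instance on its default port, with nomic-embed-text pulled.
import math

import requests

OLLAMA_URL = "http://localhost:11434"


def embed_review(text: str) -> list[float]:
    """Turn one review into a 768-dimensional vector using nomic-embed-text."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two reviews by meaning instead of exact wording."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two differently worded reviews about the same problem score high:
v1 = embed_review("App crashes every time I open the camera")
v2 = embed_review("The camera screen freezes and the app closes itself")
print(cosine_similarity(v1, v2))
```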
This step alone already adds value, even without a large language model generating text.
Where LLMs come in, carefully
On top of embeddings, I added a second, optional layer using a large language model.
The model I use is llama3.1:8b, running locally via Ollama. This was an intentional choice.
I wanted:
- no per-token cost anxiety
- no external API dependency
- something that could run on my machine or a small server
The LLM is used for very specific tasks:
- estimating sentiment
- extracting high-level topics
- detecting tone such as angry, neutral, or positive
- flagging urgency when relevant
Each review is processed independently.
No long context.
No agents.
No orchestration complexity.
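For the curious, here is a minimal sketch of what one of those calls can look like. The prompt and the field names are illustrative, not a fixed schema, and it assumes llama3.1:8b is available through a local Ollama instance:

```python
# Sketch only: one small, self-contained prompt per review, no shared context.
# The field names (sentiment, topics, tone, urgent) are illustrative, not a fixed schema.
import json

import requests

OLLAMA_URL = "http://localhost:11434"

PROMPT = """Analyze this app store review. Answer in JSON with the keys
"sentiment" (positive | neutral | negative), "topics" (a list of short strings),
"tone" (angry | neutral | positive) and "urgent" (true or false).

Review:
{review}
"""


def analyze_review(review: str) -> dict:
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "llama3.1:8b",
            "prompt": PROMPT.format(review=review),
            "format": "json",  # ask Ollama to constrain the output to valid JSON
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])


print(analyze_review("Since the last update the app takes 30 seconds to start. Unusable."))
```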
And most importantly:
This entire pipeline is optional.
If the AI processor is disabled or unavailable, AppReviews works exactly the same. Reviews are still fetched, stored, sent to Slack, and visible in the dashboard.
AI is an enhancement, not a dependency.
Async, isolated, and easy to turn off
From an architectural point of view, AI processing is completely decoupled from the core flow.
- reviews are saved first
- only then are they queued for analysis
- processing happens asynchronously, in small batches
- failures are retried a few times and then dropped
If Ollama is not running, nothing breaks. There is no user-facing error. The system simply skips analysis.
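A stripped-down sketch of that decoupling, with placeholder helpers standing in for the real persistence and Slack steps:

```python
# Sketch of the decoupled, best-effort analysis step.
# store_review and analyze_review are placeholders for the real persistence,
# Slack, and Ollama code; only the queuing and skip-on-failure logic matters here.
import queue

import requests

MAX_ATTEMPTS = 3
analysis_queue: queue.Queue = queue.Queue()


def store_review(review: dict) -> None:
    print("stored and sent to Slack:", review["id"])  # stand-in for the core flow


def analyze_review(text: str) -> dict:
    # Stand-in for the Ollama call sketched above; raises if Ollama is unreachable.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": text, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


def ingest(review: dict) -> None:
    store_review(review)                                   # core flow first, always
    analysis_queue.put({"review": review, "attempts": 0})  # analysis is best-effort


def run_analysis_batch(batch_size: int = 10) -> None:
    for _ in range(min(batch_size, analysis_queue.qsize())):
        job = analysis_queue.get()
        try:
            analyze_review(job["review"]["text"])
        except requests.RequestException:
            job["attempts"] += 1
            if job["attempts"] < MAX_ATTEMPTS:
                analysis_queue.put(job)  # retry in a later batch
            # otherwise drop the job: no user-facing error, analysis is simply skipped
```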
This was non-negotiable.
AI systems fail in weird ways. None of that should impact the primary job of the product.
Why not “AI-generated insights”
This is probably the question I get the most.
Why not generate summaries like:
“Users are unhappy about onboarding”
or:
“Most complaints are about performance”
The short answer is simple.
I do not trust them.
Those summaries look nice, but they hide uncertainty. They compress nuance into something that feels authoritative, even when it is not.
Instead, AppReviews surfaces raw signals:
- these reviews are similar
- this topic appears often
- sentiment around this feature dropped last week
From there, a human can decide what it means.
AI should help you see where to look, not tell you what to think.
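To make the last kind of signal concrete: detecting a sentiment drop can be as boring as comparing weekly averages. A tiny sketch, assuming per-topic weekly scores are already stored (the input shape here is hypothetical):

```python
# Sketch of a "sentiment around this feature dropped last week" signal.
# The input shape (per-topic weekly average scores, oldest first) is hypothetical.
def dropped_topics(weekly_scores: dict[str, list[float]], threshold: float = 0.2) -> list[str]:
    """Return topics whose average sentiment fell by more than `threshold`
    between the previous week and the most recent one."""
    flagged = []
    for topic, scores in weekly_scores.items():
        if len(scores) >= 2 and scores[-2] - scores[-1] > threshold:
            flagged.append(topic)
    return flagged


print(dropped_topics({"onboarding": [0.6, 0.6, 0.3], "performance": [0.5, 0.55, 0.5]}))
# -> ['onboarding']
```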
Cost, control, and boring decisions
Running everything locally with Ollama is not the most scalable choice. But it fits the constraints perfectly.
- no variable costs
- no surprises
- no API keys to rotate
- no privacy questions about sending user feedback elsewhere
If AppReviews grows, swapping the AI backend is relatively easy. The interface is already isolated.
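Something along these lines, as a hypothetical shape of that seam rather than the real code:

```python
# Hypothetical shape of that seam, not the real AppReviews interface.
# Swapping Ollama for a hosted API would mean adding another implementation,
# not touching the rest of the system.
from typing import Protocol


class ReviewAnalyzer(Protocol):
    def embed(self, text: str) -> list[float]: ...
    def analyze(self, text: str) -> dict: ...


class OllamaAnalyzer:
    """Local backend: nomic-embed-text for vectors, llama3.1:8b for analysis."""

    def embed(self, text: str) -> list[float]:
        ...  # call Ollama's embeddings endpoint, as sketched earlier

    def analyze(self, text: str) -> dict:
        ...  # call Ollama's generate endpoint, as sketched earlier
```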
For now, this setup is predictable and controllable.
That matters more than squeezing out the last percentage point of accuracy.
What AI does not do, on purpose
To be very clear, AppReviews does not:
- reply to reviews automatically
- decide which feedback matters
- replace reading reviews
- predict user behavior
- generate product decisions
AI does not talk to users.
AI does not act on their behalf.
AI does not override human judgment.
It just reduces repetition and helps patterns emerge faster.
The outcome so far
In practice, this approach works surprisingly well.
You still read reviews.
You still reply yourself.
You still make decisions.
But you do it with more context and less noise.
And that is the only promise I am comfortable making.
Start with the human workflow.
Figure out where attention is wasted.
Then see if AI can help reduce that cost.
Not the other way around.


