AI without the hype: using LLMs to reduce noise, not replace thinking
This is part 4 of my series on AppReviews.
Part 1 is available here
Part 2 is available here
Part 3 is available here
At some point while building AppReviews, I started thinking about AI.
Not in a “this needs AI” way. More in a quiet, slightly reluctant way.
I already had a system that fetched reviews, pushed them to Slack, and made sure feedback was not missed. That alone solved the core problem. But once reviews are always visible, a new issue appears.
There are a lot of them.
Some are useful.
Some are vague.
Some are emotional.
Some are duplicates of the same issue written slightly differently.
Reading every single review works, up to a point. After that, you are back to scanning, skimming, and mentally filtering noise.
That is when I started asking myself a different question:
What if the system helped me understand reviews faster, without pretending to think for me?
What I did not want
Before adding anything AI-related, I was very clear about what I did not want to build.
I did not want:
- AI-generated summaries pretending to be insights
- magic scores without explanation
- a chatbot answering users on my behalf
- another dashboard full of charts that look smart but do not help decisions
Most of all, I did not want AI to become the product.
AppReviews exists to shorten the feedback loop between users and product teams.
AI should support that goal, not distract from it.
So the bar was high.
If AI did not reduce cognitive load in a very concrete way, it did not belong.
The actual problem AI helps with
The real problem is not understanding one review.
It is understanding many reviews over time.
When ten users describe the same issue in ten different ways, humans are great at seeing the pattern. But only after reading all ten. That does not scale well when reviews keep coming in.
What I wanted help with was:
- grouping similar feedback
- spotting recurring topics
- getting a rough sense of sentiment trends
- surfacing urgency when something suddenly spikes
Not answers. Signals.
Why embeddings first, not prompts everywhere
The first building block I added was embeddings.
Every review can be turned into a vector that captures its meaning. Once you have that, you can compare reviews semantically instead of relying on keywords or star ratings.
That immediately unlocks useful things:
- similar reviews can be grouped together
- topics emerge naturally instead of being predefined
- you can detect “this feels like the same problem again”
For embeddings, I use nomic-embed-text. It is fast, local, and good enough for this use case. Each review becomes a 768-dimensional vector stored alongside the raw text.
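Here is a rough sketch of what that looks like. It is illustrative, not the actual AppReviews code, and it assumes a local Ollama instance on its default port with nomic-embed-text pulled:

```python
# Illustrative sketch, not the actual AppReviews code.
# Assumes a local Ollama instance on its default port, with nomic-embed-text pulled.
import math

import requests

OLLAMA_URL = "http://localhost:11434"


def embed_review(text: str) -> list[float]:
    """Turn one review into a 768-dimensional vector using nomic-embed-text."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two reviews by meaning instead of exact wording."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two differently worded reviews about the same problem score high:
v1 = embed_review("App crashes every time I open the camera")
v2 = embed_review("The camera screen freezes and the app closes itself")
print(cosine_similarity(v1, v2))
```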
This step alone already adds value, even without a large language model generating text.
Where LLMs come in, carefully
On top of embeddings, I added a second, optional layer using a large language model.
The model I use is llama3.1:8b, running locally via Ollama. This was an intentional choice.
I wanted:
- no per-token cost anxiety
- no external API dependency
- something that could run on my machine or a small server
The LLM is used for very specific tasks:
- estimating sentiment
- extracting high-level topics
- detecting tone such as angry, neutral, or positive
- flagging urgency when relevant
Each review is processed independently.
No long context.
No agents.
No orchestration complexity.
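For the curious, here is a minimal sketch of what one of those calls can look like. The prompt and the field names are illustrative, not a fixed schema, and it assumes llama3.1:8b is available through a local Ollama instance:

```python
# Sketch only: one small, self-contained prompt per review, no shared context.
# The field names (sentiment, topics, tone, urgent) are illustrative, not a fixed schema.
import json

import requests

OLLAMA_URL = "http://localhost:11434"

PROMPT = """Analyze this app store review. Answer in JSON with the keys
"sentiment" (positive | neutral | negative), "topics" (a list of short strings),
"tone" (angry | neutral | positive) and "urgent" (true or false).

Review:
{review}
"""


def analyze_review(review: str) -> dict:
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={
            "model": "llama3.1:8b",
            "prompt": PROMPT.format(review=review),
            "format": "json",  # ask Ollama to constrain the output to valid JSON
            "stream": False,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])


print(analyze_review("Since the last update the app takes 30 seconds to start. Unusable."))
```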
And most importantly:
This entire pipeline is optional.
If the AI processor is disabled or unavailable, AppReviews works exactly the same. Reviews are still fetched, stored, sent to Slack, and visible in the dashboard.
AI is an enhancement, not a dependency.
Async, isolated, and easy to turn off
From an architectural point of view, AI processing is completely decoupled from the core flow.
- reviews are saved first
- only then are they queued for analysis
- processing happens asynchronously, in small batches
- failures are retried a few times and then dropped
If Ollama is not running, nothing breaks. There is no user-facing error. The system simply skips analysis.
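A stripped-down sketch of that decoupling, with placeholder helpers standing in for the real persistence and Slack steps:

```python
# Sketch of the decoupled, best-effort analysis step.
# store_review and analyze_review are placeholders for the real persistence,
# Slack, and Ollama code; only the queuing and skip-on-failure logic matters here.
import queue

import requests

MAX_ATTEMPTS = 3
analysis_queue: queue.Queue = queue.Queue()


def store_review(review: dict) -> None:
    print("stored and sent to Slack:", review["id"])  # stand-in for the core flow


def analyze_review(text: str) -> dict:
    # Stand-in for the Ollama call sketched above; raises if Ollama is unreachable.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.1:8b", "prompt": text, "stream": False},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()


def ingest(review: dict) -> None:
    store_review(review)                                   # core flow first, always
    analysis_queue.put({"review": review, "attempts": 0})  # analysis is best-effort


def run_analysis_batch(batch_size: int = 10) -> None:
    for _ in range(min(batch_size, analysis_queue.qsize())):
        job = analysis_queue.get()
        try:
            analyze_review(job["review"]["text"])
        except requests.RequestException:
            job["attempts"] += 1
            if job["attempts"] < MAX_ATTEMPTS:
                analysis_queue.put(job)  # retry in a later batch
            # otherwise drop the job: no user-facing error, analysis is simply skipped
```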
This was non-negotiable.
AI systems fail in weird ways. None of that should impact the primary job of the product.
Why not “AI-generated insights”
This is probably the question I get the most.
Why not generate summaries like:
“Users are unhappy about onboarding”
or:
“Most complaints are about performance”
The short answer is simple.
I do not trust them.
Those summaries look nice, but they hide uncertainty. They compress nuance into something that feels authoritative, even when it is not.
Instead, AppReviews surfaces raw signals:
- these reviews are similar
- this topic appears often
- sentiment around this feature dropped last week
From there, a human can decide what it means.
AI should help you see where to look, not tell you what to think.
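To make the last kind of signal concrete: detecting a sentiment drop can be as boring as comparing weekly averages. A tiny sketch, assuming per-topic weekly scores are already stored (the input shape here is hypothetical):

```python
# Sketch of a "sentiment around this feature dropped last week" signal.
# The input shape (per-topic weekly average scores, oldest first) is hypothetical.
def dropped_topics(weekly_scores: dict[str, list[float]], threshold: float = 0.2) -> list[str]:
    """Return topics whose average sentiment fell by more than `threshold`
    between the previous week and the most recent one."""
    flagged = []
    for topic, scores in weekly_scores.items():
        if len(scores) >= 2 and scores[-2] - scores[-1] > threshold:
            flagged.append(topic)
    return flagged


print(dropped_topics({"onboarding": [0.6, 0.6, 0.3], "performance": [0.5, 0.55, 0.5]}))
# -> ['onboarding']
```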
Cost, control, and boring decisions
Running everything locally with Ollama is not the most scalable choice. But it fits the constraints perfectly.
- no variable costs
- no surprises
- no API keys to rotate
- no privacy questions about sending user feedback elsewhere
If AppReviews grows, swapping the AI backend is relatively easy. The interface is already isolated.
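Something along these lines, as a hypothetical shape of that seam rather than the real code:

```python
# Hypothetical shape of that seam, not the real AppReviews interface.
# Swapping Ollama for a hosted API would mean adding another implementation,
# not touching the rest of the system.
from typing import Protocol


class ReviewAnalyzer(Protocol):
    def embed(self, text: str) -> list[float]: ...
    def analyze(self, text: str) -> dict: ...


class OllamaAnalyzer:
    """Local backend: nomic-embed-text for vectors, llama3.1:8b for analysis."""

    def embed(self, text: str) -> list[float]:
        ...  # call Ollama's embeddings endpoint, as sketched earlier

    def analyze(self, text: str) -> dict:
        ...  # call Ollama's generate endpoint, as sketched earlier
```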
For now, this setup is predictable and controllable.
That matters more than squeezing out the last percentage point of accuracy.
What AI does not do, on purpose
To be very clear, AppReviews does not:
- reply to reviews automatically
- decide which feedback matters
- replace reading reviews
- predict user behavior
- generate product decisions
AI does not talk to users.
AI does not act on their behalf.
AI does not override human judgment.
It just reduces repetition and helps patterns emerge faster.
The outcome so far
In practice, this approach works surprisingly well.
You still read reviews.
You still reply yourself.
You still make decisions.
But you do it with more context and less noise.
And that is the only promise I am comfortable making.
Start with the human workflow.
Figure out where attention is wasted.
Then see if AI can help reduce that cost.
Not the other way around.


