Are LLM Visibility Trackers Worth It?
Is LLM visibility worth the premium price point these tools have positioned themselves at or is the juice not worth the squeeze?
TL;DR
When it comes to LLM visibility, not all brands are created equal. For some it matters far more than others
LLMs give different answers to the same question. Trackers combat this by simulating prompts repeatedly to get an average visibility/citation score
Whilst simulating the same prompts isn’t perfect, secondary benefits like sentiment analysis tackle problems that are not SEO-specific. Which right now is a good thing
Unless a visibility tracker offers enough scale at a reasonable price, I would be wary. But if the traffic converts well and you need to know more, get tracking
A small caveat to start. This really depends on how your business makes money and whether LLMs are a fundamental part of your audience journey. You need to understand how people use LLMs and what it means for your business.
Brands that sell physical products have a different journey from publishers that sell opinion or SaaS companies that rely more deeply on comparison queries than anyone else.
Or a coding company destroyed by one snidey Reddit moderator with a bone to pick…
For example, Ahrefs made public some of their conversion rate data from LLMs. 12.1% of their signups came from LLMs from just 0.5% of their total traffic. Which is huge.
But for us, LLM traffic converts significantly worse. It is a fraction of a fraction.
Honestly, I think LLM visibility trackers at this scale are a bit here today and gone tomorrow. If you can afford one, great. If not, don’t sweat it. Take it all with a pinch of salt. AI search is just a part of most journeys and tracking the same prompts day in, day out has obvious flaws.
They’re just aggregating what someone said about you on Reddit while they’re taking a shit in 2016.
What do they do?
Trackers like Profound and Brand Radar are designed to show you how your brand is framed and recommended in AI answers. Over time you can measure your own and your competitors’ visibility in the platforms.
But LLM visibility is smoke and mirrors.
Ask a question, get an answer. Ask the same question, to the same machine, from the same computer and get a different answer. A different answer with different citations and businesses.
It has to be like this, or else we’d never use the boring bastards.
To combat the inherent variance determined by their temperature setting, LLM trackers simulate prompts repeatedly throughout the day. In doing so, you get an average visibility and citation score alongside some other genuinely useful add-ons like your sentiment score and some competitor benchmarking.
“Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.”
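To see why temperature produces that variance, here is a toy sketch. It assumes three made-up brands with made-up scores competing for one recommendation slot. Real models sample over tokens rather than brands, so treat this as an illustration of the mechanism and not how any particular engine works.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores, invented purely for illustration
brand_logits = {"Brand A": 2.0, "Brand B": 1.5, "Brand C": 0.5}

for temperature in (0.2, 0.8):
    probs = softmax_with_temperature(list(brand_logits.values()), temperature)
    summary = ", ".join(f"{name}: {p:.2f}" for name, p in zip(brand_logits, probs))
    print(f"temperature={temperature}: {summary}")
```

At 0.2 almost all of the probability piles onto the top-scoring brand. At 0.8 the runners-up get a real look in, which is why the same prompt can name different businesses from one run to the next.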
Simulate a prompt 100 times. If your content was used in 70 of the responses and you were cited seven times, you would have a 70% visibility score and a 7% citation score.
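The arithmetic is simple enough to sketch. The `Run` record and its two flags are my own hypothetical shape, not how any tracker actually stores its results.

```python
from dataclasses import dataclass


@dataclass
class Run:
    used_content: bool  # the answer drew on your content
    cited: bool         # the answer linked back to you


def score(runs):
    """Return (visibility %, citation %) across a batch of simulated runs."""
    total = len(runs)
    if total == 0:
        return 0.0, 0.0
    visibility = 100 * sum(r.used_content for r in runs) / total
    citation = 100 * sum(r.cited for r in runs) / total
    return visibility, citation


# Rebuild the example above: 100 runs, content used in 70, cited in 7
runs = [Run(used_content=i < 70, cited=i < 7) for i in range(100)]
visibility, citation = score(runs)
print(f"visibility {visibility:.0f}%, citation {citation:.0f}%")
```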
Trust me, that’s much better than it sounds… These engines do not want to send you traffic.
In Brian Balfour’s excellent words, they have identified the moat and the gates are open. They will soon shut. As they shut, monetisation will be hard and fast. The likelihood of any referral traffic, unless it’s monetised, is low.
Like every tech company ever.
If you aren’t flush with cash, I’d say most businesses just do not need to invest in them right now. They’re a nice-to-have rather than a necessity for most of us.
How do they work?
As far as I can tell, there are two primary models.
Pay for a tool that tracks specific synthetic prompts that you add yourself
Purchase an enterprise-like tool that tracks more of the market at scale
Some tools, like Profound, offer both. The cheaper model (the price point is not for most businesses) lets you track synthetic prompts under topics and/or tags. The enterprise model gives you a significantly larger scale.
Whereas tools like Ahrefs Brand Radar provide a broader view of the entire market. As the prompts are all synthetic, there are some fairly large holes. But I prefer broad visibility.
I have not used it yet, but I believe Similarweb have launched their own LLM visibility tracker, which includes real user prompts from Clickstream data.
This makes for a far more useful version of these tools IMO and goes some way to answering the synthetic elephant in the room. And it helps you understand the role LLMs play in the user journey. Which is far more valuable.
The problem
Does doing good SEO improve your chances of improving your LLM visibility?
Certainly looks like it…
GPT-5 no longer needs to train on more information. It is as well-versed as its overlords now want to pay for. It’s bored of ravaging the internet’s detritus and reaches out to a search index using RAG to verify a response. A response it does not quite have the confidence to give on its own.
But I’m sure we will need to modify our SEO somewhat if the primary goal is to increase LLM visibility. Increased expenditure on TOFU and digital PR campaigns being a notable point.
Right now LLMs have an obvious spam problem. One I don’t expect they’ll be willing to invest in solving anytime soon. The AI bubble and gross valuation of these companies will dictate how they drive revenue. And quickly.
It sure as hell won’t be sorting out their spam problem. When you have a $300 billion contract to pay and revenues of $12 billion, you need some more money. Quickly.
So anyone who pays for best page link inclusions or adds hidden and footer text to their websites will benefit in the short-term. But most of us should still build things for actual, breathing, snoring people.
With the new iterations of LLMs calling search instead of formulating an answer for prompts based on learned ‘knowledge’, it becomes even harder to create an ‘LLM optimisation strategy.’
As a news site, I know that most prompts we would vaguely show up in would trigger the web index. So I just don’t quite see the value. It’s very SEO-led.
How you can add value with sentiment analysis
I found almost zero value to be had from tracking prompts in LLMs at a purely answer level. So let’s fuck all that off for a second and use them for something else. Let’s start with some sentiment analysis.
These trackers give us access to:
A wider online sentiment score
Review sources LLMs called upon (at a prompt level)
Sentiment scores by topics
Prompts and links to on and off-site information sources
You can identify where some of these issues start. Which, to be fair, is basically Trustpilot and Reddit.
I won’t go through everything, but a couple of quick examples:
LLMs may be referencing some not-so-recently defunct podcasts and newsletters as ‘reasons to subscribe.’
Your cancellation process may be cited as the most serious issue for most customers
Unless you have explicitly stated that these podcasts and newsletters have finished it’s all fair game. You need to tighten up your product marketing and communications strategy.
For people first. Then for LLMs.
These are not SEO-specific projects. We’re moving into an era where solely SEO projects will be difficult to get pushed through. A fantastic way of getting buy-in is to highlight projects with benefits outside of search.
Highlighting serious business issues - poor reviews, inaccurate, out-of-date information et al - can help get C-Suite attention and support for some key brand reputation projects.
To me, this has nothing to do with LLMs. Or what our audience might ask an ill-informed answer engine. They are just the vessel.
It is about solving problems. Problems that drive real value to your business. In your case, this could be about increasing the LTV of a customer. Increasing their retention rate, reducing churn and increasing the chance of a conversion by providing an improved experience.
If you’ve worked in SEO for long enough, someone will have floated the idea of improving your online sentiment and reviews past you.
“But will this improve our SEO?”
Said Jeff, a beleaguered business owner.
Fuck knows, Jeff. It really depends on what is holding you back compared to your competition. And like it or not, search is not very investible right now.
But that doesn’t matter in this instance. This isn’t a search-first project. It’s an audience-first project. It encompasses everyone. From customer service to SEO and editorial. It’s just the right thing to do for the business.
A quick hark back to the Google Leak shows you just how many review and sentiment-focused metrics may affect how you rank.
For a long time, search has been about brands and trust. Branded search volume, outperforming expected CTR (a Bayesian type predictive model), direct traffic and general user engagement and satisfaction.
This isn’t because Google knows better than people. It’s because they have stored how we feel about pages and brands in relation to queries and used that as a feedback loop. Google trusts brands because we do.
Most of us have never had to worry about reviews and sentiment. But this is a great time to fix any issues you may have under the guise of AEO, GEO, SEO or whatever you want to call it.
Lars Lofgren’s article titled How a Competitor Crippled a $23.5M Bootcamp By Becoming a Reddit Moderator is an incredible look at how Codesmith was nobbled by negative PR. Negative PR started and maintained by one Reddit Mod. One.
So keeping tabs on your reputation and identifying potentially serious issues is never a bad thing.
Could I just build my own?
Yep. For starters, you’d need an estimation of monthly LLM API costs based on the number of monthly tokens required. Let’s use Profound’s lower end pricing tier as an estimate and our old friend Gemini to figure out some estimated costs.
200 prompts × 10 runs × 12 days (approx.), spread across 3 models = 24,000 monthly runs
24,000 runs × 1,000 tokens/query (conservative est.) = 24,000,000 tokens
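If you want to play with these numbers yourself, here is a minimal sketch. It takes the 24,000 monthly runs and 1,000 tokens per run as given. The 30-day month and the `monthly_cost` helper are my own assumptions, and the per-million-token price is something you would need to look up for each provider.

```python
MONTHLY_RUNS = 24_000   # monthly runs, as estimated above
TOKENS_PER_RUN = 1_000  # conservative per-query estimate, as above
DAYS_PER_MONTH = 30     # assumption, used only for the daily call rate


def monthly_tokens(runs=MONTHLY_RUNS, tokens_per_run=TOKENS_PER_RUN):
    """Total tokens consumed in a month."""
    return runs * tokens_per_run


def calls_per_day(runs=MONTHLY_RUNS, days=DAYS_PER_MONTH):
    """Average number of API calls the scheduler has to make each day."""
    return runs / days


def monthly_cost(tokens, usd_per_million_tokens):
    """Hypothetical helper: cost in USD at a blended per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


print(f"{monthly_tokens():,} tokens per month")
print(f"{calls_per_day():.0f} API calls per day")
```

Feed `monthly_cost` the current rate for whichever model you pick. Prices move often enough that I would not hard-code them.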
Based on this, here’s a (hopefully) accurate cost estimate per model from our robot pal.
Right then. You now need some back-end functionality, data storage and some front-end visualisation. I’ll tot up as we go.
$21 per month for the LLM API calls
Back-end
A Scheduler/Runner like Render VPS to execute 800 API calls per day
A data orchestrator. Essentially, some Python code to parse raw JSON and extract relevant citation and visibility data
$10 per month
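The parsing step might look something like this. The response shape (`prompt`, `model`, `answer`, `citations`) is entirely hypothetical. Every provider returns something different, so you would adapt the keys to whatever your runner actually saves.

```python
import json
from urllib.parse import urlparse


def extract_metrics(raw_json, brand, domain):
    """Pull visibility and citation flags out of one stored response."""
    record = json.loads(raw_json)
    answer = record.get("answer", "")
    cited_hosts = [
        urlparse(c.get("url", "")).netloc.lower().removeprefix("www.")
        for c in record.get("citations", [])
    ]
    return {
        "prompt": record.get("prompt"),
        "model": record.get("model"),
        # Naive substring match; a real build would want fuzzier brand matching
        "mentioned": brand.lower() in answer.lower(),
        "cited": any(h == domain or h.endswith("." + domain) for h in cited_hosts),
        "cited_hosts": cited_hosts,
    }


# Made-up response, for illustration only
raw = json.dumps({
    "prompt": "best coding bootcamp",
    "model": "example-model",
    "answer": "Many people recommend Example Bootcamp for career changers.",
    "citations": [
        {"url": "https://www.reddit.com/r/example/"},
        {"url": "https://blog.example.com/reviews"},
    ],
})
metrics = extract_metrics(raw, brand="Example Bootcamp", domain="example.com")
print(metrics["mentioned"], metrics["cited"], metrics["cited_hosts"])
```

It needs Python 3.9 or later for `str.removeprefix`.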
Data storage
A database, like Supabase (which you can integrate directly through Lovable) to store raw responses and structured metrics
Data storage (which should be included as part of your database)
$15 per month
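For a sense of what the storage layer holds, here is a rough sketch using Python’s built-in sqlite3 as a local stand-in for Supabase. The table and column names are my own invention. The same idea carries over to Postgres with minor changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for a hosted database
conn.execute("""
    CREATE TABLE runs (
        run_date     TEXT NOT NULL,
        prompt       TEXT NOT NULL,
        model        TEXT NOT NULL,
        raw_response TEXT NOT NULL,     -- the untouched JSON, kept for re-parsing
        mentioned    INTEGER NOT NULL,  -- 1 if the brand appeared in the answer
        cited        INTEGER NOT NULL   -- 1 if the answer linked to the brand
    )
""")


def save_run(run_date, prompt, model, raw_response, mentioned, cited):
    """Store one raw response alongside its structured flags."""
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)",
        (run_date, prompt, model, raw_response, int(mentioned), int(cited)),
    )


def daily_scores():
    """Visibility and citation percentages per day, ready for a time series chart."""
    return conn.execute("""
        SELECT run_date,
               100.0 * SUM(mentioned) / COUNT(*) AS visibility_pct,
               100.0 * SUM(cited) / COUNT(*) AS citation_pct
        FROM runs
        GROUP BY run_date
        ORDER BY run_date
    """).fetchall()


# Made-up rows, for illustration only
save_run("2025-01-01", "best x", "model-a", "{}", True, False)
save_run("2025-01-01", "best x", "model-a", "{}", False, False)
save_run("2025-01-02", "best x", "model-a", "{}", True, True)

for row in daily_scores():
    print(row)
```

Keeping the raw response next to the parsed flags means you can change your parsing logic later without paying to re-run the prompts.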
Front-end visualisation
A web dashboard to create interactive, shareable dashboards. I unironically love Lovable. It’s easy to connect directly to databases. I have also used Streamlit previously. Lovable looks far sleeker but has its own challenges
You may also need a visualisation library to help generate time series charts and graphs. Some dashboards have this built in
$50 per month
$96 all in. I think the likelihood is it’s closer to $50 than $100. No scrimping. At the higher end of budgets for tools I use (Lovable) and some estimates from Gemini, we’re talking about a tool that will cost under $100 a month to run and function very well.
This isn’t a complicated project or setup. It is, IMO, an excellent project to learn the vibe coding ropes. Which I will say is not all sunshine and fucking rainbows.
So, should I buy one?
If you can afford it, I would get one. For at least a month or two. Review your online sentiment. See what people really say about you online. Identify some low lift wins around product marketing and review/reputation management and review how your competitors fare.
This might be the most important part of LLM visibility. Set up a tracking dashboard via Google Analytics (or whatever dreadful analytics provider you use) and see a) how much traffic you get and b) whether it’s valuable.
The more valuable it is, the more value there will be in tracking your LLM visibility.
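If your analytics tool lets you export sessions with a referrer and a conversion flag, a rough version of that check could look like this. The hostname list is my assumption about how these assistants show up as referrers, and the `(referrer_url, converted)` export format is hypothetical.

```python
from urllib.parse import urlparse

# Assumed referrer hostnames; extend for the assistants your audience uses
LLM_REFERRERS = {
    "chatgpt.com",
    "chat.openai.com",
    "perplexity.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
    "claude.ai",
}


def is_llm_referrer(referrer_url):
    """True when the referrer's hostname is on the assumed LLM list."""
    host = urlparse(referrer_url).netloc.lower().removeprefix("www.")
    return host in LLM_REFERRERS


def summarise(sessions):
    """sessions: iterable of (referrer_url, converted) pairs."""
    sessions = list(sessions)
    llm = [converted for referrer, converted in sessions if is_llm_referrer(referrer)]
    total_conversions = sum(converted for _, converted in sessions)
    return {
        "traffic_share": len(llm) / len(sessions) if sessions else 0.0,
        "conversion_share": sum(llm) / total_conversions if total_conversions else 0.0,
        "llm_conversion_rate": sum(llm) / len(llm) if llm else 0.0,
    }


# Made-up sessions, for illustration only
sessions = [
    ("https://chatgpt.com/", True),
    ("https://www.google.com/search", False),
    ("https://www.perplexity.ai/search", False),
    ("", True),  # direct visit, no referrer
]
print(summarise(sessions))
```

Compare `traffic_share` with `conversion_share`. If the second is far bigger than the first, as in the Ahrefs numbers, the traffic is worth tracking properly.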
You could also make one. The joy of making one is a) you can learn a new skill and b) you can make other things for the same cost.
Frustrating, yes. Fun? Absolutely.
Other great stuff
AI Survival Strategies for Publishers - Barry Adams
AI Mode User Behaviour Study - Kevin Indig
E-E-A-T Decoded: Google’s Experience, Expertise, Authoritativeness, and Trust - Shaun Anderson
ChatGPT for Competitive Analysis - WTF is SEO