Don’t mistake fluency for fact: The reality of AI tax research

By Aleksandra Bal, June 5, 2026
Reviewed by Aleksandra Bal

AI tools can scan a jurisdiction for relevant rules and structure a tax argument in minutes, tasks that used to anchor a tax professional to their desk for hours. The efficiency gains are real, and they aren’t going away. But alongside this speed runs a dangerous counter-trend: AI tools have become so good at mimicking professional expertise that it is getting harder to spot where the data ends and the hallucination begins.

Many AI outputs look convincing but do not hold up under scrutiny. The model sounds authoritative, but the answer is wrong. In most cases, this is not the result of asking the wrong question or selecting the wrong tool. It is the result of misunderstanding what AI language models are and are not capable of doing.

Tax rules depend on precise definitions, narrow thresholds, and jurisdiction-specific exceptions. A small error can change the outcome. And in a field where the difference between a correct and incorrect answer can carry audit risk, fluency is not a reliable signal of accuracy.

To help explain how AI should be used in tax research, we asked an expert from our parent company, Stripe, to provide some guidance. Aleksandra Bal is the Global Indirect Tax Technology Lead at Stripe, leading a team focused on developing tax technology solutions and managing indirect taxes across six continents.

Watch the full breakdown: In the five-minute video above, Aleksandra Bal explains why “fluency is not accuracy” and pulls back the curtain on why standard AI prompts fail in tax compliance.

The mirage of real-time search

We often assume that if an AI app has web search enabled, it’s reading current legislation in real time. But “can search” is not the same as “always searches.”

Tools such as Gemini, ChatGPT, Claude, and Perplexity handle search differently. Some search by default, while others only trigger a search when they decide they need current information. If a model thinks its training data is sufficient, it might skip the search entirely and fall back on pattern matching.

The danger is that looking like a correct answer and being a correct answer are two different things. A model might fluently describe a lodging tax in a specific county simply because it has seen similar patterns before, not because it actually verified that the tax exists today.

The confidence mirage

As tax professionals, we naturally want to quantify risk. This leads to one of the most common mistakes in prompting: asking the model to rate its own confidence.

You’ve likely seen it: “Provide the answer and then give me a confidence score from 1–100.” It feels responsible. But here’s the reality: the model doesn’t have a truth meter. It generates that confidence score using the same predictive text logic it used to write the answer.

Why RAG isn’t a magic bullet for hallucinations

To fix this, many teams turn to Retrieval-Augmented Generation (RAG), giving the AI specific documents to use as a reference. While RAG helps, it doesn’t eliminate hallucinations; it just changes how they happen.

RAG adds context, not truth: If the retrieved document is outdated or irrelevant, the model will still confidently reason based on bad data.
The gap fill problem: If the document doesn’t explicitly answer your question, the model won’t necessarily stop. It will often fill the gap by blending the document’s text with its own prior training, creating a “Frankenstein” rule that sounds grounded but isn’t real.
Design matters: If your system chunks documents poorly, the AI might see the general rule but miss the crucial exception or jurisdictional qualifier on the next page.
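To make the chunking point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the sample text is invented (it is not a real statute), the helper names are hypothetical, and the keyword lookup is a toy stand-in for whatever retriever your system actually uses. It shows how a fixed-size cut can hand the model the general rule while leaving the exception in a chunk that never gets retrieved.

```python
def chunk_fixed(text: str, size: int) -> list[str]:
    """Naive chunking: cut every `size` characters, ignoring sentence boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def chunk_by_paragraph(text: str) -> list[str]:
    """Keep each paragraph whole so a rule stays next to its exception."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def retrieve(chunks: list[str], keyword: str) -> list[str]:
    """Toy retriever (assumption): return chunks that contain the keyword."""
    return [c for c in chunks if keyword.lower() in c.lower()]


# Invented sample text for illustration only, not a real rule.
DOCUMENT = (
    "Section 1. Lodging rentals are taxable at the county rate. "
    "Exception: stays of 30 days or longer are exempt.\n\n"
    "Section 2. Returns are due quarterly."
)

# Same document, same query, two chunking strategies.
naive_hits = retrieve(chunk_fixed(DOCUMENT, 60), "lodging")
paragraph_hits = retrieve(chunk_by_paragraph(DOCUMENT), "lodging")

# Does the model get to see the exemption alongside the general rule?
print("fixed-size chunks include exception:", any("exempt" in c for c in naive_hits))
print("paragraph chunks include exception:", any("exempt" in c for c in paragraph_hits))
```

In this toy case the fixed-size cut returns only the general rule, while the paragraph-based cut returns the rule together with its exception. Real documents are messier, so treat this as a way to reason about the risk, not as a recommended chunking design.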

Prompts that break the system

We often try to command accuracy into existence. We use prompts like:

“Only provide 100% accurate information.”
“Do not fill in knowledge gaps.”
“Review the legislation before answering.”

These sound like strict controls, but they are actually just stylistic requests. Telling a model to be 100% accurate doesn’t give it a new database of facts; it just encourages it to remove hedging language like “this may vary by jurisdiction”. It makes the model sound more certain, even when it isn’t.

Similarly, asking a model to review legislation often results in the model generating a section of text that looks like a legal analysis, but that analysis isn’t necessarily what drove the final conclusion. The analysis and the conclusion are constructed in parallel; the analysis is not a verification step.

How to use AI responsibly in tax

If we can’t control the path, how do we manage the risk?

Treat every output as a lead, not a conclusion: Never assume the model searched the web or correctly interpreted your PDF. Verify that the rule exists and that the citation is current.
Keep your requests narrow: The longer the output, the less reliable it becomes. By the time a model is generating row 30 of a tax rate table, it is no longer primarily following your instructions — it is completing a pattern. Ask about one jurisdiction at a time rather than requesting a comprehensive dataset in one go.
Treat prompt instructions as scope, not safety: Instructions like “only use accurate sources” or “do not guess” don’t change how the model generates answers; they only change the tone.
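The first two principles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a production workflow: the `Lead` structure, the function names, the placeholder jurisdictions, and the rates are all hypothetical, and `trusted_rates` stands in for whatever source your team has already verified by hand. No model is called here; the sketch only shows the shape of "one narrow question per jurisdiction" and "check every answer against a verified source before using it."

```python
from dataclasses import dataclass


@dataclass
class Lead:
    """A model answer treated as an unverified lead (hypothetical structure)."""
    jurisdiction: str
    claimed_rate: float
    citation: str


def build_prompts(jurisdictions: list[str]) -> list[str]:
    """One narrow question per jurisdiction instead of one request for a full table."""
    return [
        f"What is the current sales tax rate in {j}? Cite the governing statute."
        for j in jurisdictions
    ]


def verify(lead: Lead, trusted_rates: dict[str, float]) -> str:
    """Compare a lead with a source a human has already verified."""
    if lead.jurisdiction not in trusted_rates:
        return "unverified: no trusted source on file"
    if abs(lead.claimed_rate - trusted_rates[lead.jurisdiction]) > 1e-9:
        return "rejected: does not match trusted source"
    return "confirmed"


# Placeholder data for illustration only; these are not real rates or citations.
trusted_rates = {"Jurisdiction A": 0.05}
leads = [
    Lead("Jurisdiction A", 0.05, "placeholder citation"),
    Lead("Jurisdiction A", 0.06, "placeholder citation"),
    Lead("Jurisdiction B", 0.04, "placeholder citation"),
]

prompts = build_prompts(["Jurisdiction A", "Jurisdiction B"])
results = [verify(lead, trusted_rates) for lead in leads]

for lead, result in zip(leads, results):
    print(lead.jurisdiction, lead.claimed_rate, "->", result)
```

Note that the check never asks the model whether it is sure. A lead is only confirmed when it matches a source outside the model, and anything without a trusted source on file stays unverified until a person checks it.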

AI can help tax professionals work faster, but it cannot replace the need for a reliable, verified source of truth. The three principles above are essential guardrails, yet they all require expertise, time, and vigilance to apply correctly. For businesses that need accurate, up-to-date tax calculations without the burden of constant manual verification, that guardrail is best built into the tools themselves.

One of the best ways to manage compliance continues to be using tax software, like TaxJar or Stripe Tax. Consider signing up for a free TaxJar trial or reaching out to the sales team. See how TaxJar can transform the challenge of sales tax management into a seamless part of your business operations.

Aleksandra holds a Ph.D., MBA, LLM, and several other degrees in taxation and computer science.
