Remember when you'd ask ChatGPT a simple question and it'd confidently tell you something completely wrong? Those frustrating moments when the AI would make up statistics, cite non-existent research papers, or invent product features that never existed?
OpenAI just fixed that problem. On 7 August 2025, they launched GPT-5 with an extraordinary claim: 80% fewer hallucinations when using "thinking mode." That's not a minor improvement. It's the difference between an unreliable assistant and something you can actually trust with serious work.
For Australian businesses that have been hesitant to rely on AI for anything critical, this changes everything. Let's break down what's different, what the numbers actually mean, and when you should (and shouldn't) trust GPT-5 with your business decisions.
The Hallucination Breakthrough: What 80% Really Means
Here's what OpenAI discovered when they tested GPT-5 against millions of real ChatGPT queries. With web search enabled, GPT-5's responses contain about 45% fewer factual errors compared to GPT-4o. That's impressive on its own, but when you activate "thinking mode" (where the AI reasons through problems step-by-step), the error rate drops by a staggering 80% compared to the previous o3 model.
The actual numbers are even more compelling. On the LongFact-Concepts benchmark, GPT-5 has a hallucination rate of just 1.0%, compared to 5.2% for the earlier o3 model. On FActScore, it's down to 2.8% from 23.5% in o3. That's not just better, it's dramatically more reliable.
But there's a critical caveat. These impressive numbers depend heavily on web access. When GPT-5 can't verify information against the internet, hallucination rates climb significantly. In some tests without web access, GPT-5 actually hallucinated more than older models. Keep this in mind when you're designing AI workflows for your business.
One researcher, Simon Willison, reported zero hallucinations after two weeks of intensive testing. That's anecdotal, sure, but it matches what OpenAI's benchmarks suggest: when configured properly, GPT-5 is far more trustworthy than anything we've had before.
Why GPT-5 Stopped Making Things Up
The technical innovation behind this improvement is fascinating. GPT-5 isn't actually a single model. It's a collection of specialised models, and OpenAI routes your prompt to the right one based on its complexity and intent.
Simple queries get handled by a small, efficient version that responds quickly without extensive reasoning. Complex questions get routed to larger models with "deep thinking" capabilities that can work through problems methodically. This adaptive approach means you're not wasting computational power (and time) on simple tasks, but you've got serious reasoning capacity when you need it.
The "thinking mode" works like having the AI think out loud before answering. When activated, the error rate drops from 11.6% to just 4.8%. That's more than a 50% reduction in mistakes just by letting the model reason through the problem properly.
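The "more than 50%" figure is easy to verify from the rates quoted above. A quick calculation, using only the numbers in this article:

```python
# Relative reduction in error rate when "thinking mode" is enabled,
# using the rates quoted above. Pure arithmetic, no API calls.
def relative_reduction(before: float, after: float) -> float:
    """Return the fractional drop from `before` to `after`."""
    return (before - after) / before

reduction = relative_reduction(11.6, 4.8)
print(f"{reduction:.1%}")  # prints "58.6%"
```

So the drop from 11.6% to 4.8% is a 58.6% relative reduction, comfortably clearing the "more than 50%" claim.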
OpenAI also trained GPT-5 to admit when it doesn't know something, rather than inventing plausible-sounding nonsense. This might seem obvious, but it's a huge shift from earlier models that would confidently fabricate information rather than acknowledge uncertainty.
The Performance Numbers That Actually Matter
Beyond hallucinations, GPT-5 sets new records across virtually every benchmark that matters for business applications.
On advanced mathematics (AIME 2025), GPT-5 achieves 94.6% accuracy without any tools, up from GPT-4o's 42.1%. That's not an incremental improvement; it's a complete step change in capability.
For coding tasks, GPT-5 hits 74.9% on SWE-bench Verified and 88% on Aider Polyglot when thinking mode is enabled. What's remarkable is the boost that reasoning provides: +22.1 percentage points on SWE-bench and +61.3 points on Aider Polyglot. The model gets dramatically better when it takes time to think.
On competition-level problems (Codeforces), GPT-5 ranks in the 89th percentile among competitive programmers. It's the first AI model to exceed PhD-level human performance on the GPQA benchmark, which tests expertise in chemistry, physics, and biology.
These aren't just impressive numbers for AI researchers. They represent practical capabilities that Australian businesses can leverage right now for complex problem-solving, code generation, and analytical work.
New Features Australian Businesses Should Know About
GPT-5 launched with several features that make it far more useful for real business applications.
The Tasks feature lets you set up recurring prompts that run automatically. You can schedule weekly report generation, daily data analysis, or regular content creation without manual intervention. It's available to Plus, Pro, and Team subscribers.
The image library improvement means GPT-5 can now handle multiple data types simultaneously. Upload a PDF contract and ask for a summary. Send a spreadsheet and request analysis. The model processes text, code, and images in a single interaction without the clunky workarounds GPT-4 required.
For developers, GPT-5 Codex launched on 15 September 2025 as a specialised version optimised for software development. It's been integrated into GitHub Copilot and Visual Studio Code, where it can work autonomously for hours on complex tasks, iterating on implementation, fixing test failures, and delivering functional code without constant supervision.
Meeting transcription got a major upgrade through the gpt-4o-transcribe-diarize model, which provides ultra-low latency automatic speech recognition across 100+ languages. The diarization feature identifies who spoke when, transforming conversations into speaker-attributed transcripts. This is genuinely useful for Australian businesses managing remote teams or client meetings.
The context window expanded to 256,000 tokens in ChatGPT (400,000 via API). That means you can work across entire books, multi-hour meeting transcripts, or large code repositories without the AI losing track of earlier details. For document analysis or legal review work, this is transformative.
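Before pasting a large document into that window, it helps to estimate whether it fits. The sketch below uses a crude 4-characters-per-token heuristic (an assumption; real tokenisers like tiktoken vary by language and content):

```python
# Rough check of whether a document fits in a context window.
# The ~4 characters-per-token figure is a crude English-text average;
# use a real tokeniser (e.g. tiktoken) for production estimates.
CHARS_PER_TOKEN = 4

def fits_in_context(text: str, window_tokens: int = 256_000,
                    reserve_for_output: int = 4_000) -> bool:
    """Estimate token count and leave headroom for the model's reply."""
    estimated_tokens = len(text) // CHARS_PER_TOKEN
    return estimated_tokens + reserve_for_output <= window_tokens

# A 300-page book at roughly 2,000 characters per page:
book = "x" * (300 * 2_000)  # ~150,000 estimated tokens
print(fits_in_context(book))  # prints "True"
```

By this estimate a full-length book sits comfortably inside the 256,000-token ChatGPT window, which is what makes whole-document legal review plausible.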
Real-World Business Applications (That Actually Work)
Australian businesses are already finding practical uses for GPT-5's improved reliability.
Customer service automation has become genuinely viable. Several Australian AI agencies (DIGITALON, AusGPT, MyChatGPT.com.au) now offer GPT-powered chatbots with local data sovereignty and enterprise security. The improved accuracy means these systems can handle technical troubleshooting and complex enquiries without constantly escalating to human agents.
Octopus Energy, a British energy supplier, integrated ChatGPT into their customer service channels, and it now handles almost half of all enquiries. With GPT-5's hallucination reduction, this kind of deployment becomes safer for industries where accuracy matters more.
Business intelligence and data analysis represent another sweet spot. GPT-5's mathematical and reasoning improvements mean it can perform demand forecasting, inventory optimisation, and trend analysis with far fewer errors than previous models. The Reserve Bank of Australia's 2025 survey found that about 20% of Australian firms are using AI for these "moderate" adoption tasks.
Code generation and review workflows have improved dramatically, and deployment costs are falling alongside. SSW, an Australian consulting company, reports that GPT-powered bots can be set up in one day for $3,990 + GST, providing reliable customer support and technical assistance. With GPT-5's coding improvements, these systems require less human oversight.
Content creation and research acceleration work well when you understand GPT-5's limitations. The model excels at first-pass drafting, summarisation, and idea generation. It's not a replacement for subject matter expertise, but it's a powerful assistant that cuts research time significantly.
The Australian Adoption Reality Check
Despite GPT-5's capabilities, Australian businesses have been surprisingly slow to embrace AI in meaningful ways.
The Reserve Bank of Australia's 2025 survey of 100 medium and large-sized firms found that nearly 40% reported only "minimal" AI adoption. Most common use cases were basic tasks like summarising emails or drafting text using off-the-shelf products. Less than 10% of firms had embedded AI into advanced processes like fraud detection.
Among SMEs, the picture is similar. A National AI Readiness Index Report found that 92% of businesses use ChatGPT or Microsoft Copilot, but only 19% have adopted advanced AI systems that drive real business outcomes. Three-quarters of SMEs are racing into AI without a formal strategy.
The barriers are predictable: budget constraints (28% of businesses), security concerns (29%), skills gaps, and uncertainty about the regulatory environment. Australia's business culture tends to be cautious, with relatively low trust in AI compared to other developed nations.
But there's a generational divide. Nearly 70% of Gen Z and Millennials are aware of ChatGPT, compared to just 23% of Baby Boomers. As younger professionals move into decision-making roles, adoption will likely accelerate.
For Australian businesses wanting to move beyond "ChatGPT for emails," GPT-5's reliability improvements remove one of the biggest adoption barriers. When you can trust the AI's output for critical work, the ROI calculation changes dramatically.
When to Trust GPT-5 (And When to Double-Check)
Here's the practical guidance: GPT-5 is accurate for coding, data analysis, and structured work with proper configuration. Enable web browsing for factual queries. Use thinking mode for complex problems. Set clear constraints and validation rules.
But OpenAI's own system card still warns: "verify GPT-5's work when stakes are high." The model is dramatically better, but it's not infallible. It still makes up things sometimes, especially with open-ended questions or in fields where facts are critical, like medicine or law.
ChatGPT can sound confident even when it's wrong. That confidence is both a feature and a bug. For high-stakes decisions, treat GPT-5 as a first-pass assistant, not a final authority.
GPT-5 handles general factual questions ("How many people live in Canada?") reliably. Specific claims requiring recent data need verification against primary sources. Legal advice, medical diagnosis, financial recommendations, and safety-critical engineering decisions should always be validated by qualified professionals.
The best approach combines automated checks, statistical analysis, and human review. Set clear goals for factual accuracy and bias reduction. Use GPT-5 to accelerate research and draft content, but maintain human oversight for quality control and strategic decisions.
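One lightweight way to operationalise "human review for high stakes" is a rule-based gate that sits between the model and the end user. The trigger lists below are illustrative assumptions; tune them to your own risk areas:

```python
# Minimal sketch of a rule-based gate that flags AI output for human
# review before it reaches a customer. The trigger terms are illustrative
# assumptions, not a vetted compliance list.
HIGH_STAKES_TERMS = {"legal", "medical", "diagnosis", "financial", "tax"}

def needs_human_review(prompt: str, answer: str) -> bool:
    text = f"{prompt} {answer}".lower()
    if any(term in text for term in HIGH_STAKES_TERMS):
        return True  # high-stakes domain: always escalate
    if any(ch.isdigit() for ch in answer):
        return True  # specific figures should be checked against sources
    return False

print(needs_human_review("Draft a welcome email", "Welcome aboard!"))   # False
print(needs_human_review("Summarise our tax position", "You owe..."))   # True
```

In practice you'd layer this with statistical checks (sampling outputs for audit) rather than relying on keywords alone, but even a crude gate like this makes "human in the loop" a default rather than an afterthought.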
What This Means for Your Business
If you've been waiting for AI to become reliable enough for serious business applications, GPT-5 represents that threshold. The 80% hallucination reduction isn't just a marketing claim; it's a fundamental shift in AI trustworthiness.
For Australian businesses, this creates opportunities in customer service automation, business intelligence, development workflows, and content creation. The improved accuracy means you can deploy AI systems with less human oversight, reducing costs while maintaining quality.
But success requires strategy. The 76% of Australian SMEs racing into AI without formal roadmaps are likely to waste money on tools that don't align with business objectives. Start with clear use cases where accuracy improvements deliver measurable value. Build validation processes to catch errors. Train teams to use AI effectively.
GPT-5 won't replace human expertise, but it's finally good enough to be a reliable partner in complex work. That's the real breakthrough: not perfect AI, but AI that's trustworthy enough to integrate into critical business processes.
The question isn't whether your business should use AI anymore. It's whether you'll adopt it strategically or watch competitors pull ahead while you're still figuring out the basics.
Sources
- OpenAI promises 80% fewer hallucinations with GPT-5 debut
- Introducing GPT-5 | OpenAI
- GPT-5 and Hallucinations: Is AI More Factual Now?
- GPT-5 Released: Complete Guide to OpenAI's Revolutionary AI Model
- GPT-5 Launches with 80% Fewer Hallucinations
- GPT-5 Benchmarks
- Learning to reason with LLMs | OpenAI
- Using GPT-5 Codex in VS Code to Build Agentic Workflows
- GPT-5 in Azure AI Foundry
- Microsoft incorporates OpenAI's GPT-5 into consumer, developer and enterprise offerings
- AI Chatbot Australia | GPT-Native Chatbots for Businesses
- Secure Australian AI chat | AusGPT
- ChatGPT Usage & AI Search Trends in Australia – 2025
- AI adoption in Australian businesses for 2025 Q1
- Technology Investment and AI: What Are Firms Telling Us? | RBA
- SMEs Race into AI Without Strategy or Roadmaps
- Is GPT-5 Accurate? We Tested Everything
- How to Validate GPT Outputs for Accuracy
