A client called me in January with what she described as a "Copilot problem." She'd asked Microsoft 365 Copilot a simple question: what's our current parental leave entitlement? Copilot answered confidently, with specific figures, cited a policy document, and she'd nearly forwarded that response to a new employee.
Except the entitlement had changed eighteen months ago. The document Copilot had found was sitting on a subsite that nobody had touched since 2019. The new policy existed too, on the main HR site, but Copilot had surfaced the older version because it matched the semantic query better by some internal metric nobody could fully explain.
She wasn't annoyed at Copilot, exactly. She was annoyed at herself for not checking. And then she was annoyed at me for not warning her that the quality of Copilot's answers would be directly tied to the quality of her SharePoint content.
Fair enough, honestly. I should've led with that.
This is that conversation.

What the Semantic Index Actually Is
Microsoft built something called the Microsoft Graph Semantic Index, and it's been quietly indexing your M365 content since 2023. It's not optional and it doesn't announce itself (Microsoft Learn: Semantic Index for Copilot).
Here's what it actually does. Traditional search is lexical: it looks for your exact words. If you search for "parental leave" it finds documents containing those two words. The Semantic Index goes further. It creates vector embeddings (mathematical representations of meaning) for your content. That means Copilot can find a document about "parental entitlements" or "maternity and paternity provisions" even if you searched for "leave policy," because the model understands conceptual similarity, not just keyword overlap.
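If the lexical versus semantic distinction feels abstract, a toy comparison helps. The sketch below is not how the Semantic Index is implemented. The three-dimensional vectors are numbers I've made up by hand to stand in for real embeddings, and the function names are mine. It only shows the principle: a keyword match can fail where a similarity score between vectors succeeds.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def lexical_match(query, text):
    """True only if every query word appears verbatim in the text."""
    words = text.lower().split()
    return all(w in words for w in query.lower().split())

# Invented toy "embeddings". The dimensions loosely stand for:
# leave/HR, parenting, IT security. Real embeddings have far more dimensions.
documents = {
    "Maternity and paternity provisions": [0.8, 0.9, 0.0],
    "Password rotation standard": [0.0, 0.0, 1.0],
}
query_text = "leave policy"
query_vector = [0.9, 0.4, 0.1]  # also invented

for title, vector in documents.items():
    keyword_hit = lexical_match(query_text, title)
    similarity = cosine_similarity(query_vector, vector)
    print(f"{title}: keyword match={keyword_hit}, similarity={similarity:.2f}")
```

Neither title contains the words "leave" or "policy", so the keyword check fails for both. The similarity score still ranks the parental provisions document well above the password standard, which is the behaviour described above.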
There are two layers to the index. A user-level index covers personal data (your emails, your OneDrive files, your meeting recordings). A tenant-level index covers shared content across SharePoint sites, Teams channels, and document libraries. Both are continuously updated as content changes.
Supported content types include SharePoint pages (and yes, this includes news articles and wiki pages, which are now generally available), Word documents, PDFs, PowerPoint presentations, OneNote notebooks, and Exchange emails. Other M365 content such as Teams messages and Excel files may be accessible to Copilot through separate search pathways, but aren't confirmed in the Semantic Index's documented content type table.
The permission boundary is strict and it's worth understanding clearly: Copilot only surfaces content the querying user can access. If someone asks Copilot about the executive remuneration policy and they don't have access to that SharePoint library, Copilot won't find it. That's a feature. But it also means that if your permissions architecture is a mess (and whose isn't, after a decade of organic SharePoint growth), those permission boundaries create AI blind spots. Copilot can't answer questions about content it can't see, even when the answer exists in your organisation.
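Here's a minimal sketch of why that happens, assuming a simplified model where each document carries a set of allowed groups and a relevance score. The `Doc` class, the group names, and the scores are all hypothetical, and Copilot's actual ranking is not documented at this level. The point is only the order of operations: filter by access first, then pick the best of what's left.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    title: str
    allowed_groups: frozenset
    relevance: float  # hypothetical score; higher means a better match

def best_visible_match(docs, user_groups):
    """Drop anything the user can't access, then return the top-scoring remainder."""
    visible = [d for d in docs if d.allowed_groups & user_groups]
    return max(visible, key=lambda d: d.relevance, default=None)

docs = [
    Doc("Parental Leave Policy (current)", frozenset({"HR-Team"}), 0.95),
    Doc("Parental Leave Policy (2019)", frozenset({"All-Staff"}), 0.70),
]

staff_answer = best_visible_match(docs, frozenset({"All-Staff"}))
hr_answer = best_visible_match(docs, frozenset({"All-Staff", "HR-Team"}))

print(staff_answer.title)  # the 2019 version: the only one this user can see
print(hr_answer.title)     # the current version
```

The same question gets two different answers depending on who asks. Neither result is a bug in the filter; it's the permissions doing exactly what they were told.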
The other thing worth knowing: deleted documents don't disappear from the index immediately. They can remain indexed during their time in the recycle bin, and after permanent deletion, removal from the index may not be immediate. So that document you deleted last week might still be influencing Copilot answers today.
The GEO Parallel: Same Problem, Same Solution
If you've been following the conversation about generative engine optimisation (GEO), which is the practice of structuring external web content so that AI search tools like Perplexity, ChatGPT, and Google's AI Overviews can find and cite it accurately, I want to suggest something. That conversation and the SharePoint conversation are now the same conversation.
I've started calling this "internal GEO." It's not an official Microsoft term; it's a framing that enterprise consultants have been using through 2025 to describe something that's clearly true. The principles are identical: clear headings, accurate metadata, fresh and factual content, consistent terminology, logical hierarchy, and the removal of superseded or conflicting content.
The difference is the stakes. On the public web, a stale page might cost you some traffic. Inside your intranet, a stale page might cause a staff member to be told the wrong parental leave entitlement.
If your team is doing any work on external AI search optimisation, the frameworks you're building apply directly to your SharePoint content. Start treating your intranet like a content asset that an AI is going to read, because that's exactly what's happening.
The Ignite 2025 Update That Changes the Maths
I want to call this out separately because it's significant and it's recent.
At Microsoft Ignite 2025, Microsoft announced that Copilot metadata reasoning is now generally available (Microsoft TechCommunity: SharePoint Showcase Ignite 2025). Here's what that means in plain terms.
Before this update, Copilot read document text. That was it. Your carefully maintained SharePoint metadata columns (content types, taxonomy terms, managed metadata fields) were effectively invisible to Copilot's reasoning engine. You might have had a "Document Status" column faithfully set to "Superseded" on every outdated policy document. Copilot couldn't see it.
Now it can.
Copilot now reasons over custom SharePoint metadata columns. That "Superseded" status? Copilot can use it. Your "Last Reviewed" date field? Part of Copilot's contextual understanding. The department taxonomy you spent three months arguing about in 2022? Now contributing directly to answer quality.
For organisations that invested in proper SharePoint metadata governance, this is the payoff. The return on investment from that governance work is now directly measurable in Copilot answer quality. For organisations that never got around to metadata hygiene, this is a motivating signal to start.
The Five Failure Modes (With Names, So We Can Talk About Them)
I find it helps to name these patterns specifically. When you're in a content audit conversation, it's easier to say "we've got a Haunted Subsite problem" than to describe the whole situation from scratch.
The Haunted Subsite. This is the SharePoint site that was created for a project or team that no longer exists. Nobody owns it, nobody's maintained it in years, but it's still there, still accessible, still indexed. Copilot will happily surface its content because it has no way of knowing the team dissolved and the information is irrelevant. I've seen organisations with dozens of these.
The Version Graveyard. "Draft v3 FINAL (2).docx". You know this file. It lives in a document library alongside "Draft v3 FINAL.docx" and "Draft v2 FINAL reviewed PW.docx" and the actual final version with a completely different name. Copilot might surface any of them. The one it picks is determined by semantic relevance and recency, not by which one is actually authoritative. Without metadata to distinguish the authoritative version, it's a coin toss.
The Orphaned Document. This is the file that got uploaded once, never linked to anything, never reviewed, and has been sitting in a document library for four years accumulating no views. It's indexed, it might be factually wrong, and Copilot has no way to know it's an orphan. Clear headings and metadata are the only signals Copilot has to judge document quality.
The Permission Paradox. Your most accurate and well-maintained content is in a library locked down to a small team. Copilot can't see it for users outside that group, so it reaches for the next best match, which might be older content from a more broadly accessible site. The security boundary that protects sensitive content also hides accurate information from the AI.
The Stale Page. This is the parental leave example from the top of this article. Content that was accurate when it was created, wasn't updated when circumstances changed, and is now confidently wrong. These are the most dangerous because they look authoritative. They're well-structured, they've got the right headings, they just haven't been touched since 2019.
New Tools Worth Knowing About
Two tools came out of Ignite 2025 that are directly relevant here.
The first is the Content Management Assessment tool, which Microsoft has specifically positioned as the pre-Copilot readiness gate to run before any serious Copilot rollout (Microsoft Learn: Get ready for Copilot with SharePoint Advanced Management). It gives you a consolidated report covering site health, permissions posture, and lifecycle readiness. It's available through the SharePoint Admin Centre if you've got a SharePoint Advanced Management licence. Run this first. It'll surface a lot of what you'd otherwise spend weeks discovering manually.
The second is something called Knowledge Agent in SharePoint, which is currently in public preview (Microsoft TechCommunity: Knowledge Agent in SharePoint). This is an AI that scans your tenant specifically looking for stale content and knowledge gaps that would affect Copilot quality. It's not a magic fix, but it's a useful ongoing signal rather than a one-time audit.
The Audit Framework (Step 0 Through Step 5)
Here's the practical sequence. This is roughly what we'd walk through with a client preparing for a Microsoft 365 Copilot rollout.
Step 0: Run the Content Management Assessment.
Don't skip this. It's the consolidated starting point that'll show you the state of your tenant's permissions, site lifecycle health, and inactive sites at a glance. Think of it as the dashboard before you start opening drawers. If you don't have a SharePoint Advanced Management licence yet, that's worth evaluating before a Copilot deployment anyway.
Step 1: Identify stale content.
Go to the SharePoint Admin Centre and look at Site Usage and Last Modified data. Any content that hasn't been updated in twelve months or more needs a human eye on it. That doesn't mean deleting everything old (some content legitimately doesn't change), but it means making a conscious decision rather than leaving it to chance. Microsoft's own documentation explicitly states that inactive sites and outdated content "clutter Copilot's data source and lead to less accurate responses." That's not my opinion. That's Microsoft's.
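If you can get that Last Modified data out as a list or spreadsheet, the twelve-month check is easy to script. This is a rough sketch, assuming you've already exported the data into rows with a `url` and a `last_modified` date. Those field names and the sample rows are mine, not something the Admin Centre produces in this shape.

```python
from datetime import date, timedelta

def find_stale(items, today, max_age_days=365):
    """Return items whose last_modified date is older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [item for item in items if item["last_modified"] < cutoff]

# Hypothetical export rows, for illustration only.
items = [
    {"url": "/sites/hr-archive/parental-leave", "last_modified": date(2019, 3, 14)},
    {"url": "/sites/hr/parental-leave", "last_modified": date(2025, 6, 30)},
]

stale = find_stale(items, today=date(2025, 12, 1))
for item in stale:
    print("Needs review:", item["url"], item["last_modified"].isoformat())
```

The output is a review list, not a delete list. Someone still has to decide whether each item is legitimately unchanged or quietly wrong.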
Step 2: Review metadata coverage.
Check your most important document libraries for metadata completeness. At minimum, you want Content Type, Date Modified, Author, and Status columns populated consistently. Since the Ignite 2025 update, this matters more than it used to. A "Status: Superseded" tag on an outdated policy is now actually useful to Copilot's reasoning, not just useful to your content governance spreadsheet.
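One way to put a number on "populated consistently" is a fill rate per column. The sketch below assumes you've exported a library's items as a list of dictionaries keyed by column display name. The column names follow the minimum set above, and the sample rows are invented.

```python
REQUIRED_COLUMNS = ("Content Type", "Date Modified", "Author", "Status")

def metadata_coverage(rows, columns=REQUIRED_COLUMNS):
    """Return the fraction of rows with a non-empty value, per column."""
    total = len(rows)
    coverage = {}
    for column in columns:
        # Treat missing keys, None, and whitespace-only strings as empty.
        filled = sum(1 for row in rows if str(row.get(column) or "").strip())
        coverage[column] = filled / total if total else 0.0
    return coverage

# Invented sample rows, for illustration only.
rows = [
    {"Content Type": "Policy", "Date Modified": "2025-06-30", "Author": "HR", "Status": "Current"},
    {"Content Type": "Policy", "Date Modified": "2019-03-14", "Author": "", "Status": None},
]

for column, rate in metadata_coverage(rows).items():
    print(f"{column}: {rate:.0%} populated")
```

A library where Status sits at 50% is one where half the documents give Copilot nothing to distinguish current from superseded.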
Step 3: Audit duplicate documents.
Run a SharePoint Search query for common terms in your most important policy areas. What comes back? If you're getting three versions of your IT security policy, that's a problem. Decide which is authoritative, mark it with metadata, archive or delete the others. Copilot doesn't know which version to trust; you need to tell it through structure and metadata.
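Search results will show you duplicates one topic at a time. If you have a list of file names from a library, a rough first pass is to strip the usual version noise and group what's left. This is a heuristic sketch, not a SharePoint feature. The noise words in the pattern and the sample file names are my own assumptions, and it will miss duplicates that were renamed entirely.

```python
import re
from collections import defaultdict

# Words and markers that usually signal a version rather than a different document.
NOISE = re.compile(r"\b(draft|final|reviewed|copy|v\d+)\b|\(\d+\)", re.IGNORECASE)

def normalise(filename):
    """Reduce a file name to a comparable key by removing extension and version noise."""
    stem = filename.rsplit(".", 1)[0]
    stem = re.sub(r"[_\-]+", " ", stem)  # treat underscores and hyphens as spaces
    stem = NOISE.sub(" ", stem)
    return " ".join(stem.lower().split())

def find_duplicate_groups(filenames):
    """Group file names by normalised key and keep groups with more than one member."""
    groups = defaultdict(list)
    for name in filenames:
        groups[normalise(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

# Invented sample file names, for illustration only.
filenames = [
    "IT Security Policy v3 FINAL.docx",
    "IT Security Policy v3 FINAL (2).docx",
    "IT_security_policy_draft_v2.docx",
    "Parental Leave Policy.docx",
]

for key, names in find_duplicate_groups(filenames).items():
    print(key, "->", names)
```

Each group it finds is a prompt for the same decision: which one is authoritative, and what happens to the rest.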
Step 4: Check permissions architecture.
Map out where your most accurate, well-maintained content lives and who can see it. Copilot will never surface content to users who don't have access, so if your best information is locked down to a subset of staff, other users get the next-best option. That might be fine. But it's worth knowing it's happening, and in some cases it's worth reviewing whether those permissions still reflect your intent.
Step 5: Structure priority pages.
Identify your top twenty most-used content areas (HR policies, IT processes, onboarding materials, whatever matters most to your organisation). Apply a consistent structure to each: a clear H1, a summary paragraph at the top, H2 sections for major topics. Add metadata for subject, audience, and last reviewed date. This is exactly what you'd do for external SEO. It works internally for the same reasons.
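The structure rule is simple enough to check mechanically. The sketch below assumes you can get a page's content as plain HTML, which is a simplification of how SharePoint stores modern pages, and it only looks at the order of `h1`, `h2`, and `p` tags. The class and function names are mine.

```python
from html.parser import HTMLParser

class StructureCollector(HTMLParser):
    """Record h1, h2 and p tags in the order they open."""
    def __init__(self):
        super().__init__()
        self.order = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "p"):
            self.order.append(tag)

def check_structure(html):
    """Return a list of structural problems; an empty list means the page passes."""
    collector = StructureCollector()
    collector.feed(html)
    order = collector.order
    problems = []
    if order.count("h1") != 1:
        problems.append("expected exactly one H1")
    else:
        after_h1 = order.index("h1") + 1
        if after_h1 >= len(order) or order[after_h1] != "p":
            problems.append("no summary paragraph directly after the H1")
    if "h2" not in order:
        problems.append("no H2 sections")
    return problems

good_page = "<h1>Parental Leave Policy</h1><p>Summary.</p><h2>Eligibility</h2><p>Details.</p>"
bare_page = "<h1>Parental Leave Policy</h1><h2>Eligibility</h2>"

print(check_structure(good_page))
print(check_structure(bare_page))
```

It says nothing about whether the summary is any good. It only catches the pages where the structure is missing altogether, which across twenty priority areas is usually enough to build a worklist.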
What "Copilot-Ready" SharePoint Actually Looks Like
Here's my honest assessment. There's no such thing as "done" with SharePoint content governance. Organisations that treat this as a one-time project will be back at square one in eighteen months.
What Copilot-ready SharePoint looks like is less a state and more a practice. It means having clear ownership for content areas so that someone is responsible when a policy changes. It means building review cycles into your content lifecycle so that documents have expiry prompts or review dates. It means your metadata columns are actually populated, not just created and ignored.
The organisations I've seen get the most out of Copilot in the enterprise aren't the ones that did the biggest launch. They're the ones that treated SharePoint content as a first-class asset before the AI deployment. Some of them did it because they wanted better search results years ago. The AI payoff came as a bonus.
It's the same insight I keep coming back to when people ask about AI readiness in general. The organisations that do well with AI are the ones that had disciplined data and content practices before AI arrived. The AI doesn't create order from chaos. It amplifies what's already there.
The Ongoing Discipline
The worker productivity data on information findability hasn't moved meaningfully in over a decade. A 2012 McKinsey study put knowledge workers spending around 1.8 hours per day (roughly 9 hours per week) searching for and gathering information (McKinsey Global Institute: The social economy). IDC research has put the figure even higher, around 2.5 hours a day. Every enterprise analyst who has looked at this problem reaches roughly the same conclusion: a significant chunk of the working day disappears into the search problem.
That's the problem Microsoft 365 Copilot is positioned to solve. But it can only solve it if the information is there, it's accurate, and it's findable. A well-prompted AI on a poorly maintained SharePoint doesn't give you back those 1.8 hours. It gives you confidently wrong answers faster.
I'll be watching how organisations approach this over the next year. My prediction is that we'll see a clear divide emerge between organisations that treat Copilot content quality as an ongoing governance function and those that treat it as a one-time setup. The former will accumulate compounding returns. The latter will have a lot of conversations about why Copilot keeps getting things wrong.
The conversation my client and I had in January could've happened at any organisation rolling out Microsoft 365 Copilot right now. The parental leave answer wasn't a Copilot failure. It was a content governance failure that Copilot made visible.
In a way, that's the most useful thing about deploying AI on your internal content. It turns invisible problems into obvious ones. The question is whether you fix them before the AI answers go out to staff, or after.
Key Takeaways
Understanding the Semantic Index:
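- Microsoft Graph Semantic Index has been indexing your M365 content since 2023, covering SharePoint pages, Word, PDF, PowerPoint, OneNote, and Exchange emails (Teams messages and Excel files may reach Copilot through separate search pathways, but aren't in the documented content type table)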
- It uses vector embeddings to understand meaning, not just keywords, so Copilot finds conceptually relevant content even when exact terms don't match
- Deleted documents can remain indexed while they sit in the recycle bin, and removal from the index after permanent deletion may not be immediate
The Ignite 2025 Metadata Update:
- Copilot metadata reasoning is now generally available: Copilot reasons over custom SharePoint metadata columns (content types, taxonomy terms, managed metadata)
- Organisations with mature metadata governance get measurably better Copilot results
- "Status: Superseded" and "Last Reviewed" fields are now genuinely useful to Copilot's reasoning
The Five Failure Modes to Fix:
- Haunted Subsite: abandoned sites with stale, unowned content
- Version Graveyard: multiple conflicting document versions with no clear authority
- Orphaned Document: unlinked files with no structural context
- Permission Paradox: accurate content locked away from users who need it
- Stale Page: well-structured but outdated content that looks authoritative
The Audit Sequence:
- Start with the Content Management Assessment tool (SharePoint Advanced Management)
- Identify and review content not updated in twelve months or more
- Ensure metadata columns are populated consistently
- Resolve duplicate documents with clear authoritative versions
- Check where your best content lives and whether its permissions still reflect your intent
- Structure priority pages with consistent H1, summary, and H2 sections
---
Sources
- Microsoft Learn. "Semantic Index for Copilot." 2023. https://learn.microsoft.com/en-us/microsoftsearch/semantic-index-for-copilot
- Microsoft Learn. "Get ready for Copilot with SharePoint Advanced Management." 2025. https://learn.microsoft.com/en-us/sharepoint/get-ready-copilot-sharepoint-advanced-management
- Microsoft TechCommunity. "SharePoint Showcase Announcements at Microsoft Ignite 2025." November 2025. https://techcommunity.microsoft.com/blog/spblog/sharepoint-showcase-announcements-at-microsoft-ignite-2025/4470378
- Microsoft TechCommunity. "Introducing Knowledge Agent in SharePoint." November 2025. https://techcommunity.microsoft.com/blog/spblog/introducing-knowledge-agent-in-sharepoint/4454154
- Microsoft TechCommunity. "New capabilities for AI admins from Ignite 2025." November 2025. https://techcommunity.microsoft.com/blog/microsoft365copilotblog/new-capabilities-for-ai-admins-from-ignite-2025/4478906
- Microsoft Learn. "Optimising SharePoint for Employee Self-Service agents." 2025. https://learn.microsoft.com/en-us/copilot/microsoft-365/employee-self-service/optimization-sharepoint
- McKinsey Global Institute. "The social economy: Unlocking value and productivity through social technologies." July 2012. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/the-social-economy
---
