Domain Health
Starter+ · 4 tools

Domain Health gives you a quantitative picture of corpus quality before poor retrieval exposes the problem. Every ingested domain gets a composite health score built from four dimensions: embedding coverage (the percentage of ingested documents that produced valid embeddings), chunk distribution (whether documents split into useful segments or degenerate into single-chunk blobs), duplicate detection (how much redundancy exists across documents), and freshness (the age of the most recently updated document and the median document age).
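As a rough illustration of how four dimensions could roll up into one number, here is a minimal sketch. The equal weights, the one-year freshness horizon, the 0-100 scale, and every name in it (`DomainMetrics`, `health_score`) are assumptions made for this example; the actual scoring formula is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class DomainMetrics:
    embedded_docs: int       # documents that produced valid embeddings
    total_docs: int          # all ingested documents
    single_chunk_docs: int   # documents that collapsed into a single chunk
    duplicate_docs: int      # documents flagged as redundant
    median_age_days: float   # median document age in days


# Hypothetical equal weights; the real weighting is not documented here.
WEIGHTS = {"coverage": 0.25, "chunking": 0.25, "uniqueness": 0.25, "freshness": 0.25}


def health_score(m: DomainMetrics, freshness_horizon_days: float = 365.0) -> float:
    """Combine four dimensions, each scaled to [0, 1], into a 0-100 score."""
    if m.total_docs == 0:
        return 0.0
    dims = {
        "coverage": m.embedded_docs / m.total_docs,
        "chunking": 1.0 - m.single_chunk_docs / m.total_docs,
        "uniqueness": 1.0 - m.duplicate_docs / m.total_docs,
        # Linear decay to zero at the (assumed) freshness horizon.
        "freshness": max(0.0, 1.0 - m.median_age_days / freshness_horizon_days),
    }
    return round(100.0 * sum(WEIGHTS[name] * value for name, value in dims.items()), 1)
```

A weighted sum keeps each dimension's contribution easy to inspect, which helps when a score drops and you need to see which dimension moved.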

Gap analysis goes beyond what's in your corpus to identify what's missing. When users or agents query topics that return low-confidence results or zero hits, those query patterns accumulate. Domain Health surfaces them as gap signals, for example: "users have asked about deployment pipelines 14 times this week, but the engineering domain has no documents about deployment." This is more than a search quality metric; it is a corpus strategy tool that tells you what to ingest next.
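The accumulation step described above can be sketched as a simple counter over query events. This is an illustrative sketch only: the `QueryEvent` shape, the assumption that each query already carries a topic label, and both thresholds are hypothetical and not taken from the product.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class QueryEvent(NamedTuple):
    topic: str         # topic label for the query (assumed to be assigned upstream)
    hits: int          # number of results returned
    confidence: float  # confidence of the best result, in [0, 1]


def gap_signals(
    events: Iterable[QueryEvent],
    min_confidence: float = 0.4,  # hypothetical low-confidence cutoff
    min_count: int = 3,           # hypothetical minimum repeats before reporting
) -> list[tuple[str, int]]:
    """Count queries with zero hits or low confidence, grouped by topic."""
    misses = Counter(
        e.topic for e in events if e.hits == 0 or e.confidence < min_confidence
    )
    # most_common() orders topics by miss count, largest first.
    return [(topic, n) for topic, n in misses.most_common() if n >= min_count]
```

The `min_count` threshold keeps one-off failed queries from showing up as gaps; only repeated misses on the same topic are reported.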

Ingestion diagnostics run at ingest time and report documents that failed to process cleanly. MIME-type exclusions surface documents that were silently skipped because the current pipeline configuration does not support their format. This matters because the default pipeline excludes non-markdown formats such as JSON, YAML, and CSV unless explicitly configured. Size outliers are flagged when a document falls outside the chunker's optimal range: documents that are too small don't exercise the semantic chunker, and documents that are too large produce chunks that lose specificity. Embedding failures are logged with the specific error so you can distinguish transient infrastructure issues from systematic content problems.
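A minimal sketch of the three diagnostic checks follows. The supported MIME set, the size bounds, the transient-error keywords, and all names (`IngestResult`, `diagnose`) are assumptions chosen for illustration; the pipeline's real configuration and error taxonomy may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed defaults: only markdown is accepted unless the pipeline is reconfigured.
SUPPORTED_MIME_TYPES = {"text/markdown"}
# Hypothetical bounds for the chunker's optimal range, in characters.
MIN_CHARS, MAX_CHARS = 500, 200_000
# Hypothetical keywords that suggest an infrastructure problem rather than bad content.
TRANSIENT_MARKERS = ("timeout", "rate limit", "connection", "unavailable")


@dataclass
class IngestResult:
    doc_id: str
    mime_type: str
    size_chars: int
    embedding_error: Optional[str] = None  # error text if embedding failed


def diagnose(result: IngestResult) -> list[str]:
    """Return human-readable diagnostics for one ingested document."""
    if result.mime_type not in SUPPORTED_MIME_TYPES:
        # Excluded documents never reach chunking or embedding, so stop here.
        return [f"excluded: unsupported MIME type {result.mime_type}"]
    issues = []
    if result.size_chars < MIN_CHARS:
        issues.append("undersized: too small to exercise the semantic chunker")
    elif result.size_chars > MAX_CHARS:
        issues.append("oversized: chunks may lose specificity")
    if result.embedding_error:
        text = result.embedding_error.lower()
        kind = "transient" if any(m in text for m in TRANSIENT_MARKERS) else "systematic"
        issues.append(f"embedding failure ({kind}): {result.embedding_error}")
    return issues
```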

The freshness dimension deserves special attention. A domain where the newest document is six months old is not necessarily unhealthy, because reference material and standards documents update slowly. But a domain where the newest document is six months old and users are querying it daily is a risk. Domain Health combines freshness metrics with query frequency to produce a staleness risk score that accounts for how actively the domain is being used, not just how old it is.
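One way to express "old and heavily queried is risky, old and idle is not" is to multiply an age factor by a usage factor. The sketch below does that with saturating exponentials; the functional form, the scale parameters, and the function name are assumptions for illustration, not the product's formula.

```python
import math


def staleness_risk(
    newest_doc_age_days: float,
    queries_per_day: float,
    age_scale_days: float = 180.0,  # hypothetical: age at which the age factor reaches ~0.63
    query_scale: float = 1.0,       # hypothetical: query rate at which usage reaches ~0.63
) -> float:
    """Return a risk value in [0, 1) that rises with both age and usage."""
    age_factor = 1.0 - math.exp(-newest_doc_age_days / age_scale_days)
    usage_factor = 1.0 - math.exp(-queries_per_day / query_scale)
    # The product is zero when either factor is zero, so an old domain
    # that nobody queries carries no staleness risk.
    return age_factor * usage_factor
```

With this shape, a six-month-old reference domain that is never queried scores 0, while the same domain queried several times a day scores well above it.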

For teams managing multiple domains, the health dashboard provides a portfolio view. You can identify which domains need investment (high query volume, low health score), which are stable (low query volume, high health score), and which are at risk (declining health scores over time). This turns corpus management from a reactive activity (fix retrieval when it breaks) into a proactive one (invest in domains before quality degrades).
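The three portfolio categories above can be sketched as a small classifier over each domain's score history and query volume. All thresholds and the function name are hypothetical, and the "unclassified" fallback is added here only because the three categories do not cover every combination.

```python
def classify_domain(
    health_history: list[float],   # health scores over time, oldest first
    queries_per_day: float,
    low_health: float = 60.0,      # hypothetical cutoff between low and high health
    high_volume: float = 10.0,     # hypothetical cutoff between low and high query volume
    decline: float = 5.0,          # hypothetical drop that counts as a declining trend
) -> str:
    """Place a domain into one of the portfolio categories."""
    if not health_history:
        return "unclassified"
    current = health_history[-1]
    if queries_per_day >= high_volume and current < low_health:
        return "needs investment"  # high query volume, low health score
    if health_history[0] - current >= decline:
        return "at risk"           # health score declining over time
    if queries_per_day < high_volume and current >= low_health:
        return "stable"            # low query volume, high health score
    return "unclassified"
```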

MCP Tools

get_freshness_report
○ Free

Staleness flags per topic - identifies which knowledge areas have gone stale and need re-ingestion or review.

get_contradictions
○ Free

Cross-platform divergence alerts - surfaces cases where different source documents disagree on the same topic.

list_topics
○ Free

List all topics in the knowledge base for a user or team, providing an overview of corpus coverage.

get_audit_trail
○ Free

Full change history for a topic - every update, supersession, and re-ingestion event with timestamps and actor identity.
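For orientation, here is a sketch of what calling one of these tools might look like at the protocol level. The envelope follows the Model Context Protocol's JSON-RPC `tools/call` request shape; the argument names passed to each tool (such as `topic`) are hypothetical, since this page does not document the tools' input schemas. In practice an MCP client library would build and send this message for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def tool_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the named tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Hypothetical arguments: the real input schema for get_audit_trail is not shown here.
request = tool_call_request("get_audit_trail", {"topic": "deployment"})
```

An agent would normally discover the real argument schema from the server's `tools/list` response before issuing a call.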

Ready to get started?

VaultCrux is still gated. Request access and we will provision the credentials your agent needs.