# Retrieval And Caching Design

## Purpose

This document defines how the platform should retrieve, normalize, cache, and index system and customer documents sourced from Google Drive and Google Docs.

## Retrieval Goals

- keep private agency expertise and customer materials available with low latency
- avoid repeated fetches for infrequently changing documents
- enforce tenant and approval boundaries before relevance ranking
- preserve source provenance for all retrieved context

## Source Classes

The platform should support two primary source classes.

### `system sources`

Internal agency-owned materials such as:

- marketing playbooks
- consultative process docs
- service delivery standards
- tool-use guidance

### `customer sources`

Tenant-owned materials such as:

- brand guidelines
- approved creative assets
- approved campaign history
- workflow rules
- customer-authored briefs

## Retrieval Pipeline

```mermaid
flowchart TD
    source[GoogleDriveOrDocsSource] --> sync[SyncJob]
    sync --> normalize[Normalizer]
    normalize --> classify[DocumentClassifier]
    classify --> chunk[Chunker]
    chunk --> cache[DocumentCache]
    chunk --> index[RetrievalIndex]
    request[UserRequest] --> scope[ScopeFilter]
    scope --> retrieve[Retriever]
    cache --> retrieve
    index --> retrieve
    retrieve --> promptPack[PromptPackage]
```

## Synchronization Model

### Sync Triggers

- scheduled refresh for long-lived documents
- explicit refresh for admin-managed changes
- version or hash comparison on ingestion
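The version or hash comparison can be sketched as a small decision function. This is a minimal illustration, not a prescribed implementation: the names `content_hash` and `needs_reindex` are hypothetical, and SHA-256 over the fetched body is an assumed choice of hash.

```python
import hashlib
from typing import Optional


def content_hash(body: str) -> str:
    """Return a hex digest of the document body (SHA-256 is an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def needs_reindex(
    stored_revision: Optional[str],
    stored_hash: Optional[str],
    source_revision: Optional[str],
    fetched_body: str,
) -> bool:
    """Decide at ingestion time whether a document must be re-normalized."""
    # Never ingested before: always process.
    if stored_hash is None:
        return True
    # The source reports a different revision than the one we stored.
    if source_revision is not None and source_revision != stored_revision:
        return True
    # Revisions match or are unavailable: fall back to comparing content.
    return content_hash(fetched_body) != stored_hash
```

Checking the revision first keeps the common case cheap, while the hash comparison catches content changes when revision identifiers are missing or unreliable.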

### Sync Responsibilities

- fetch source metadata
- download canonical content
- normalize to markdown or plain text
- detect approval state and tenant scope
- write cache metadata and retrieval records

## Normalization Rules

Before indexing, all content should be normalized into a common internal format.

### Normalize To

- plain text or markdown body
- structured metadata
- stable chunk identifiers
- source URI and revision identifiers

### Required Metadata

- `documentId`
- `sourceUri`
- `sourceType`
- `sourceRevision`
- `tenantId`
- `brandId`
- `documentType`
- `approvalState`
- `updatedAt`
- `lastSyncedAt`
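One way to enforce the required metadata during sync is a completeness check before a record is written. The sketch below assumes records are plain mappings keyed by the field names above; the function name `missing_metadata` is hypothetical, and treating empty strings as missing is an assumption.

```python
from typing import Any, Mapping

# Field names taken from the Required Metadata list.
REQUIRED_METADATA_FIELDS = (
    "documentId",
    "sourceUri",
    "sourceType",
    "sourceRevision",
    "tenantId",
    "brandId",
    "documentType",
    "approvalState",
    "updatedAt",
    "lastSyncedAt",
)


def missing_metadata(record: Mapping[str, Any]) -> list[str]:
    """Return required fields that are absent, None, or empty strings."""
    return [
        field
        for field in REQUIRED_METADATA_FIELDS
        if record.get(field) in (None, "")
    ]
```

A sync job could refuse to index any record for which this returns a non-empty list. How system sources should populate `tenantId` and `brandId` is not specified here, so the sketch applies the same rule to both source classes.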

## Chunking Strategy

Chunking should preserve semantic meaning and retrieval explainability.

### Guidelines

- chunk by headings or logical sections when available
- keep brand policy chunks separate from examples
- keep customer-approved assets separate from session-generated artifacts
- include a small overlap between adjacent chunks only where needed to preserve continuity
- attach chunk-level metadata for tenant, brand, and approval state
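The heading-based guideline could look roughly like the following. This is a sketch under stated assumptions: the normalized body is markdown with ATX (`#`) headings, the `Chunk` type and `chunk_by_headings` function are hypothetical, and chunk identifiers are derived from the document ID, section position, and heading so they stay stable across re-syncs as long as the section structure does not change. Overlap handling and headings inside code fences are deliberately left out.

```python
import hashlib
import re
from dataclasses import dataclass

# Matches markdown ATX headings such as "## Tone of voice".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    heading: str
    text: str
    tenant_id: str
    brand_id: str
    approval_state: str


def chunk_by_headings(
    document_id: str,
    body: str,
    tenant_id: str,
    brand_id: str,
    approval_state: str,
) -> list[Chunk]:
    """Split a markdown body into one chunk per heading-delimited section."""
    # The first section collects any text that precedes the first heading.
    sections: list[tuple[str, list[str]]] = [("", [])]
    for line in body.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(2), []))
        else:
            sections[-1][1].append(line)

    chunks = []
    for position, (heading, lines) in enumerate(sections):
        text = "\n".join(lines).strip()
        if not text:
            continue  # skip sections with no body text
        seed = f"{document_id}|{position}|{heading}"
        chunk_id = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:16]
        chunks.append(
            Chunk(chunk_id, heading, text, tenant_id, brand_id, approval_state)
        )
    return chunks
```

Copying tenant, brand, and approval state onto every chunk means the scope filter can operate on chunks directly, without a join back to the parent document.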

## Cache Strategy

The cache should store normalized source documents and retrieval-ready chunks.

### System Cache

- long time-to-live
- refreshed on source hash change
- shared across all tenants
- restricted to internal-only retrieval

### Customer Cache

- scoped by tenant and brand
- refreshed on source change or approval transition
- filtered before search results are scored

### Session Cache

- short-lived
- attached to the active conversation or task
- excluded from long-term tenant retrieval until approved
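The scoping of the three caches can be made explicit in the cache key itself. The sketch below is illustrative only: the key layout, the `cache_key` function, and the `session_id` parameter are assumptions, and it presumes identifiers never contain the `:` separator.

```python
from typing import Optional


def cache_key(
    cache_class: str,
    document_id: str,
    tenant_id: Optional[str] = None,
    brand_id: Optional[str] = None,
    session_id: Optional[str] = None,
) -> str:
    """Build a cache key whose prefix encodes the scope of the entry."""
    if cache_class == "system":
        # One shared entry per document, regardless of tenant.
        return f"system:{document_id}"
    if cache_class == "customer":
        # Refuse to build a key without full tenant and brand scope.
        if not tenant_id or not brand_id:
            raise ValueError("customer cache entries require tenant and brand")
        return f"customer:{tenant_id}:{brand_id}:{document_id}"
    if cache_class == "session":
        if not session_id:
            raise ValueError("session cache entries require a session id")
        return f"session:{session_id}:{document_id}"
    raise ValueError(f"unknown cache class: {cache_class}")
```

Because a customer key cannot be built without a tenant and brand, a lookup with missing scope fails loudly instead of falling through to another tenant's entry.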

## Retrieval Order Of Operations

Every retrieval should follow this order:

1. resolve user identity and access profile
2. determine the job type and context needs
3. hard-filter candidate sources by tenant, role, and approval state
4. rank only the in-scope candidate set
5. assemble context with provenance metadata
6. pass the prompt package to orchestration
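Steps 3 and 4 are the core of the ordering: filtering happens first, and ranking only ever sees what survived the filter. A minimal sketch for customer sources follows. The `Candidate` type, the `allowed_roles` field, the `"approved"` state value, and the pluggable `score` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    chunk_id: str
    tenant_id: str
    approval_state: str
    allowed_roles: frozenset
    text: str


def retrieve(
    candidates: Sequence[Candidate],
    tenant_id: str,
    role: str,
    score: Callable[[Candidate], float],
    limit: int = 5,
) -> list[Candidate]:
    """Hard-filter by scope, then rank only the in-scope candidates."""
    if not tenant_id:
        # Without a tenant there is no safe candidate set to rank.
        raise PermissionError("tenant scope could not be determined")
    in_scope = [
        c
        for c in candidates
        if c.tenant_id == tenant_id
        and c.approval_state == "approved"
        and role in c.allowed_roles
    ]
    # The scoring function never sees out-of-scope material.
    return sorted(in_scope, key=score, reverse=True)[:limit]
```

Since out-of-scope candidates are removed before `score` runs, a highly relevant chunk from another tenant cannot outrank in-scope content or leak through the ranking step.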

## Caching And Invalidation Rules

- invalidate cached content when source revision changes
- invalidate customer retrieval entries when approval state changes
- invalidate or suppress retired documents immediately
- keep cache metadata even when a document body is refreshed so lineage is preserved
- avoid per-request source fetching except for explicit preview or debugging flows
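The first four rules can be expressed as a single reconciliation step that runs when fresh source metadata arrives. In this sketch the `CacheEntry` type and `reconcile` function are hypothetical, and an invalidated entry is modeled by clearing its body while keeping the metadata record, which is one assumed way to preserve lineage.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class CacheEntry:
    document_id: str
    source_revision: str
    approval_state: str
    body: Optional[str]  # None means the body must be refreshed before use
    retired: bool = False


def reconcile(
    entry: CacheEntry,
    source_revision: str,
    approval_state: str,
    retired: bool,
) -> CacheEntry:
    """Apply the invalidation rules to one cache entry."""
    if retired:
        # Suppress immediately, but keep the metadata record for lineage.
        return replace(entry, body=None, retired=True)
    if (
        source_revision != entry.source_revision
        or approval_state != entry.approval_state
    ):
        # Drop the stale body and record the new revision and state.
        return replace(
            entry,
            source_revision=source_revision,
            approval_state=approval_state,
            body=None,
        )
    return entry
```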

## Performance Recommendations

- precompute normalized chunks during sync rather than at request time
- maintain separate indexes for system and customer sources
- apply exact scope filters before running semantic search
- cap the final prompt payload by relevance and token budget
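Capping the payload by relevance and token budget can be done greedily. The sketch below is one possible approach, not a required one: `cap_payload` is a hypothetical name, and `approx_tokens` counts whitespace-separated words as a stand-in for whatever tokenizer the target model actually uses.

```python
from typing import Callable, Sequence


def approx_tokens(text: str) -> int:
    """Rough token estimate; a real tokenizer should replace this."""
    return len(text.split())


def cap_payload(
    scored_chunks: Sequence[tuple[str, float]],
    token_budget: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> list[str]:
    """Select chunks in descending relevance until the budget is used up."""
    selected = []
    used = 0
    ordered = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    for text, _score in ordered:
        cost = count_tokens(text)
        if used + cost > token_budget:
            continue  # too large for the remaining budget; try smaller chunks
        selected.append(text)
        used += cost
    return selected
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks fill the remaining budget. Stopping at the first chunk that does not fit is an equally valid policy if strict relevance order matters more than budget utilization.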

## Failure Handling

- if retrieval from Google Drive or Google Docs fails, serve cached approved content rather than empty context
- mark stale cache entries with age and last successful sync time
- fail closed when tenant scope cannot be determined
- do not silently substitute one tenant's material for another
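These rules combine into a single load path. The sketch below is illustrative: `SourceUnavailable`, `CachedDocument`, `LoadResult`, and `load_document` are hypothetical names, the `fetch` callable stands in for the real Google client call, and `"approved"` is an assumed state value.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


class SourceUnavailable(Exception):
    """Hypothetical error raised when the upstream source cannot be reached."""


@dataclass(frozen=True)
class CachedDocument:
    tenant_id: str
    approval_state: str
    body: str
    last_synced_at: datetime


@dataclass(frozen=True)
class LoadResult:
    body: str
    stale: bool
    age_seconds: float
    last_synced_at: datetime


def load_document(
    tenant_id: Optional[str],
    fetch: Callable[[], str],
    cached: Optional[CachedDocument],
    now: datetime,
) -> Optional[LoadResult]:
    """Fetch from source, falling back to approved same-tenant cache."""
    if not tenant_id:
        # Fail closed: no tenant scope means no content at all.
        raise PermissionError("tenant scope could not be determined")
    try:
        return LoadResult(fetch(), stale=False, age_seconds=0.0, last_synced_at=now)
    except SourceUnavailable:
        pass
    # Only approved content belonging to the same tenant may be served.
    if (
        cached is not None
        and cached.tenant_id == tenant_id
        and cached.approval_state == "approved"
    ):
        age = (now - cached.last_synced_at).total_seconds()
        return LoadResult(cached.body, True, age, cached.last_synced_at)
    return None
```

Returning `None` when no eligible cache entry exists makes the empty-context case explicit to the caller, and the `stale` flag with its age lets downstream consumers surface how old the served content is.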

## Suggested Retrieval Metadata For Responses

Responses should retain enough metadata to explain which sources grounded them.

- `retrievedSourceCount`
- `retrievedSourceIds`
- `retrievedPromptAssetIds`
- `cacheAge`
- `approvalStatesUsed`
- `tenantScope`
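As a rough illustration, these fields could be assembled from the retrieved source records as shown below. The function name `build_retrieval_metadata` is hypothetical. Several choices are assumptions rather than part of this design: `cacheAge` is reported in seconds and taken from the oldest `lastSyncedAt` among the sources used, and `retrievedPromptAssetIds` is passed through from the caller because this document does not define how prompt assets are identified.

```python
from datetime import datetime, timezone
from typing import Any, Mapping, Sequence


def build_retrieval_metadata(
    sources: Sequence[Mapping[str, Any]],
    prompt_asset_ids: Sequence[str],
    tenant_scope: str,
    now: datetime,
) -> dict:
    """Summarize which sources grounded a response."""
    # Several chunks may come from the same document; count each once.
    source_ids = sorted({s["documentId"] for s in sources})
    oldest_sync = min((s["lastSyncedAt"] for s in sources), default=None)
    return {
        "retrievedSourceCount": len(source_ids),
        "retrievedSourceIds": source_ids,
        "retrievedPromptAssetIds": list(prompt_asset_ids),
        # Age of the least recently synced source, in seconds (assumption).
        "cacheAge": (
            None if oldest_sync is None else (now - oldest_sync).total_seconds()
        ),
        "approvalStatesUsed": sorted({s["approvalState"] for s in sources}),
        "tenantScope": tenant_scope,
    }
```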