Explanation

How the stitched hearing retrieval POC works.

`neal` turns the stitched Day 14 Julie Critchley hearing into timed, joined evidence records. The deployed POC indexes `430` segment records across two kept source windows. Each record keeps a playback clip, a `5 fps` embedding derivative, an aligned PDF transcript slice, and the metadata needed to open both the video and the matching PDF page range.

Gemini embeddings | Vectorize | 430 joined segments | 2 stitched source windows | Cloudflare Worker | R2-served clip + PDF evidence | Grounded Flash-Lite answer
Stage 1

Ingest and vector creation

Keep and stitch the hearing ranges

Start with two kept source windows from the original MP4, `00:31:40-01:17:35` and `01:30:40-01:42:00`, which leave out the intro and the on-video break. Then stitch them into one hearing-relative timeline.
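The mapping from stitched time back to original source time can be sketched as follows. This is a minimal Python illustration, not the POC's actual code: `hms`, `KEPT_WINDOWS`, `stitched_duration`, and `stitched_to_source` are hypothetical names, and only the two window ranges come from the description above.

```python
from typing import List, Tuple


def hms(text: str) -> int:
    """Convert an HH:MM:SS string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# The two kept source windows, as (start, end) seconds in the original MP4.
KEPT_WINDOWS: List[Tuple[int, int]] = [
    (hms("00:31:40"), hms("01:17:35")),
    (hms("01:30:40"), hms("01:42:00")),
]


def stitched_duration(windows: List[Tuple[int, int]]) -> int:
    """Total length of the stitched timeline in seconds."""
    return sum(end - start for start, end in windows)


def stitched_to_source(t: float, windows: List[Tuple[int, int]] = KEPT_WINDOWS) -> float:
    """Map an offset on the stitched timeline back to a time in the original MP4."""
    if t < 0:
        raise ValueError("stitched time must be non-negative")
    offset = 0
    for start, end in windows:
        length = end - start
        if t < offset + length:
            return start + (t - offset)
        offset += length
    if t == offset:
        # The very end of the stitched timeline maps to the end of the last window.
        return windows[-1][1]
    raise ValueError("stitched time is past the end of the timeline")


print(stitched_duration(KEPT_WINDOWS))  # 2755 s + 680 s
print(stitched_to_source(2755))         # first second of the second window
```

Keeping both the stitched offset and the mapped source time on each record is what lets a result open either the stitched clip or the matching position in the original recording.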

Join modalities per segment

For each segment, create a muted `5 fps` embedding clip, a browser playback clip, a preview image, and the aligned transcript excerpt for that same time range.
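A rough sketch of the per-segment derivation, in Python. It assumes the `10s` clips with `2s` overlap named in the ASCII map, segments cut from the stitched master, and a kept trailing partial segment. The function names, file names, and the choice of `ffmpeg` are assumptions for illustration; the POC's actual tooling is not stated.

```python
from typing import List, Tuple

SEGMENT_SECONDS = 10     # clip length from the pipeline description
OVERLAP_SECONDS = 2      # overlap from the pipeline description
STITCHED_SECONDS = 3435  # 2755 s + 680 s from the two kept windows


def segment_bounds(total: int, length: int = SEGMENT_SECONDS,
                   overlap: int = OVERLAP_SECONDS) -> List[Tuple[int, int]]:
    """Return (start, end) pairs on the stitched timeline, clamped to its end."""
    stride = length - overlap
    return [(start, min(start + length, total)) for start in range(0, total, stride)]


def embedding_clip_args(master: str, start: int, end: int, out: str) -> List[str]:
    """ffmpeg arguments for a muted 5 fps derivative of one segment."""
    return ["ffmpeg", "-y", "-ss", str(start), "-t", str(end - start),
            "-i", master, "-an", "-vf", "fps=5", out]


def preview_frame_args(master: str, start: int, out: str) -> List[str]:
    """ffmpeg arguments for a single preview frame at the segment start."""
    return ["ffmpeg", "-y", "-ss", str(start), "-i", master, "-frames:v", "1", out]


segments = segment_bounds(STITCHED_SECONDS)
print(len(segments), segments[0], segments[-1])
print(embedding_clip_args("stitched.mp4", *segments[1], "seg-0001-embed.mp4"))
```

Under these assumptions an 8-second stride over the 3435-second stitched timeline yields 430 segments, which matches the indexed count, though the POC may treat the final partial segment differently.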

Store searchable evidence

Gemini creates `RETRIEVAL_DOCUMENT` vectors, and Vectorize stores them alongside stitched time, original source time, speaker label, alignment confidence, video URL, PDF URL, and PDF preview metadata.
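One way to picture a stored record is the sketch below. The metadata field names follow the list in the ASCII map, and the `id` / `values` / `metadata` layout follows the usual Vectorize vector shape. The field types, the `SegmentMetadata` and `to_vectorize_record` names, the example URLs, and the tiny three-value vector are all placeholders, not values from the deployed index.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict, List


@dataclass
class SegmentMetadata:
    start: float                 # stitched timeline, seconds
    end: float
    sourceStart: float           # original MP4, seconds
    sourceEnd: float
    playbackUrl: str
    previewUrl: str
    pdfUrl: str
    pdfPreviewUrl: str
    speakerLabel: str
    alignmentConfidence: float
    transcriptExcerpt: str
    transcriptSource: str
    pdfPageStart: int
    pdfPageEnd: int


def to_vectorize_record(segment_id: str, values: List[float],
                        metadata: SegmentMetadata) -> Dict[str, Any]:
    """Bundle one embedding with its joined metadata for upsert."""
    return {"id": segment_id, "values": values, "metadata": asdict(metadata)}


# Placeholder data only; none of these values are taken from the hearing.
example = to_vectorize_record(
    "seg-0001",
    [0.1, 0.2, 0.3],
    SegmentMetadata(
        start=8, end=18, sourceStart=1908, sourceEnd=1918,
        playbackUrl="https://example.com/seg-0001.mp4",
        previewUrl="https://example.com/seg-0001.jpg",
        pdfUrl="https://example.com/transcript.pdf",
        pdfPreviewUrl="https://example.com/transcript-p1.png",
        speakerLabel="SPEAKER", alignmentConfidence=0.9,
        transcriptExcerpt="placeholder excerpt", transcriptSource="pdf",
        pdfPageStart=1, pdfPageEnd=1,
    ),
)
print(sorted(example["metadata"].keys()))
```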

kept source windows -> stitched hearing master -> 10s segments -> video + transcript -> document embeddings -> Vectorize index
Stage 2

Search query path

Best-fit query style

This corpus is strongest for transcript-backed hearing questions such as `bundle 52 volume 5 page 150`, `what is said about whistleblowers`, `who is the regulator health improvement scotland`, or `what is the planning process`.

Query embedding

Gemini creates a `RETRIEVAL_QUERY` embedding for the search text, while the Worker also classifies the query as visual, textual, or mixed. In practice, text-heavy hearing questions work best because the answer is grounded in transcript-backed segment evidence.
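The intent classification step could look something like this keyword heuristic. It is only a guess at the idea: the hint word lists, the `classify_query` name, and the default of `text` when nothing matches are assumptions, and the Worker's real classifier may work quite differently.

```python
import re

# Hypothetical hint words; the real classifier's signals are not documented.
VISUAL_HINTS = {"show", "shown", "screen", "slide", "wearing", "camera", "visible", "gesture"}
TEXT_HINTS = {"said", "say", "says", "page", "bundle", "volume", "who", "what", "process"}


def classify_query(query: str) -> str:
    """Label a query as 'visual', 'text', or 'mixed' from simple keyword hits."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    has_visual = bool(words & VISUAL_HINTS)
    has_text = bool(words & TEXT_HINTS)
    if has_visual and has_text:
        return "mixed"
    if has_visual:
        return "visual"
    # Assumed default: transcript-backed questions are the corpus's strength.
    return "text"


print(classify_query("what is said about whistleblowers"))
print(classify_query("who is shown on screen"))
```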

Nearest evidence lookup

Vectorize returns the closest joined segment records. The Worker keeps the top transcript-backed video matches and passes the best evidence bundle to the answer model.
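A minimal sketch of the selection step, assuming each match carries an `id`, a similarity `score`, and the stored `metadata`, and that "transcript-backed" means a non-empty `transcriptExcerpt`. The `select_evidence` name and that filtering rule are assumptions; the limit of five follows the "top 5" in the query-path diagram.

```python
from typing import Any, Dict, List


def select_evidence(matches: List[Dict[str, Any]], top_k: int = 5) -> List[Dict[str, Any]]:
    """Keep transcript-backed matches and return the top_k by similarity score."""
    backed = [
        match for match in matches
        if match.get("metadata", {}).get("transcriptExcerpt", "").strip()
    ]
    return sorted(backed, key=lambda match: match["score"], reverse=True)[:top_k]


# Placeholder matches; scores and excerpts are illustrative only.
sample = [
    {"id": "a", "score": 0.71, "metadata": {"transcriptExcerpt": "excerpt a"}},
    {"id": "b", "score": 0.93, "metadata": {"transcriptExcerpt": ""}},
    {"id": "c", "score": 0.88, "metadata": {"transcriptExcerpt": "excerpt c"}},
]
print([match["id"] for match in select_evidence(sample)])
```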

user query -> query embedding -> Vectorize nearest neighbors -> modality-aware rerank -> top 5 joined segment matches
Stage 3

Result assembly and grounded answer

Top evidence bundle

The Worker takes the highest-ranked joined matches, preserving the playback clip, transcript excerpt, PDF page range, and alignment confidence for each segment.

Grounded answer generation

`gemini-3.1-flash-lite-preview` with `MINIMAL` thinking answers only from the returned evidence, so the response stays tied to the retrieved records.
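To illustrate what "answers only from the returned evidence" can mean in practice, here is a hedged sketch of prompt assembly. The prompt wording and the `build_grounded_prompt` name are assumptions, the model call itself is left out, and the evidence items are placeholders shaped like the stored metadata.

```python
from typing import Any, Dict, List


def build_grounded_prompt(question: str, evidence: List[Dict[str, Any]]) -> str:
    """Compose a prompt that restricts the model to the numbered evidence."""
    lines = [
        "Answer the question using only the evidence below.",
        "If the evidence does not contain the answer, say so.",
        "",
    ]
    for number, item in enumerate(evidence, start=1):
        meta = item["metadata"]
        lines.append(
            f"[{number}] {meta['speakerLabel']} "
            f"(PDF pages {meta['pdfPageStart']}-{meta['pdfPageEnd']}): "
            f"{meta['transcriptExcerpt']}"
        )
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Placeholder evidence; not real transcript content.
evidence = [
    {"metadata": {"speakerLabel": "SPEAKER", "pdfPageStart": 1, "pdfPageEnd": 2,
                  "transcriptExcerpt": "placeholder excerpt"}},
]
print(build_grounded_prompt("what is the planning process", evidence))
```

Numbering the excerpts gives the answer model something to cite, which makes it easier to link each claim back to a clip and PDF page range in the UI.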

Explainable output

The UI shows the answer, top inline segment viewers, PDF preview images, transcript excerpts, PDF open links, speaker labels, and the Gemini and Vectorize search costs.

retrieved joined records -> grounded Flash-Lite answer -> clip + PDF + transcript links
ASCII Map
1. INGEST + VECTOR CREATION

original hearing MP4
        +
PDF transcript anchored at
"(Adjourned for a short time)"
        |
        v
keep source windows
00:31:40 -> 01:17:35
01:30:40 -> 01:42:00
        |
        v
stitched hearing timeline
        |
        v
segment into timed chunks
(10s clips with 2s overlap)
        |
        +--> playback MP4 per segment
        |
        +--> muted 5 fps embedding clip per segment
        |
        +--> preview frame per segment
        |
        +--> transcript excerpt aligned from PDF
        |
        v
joined evidence record per segment
[video] + [transcript excerpt] + [time/page metadata]
        |
        v
Gemini embedding model
taskType = RETRIEVAL_DOCUMENT
        |
        v
Vectorize stores:
vector values
+ start/end
+ sourceStart/sourceEnd
+ playbackUrl
+ previewUrl
+ pdfUrl
+ pdfPreviewUrl
+ speakerLabel
+ alignmentConfidence
+ transcriptExcerpt
+ transcriptSource
+ pdfPageStart/pdfPageEnd


2. SEARCH QUERY

user asks:
"bundle 52 volume 5 page 150"
or
"who is the regulator health improvement scotland"
        |
        v
Worker receives query
        |
        +--> classify query intent
             visual / text / mixed
        |
        v
Gemini embedding model
taskType = RETRIEVAL_QUERY
        |
        v
query vector sent to Vectorize
        |
        v
Vectorize nearest-neighbor lookup
returns top matching joined records


3. RESULT + RAG OUTPUT

top matches from Vectorize
(video + transcript + PDF metadata)
        |
        v
Worker selects top segment evidence
        |
        v
top evidence bundle passed to LLM
(gemini-3.1-flash-lite-preview)
        |
        v
grounded answer generated from retrieved evidence only
        |
        v
UI shows:
- concise answer
- playable clip
- PDF preview
- transcript excerpt
- open PDF link
- transcript source
- Gemini cost
- Vectorize cost