RAG Chatbot API
This lightweight reference describes the GOFA AI Retrieval-Augmented Generation (RAG) health assistant endpoint. The service uses Vertex AI Gemini with a configured RAG corpus. All responses must be grounded in retrieved content (no hallucination). System rules enforce:
- For Chinese input: responses are in Cantonese colloquial style.
- For English input: responses are concise and professional.
- Each response includes a mandatory medical disclaimer.
- Non-health-related topics are politely declined and redirected to health-related areas.
1. Endpoint
POST /api/rag-chatbot
(Related: POST /api/rag-chatbot/tts – not covered here.)
2. Auth & Headers
| Header | Value / Example | Required | Description |
|---|---|---|---|
| Authorization | Bearer <Firebase ID Token> | Yes | User authentication |
| Content-Type | application/json | Yes | Format of the request body |
| Accept | application/json | No | Force JSON mode (or use ?format=json) |
3. Request Body
{
"messages": [
{ "role": "user", "content": "Explain my fall risk" }
]
}
Explanation:
- messages: OpenAI-style conversation history. The backend prepends its own system prompt; the frontend does not need to include it.
- Supported roles: user | assistant | system (typically only user and previous assistant turns are sent).
- If messages is empty or contains no valid content, a 400 error is returned: { "error": "No valid message data provided" }.
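For a type-safe client, the request body can be modeled directly. A minimal sketch; type names such as ChatMessage are illustrative, not part of the API:

// Illustrative request-body types; the names are not part of the API.
type ChatRole = 'user' | 'assistant' | 'system';

interface ChatMessage {
  role: ChatRole;
  content: string;
}

interface RagChatRequest {
  messages: ChatMessage[];
}

const body: RagChatRequest = {
  messages: [{ role: 'user', content: 'Explain my fall risk' }],
};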
4. Status Codes
| Code | Meaning | Example Error JSON |
|---|---|---|
| 200 | Success (streaming or JSON) | (depends on mode) |
| 400 | Missing/invalid messages | { "error": "No valid message data provided" } |
| 401 | Authentication failed | { "error": "Authentication failed" } |
| 500 | Server error | { "error": "Chat processing failed" } or Client ID not found |
5. Response Modes
| Mode | Trigger Condition | Content-Type | Characteristics |
|---|---|---|---|
| Streaming (default) | No ?format=json and Accept not JSON | text/event-stream | Character-level incremental updates; final chunk includes source sentinel |
| JSON (single shot) | ?format=json or Accept: application/json | application/json | Full response with sourceFiles array; no sentinel |
6. Streaming Mode (Default)
Each SSE event's data: line carries a JSON object: {"type":"text-delta","textDelta":"..."}.
Example tail of a stream (with source sentinel at the end):
data: {"type":"text-delta","textDelta":"You"}
data: {"type":"text-delta","textDelta":" are"}
...
data: {"type":"text-delta","textDelta":"."}
data: {"type":"text-delta","textDelta":"\n[__SOURCE_FILES__][\"Handout_MOOC12_CH2.pdf\",\"Guide.pdf\"]"}
Sentinel Pattern (always at the very end):
\n[__SOURCE_FILES__]<JSON array of unique source file titles>
If no sources: \n[__SOURCE_FILES__][] is still emitted.
Error streams also end with a sentinel to avoid hanging parsers:
... "⚠️ Model generation failed. Please try again later.\n[__SOURCE_FILES__][]"
Basic Frontend Parsing (Simplified TS Example)
let accumulated = '';

// Append each streamed delta as it arrives (typing effect).
function onChunk(chunk: { textDelta: string }) {
  accumulated += chunk.textDelta;
}

// After the stream ends, split the answer from the source sentinel.
function onDone(): { answer: string; sources: string[] } {
  const S = '[__SOURCE_FILES__]';
  const i = accumulated.lastIndexOf(S);
  if (i < 0) return { answer: accumulated, sources: [] };
  const answer = accumulated.slice(0, i).trimEnd();
  const raw = accumulated.slice(i + S.length).trim();
  let sources: string[] = [];
  try { sources = JSON.parse(raw) || []; } catch { /* treat as zero sources */ }
  return { answer, sources };
}
Do not attempt to parse sources before the sentinel is detected; mid-stream, the trailing JSON may still be incomplete.
Minimal successful stream:
data: {"type":"text-delta","textDelta":"R"}
data: {"type":"text-delta","textDelta":"i"}
...
data: {"type":"text-delta","textDelta":"\n[__SOURCE_FILES__][\"A.pdf\",\"B.pdf\"]"}
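Putting it together, a minimal sketch of consuming the stream with fetch and a ReadableStream reader. It assumes each data: line carries one complete JSON chunk, as in the examples above; https://your-host is a placeholder, and UI wiring is omitted:

// Minimal SSE consumption sketch; assumes one complete JSON object per "data:" line.
async function streamChat(token: string, messages: unknown[]): Promise<string> {
  const res = await fetch('https://your-host/api/rag-chatbot', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let accumulated = '';
  let buffer = '';

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are newline-delimited; keep any partial line in the buffer.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const chunk = JSON.parse(line.slice(6));
      if (chunk.type === 'text-delta') accumulated += chunk.textDelta;
    }
  }
  return accumulated; // Still ends with the sentinel; split it as shown above.
}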
7. JSON Mode
Triggered by ?format=json or Accept: application/json.
Example response body:
{
"result": "Full grounded answer (without sentinel)",
"sourceFiles": ["Handout_MOOC12_CH2.pdf", "Guide.pdf"],
"title": "Handout_MOOC12_CH2.pdf, Guide.pdf",
"model": "gemini-2.5-flash"
}
Use case: When atomic results are needed (e.g., export, batch processing, synchronous backend workflows) or when SSE is not desired.
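A minimal JSON-mode call, assuming the response shape shown above (https://your-host is a placeholder):

// JSON-mode call; the parsed result follows the example response above.
interface RagChatJsonResponse {
  result: string;
  sourceFiles: string[];
  title: string;
  model: string;
}

async function chatJson(token: string, content: string): Promise<RagChatJsonResponse> {
  const res = await fetch('https://your-host/api/rag-chatbot?format=json', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: 'application/json',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages: [{ role: 'user', content }] }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json() as Promise<RagChatJsonResponse>;
}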
8. Front-End Tips
| Goal | Recommendation |
|---|---|
| Typing effect | Append each textDelta immediately |
| Separate sources | Locate sentinel and split after completion |
| TTS playback | Strip sentinel + source JSON before sending to TTS (see the helper below) |
| Persistence | Store structured data (answer, sources, model, createdAt) to avoid reparsing raw text |
| Fallback handling | If no sentinel is found, treat the response as having no sources (sources = []) |
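For the TTS tip, a small helper that strips the sentinel and source JSON before playback (a sketch; the function name is illustrative):

// Strip the trailing sentinel + source JSON so only the answer reaches TTS.
const SENTINEL = '[__SOURCE_FILES__]';

function stripSentinel(text: string): string {
  const i = text.lastIndexOf(SENTINEL);
  return i >= 0 ? text.slice(0, i).trimEnd() : text;
}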
9. FAQ (Short)
| Question | Answer |
|---|---|
| Multiple sentinels? | Only one, at the very end. |
| Parse failure? | Treat as zero sources. |
| Source order stable? | Order reflects first-seen unique titles; no guaranteed sorting. |
| Why not custom SSE events? | Restricted by ai-sdk chunk type limitations. |
10. curl Examples
Streaming:
curl -N \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://your-host/api/rag-chatbot \
-d '{"messages":[{"role":"user","content":"Explain my fall risk"}]}'
JSON mode:
curl \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
"https://your-host/api/rag-chatbot?format=json" \
-d '{"messages":[{"role":"user","content":"Explain my fall risk"}]}'
11. Error Samples
{ "error": "Authentication failed" }
{ "error": "No valid message data provided" }
{ "error": "Chat processing failed", "details": "Vertex timeout" }
12. Best Practices (Condensed)
- Prioritize streaming for better UX; use JSON mode for atomic results.
- Always wait for the sentinel before extracting sources.
- Persist structured data: answer, sources, model, createdAt.
- Wrap JSON.parse in try/catch.
- Sanitize or truncate very long answers for mobile display.
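For the persistence practice, one possible record shape (field names illustrative):

// Illustrative persisted-turn record; store structured fields, not raw stream text.
interface StoredChatTurn {
  answer: string;      // sentinel already stripped
  sources: string[];   // parsed from the sentinel / sourceFiles
  model: string;       // e.g. "gemini-2.5-flash"
  createdAt: string;   // ISO 8601 timestamp
}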
Keep this document synchronized with backend implementations; update automated/HTTP tests accordingly.