RAG Chatbot API
This lightweight reference describes the GOFA AI Retrieval-Augmented Generation (RAG) health assistant endpoint. The service uses Vertex AI Gemini with a configured RAG corpus. All responses must be grounded in retrieved content (no hallucination). System rules enforce:
- For Chinese input: responses are in Cantonese colloquial style.
- For English input: responses are concise and professional.
- Each response includes a mandatory medical disclaimer.
- Non-health-related topics are politely declined and redirected to health-related areas.
1. Endpoint
POST /api/rag-chatbot
(Related: POST /api/rag-chatbot/tts – not covered here.)
2. Auth & Headers
| Header | Value / Example | Required | Description |
|---|---|---|---|
| Authorization | Bearer <Firebase ID Token> | Yes | User authentication |
| Content-Type | application/json | Yes | Format of the request body |
| Accept | application/json | No | Force JSON mode (or use ?format=json) |
3. Request Body
{
"messages": [
{ "role": "user", "content": "Explain my fall risk" }
]
}
Explanation:
- messages: OpenAI-style conversation history. The backend prepends its own system prompt; the frontend does not need to include it.
- Supported roles: user | assistant | system (typically only user and previous assistant turns are sent).
- If messages is empty or contains no valid content, a 400 error is returned: { "error": "No valid message data provided" }.
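For a type-safe client, the request body can be modeled directly. A minimal sketch; type names such as ChatMessage are illustrative, not part of the API:

// Illustrative request-body types; the names are not part of the API.
type ChatRole = 'user' | 'assistant' | 'system';

interface ChatMessage {
  role: ChatRole;
  content: string;
}

interface RagChatRequest {
  messages: ChatMessage[];
}

const body: RagChatRequest = {
  messages: [{ role: 'user', content: 'Explain my fall risk' }],
};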
4. Status Codes
| Code | Meaning | Example Error JSON |
|---|---|---|
| 200 | Success (streaming or JSON) | (depends on mode) |
| 400 | Missing/invalid messages | { "error": "No valid message data provided" } |
| 401 | Authentication failed | { "error": "Authentication failed" } |
| 500 | Server error | { "error": "Chat processing failed" } or Client ID not found |
5. Response Modes
| Mode | Trigger Condition | Content-Type | Characteristics |
|---|---|---|---|
| Streaming (default) | No ?format=json and Accept not JSON | text/event-stream | Character-level incremental updates; final chunk includes source sentinel |
| JSON (single shot) | ?format=json or Accept: application/json | application/json | Full response with sourceFiles array; no sentinel |
6. Streaming Mode (Default)
Each SSE event's data: line carries a JSON object: {"type":"text-delta","textDelta":"..."}.
Example tail of a stream (with source sentinel at the end):
data: {"type":"text-delta","textDelta":"You"}
data: {"type":"text-delta","textDelta":" are"}
...
data: {"type":"text-delta","textDelta":"."}
data: {"type":"text-delta","textDelta":"\n[__SOURCE_FILES__][\"Handout_MOOC12_CH2.pdf\",\"Guide.pdf\"]"}
Sentinel Pattern (always at the very end):
\n[__SOURCE_FILES__]<JSON array of unique source file titles>
If no sources: \n[__SOURCE_FILES__][] is still emitted.
Error streams also end with a sentinel to avoid hanging parsers:
... "⚠️ Model generation failed. Please try again later.\n[__SOURCE_FILES__][]"
Basic Frontend Parsing (Simplified TS Example)
let accumulated = '';

// Append each streamed delta as it arrives (typing effect).
function onChunk(chunk: { textDelta: string }) {
  accumulated += chunk.textDelta;
}

// After the stream ends, split the answer from the source sentinel.
function onDone(): { answer: string; sources: string[] } {
  const S = '[__SOURCE_FILES__]';
  const i = accumulated.lastIndexOf(S);
  if (i < 0) return { answer: accumulated, sources: [] };
  const answer = accumulated.slice(0, i).trimEnd();
  const raw = accumulated.slice(i + S.length).trim();
  let sources: string[] = [];
  try { sources = JSON.parse(raw) || []; } catch { /* treat as zero sources */ }
  return { answer, sources };
}
Do not attempt to parse sources before the sentinel is detected; mid-stream, the trailing JSON may still be incomplete.
Minimal successful stream:
data: {"type":"text-delta","textDelta":"R"}
data: {"type":"text-delta","textDelta":"i"}
...
data: {"type":"text-delta","textDelta":"\n[__SOURCE_FILES__][\"A.pdf\",\"B.pdf\"]"}
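Putting it together, a minimal sketch of consuming the stream with fetch and a ReadableStream reader. It assumes each data: line carries one complete JSON chunk, as in the examples above; https://your-host is a placeholder, and UI wiring is omitted:

// Minimal SSE consumption sketch; assumes one complete JSON object per "data:" line.
async function streamChat(token: string, messages: unknown[]): Promise<string> {
  const res = await fetch('https://your-host/api/rag-chatbot', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages }),
  });
  if (!res.ok || !res.body) throw new Error(`HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let accumulated = '';
  let buffer = '';

  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // SSE events are newline-delimited; keep any partial line in the buffer.
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const chunk = JSON.parse(line.slice(6));
      if (chunk.type === 'text-delta') accumulated += chunk.textDelta;
    }
  }
  return accumulated; // Still ends with the sentinel; split it as shown above.
}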
7. JSON Mode
Triggered by ?format=json or Accept: application/json.
Example response body:
{
"result": "Full grounded answer (without sentinel)",
"sourceFiles": ["Handout_MOOC12_CH2.pdf", "Guide.pdf"],
"title": "Handout_MOOC12_CH2.pdf, Guide.pdf",
"model": "gemini-2.5-flash"
}
Use case: When atomic results are needed (e.g., export, batch processing, synchronous backend workflows) or when SSE is not desired.
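A minimal JSON-mode call, assuming the response shape shown above (https://your-host is a placeholder):

// JSON-mode call; the parsed result follows the example response above.
interface RagChatJsonResponse {
  result: string;
  sourceFiles: string[];
  title: string;
  model: string;
}

async function chatJson(token: string, content: string): Promise<RagChatJsonResponse> {
  const res = await fetch('https://your-host/api/rag-chatbot?format=json', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      Accept: 'application/json',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ messages: [{ role: 'user', content }] }),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json() as Promise<RagChatJsonResponse>;
}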
8. Front-End Tips
| Goal | Recommendation |
|---|---|
| Typing effect | Append each textDelta immediately |
| Separate sources | Locate sentinel and split after completion |
| TTS playback | Strip sentinel + source JSON before sending to TTS (see the helper below) |
| Persistence | Store structured data (answer, sources, model, createdAt) to avoid reparsing raw text |
| Fallback handling | If no sentinel is found, treat the response as having no sources (sources = []) |
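For the TTS tip, a small helper that strips the sentinel and source JSON before playback (a sketch; the function name is illustrative):

// Strip the trailing sentinel + source JSON so only the answer reaches TTS.
const SENTINEL = '[__SOURCE_FILES__]';

function stripSentinel(text: string): string {
  const i = text.lastIndexOf(SENTINEL);
  return i >= 0 ? text.slice(0, i).trimEnd() : text;
}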
9. FAQ (Short)
| Question | Answer |
|---|---|
| Multiple sentinels? | Only one, at the very end. |
| Parse failure? | Treat as zero sources. |
| Source order stable? | Order reflects first-seen unique titles; no guaranteed sorting. |
| Why not custom SSE events? | Restricted by ai-sdk chunk type limitations. |
10. curl Examples
Streaming:
curl -N \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://your-host/api/rag-chatbot \
-d '{"messages":[{"role":"user","content":"Explain my fall risk"}]}'
JSON mode:
curl \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
"https://your-host/api/rag-chatbot?format=json" \
-d '{"messages":[{"role":"user","content":"Explain my fall risk"}]}'
11. Error Samples
{ "error": "Authentication failed" }
{ "error": "No valid message data provided" }
{ "error": "Chat processing failed", "details": "Vertex timeout" }
12. Best Practices (Condensed)
- Prioritize streaming for better UX; use JSON mode for atomic results.
- Always wait for the sentinel before extracting sources.
- Persist structured data: answer, sources, model, createdAt.
- Wrap JSON.parse in try/catch.
- Sanitize or truncate very long answers for mobile display.
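For the persistence practice, one possible record shape (field names illustrative):

// Illustrative persisted-turn record; store structured fields, not raw stream text.
interface StoredChatTurn {
  answer: string;      // sentinel already stripped
  sources: string[];   // parsed from the sentinel / sourceFiles
  model: string;       // e.g. "gemini-2.5-flash"
  createdAt: string;   // ISO 8601 timestamp
}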
Keep this document synchronized with backend implementations; update automated/HTTP tests accordingly.