RAG Chatbot API

This reference document describes the GOFA AI Retrieval-Augmented Generation (RAG) health assistant endpoint. The service uses Vertex AI Gemini with a configured RAG corpus. All responses must be grounded in retrieved content (no hallucination). System rules enforce:

  • For Chinese input: responses are in Cantonese colloquial style.
  • For English input: responses are concise and professional.
  • Each response includes a mandatory medical disclaimer.
  • Non-health-related topics are politely declined and redirected to health-related areas.

1. Endpoint

POST /api/rag-chatbot

(Related: POST /api/rag-chatbot/tts – not covered here.)

2. Auth & Headers

| Header | Value / Example | Required | Description |
|---|---|---|---|
| Authorization | Bearer <Firebase ID Token> | Yes | User authentication |
| Content-Type | application/json | Yes | Format of the request body |
| Accept | application/json | No | Force JSON mode (or use ?format=json) |
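
The headers above can be assembled as follows. This is a minimal sketch: `https://your-host` is a placeholder for your deployment, and the ID token is assumed to come from your Firebase auth helper.

```typescript
// Sketch: assembling a request to the endpoint with the required headers.
interface ChatRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(idToken: string, content: string): ChatRequest {
  return {
    url: 'https://your-host/api/rag-chatbot', // placeholder host
    method: 'POST',
    headers: {
      Authorization: `Bearer ${idToken}`, // Firebase ID token (required)
      'Content-Type': 'application/json', // required
      // Accept: 'application/json',      // optional: would force JSON mode
    },
    body: JSON.stringify({ messages: [{ role: 'user', content }] }),
  };
}
```

Pass the parts straight to fetch: `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.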

3. Request Body

{
  "messages": [
    { "role": "user", "content": "Explain my fall risk" }
  ]
}

Explanation:

  • messages: OpenAI-style conversation history. The backend prepends its own system prompt; the frontend does not need to include it.
  • Supported roles: user | assistant | system (typically only user and previous assistant turns are sent).

Validation


If messages is empty or contains no valid content, a 400 error is returned: { "error": "No valid message data provided" }.
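
A client can mirror this check before sending, to avoid a round trip that will return 400. The exact server-side logic is not specified here; this sketch simply rejects input with no non-empty message content, matching the rule described above.

```typescript
// Sketch: client-side pre-validation mirroring the server's 400 check
// for "No valid message data provided" (assumed logic: at least one
// message with non-whitespace content).
interface ChatMessage {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

function hasValidMessages(messages: ChatMessage[] | undefined): boolean {
  return Array.isArray(messages) &&
    messages.some(m => typeof m.content === 'string' && m.content.trim().length > 0);
}
```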

4. Status Codes

| Code | Meaning | Example Error JSON |
|---|---|---|
| 200 | Success (streaming or JSON) | (depends on mode) |
| 400 | Missing/invalid messages | { "error": "No valid message data provided" } |
| 401 | Authentication failed | { "error": "Authentication failed" } |
| 500 | Server error | { "error": "Chat processing failed" } or Client ID not found |

5. Response Modes

| Mode | Trigger Condition | Content-Type | Characteristics |
|---|---|---|---|
| Streaming (default) | No ?format=json and Accept not JSON | text/event-stream | Character-level incremental updates; final chunk includes source sentinel |
| JSON (single shot) | ?format=json or Accept: application/json | application/json | Full response with sourceFiles array; no sentinel |

6. Streaming Mode (Default)

Each SSE event's data: field carries a JSON object: {"type":"text-delta","textDelta":"..."}.

Example tail of a stream (with source sentinel at the end):

data: {"type":"text-delta","textDelta":"You"}
data: {"type":"text-delta","textDelta":" are"}
...
data: {"type":"text-delta","textDelta":"."}
data: {"type":"text-delta","textDelta":"\n[__SOURCE_FILES__][\"Handout_MOOC12_CH2.pdf\",\"Guide.pdf\"]"}

Sentinel Pattern (always at the very end):

\n[__SOURCE_FILES__]<JSON array of unique source file titles>

If no sources: \n[__SOURCE_FILES__][] is still emitted.

Error streams also end with a sentinel to avoid hanging parsers:

... "⚠️ Model generation failed. Please try again later.\n[__SOURCE_FILES__][]"

Basic Frontend Parsing (Simplified TS Example)

// Accumulate deltas as stream chunks arrive:
let accumulated = '';
// ... on each chunk: accumulated += chunk.textDelta;

// After the stream completes, split off the source sentinel:
const S = '[__SOURCE_FILES__]';
const i = accumulated.lastIndexOf(S);
if (i >= 0) {
  const answer = accumulated.slice(0, i).trimEnd();
  const raw = accumulated.slice(i + S.length).trim();
  let sources: string[] = [];
  try { sources = JSON.parse(raw) || []; } catch { /* malformed JSON → no sources */ }
}
Do Not Parse Sources Prematurely

Do not attempt to parse sources before detecting the sentinel; incomplete JSON may result.

Minimal successful stream:

data: {"type":"text-delta","textDelta":"R"}
data: {"type":"text-delta","textDelta":"i"}
...
data: {"type":"text-delta","textDelta":"\n[__SOURCE_FILES__][\"A.pdf\",\"B.pdf\"]"}
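
The raw SSE lines above can be reduced to the accumulated answer text with a small parser. This is a sketch assuming the single-line data: {"type":"text-delta","textDelta":"..."} event shape shown in this section; a real client would feed network chunks through it incrementally rather than all at once.

```typescript
// Sketch: turn raw SSE text into the concatenated answer string.
function concatDeltas(raw: string): string {
  let out = '';
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue; // ignore non-data lines
    try {
      const evt = JSON.parse(line.slice('data: '.length));
      if (evt.type === 'text-delta' && typeof evt.textDelta === 'string') {
        out += evt.textDelta;
      }
    } catch { /* skip malformed lines */ }
  }
  return out; // still ends with the sentinel; split it off afterwards
}
```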

7. JSON Mode

Triggered by ?format=json or Accept: application/json.

Example:

{
  "result": "Full grounded answer (without sentinel)",
  "sourceFiles": ["Handout_MOOC12_CH2.pdf", "Guide.pdf"],
  "title": "Handout_MOOC12_CH2.pdf, Guide.pdf",
  "model": "gemini-2.5-flash"
}

Use case: When atomic results are needed (e.g., export, batch processing, synchronous backend workflows) or when SSE is not desired.
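
A single-shot call might look like the following sketch. The field names follow the example response above; the host is a placeholder and the ID token is assumed to come from your auth layer.

```typescript
// Sketch: JSON mode via the Accept header (or append ?format=json instead).
interface ChatJsonResponse {
  result: string;       // full grounded answer, no sentinel
  sourceFiles: string[];
  title: string;
  model: string;
}

async function askJson(idToken: string, content: string): Promise<ChatJsonResponse> {
  const res = await fetch('https://your-host/api/rag-chatbot?format=json', { // placeholder host
    method: 'POST',
    headers: {
      Authorization: `Bearer ${idToken}`,
      'Content-Type': 'application/json',
      Accept: 'application/json',
    },
    body: JSON.stringify({ messages: [{ role: 'user', content }] }),
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  return res.json() as Promise<ChatJsonResponse>;
}
```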

8. Front-End Tips

| Goal | Recommendation |
|---|---|
| Typing effect | Append each textDelta immediately |
| Separate sources | Locate sentinel and split after completion |
| TTS playback | Strip sentinel + source JSON before sending to TTS |
| Persistence | Store structured data (answer, sources, model, createdAt) to avoid reparsing raw text |
| Retry logic | If no sentinel is found, treat as no sources (sources = []) |
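
The "separate sources", "TTS playback", and "retry logic" tips can be combined into one helper. This sketch uses the sentinel format from Section 6 and falls back to an empty source list when the sentinel is missing or its JSON fails to parse.

```typescript
// Sketch: strip the sentinel + source JSON from accumulated stream text,
// returning clean answer text (safe to send to TTS) plus the sources.
function stripSentinel(text: string): { answer: string; sources: string[] } {
  const S = '[__SOURCE_FILES__]';
  const i = text.lastIndexOf(S);
  if (i < 0) return { answer: text, sources: [] }; // no sentinel → no sources
  let sources: string[] = [];
  try { sources = JSON.parse(text.slice(i + S.length).trim()) || []; } catch {}
  return { answer: text.slice(0, i).trimEnd(), sources };
}
```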

9. FAQ (Short)

| Question | Answer |
|---|---|
| Multiple sentinels? | Only one, at the very end. |
| Parse failure? | Treat as zero sources. |
| Source order stable? | Order reflects first-seen unique titles; no guaranteed sorting. |
| Why not custom SSE events? | Restricted by ai-sdk chunk type limitations. |

10. curl Examples

Streaming:

curl -N \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
https://your-host/api/rag-chatbot \
-d '{"messages":[{"role":"user","content":"Explain my fall risk"}]}'

JSON mode:

curl \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
"https://your-host/api/rag-chatbot?format=json" \
-d '{"messages":[{"role":"user","content":"Explain my fall risk"}]}'

11. Error Samples

{ "error": "Authentication failed" }
{ "error": "No valid message data provided" }
{ "error": "Chat processing failed", "details": "Vertex timeout" }

12. Best Practices (Condensed)

  1. Prioritize streaming for better UX; use JSON mode for atomic results.
  2. Always wait for the sentinel before extracting sources.
  3. Persist structured data: answer, sources, model, createdAt.
  4. Wrap JSON.parse in try/catch.
  5. Sanitize or truncate very long answers for mobile display.

Keep this document synchronized with backend implementations; update automated/HTTP tests accordingly.