Using a Server
Once a server is running (see Serving), you can reach it three ways: the native
/generate endpoint, the Python SDK, or the OpenAI-compatible API. Every model is
reachable via /generate and the SDK; the OpenAI routes cover the chat, speech, and
image models.
Native /generate
POST /generate takes a multipart form and returns either a single JSON document or an
NDJSON stream.
| Field | Default | Meaning |
|---|---|---|
| text | — | Text prompt (optional if media is provided). |
| files | — | One or more media uploads; each file’s modality is inferred from its extension. |
| input_modalities | auto | Comma-separated input modalities; auto-detected from the data when omitted. |
| output_modalities | | Comma-separated desired outputs (e.g. audio). |
| streaming | | Whether to return an NDJSON stream instead of a single JSON document. |
| model_kwargs | — | JSON object of model-specific parameters (e.g. {"voice":"tara"}). |
| request_id | (uuid) | Optional client-supplied id; the server generates one when omitted. |
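The encoding rules for the form (comma-separated modalities, a JSON object for model parameters) can be sketched with a small helper. This is a minimal illustration, not part of mstar: build_generate_form is a hypothetical name, and the field names are taken from the curl examples in this section.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_generate_form(
    text: Optional[str] = None,
    output_modalities: Optional[Sequence[str]] = None,
    model_kwargs: Optional[Dict[str, Any]] = None,
    streaming: Optional[bool] = None,
) -> Dict[str, str]:
    """Build the text fields of a /generate form (hypothetical helper)."""
    form: Dict[str, str] = {}
    if text is not None:
        form["text"] = text
    if output_modalities:
        # Modalities travel as one comma-separated string.
        form["output_modalities"] = ",".join(output_modalities)
    if model_kwargs:
        # Model-specific parameters travel as a JSON object.
        form["model_kwargs"] = json.dumps(model_kwargs)
    if streaming is not None:
        # Lowercase literal, matching the curl examples (streaming=false).
        form["streaming"] = "true" if streaming else "false"
    return form


form = build_generate_form(
    text="hello there",
    output_modalities=["audio"],
    model_kwargs={"voice": "tara"},
    streaming=False,
)
print(form)
```

To send these fields as multipart with the requests library, one option is to pass each value as a (None, value) tuple in the files= argument, which makes requests emit a multipart body even when no file is attached.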
A non-streaming response groups outputs by modality, each payload base64-encoded:
{
  "request_id": "…",
  "outputs": {
    "text": [{"data": "<base64>", "metadata": {}}],
    "image": [{"data": "<base64-png>", "metadata": {}}]
  }
}
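Decoding such a response on the client side might look like the sketch below. The sample response is made up for illustration, decode_outputs is a hypothetical helper, and treating the text payload as UTF-8 is an assumption.

```python
import base64
import json

# A made-up response in the shape shown above; "aGVsbG8=" is base64 for "hello".
raw = '{"request_id": "demo", "outputs": {"text": [{"data": "aGVsbG8=", "metadata": {}}]}}'


def decode_outputs(response: dict) -> dict:
    """Map each modality to a list of decoded byte payloads."""
    return {
        modality: [base64.b64decode(item["data"]) for item in items]
        for modality, items in response.get("outputs", {}).items()
    }


decoded = decode_outputs(json.loads(raw))
# Assumption: text payloads are UTF-8 encoded.
print(decoded["text"][0].decode("utf-8"))
```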
A streaming response is application/x-ndjson — one JSON object per line as chunks
arrive. GET /health returns {"status": "healthy"}.
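Reading an NDJSON stream comes down to parsing one JSON object per line. The sketch below shows only that framing; iter_ndjson is a hypothetical helper, and the sample lines are invented because the per-chunk schema is not specified in this section.

```python
import json
from typing import Iterable, Iterator, Union


def iter_ndjson(lines: Iterable[Union[bytes, str]]) -> Iterator[dict]:
    """Yield one parsed object per non-empty line."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if not line:
            continue  # skip blank lines
        yield json.loads(line)


# Made-up lines; the real chunk fields may differ.
sample = [b'{"n": 1}', b"", b'{"n": 2}\n']
chunks = list(iter_ndjson(sample))
print(chunks)
```

With the requests library, the lines could come from a response opened with stream=True and read through Response.iter_lines().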
# text (non-streaming → JSON)
curl -s http://localhost:8000/generate -F 'text=Hello' -F 'streaming=false'
# image understanding (image in, text out)
curl -s http://localhost:8000/generate -F 'text=What is in this image?' -F 'files=@cat.jpg'
# text-to-speech (base64 PCM in outputs.audio)
curl -s http://localhost:8000/generate \
  -F 'text=hello there' -F 'output_modalities=audio' \
  -F 'model_kwargs={"voice":"tara"}' -F 'streaming=false'
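Because the text-to-speech payload is raw PCM, it needs a container before most players will open it. A minimal sketch using the standard wave module follows; the sample rate, channel count, and sample width are assumptions (the section does not state the PCM format), and pcm_to_wav_bytes is a hypothetical helper.

```python
import base64
import io
import wave


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000,
                     channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM in a WAV container (format defaults are assumptions)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Stand-in for a base64 audio payload: 100 frames of 16-bit silence.
payload = base64.b64encode(b"\x00\x00" * 100).decode("ascii")
wav_bytes = pcm_to_wav_bytes(base64.b64decode(payload))
print(len(wav_bytes))
```

If the actual format differs, pass the real sample rate, channel count, and width instead of the defaults.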
Python SDK
The SDK (mstar.client.MStarClient) is a thin HTTP client over /generate. It
depends only on requests (plus numpy for the audio helpers) — no torch — so it can
run anywhere:
from mstar import MStarClient
client = MStarClient("http://localhost:8000") # optional: timeout=600.0
The core method is generate:
generate(*, text=None, images=None, audio=None, video=None, output_modalities=("text",), input_modalities=None, stream=False, request_id=None, **model_kwargs)
Submit a request. images/audio/video accept a path, raw bytes, a (filename, bytes) tuple, or a list of those. Extra keyword args are forwarded as the model’s model_kwargs (e.g. voice="tara", temperature=0.7, max_output_tokens=256); None values are dropped. Returns a GenerateResult when stream=False, or an iterator of stream events when stream=True.
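To make the accepted media forms concrete, here is one way a client could coerce them into (filename, bytes) pairs before upload. This is an illustrative sketch, not the SDK's implementation: normalize_media and the upload.bin placeholder name are hypothetical.

```python
import os
import tempfile
from typing import List, Tuple, Union

MediaItem = Union[str, os.PathLike, bytes, Tuple[str, bytes]]


def normalize_media(value: Union[MediaItem, List[MediaItem]],
                    default_name: str = "upload.bin") -> List[Tuple[str, bytes]]:
    """Coerce a path, raw bytes, a (filename, bytes) tuple, or a list of those."""
    items = value if isinstance(value, list) else [value]
    parts: List[Tuple[str, bytes]] = []
    for item in items:
        if isinstance(item, tuple):
            name, data = item
            parts.append((name, data))
        elif isinstance(item, (bytes, bytearray)):
            # Raw bytes carry no name; the placeholder here is an assumption.
            parts.append((default_name, bytes(item)))
        elif isinstance(item, (str, os.PathLike)):
            path = os.fspath(item)
            with open(path, "rb") as fh:
                parts.append((os.path.basename(path), fh.read()))
        else:
            raise TypeError(f"unsupported media input: {type(item).__name__}")
    return parts


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cat.jpg")
    with open(path, "wb") as fh:
        fh.write(b"fake-jpeg")
    parts = normalize_media([path, b"raw", ("dog.png", b"fake-png")])
print([name for name, _ in parts])
```

Since the server infers modality from the file extension, a tuple with an explicit filename is the safer form for raw bytes.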
Convenience wrappers:
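| Method | Returns |
|---|---|
| chat | Text generation (and, with …); a GenerateResult. |
| generate_image | PNG bytes. |
| tts | An AudioBuffer. |
| stream | Sugar for generate(stream=True). |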
Result and event types live in mstar.client:
- GenerateResult: .text, .images (list of PNG bytes), .audio (an AudioBuffer or None), .raw; plus .save_image(path) / .save_audio(path).
- AudioBuffer: decoded PCM with .sample_rate; .to_wav(path), .to_numpy(), len(...).
- Stream events: TextChunk(text), ImageChunk(data) (.save(path)), AudioChunk(pcm, sample_rate).
res = client.chat("Hello!")  # GenerateResult
print(res.text)
with open("cat.png", "wb") as f:  # generate_image returns PNG bytes
    f.write(client.generate_image("a cat in a hat"))
client.tts("Hi there", voice="tara").to_wav("out.wav")
for event in client.stream(text="Tell me a story"):
    # Events without a .text attribute print nothing.
    print(getattr(event, "text", ""), end="", flush=True)
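When a stream mixes modalities, dispatching on the event type is more explicit than probing for attributes. The sketch below uses stand-in dataclasses that mirror the event names above so it runs without mstar installed; with the SDK, the real classes would be imported from mstar.client instead. The collect helper is hypothetical, and treating pcm as bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


# Stand-ins mirroring the documented event names (not the real SDK classes).
@dataclass
class TextChunk:
    text: str


@dataclass
class ImageChunk:
    data: bytes


@dataclass
class AudioChunk:
    pcm: bytes  # assumption: raw bytes; the SDK may use another type
    sample_rate: int


def collect(events: Iterable[object]) -> Tuple[str, List[bytes], bytes]:
    """Accumulate text, images, and audio from a mixed event stream."""
    text_parts: List[str] = []
    images: List[bytes] = []
    pcm = bytearray()
    for event in events:
        if isinstance(event, TextChunk):
            text_parts.append(event.text)
        elif isinstance(event, ImageChunk):
            images.append(event.data)
        elif isinstance(event, AudioChunk):
            pcm.extend(event.pcm)
    return "".join(text_parts), images, bytes(pcm)


text, images, audio = collect([
    TextChunk("Once "), AudioChunk(b"\x01\x02", 24000),
    TextChunk("upon a time"), ImageChunk(b"png-bytes"),
])
print(text)
```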
OpenAI-compatible API
mstar mounts OpenAI-style routes under /v1 for the models with standard OpenAI
semantics. Point any OpenAI client at http://<host>:<port>/v1:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
Endpoints and model coverage:
| Endpoint | Models | Notes |
|---|---|---|
| /v1/models | all | Lists the served model. |
| /v1/chat/completions | | Text chat (streaming + non-streaming). Qwen3-Omni can also emit speech. |
| /v1/audio/speech | | Text-to-speech. |
| /v1/images/generations | | Text-to-image. |
| /v1/images/edits | | Image editing (image + prompt → image). |
Models without an OpenAI surface (pi05, vjepa2, vjepa2_ac) return 404 on
/v1/*; use /generate or the SDK for them.
# chat
client.chat.completions.create(model="bagel", messages=[{"role": "user", "content": "hi"}])
# text-to-speech
client.audio.speech.create(model="orpheus", input="hello there", voice="tara")
# image generation
client.images.generate(model="bagel", prompt="a cat in a hat")
Per-model notes:
- BAGEL: chat returns text only; use /v1/images/generations and /v1/images/edits for image output.
- Qwen3-Omni: text sampling uses thinker_* keys and speech uses talker_*; set the speaker with voice (default Ethan) and request audio output by including "audio" in modalities. Non-OpenAI knobs (e.g. talker_top_k) go through extra_body.
- Orpheus: set the speaker with voice, one of tara (default), zoe, zac, jess, leo, mia, julia, leah (the available_voices list in the Orpheus config).
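A client can check the Orpheus voice locally before sending a request. The sketch below hardcodes the voice list from the note above; resolve_voice is a hypothetical helper, and how the server itself treats an unknown voice is not stated here.

```python
from typing import Optional

# Voice names copied from the Orpheus note above.
ORPHEUS_VOICES = ("tara", "zoe", "zac", "jess", "leo", "mia", "julia", "leah")
DEFAULT_VOICE = "tara"


def resolve_voice(voice: Optional[str] = None) -> str:
    """Return a valid Orpheus voice, falling back to the documented default."""
    if voice is None:
        return DEFAULT_VOICE
    if voice not in ORPHEUS_VOICES:
        raise ValueError(
            f"unknown voice {voice!r}; expected one of {', '.join(ORPHEUS_VOICES)}"
        )
    return voice


print(resolve_voice())       # tara
print(resolve_voice("leo"))  # leo
```

The returned name can then be passed as voice= to client.audio.speech.create or to the SDK's tts method.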