mstar.api_server.media_io
Media decode/encode helpers shared by the native and OpenAI-compatible APIs.
Two directions:
- Inbound: turn media referenced by an OpenAI-style request (data: URLs, http(s) URLs, or base64 blobs) into files under the API server's upload_dir, so model.load_image / load_audio / load_video can read them by path. This is the same contract /generate already uses for multipart uploads.
- Outbound: wrap raw model audio output (16-bit PCM, no container header) into a real audio container (WAV by default) and encode image bytes (PNG) as a data: URL for OpenAI chat image output.
Only stdlib + numpy are required. mp3 / flac / ogg encoding is opt-in
and degrades to WAV when the optional soundfile backend is unavailable, so
the base install stays slim.
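Functions
| modality_from_mime(mime) | Map a MIME type to one of our modality strings (image/audio/video). |
| pcm16_to_container(pcm, sample_rate, fmt='wav') | Encode raw 16-bit PCM into fmt. |
| pcm16_to_wav_bytes(pcm, sample_rate, num_channels=1) | Wrap raw little-endian 16-bit PCM (the model's audio output) into a WAV blob. |
| png_to_data_url(png_bytes) | Encode PNG image bytes (the model's image output) as a data URL. |
| resolve_media_ref(ref, upload_dir, *, allow_remote=True) | Resolve a media reference (data URL, http(s) URL, or local path). |
| save_base64(b64, fmt, modality_hint, upload_dir) | Persist a bare base64 blob with a known fmt (e.g. "wav"). |
| save_data_url(data_url, upload_dir) | Persist a data:<mime>;base64,<payload> URL. |
| save_remote_url(url, upload_dir, timeout=30.0) | Download an http(s) URL into upload_dir. |
| wav_stream_header(sample_rate, num_channels=1, bits=16) | A 44-byte WAV header with streaming (unknown-length) size fields. |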
- mstar.api_server.media_io.modality_from_mime(mime)
Map a MIME type to one of our modality strings (image/audio/video).
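The mapping can be pictured with a minimal sketch. This is not the module's implementation: the name modality_from_mime_sketch is hypothetical, and returning None for unrecognised types is an assumption, since the behaviour for non-media MIME types is not stated here.

```python
from typing import Optional

_MODALITIES = {"image", "audio", "video"}


def modality_from_mime_sketch(mime: str) -> Optional[str]:
    """Hypothetical stand-in: map a MIME type to 'image', 'audio', or 'video'."""
    # Drop parameters such as "; codecs=opus" and normalise case.
    essence = mime.split(";", 1)[0].strip().lower()
    # The top-level type ("audio" in "audio/wav") is the modality.
    top_level = essence.split("/", 1)[0]
    # Assumption: anything else (text/plain, application/json, ...) maps to None.
    return top_level if top_level in _MODALITIES else None
```

With this sketch, "audio/wav" and "Audio/WAV; codecs=1" both map to "audio", while "text/plain" maps to None.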
- mstar.api_server.media_io.pcm16_to_container(pcm, sample_rate, fmt='wav')
Encode raw 16-bit PCM into fmt. Returns (bytes, mime_type).
wav and pcm use the stdlib (the bytes are already PCM_16). Compressed formats need the optional soundfile backend; if it is missing we fall back to WAV and log once.
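The opt-in fallback described above could look roughly like the following sketch. It is an illustration under assumptions, not the module's code: encode_pcm16_sketch, the format-to-MIME table, and the mono-only WAV helper are all hypothetical, and the raw pcm passthrough is left out because its MIME type is not stated here.

```python
import io
import logging
import wave
from typing import Tuple

log = logging.getLogger(__name__)
_warned = False  # module-level flag so the fallback is logged only once

# Hypothetical format -> MIME table for the optional compressed formats.
_COMPRESSED_MIME = {"flac": "audio/flac", "ogg": "audio/ogg", "mp3": "audio/mpeg"}


def _wav_bytes(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap mono little-endian 16-bit PCM in a WAV container (stdlib only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def encode_pcm16_sketch(pcm: bytes, sample_rate: int, fmt: str = "wav") -> Tuple[bytes, str]:
    """Return (bytes, mime_type); degrade to WAV if the optional backend is missing."""
    global _warned
    fmt = fmt.lower()
    if fmt != "wav" and fmt not in _COMPRESSED_MIME:
        raise ValueError(f"unsupported format: {fmt!r}")
    if fmt in _COMPRESSED_MIME:
        try:
            # Imported lazily so the base install does not need soundfile.
            import numpy as np
            import soundfile as sf
        except (ImportError, OSError):
            if not _warned:
                log.warning("soundfile unavailable; falling back to WAV")
                _warned = True
        else:
            samples = np.frombuffer(pcm, dtype="<i2")  # int16, little-endian
            buf = io.BytesIO()
            sf.write(buf, samples, sample_rate, format=fmt.upper())
            return buf.getvalue(), _COMPRESSED_MIME[fmt]
    return _wav_bytes(pcm, sample_rate), "audio/wav"
```

Because the returned MIME type reflects what was actually produced, a caller using such a helper would set Content-Type from the second tuple element rather than from the requested format.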
- mstar.api_server.media_io.pcm16_to_wav_bytes(pcm, sample_rate, num_channels=1)
Wrap raw little-endian 16-bit PCM (the model’s audio output) into a WAV blob.
- mstar.api_server.media_io.png_to_data_url(png_bytes)
Encode PNG image bytes (the model’s image output) as a data URL.
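For reference, a data URL for a PNG is the fixed prefix data:image/png;base64, followed by the base64 text of the bytes. A minimal sketch of that encoding, using a hypothetical name rather than the module's own function:

```python
import base64


def png_to_data_url_sketch(png_bytes: bytes) -> str:
    """Hypothetical stand-in: encode PNG bytes as a data:image/png;base64 URL."""
    payload = base64.b64encode(png_bytes).decode("ascii")
    return "data:image/png;base64," + payload
```

The sketch does not check that the input really is a PNG; whether the real helper validates the bytes is not stated here.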
- mstar.api_server.media_io.resolve_media_ref(ref, upload_dir, *, allow_remote=True)
Resolve a media reference (data URL, http(s) URL, or local path).
Returns (modality, path). Local paths are passed through unchanged (modality inferred from extension).
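The three-way dispatch can be sketched as a classifier that only inspects the reference and touches neither disk nor network. The function name, the returned kind strings, and raising ValueError when allow_remote is false are assumptions made for illustration; the real function's behaviour in that case is not documented here.

```python
import mimetypes
from typing import Optional, Tuple

_MODALITIES = {"image", "audio", "video"}


def _modality(mime: str) -> Optional[str]:
    top_level = mime.split("/", 1)[0].lower()
    return top_level if top_level in _MODALITIES else None


def classify_media_ref(ref: str, *, allow_remote: bool = True) -> Tuple[str, Optional[str]]:
    """Return (kind, modality); kind is 'data_url', 'remote_url', or 'local_path'."""
    lowered = ref.lower()
    if lowered.startswith("data:"):
        # data:<mime>[;params][;base64],<payload>
        mime = ref[5:].split(",", 1)[0].split(";", 1)[0]
        return "data_url", _modality(mime)
    if lowered.startswith(("http://", "https://")):
        if not allow_remote:
            # Assumption: a disabled remote fetch is reported as an error.
            raise ValueError("remote media fetch is disabled")
        return "remote_url", None  # modality is only known after download
    # Anything else is treated as a local path; infer modality from the extension.
    mime, _ = mimetypes.guess_type(ref)
    return "local_path", _modality(mime or "")
```

A caller would then hand data URLs and remote URLs to the matching save helper and pass local paths through unchanged.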
- mstar.api_server.media_io.save_base64(b64, fmt, modality_hint, upload_dir)
Persist a bare base64 blob with a known fmt (e.g. "wav").
- mstar.api_server.media_io.save_data_url(data_url, upload_dir)
Persist a data:<mime>;base64,<payload> URL. Returns (modality, path).
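A rough sketch of persisting such a URL follows. The function name, the random file naming, the .bin fallback extension, and the ValueError cases are all assumptions for illustration; only the input shape and the (modality, path) return value come from the description above.

```python
import base64
import mimetypes
import tempfile
import uuid
from pathlib import Path
from typing import Tuple


def save_data_url_sketch(data_url: str, upload_dir: str) -> Tuple[str, str]:
    """Hypothetical stand-in: decode a base64 data URL into a file under upload_dir."""
    header, sep, payload = data_url.partition(",")
    if not (header.startswith("data:") and header.endswith(";base64") and sep):
        raise ValueError("expected a data:<mime>;base64,<payload> URL")
    # Strip the "data:" prefix, the ";base64" suffix, and any MIME parameters.
    mime = header[len("data:"):-len(";base64")].split(";", 1)[0].lower()
    modality = mime.split("/", 1)[0]
    if modality not in {"image", "audio", "video"}:
        raise ValueError(f"unsupported media type: {mime!r}")
    raw = base64.b64decode(payload, validate=True)  # reject non-base64 characters
    ext = mimetypes.guess_extension(mime) or ".bin"  # assumed fallback extension
    # A random name keeps client-supplied text out of the filesystem path.
    path = Path(upload_dir) / f"{uuid.uuid4().hex}{ext}"
    path.write_bytes(raw)
    return modality, str(path)


# Usage: write a 4-byte payload into a throwaway directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    modality, path = save_data_url_sketch("data:image/png;base64,iVBORw==", tmp)
    saved = Path(path).read_bytes()
```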
- mstar.api_server.media_io.save_remote_url(url, upload_dir, timeout=30.0)
Download an http(s) URL into upload_dir. Returns (modality, path).
Note: fetching arbitrary URLs exposes a server-side request forgery (SSRF) attack surface. Callers exposing this publicly should allowlist hosts or disable remote fetch (data-URL only).
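One way a caller might apply the host allowlist suggested in the note is a check run before any download. The helper below is hypothetical and not part of this module. It is only a first line of defence: it does not cover redirects or DNS rebinding, which need checks at fetch time.

```python
from typing import Iterable
from urllib.parse import urlsplit


def is_allowed_media_url(url: str, allowed_hosts: Iterable[str]) -> bool:
    """Hypothetical pre-flight check: http(s) scheme and an exact host match."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname drops userinfo and port, so "good.example@evil.test" yields "evil.test".
    host = (parts.hostname or "").lower()
    return host in {h.lower() for h in allowed_hosts}
```

For example, with an allowlist of {"cdn.example.com"}, a request for http://169.254.169.254/latest/meta-data is rejected before any socket is opened.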
- mstar.api_server.media_io.wav_stream_header(sample_rate, num_channels=1, bits=16)
A 44-byte WAV header with streaming (unknown-length) size fields.
Used to stream TTS audio over a single HTTP response: emit this header, then 16-bit PCM frames as they arrive. The 0xFFFFFFFF placeholders signal an open-ended stream, which players and the OpenAI client's stream_to_file handle.
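The 44-byte layout can be sketched with struct. This follows the canonical PCM WAV header (a RIFF chunk, a 16-byte fmt chunk, then a data chunk header) with both size fields set to 0xFFFFFFFF. The function name is hypothetical and the module's actual byte layout is assumed, not confirmed, to match.

```python
import struct

_UNKNOWN_SIZE = 0xFFFFFFFF  # placeholder for "length not known yet"


def wav_stream_header_sketch(sample_rate: int, num_channels: int = 1, bits: int = 16) -> bytes:
    """Hypothetical stand-in: a 44-byte PCM WAV header for an open-ended stream."""
    block_align = num_channels * bits // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align   # bytes per second
    return (
        b"RIFF" + struct.pack("<I", _UNKNOWN_SIZE) + b"WAVE"
        # fmt chunk: size 16, format tag 1 (PCM), channels, rate, byte rate, align, bits
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, num_channels, sample_rate,
                                byte_rate, block_align, bits)
        + b"data" + struct.pack("<I", _UNKNOWN_SIZE)
    )
```

A streaming handler would send these bytes first and then forward PCM chunks unchanged, since no field needs to be patched once the stream ends.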