mstar.model.pi05.components.tokenization

Tokenization wrapper for Pi0.5: PaliGemma tokenizer + state discretization.

Functions

normalize_prompt(prompt)

Lowercases the prompt and strips whitespace, matching openpi's PaligemmaTokenizer.

Classes

Pi05Tokenizer(hf_tokenizer, config)

Wrapper around the HF PaliGemma tokenizer that also tokenizes robot state.

class mstar.model.pi05.components.tokenization.Pi05Tokenizer(hf_tokenizer, config)

Bases: object

Wrapper around the HF PaliGemma tokenizer that also tokenizes robot state.

Robot state values are discretized into state_token_bins bins and mapped to language token IDs starting at state_token_offset. Pi0.5 reuses bottom-of-vocab tokens for state bins so that PaliGemma’s embedding table can embed them directly.
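The discretization described above can be sketched as follows. This is a minimal illustration rather than the module's actual code: the function name `discretize_state`, the assumption that state values are already normalized to [-1, 1], the use of uniform bins, and the example values for the bin count and offset are all hypothetical. In the real class the two settings would come from `state_token_bins` and `state_token_offset`.

```python
from bisect import bisect_right


def discretize_state(state, num_bins=256, token_offset=0, low=-1.0, high=1.0):
    """Map each state value to a token ID equal to token_offset + bin index.

    Hypothetical sketch: assumes uniform bins over [low, high] and that
    out-of-range values are clamped to the first or last bin.
    """
    width = (high - low) / num_bins
    # Left edge of every bin: low, low + width, ..., high - width.
    edges = [low + i * width for i in range(num_bins)]
    token_ids = []
    for value in state:
        # bisect_right counts the edges <= value; subtract 1 for a 0-based bin.
        bin_index = bisect_right(edges, value) - 1
        # Clamp so values below `low` or at/above `high` stay in range.
        bin_index = max(0, min(num_bins - 1, bin_index))
        token_ids.append(token_offset + bin_index)
    return token_ids


# Four bins over [-1, 1] and an illustrative offset of 100.
example_ids = discretize_state([-1.0, 0.0, 1.0], num_bins=4, token_offset=100)
```

Because every resulting ID is an ordinary vocabulary index, the same embedding lookup used for text tokens can embed the state tokens without a separate table.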

Parameters:

config (Pi05Config)

encode_prompt(prompt)
Parameters:

prompt (str)

Return type:

Tensor

mstar.model.pi05.components.tokenization.normalize_prompt(prompt)

Lowercases the prompt and strips whitespace, matching openpi’s PaligemmaTokenizer.

Parameters:

prompt (str)

Return type:

str
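The normalization described for `normalize_prompt` might look like the sketch below. It assumes that "strip whitespace" means removing leading and trailing whitespace only; the real function may apply further cleanup, so the name `normalize_prompt_sketch` is deliberately different and hypothetical.

```python
def normalize_prompt_sketch(prompt: str) -> str:
    """Lowercase the prompt and strip leading/trailing whitespace.

    Hypothetical stand-in for normalize_prompt; interior whitespace is
    left untouched under the assumption stated above.
    """
    return prompt.lower().strip()


normalized = normalize_prompt_sketch("  Pick Up the Red Cube \n")
```

Normalizing before tokenization keeps prompts that differ only in case or padding mapped to the same token sequence.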