features.md

February 4, 2026

🌟 Features

🎨 Mixing & matching models

You can use different models in different rooms (e.g. OpenAI GPT-4o in one room and Llama running on Groq in another).

You can also use different models within the same room (e.g. 💬 text-generation handled by one 🤖 agent, 🦻 speech-to-text handled by another, 🗣️ text-to-speech by a third).

The bot supports the following purposes:

💬 text-generation: communicating with you via text (though certain models may "see" images as well)
🦻 speech-to-text: turning your voice messages into text
🗣️ text-to-speech: turning the bot's or users' text messages into voice messages
🖌️ image-generation: generating images based on instructions

In a given room, each purpose can be served by a different ☁️ provider and model. A particular combination of provider and model configuration is called an 🤖 agent, and the agent assigned to serve a purpose in a room is that purpose's handler.
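As a rough illustration of this model (not the bot's actual implementation), the sketch below represents an agent as a provider and model pair and maps each purpose to a handler agent for a single room. The class names, field names, agent ids, and the `example-...` model names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    TEXT_GENERATION = "text-generation"
    SPEECH_TO_TEXT = "speech-to-text"
    TEXT_TO_SPEECH = "text-to-speech"
    IMAGE_GENERATION = "image-generation"


@dataclass(frozen=True)
class Agent:
    # Hypothetical fields: an agent pairs a provider with a model configuration.
    agent_id: str
    provider: str
    model: str


def handler_for(room_handlers: dict[Purpose, Agent], purpose: Purpose) -> Agent | None:
    """Return the agent handling `purpose` in this room, or None if none is set."""
    return room_handlers.get(purpose)


# One room, three purposes, each served by a different (illustrative) agent.
room_handlers = {
    Purpose.TEXT_GENERATION: Agent("openai-text", "openai", "gpt-4o"),
    Purpose.SPEECH_TO_TEXT: Agent("groq-stt", "groq", "example-stt-model"),
    Purpose.TEXT_TO_SPEECH: Agent("openai-tts", "openai", "example-tts-model"),
}
```

In this sketch, a purpose without a handler (here, image-generation) simply resolves to `None`; how the bot itself treats an unset handler is not covered here.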

See a 🖼️ Screenshot of an example room configuration.

For more information about configuring handlers, see the 🤝 Handlers / Configuring documentation section.

💬 Text Generation

Text Generation is the bot's ability to respond to users' messages with text.

Screenshot of Text Generation - a user sends a message and the bot replies in a new conversation thread

Some models also support vision, so you may be able to mix text and images in the same conversation.

In multi-user (group) rooms, the bot avoids disturbing the normal conversation between people: it is auto-configured (via the 💬 Text Generation / 🗟 Prefix Requirement Type setting) to respond only to messages that start with the command prefix (!bai) or mention it directly.
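The prefix requirement can be pictured as a small decision function. This is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the mention check is a plain substring match, whereas the bot presumably relies on proper Matrix mention data.

```python
COMMAND_PREFIX = "!bai"


def should_respond(body: str, bot_mention: str, prefix_required: bool) -> bool:
    """Decide whether a room message should trigger a text-generation reply."""
    if not prefix_required:
        # No prefix requirement: every message is a candidate.
        return True
    text = body.lstrip()
    # Accept the bare prefix or the prefix followed by a space, not "!baixyz".
    starts_with_prefix = text == COMMAND_PREFIX or text.startswith(COMMAND_PREFIX + " ")
    # Simplification: treat any occurrence of the mention string as a mention.
    return starts_with_prefix or bot_mention in body
```

With the requirement on, `should_respond("lunch at noon?", "@baibot", True)` is false, so ordinary chatter between people is left alone, while `"!bai what is Matrix?"` and `"@baibot Hello!"` both get a reply.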

Normally, the bot only responds to allowed 👥 Users. In certain cases, it's useful for an allowed user to provoke the bot to respond even in foreign threads or reply chains. You can learn more about this feature in the On-demand involvement section below.

A few other features (like 🗣️ Text-to-Speech and 🦻 Speech-to-Text) combine well with Text Generation, so you don't necessarily need to communicate with the bot via text (with Seamless voice interaction, you can communicate only with voice).

You may also wish to see:

🛠️ Configuration / 💬 Text Generation for configuration options related to Text Generation
📖 Usage / 💬 Text Generation section for more details on how to use the bot for Text Generation in a room

🛠️ Built-in Tools (OpenAI only)

The OpenAI provider supports built-in tools that extend the model's capabilities:

🔍 Web Search (web_search): allows the model to search the web for up-to-date information. 🖼️ Screenshot
💻 Code Interpreter (code_interpreter): allows the model to write and execute Python code in a sandbox

These tools are disabled by default and need to be explicitly enabled in the agent's text_generation.tools configuration. See the OpenAI sample configuration for reference.

To enable tools on an existing dynamically-created agent, update the agent so that it is re-created with the text_generation.tools section added, and enable the tools you need in that section.

💡 Note: These tools run on OpenAI's infrastructure and may incur additional costs. Web search results include citations that are incorporated into the response.
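The following sketch shows the idea of adding a text_generation.tools section to an agent configuration, with the configuration modeled as a plain Python dictionary. It assumes each tool is an on/off boolean keyed by its name (web_search, code_interpreter); the `enable_tools` helper and the `model_id` key are hypothetical, so consult the OpenAI sample configuration for the real layout.

```python
from __future__ import annotations

import copy

# Tool names as listed in this section.
KNOWN_TOOLS = ("web_search", "code_interpreter")


def enable_tools(agent_config: dict, tools: list[str]) -> dict:
    """Return a copy of agent_config with the given built-in tools enabled."""
    unknown = set(tools) - set(KNOWN_TOOLS)
    if unknown:
        raise ValueError(f"unknown tools: {sorted(unknown)}")
    updated = copy.deepcopy(agent_config)  # leave the caller's config untouched
    text_generation = updated.setdefault("text_generation", {})
    tool_section = text_generation.setdefault("tools", {})
    for name in KNOWN_TOOLS:
        # Tools stay disabled unless requested or already enabled.
        tool_section[name] = name in tools or tool_section.get(name, False)
    return updated


# Assumed shape of an existing agent configuration without a tools section.
existing = {"text_generation": {"model_id": "gpt-4o"}}
with_search = enable_tools(existing, ["web_search"])
```

Here `with_search["text_generation"]["tools"]` has web_search enabled and code_interpreter still disabled, mirroring the disabled-by-default behavior described above.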

On-demand involvement

In the following 2 cases, it's useful to involve the bot in conversations on-demand:

In multi-user rooms (with the 🗟 Prefix Requirement setting set to "required")
In rooms with foreign users (users that are not authorized bot 👥 users)

In these instances, an allowed 👥 user can also provoke the bot to respond to any thread or reply chain by mentioning the bot (e.g. @baibot Hello!). The following screenshots demonstrate this behavior:

🖼️ On-demand involvement in the room
🖼️ On-demand involvement in a thread (the Alice user in this example is not an allowed user, yet her messages are still considered as part of the conversation context)
🖼️ On-demand involvement in a reply chain (the Alice user in this example is not an allowed user, yet her messages are still considered as part of the conversation context)

💡 NOTE: Normally, the bot only considers messages from allowed 👥 Users and ignores all other messages when responding. However, when the bot is explicitly invoked (via mention) in a thread or reply chain, it considers all messages in that thread or reply chain (even those from foreign users) as part of the conversation context.
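The rule in the note above can be sketched as a filter over the messages of a thread or reply chain. This is an illustrative model only: the `Message` type and `context_messages` function are hypothetical, the user ids are made up, and the bot's own messages are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    sender: str
    body: str


def context_messages(
    thread: list[Message], allowed_users: set[str], invoked_by_mention: bool
) -> list[Message]:
    """Select which messages count as conversation context."""
    if invoked_by_mention:
        # Explicit invocation: foreign users' messages are included too.
        return list(thread)
    # Normal operation: only messages from allowed users are considered.
    return [m for m in thread if m.sender in allowed_users]


thread = [
    Message("@bob:example.org", "Hi"),
    Message("@alice:example.org", "What about the deadline?"),
    Message("@bob:example.org", "@baibot Hello!"),
]
allowed = {"@bob:example.org"}
```

With Bob as the only allowed user, `context_messages(thread, allowed, True)` keeps all three messages (including Alice's), while `context_messages(thread, allowed, False)` keeps only Bob's two.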

🗣️ Text-to-Speech

Text-to-Speech is the bot's ability to turn text messages into voice messages.

It can be performed on the bot's own text messages (its 💬 Text Generation responses to you) and/or on your own text messages.

Text-to-Speech can run automatically or on-demand (only after you react to a message with 🗣️), and is configured separately for each message type (🪄 Bot Messages Flow Type vs 🪄 User Messages Flow Type).

By default, the bot doesn't perform text-to-speech. It can be configured for Seamless voice interaction, where you can speak to the bot (instead of typing) and then hear its responses.

Another use-case is to have the bot operate in Text-to-Speech-only mode.

You may also wish to see:

🛠️ Configuration / 🗣️ Text-to-Speech for configuration options related to Text-to-Speech
📖 Usage / 🗣️ Text-to-Speech section for more details on how to use the bot for Text-to-Speech in a room

Text-to-Speech-only mode

You may wish to have the bot automatically turn your text messages into voice messages, but without doing 💬 Text Generation.

Screenshot of Text-to-Speech-only mode - text messages are turned to audio and posted as a reply, without Text Generation happening

This could be useful in a room with others, where you'd like to post text messages and have people in the room consume them more easily (by listening to audio).

To allow for this use-case, you can:

disable 💬 Text Generation (via 💬 Text Generation / 🪄 Auto Usage setting): !bai config room text-generation set-auto-usage never
enable 🗣️ Text-to-Speech for user messages (via 🗣️ Text-to-Speech / 🪄 User Messages Flow Type): !bai config room text-to-speech set-user-msgs-flow-type always (or on_demand)

🦻 Speech-to-Text

Speech-to-Text is the bot's ability to turn voice messages into text.

Default flow for Speech-to-Text and Text-Generation - your voice messages are transcribed to text and then answered via Text Generation

The default flow is shown in the screenshot above: your voice messages are transcribed to text and 💬 Text Generation is performed. By default, the bot offers 🗣️ Text-to-Speech for its answers via a 🗣️ emoji. You can click it to trigger text-to-speech on-demand.

You may also configure the bot for Seamless voice interaction or Transcribe-only mode, etc.

You may also wish to see:

🛠️ Configuration / 🦻 Speech-to-Text for configuration options related to Speech-to-Text
📖 Usage / 🦻 Speech-to-Text section for more details on how to use the bot for Speech-to-Text in a room

Seamless voice interaction

The bot can perform seamless voice interaction (🗣️-to-🗣️), allowing you to speak to the bot (instead of typing) and then hear its responses.

Screenshot of the Seamless voice interaction mode - your voice messages are transcribed to text, then answered via Text Generation, and finally the answer is turned into a voice message

The flow is like this:

👤 You sending a voice message
🤖 The bot:

(default) first turning your voice message into text (🦻 Speech-to-Text) and posting it as a reply. This lets you see what the bot heard.
(default) then answering in text (💬 Text Generation). This lets you read/skim text, if you so prefer.
(can be enabled) finally turning the answer's text into a voice message (🗣️ Text-to-Speech)

👤 You continuing the conversation via text or voice messages

⚠️ Certain clients (like Element) only support sending voice messages as top-level room messages, not as thread replies. Until this client limitation is fixed, Element users can only send the 1st message as a voice message - subsequent replies in the same conversation thread will need to be sent as text messages.

By default, the last part of the aforementioned flow is not enabled, because we assume a saner default is to reply with text and merely offer text-to-speech to those who want it. Offering is done by the bot reacting to its own message with 🗣️, and letting you click this emoji to trigger text-to-speech on-demand.

To enable automatic text-to-speech for the bot's messages, set the 🗣️ Text-to-Speech / 🪄 Bot Messages Flow Type setting to only_for_voice or always (e.g. !bai config room text-to-speech set-bot-msgs-flow-type only_for_voice).
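To make the flow concrete, here is a minimal sketch of the steps the bot takes for one incoming message, depending on the Bot Messages Flow Type. It models only the values mentioned in this document, assumes on_demand is the default (matching the "offer via 🗣️" behavior described above), and assumes only_for_voice means "speak only when the user's message was a voice message". The function name and the action labels are hypothetical.

```python
from __future__ import annotations


def bot_actions(is_voice_message: bool, bot_msgs_flow_type: str = "on_demand") -> list[str]:
    """List the steps the bot performs for one incoming user message."""
    actions = []
    if is_voice_message:
        # Transcribe first and post the transcript as a reply.
        actions.append("speech-to-text")
    # Then answer in text.
    actions.append("text-generation")
    auto_tts = bot_msgs_flow_type == "always" or (
        bot_msgs_flow_type == "only_for_voice" and is_voice_message
    )
    # Either speak the answer automatically, or merely offer it via a reaction.
    actions.append("text-to-speech" if auto_tts else "offer text-to-speech")
    return actions
```

For a voice message with the assumed default, the result is `["speech-to-text", "text-generation", "offer text-to-speech"]`; switching to only_for_voice turns the last step into an automatic `"text-to-speech"` for voice messages while text messages still only get the offer.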

Transcribe-only mode

If you'd like to have the bot only turn voice messages into text (without generating text messages or voice messages), you can configure the bot for that.

Screenshot of Transcribe-only-mode for Speech-to-Text - your voice messages are transcribed to text, and the bot does not generate text messages or voice messages

To operate in this mode, you can:

disable 💬 Text Generation (via 💬 Text Generation / 🪄 Auto Usage setting): !bai config room text-generation set-auto-usage never
adjust the 🦻 Speech-to-Text / 🪄 Flow Type setting to make the bot only transcribe (without doing 💬 Text Generation): !bai config room speech-to-text set-flow-type only_transcribe
optionally adjust 🦻 Speech-to-Text / 🪄 Message Type for non-threaded only-transcribed messages, if you'd like the bot to send messages of type notice (for better compatibility with other bots in the room) instead of regular text messages (the default)

Image Generation

🖌️ Image Creation

Image creation is the bot's ability to create images based on text prompts.

See a 🖼️ Screenshot of the Image Creation feature.

You may also wish to see:

🛠️ Configuration / 🖌️ Image Generation for configuration options related to Image Generation
📖 Usage / Image Generation / 🖌️ Creating Images section for more details on how to use the bot for Image Creation in a room
🎨 Image Editing - another image generation feature
🫵 Sticker Creation - a special case of Image Creation

🎨 Image Editing

Image editing is the bot's ability to edit images based on a prompt and one or more existing images.

See a 🖼️ Screenshot of the Image Editing feature (manipulating a single image) and a 🖼️ Screenshot of the Image Editing feature (manipulating multiple images).

You may also wish to see:

🛠️ Configuration / 🖌️ Image Generation for configuration options related to Image Generation
📖 Usage / Image Generation / 🎨 Editing images section for more details on how to use the bot for Image Editing in a room
🖌️ Image Creation - another image generation feature

🫵 Sticker Creation

Sticker Creation is the bot's ability to create sticker images based on text prompts. It's a special case of 🖌️ Image Creation.

See a 🖼️ Screenshot of the Sticker Creation feature.

See 📖 Usage / Image Generation / 🫵 Creating Stickers for details.

🔒 Encryption

Message exchange

The bot works in both unencrypted and encrypted Matrix rooms.

If configured, the bot can make use of Matrix's Secure Storage (Recovery) feature, so that it can restore its encryption keys even if its local database gets lost.

Configuration

The bot also stores its 🛠️ configuration (both 📍 per-room and 🌐 globally) in Matrix Account Data, which is generally stored as plain text on the server.

To overcome this Matrix limitation, the bot can optionally encrypt the configuration data before storing it in Account Data. This allows the bot to be used securely even with untrusted servers, without leaking sensitive configuration data to them.