Stream Video provides Vision Agents, an open-source video AI framework built from the ground up to help developers build low-latency voice and vision applications that run on the edge. Cartesia is available as an official text-to-speech (TTS) plugin. Stream's “Simple Agent” GitHub example and its voice and video guides are good places to get started.