Pocket TTS

Pocket TTS is a small English text-to-speech model with voice cloning. It's designed for local deployment and runs efficiently on a laptop's CPU.

Visit kyutai.org →

Questions & Answers

What is Pocket TTS?
Pocket TTS is a 100-million-parameter text-to-speech model with voice cloning. It is designed to run efficiently on a CPU, which allows local deployment and real-time operation.
Who would benefit from using Pocket TTS?
Pocket TTS is ideal for developers, researchers, or users who require a high-quality text-to-speech solution with voice cloning that can operate entirely on local hardware without cloud dependencies.
How does Pocket TTS compare to other text-to-speech models?
Unlike many larger text-to-speech models that require powerful GPUs or cloud services, Pocket TTS is optimized to run on a standard laptop CPU in real time. Its small 100M-parameter size makes local deployment practical while still offering high-quality output and voice cloning.
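
To give a rough sense of why a 100M-parameter model fits comfortably on a laptop, the weight storage can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, not a published figure: the bytes-per-parameter values are assumptions (4 for float32, 2 for a 16-bit format), the helper name `weights_size_mb` is hypothetical, and the estimate ignores activations and runtime overhead.

```python
def weights_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1_000_000


POCKET_TTS_PARAMS = 100_000_000  # "100-million parameter" figure from the text

# Assumed storage precisions; the precision actually shipped is not stated here.
size_fp32 = weights_size_mb(POCKET_TTS_PARAMS, 4)  # 32-bit floats
size_fp16 = weights_size_mb(POCKET_TTS_PARAMS, 2)  # 16-bit floats
```

Under these assumptions the weights alone take about 400 MB at 32-bit precision and about 200 MB at 16-bit precision.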
When should I consider using Pocket TTS?
Pocket TTS is suitable for applications where privacy is a concern, internet connectivity is limited, or low-latency local processing is required. It's particularly useful for offline text-to-speech generation and voice cloning tasks on personal devices.
How can I install and run Pocket TTS locally?
Pocket TTS can be run with `uv`. Once `uv` is installed, `uvx pocket-tts serve` starts a local server that exposes an API, and `uvx pocket-tts generate` runs generation from the command line.
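
The two commands above can be wrapped in a small Python helper for scripting. This is a minimal sketch that assumes `uvx` is on the `PATH`; the function names `pocket_tts_command` and `run_pocket_tts` are hypothetical, and no flags are passed because none are documented here.

```python
import shutil
import subprocess

# The two subcommands named in the answer above; no extra flags are assumed.
SUBCOMMANDS = ("serve", "generate")


def pocket_tts_command(subcommand: str) -> list[str]:
    """Build the argument list for `uvx pocket-tts <subcommand>`."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return ["uvx", "pocket-tts", subcommand]


def run_pocket_tts(subcommand: str) -> int:
    """Run the command and return its exit code (hypothetical helper)."""
    if shutil.which("uvx") is None:
        raise RuntimeError("uvx not found on PATH; install uv first")
    # check=False: return the exit code to the caller instead of raising.
    return subprocess.run(pocket_tts_command(subcommand), check=False).returncode
```

Calling `run_pocket_tts("serve")` would start the local server and block until it exits, while `run_pocket_tts("generate")` would run a single command-line generation.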