exo

Exo lets you build a local AI cluster from everyday devices such as Mac Minis, connecting them intelligently over RDMA over Thunderbolt so you can run larger models with faster inference.

Questions & Answers

What is exo?
Exo is an open-source tool that connects multiple personal devices, such as Apple Silicon Macs, into a unified AI cluster. It enables users to run frontier AI models locally, leveraging distributed computing capabilities across their hardware.
Who is exo designed for?
Exo is designed for individuals and developers who want to run large AI models locally using their existing hardware. It's particularly useful for those with multiple Apple Silicon devices looking to build a personal AI inference cluster at home.
How does exo differentiate itself from other local AI inference solutions?
Exo distinguishes itself through automatic device discovery and day-0 support for RDMA over Thunderbolt 5, which significantly reduces latency between devices. It also features topology-aware auto-parallelism and integrates with MLX for efficient distributed inference.
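To make "topology-aware auto-parallelism" concrete, here is a minimal sketch of the general idea: splitting a model's layers across devices in proportion to each device's available memory. This is an illustration of the technique, not exo's actual implementation; the function name and numbers are made up for the example.

```python
# Illustrative sketch (not exo's code): assign each device a contiguous
# block of layers, sized by its share of the cluster's total memory.
def partition_layers(num_layers, device_memory_gb):
    total = sum(device_memory_gb)
    assignments, start = [], 0
    for i, mem in enumerate(device_memory_gb):
        if i == len(device_memory_gb) - 1:
            # Last device takes the remainder to avoid rounding gaps.
            count = num_layers - start
        else:
            count = round(num_layers * mem / total)
        assignments.append((start, start + count))
        start += count
    return assignments

# Two 16 GB Mac Minis and one 32 GB machine sharing a 32-layer model:
print(partition_layers(32, [16, 16, 32]))  # → [(0, 8), (8, 16), (16, 32)]
```

A real scheduler would also weigh compute speed and link bandwidth between devices, but the memory-proportional split captures the core of layer-wise sharding.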
Under what circumstances should I consider using exo?
You should consider using exo when you need to run AI models that exceed the memory or computational capacity of a single device. It also helps achieve faster inference by distributing a model across multiple interconnected devices, so the cluster's combined memory and compute are fully used.
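A quick back-of-the-envelope check makes the "exceeds a single device" case concrete. The figures below are rough assumptions for the example (4-bit quantization at about 0.5 bytes per parameter, plus ~20% overhead for activations and the KV cache), not numbers from exo itself.

```python
# Hedged sketch: does a model's weight footprint fit in one device's memory?
# bytes_per_param and overhead_frac are illustrative assumptions.
def fits_on_device(params_billions, device_memory_gb,
                   bytes_per_param=0.5, overhead_frac=0.2):
    needed_gb = params_billions * bytes_per_param * (1 + overhead_frac)
    return needed_gb <= device_memory_gb

# A 70B model at 4-bit needs roughly 42 GB, so it overflows a single
# 32 GB machine but fits across a two-machine, 64 GB cluster:
print(fits_on_device(70, 32))  # → False
print(fits_on_device(70, 64))  # → True
```

When the first check fails and the second passes, pooling devices into a cluster is exactly the scenario exo targets.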
What is RDMA over Thunderbolt and how does exo utilize it?
RDMA (Remote Direct Memory Access) over Thunderbolt 5 lets one device read and write another device's memory directly, bypassing the CPU and the operating system's network stack. Exo uses it to cut inter-device communication latency by up to 99%, which is crucial for efficient distributed model inference.