
How to insert 100M rows into SQLite in 33 seconds

This article details a quest to insert one billion rows into SQLite in under a minute, reaching 100M rows in 33 seconds along the way. It explores optimizations ranging from Python (batching, pragmas, PyPy) to a Rust rewrite, and is a solid dive into pushing SQLite performance that clearly demonstrates the impact of the implementation language and database settings.

Visit avi.im →

Questions & Answers

What is this article about?
This article describes an experiment in high-speed data insertion into SQLite, with the stated goal of inserting one billion rows in under a minute; the fastest variant reached 100M rows in about 33 seconds. It details various optimizations and language choices (Python, Rust) that significantly reduce insertion times for large datasets.
Who would find this guide useful?
This guide is useful for developers and database administrators who need to quickly populate SQLite databases with large amounts of test data or for applications where fast, non-durable inserts are acceptable. It targets those interested in SQLite performance tuning and language-specific optimizations.
How does this approach differ from typical SQLite usage?
This approach differs by explicitly trading durability and transactional safety for raw insert speed. It leverages aggressive SQLite pragmas (such as journal_mode = OFF and synchronous = 0) and optimized code execution (batching, prepared statements, and faster runtimes or languages such as PyPy and Rust) that are generally not recommended for production environments requiring data integrity.
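A minimal sketch of what these speed-over-durability settings look like in Python's built-in sqlite3 module. The pragma values come from the article's description; the database path is an illustrative assumption:

```python
import os
import sqlite3
import tempfile

# Hypothetical database path for illustration.
db_path = os.path.join(tempfile.mkdtemp(), "speed_test.db")

# isolation_level=None puts sqlite3 in autocommit mode, leaving
# transaction control to explicit BEGIN/COMMIT statements.
conn = sqlite3.connect(db_path, isolation_level=None)

conn.execute("PRAGMA journal_mode = OFF")       # no rollback journal: a crash can corrupt the DB
conn.execute("PRAGMA synchronous = 0")          # skip fsync; the OS decides when data hits disk
conn.execute("PRAGMA cache_size = 1000000")     # large page cache (value is an assumption)
conn.execute("PRAGMA locking_mode = EXCLUSIVE") # hold the write lock for the entire run
```

None of these settings are safe where data integrity matters: a process crash mid-run can leave the database file unrecoverable, which is exactly the trade-off the article accepts for throwaway data.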
When should these SQLite insertion techniques be applied?
These techniques should be applied when generating large test datasets where data loss upon crash is acceptable and durability guarantees are not required. They are ideal for rapid prototyping, performance benchmarking, or creating throwaway databases for development purposes.
What are some key SQLite optimizations mentioned?
Key SQLite optimizations include batching inserts (e.g., 100,000 rows per transaction), setting journal_mode = OFF and synchronous = 0, increasing cache_size, and using EXCLUSIVE locking_mode. These settings reduce disk I/O and per-statement overhead, but come at the cost of durability and concurrency.
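The batching side of these optimizations can be sketched as follows. This is not the article's exact code: the table schema, the row generator, and the 200,000-row total are illustrative assumptions; only the batch size and the one-transaction-per-batch pattern come from the summary above.

```python
import random
import sqlite3
import string

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA journal_mode = OFF")
conn.execute("PRAGMA synchronous = 0")
# Hypothetical schema for illustration.
conn.execute("CREATE TABLE user (area TEXT, age INTEGER, active INTEGER)")

BATCH_SIZE = 100_000  # the batch size cited above

def random_batch(n):
    # Generate n synthetic rows matching the illustrative schema.
    return [
        ("".join(random.choices(string.digits, k=6)),  # 6-digit area code
         random.choice((5, 10, 15)),                   # age bucket
         random.randint(0, 1))                         # active flag
        for _ in range(n)
    ]

def insert_batches(conn, total_rows):
    # One explicit transaction per batch; executemany reuses a single
    # prepared statement instead of re-parsing the SQL per row.
    for _ in range(total_rows // BATCH_SIZE):
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO user VALUES (?, ?, ?)",
                         random_batch(BATCH_SIZE))
        conn.execute("COMMIT")

insert_batches(conn, 200_000)
```

Wrapping each batch in a single transaction is the key move: without it, SQLite implicitly commits after every INSERT, and the per-commit overhead dominates the run time.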