ArchiveBox

ArchiveBox is a robust, open-source, self-hosted web archiving solution. I'd use it to keep local copies of links from various sources, so the content is preserved even if the original sites vanish.

Questions & Answers

What is ArchiveBox?
ArchiveBox is an open-source, self-hosted web archiving tool that saves local copies of web content, including pages, images, and videos. It can ingest URLs from various sources like browser bookmarks, history, RSS feeds, or direct input. Users can view and manage their archived data through a web UI or directly in the filesystem.
Who is ArchiveBox intended for?
ArchiveBox is designed for individuals, researchers, or organizations who want to preserve web content locally and ensure access to it even if original sites go offline. It caters to those needing a personal web archive under their full control, rather than relying on third-party cloud services.
How does ArchiveBox compare to other web archiving tools?
ArchiveBox distinguishes itself by being self-hosted and by combining several archiving methods, including Wget, Chromium, and youtube-dl, to capture different content types. Unlike cloud-based services, it gives users complete control over their archived data and storage. It also accepts URLs from a wide range of input sources.
When should I consider using ArchiveBox?
You should use ArchiveBox when you require a resilient, offline-accessible personal archive of web pages, articles, videos, or social media posts. It is particularly useful for historical preservation, research, or ensuring continued access to important online resources, especially if you anticipate content might be removed or altered online.
What are the technical requirements and input methods for ArchiveBox?
ArchiveBox officially supports macOS, Ubuntu/Debian, and BSD, and can run on any system that supports Docker and/or Python. It allows for importing URLs from numerous sources, including browser bookmarks (Chrome, Firefox, Safari, Opera, IE), browser history, RSS feed URLs, Pocket, Pinboard, Instapaper, Reddit Saved Posts, and plain text files via stdin or as arguments.
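
As a rough illustration of the plain-text-via-stdin input method mentioned above, the following Python sketch pulls URLs out of arbitrary text and pipes them, one per line, to `archivebox add`. It assumes the `archivebox` command is installed and on the PATH, and that the target directory has already been set up with `archivebox init`. The helper names `extract_urls` and `add_to_archive` are hypothetical and not part of ArchiveBox itself.

```python
import re
import subprocess

# Simple pattern for http(s) URLs; deliberately loose, not a full URL parser.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(text):
    """Return the unique http(s) URLs found in text, in first-seen order."""
    seen = set()
    urls = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that commonly trails a URL in running prose.
        url = match.rstrip(".,;:!?)")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def add_to_archive(urls, archive_dir, dry_run=True):
    """Send URLs to `archivebox add` on stdin, one URL per line.

    With dry_run=True nothing is executed; the command and the stdin
    payload are only returned so they can be inspected first.
    """
    command = ["archivebox", "add"]
    payload = "\n".join(urls) + "\n"
    if not dry_run:
        # Assumption: archive_dir was initialized beforehand with `archivebox init`.
        subprocess.run(command, input=payload, text=True, cwd=archive_dir, check=True)
    return command, payload


if __name__ == "__main__":
    notes = "Read https://example.com/a, then http://example.org/b."
    command, payload = add_to_archive(extract_urls(notes), ".", dry_run=True)
    print(command)
    print(payload, end="")
```

The dry run is the default so the list of URLs can be reviewed before anything is archived; passing `dry_run=False` runs the command inside `archive_dir`.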