html-distance
This is a Go library for measuring how similar two HTML pages are. It fingerprints each page with Charikar's simhash and uses a BK-tree to efficiently find all pages within a given distance of a query page.
This is a robust Go tool for converting HTML, even entire websites, into clean Markdown. I find it particularly neat for LLM use cases where structured website content is critical.