Web page archive formats:
Tools for crawling, scraping and archiving Web pages:
- internetarchive/heritrix3 - Extensible, web-scale, archival-quality web crawler project (Java)
- internetarchive/Zeno - State-of-the-art web crawler (Go)
- internetarchive/gowarc - Read and write WARC files in Go
- webrecorder/pywb - Web Archiving Toolkit for replay and recording of web archives (Python)
Self-hosted solutions:
- ArchiveBox - Self-hosted app that preserves content from websites in a variety of formats
- Wallabag - Save and classify articles to read later