- A summary of my bot defence systems
- Butlerian Jihad - Blog posts on the topic of fighting off spam bots, search engine spiders and other non-humans wasting the precious resources we have on Earth
- EmacsWiki's robots.txt
VirtualTam's bookmarks
2025-04-09 Web page archive formats:
Tools for crawling, scraping and archiving Web pages:
- internetarchive/heritrix3 - Extensible, web-scale, archival-quality web crawler project (Java)
- internetarchive/Zeno - State-of-the-art web crawler (Go)
- internetarchive/gowarc - Read and write WARC files in Go
- webrecorder/pywb - Web Archiving Toolkit for replay and recording of web archives (Python)
Self-hosted solutions:
- ArchiveBox - A self-hosted app that lets you preserve content from websites in a variety of formats
- Wallabag - Save and classify articles, read them later
Consume any Web site / HTML source, also when an RSS / Atom feed is not available.
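When a site offers no RSS / Atom feed, one rough way to get feed-like entries is to pull the links straight out of the HTML. The sketch below uses only Python's standard `html.parser`; the `LinkCollector` class, the `extract_entries` helper and the sample markup are made up for illustration and are not taken from any of the tools listed here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.entries.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_entries(html):
    """Return the list of (href, title) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.entries


# Hypothetical page fragment standing in for a site without a feed.
SAMPLE = """
<ul>
  <li><a href="/posts/1">First post</a></li>
  <li><a href="/posts/2">Second <em>post</em></a></li>
</ul>
"""
entries = extract_entries(SAMPLE)
```

A real feed generator would also need dates, absolute URLs and some per-site selection of which links count as entries; this only shows the extraction step.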
- https://www.quora.com/What-are-good-Python-interview-questions
- https://www.reddit.com/r/Python/comments/1knw7z/python_interview_questions/
- https://resources.workable.com/python-developer-interview-questions
- https://stackoverflow.com/questions/2573135/python-progression-path-from-apprentice-to-guru/2576240#2576240
- https://docs.python.org/3/tutorial/
- https://www.reddit.com/r/Python/comments/6v0amj/the_more_i_learn_about_python_the_more_i_realized/
- https://github.com/00111000/Imports-in-Python
Language:
- GIL
- memory management, object references
- data structures
- duck typing
- monkey patching
- generators
- list & dict comprehensions
- decorators
- introspection
- ...
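As a quick refresher on a few of the topics above (decorators, generators, comprehensions, duck typing), here is a minimal sketch. All names (`count_calls`, `squares`, `total`) are invented for the example.

```python
import functools


def count_calls(func):
    """Decorator: count how many times func has been called."""
    @functools.wraps(func)          # keep func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


def squares(limit):
    """Generator: lazily yield the squares of 0 .. limit-1."""
    for n in range(limit):
        yield n * n


@count_calls
def total(items):
    """Duck typing: works with any iterable of numbers (list, generator, ...)."""
    return sum(items)


even_squares = [n for n in squares(6) if n % 2 == 0]   # list comprehension
square_of = {n: n * n for n in range(3)}               # dict comprehension
```

Calling `total(squares(4))` and `total([1, 2])` both work because `total` only relies on its argument being iterable; afterwards `total.calls` holds the number of calls made.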
Development tools:
- debugging
- packaging
- linting
- testing: unit, functional, TDD, BDD
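For the testing item, a minimal unit test with the standard `unittest` module might look like the sketch below. The `slugify` function is a hypothetical function under test, written only for this example.

```python
import unittest


def slugify(title):
    """Hypothetical function under test: lowercase words joined by '-'."""
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    def test_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  a   b "), "a-b")
```

Saved in a module, the tests can be run with `python -m unittest`. In a TDD workflow the two test methods would be written first and `slugify` filled in afterwards.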
Libraries:
- data parsing
- data scraping
- database management
- HTTP requests
- web frameworks
- ...