Ripping Websites
How to rip a website
The ArchiveBox tool
A self-hosted internet archiving solution for collecting, saving, and viewing websites you want to keep offline.
Usage
- adding links to archive
    archivebox add 'https://example.com'    # add URLs one at a time
    archivebox add < ~/Downloads/bookmarks.json    # or pipe in URLs in any text-based format
    archivebox schedule --every=day --depth=1 https://example.com/rss.xml    # or auto-import URLs regularly on a schedule
- viewing the archived content
    archivebox server 0.0.0.0:8000    # use the interactive web UI
    archivebox list 'https://example.com'    # use the CLI commands (--help for more)
    ls ./archive/*/index.json    # or browse directly via the filesystem
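
The two steps above (feeding URLs in, then browsing the result on disk) can be wrapped in a few small shell helpers. This is only a rough sketch, not part of ArchiveBox itself: the function names `clean_urls`, `archive_urls`, and `list_snapshots` are made up for illustration, and the input is assumed to be a plain-text file with one URL per line. The only ArchiveBox behavior relied on is what the commands above already show: `archivebox add` reading URLs from standard input, and snapshots living under `./archive/*/index.json`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not ArchiveBox commands.

# Keep only http(s) lines, trim surrounding whitespace, drop duplicates.
# Assumes one URL per line on stdin.
clean_urls() {
    grep -E '^[[:space:]]*https?://' \
        | sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' \
        | sort -u
}

# Pass URLs from stdin to `archivebox add` if the CLI is installed;
# otherwise just echo them so the list can be inspected.
archive_urls() {
    if command -v archivebox >/dev/null 2>&1; then
        archivebox add
    else
        echo "archivebox not found; URLs that would be added:" >&2
        cat
    fi
}

# Print the per-snapshot index files under a data directory
# (defaults to the current directory), one path per line.
list_snapshots() {
    local data_dir="${1:-.}"
    local f
    for f in "$data_dir"/archive/*/index.json; do
        # Skip the unexpanded glob pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

A possible way to use it, run from inside the ArchiveBox data directory: `clean_urls < ~/Downloads/urls.txt | archive_urls`, followed by `list_snapshots .` to see what was saved. The file name `urls.txt` is just a placeholder.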