# web-archiver

**Repository Path**: mirrors_schollz/web-archiver

## Basic Information

- **Project Name**: web-archiver
- **Description**: A tiny Python clone of https://archive.org/web/ for your own personal websites.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-09-25
- **Last Updated**: 2025-12-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# web-archiver
A tiny Python clone of https://archive.org/web/ for your own personal websites.

To use it, first install the dependencies:

```
$ pip install -r requirements.txt
```

Then add your sites to the file ```sites```. To run the archiver, use

```
$ python run.py
```

To browse the archived files, go to the ```output``` directory and start a local web server with

```
$ python3 -m http.server
```
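
If you would rather start the server from Python than from the shell, the same idea can be sketched with the standard library. The helper name `make_server` and the default port are assumptions for illustration and are not part of this repository. Unlike the command above, this sketch binds to localhost only.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory="output", port=8000):
    """Create an HTTP server that serves `directory` on localhost (hypothetical helper)."""
    # The `directory` argument of SimpleHTTPRequestHandler needs Python 3.7+.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server().serve_forever()` serves the directory until interrupted. On Python 3.7 or later, `python3 -m http.server --directory output` does much the same without changing directories first.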

# To-do

- Archive each site with ```wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains DOMAIN.COM http://DOMAIN.COM/``` and label each archive with the date it was taken.
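- Take a screenshot of each page with Selenium:
```
from selenium import webdriver

# Launch a local Firefox instance.
browser = webdriver.Firefox()
try:
    browser.get('http://www.google.com/')
    # Save the current browser window as a PNG file.
    browser.save_screenshot('screenshot.png')
finally:
    # Close the browser even if loading the page fails.
    browser.quit()
```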
- Generate a static site that makes the archives easy to browse: one top-level index page covering everything, plus one index page per web archive with screenshots and dates that link to the archived page.
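
As a rough sketch of how the archiving and index items above might fit together, the following builds the `wget` command for a dated snapshot and renders a simple index of existing snapshots. The function names and the `output/<domain>/<date>/` layout are assumptions made for illustration; they are not taken from `run.py`. The `--directory-prefix` flag is added here so each run lands in its own dated folder.

```python
import html
from datetime import date
from pathlib import Path
from urllib.parse import urlparse


def build_wget_command(url, output_root="output", day=None):
    """Return (argv, target_dir) for mirroring `url` into a dated folder."""
    day = day or date.today()
    domain = urlparse(url).netloc
    # Assumed layout: output/<domain>/<YYYY-MM-DD>/
    target = Path(output_root) / domain / day.isoformat()
    argv = [
        "wget", "--recursive", "--no-clobber", "--page-requisites",
        "--html-extension", "--convert-links",
        "--restrict-file-names=windows",
        "--domains", domain,
        "--directory-prefix", str(target),
        url,
    ]
    return argv, target


def build_index(output_root="output"):
    """Return an HTML list linking to every <domain>/<date>/ snapshot folder."""
    root = Path(output_root)
    items = []
    for site in sorted(p for p in root.iterdir() if p.is_dir()):
        for snap in sorted(p for p in site.iterdir() if p.is_dir()):
            href = html.escape(f"{site.name}/{snap.name}/")
            label = html.escape(f"{site.name} ({snap.name})")
            items.append(f'<li><a href="{href}">{label}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"
```

The returned `argv` could be passed to `subprocess.run(argv, check=False)`, and the string from `build_index()` written to `output/index.html` so it is served alongside the archives. Screenshots and per-archive index pages are left out of this sketch.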