
polishing

master
Andreas Demmelbauer 5 years ago
parent
commit
4025c6f580
1 changed file with 5 additions and 2 deletions
README.md (+5, -2)

@@ -23,20 +23,22 @@ news pages
 ### What it does
 * Fetching the news feed from the original website
 * scrape contents of new entries and save them into a directory structure
+* exclude articles if a string in the 'exclude' list is included in the title
 * save a full featured RSS file

 ### ... and what it doesn't
-* Managing when it scrapes (use crontab or sth else for that)
+* Managing when it scrapes (but install instructions for crontab are included)
 * serving the feeds and assets via HTTPS (use your favorite web server for that)
 * Dealing with article comments
 * Archiving feeds (But content and assets - but without meta data)
 * Using some sort of database (the file structure is everything)
 * Cleaning up old assets
-* Automaticly updating the basedir if it changed.
+* Automatically updating the basedir if it changed.

 ### Ugly stuff?
 * the html files (feed content) get stored along the assets, even if they don't
 need to be exploited via HTTPS.
+* almost no exception handling yet.

 ### How to use
 * git clone this project and enter directory
@@ -60,5 +62,6 @@ news pages
 `base_url/destination` (e.g. `https://yourdomain.tld/some-url/newspaper.xml`)

 ### TODOs
+* Handle exceptions
 * Decide what should happen with old news articles and assets which are not
 listed in the current feed anymore.
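
The 'exclude' bullet added in this commit describes a title filter. A minimal sketch of that rule, assuming each feed entry is a dict with a `title` key and that matching is a case-sensitive substring test. Both are assumptions, and `is_excluded` and `filter_entries` are hypothetical names, not taken from the project:

```python
def is_excluded(title, exclude):
    """Return True if any string from the exclude list occurs in the title."""
    # Assumption: plain, case-sensitive substring matching.
    return any(term in title for term in exclude)


def filter_entries(entries, exclude):
    """Keep only the entries whose title contains none of the excluded strings."""
    return [entry for entry in entries if not is_excluded(entry["title"], exclude)]


# Hypothetical feed entries and exclude list, for illustration only.
entries = [
    {"title": "Local election results"},
    {"title": "Sponsored: ten gadgets you need"},
]
kept = filter_entries(entries, ["Sponsored"])
```

With an empty exclude list nothing is filtered, so the feature stays opt-in under this reading.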
