From 4025c6f580be5b53721986c9d42c8944330fa92d Mon Sep 17 00:00:00 2001
From: Andreas Demmelbauer
Date: Thu, 4 Apr 2019 09:53:45 -0700
Subject: [PATCH] polishing

---
 README.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 1322606..34ed0ae 100644
--- a/README.md
+++ b/README.md
@@ -23,20 +23,22 @@ news pages
 ### What it does
 * Fetching the news feed from the original website
 * scrape contents of new entries and save them into a directory structure
+* exclude articles whose title contains a string from the 'exclude' list
 * save a full featured RSS file
 
 ### ... and what it doesn't
-* Managing when it scrapes (use crontab or sth else for that)
+* Managing when it scrapes (but install instructions for crontab are included)
 * serving the feeds and assets via HTTPS (use your favorite web server for that)
 * Dealing with article comments
 * Archiving feeds (But content and assets - but without meta data)
 * Using some sort of database (the file structure is everything)
 * Cleaning up old assets
-* Automaticly updating the basedir if it changed.
+* Automatically updating the basedir if it changed.
 
 ### Ugly stuff?
 * the html files (feed content) get stored along the assets, even if they
   don't need to be exploited via HTTPS.
+* almost no exception handling yet.
 
 ### How to use
 * git clone this project and enter directory
@@ -60,5 +62,6 @@ news pages
 `base_url/destination` (e.g. `https://yourdomain.tld/some-url/newspaper.xml`)
 
 ### TODOs
+* Handle exceptions
 * Decide what should happen with old news articles and assets which are not
   listed in the current feed anymore.
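For illustration, the 'exclude' feature documented by this patch amounts to a plain substring check against each feed entry's title. A minimal Python sketch of such a filter follows; the names (`config`, `entries`, `is_excluded`) and the sample data are hypothetical, not taken from the project's actual code:

```python
# Hypothetical sketch of the 'exclude' title filter described in the README
# bullet above; all names and sample data are illustrative assumptions.

config = {
    "exclude": ["Sponsored", "Advertorial"],  # substrings that disqualify an article
}

def is_excluded(title, exclude_strings):
    """Return True if any exclude string occurs in the title (plain substring check)."""
    return any(s in title for s in exclude_strings)

# Skip matching entries before scraping them:
entries = [
    {"title": "Local news roundup"},
    {"title": "Sponsored: the best gadgets"},
]
kept = [e for e in entries if not is_excluded(e["title"], config["exclude"])]
print([e["title"] for e in kept])  # -> ['Local news roundup']
```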