選択できるのは25トピックまでです。 トピックは、先頭が英数字で、英数字とダッシュ('-')を使用した35文字以内のものにしてください。
Andreas Demmelbauer 7e0a95212c initial commit 5年前
public initial commit 5年前
.gitignore initial commit 5年前
README.md initial commit 5年前
config.example.json initial commit 5年前
feedcake.py initial commit 5年前
requirements.txt initial commit 5年前

README.md

Feedcake

“Gib mir ein Stück Kuchen und ich will den ganzen cake.”

The Problem

Most news platforms don’t give you the full article via rss/atom.
This wouldn’t be a big problem. But some of them do crazy 1984-ish stuff on their websites or they have built up paywalls for users with privacy addons.

Goal of this script

Getting a full-featured news feed (full articles with images) from various news pages

Benefits for the user

  • They don’t need to go on the website to read the articles
  • No ads
  • No tracking

Possible downsides for the user

  • articles don’t get updated once they are scraped
  • articles arrive with some delay
  • interactive/special elements in articles may not work

What it does

  • Fetching the news feed from the original website
  • scrape contents of new entries and save them into a directory structure
  • save a full featured RSS file

... and what it doesn’t

  • Managing when it scrapes (use crontab or sth else for that)
  • serving the feeds and assets via HTTPS (use your favorite web server for that)
  • Dealing with article comments
  • Archiving feeds (But content and assets - but without meta data)
  • Using some sort of database (the file structure is everything)
  • Cleaning up old assets
  • Automaticly updating the basedir if it changed.

Ugly stuff?

  • the html files (feed content) get stored along the assets, even if they don’t need to be exploited via HTTPS.

How to use

  • git clone this project and enter directory
  • install python3, pip and virtualenv
  • Create virtualenv: virtualenv -p python3 ~/.virtualenvs/feedcake
  • Activate your new virtualenv: source ~/.virtualenvs/feedcake/bin/activate
  • switch into the projects directory: cd feedcake
  • Install requirements: pip3 install -r requirements.txt
  • copy the config-example: cp config-example.json config.json.
  • edit config.json
  • copy the cron-example: cp cron-example.sh cron.sh.
  • edit cron.sh
  • add cronjob for cron.sh: crontab -e
    • */5 * * * * /absolute/path/to/cron.sh > /path/to/logfile 2>&1
  • setup your webserver: the base_url must point to the public directory
    You should add basic http authentication or at least keep the url private
  • After running the script the first time, your desired feed is available at base_url/destination (e.g. https://yourdomain.tld/some-url/newspaper.xml)

TODOs

  • Decide what should happen with old news articles and assets which are not listed in the current feed anymore.