# Feedcake
## "Give me a piece of cake and I want the whole cake."

### The Problem
Most news platforms don't give you the full article via RSS/Atom.
This wouldn't be a big problem, but some of them do crazy 1984-ish stuff on their
websites, or they have built paywalls for users with privacy add-ons.

### Goal of this script
Getting a full-featured news feed (full articles with images) from various
news pages.

### Benefits for the user
* They don't need to visit the website to read the articles
* No ads
* No tracking

### Possible downsides for the user
* Articles don't get updated once they are scraped
* Articles arrive with some delay
* Interactive/special elements in articles may not work

### What it does
* Fetches the news feed from the original website
* Scrapes the contents of new entries and saves them into a directory structure
* Saves a full-featured RSS file
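
The steps listed under "What it does" can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: it uses only the standard library, parses an inline sample feed instead of downloading one, and the names `entry_dir`, `new_entries` and `store_entry`, as well as the hash-based directory scheme, are assumptions made up for this example.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sample feed; a real run would download this from the news site.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example News</title>
  <item><title>First</title><link>https://example.org/a/first</link></item>
  <item><title>Second</title><link>https://example.org/a/second</link></item>
</channel></rss>"""


def entry_dir(base: Path, link: str) -> Path:
    """Map an article link to a stable directory name (assumed scheme)."""
    digest = hashlib.sha256(link.encode("utf-8")).hexdigest()[:16]
    return base / digest


def new_entries(feed_xml: str, base: Path) -> list[tuple[str, Path]]:
    """Return (link, directory) pairs for entries not yet stored on disk."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        if not link:
            continue  # skip entries without a usable link
        target = entry_dir(base, link)
        if not target.exists():
            result.append((link, target))
    return result


def store_entry(target: Path, html: str) -> None:
    """Save scraped article HTML into its own directory."""
    target.mkdir(parents=True, exist_ok=True)
    (target / "index.html").write_text(html, encoding="utf-8")
```

Because the file structure is the only state, calling `new_entries` again after `store_entry` returns only the entries that have not been saved yet.
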

### ... and what it doesn't
* Managing when it scrapes (use crontab or something else for that)
* Serving the feeds and assets via HTTPS (use your favorite web server for that)
* Dealing with article comments
* Archiving feeds (content and assets are kept, but without metadata)
* Using some sort of database (the file structure is everything)
* Cleaning up old assets
* Automatically updating the basedir if it changed

### Ugly stuff?
* The HTML files (feed content) get stored alongside the assets, even though they
  don't need to be exposed via HTTPS.

### How to use
* git clone this project and enter the directory
* Install python3, pip and virtualenv
* Create a virtualenv: `virtualenv -p python3 ~/.virtualenvs/feedcake`
* Activate your new virtualenv: `source ~/.virtualenvs/feedcake/bin/activate`
* Switch into the project's directory: `cd feedcake`
* Install the requirements: `pip3 install -r requirements.txt`
* Copy the config example: `cp config-example.json config.json`
* Edit `config.json`
* Copy the cron example: `cp cron-example.sh cron.sh`
* Edit `cron.sh`
* Make `cron.sh` executable: `chmod +x cron.sh`
* Add a cronjob for `cron.sh`: `crontab -e`
  * `*/5 * * * * /absolute/path/to/cron.sh >> /path/to/logfile 2>&1`
* Set up your web server:
  * Let your web server point to the `feeds` directory.
    You should protect the HTTP path with basic authentication.
  * Let the `assets_url` specified in the config point to the `assets` directory.
* After running the script for the first time, your desired feed is available at
  `base_url/destination` (e.g. `https://yourdomain.tld/some-url/newspaper.xml`)
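
To illustrate how `base_url` and `destination` combine into the final feed address, here is a small sketch. The JSON layout below (a top-level `feeds` list with one `destination` per feed) is an assumption for this example only; check `config-example.json` for the real structure. The helper `feed_url` is likewise hypothetical.

```python
import json

# Hypothetical config fragment; the real key layout is in config-example.json.
CONFIG_JSON = """
{
  "base_url": "https://yourdomain.tld/some-url/",
  "assets_url": "https://yourdomain.tld/some-assets-url/",
  "feeds": [{"destination": "newspaper.xml"}]
}
"""


def feed_url(base_url: str, destination: str) -> str:
    """Join base_url and destination with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + destination.lstrip("/")


config = json.loads(CONFIG_JSON)
urls = [feed_url(config["base_url"], feed["destination"])
        for feed in config["feeds"]]
print(urls[0])  # https://yourdomain.tld/some-url/newspaper.xml
```

Stripping and re-adding the slash keeps the result stable regardless of whether `base_url` is written with a trailing slash in the config.
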

### TODOs
* Decide what should happen with old news articles and assets that are no longer
  listed in the current feed.
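
One possible answer to this TODO, sketched here purely as an illustration, is to delete article directories that have not been modified for a given number of days. Nothing like this exists in the script; the function name `prune_old_dirs`, the age threshold, and the assumption that each article lives in its own subdirectory are all hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def prune_old_dirs(assets_dir: Path, max_age_days: float,
                   now: Optional[float] = None) -> list[Path]:
    """Remove subdirectories whose mtime is older than max_age_days.

    Returns the list of removed directories so the caller can log them.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    for child in sorted(assets_dir.iterdir()):
        if child.is_dir() and child.stat().st_mtime < cutoff:
            shutil.rmtree(child)
            removed.append(child)
    return removed
```

A call such as `prune_old_dirs(Path("assets"), max_age_days=30)` could run from the same cron job, after the scrape. Note that this ignores whether an article is still listed in the current feed, so a real solution might need to check that first.
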