Introducing PyFeeds: DIY Atom feeds in times of social media and paywalls

PyFeeds is a tool written in Python that creates Atom feeds for websites which either provide no feed at all or provide one without full-text content. It is based on Scrapy and built with extensibility in mind. Currently it supports Ars Technica, Facebook Pages, Linux Weekly News (LWN), Spotify Podcasts and Vice.com, as well as news sites which cater to a German-speaking audience.

PyFeeds is meant to be installed on your server and run periodically in a cron job or similar job scheduler.

The easiest way to install PyFeeds is via pip in a virtual environment. PyFeeds does not provide any releases yet, so you can install the current master branch directly from a clone of the repository:

$ git clone https://github.com/pyfeeds/pyfeeds.git
$ cd pyfeeds
$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -e .

After installation, the feeds command is available in your virtual environment.

PyFeeds supports Python 3.5+.

To list all available spiders:

$ feeds list

Feeds can crawl one or more spiders without any configuration, e.g.:

$ feeds crawl arstechnica.com

A configuration file is supported too. Simply copy the template configuration and adjust it. Enable the spiders you are interested in and adjust the output path where Feeds stores the scraped Atom feeds:

$ cp feeds.cfg.dist feeds.cfg
$ $EDITOR feeds.cfg
$ feeds --config feeds.cfg crawl

Point your feed reader to the generated Atom feeds and start reading. Feeds works best when run periodically in a cron job. Run feeds --help or feeds <subcommand> --help for help and usage details.
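As a rough sketch of the cron setup mentioned above, the following snippet prints a crontab entry that runs the crawl once an hour. The checkout location ($HOME/pyfeeds), the hourly schedule and the use of venv/bin/feeds are assumptions based on the installation steps shown earlier, not something PyFeeds prescribes; adjust them to your setup.

```shell
# Assumed location of the PyFeeds checkout that contains venv/ and feeds.cfg.
FEEDS_DIR="$HOME/pyfeeds"

# Hypothetical schedule: minute 0 of every hour.
# Calling venv/bin/feeds directly avoids having to activate the
# virtual environment inside cron's minimal shell.
CRON_LINE="0 * * * * cd $FEEDS_DIR && venv/bin/feeds --config feeds.cfg crawl"

# Print the entry so it can be reviewed before adding it via `crontab -e`.
echo "$CRON_LINE"
```

Changing into the checkout directory first keeps the relative path to feeds.cfg valid, which matters because cron does not start jobs in your project directory.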

If you want to create a feed for a site that is not supported yet, take a look at the article "Writing a custom spider" in the official documentation. If a feed already exists and you only want a full-text version of it, you can use the generic spider, which provides full-text extraction similar to the readability feature of Firefox.
