Design
======

This is a quick prototype that turned out to be quite usable. The
design is minimal: some home-made ORM for the feed storage, crude
parallelism with the ``multiprocessing`` module and a simple plugin
API using ``importlib``.

More information about known issues and limitations can be found in
the :doc:`usage` document.

Plugin system
-------------

Plugins are documented in the :doc:`plugins` section. You can also
refer to the :ref:`writing-plugins` section if you wish to write a new
plugin or extend an existing one.

The plugin system uses a simple :mod:`importlib`-based architecture
where plugins are plain Python modules loaded at runtime from a
module path provided by the user. This pattern was inspired by a
`StackOverflow discussion <http://stackoverflow.com/questions/932069/building-a-minimal-plugin-architecture-in-python>`_.
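
As a rough illustration of this pattern (a minimal sketch, not the
actual feed2exec code: the ``load_plugin`` and ``run_plugin`` helpers
are hypothetical names), a plugin can be resolved from a user-supplied
module path and called like this:

.. code-block:: python

  import importlib


  def load_plugin(module_path):
      """Import a plugin module from a dotted path given by the user."""
      try:
          return importlib.import_module(module_path)
      except ImportError as e:
          raise ValueError("cannot load plugin %s: %s" % (module_path, e))


  def run_plugin(module_path, entry_point, *args, **kwargs):
      """Load a plugin and call one of its functions by name.

      ``entry_point`` is an assumption of this sketch: it is the name
      of a callable that the plugin module is expected to define.
      """
      plugin = load_plugin(module_path)
      callback = getattr(plugin, entry_point, None)
      if not callable(callback):
          raise ValueError("plugin %s has no callable %s"
                           % (module_path, entry_point))
      return callback(*args, **kwargs)

For example, ``run_plugin('json', 'dumps', [1, 2])`` imports the
standard ``json`` module and calls its ``dumps`` function, the same
way a user-supplied plugin path would be resolved. No registry or
discovery step is involved, which is the simplification this design
relies on.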

The following options were also considered:

  - `pluggy`_: used by py.test, tox and devpi
  - `yapsy`_
  - `PluginBase`_
  - `plugnplay`_
  - `click-plugins`_: relevant only to add new commands
  - `PyPA plugin discovery`_

.. _pluggy: https://github.com/pytest-dev/pluggy
.. _yapsy: http://yapsy.sourceforge.net/
.. _PluginBase: http://pluginbase.pocoo.org/
.. _plugnplay: https://github.com/daltonmatos/plugnplay
.. _click-plugins: https://github.com/click-contrib/click-plugins
.. _PyPA plugin discovery: https://packaging.python.org/guides/creating-and-discovering-plugins/

Those options were ultimately not used because they add an additional
dependency and are more complicated than a simple ``import``. We also
did not need plugin listing or discovery, which greatly simplifies our
design.

There is some code duplication between different parts (e.g. the
:func:`feed2exec.plugins.output` and :func:`feed2exec.plugins.filter`
plugin interfaces, the ``maildir`` and ``mbox`` plugins, etc.), but
never more than twice.

Concurrent processing
---------------------

The parallel processing design may be a little clunky and is certainly
less tested, which is why it is disabled by default (use ``--parallel``
to enable it). There are known deadlock issues in high-concurrency
scenarios (e.g. with ``catchup`` enabled).

I had multiple designs in mind: the current one
(``multiprocessing.Pool`` and ``pool.apply_async``) vs ``aiohttp`` (on
the ``asyncio`` branch) vs ``pool.map`` (on the ``threadpoolmap``
branch). The ``aiohttp`` design was very hard to diagnose and debug,
which made me abandon the whole thing. After reading up on `Curio`_
and `Trio`_, I'm tempted to give async/await a try again, but that
would mean completely dropping 2.7 compatibility. The ``pool.map``
design is just badly adapted, as it would load all the feeds' data
structures in memory before processing them.

.. _Curio: http://curio.readthedocs.io/
.. _Trio: https://github.com/python-trio/trio
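
To illustrate the ``apply_async`` approach described above, here is a
minimal sketch. It is not the feed2exec implementation:
``parse_feed``, ``iter_feeds`` and ``process_all`` are hypothetical
names, and the sketch uses ``multiprocessing.pool.ThreadPool``, which
exposes the same ``apply_async`` API as ``multiprocessing.Pool``, only
so that the example runs without a ``__main__`` guard:

.. code-block:: python

  from multiprocessing.pool import ThreadPool


  def parse_feed(name):
      # Placeholder for the real work (fetching and parsing one feed).
      return name.upper()


  def iter_feeds():
      # Hypothetical generator yielding one feed at a time.
      for name in ("alpha", "beta", "gamma"):
          yield name


  def process_all(feeds, workers=2):
      pool = ThreadPool(workers)
      # Each feed is submitted as soon as the generator yields it;
      # apply_async returns immediately with a result handle.
      pending = [pool.apply_async(parse_feed, (feed,)) for feed in feeds]
      pool.close()  # no more jobs will be submitted
      pool.join()   # wait for the workers to finish
      # Results are collected in submission order.
      return [job.get() for job in pending]

Calling ``process_all(iter_feeds())`` returns one result per feed, in
the order the feeds were submitted.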

.. _testsuite:

Test suite
----------

The test suite lives in ``feed2exec/tests``, but also includes doctest
comments in some functions imported from the `ecdysis`_ project. You
can run all the tests with `pytest`_, using, for example::

  pytest-3

This is also hooked into the ``setup.py`` command, so this also works::

  python3 setup.py test

Note that some tests will fail in Python 2, as the code is written and
tested in Python 3. Furthermore, the feed output is taken from an
up-to-date (5.2.1) feedparser version, so the tests are marked as expected
to fail for lower versions. You should, naturally, run tests before
submitting patches.

.. _pytest: http://pytest.org/
.. _ecdysis: https://gitlab.com/anarcat/ecdysis

The test suite also uses the `vcrpy
<https://pypi.python.org/pypi/vcrpy>`_ module to record HTTP
requests and replay them locally, so the test suite can run
offline. To add a new network test, simply add a test that performs
requests with the right decorator, and a new recording will be added
to the source tree. The recordings are committed to git so that the
test suite actually runs offline, so be careful about the content
added there. Ideally, the license of that content should be documented
in ``debian/copyright``.

`betamax <https://pypi.python.org/pypi/betamax>`_ was also
considered but requires a refactoring of *all* requests to use session
objects. This would have the added benefit of allowing a custom user
agent, so it is still considered and is a work in progress in the
``betamax`` branch. The current approach on that branch uses a global
``session`` object, which is problematic: a better approach may be to
encapsulate this in a ``FeedFetcher`` or simply ``Feed`` object, at
which point we would end up rearchitecting the whole ``feeds.py``
file...
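
The encapsulation idea above could look roughly like the following
sketch. This is an assumption about a possible design, not code from
the ``betamax`` branch: the ``fetch`` method and the ``user_agent``
parameter are hypothetical, and ``session`` is expected to behave like
a ``requests.Session`` object (a ``headers`` mapping and a ``get``
method):

.. code-block:: python

  class FeedFetcher(object):
      """Hypothetical wrapper that owns the HTTP session for feeds."""

      def __init__(self, session, user_agent="feed2exec"):
          # The session is passed in instead of being a module-level
          # global, so tests can substitute a recorded or fake session.
          self.session = session
          self.session.headers["User-Agent"] = user_agent

      def fetch(self, url):
          # Fetch the raw feed body, raising on HTTP errors.
          response = self.session.get(url)
          response.raise_for_status()
          return response.content

With this shape, all requests go through a single session object per
fetcher, which is also where a custom user agent would be set.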

Comparison
----------

``feed2exec`` is a fairly new and minimal program, so features you may
expect from another feed reader may not be present. I chose to write a
new program because, when I started, both existing alternatives were
in a questionable state: feed2imap was mostly abandoned and
rss2email's maintainer was also unresponsive. Both were missing the
feature I was looking for, which was to unify my feed parsers in a
single program: I needed something that could deliver mail, run
commands and send tweets. The latter isn't done yet, but I am hoping
to complete this eventually.

The program may not be for everyone, however, so I made these
comparison tables to clarify what feed2exec does compared to the
alternatives.

General information:

========= ======= ==== ==== ========
Program   Version Date SLOC Language
========= ======= ==== ==== ========
feed2exec  0.5    2017 1417  Python
feed2imap  1.2.5  2015 3249  Ruby
rss2email  3.9    2014 1986  Python
========= ======= ==== ==== ========

 * version: the version analysed
 * date: the date of that release
 * SLOC: Source Lines of Code as counted by sloccount, only counting
   dominant language (e.g. excluding XML from test feeds)
 * Language: primary programming language

Delivery options:

========= ======= ==== ==== ==== ======== ====
Program   Maildir Mbox IMAP SMTP sendmail exec
========= ======= ==== ==== ==== ======== ====
feed2exec    ✓     ✓    ✗     ✗     ✗      ✓
feed2imap    ✓     ✗    ✓     ✗     ✗      ✗
rss2email    ✗     ✗    ✓     ✓     ✓      ✗
========= ======= ==== ==== ==== ======== ====

 * maildir: writing to `Maildir`_ folders. rss2email has a `pull request
   <r2e-maildir_>`_ to implement maildir support, but it's not merged
   at the time of writing
 * IMAP: sending emails to IMAP servers
 * SMTP: delivering emails over the SMTP protocol, with authentication
 * sendmail: delivering locally using the local MTA
 * exec: run arbitrary commands on new entries. feed2imap has an
   ``execurl`` parameter to execute commands, but it receives an
   unparsed dump of the feed instead of individual entries. rss2email
   has a postprocess filter that is a Python plugin that can act on
   individual (or digest) messages which could possibly be extended to
   support arbitrary commands, but that is rather difficult to
   implement for normal users.

.. _Maildir: https://en.wikipedia.org/wiki/Maildir
.. _r2e-maildir: https://github.com/wking/rss2email/pull/21

Features:

========= ======= ==== ===== ====== ====== ===== ======
Program   Pause   OPML Retry Images Filter Reply Digest
========= ======= ==== ===== ====== ====== ===== ======
feed2exec    ✓     ✓     ✗     ✗       ✓     ✓     ✗
feed2imap    ✗     ✓     ✓     ✓       ✓     ✗     ✗
rss2email    ✓     ✓     ✓     ✗       ✓     ✓     ✓
========= ======= ==== ===== ====== ====== ===== ======

 * pause: feed reading can be disabled temporarily by the user. In
   feed2exec, this is implemented with the ``pause`` configuration
   setting. The ``catchup`` option can also be used to catch up with
   feed entries.
 * retry: tolerate temporary errors. For example, ``feed2imap`` will
   report errors only after 10 failures.
 * images: download images found in feed. ``feed2imap`` can download
   images and attach them to the email.
 * filter: whether arbitrary filters can be applied to the feed
   output. feed2imap can apply filters to the unparsed dump of the
   feed.
 * reply: whether the generated email 'from' header is usable to make
   a reply. ``rss2email`` has a ``use-publisher-email`` setting (off by
   default) for this, for example. feed2exec does this by default.
 * digest: possibility of sending a single email per run instead of
   one per entry

.. note:: ``feed2imap`` supports only importing OPML feeds; exporting
          is supported by a third-party plugin.
