
Rss2Email: improve your newsfeed experience.

Fri, 06 Apr 2007

Like mailing lists, RSS feeds tend to pile up unread for a long time. There is little to do if you simply lack the time, but maybe you can make the reading itself a lot quicker. I could, and did.
My first problem was having to deal with a news reader application: there are lots around, but none that is CLI/ncurses-based and actually feature-rich, with snownews probably being the best one. I had the same problem when a customer forced me to install an IM application: all my live discussions happen on IRC, so I've got my IRC client tuned to fit all my needs and make things as quick and easy as possible. The new IM client was just painful: totally different shortcuts, plugins behaving differently even when solving the same problem, and so on. Plus more space consumed on my desktop. It was a big win when I discovered bitlbee and managed to integrate IM into my IRC client.

If you think about it, a newsfeed is pretty much like a moderated mailing list to which only the moderator(s) can post. Given this similarity, the goal is to integrate RSS with email and use the email client, Mutt in my case, to handle feeds just like mailing lists.
There are also tons of extras you get for free. Depending on your settings, your RSS might now be checked for spam and duplicates, get archives and backups, tags, and many other nice things an advanced MUA like Mutt can offer.

Currently I'm using Mutt + Procmail + fetchmail + postfix to deal with my email, and snownews for Rss. All I had to do was to install rss2email as described in the Docs and in this article. If you want to have a look at my config you can find it here.
The next thing you want to do is import all the feeds from snownews. You can use this bash script:
# Add every feed URL (first |-separated field) from the snownews urls file
for feed in $(awk -F'|' '/^http/ { print $1 }' ~/.snownews/urls); do
    r2e add "$feed"
done
The path to the urls file may vary, and you might want to extend the awk pattern to include other protocols.
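As a rough sketch of such an extension, the extraction step could be wrapped in a small function whose pattern accepts more than one scheme. The function name feed_urls and the extra ftp scheme are illustrative assumptions, not part of snownews or rss2email; the only thing taken from the script above is that the URL is the first |-separated field.

```shell
# feed_urls FILE: print the first |-separated field of every line
# that starts with an accepted URL scheme (http, https or ftp here).
# Hypothetical helper; adjust the scheme list to your own feeds.
feed_urls() {
    awk -F'|' '/^(https?|ftp):\/\// { print $1 }' "$1"
}

# Example use (assumes the snownews path used above):
# for feed in $(feed_urls ~/.snownews/urls); do r2e add "$feed"; done
```

Lines that do not start with one of the listed schemes are skipped, so anything else in the file is left alone.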
rss2email doesn't provide a "delete all" command, so if you want to flush the whole database in one go you can use this script:
# Delete from the highest index down, so that removing one feed
# does not renumber the feeds still waiting to be deleted
for n in $(r2e list | grep -v email | cut -d: -f1 | sort -rn); do
    r2e delete "$n"
done
At this point, if everything (including postfix) is configured properly, feeds should be fetched and delivered to you as emails.

The last step is to configure procmail to handle the new messages. At the moment I've got a directory called lists under which I have a folder per mailing list named following a $domain.$listname pattern:

spike in space ~ > ls mail/lists/
list-id.securityfocus.com.bugtraq       
list-id.securityfocus.com.firewalls   
list-id.securityfocus.com.focus-ids    
list-id.securityfocus.com.focus-linux  
list-id.securityfocus.com.forensics    
list-id.securityfocus.com.linux-secnews
list-id.securityfocus.com.secureshell  
...
That can be handled very elegantly in procmail with a single rule:
:0:
* ^List-Id:[\ ]*[^<]*<\/[^\>]*
lists/`echo $MATCH | tr A-Z a-z | \
	    sed -e 's/\([^\.]*\)\.\([^>]*\)/\2\.\1/'`/
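To see what the backtick pipeline in that rule produces, it can be run by hand outside procmail. The sketch below wraps the same tr and sed commands in a function; the name list_folder is a hypothetical helper, and its argument stands in for what procmail puts in $MATCH (the text between < and > in the List-Id header).

```shell
# list_folder LIST_ID: mimic the rule's tr/sed pipeline for one List-Id value.
# Hypothetical helper for experimenting outside procmail.
list_folder() {
    printf '%s\n' "$1" | tr A-Z a-z |
        sed -e 's/\([^\.]*\)\.\([^>]*\)/\2\.\1/'
}

list_folder 'bugtraq.list-id.securityfocus.com'
# prints: list-id.securityfocus.com.bugtraq
```

The sed expression moves the first dot-separated component (the list name) to the end, which gives the $domain.$listname folder names shown in the listing.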
On a similar basis I created a directory called rss and configured the following rule:
:0:
* ^User-Agent: rss2email
* ^From: \/[^<]*
rss/`echo $MATCH | tr A-Z a-z | \
	  sed -e    's/\./ /g;
                    s/[ ]\+[[:punct:]]\+[ ]\+/ /;
                    s/[\\\/]\+/ /g;
                    s/^[[:space:]]*//;
                    s/[[:space:]]*$//;
                    s/[[:punct:]]*//g;
                    s/ /\./g'`/
Unlike normal mailing list messages, these have no List-Id header, so From is used instead. From has the form: ml_name <ml_addr>. Unfortunately not all RSS feeds specify a proper mail address (most don't), so it is replaced with the default one you set in the config file. Thus we can't use the address between < and >; the name before it has to be used instead.
Here is a quick look at the result:
spike in space ~ > ls mail/rss
adminspotting.net                                 
caffeinated                                      
debian.gnulinux.system.administration.resources  
del.icio.us.tag.sysadmin                             
joel.on.software                                 
kerneltrap.your.source.for.current.kernel.news  
librenixcom.linux.sysadmin.central              
linux.journal                                    
....
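Folder names like the ones above can be reproduced by running the rule's sed script by hand. This is only a sketch: rss_folder is a hypothetical helper, the sample display names are guesses chosen to match entries in the listing (the real From values depend on each feed), and the \+ operator assumes GNU sed.

```shell
# rss_folder NAME: apply the rule's tr/sed pipeline to the display name
# taken from the From header. Hypothetical helper; needs GNU sed for \+.
rss_folder() {
    printf '%s\n' "$1" | tr A-Z a-z |
        sed -e 's/\./ /g;
                s/[ ]\+[[:punct:]]\+[ ]\+/ /;
                s/[\\\/]\+/ /g;
                s/^[[:space:]]*//;
                s/[[:space:]]*$//;
                s/[[:punct:]]*//g;
                s/ /\./g'
}

rss_folder '"Joel on Software" '        # prints: joel.on.software
rss_folder 'del.icio.us/tag/sysadmin '  # prints: del.icio.us.tag.sysadmin
```

In order, the script turns dots and slashes into spaces, collapses a standalone separator such as " - " into a single space, trims both ends, drops any remaining punctuation, and finally joins the words with dots.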

All done: just configure r2e to run from cron and enjoy your RSS with the power of your mail client!
Further integration with Mutt can be achieved by running r2e whenever a mail check is executed, and by using send-hooks and an external script one could even post comments/pingbacks just by replying to the post (depending on the blog engine this can be seriously easy). If by any chance I get some interest in this I might even implement it.
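For the cron part, one possible setup is a tiny wrapper that skips a run while the previous one is still fetching. Everything here is an assumption made for illustration: the run_once name, the lock directory, the script path and the 30-minute schedule. Only "r2e run", the command that fetches the feeds and mails new items, comes from rss2email itself.

```shell
#!/bin/sh
# Hypothetical cron wrapper: run a command unless a previous run is still active.
# Example crontab entry (assumed path and schedule):
#   */30 * * * * $HOME/bin/r2e-cron.sh

# run_once LOCKDIR COMMAND [ARGS...]
run_once() {
    lockdir=$1
    shift
    # mkdir is atomic: it fails if the lock directory already exists
    if mkdir "$lockdir" 2>/dev/null; then
        "$@"
        status=$?
        rmdir "$lockdir"
        return $status
    fi
    echo "previous run still active, skipping" >&2
    return 0
}

# In the real script the last line would be something like:
# run_once "${TMPDIR:-/tmp}/r2e.lock" r2e run
```

A plain crontab line calling r2e run directly works too; the lock only matters if fetching can take longer than the interval between runs.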



SpikeLab.org is a Filippo Spike Morelli copyright 2005-2008
This work is licensed under Creative Commons Att-SA License.