RSS feeds and recommendations are back

October 21, 2007

I’ve re-enabled RSS feeds and book recommendations, though with a few stipulations.

* RSS feeds for wishlists, wishlists-related, and inventory are all re-enabled. However, they are currently refreshed only every 12 hours. I’ll probably make the refresh time a lot shorter once I’m comfortable that it won’t impact the BookMooch web site’s performance.

* RSS feeds for search are no longer available, maybe permanently so. The reason? First, I found that the RSS feed for searches didn’t work right anyway: multiple-word searches didn’t work, and advanced search was iffy. Also, the way the RSS feed was returned was one-document-for-the-entire-search, which wasn’t very helpful, because what people really wanted was just the new hits for the search to be returned, so it would be easy to see what new books were available for that search. Since this RSS feed causes a lot of server load, and doesn’t work right anyway, I’ve retired it for now.

* Recommendations have been re-enabled, but they’re only displayed for books that have already had recommendations calculated for them. Recommendations are no longer calculated on-the-fly, as that’s what caused problems before: tons of books needed recommendations, lots of pages were hit in parallel, and wham! the server got overloaded trying to keep up. In the background, I have one “thread” making recommendations for all the books on BM, and once that completes, just about every book on BM will have a “recommended” line.

For techies: I previously used a 1 megabyte in-memory cache to store RSS feeds. I found a few problems with this:
1) when I restarted the server, there was nothing in memory, so the server would get hammered for a while
2) a 1 MB cache was woefully inadequate: with RSS feeds often being 100k in size, it only holds about 10 RSS feeds.

I devised an on-the-filesystem cache, so that an RSS feed is generated the first time it is requested and saved to disk, whereupon successive requests for that RSS feed simply read the stored XML document. At the time the file is saved, I create a scheduled task to delete the file in 12 hours, so that it’ll be regenerated.
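
The scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BookMooch code: the names `CACHE_DIR`, `cache_path`, `get_feed` and the `generate` callback are all hypothetical, and instead of scheduling a task to delete each file after 12 hours, the sketch checks the file’s modification time on every request, which gives the same 12-hour lifetime.

```python
import hashlib
import os
import tempfile
import time

# Hypothetical cache location; any writable directory would do.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "rss_cache_sketch")
MAX_AGE_SECONDS = 12 * 60 * 60  # the 12-hour lifetime described in the post


def cache_path(feed_key):
    """Map a feed identifier to a file name that is safe on any filesystem."""
    digest = hashlib.sha1(feed_key.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".xml")


def get_feed(feed_key, generate, max_age=MAX_AGE_SECONDS, now=None):
    """Return the cached XML if it is fresh, otherwise regenerate and store it.

    `generate` is a caller-supplied function that builds the feed XML
    (the expensive, database-hitting step).
    """
    now = time.time() if now is None else now
    path = cache_path(feed_key)
    try:
        if now - os.path.getmtime(path) < max_age:
            with open(path, encoding="utf-8") as f:
                return f.read()
    except FileNotFoundError:
        pass  # never cached, or removed in the meantime: fall through

    xml = generate(feed_key)
    os.makedirs(CACHE_DIR, exist_ok=True)
    # Write to a temporary file and rename it, so a concurrent reader
    # never sees a half-written document.
    tmp = "%s.%d.tmp" % (path, os.getpid())
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(xml)
    os.replace(tmp, path)
    return xml
```

Because the cache lives on disk, it survives a server restart, which addresses the first problem listed above, and its size is limited by disk space rather than memory, which addresses the second.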

I found that after running for 5 hours, 1.4 gigs of data had accumulated in the on-disk RSS cache, which shows that an in-memory cache would really not work. Another interesting fact: 2500 different RSS feeds were requested in that 5-hour period, which gives us an idea of how many people are using the feeds (not that many, but that’s decent).
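
A quick back-of-the-envelope check on those numbers, assuming decimal units and that each of the 2500 requested feeds was written to the cache exactly once (both are assumptions, not measured figures):

```python
# Figures quoted in the post; decimal units are an assumption.
cache_bytes = 1_400_000_000   # 1.4 gigs accumulated on disk
feeds_requested = 2500        # distinct feeds requested in 5 hours
old_cache_bytes = 1_000_000   # the old 1 megabyte in-memory cache
typical_feed_bytes = 100_000  # "often 100k" per feed

avg_feed_bytes = cache_bytes // feeds_requested
feeds_in_old_cache = old_cache_bytes // typical_feed_bytes

print(avg_feed_bytes)      # 560000, i.e. roughly 560k per cached feed
print(feeds_in_old_cache)  # 10
```

On these assumptions the average cached feed is several times larger than the 100k figure, which is consistent with the old in-memory cache being far too small.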

Below is a chart of server load before and after I enabled the RSS feeds. You can see that the feeds pounded the server for quite a while, about 1 hour actually, but that once they all got cached, server load went way, way down, and is good now. As long as this continues to be the case, I’ll start to shorten the time RSS files are cached, so that they’re more up-to-date. You’ll note, however, that the server load has stayed higher since the RSS feeds were enabled, even after the cache kicked in. The web site seems very responsive nonetheless, which is really what I care about.

[Chart: server load before and after the RSS feeds were re-enabled]

8 Responses to “RSS feeds and recommendations are back”

  1. Kirk said

    Bummer about the searches; I used them to find quite a few books by and about authors I’m interested in…

    I didn’t mind the one-doc per search, but what I didn’t get was that I’d get blank documents for some searches that didn’t turn anything up, but not always.

    But I liked the ability to create searches for specific authors or keywords. However, if that kills the server, then I certainly understand. Maybe Google alerts would do the same thing?

    Kirk

  2. John, are you using the 404-handler-as-cache-controller pattern for the RSS feeds? If you are, then serving the cached RSS should have minimal impact–except when a non-cached request needs to be fulfilled.

  3. Yes Dossy, I am using that pattern, and getting an astonishing 13,000 (!) page fetches per second with ApacheBench against AOLserver with the cached file.

    Once I upgrade the memory on the bookmooch server to once again hold the entire database in memory, I can expire the file caches much more often, as the whole problem with generating my RSS feeds is file system i/o from fetching data from the database, and nothing else.

  4. re: Kirk & the searches

    My plan is to bring the search rss back at some point, but to redo it from scratch so it works as people expect it. Since it didn’t work right, and it loaded down the server, I thought I’d take it away for now. When I have less high priority stuff on my todo list, I’ll revisit this.

  5. Anna said

    Will pages without recommendations be getting them in batches? Or will we have to wait until they’re all good to go?

  6. Taneli T said

    Is there a reason why the orange feed icon has disappeared from people’s bio pages? I realize it’s still at the top of the actual inventory page, but it was very handy to have it in the bio page (one page less to load, when all you want is the feed url).

    Looking forward to seeing the enhanced search RSS in the future. Even in its previous form it was a very efficient way to get info on new books added.

    Thanks for bringing the inventory and wishlist feeds back.

  7. M. said

    PLEASE, may we have the RSS feeds for wishlists refreshing in real time again? The personalized recommendations are great, but we need the feeds much more! Wishlist notifications aren’t working as they should – I didn’t get notified of at least two books becoming available though I was the only one wishing for them. I know others (if not all) have this problem.

  8. Brian said

    Recommendations theoretically don’t change. If book A is related to book B, odds are they will always be related. What changes are availability and newly listed books not previously included in the calculations.

    To reduce the load to calculate whole new recommendations, but keep availability up to date, would it be possible to create an uber-list of all recommendations that’s more or less permanent (say, recommendations based on any book that was ever listed on BM ever), with the only thing done on the fly (or on some short interval regular basis) is to filter the list for current availability?

    Wow. That was a really long sentence. Here’s what I mean.

    Let’s say these are all the recommendations for book A (with asterisk indicating those that are currently available):

    B*
    C
    D*
    E*
    F
    G

    So at the moment recommendations for book A are requested, the cached list of B-G can be whittled down to just B, D, and E.

    To accommodate newly listed books, perhaps the uber-recommendation list would only need to be rebuilt monthly or even less frequently.

    Just a thought.
