RSS and recommendations off for now

October 19, 2007

The recent New York Times article on BookMooch has greatly increased both daily traffic and the size of the database, so BookMooch is going to run slowly for a few days.

I’ve ordered more memory for the server, which should solve the speed problem, but it’s “special” memory that needs to be manufactured (because I’m using a 64-bit server with 8 CPUs), so the upgrade probably won’t be installed until the end of next week.

In the meantime, I’ve disabled:
a) RSS feeds (that’s about 50% of the hits on BookMooch these days)
b) Recommended books

That has already made the site noticeably faster. I have some ideas about how to improve RSS performance, so I’m going to try to restore the feeds tomorrow.

Here’s an interesting chart:

[Chart: server load, normal operation vs. RSS disabled]

You can see 10 minutes of BookMooch running normally, compared to 10 minutes with RSS disabled. The server load plummets, and the website is really fast.

-john

13 Responses to “RSS and recommendations off for now”

  1. Greg said

    I’m glad the site’s growing!!

  2. Liz said

    Hey, I read the article. It’s pretty interesting, and I was able to use it in a homework assignment, which is even better. Thanks for all your hard work, John!

  3. Kirk said

    I wonder if there aren’t lots of people hitting RSS feeds every few minutes. My reader is set to a 30-minute refresh, for the very reason that it’s easy to overload servers.

    I don’t suppose there’s any way you can throttle RSS access like that?

    Kirk

  4. Mark Murphy said

    Forgive me if you’ve already thought of this, and further forgive my focus on wishlist feeds for the examples…

    My guess, based on that graph, is that you’re dynamically building the RSS feeds upon request.

    Inventory changes “on the margins”, meaning only a tiny percentage of people have a match when a book gets added to inventory, at least for most of the books added to inventory.

    You may get better results from a semi-static approach to the RSS feeds. Rather than dynamically compute them upon request, just serve up a static file per feed. Initially, that static file is created using the same generation logic you’re using today, turned into a file via wget or curl or something. Then, only re-run the generation logic for the member if there’s been a change to their state.

    For example, suppose another copy of The Kite Runner comes in. It’s probably a fairly quick database operation to set an isFeedDirty flag to true for the such-and-such members who have The Kite Runner in their wishlist. So you update that flag for those members as you add The Kite Runner to inventory. Then, a scheduled task (e.g., cron job) rebuilds the static feeds for those members whose isFeedDirty flag is true. Maybe you also sweep and rebuild everyone’s feeds once per night during a period of low server activity, for things that the isFeedDirty trick misses by accident.

    Even if you run that scheduled task fairly frequently (e.g., every few minutes), you’re probably still ahead of the game, because you’re only doing the complicated work of feed-building for those members with changes, not every member requesting feeds. Also:

    — It’ll be fairly easy to offload the feed generation to another machine, just hitting the same database

    — You can even offload the static feeds themselves to another location (e.g., Amazon S3) through an appropriate domain name, if you are short on bandwidth or disk space, or if ordinary static Web traffic is bogging down the server

    Just two cents’ worth from an early site member and big BookMooch fan…

  5. Karen H said

    I thought that under the new system, when I got an e-mail saying a book on my wishlist was available, it was sent two hours after the book was posted and I had a 4-hour window to get it. But this isn’t true: I haven’t been able to mooch any, they’re all gone. I’m very frustrated!

  6. yo said

    Use FeedBurner. Free bandwidth.

  7. Tony said

    The significant increase in RSS requests came about because of the wishlist notification change. I warned about it in the forums, but unfortunately it was not caught in time. Now that the cat is out of the bag on how useful RSS is, the number of hits will only increase (once it is back on).

    Hopefully your changes to RSS will be able to handle the ever-increasing hits.

  8. Jenny said

    Hi John
    Could you post a link to the New York Times article? I would be interested to have a look.
    Jenny

  9. Kirk said

    Many of the RSS feeds are custom searches people set up, so you can’t make static documents for them; the same is true for wishlist feeds. As for bandwidth, that’s not the problem – it’s the server hit for processing.

    Kirk

  10. Taneli T said

    For me, RSS feeds are a great way to track those inventories that have proved interesting to me in the past (they’ve either sent me a book or I’ve sent them one) as well as tracking books by subject/keywords (since I’m mainly interested in film related books). Whatever happens, I hope these two things will remain, because these alone have made BookMooch so much more enjoyable and valuable.

    I was about to suggest the same thing Kirk already said: to ease the RSS server load, maybe there should be a hard limit on how often people can query the BookMooch RSS feeds (if that’s possible), so that nobody updates their feeds every minute or so.

    Maybe the limit should be set to 15 or 20 minutes? I use an RSS reader whose fastest refresh rate is every 30 minutes, and from the user’s perspective I think that’s OK. I would settle for even longer intervals as long as the subject/keyword RSS stays in place.

  11. Ben said

    Meh, I never really got recommendations to work correctly for me, so it’s not a big deal. Besides, a fast BookMooch is much preferable to a slow BookMooch.

  12. Mark Williams said

    My requests are way down in the last 24 hours (I usually get 12+ per day), and I assume others’ are as well. Some of the posts above seem to suggest that the feeds themselves produce all the server load, but it is also the resulting mooching and browsing that follows an initial RSS hit.

    Most of us use the feeds to browse the inventories of regular friends and newbies alike, and without them we are far less likely to mooch. I think 50% is probably a fair guess at the share of mooches that start with an RSS hit, but I suppose the mooching stats will bear this out, or not.
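Mark Murphy’s semi-static suggestion in comment 4 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not how BookMooch actually works: the class `SemiStaticFeeds`, its methods, and the dict-based stand-in for the database are all hypothetical; `dirty` plays the role of his isFeedDirty flag, and `render_feed` stands in for the existing dynamic feed generator. A reverse index from title to wishing members keeps the “add a book” step cheap, since it touches only the few members who want that title.

```python
from typing import Dict, Set
from xml.sax.saxutils import escape


class SemiStaticFeeds:
    """In-memory sketch of dirty-flag feed rebuilding; all names are hypothetical."""

    def __init__(self) -> None:
        self.inventory: Set[str] = set()          # titles currently available
        self.wishlists: Dict[str, Set[str]] = {}  # member -> wished-for titles
        self.wishers: Dict[str, Set[str]] = {}    # title -> members wishing for it
        self.dirty: Set[str] = set()              # members whose feed is stale ("isFeedDirty")
        self.static_feeds: Dict[str, str] = {}    # member -> pre-rendered feed text
        self.render_count = 0                     # how many expensive renders have run

    def render_feed(self, member: str) -> str:
        # Stand-in for the existing (expensive) dynamic feed generation;
        # produces a simplified feed-like string, not a complete RSS document.
        self.render_count += 1
        matches = sorted(self.wishlists.get(member, set()) & self.inventory)
        items = "".join(f"<item><title>{escape(t)}</title></item>" for t in matches)
        return f"<rss><channel>{items}</channel></rss>"

    def add_to_wishlist(self, member: str, title: str) -> None:
        self.wishlists.setdefault(member, set()).add(title)
        self.wishers.setdefault(title, set()).add(member)
        self.dirty.add(member)

    def add_to_inventory(self, title: str) -> None:
        # The cheap part: flag only the members who want this title.
        self.inventory.add(title)
        self.dirty |= self.wishers.get(title, set())

    def rebuild_dirty(self) -> int:
        # Scheduled job (e.g. cron every few minutes): re-render stale feeds only.
        stale = sorted(self.dirty)
        for member in stale:
            self.static_feeds[member] = self.render_feed(member)
        self.dirty.clear()
        return len(stale)

    def rebuild_all(self) -> int:
        # Nightly sweep for anything the dirty flags missed.
        for member in self.wishlists:
            self.static_feeds[member] = self.render_feed(member)
        self.dirty.clear()
        return len(self.wishlists)

    def serve(self, member: str) -> str:
        # Request path: return the stored file; no generation work here.
        return self.static_feeds.get(member, "<rss><channel></channel></rss>")


if __name__ == "__main__":
    feeds = SemiStaticFeeds()
    feeds.add_to_wishlist("alice", "The Kite Runner")
    feeds.add_to_wishlist("bob", "Dune")
    print(feeds.rebuild_dirty())   # 2 (initial build for both members)
    feeds.add_to_inventory("The Kite Runner")
    print(feeds.rebuild_dirty())   # 1 (only alice wants this title)
    for _ in range(1000):
        feeds.serve("alice")       # 1000 requests, zero renders
    print(feeds.render_count)      # 3
```

Serving a feed never calls `render_feed`; only the scheduled `rebuild_dirty` and the nightly `rebuild_all` sweep do, so generation cost scales with inventory changes rather than with polling traffic. As Kirk points out in comment 9, custom search feeds do not map onto a fixed set of members this neatly, so the sketch covers wishlist-style feeds only.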
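Kirk (comment 3) and Taneli (comment 10) both suggest limiting how often a reader may poll a feed. A minimal sketch, assuming each client can be identified by some key such as its IP address (an assumption: readers behind a shared proxy would share one limit) and using the 15-minute floor from Taneli’s suggestion. `FeedThrottle`, `handle_feed_request`, and the feed path in the demo are hypothetical names. A refused request here gets HTTP 429 (Too Many Requests) without running any feed generation; serving the last copy of the feed instead would be another option.

```python
import time
from typing import Callable, Dict, Tuple


class FeedThrottle:
    """Allow at most one fetch of a given feed per client every min_interval seconds.

    Hypothetical sketch: client_id might be an IP address, feed_id a feed URL.
    """

    def __init__(self, min_interval: float = 15 * 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock  # injectable so the behaviour can be tested without waiting
        self.last_served: Dict[Tuple[str, str], float] = {}  # never pruned in this sketch

    def allow(self, client_id: str, feed_id: str) -> bool:
        now = self.clock()
        key = (client_id, feed_id)
        last = self.last_served.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon; a refusal does not reset the timer
        self.last_served[key] = now
        return True


def handle_feed_request(throttle: FeedThrottle, client_id: str, feed_id: str) -> int:
    # Map the throttle decision to an HTTP status code; 429 = Too Many Requests.
    return 200 if throttle.allow(client_id, feed_id) else 429


if __name__ == "__main__":
    fake_now = [0.0]
    throttle = FeedThrottle(min_interval=15 * 60, clock=lambda: fake_now[0])
    print(handle_feed_request(throttle, "10.0.0.1", "/rss/wishlist/alice"))  # 200
    fake_now[0] = 60.0       # one minute later
    print(handle_feed_request(throttle, "10.0.0.1", "/rss/wishlist/alice"))  # 429
    fake_now[0] = 15 * 60.0  # fifteen minutes after the first fetch
    print(handle_feed_request(throttle, "10.0.0.1", "/rss/wishlist/alice"))  # 200
```

The `last_served` table grows with every client/feed pair it sees; a real deployment would need to prune old entries, which this sketch leaves out.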
