BookMooch was down

July 1, 2009

BookMooch was unavailable for 29 hours, from 3am yesterday until around 8am this morning (Pacific time).

My main data drive crashed. Nonetheless, there doesn’t appear to be any data loss and BookMooch is now running with a new drive. However, if you think there’s a problem with your account please email the support volunteers and they’ll sort you out.


What went wrong

BookMooch is essentially one big database application, and the database server is where most of the work goes in running the site.

I used to have speed problems with the database, but in the fall of 2008 I replaced the hard disk with a solid-state drive. I used what was the best at the time, a server-class drive from Mtron that was widely reviewed as very reliable and benchmarked at about 5X faster than the fastest hard disks. Since the switch, BookMooch has largely been fast, and the increased speed has allowed me to add some new database-intense features, such as sorting the Topic pages so that popular books are shown first, making those pages much more useful.

Unfortunately, that Mtron solid-state drive was one of the first generation of these server-class all-memory drives, and that’s what failed yesterday. It’s supposed to have all sorts of fancy features to prevent data loss, and although the drive did become non-writeable, there doesn’t appear to have been any data loss (that was $800 well spent!).

In BookMooch’s first year, there were some (ahem) technical challenges, and so I had written programs to audit the database for corruption. I ran those same tests this morning:

1) check every user and book to make sure it is not corrupt

2) check every user and book in the most recent backup and make sure it is in the current database file, and if not, copy it (all were there)

3) check the references between users and books to make sure if you look at a book and it says: “on these wishlists:” or “available from:” that all the people really are listed. My “auditor program” found a few hundred books that didn’t have the links back from books->users, but these are probably not due to database corruption, but old bugs from a while ago. Those references have been fixed now, so it’s possible that some books will show up as being available where they weren’t before because of the missing reference, but again I think this is limited to maybe a few hundred books.
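As a rough illustration only (this is not the actual auditor program), the third check could be sketched like this in Python. The function name, the two dictionaries, and the sample data are all hypothetical; the sketch assumes the user->book and book->user references can be loaded as plain mappings.

```python
def audit_back_references(user_books, book_users):
    """Find and repair books that lack a link back to a user who lists them.

    user_books: dict mapping user id -> set of book ids that user lists
    book_users: dict mapping book id -> set of user ids shown on the book page
    Returns the (book, user) links that had to be added.
    """
    repaired = []
    for user in sorted(user_books):
        for book in sorted(user_books[user]):
            listed = book_users.setdefault(book, set())
            if user not in listed:
                # The user lists the book, but the book does not point back.
                listed.add(user)
                repaired.append((book, user))
    return repaired


# Hypothetical sample data: "ann" lists "dune", but the book has no back-link.
user_books = {"ann": {"dune", "emma"}, "bob": {"dune"}}
book_users = {"dune": {"bob"}, "emma": {"ann"}}
fixed = audit_back_references(user_books, book_users)
```

Running the audit a second time on the repaired data should find nothing left to fix, which is a cheap way to confirm the repair worked.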


What about backups and fail-over?

In February I asked people to volunteer to give a little money to help BookMooch buy a new server. The really good news is that we’ve raised $15,449 in donations, and received another $6212 from Amazon for commissions on book sales from moochers. That puts BookMooch on a really solid financial footing, so I can afford to buy the hardware that is needed, and the occasional contracted-out-software-job, like the soon-to-be-released BookMooch iPhone app. Also FYI, I have put $31,900 of my own money into running BookMooch for 2 1/2 years, at which point your donations started covering the costs.

The money you’ve given me allowed me to buy a new server to run BookMooch on.

Currently, BookMooch runs on a server that has:

* 8 CPUs
* 24 gigs of RAM
* one 32 gig Mtron solid-state drive
* one normal drive for backups

The new server has:

* 16 CPUs
* 64 gigs of RAM
* two 60 gig Intel solid state drives
* one normal drive for backups

I had also purchased one extra 60 gig Intel solid state drive as a spare drive for the new server. That drive is what is now in the current BookMooch server. It was really good to have a spare around!

Some people have expressed surprise that a server can cost $10,000. I have pictures of the receipts for the machine above, and the price breakdown is:

* computer with 16 cpus: $6428
* 64 gigs of memory: $2087
* 3 solid state drives: $2691
* total = $11,202

Moving to “the cloud” doesn’t change the price of these components: you just end up paying for the same thing in monthly fees rather than up front.

I bought the new server about a month ago. Unfortunately, I’m having reliability problems with it (it crashes under heavy load), so I’m going back to the manufacturer (Supermicro) to get that fixed.

The new server will run the two drives in a RAID1 array, so that if one drive dies, the other has a complete, up-to-the-second copy of the same data. That *should* prevent the kind of problem that just occurred.
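For readers curious what mirroring means in practice, here is a toy Python model of the RAID1 idea. It is only a sketch: real RAID1 is handled by the disk controller or the operating system, and the class name, the in-memory "drives", and the block numbers are all made up for illustration.

```python
class MirroredPair:
    """Toy model of a RAID1 mirror: every write goes to both drives."""

    def __init__(self):
        self.drives = [{}, {}]          # in-memory stand-ins for two real drives
        self.failed = [False, False]

    def write(self, block, data):
        healthy = [i for i in (0, 1) if not self.failed[i]]
        if not healthy:
            raise OSError("both drives have failed")
        for i in healthy:
            self.drives[i][block] = data   # same data on every healthy drive

    def read(self, block):
        for i in (0, 1):
            if not self.failed[i]:
                return self.drives[i][block]
        raise OSError("both drives have failed")

    def fail(self, i):
        """Simulate drive i dying."""
        self.failed[i] = True


pair = MirroredPair()
pair.write(0, b"users")
pair.fail(0)                      # the first drive dies
survivor_copy = pair.read(0)      # the data is still served by the second
```

The mirror only protects against a single drive failing. It does not replace backups, since a bad write is copied to both drives just as faithfully as a good one.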

The new drives are also much larger, at 60gigs each. Since the BookMooch database is 24gigs in size, that means I’ll be able to make a live copy of the database on the solid state drive itself. This is important, because otherwise making a backup to a normal drive is slooooowww and causes “locking” problems with the database, since BookMooch is heavily used internationally, and doesn’t have a “slow period of the day”. Faster backups mean more frequent backups.

My plan is to keep the old server around as a fail-over, in case the new server itself dies.


Why aren’t you using S3 / Google Apps / the Cloud to host BookMooch?

BookMooch is a very intense database-backed application. It regularly handles 400,000 queries *per second*. The benchmarks I’ve seen put Google Apps at 700 queries per second and S3 at several thousand queries per second.

The best parallel I can give you is Amazon: the size of the database in BookMooch is on a scale similar to Amazon’s. Naturally, they have far more use than BookMooch, but the size of the database in BM is quite large and the operations it does are quite complicated.

So yes, I could move BookMooch to a service, but it would require lots and lots of machines, as well as much more complexity because there would then be many machines running BookMooch, with some of them having hardware failures of their own.
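To put a rough number on "lots and lots of machines", the figures quoted above can be plugged into a back-of-the-envelope calculation. This Python sketch takes the 400,000 and 700 queries-per-second figures at face value and assumes load spreads perfectly evenly across instances, which real systems rarely manage; the function name is hypothetical.

```python
import math


def instances_needed(target_qps, per_instance_qps):
    """Smallest number of instances whose combined rate meets the target,
    assuming perfectly linear scaling (an optimistic assumption)."""
    return math.ceil(target_qps / per_instance_qps)


# 400,000 queries/sec for BookMooch, 700 queries/sec per instance (both from the post).
google_apps_instances = instances_needed(400_000, 700)
```

Under those assumptions the answer comes out in the hundreds of instances, before allowing for any spare capacity or failed machines.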

Also, and this is definitely a major issue, all the “cloud” vendors charge per-database-transaction, and so it’s likely that a very-database-intensive application like BookMooch could be very expensive to host in a cloud service.

My background is in high performance computing, so I guess you’ll have to just trust me on this one. If you read the academic computing literature in this field, you’ll find that very large memory-based caches are standard. For instance, Facebook runs memcached servers achieving 300,000 queries per second with 32 gigs to 64 gigs in memory per cache server. Generally, when your application is mostly database-blocked, you want lots of memory, and that’s a much cheaper way to go (up to a limit of about 64 gigs) than adding lots of cheap machines with not-much-memory together.
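The idea of a large memory-based cache can be sketched in a few lines of Python. This is a generic illustration, not BookMooch's code and not memcached's API: the class name, the `load_from_disk` callback, and the tiny capacity are all assumptions made for the example. Recently used entries stay in memory, and the least recently used entry is dropped when the cache is full.

```python
from collections import OrderedDict


class MemoryCache:
    """Small least-recently-used cache in front of a slow lookup function."""

    def __init__(self, capacity, load_from_disk):
        self.capacity = capacity
        self.load_from_disk = load_from_disk   # hypothetical slow database read
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.load_from_disk(key)       # only misses touch the disk
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used
        return value


disk_reads = []


def slow_lookup(key):
    disk_reads.append(key)     # record every trip to the "disk"
    return key.upper()


cache = MemoryCache(capacity=2, load_from_disk=slow_lookup)
results = [cache.get(k) for k in ["dune", "emma", "dune", "ulysses", "emma"]]
```

With more memory the capacity goes up, fewer lookups miss, and fewer requests have to wait on the disk, which is the reason a database-bound application benefits so much from RAM.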

43 Responses to “BookMooch was down”

  1. John,

    I just have to say, you rock my world 🙂 You go with your bad computer lingo! Thanks for all you do!

    Jenn

  2. shaun said

    im glad to see the sites back up!!!

  3. Joyce said

    John, you definitely rock!

    I fear to make a suggestion as it is clear that you have considered state of the art solutions, but on the off chance a suggestion might help, have you considered live replication as a way to keep real time copies of transactions? Replication logs can virtually guarantee no data loss in the event of a future crash.

  4. Yeah!! I was having the heebie-jeebies!! About to go down to the local pub and put my misery into a tankard or two!!
    Thanks, John et al for all you do.

  5. wolverinex said

    glad to see the site back. All seems ok and I can now send out the books which were requested….

  6. Kyle said

    My daughter and I enjoy bookmooch soooo much!! We are so thankful for all you do John! 🙂 Thank U very much! We missed it while it was down. Glad it’s back! 😀

  7. Cheryl said

    Thanks for the repair and the education!

    Cheers!

  8. Michelle said

    So good to have bookmooch back. I skimmed through your post – not at all my area – it sounds like you have a great handle on things. How can we get people to “give more”? That would have to help, right? I’m going to start by noting the option to “give a little” on my homepage.

    Thanks again for everything 🙂

    • Carla Hadley said

      As soon as I read Michelle’s post, I started wondering how I could personally encourage other moochers to “give a little”. I’ve decided to offer a 2-for-1 special to anyone who has the “I give a little” link – maybe that will encourage people who have been procrastinating! 🙂

      Thanks for all you do John!

  9. Maggie said

    Thanks for not only fixing so quickly, but with no data loss and then a very detailed explanation!

  10. Jimmye said

    John,
    You are a wizard! While your tech-speak is awe-inspiring, it’s also a little scary. Glad you have a handle on it.
    You keep pulling the levers and throwing the switches behind the curtain and my world is at peace! Thanks for all you do and give to keep Bookmooch up and running and the BEST site ever!

  11. Pat said

    An extremely fast recovery for a complex site! Keep doing what you’re doing because I doubt anyone else could have fixed it quicker.

  12. John, you rule. It amazes me how complex the site is to program — and how diligently you work to keep it the best site it can possibly be. The most amazing thing is how simple and friendly the site is to USE, in contrast to how complicated the programming must be! It’s like watching a fine dancer — the better they are, the easier they make fiendishly complicated things look. You program with the dedication and care that most people would attribute to fine artists. Thank you so much, and keep up the wonderful work. 🙂

  13. ermama said

    Thanks so much for getting BM back up and running so quickly. I have to admit I was going through withdrawal:)

  14. ermama said

    Thanks so much for getting BM back up and running so quickly. I have to admit I was going through withdrawal:)

  15. vandana said

    Thank you!

  16. Pat said

    Thanks for your hard work, John. I appreciate this site so much.

  17. Elwood said

    John,
    Can’t tell you thanks enough for all that you do and had to do to get bookmooch up and running again. Your site is awesome and it’s a great service to use to find books to read. I will donate to you soon, just so I can know that I’ve helped the cause a little. Thanks again and keep up the fantastic job! Everyone appreciates YOU and bookmooch more than you know!

  18. Irene said

    Thanks for keeping us updated and for bringing the site back online quickly. I missed it while it was down but can now rest easy…as I’m sure you can!!! Thanks for a terrific site!!!

  19. I really enjoyed your technical explanations. Thanks.

  20. Sophie said

    You’re my hero, John.

  21. April said

    Thank you so much for getting BM up and working again, John. I did not realize how addicted I was to this wonderful site until it went down.

    Again, thank you for all that you do.

  22. matt said

    Gosh! Reading your explanations was like watching an episode of Big Bang Theory! I didn’t get much of it but I enjoyed it very much.
    Thanks for being EVERY MOOCHER’S GEEKY HERO. 🙂

  23. Candace said

    I can’t tell you how lost I felt when I tried all morning yesterday to get on Bookmooch. I even emailed a couple of people I have met on Bookmooch to see if my computer was having a problem. They of course by that time saw your note on what was going on and so had I. I am excited it didn’t take 2-3 days. You did a great job getting us back on. I tell people from other sites that Bookmooch is the best program. You have got this down to a science. It blew me away when I read how many hits you get a second. I feel better now when sometimes I wait a few extra seconds. Keep it going, and hopefully it won’t happen again soon, or ever!

  24. Mimsy said

    Bookmooch is back! All is well in the world.

    John and the Bookmooch Team are now my new heroes. Thank you guys, for getting it back up again so fast. 🙂

  25. Margaret said

    John, I love BookMooch and You so much! This site is amazing; I’m spreading the word here in Oregon …. Thank you sincerely for everything you do; Thanks to you, I read 14 books last month and hope to double that this month!

  26. Pat36030 said

    I missed you so much! Glad you’re back! Thank you for all that you do!

  27. Asuetoo said

    Awesome job getting things back up again to all of you. Not an auspicious start to the 4th and John’s birthday weekend is it? Thanks guys!
    Asuetoo

  28. Penny Waugh said

    Thanks so much for getting the site going again, although the technical explanation went right over my head. Don’t know what I’d do without BookMooch now, though I haven’t been here very long!

  29. Christina Spear said

    Thanks for getting BookMooch back up and running so quickly! I appreciate all that is done “behind the scenes”. I had one book to mail out meanwhile, so carried on with business until the site was back up. Glad to see everything back to normal again. Appreciate all your efforts John!

  30. Hercules40 said

    Thank you John: thank you for the transparency; thank you for the site; thank you for the hard work; thank you for the explanation.

    Cloud computing is not the answer to everything — I can attest to that by my current job. I remember years ago building multi-CPU servers and discussing the upper limit possibilities of how they could be used. BookMooch is the theory put in practice. I also believe you have a great handle of how to tackle the technical challenge in the future.

    For all these reasons, I will continue to “give a little” every month (for everything this site has given me).

    Thanks again John.

  31. tennantfamily said

    We really appreciate the work, thought and hours you put into BookMooch. Thank you for getting it back up so quickly, and for adding the blog message so we could see it was the site, not our home computer problem. As another favourite was also down at the same time I did worry it was us.
    A heartfelt thank you for all you do for all of us.

  32. bgmeyer said

    * computer with 16 cpus: $6428
    * 64 gigs of memory: $2087
    * 3 solid state drives: $2691
    * total = $11,202

    Price of having BookMooch for us all again: priceless!

    Thanks for everything you do, John.
    barb

  33. Yvonne Laffoon said

    I have to add my thanks to all the others! I really appreciate someone who can wrangle computers, servers, and tech-speak! I missed BM while it was down and rejoiced when it was back up, especially after expecting it to be down for days! You are a miracle worker and I’m glad you know what to do! Computer Man rides again!

    Yvonne

  34. Megan said

    Thank you for everything you do — I’d be lost without BookMooch! I didn’t even realize how much I rely upon it until I couldn’t get in whenever I wanted 🙂

  35. Heather19 said

    I’m not a tech person and sometimes don’t understand programmer-talk, but I *love* that you are sharing with us exactly what BookMooch is made of and what kinds of tech stuff goes into it. It’s nice to have a clearer picture of it.

  36. Veronica said

    I have just “given a little” because I think it’s important to support people who are putting in an enormous amount of work and not always being recognised, and I’ve had much reading pleasure from BookMooch in the relatively short time I’ve been a member. Not sure how long I will be able to continue giving, but every little bit helps!

  37. akashy said

    i missed bookmooch! am soo glad the site is back up. Thanks, john! keep up the good work. 🙂

  38. Ed said

    John,

    Appreciate the continual transparency and spirit with which you maintain this great site.

    Something does tell me that spending so much $ up front for equipment you have to manage yourself seems inefficient (for you mostly).
    Have you had any serious chats with techs over at Amazon about whether they can handle the # of transactions?
    I thought the whole point of cloud computing was to optimize the resources – I doubt that BM has loads that cannot be handled by cloud solutions.

    You can then remove yourself from the IT nightmares and focus more on BM strategy, UI, or whatever is on your priority list.

    Just a thought. If you want any help assessing any of this I can offer to spend some time on it for you just to have some #s to compare to. Perhaps it would be prohibitive to run it in the cloud, I don’t know but I’d be curious.

  39. Chrissy said

    I didn’t understand a single thing you wrote about why BookMooch was down, but I love that you tell us why things happen! You have built such a great community here in BookMooch and I love you for it!

  40. Absence makes the heart grow fonder! It is so wonderful to be part of the BM community!!!!! Thank you for all you do! 🙂

  41. Ed Hahn said

    John,

    I think what you are doing is phenomenal and I trust your choices completely.

    Thanks for doing what you do.

    Ed
