A New Server for BookMooch

September 23, 2007

BookMooch is now running on a new server. I’m hoping this solves the speed problems for a good while.

I emptied out the full text index so I could rebuild it from scratch, which it is now doing. I think it could take as much as 3 days for all the books to re-appear in the full text search. Searches will return only a partial catalog of books until this reindexing is complete.

For the geeks: the reason I switched to a new server is that after rewriting the database layer myself in C this August, the CPU was no longer what was slowing BookMooch down. Instead, the server was spending 80% of its time waiting for the disk drive. Switching to a faster disk drive (actually, 3 of them) with large caches helped a bit, but not that much.

So, I wanted to move to a 64 bit operating system, so I could tell my hand-written database layer to cache the entire database in memory. Database writes invalidate the cache, and write to disk via a buffer every 60 seconds. When BookMooch starts up, I now do an entire table scan of the user and book database tables, so that they reside in memory. Those two BookMooch tables currently fit inside 3.5gb of RAM, and they’re the most used tables.
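For the curious, here is a minimal sketch of that caching idea, written in Python for readability (the real database layer is in C and is not shown here). The class name `CachedTable`, its methods, and the use of a plain dict as a stand-in for the on-disk table are all assumptions made for illustration.

```python
import time


class CachedTable:
    """Toy in-memory table cache with buffered writes (illustrative only)."""

    def __init__(self, disk, flush_interval=60.0, clock=time.monotonic):
        self.disk = disk                      # stand-in for the on-disk table (a dict)
        self.flush_interval = flush_interval  # seconds between flushes to disk
        self.clock = clock                    # injectable so tests can fake time
        self.cache = {}                       # rows currently resident in memory
        self.pending = {}                     # buffered writes not yet on disk
        self.last_flush = clock()

    def warm(self):
        """Full table scan at startup so every row is resident in memory."""
        self.cache = dict(self.disk)

    def get(self, key):
        if key in self.pending:               # newest data lives in the write buffer
            return self.pending[key]
        if key not in self.cache:             # cache miss: fall back to disk
            self.cache[key] = self.disk[key]
        return self.cache[key]

    def put(self, key, value):
        self.cache.pop(key, None)             # a write invalidates the cached copy
        self.pending[key] = value             # and is queued for the next flush
        self.maybe_flush()

    def maybe_flush(self):
        """Flush the buffer if the flush interval has elapsed."""
        if self.clock() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        self.disk.update(self.pending)
        self.pending.clear()
        self.last_flush = self.clock()
```

In this sketch the flush is only checked when a write arrives; a real server would more likely flush from a timer or background thread, so that a quiet period does not leave writes sitting in the buffer.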

The new server is:
– running Debian 4.0, 64 bit version
– has 8gb of RAM
– has one 250gb SATA drive
– has 2 XEON CPUs of 4 cores each (ie, it acts like 8 CPUs)

BookMooch uses AOLserver as its web server, and is written in Tcl with portions written in C (such as the database layer). Other than ImageMagick to resize people’s uploaded photos, and TclDOM for reading the XML returned by Amazon and OCLC, there are no libraries in BookMooch: everything is custom written for BookMooch itself.

Benchmarks: A really complicated page, such as a full text search for “cards” (which has several thousand database lookups), currently takes 0.16 seconds to render, and apache bench can pull 85 pages per second of that page (ie, the multiple CPUs kick in). A simple page, such as a book details page, takes 0.0025 seconds to render.

I get a lot of questions at conferences about how BookMooch manages to be in several languages, and keep up to date with them.

I call my technique “wiki + machine translation”, and here is how it works:

1) wrap every piece of text on the web site in a function call, and break paragraphs up into separate chunks

2) use a separate domain name for each language (ie http://fr.bookmooch.com)

3) this function looks at the domain name, and if that domain name is english, then the function returns what it was given (ie, everything is first written in english); that way the main language causes no database load

4) if you’re in a non-english language, look for that string in the translation database table you have. If you don’t have that text translated, then have Google translate it, store that in your database, and display it to the user

5) for a single page a user requests, store in a global variable all the strings that were translated. Assign that information a unique ID, and keep that data around at least 10 mins, possibly in a database. This is important, because sometimes there is if/then logic on a page, and different text gets translated for a user’s case.

6) put a “correct this translation” button on each non-english page, pointing to that unique ID

7) make a translation correction page, that stores these changes in the database

8) user can now see the translation live on the site
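
The steps above can be sketched roughly as follows. This is a Python illustration under assumptions, not the actual BookMooch code (which is Tcl): the names `Page`, `t`, `machine_translate`, `language_from_host` and `correct_translation` are hypothetical, the two dicts stand in for database tables, and the machine translation call is a placeholder rather than a real service API. The 10 minute expiry from step 5 is left out to keep it short.

```python
import uuid

translations = {}   # (lang, english_text) -> translated text; stand-in for the DB table
page_strings = {}   # page_id -> English strings translated while rendering that page


def machine_translate(text, lang):
    """Hypothetical placeholder for a call to a machine translation service."""
    return f"[{lang}] {text}"


def language_from_host(host):
    """'fr.bookmooch.com' -> 'fr'. Assumes a two-letter prefix means a language."""
    prefix = host.split(".")[0]
    return prefix if len(prefix) == 2 else "en"


class Page:
    """One page request: knows its language and records what it translated."""

    def __init__(self, host):
        self.lang = language_from_host(host)
        self.page_id = uuid.uuid4().hex      # unique ID for the "correct this" link
        page_strings[self.page_id] = []

    def t(self, text):
        """Wrap every piece of site text in this call (steps 1, 3 and 4)."""
        if self.lang == "en":
            return text                      # English: returned as given, no lookup
        page_strings[self.page_id].append(text)   # step 5: remember what was shown
        key = (self.lang, text)
        if key not in translations:          # not translated yet: machine translate
            translations[key] = machine_translate(text, self.lang)
        return translations[key]


def correct_translation(page_id, lang, text, corrected):
    """Steps 6 to 8: store a user's correction for a string shown on that page."""
    if text not in page_strings.get(page_id, []):
        raise KeyError("string was not shown on that page")
    translations[(lang, text)] = corrected
```

A page template would then call something like `page.t("Welcome")` instead of printing the text directly, and the correction page would list `page_strings[page_id]` so the user only sees the strings that actually appeared in their case.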

I’ve been taking a few days off from BookMooch to clear my head, and that’s been a good thing because it’s allowed me to fix some tough problems that my usually addled brain can’t solve.

Specifically:

1) Japanese character sets are finally working. If you search for something at amazon.co.jp on the current BookMooch search page, you’ll see a bunch of “noise” rather than Japanese characters. I figured out what the problem was two days ago, just working a bit after dinner. It took me only 30 minutes with a clear head! The funny thing is that I’ve worked on this for days without any success, so a holiday really can pay off.

2) the new 64bit server is about ready to be used. BookMooch is running fine on it, and it’s much much faster, going from about 3 pages per second capacity on the current system to about 125 pages per second on the new one. For those tech geeks out there, this is an 8 CPU XEON system, with 8GB of RAM, running Debian’s 64bit edition. In the next few days, BookMooch will move to this server. The problems were all related to “64 bit-ness”, with odd things crashing. I’ll do my best to make sure there are no problems, though it will mean BookMooch being down for a few hours in the middle of the (American) evening.


Update on Japanese support, as of October 2:

I’ve fixed a long-standing url problem with BookMooch, namely if you tried to search BookMooch or Amazon (via BookMooch) it wouldn’t work (there were these funny “$map” strings on the URL).

I’ve fixed this url problem, so searches of Amazon through BookMooch will now work:

[Image: Amazon search through BookMooch]

I *think* this also means that full text searching in Japanese will soon work on BookMooch, when I put the new version up. A quick test of BookMooch searching for 集英社 seems to work right:

[Image: BookMooch search for 集英社]

Next up is a native Japanese version of the BookMooch web interface, coming soon…

-john

BookMooch is now back up

September 15, 2007

BookMooch is now back up and working.

It might be a little slow, due to pent up demand from being down for a few days.

Thanks again to all you moochers for your loving & supporting emails these past few days. They really helped keep me thinking positively when things were going badly.

-john

I’m running the last of the cleanup tasks now, and expect that this will be the last thing to do before BookMooch goes up again. I think it’ll be about 12 hours from now.

In the end 27,000 books were “found” in this cleanup.

I was not able to rebuild the search from scratch in the allotted time. After running overnight, the rebuild was 1/4 finished, and I’m sure people are antsy. The search rebuild will occur in a week or two, when BookMooch moves to the new (64 bit Linux) server. Sorry!

Tying up loose ends

September 14, 2007

Things are looking good rebuilding the BM database, but since I have BM down, I thought I’d “scrub” the data and make sure everything really is ok.

I’ve found a few other problems with the data that I’d like to tell you about, and that I’d like to beg the time from you to fix.

Rebuilding search

I know there have been problems with some search words not returning complete results, and some words, like “university”, returning no results at all. I’m going to rebuild the search index from scratch, which will fix that, but will take another 24h.

I *think* I’m not going to index the “book description” on this pass, because that seems to be what takes the most time, and it’s not really as useful. I ran the book reindex overnight last night, and it had only finished about 10%, so I am going to re-run the re-index without the book description, and hopefully it’ll finish within 24h (I don’t think anyone could handle BM being down for 10 days!).

Missing books

Each book in your inventory, wishlist and save-for-later is supposed to have a back-reference, from the book back to the user who wants it/has it. This is what you see when you look at a book detail page, and the page displays the book as “available:…” from other people, as well as being on other people’s wishlists.

I knew that in some cases the back reference wasn’t there. I’m not sure why: it could be any of a bunch of problems we’ve had over the first year of BookMooch’s life.

I wrote a program to find all the missing back references and fix them, and wow, there are a lot of them!
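
A minimal sketch of what such a repair pass could look like, in Python. The data layout is an assumption for illustration (users as a dict of lists of book IDs, books as a dict of per-list sets of user IDs), and `repair_back_references` is a hypothetical name, not the program I actually ran.

```python
def repair_back_references(users, books):
    """Add any missing book -> user back-references; return how many were added.

    users: {user_id: {"inventory": [book_id, ...], "wishlist": [...], ...}}
    books: {book_id: {"inventory": {user_id, ...}, "wishlist": {...}, ...}}
    Both layouts are assumptions made for this sketch.
    """
    added = 0
    for user_id, lists in users.items():
        for list_name, book_ids in lists.items():
            for book_id in book_ids:
                # Note: setdefault also creates an entry for a book record that
                # is missing entirely; a real repair might want to flag that case.
                holders = books.setdefault(book_id, {}).setdefault(list_name, set())
                if user_id not in holders:
                    holders.add(user_id)   # the forward reference existed, the back one did not
                    added += 1
    return added
```

Running the pass a second time should add nothing, which is a cheap way to check that the repair is complete.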

Until today, if you went to your inventory, and one of your books was missing a back-reference, I didn’t display it in your inventory. It was “there” but no-one would find it to mooch it from you. If you had the only copy of the book, and the back reference was missing, then others would, on the book details page, see the book as “no copies available”. The same thing could happen with your wishlist: you wouldn’t get notified when some books became available, because of the missing back-reference.

In practical terms, what this means is that a lot of people are going to find books in their inventory that they added in the past, that “for some odd reason” didn’t show up in their inventory page.

This is great news, like finding a famous author’s unpublished manuscript in the attic.

At a quick glance, it looks like about 10% of the books in BookMooch are missing a back-reference, so when BookMooch comes back up, there should be another 70,000 books available. Be sure to check your wishlist & save-for-later lists, to see if something you want has just appeared.

-john

A status report, pretty good news:

Yesterday’s backup of the books seems to be mostly ok, and contains 698,000 books. The corrupted current database only has 73,000 books (almost 90% lost!), so this means that most of the book data should be intact, and only one day old.

The members’ backup also seems mostly ok, with 41,970 members, compared to yesterday’s backup of 41,384, so I’m not sure there is much data loss with the members’ info.

I should be able to restore most of the books from yesterday’s backup, re-associate everything with the current member data. I think a few dozen members may be corrupt in the current file, so I’ll repair those against the backup from yesterday, and if that fails, from a backup 5 days ago.
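
The fallback order described above (current data first, then yesterday’s backup, then the one from 5 days ago) could be sketched like this in Python. The function name `restore`, the dict-based records, and the `is_valid` check are all assumptions for illustration; how to actually detect a corrupt record is the hard part and is not shown.

```python
def restore(current, backups, is_valid):
    """For each key, keep the newest copy that passes is_valid.

    current: {key: record} from the live (possibly corrupt) data
    backups: list of {key: record} dicts, ordered newest first
    is_valid: caller-supplied check for a usable record (hypothetical)
    Returns (restored, unrecoverable_keys).
    """
    sources = [current] + list(backups)
    all_keys = set().union(*sources)        # every key seen in any source
    restored, unrecoverable = {}, []
    for key in sorted(all_keys):
        for source in sources:              # newest source first
            record = source.get(key)
            if record is not None and is_valid(record):
                restored[key] = record
                break
        else:
            unrecoverable.append(key)       # corrupt or missing everywhere
    return restored, unrecoverable
```

One caveat with this approach: a record that exists only in a backup is restored too, so something deliberately deleted since that backup would come back. Anything left in the unrecoverable list would need fixing by hand.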

This is really fragile work, however, because if I make any mistakes I actually harm the data and it’s hard to fix when the server is back up and running (how do you deal with new stuff occurring vs older, maybe correct data?)

So, I think it’ll be tomorrow before the server goes up, because I want to do the best possible job, especially since any problems that still remain will need to be fixed by hand by either me, my volunteer admins, or you guys (on your own accounts).

Sorry folks, but the BookMooch database has gotten corrupted and that means BookMooch can’t run.

I have a backup from yesterday, but that has some corruption too (sigh). And yet another backup from 5 days ago also has corruption. One of those days…

So… I need to shut BookMooch down for a day or two, get the data onto my computer, and write some programs to piece all the data back together from the various backups I have.

That means no BookMooch for a day or two, maybe even three, depending on how hard this is.

I’ve been working on this non-stop for 8 hours, and my head is about to explode from stress, so I’m taking 2h off for dinner, and I’ll be back on this right away.

A million mooch apologies!

-john

I don’t yet know what to make of this, but the number of mooches per day has doubled for 3 of the past 4 days. It’s too soon to see if it’s a trend, but this is a HUGE amount more usage of BookMooch, all of a sudden.

We were coasting at 1000 book mooches per day for a long time, went to 1600/day when I sped the server up two weeks ago, and now usage is way, way up. Maybe it’s the school year starting?

[Chart: mooches per day]

(sorry about the long flat line in the chart above, that’s when the statistics collector wasn’t working)

And here is the past few days of detail, zoomed in:

[Chart: mooches per day, past few days]

If this keeps up, that would mean 90,000 books exchanged per month!

Non-anonymous smooching

September 2, 2007

I’ve been giving some thought to how the newly-added “give a smooch” feature would actually get used in the real world, and I think that perhaps I made a mistake requiring smooches to always be anonymous.

Say, for instance, that you get a very nice “thank you” note with a book you’ve mooched. You’d like to thank the person for their kindness, but you can’t use the smooch feature, because that would hardly be anonymous. You can use the “charity” feature and give them a point and a note, but “charity” isn’t what you feel.

Instead, you really just want to send a “gee, that was a nice thing to do” note back. The “smooch” feature would be perfect for that, if it didn’t require you to be anonymous. You can simply “smooch” someone for their kindness, they get a point, and you get to explain why, in your note, you’re rewarding/thanking them.

So… I’m thinking of making smooches *optionally* anonymous. The bullet points would change slightly, and there would be an additional choice, like so:

[Image: proposed “give a smooch” bullet points]

and the two choices are:

[Image: the two smooch choices]