Thursday, May 10, 2007

StructureCloud




It might be foolhardy to do this, but I'm going to start working on StructureCloud. It's just burning a hole in my brain, and I've got to get some code out so that I can concentrate on other projects.

StructureCloud is going to be like the new version of Crank, but for cloud computing and with a web interface. You will be able to run some of the most important crystallographic programs on it, and they will run on the cloud.

The cloud can be EC2 machines, or it could be spare machines in your lab running StructureCloud processes.

My target market is people going to the Canadian Light Source (CLS), with two main kinds of users in mind: people like me, who want to be able to do lots of neat experiments using massive computational resources, and users like Nham, who know how to use crystallography programs but might have trouble setting them up on their own computers.

For now, I'm going to store all the results on S3 to make things simple for me. In the future, you could move to storing the data on your own servers; you would just have to replicate the S3 API. Since their API is publicly available, I think it won't be too long until someone else replicates it.
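To make the "replicate the S3 API" idea a bit more concrete, here is a minimal sketch of what a local stand-in might look like. Everything in it is an assumption for illustration: the class name LocalObjectStore and the put/get methods are made up, and it only mimics the bucket/key idea on a local directory, not the real S3 interface.

```ruby
require "fileutils"

# Hypothetical local stand-in for a bucket/key object store.
# This is NOT the S3 API, just the same put/get shape on local disk.
class LocalObjectStore
  def initialize(root)
    @root = File.expand_path(root)
  end

  # Store data under bucket/key, creating directories as needed.
  def put(bucket, key, data)
    path = path_for(bucket, key)
    FileUtils.mkdir_p(File.dirname(path))
    File.binwrite(path, data)
    nil
  end

  # Return the stored data, or nil if nothing is stored under that key.
  def get(bucket, key)
    path = path_for(bucket, key)
    File.file?(path) ? File.binread(path) : nil
  end

  private

  # Map bucket/key to a file path and refuse keys that climb out of the root.
  def path_for(bucket, key)
    path = File.expand_path(File.join(@root, bucket, key))
    unless path.start_with?(@root + File::SEPARATOR)
      raise ArgumentError, "key escapes the store root"
    end
    path
  end
end
```

If the rest of StructureCloud only ever talks to something with this put/get shape, swapping S3 for lab servers later should mostly be a matter of writing another class with the same two methods.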

We're going to use SQS as a queuing system to store the jobs that need to be run. I'll have to come up with some good way to lock jobs on the queue, so that you don't get race conditions where multiple StructureCloud processing nodes try to run the same job. Most of this is probably built into SQS already: its visibility timeout hides a message from other consumers for a while after one node has received it.
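Here is a rough sketch of the kind of locking I have in mind, written as a toy in-memory queue so the idea is easy to see. It is not the SQS client API; the class ToyJobQueue and its methods are hypothetical, and a real setup would rely on the queue service itself. The idea is that once a worker receives a job, the job is hidden from every other worker until a timeout passes, and the worker deletes it when the run succeeds.

```ruby
# Toy in-memory queue that mimics a visibility timeout.
# Hypothetical names throughout; this is not the SQS client API.
class ToyJobQueue
  Entry = Struct.new(:id, :body, :invisible_until)

  def initialize(visibility_timeout: 30)
    @visibility_timeout = visibility_timeout
    @entries = []
    @next_id = 0
    @mutex = Mutex.new
  end

  # Add a job and return its id.
  def enqueue(body)
    @mutex.synchronize do
      @next_id += 1
      @entries << Entry.new(@next_id, body, 0.0)
      @next_id
    end
  end

  # Hand out the first visible job and hide it from other workers
  # until the timeout passes. Returns [id, body], or nil if none is visible.
  def receive(now: Time.now.to_f)
    @mutex.synchronize do
      entry = @entries.find { |e| e.invisible_until <= now }
      if entry
        entry.invisible_until = now + @visibility_timeout
        [entry.id, entry.body]
      end
    end
  end

  # Remove a job for good once a worker has finished it.
  def delete(id)
    @mutex.synchronize { @entries.reject! { |e| e.id == id } }
    nil
  end

  def size
    @mutex.synchronize { @entries.size }
  end
end
```

If a node crashes halfway through a job, it never calls delete, so the job becomes visible again after the timeout and another node can pick it up. The catch is that the timeout has to be longer than the longest job, or two nodes can still end up running the same one.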

I'm going to do it all in Ruby on Rails, because Ruby really rocks.

For now, we're going to have a really simple job building system, but it would be really cool if we could move to a system that basically replicates what CCP4i can do. This is going to be tough on the web, because we would have to do a lot of JavaScript programming to do what CCP4i does with Tcl. I'm trying to think of a way to abstract this with Ruby, kind of like how RJS works. For now, it will be really simple, kind of like how Crank works: you build up a job from individual programs, and then the Crank CCP4i interface generates an XML file that the Crank program proper runs.
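As a sketch of the "build a job from individual programs, then write out an XML file" idea, something like the following might do for a first pass. The element and attribute names (job, step, option) are invented here for illustration and are not Crank's actual XML format; the program names in the usage example are placeholders too.

```ruby
# Hypothetical job description: an ordered list of program steps.
Step = Struct.new(:program, :options)

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape the characters that would break element text or attribute values.
def xml_escape(text)
  text.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Render a job as XML, one <step> per program, in the order given.
# The element names are made up for this sketch.
def job_to_xml(name, steps)
  lines = ["<job name=\"#{xml_escape(name)}\">"]
  steps.each_with_index do |step, index|
    lines << "  <step order=\"#{index + 1}\" program=\"#{xml_escape(step.program)}\">"
    step.options.each do |key, value|
      lines << "    <option name=\"#{xml_escape(key)}\">#{xml_escape(value)}</option>"
    end
    lines << "  </step>"
  end
  lines << "</job>"
  lines.join("\n")
end
```

A job would then be put together along the lines of job_to_xml("demo", [Step.new("prog_a", { "cycles" => 5 }), Step.new("prog_b", {})]), with the web interface filling in the steps.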

It would be really nice if you could integrate this all together so that the user could look at the actual shell scripts that would be generated from different sections of the StructureCloud interface.
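One way this could work, sketched under the assumption that each section of the interface boils down to a program name plus a list of arguments: render every step as a shell line and show the result to the user before running it. ScriptStep, step_to_shell and job_to_script are hypothetical names, and the program names and flags in the usage note are placeholders, not real CCP4 command lines. The only library call is Shellwords.escape from Ruby's standard library.

```ruby
require "shellwords"

# Hypothetical step: a program name plus its command-line arguments.
ScriptStep = Struct.new(:program, :args)

# Turn one step into a single shell-escaped command line.
def step_to_shell(step)
  ([step.program] + step.args).map { |word| Shellwords.escape(word.to_s) }.join(" ")
end

# Turn a whole job into a script the user can read before it runs.
def job_to_script(steps)
  lines = ["#!/bin/sh", "set -e"]
  steps.each_with_index do |step, index|
    lines << "# step #{index + 1}: #{step.program}"
    lines << step_to_shell(step)
  end
  lines.join("\n") + "\n"
end
```

Calling job_to_script with a couple of steps, say ScriptStep.new("prog_a", ["--in", "my file.mtz"]), gives back plain text that the interface can display next to the form that produced it, with the space in the file name safely escaped.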

Oh, and it's all going to be GPLed. I don't know exactly how I'm going to make money off this, but I'm having trouble concentrating on anything else. I just think about StructureCloud all the time anyway, so I need to get it out of my head and into the world. If anyone wants to help, I would love to have collaborators.