Archive for the 'distributed systems' Category

Master’s Thesis On WebFS

Tuesday, October 17th, 2006

As you may know, I have started working on my master’s thesis at my company. Thanks to this enlightened company, the software I produce will be open source.

If you want to know what this thesis is all about, here is a short document about the system I am designing and building. I look forward to your feedback.

This document is highly unscientific and quite general, but it may help you better understand what I am writing and why I like it.

I will regularly update this blog on the status and deliverables of my thesis, so stay tuned… Even better, subscribe to my RSS feed.

Link to PDF

Processes vs. Threads

Thursday, August 31st, 2006

In the two-decade-long battle of processes vs. threads, here is another post. It argues that threads do not scale as well as processes.

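The issue is that there is not that much difference between a thread and a process: the main one is that the threads of a process share its address space, while each process gets its own.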
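
To make that one real difference concrete, here is a small Python sketch. It is Unix only, since it relies on os.fork, and the names state, increment, run_in_thread and run_in_process are hypothetical, chosen for illustration. A thread writes to the same memory as its parent, while a forked child works on its own copy, so the child’s write never reaches the parent.

```python
import os
import threading

# Hypothetical shared state, used only for this demonstration.
state = {"counter": 0}


def increment() -> None:
    """Add one to the counter in whatever address space we run in."""
    state["counter"] += 1


def run_in_thread() -> int:
    """Run increment() in a new thread; the thread shares our memory."""
    worker = threading.Thread(target=increment)
    worker.start()
    worker.join()
    return state["counter"]


def run_in_process() -> int:
    """Run increment() in a forked child (Unix only); the child gets a copy."""
    pid = os.fork()
    if pid == 0:
        increment()  # changes the child's private copy only
        os._exit(0)  # exit at once, skipping the parent's cleanup code
    os.waitpid(pid, 0)  # wait for the child to finish
    return state["counter"]


if __name__ == "__main__":
    print(run_in_thread())   # 1: the thread's write is visible here
    print(run_in_process())  # still 1: the child's write stayed in the child
```

Running it prints 1 twice: the thread’s increment is visible to the parent, the child’s is not.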

Link

Amazing Amazon EC2 (New Webservice)

Thursday, August 24th, 2006

Amazon EC2 (Elastic Compute Cloud) is the next logical step after S3. It is built on top of S3 and allows you to instantiate virtual machines. You pay for what you use (S3 costs come on top) and you can use it however you want.

Amazon EC2

This is a smart move from Amazon, smarter than Sun’s. The problem with the grid architecture is parallelizing a computation, whether an addition or a whole program. For instance, how do I parallelize int i = 5 + 8; i++;? The answer is that it is useless: each step depends on the result of the previous one.
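
This limit is what Amdahl’s law formalizes (the post does not name it; this is my illustration): if only a fraction p of the work can run in parallel, n machines give at most a speedup of 1 / ((1 - p) + p / n). In int i = 5 + 8; i++; the increment needs the result of the addition, so p = 0 and extra machines buy nothing. A minimal Python sketch, where the function name amdahl_speedup is hypothetical:

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a program can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always takes the same time; only the rest is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # A fully dependent chain such as "i = 5 + 8; i++" has no parallel part.
    print(round(amdahl_speedup(0.0, 1000), 2))  # 1.0
    # Even with 90% parallel work, 100 machines give less than 10x.
    print(round(amdahl_speedup(0.9, 100), 2))   # 9.17
```

Even with 90% of the work parallel, the speedup can never exceed 10, however many machines you rent.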

Amazon is creating a really amazing web services stack. These days Google is all the rage, but Amazon is building something much bigger than its e-commerce shop.

The service is currently in beta. I have subscribed and plan to test it. One thing is sure: it will provoke a lot of “creative destruction”, in Schumpeter’s sense.

Link to the documentation

Large Scale Distributed File System (3)

Tuesday, July 11th, 2006

This is the last part of my presentation on large scale distributed file systems. Here is the report (in French) summarizing everything we have said on the subject.

This presentation was made for a CNAM class and I wish to thank all the professors and the other students for their support and help.

Link

CORBA’s Errors

Tuesday, June 20th, 2006

Here is a must-read on the rise and fall of CORBA, with some great lessons on how not to create a useful and widely used standard. Written computer history is still rare, so this article is interesting.

Link

Large Scale Distributed File System (2)

Friday, June 2nd, 2006

This presentation is about Internet-based distributed file systems, something we think is going to be quite big in the near future. We are not talking about Google’s remote hard drive, nor MSN’s. We are talking about the next step of the storage industry.

Please have a look and enjoy it.

The last part to follow is the report on these two presentations; then the class will be over (and hopefully passed).

It was a lot of work for Ingrid and me, but it was fun. (All the documents are in French; if someone translates them, please send them back to me so I can put them online.)

PDF slides

Slides with comments

Reality-Check: Grid Computing

Sunday, May 28th, 2006

From Tim Bray, his own state-of-the-art analysis. Quite interesting, especially for someone like me with a strong focus on distributed computing.

I will comment on it later (probably after my exams; right now I have a lot of work).

Link

Interview of Amazon’s CTO

Tuesday, May 16th, 2006

ACM Queue has just released an interview with Amazon’s CTO, Werner Vogels. He previously worked at Cornell University, where he specialized in highly available (HA) and scalable systems (see his thesis work for more information). I read his blog regularly, since he addresses themes I work on.

He speaks on various subjects. Amazon seems like a great place to work. Should I send them my resume?

My favorite points (I defend them on this blog too, and I might have over-interpreted some of his quotes):

  • Growth. The goal of a new company is to grow (Amazon is still a new company). Either you eventually scale up, or you will be looking for another job.
  • Operations vs. development. In our fast-paced world, these two worlds should be the same. There should be no operations department whose job is to “manually” finish the applications developed in house. Most operations teams exist only because executives are afraid of code, and they are bound to disappear.
  • Amazon is a technology company. The latest generation of companies relies heavily on technology. They sell their technology as a service (Google, Amazon, eBay, …) rather than as shrink-wrapped software. They tend to see processes everywhere: after defining a process, they automate it. Human work comes into play only when a process fails, and the responsible process is then updated. It is a virtuous circle: an employee manages processes instead of doing a job. (Here, a process = a service or a composition of services.)
  • Distributed systems are the new norm. Distributed systems are moving out of research and are used more and more. Many companies still do not know how to use them efficiently, and SOA can help them improve. SOA is, in fact, distribution. So where are fault tolerance and replication? Can we manage them as services?
  • REST vs. SOAP. No one cares. SOAP and the WS stack are used with IDEs (which consume the WSDL file to generate the proxy automatically). REST is used with small libraries (PHP, Python). To create an open web service, you need to support both: they appeal to two different (and complementary) market segments.
  • Scaling needs careful planning. When scaling, you are less efficient than when not scaling. This idea holds both in computer science and in life. For instance, you can both work and manage five people. When managing two hundred people, most of your time goes into managing them rather than working, so you need to delegate. Since scaling takes resources, you need to plan it ahead and expect it to be resource-consuming.
  • Cost of messages. This one is fairly obvious. SOA creates a network-intensive computing environment: you will send many more messages to build a page, but over your LAN. This is fine, since messages are costly on a WAN or the Internet, not on a LAN. (To be exact, both are costly, but the difference in cost makes the LAN negligible.)
  • Use SLAs. One recurring question is: how do you measure performance? It depends on your goals. An SLA is a good way to formalize your goals in technical terms. It is how carriers work with their suppliers.
  • Link to his blog
    Link to his interview
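
The cost-of-messages and SLA points fit together in a back-of-the-envelope model. This is my sketch, not Amazon’s method: the 0.5 ms LAN and 100 ms WAN round-trip times, the 100 calls per page, the 1 ms of work per call and the 300 ms page budget are all assumed numbers, and the calls are assumed to run one after another. The function names are hypothetical.

```python
# Assumed round-trip times per message; real values vary widely.
LAN_RTT_MS = 0.5
WAN_RTT_MS = 100.0


def page_latency_ms(calls: int, rtt_ms: float, work_ms_per_call: float = 1.0) -> float:
    """Time to build a page from `calls` service calls made one after another."""
    return calls * (rtt_ms + work_ms_per_call)


def meets_sla(latency_ms: float, sla_ms: float) -> bool:
    """An SLA written as a plain latency budget: pass if we stay within it."""
    return latency_ms <= sla_ms


if __name__ == "__main__":
    sla_ms = 300.0  # hypothetical latency target for one page
    for name, rtt in (("LAN", LAN_RTT_MS), ("WAN", WAN_RTT_MS)):
        latency = page_latency_ms(100, rtt)
        print(name, latency, meets_sla(latency, sla_ms))
```

With these assumptions, the LAN page takes 150 ms and meets the budget, while the same page over a WAN takes about 10 seconds, which is the differential the cost-of-messages point describes.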

Large Distributed File System

Friday, March 31st, 2006

We have just given a talk in class at CNAM on distributed applications. Sadly, it’s all in French. The talk is about large scale distributed file systems such as SANs or GFS: we went from NFS to the Google File System.

If this subject interests you, please read the slides. I have left all the comments in, since my slides were mostly diagrams.

There will be another talk on Internet-based DFSs, for which we will design a naive DFS implementation.
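
The post does not describe our design, so here is only a hypothetical illustration of what a naive DFS can look like, loosely in the spirit of GFS: a master that keeps metadata only (which chunks make up a file and which servers hold them) and chunk servers that store replicated fixed-size chunks. GFS itself uses 64 MB chunks and three replicas by default; the 4-byte chunks and two replicas below only keep the example small. For brevity the master reads the data itself, whereas in GFS clients fetch chunks directly from the chunk servers. The class and method names are my own.

```python
from itertools import cycle


class ChunkServer:
    """Stores chunk bytes in memory; `alive` simulates a crashed machine."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.chunks: dict[str, bytes] = {}
        self.alive = True


class Master:
    """Keeps the metadata: which chunks make up a file and where they live."""

    def __init__(self, servers: list[ChunkServer], chunk_size: int = 4, replicas: int = 2) -> None:
        if replicas > len(servers):
            raise ValueError("not enough servers for the requested replication")
        self.servers = servers
        self.chunk_size = chunk_size
        self.replicas = replicas
        self.files: dict[str, list[tuple[str, list[ChunkServer]]]] = {}
        self._placement = cycle(range(len(servers)))  # round-robin placement

    def write(self, path: str, data: bytes) -> None:
        """Split data into chunks and copy each one to `replicas` servers."""
        layout = []
        for index, offset in enumerate(range(0, len(data), self.chunk_size)):
            chunk_id = f"{path}#{index}"
            start = next(self._placement)
            holders = [self.servers[(start + k) % len(self.servers)] for k in range(self.replicas)]
            for server in holders:
                server.chunks[chunk_id] = data[offset:offset + self.chunk_size]
            layout.append((chunk_id, holders))
        self.files[path] = layout

    def read(self, path: str) -> bytes:
        """Reassemble a file, reading each chunk from the first live replica."""
        parts = []
        for chunk_id, holders in self.files[path]:
            replica = next((s for s in holders if s.alive), None)
            if replica is None:
                raise OSError(f"all replicas of {chunk_id} are down")
            parts.append(replica.chunks[chunk_id])
        return b"".join(parts)


if __name__ == "__main__":
    servers = [ChunkServer(f"cs{i}") for i in range(3)]
    master = Master(servers)
    master.write("/notes.txt", b"hello distributed world")
    servers[0].alive = False  # one machine crashes
    print(master.read("/notes.txt"))  # b'hello distributed world'
```

Taking one server down loses no data, because every chunk has a second replica; taking down two of the three servers does.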

It is a lot of work, but really interesting.

Link (PDF) to the slides