Archive for the 'software' Category

How Vista Shipped?

Saturday, November 25th, 2006

A link to how the shut down menu was built in Vista… Funny as the story is, no organisation has yet managed to handle software projects of this size.

I wonder if building the pyramids was the same?

Link

An Excellent Introduction To Semantic Web

Sunday, November 19th, 2006

This post is an excellent introduction to the Semantic Web vision and the issues currently associated with it.

Link

Distributed Computing And Computer Languages

Sunday, November 19th, 2006

In this excellent post, Larry blogs about the tension between a language and its uses. He briefly describes the relationship between a language and its domain-specific attributes and points out how much effort has shifted from language development to frameworks. He contrasts Prolog with C to show how those languages address different problems. (By the way, have you ever seen Prolog code in production?)

I could not agree more with this, and with his point about the shift from single-core to multi-core CPUs. I would even add that OpenMP (and other frameworks) are clearly impractical: too complex to learn, to use and to debug. Nor do they address all the issues raised by distributed computing (e.g. which consistency model is needed?).

A language is a trade-off between specific use cases (e.g. embedded systems in Java) and broad abstractions (e.g. synchronized in Java). For instance, Erlang solves the threading issue but, for a lot of reasons, has not been embraced as a good general-purpose language.

No language yet offers powerful high- and low-level abstractions for managing multiple processors. You could use a framework (OpenMP) for this, but since it is a central feature of a modern language, it needs to be a language construct. That gives the compiler more information about the context and lets it build cleverer code.

Currently, I know only of Erlang, Java and Ada offering some sort of high-level concurrency management. But a developer is not an expert in distributed systems. Most patterns of distributed code are known (where to add a mutex, a guard condition, …) and could be added automatically by the compiler given the right language construct.
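To make "a mutex, a guard condition" concrete, here is a minimal sketch in Ruby of the boilerplate a developer writes by hand today. The `BoundedQueue` class is a hypothetical example of mine, not something from Larry's post; it only uses Ruby's built-in `Mutex` and `ConditionVariable`.

```ruby
require "thread"

# Hypothetical bounded queue. The lock and the two guard conditions below are
# the kind of well-known pattern a language construct could generate for us.
class BoundedQueue
  def initialize(capacity)
    @capacity  = capacity
    @items     = []
    @lock      = Mutex.new
    @not_full  = ConditionVariable.new
    @not_empty = ConditionVariable.new
  end

  def push(item)
    @lock.synchronize do
      # Guard condition: block while the queue is full.
      @not_full.wait(@lock) while @items.size >= @capacity
      @items.push(item)
      @not_empty.signal
    end
  end

  def pop
    @lock.synchronize do
      # Guard condition: block while the queue is empty.
      @not_empty.wait(@lock) while @items.empty?
      item = @items.shift
      @not_full.signal
      item
    end
  end
end
```

Every line of locking and waiting here follows from one declarative fact ("at most `capacity` items, never pop when empty"), which is the argument for moving it into the language.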

More to follow on this…

Link

New Google Service: Your Own Search Engine

Monday, November 13th, 2006

Google has just released a new service. It allows you to create your own search engine quite easily. Here is my test one.

This is technically impressive. I need to test it further though.

Link

Synchronising Two Machines Via Internet

Thursday, November 2nd, 2006

I am building a small Ruby script to teach myself a few things about the language and solve a current issue I have. Here are the notes about the application I am building. All comments are appreciated.

My Use Case
My company has provided me with a very cool laptop and lets me use Ubuntu Linux. However, it is really heavy (or I am not strong enough) and I don’t want to carry it back and forth between work and home (I would prefer to use my brain rather than develop my muscle mass).

I do not want both computers to be always on (a waste of energy). Instead, when I shut one of them down, I would like it to send all my updated data to the Internet, and when I boot one, I would like it to fetch all the new data from the Internet.

The obvious solution is to use SSH + rsync, but since I work in the corporate trenches I cannot count on anything except my faithful port 80.

My daily data update rate is pretty small. So S3 and its cheap hosting rates for small volumes ($0.15 per GB per month + $0.20 per GB transferred) might do the trick if and only if S3 is used as a pivot storage medium (a bus, actually). Only updated files will be put on S3, and they are deleted after consumption (yes, it is a bus).

The initial sync might be expensive but it is a one time expense.
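As a back-of-the-envelope check, the running cost can be sketched as below. The two rates are the ones quoted above. The function name, the 30-day month, and two assumptions are mine: that each update is transferred twice (one upload, one download), and that S3 never holds more than about one day of updates.

```ruby
STORAGE_USD_PER_GB_MONTH = 0.15
TRANSFER_USD_PER_GB      = 0.20

# Rough monthly cost of using S3 as a bus (hypothetical helper).
def monthly_cost_usd(daily_update_gb, days: 30)
  # Assumption: every update is uploaded once and downloaded once.
  transferred_gb = daily_update_gb * days * 2
  # Assumption: updates are deleted after consumption, so at most about
  # one day of updates is stored at any time.
  stored_gb = daily_update_gb
  stored_gb * STORAGE_USD_PER_GB_MONTH + transferred_gb * TRANSFER_USD_PER_GB
end

puts format("%.2f USD per month", monthly_cost_usd(0.05))
```

With 50 MB of updates a day, this comes to roughly 60 cents a month, almost all of it transfer.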

How to build it?
I don’t want to put all my data in S3 (mainly for cost reasons), only the updated data. The data will live on two different computers. Besides, since it is all my personal data, I would rather not overwrite new data with old, so extra care must be taken there. My two PCs will not be used at the same time. If two different update sets are sent at the same time, the application will quit.

We cannot count on the system clock, since the two computers can be in different time zones and could be out of sync. We could use S3’s time or a time server, but a logical clock is simpler and more efficient.

The granularity of this application is the file. An optimization could be applied later to send only the updated part(s) of a file. All updated data will be archived in one zip file.

S3syncer has two main actions: put and get

  • Get will fetch all newly updated data and apply it to the local computer.
  • Put will take all data updated after the last get operation and put it on S3.

Manifest
The manifest contains all the metadata of the zip archive. Since storage space is not an issue, it will be written in XML and put on S3 with the archive.

<manifest>
  <version></version>
  <logical_clock></logical_clock>
  <emitter>MAC address or any UUID</emitter>
  <new>
    <file name="…" path="…"/>
  </new>
  <updated>

  </updated>
  <deleted>
    <file name="…" path="…" archivedName="can be different if files have the same name"/>
  </deleted>
</manifest>

NB: splitting the changes into three classes of actions (new, updated, deleted) saves space.
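Generating such a manifest could look like the sketch below. It builds the XML by hand to stay dependency-free. The helper names are hypothetical, and I am assuming that `name` holds the file's base name and `path` its directory, which the notes above do not spell out. The `archivedName` attribute is left out of this sketch.

```ruby
# Escape the XML special characters so values are safe inside attributes.
def xml_escape(text)
  text.to_s.gsub(/[&<>"']/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
                            '"' => "&quot;", "'" => "&apos;")
end

# One <file/> element per path (assumption: name = basename, path = dirname).
def file_elements(paths)
  paths.map do |path|
    name = xml_escape(File.basename(path))
    dir  = xml_escape(File.dirname(path))
    %(    <file name="#{name}" path="#{dir}"/>)
  end
end

def build_manifest(version:, logical_clock:, emitter:,
                   new_files: [], updated: [], deleted: [])
  lines = ["<manifest>",
           "  <version>#{xml_escape(version)}</version>",
           "  <logical_clock>#{logical_clock}</logical_clock>",
           "  <emitter>#{xml_escape(emitter)}</emitter>"]
  { "new" => new_files, "updated" => updated, "deleted" => deleted }.each do |tag, paths|
    lines << "  <#{tag}>"
    lines.concat(file_elements(paths))
    lines << "  </#{tag}>"
  end
  lines << "</manifest>"
  lines.join("\n")
end

puts build_manifest(version: "1", logical_clock: 3, emitter: "laptop",
                    new_files: ["docs/notes.txt"], deleted: ["docs/old.txt"])
```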

File naming in S3
manifest<logical_clock>.xml
archive<logical_clock>.zip

Every field is self-explanatory. The logical clock lets a machine know whether some updates have been missed. Basically, it is incremented with each put. We have n -> n+1, where n and n+1 are logical timestamps and -> is the happened-before relation.

The logical clock is stored on each local computer and updated by a call to get. (It also makes it possible to retrieve several different updates.)

It goes from 0 to 7 and then wraps back to 0 (mod 8).
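A minimal sketch of the clock arithmetic and of the S3 file naming, taking the "(mod 8)" literally so that the values run from 0 to 7. The function names are hypothetical.

```ruby
CLOCK_MODULUS = 8

# Successor of a logical timestamp: n -> n+1, wrapping after 7.
def next_clock(clock)
  (clock + 1) % CLOCK_MODULUS
end

# Object names on S3, following the naming scheme above.
def manifest_name(clock)
  "manifest#{clock}.xml"
end

def archive_name(clock)
  "archive#{clock}.zip"
end

clock = 7
puts manifest_name(next_clock(clock))
```

One consequence of the wrap-around: the happened-before relation is only unambiguous as long as fewer than 8 updates are pending at once, which is fine for a bus that is emptied on every get.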

Configuration file
Since we are in Ruby, we could use the YAML serialization layer, but a simple Ruby class might be smarter, although less user-friendly than an XML file.

For the first version, a Ruby class will be fine.
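Such a configuration class could be as small as the sketch below. Every name and value in it is a placeholder I made up for illustration (bucket, directories, environment variable names); none of it is decided yet.

```ruby
# Hypothetical configuration for S3syncer: plain Ruby, edited by hand.
class S3SyncerConfig
  attr_reader :bucket, :sync_root, :emitter_id, :state_file,
              :access_key_id, :secret_access_key

  def initialize
    @bucket     = "my-sync-bucket"                  # placeholder bucket name
    @sync_root  = File.expand_path("~/data")        # directory kept in sync
    @emitter_id = "laptop"                          # MAC address or any UUID
    @state_file = File.expand_path("~/.s3syncer")   # logical clock and times
    # Assumption: credentials come from the environment, not from this file.
    @access_key_id     = ENV["AWS_ACCESS_KEY_ID"]
    @secret_access_key = ENV["AWS_SECRET_ACCESS_KEY"]
  end
end

config = S3SyncerConfig.new
puts "syncing #{config.sync_root} through #{config.bucket}"
```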

Get in more depth
Get works this way:

  1. Get the manifest file numbered self.logical_clock + 1 (if it does not exist: quit).
  2. Check that emitter != self (otherwise quit, since no update needs to be done).
  3. Get the archive.
  4. Store it in a temp directory.
  5. Apply the updates to the local FS.
  6. Delete the local archive. Keep the manifest and store the local times (to handle changes of time).
  7. Delete the files on S3.
  8. self.logical_clock++
  9. Check for the next manifest file.

NB: if the logical clock does not exist, it starts at 0.
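The get steps can be sketched as a loop. To keep the sketch runnable without network access, S3 is replaced by a plain Hash from object name to content, and a manifest is a Hash with an `:emitter` key rather than parsed XML. Both are stand-ins of mine, as is the `Getter` class itself. Applying the archive to the file system is reduced to recording it.

```ruby
# Hypothetical sketch of "get" against an in-memory stand-in for S3.
class Getter
  MODULUS = 8

  attr_reader :logical_clock, :applied

  def initialize(store, self_id, logical_clock = 0)
    @store         = store          # Hash: object name => content
    @self_id       = self_id        # this machine's emitter id
    @logical_clock = logical_clock  # starts at 0 when no clock exists
    @applied       = []             # archives applied, in order
  end

  def get
    loop do
      expected = (@logical_clock + 1) % MODULUS
      manifest = @store["manifest#{expected}.xml"]
      break if manifest.nil?                      # step 1: nothing to fetch
      break if manifest[:emitter] == @self_id     # step 2: our own update
      archive = @store["archive#{expected}.zip"]  # steps 3-4: fetch archive
      @applied << archive                         # step 5: apply (stubbed)
      @store.delete("manifest#{expected}.xml")    # step 7: clean up the bus
      @store.delete("archive#{expected}.zip")
      @logical_clock = expected                   # step 8: advance the clock
    end                                           # step 9: try the next one
    @logical_clock
  end
end

store = {
  "manifest1.xml" => { emitter: "desktop" }, "archive1.zip" => "update 1",
  "manifest2.xml" => { emitter: "desktop" }, "archive2.zip" => "update 2"
}
getter = Getter.new(store, "laptop")
getter.get
puts getter.applied.inspect
```

Because each consumed manifest is deleted before the clock advances, the loop ends as soon as the bus is empty, even with the clock wrapping around.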

Put in more depth
Put works this way:

  1. Find all data updated since the stored timestamp. How? Using file times… but this is error-prone.
  2. Check whether a manifest already exists on S3. If yes: quit, because this is an error.
  3. Create the manifest.
  4. Send it to S3.

NB: problems can arise if the system time is changed after the get. In that case some files can be missed. This will be detected by checking that no file has been updated in the 2 hours before the last get. If one has, we will resend all the files.
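A sketch of the put steps, with the same caveats as the rest of these notes: S3 is replaced by a plain Hash, the manifest is a Hash rather than XML, and the function names are hypothetical. Two things are my inference rather than stated above: the manifest is numbered clock + 1 (since get looks for that number), and put returns the new clock value for the caller to store.

```ruby
# Step 1: every regular file under root modified after the given time.
# FNM_DOTMATCH makes the glob include hidden files as well.
def changed_files(root, since)
  Dir.glob(File.join(root, "**", "*"), File::FNM_DOTMATCH)
     .select { |path| File.file?(path) && File.mtime(path) > since }
     .sort
end

# store is a Hash standing in for S3 (object name => content).
def put(store, root, since, clock, modulus: 8)
  files  = changed_files(root, since)                  # step 1
  target = (clock + 1) % modulus
  name   = "manifest#{target}.xml"
  # Step 2: an existing manifest means an unconsumed update, so quit.
  raise "conflict: #{name} already exists" if store.key?(name)
  store[name] = { files: files }                       # steps 3-4 (stubbed)
  target
end
```

Relying on `File.mtime` is exactly the error-prone part mentioned in step 1: it trusts the local system time, which is why the 2-hour check above is needed.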

Solve conflicts
Conflicts will be detected through the logical clock. When one is detected, the application will quit and the manifest file should be deleted.

The language used will be Ruby (mainly because I like it and I am still learning it), with s3.rb. If I am courageous enough, I will provide a Debian and an Ubuntu package.

Other uses

We can easily extend the system to support multiple computers and use it to sync a common directory (a little like Coda does).

References
http://townx.org/blog/elliot/thoughts_on_rsync_and_s3
http://www.amazon.com/gp/browse.html?node=16427261
http://search.cpan.org/dist/File-RsyncP/FileList/FileList.pm
http://samba.anu.edu.au/rsync/documentation.html

Master’s Thesis On WebFS

Tuesday, October 17th, 2006

As you may know, I have started working on my master’s thesis for my company. Thanks to this enlightened company, the produced software will be open source.

If you care to know what this thesis is all about, here is a quick document about the system I am designing and building. I am waiting for your feedback.

This document is highly unscientific and quite general but might allow you to understand better what I am writing and why I like it.

I will regularly update this blog with the status and deliverables of my thesis, so stay tuned… Even better, subscribe to my RSS feed.

Link to PDF

What Is Jackrabbit?

Sunday, October 1st, 2006

Jukka Zitting from the Apache Software Foundation has just written these slides explaining what Apache Jackrabbit is.

Not really technical, but good for newbies and non-technical people.

Link

Linux Becomes RT

Sunday, October 1st, 2006

This is important news. It should foster and ease embedded device development. It used to be quite expensive to use an OS for embedded/RT development (usually you need both).

Link

How A Market Works

Saturday, September 30th, 2006

In his excellent blog, Eric Sink gives us the example of a product he worked on. It shows how small players outsmart big ones, but how the big ones win in the end (making the smaller ones richer).

A must-read in this fantasy time.

Link

Fun To Look

Friday, August 25th, 2006

If you want to get a glimpse of the future of information management, you should click here.

Link (via O’Reilly Radar)