Archive for the ‘Uncategorized’ Category

UN data free and accessible — data.un.org

Saturday, February 23rd, 2008

At TED in 2007, Hans Rosling announced that the UN had agreed to make their data freely available. Previously it was paid-only.

Well, they have not only followed up with action, they have done it in style. They have just introduced a new website where you can query all the databases. It features a very nice Google-style search, with lots of ajaxy goodness to make finding data even easier. You can find it here: http://data.un.org/.

It is early days yet, and not all the data is there, but you can already do some interesting searches. Some rather peculiar ones too. For example, you can find out how many Internet users there were in the world from 1960 to the present. Here is the data for 1960: Internet Usage in 1960 - 30 years before Al Gore even invented it :-)

Mashups at the World Bank

Saturday, February 23rd, 2008

I have been interested for a while in the idea of the ‘corporate mashup’. What could be done with corporate data mashed up with things like Google Maps and other visualizations? I found this one today from the World Bank — summaries of its projects in a Google Maps mashup.

http://geo.worldbank.org/

It is also very good to see that these organizations are starting to open up access to this information, triggered in no small part by Hans Rosling at Gapminder.

Google Maps StreetView Privacy Issues

Friday, June 1st, 2007

This link illustrates some of the potential concerns with Google Maps' new StreetView.

Personally I like the approach of this guy — maybe it is true that in a world where we have ubiquitous information, anonymity comes with overflowing the buffers of the observers.

This is something that we have been coming to terms with for a while. Records such as court records that were once theoretically available freely were in practice quite hard to come by. You had to go to the court in question and apply for the paper copies. Now as they become available freely and electronically, there are new considerations.

For example, over on Groklaw the addresses and phone numbers of court staff and lawyers are available in the court records which are now available for easy download. This could be a serious issue in a controversial case.

I’m not sure what the long-term prospect is. I suspect that we will just become used to the fact that a lot of what we do is visible. The blogosphere proves that many of us are comfortable sharing extraordinary amounts of personal information with the world as a whole.

Interesting times.

Easter in Chicago

Wednesday, April 11th, 2007

I spent Easter in Chicago visiting friends - including Ajey and Shraddha and their son Arush. Photos on flickr: Easter in Chicago


No Bravery

Friday, February 10th, 2006

Says it all really -
a sad song for a sad time (Warning - video contains disturbing images)

James Blunt via the freeway blogger via onegoodmove.

Shared Memory in Teams

Sunday, January 22nd, 2006

A while ago I read The Tipping Point. There’s a section there on how partners share memory - two people who are in a relationship remember differently than single people.

The study cited is here:
Transactive Memory in Close Relationships

When reading it, you may note the Grossly Misleading Graph (Figure 1) - but apart from that the results are interesting, and seem fairly familiar. I guess we all know long-together partners who finish each other’s sentences, where one will delegate a question to the other saying the other “always remembers the birthdays” for example.

It makes sense that the same effect occurs in development teams after a while. We delegate memory - the team naturally spreads the knowledge around, and people end up with different levels of “expertise” in different parts of the system. We need to be aware of this when we talk about sharing the knowledge in XP - it means that we will never get complete knowledge transfer. It is normal healthy humanity that some people know more about a certain part of the code.

But that doesn’t mean that the attempt to break apart the “silo” and spread knowledge is a waste of time. Rather, it should inform the way we go about it. Basically, I think it means that we should strive to achieve a common understanding of the system rather than an equally detailed understanding of it. There should be plenty of people who “know where to look” as Vishy put it once.

As an analogy, one of that couple may do most of the cooking. That doesn’t mean that the other cannot cook, or doesn’t want to, it just means that they mostly don’t - maybe they just help out with the vegetables. Even so you can be sure that they both know which cupboard has the spices. Just as we try to avoid “silos” in development, we would be worried if the cook hides the spices, or locks the cabinet, or even worse, locks the door to the kitchen.

So I think it is unrealistic to expect everyone in the team to get to the point where they know all parts of the system equally well. Rather, make sure that the team has a common language and way of doing things - so they know where to look.

Grossly Misleading Graph

Sunday, January 22nd, 2006

From:
Transactive Memory in Close Relationships

Figure 1 from Transactive Memory in Close Relationships

This falls into a classic error in displaying data - note the y axis starts at 20, not zero. This magnifies the effect - it makes it look like the difference between the values is relatively much greater than it actually is.

Consider the Natural assigned and unassigned values on the right 2 bars. As displayed, it looks like the values are 24 and 32 (I am rounding to the nearest whole number - the paper does not specify the totals). Measured from the axis baseline of 20, the bars are 4 and 12 units tall, so from the graph it would seem that the ‘Assigned Expertise’ value is 4 / 12 as high as the ‘No Assignment’ value. That is, a glance at the chart would imply a 3-fold difference in recall between ‘No Assignment’ and ‘Assigned Expertise’, whereas the values themselves differ by a factor of 32 / 24, or about 1.33.
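The arithmetic can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this post, and the inputs are the rounded values of 24 and 32 read off the chart, with the axis baseline of 20.

```java
// Hypothetical helper: compares how two bars look on a truncated axis with the real values.
final class AxisDistortion {

    // Ratio of the drawn bar heights when the y axis starts at `baseline` instead of zero.
    static double apparentRatio(double larger, double smaller, double baseline) {
        return (larger - baseline) / (smaller - baseline);
    }

    // Ratio of the underlying values, which is what a zero-based axis would show.
    static double actualRatio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        // Rounded values read off the chart (assumed, the paper does not list them).
        System.out.println(apparentRatio(32, 24, 20)); // prints 3.0
        System.out.println(actualRatio(32, 24));       // roughly 1.33
    }
}
```

So the truncated axis turns a difference of about a third into something that looks like a factor of three.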

What are the real numbers? More like this:

Figure 1 with fixed Y axis

Still an interesting result, but nowhere near as much so as the original chart implies.

You see this a lot in charts during business presentations - it is a temptation to make a result seem as dramatic as possible, but it distorts the actual result, and may bias us towards a particular interpretation. I always look at any presentation chart to make sure that the axis starts at zero.

For more info on charts and best practices, see here:
EIA Guidelines for Statistical Graphs
The page concerning bar charts in particular is here:
Vertical and Horizontal Bars, Pie and Dot Charts, and Three-Dimensional Features

Here’s a comparison of the two graphs, both converted to the same format:

Figure 1 redrawn Figure 1 with fixed Y axis

Photos from Calgary and Banff

Friday, December 23rd, 2005

So last weekend Mojo, Ashish, Ben and I were all in Calgary for the TW Christmas Party. On the Saturday we took a car up to Banff for the day. Banff is about 2 hours outside Calgary, right in the Rockies. Here are some pictures from the trip. Click on the picture to see the whole set…

As you can see, Mojo in particular was having fun :-)

Acceptance Testing - XPToronto

Friday, December 16th, 2005

At the last XP Toronto we had a series of ‘Lightning Talks’. Basically 5 of us gave presentations on topics associated with TDD, each 5 minutes long and followed by 15 minutes’ worth of discussion.

My contribution was this talk on Acceptance Testing. It is a summary of what I think are the important trends and tools in approaching this topic.

XPToronto - Acceptance Testing Presentation (PDF)

The format is inspired by the now famous Identity 2.0 keynote given by Dick Hardt at OSCON. It was an interesting experience, and people said that they found the presentation style useful.

This style seems to lend itself well to a narrative kind of presentation, so mine became a bit of a story about the history of and approaches to Acceptance Testing, sort of a starting point for the conversation rather than a set of bullet pointed assertions. This is particularly good for a lightning talk, I think, in that we don’t have much time. I’m also not a fan of the bullet point, so it was a much better fit for the way I like to present.

There are some differences to a normal presentation though:

  • The narrative can’t be interrupted - there is no place to pause and ask questions, and no way to go off on a tangent if the audience wants to pursue a different trail.

  • The slides may not provide a useful reference for the participants. I am in two minds on this one - I think that pointers to more information are probably better than bullet point assertions anyway.

  • The medium may dilute the message. This is a problem with slideware in general, and it may actually be worse with this style until people get used to it. But I don’t know if this will be the case or not - the visual nature of the points may actually reinforce them and make it easier to recall parts of the argument. For example, after the presentation a few people said that they liked the ‘brick wall separating the testers from the devs’ in slides 14 and 52, and maybe that image will come to mind next time they are in a meeting between the two groups?

Overall, I like the approach. I try to avoid slideware in general, and when I have used it I tend towards a few slides, maybe some photos and lots of talking over bullet points. So I guess it is a good fit for me.

Q: When is an interface not an interface?

Friday, August 27th, 2004

A: When you can’t load it into the JVM without it hitting the DB

I am working on a large and somewhat aged app that has an interesting pattern.

Our services are Singletons (yes, I know, Singletons are Evil(TM)) and are implemented like this:

public interface Foo {
	public static final Foo INSTANCE = (Foo) Plugins.load(Foo.class);
	void doSomething();
}

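The Plugins class statically initialises the instance by looking it up in a Properties file, something like this:

public class Plugins {
	// Looks up the implementation configured for the given interface and instantiates it.
	public static Object load(Class clazz) {
		String implClassName = getProperty(clazz.getName());
		try {
			Class implClass = Class.forName(implClassName);
			return implClass.newInstance();
		}
		catch(Exception ex) {
			logger.log("Could not load class " + implClassName, ex);
			// Rethrow so the failure propagates; the method has no value to return here.
			throw new RuntimeException("Could not load class " + implClassName, ex);
		}
	}
}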

The properties file looks something like:

#interface=impl
com.demo.Foo=com.demo.impl.AConcreteFoo

The idea is that you can change the services depending on the location, server, whether you’re doing tests etc. So far, so nice.

But there’s a problem. Actually a few.

We also have a class that loads codes from a db table - through a statically loaded plugin. So what happens if I do something like:

public void testSomethingThatUsesAFoo() {
	Mock mockFoo = mock(Foo.class);
	...
}

Well, Foo’s class is loaded by the classloader. It goes off and runs all the static initializers. And one of those calls into another service, statically loaded and initialized, which loads the database, reads in all the codes, etc. So before you know it, just by importing an interface, you have a test that requires the real database to be present. You can’t run the unit tests without a db, you can’t take your laptop home and do some refactoring. Nasty.

Worse is the fact that the error that is thrown by the plugin loader is obscured by a zillion ClassNotFoundExceptions, since the Plugin load failure, being in a static initializer, means the JVM cannot load the class.
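A minimal sketch of this masking effect, with hypothetical names standing in for the real services. In this sketch the first use reports the real cause wrapped in an ExceptionInInitializerError, and every later use reports only a NoClassDefFoundError; the exact errors seen in a large app may differ.

```java
final class InitFailureDemo {

    // Hypothetical service whose static initializer fails, standing in for a plugin that needs the DB.
    static class BrokenService {
        static final Object INSTANCE = load();

        static Object load() {
            throw new IllegalStateException("database not available");
        }
    }

    // Touches BrokenService and reports the simple name of whatever Throwable comes out.
    static String touch() {
        try {
            return String.valueOf(BrokenService.INSTANCE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first use: the initializer runs and fails
        System.out.println(touch()); // later uses: the class is marked unusable, the initializer is not retried
    }
}
```

Only the first caller sees the original exception as the cause, so if that first stack trace is lost in the logs, everything after it looks like a class loading problem.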

Also, since the exception is not centrally caught, it is not possible to know when the system is in a fully configured and running state - there may be a Plugin that will not load until daily processing runs at 3AM next Wednesday - at which point that component fails in an isolated and undetected way. So the system keeps running while part of it is completely broken. Again - ouch.

From a testability standpoint, since the INSTANCE is defined as public static final, we can’t swap in a different implementation during test without running the tests in a different VM, with a different plugins.properties. Fine (in fact desirable) for Integration tests, but a real pain for unit tests.

So what’s the solution? We could just make each of the Foo users directly call (Foo) Plugins.load(Foo.class), but that doesn’t really help - it still means that the loading of services is dependent on random (or at least obscure) factors such as the order of class loading and static initialization. And it still doesn’t solve the random future failures problem.

The solution we have chosen is to move to a ServiceLocator pattern, backed by PicoContainer (Dependency Injection). We have a ready built pico configuration file in the properties file described above - basically the pico code to set the system up boils down to:

for each property in plugins.properties {
        pico.register( Class.forName(property.name), Class.forName(property.value) );
}
pico.start();

We will also be turning static initializers in the implementation classes into start() methods and externalizing their dependencies via their constructors.
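To make the idea concrete, here is a rough sketch of a locator in plain Java. It is a hand-rolled stand-in, not PicoContainer's API: the ServiceLocator class and its methods are hypothetical, and it assumes each implementation has a no-argument constructor. It is there to show the two properties we are after: everything is wired at startup, so a bad entry fails immediately, and a test can register its own instance.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical hand-rolled locator, standing in for the container.
final class ServiceLocator {
    private final Map<Class<?>, Object> services = new LinkedHashMap<>();

    // Instantiates every interface=impl pair up front, so a bad entry fails at startup
    // rather than when the class is first touched.
    static ServiceLocator fromProperties(Properties props) {
        ServiceLocator locator = new ServiceLocator();
        for (String interfaceName : props.stringPropertyNames()) {
            String implName = props.getProperty(interfaceName);
            try {
                Object impl = Class.forName(implName).getDeclaredConstructor().newInstance();
                locator.bind(Class.forName(interfaceName), impl);
            } catch (ReflectiveOperationException | ClassCastException ex) {
                throw new IllegalStateException("Cannot wire " + interfaceName + " to " + implName, ex);
            }
        }
        return locator;
    }

    // cast() throws ClassCastException if the impl does not implement the interface.
    private void bind(Class<?> type, Object instance) {
        services.put(type, type.cast(instance));
    }

    // Lets a unit test swap in a mock or stub without starting a second VM.
    <T> void register(Class<T> type, T instance) {
        services.put(type, instance);
    }

    <T> T get(Class<T> type) {
        Object service = services.get(type);
        if (service == null) {
            throw new IllegalStateException("No service registered for " + type.getName());
        }
        return type.cast(service);
    }
}
```

A caller would then ask for locator.get(Foo.class) instead of reading Foo.INSTANCE, and a unit test would call locator.register(Foo.class, someStub) before exercising the code under test.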

Should be interesting.