Prospective members discuss

March 28, 2009

In the members area of this blog, a discussion has begun about ontologies and their role.

The problem with ontologies is that they have to be formal and correct. As Gödel’s first incompleteness theorem says, a sufficiently expressive formal system cannot be both complete and consistent. There is a similarity with ontologies here.

One cannot use them to establish ‘observational’ relationships – such as the relationships between genes that always appear to be up-regulated together in high-throughput experiments, or between concepts co-occurring in the same sentence in text. Let alone hypothetical, inferred relationships, based on overlapping ‘concept-clouds’.

The Alliance aims to achieve a uniform data structure that makes it possible to establish, with different levels of certainty, relationships between concepts in a given domain (we start with biomedical). The idea of a single, overarching, monolithic ontology doesn’t fit this model. Instead, several ontologies must co-exist, and yes, if combined, they can yield contradictions.

The triple store that at least some Alliance members have in mind allows several ontologies at the same time, without them necessarily having to be consistent with one another. The resulting fuzziness does, of course, mean that there may be domains in the triple store that are less suitable for ontological inferences and even contain contradictions. But even then, they are of value in gaining insight into which elements of (biological) models are based on observations, where the theory is not yet ‘crystallized’.
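To make the idea a little more concrete, here is a minimal sketch in Python of a store that keeps statements from several sources side by side and merely reports where they disagree, rather than rejecting them. It is not the Alliance's design. The class names, the `kind` labels, the 0-to-1 `certainty` score and all the example statements (`gene_A`, `ontology_1` and so on) are assumptions invented for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """One triple plus where it came from (all fields are illustrative)."""
    subject: str
    predicate: str
    obj: str
    source: str       # hypothetical label for an ontology or data set
    kind: str         # "ontological", "observational" or "inferred"
    certainty: float  # assumed score between 0 and 1


class TripleStore:
    """Keeps every assertion; consistency is reported, never enforced."""

    def __init__(self):
        self._assertions = []

    def add(self, *fields):
        self._assertions.append(Assertion(*fields))

    def about(self, subject, predicate):
        """All assertions for a subject/predicate pair, from any source."""
        return [a for a in self._assertions
                if a.subject == subject and a.predicate == predicate]

    def contradictions(self, single_valued):
        """Find (subject, predicate) pairs with more than one distinct
        object, for predicates we choose to treat as single-valued."""
        grouped = defaultdict(list)
        for a in self._assertions:
            if a.predicate in single_valued:
                grouped[(a.subject, a.predicate)].append(a)
        return {key: found for key, found in grouped.items()
                if len({a.obj for a in found}) > 1}


store = TripleStore()
# Two hypothetical ontologies that classify the same concept differently.
store.add("gene_A", "is_a", "class_X", "ontology_1", "ontological", 1.0)
store.add("gene_A", "is_a", "class_Y", "ontology_2", "ontological", 1.0)
# An observational statement with a lower, made-up certainty.
store.add("gene_A", "co_expressed_with", "gene_B",
          "experiment_1", "observational", 0.7)

conflicts = store.contradictions({"is_a"})
for (subject, predicate), found in conflicts.items():
    for a in found:
        print(subject, predicate, a.obj, "according to", a.source)
```

The point of the sketch is that a contradiction becomes information in its own right: it marks an area where inference should be treated with caution, while the observational statement remains available alongside it.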

The difference with Cyc is that we integrate knowledge other than formal, ontological knowledge as well: observational and inferred, hypothesized knowledge. And this knowledge can be highly significant for decisions as to whether to accept or reject hypotheses, or in the process of improving the hypotheses for new experiments. Cyc is of course a great resource to use, in addition to all the ontologies constructed by NCBO and others.

Jan Velterop and Erik van Mulligen

Too much knowledge?

March 24, 2009

Inside Higher Ed published (March 23, 2009) a viewpoint by historian Ken Coates (University of Waterloo), describing the problems of keeping up with the scientific literature and information stream in general. The piece is not even called ‘Information Overload’, but ‘Knowledge Overload’. This is a view one encounters often. But ‘knowledge overload’ is a bit of a difficult notion for me. Though I do see the problem of being able to deal with all the information and knowledge that comes at us.

Instead of seeing the increasing amounts of knowledge available as a problem, we should start seeing it as a serious opportunity. But of course the tools to be able to navigate the ocean of knowledge need to be built. That’s one of the tasks we hope the Concept Web Alliance and its members can take on.

Jan Velterop

Invitation to members

March 16, 2009

Prospective members of the Concept Web Alliance are invited to click the ‘Members’ tab on this blog and comment and discuss the document displayed there. Many thanks.

Jan Velterop

Allowing knowledge to mushroom

March 14, 2009

I watched Tim Berners-Lee’s TED talk today, in which he calls for “Raw Data Now”. He also appeals for those data to be connected. That pretty much sums up the mission of the Concept Web Alliance, in my view. Connecting data. And not just data with data, actually, but also data with documents, data with people, people with documents, et cetera. We can treat documents, data, people, just about anything, as concepts, and then describe their connections in the form of triples. Creating a multi-dimensional web of knowledge. Or perhaps – since the picture we have of a web is often just two-dimensional – a ‘mycelium’ of knowledge (I’m taking some liberties here with Tim’s roots and flowers analogy).

The WWW works because in any web page we can embed links to any other web page, creating triples such as [web page A] > links to > [web page B].

The Concept Web will work when we have a widely adhered-to protocol to connect any concept with any other on the Internet in the form of triples, and, importantly, the tools to read and reason with large numbers of those triples, thus giving us the means to allow the buried ‘mycelium’ of concepts and their connections to mushroom into ‘ingestible’ knowledge.
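As a rough illustration of what 'reading and reasoning with' triples could mean at the smallest scale, the Python sketch below treats people, documents and data sets alike as concepts and searches for a chain of triples connecting two of them. The identifiers, the predicates and the `find_path` helper are all hypothetical, and a plain breadth-first search stands in for whatever reasoning tools the Alliance might eventually use.

```python
from collections import deque

# Hypothetical triples: (subject, predicate, object).
triples = [
    ("person:alice", "authored", "doc:paper_1"),
    ("doc:paper_1", "mentions", "concept:gene_A"),
    ("dataset:array_7", "measures", "concept:gene_A"),
    ("doc:page_1", "links_to", "doc:page_2"),
]


def build_index(triples):
    """Map each concept to the triples it takes part in, in either role."""
    index = {}
    for triple in triples:
        subject, _, obj = triple
        index.setdefault(subject, []).append((obj, triple))
        index.setdefault(obj, []).append((subject, triple))
    return index


def find_path(triples, start, goal):
    """Breadth-first search for a shortest chain of triples that
    connects start to goal, ignoring the direction of each triple.
    Returns a list of triples, or None if no connection exists."""
    index = build_index(triples)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        concept, path = queue.popleft()
        if concept == goal:
            return path
        for other, triple in index.get(concept, []):
            if other not in seen:
                seen.add(other)
                queue.append((other, path + [triple]))
    return None


path = find_path(triples, "person:alice", "dataset:array_7")
for subject, predicate, obj in path:
    print(subject, ">", predicate, ">", obj)
```

Here a person and a data set that are never directly linked turn out to be connected through a document and a shared concept, which is the kind of buried connection the 'mycelium' image refers to.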

Jan Velterop

Deploring or exploring?

March 3, 2009

When Homo sapiens was still in the early stages of his evolutionary development, he hadn’t yet figured out many other uses for water than to drink it. And perhaps to bathe and swim in it. This is conjecture, of course, but the earliest evidence of the use of boats, or even just rafts, dates from much later than the emergence of Homo sapiens, so assuming that he was just using water to drink may be an acceptable point of departure for my story.

Water is one of the most abundant resources on earth, but if you’re just using it to drink, you don’t get much of its potential out of it. When people invented rafts, and developed boats – probably in the form of dug-out logs – a whole new world, literally, opened up to them. All of a sudden they didn’t have to see expanses of water as impediments to getting to the other side, and once navigation was thus discovered, waterways and seas became the most important transportation routes, upon which empires were eventually built. The rest is history, to use a cliché.

There is something similar going on with the way we use information. The image that I have in mind is that there are virtually oceans of information available to humans, but that the only use we make of that information is ‘by the drink’ – by reading articles or bits of articles. That way, the knowledge contained in the ever growing seas of information (just think of the amounts of information coming out of, say, microarray experiments), is unlikely to come out in full. There remains an enormous amount of “unknown knowns” (apologies for using a Rumsfeldism) if we do not find a way to do more with information than read articles and books, or consult databases. We have to develop ways of extracting knowledge out of large amounts of information. Thousands of papers, and thousands of database entries. Or hundreds of thousands. We can’t read those. We have to invent the equivalents of rafts and boats to navigate information. And still read, but manageable amounts (after all, we still drink, too).

In whatever information navigation we already do, we stay very close to the coast, and only to the coasts we know. We search. And we pretend that we are navigating the vast expanse of knowledge that search capabilities on the internet have opened up. But are we? Is searching not a retrograde step in terms of knowledge discovery? Aren’t we inclined to search for knowledge and relations between bits of information we already know to exist? And so foster more homophily in the process than before, when large-scale search wasn’t yet possible? And stay in our knowledge comfort-zone. Look for confirmation rather than for falsification. We should give chance more of a chance. Serendipitous discoveries are, after all, the ‘stuff’ of which breakthroughs are made.

Some people deplore the fact that more and more information becomes available. They talk of information overload or overabundance. And if the only thing you can imagine doing with it is read (‘drink’), then you may have reason to be negative about it. If you think like this you may seek solutions in selection, in limiting access, in having the choices made for you. But if you can imagine truly navigating the ever growing seas of information, you will not deplore the abundance, but instead, start exploring it.

Jan Velterop

PS. This entry first appeared on 10 February 2009 on The Parachute, a blog devoted to Open Access.