Groups*

As a result of the Conference in New York in May, twenty working groups have been proposed. Each will determine the scope, deliverables and timetable for a specific topic of importance to the CWA, and will subsequently make detailed proposals on that topic.

For each group, a number of individuals who have already signed the CWA Declaration are being approached directly and invited to participate, but anybody interested in these groups is cordially invited to join (email: info@conceptweballiance.org). If you do not have their email addresses, please also use info@conceptweballiance.org to contact the Chairs of the working groups, and indicate the name of the Chair and the working group number in your email.

The proposed working groups are:

Category 1: Organizational and Operational
1.1    Governance (co-Chairs confirmed: Albert Mons/Jan Velterop)
1.2    Policies
1.3    Organizational structure (Chair confirmed: Albert Mons)
1.4    Legal and licensing
1.5    Capacity building (Chair confirmed: Albert Mons)
1.6    Commercialization / valorization (co-Chairs confirmed: Albert Mons/Jan Velterop)
1.7    Sustainability / public fundraising (Chair confirmed: Barend Mons)

Category 2: Scientific and Technical
2.1    Content capture (Chair confirmed: Marco Roos)
2.2    Tool development (Chair confirmed: Erik van Mulligen)
2.3    Storage and maintenance (Chair confirmed: Scott Marshall)
2.4    Unified persistent ID (Chair confirmed: Geoffrey Bilder)
2.5    Quality control (Chair confirmed: Christine Chichester)
2.6    Triple model / format (Chair confirmed: Andrew Gibson)
2.7    Attribution (micro- and nano-credits)
2.8    Content acquisition (Chair confirmed: Christine Chichester)
2.9    Triple Browser/reasoning (Chair confirmed: Erik Schultes)
2.10  Multi-linguality (Chair confirmed: Gerard Meijssen)

Category 3: Strategy/Advocacy
3.1    Scientific credibility (Chair confirmed: Antoine van Kampen)
3.2    ConceptWiki (Chair confirmed: Christine Chichester)
3.3    Advocacy (Chair confirmed: Jan Velterop)
3.4    Conference 2010 (Chair confirmed: Karin van Haren)

Proposals for additional working groups are also welcome; please email info@conceptweballiance.org.

——

Prospective members of the Concept Web Alliance are invited to comment and discuss the following draft:

The Concept Web Alliance and its relation to founding members

Summary

This summary describes the rationale and global scope of the Concept Web Alliance (CWA), a largely virtual organization that will eventually have establishments on all continents.

CWA is meant to provide a collaborative environment for academics from different disciplines, addressing the volume and complexity of current scientific data and communication. This initiative may eventually evolve into a public-private partnership supporting a global solution to the information overload that currently prohibits efficient innovation and knowledge discovery. The initial focus will be on highly complex biological data. The initial European focal point will be hosted by the Netherlands Bioinformatics Centre, and the USA focal point will be at Stanford University's Center for Biomedical Informatics Research, which hosts the National Center for Biomedical Ontology.

Rationale

  • Researchers and other knowledge-intensive Web users are not necessarily looking for information but for ‘actionable knowledge’
  • Traditional formats to capture complex information are no longer adequate to absorb, analyze and share the current scientific output
  • The (current) Text Web (text, databases and raw data repositories) has opened a ‘Can of Words’, plagued by ambiguity, redundancy, fragmentation and inaccuracies.
  • Data sources, particularly those outside the mainstream scientific literature and databases, are often not interoperable.
  • Stored information can most efficiently be turned into human, actionable knowledge when captured in a dynamic, non-redundant and unambiguous ‘ontological’ format, which we broadly refer to as the Concept Web, in which each node is a unique concept.
  • Multiple direct and indirect ‘associations’ need to be captured and represented between any pair of concepts in the Concept Web, with attributes and values attached; these associations are hereafter called ‘edges’. A concept-edge-concept combination is a ‘triple’, the smallest building block of the Concept Web.
  • Information silos, including the rapidly developing biobanks, must be made interoperable in order to make optimal use of the knowledge they contain.
  • Concept Web information formats should allow mind-machine interaction and foster intellectual networking and collaborative intelligence in support of ‘science 2.0’
  • The transition from text to new publication and reasoning formats should be coached by international thought leaders to ensure optimal formats for mind-machine interaction
  • Science management, both in funding institutions and in academic environments, needs interoperable tools and data formats to “reason” with all available data and so allocate resources efficiently.
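To make the triple terminology in the rationale concrete, here is a minimal Python sketch of a unique concept node and a concept-edge-concept triple that carries attributes and values on its edge. The draft does not prescribe a data model: the class names, the `C:0001` identifier format, the `co-occurs with` predicate and the `evidence` attribute are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A node in the Concept Web: one unique, unambiguous concept."""
    uid: str              # hypothetical identifier format, not a CWA scheme
    preferred_label: str


@dataclass(frozen=True)
class Triple:
    """concept-edge-concept, the smallest building block of the Concept Web."""
    subject: Concept
    predicate: str        # kind of association; the vocabulary is an assumption
    obj: Concept
    # Attributes and values attached to the edge, stored as (key, value) pairs.
    # A tuple keeps the triple immutable and therefore hashable.
    attributes: tuple = ()

    def attribute(self, key, default=None):
        """Return the value of one edge attribute, or default if absent."""
        for k, v in self.attributes:
            if k == key:
                return v
        return default


lion = Concept("C:0001", "Lion")
zoo = Concept("C:0002", "Zoo")
edge = Triple(lion, "co-occurs with", zoo,
              (("evidence", "sentence co-occurrence"),))

print(edge.attribute("evidence"))            # sentence co-occurrence
print(edge.attribute("source", "unknown"))   # unknown
```

Making both classes frozen means equal concepts compare equal and triples can be placed in sets, which is one simple way to keep a store non-redundant.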

Scope

  • CWA will be an academic think-tank to review, discuss, design and potentially certify solutions for the interoperability and usability of massive, dispersed and complex data, in concordance with, but going beyond, the current ‘Semantic Web’.
  • Its members will be interested and committed intellectuals who have their own academic, industrial or independent positions and work together for as long as they see fit to:
  1. Share thoughts, hypotheses, methods, tools and data to enable the Concept Web
  2. Identify, collect, analyze, reformat and share quality data sets
  3. Set up collaborative intelligence environments to manage complex data and information
  4. Use computational and community annotation approaches to validate and improve data
  5. Prototype and deploy technology to enable knowledge discovery in the Concept Web, including preconceived experimental in silico workflows
  6. Organize practical workshops and symposia on the topics of interest
  7. Act as an advocate for the necessary transitions.

General organization

  • CWA will be a not-for-profit organization with tax-exempt status in all regions, building largely on existing institutions. It will be organized so that funding can be routed through CWA to its partners and other contract research collaborators.
  • CWA will mainly serve as a coordinating and administrative body to further the stated goals.
  • From its inception, CWA will have a Board of Directors, elected by participating Founding Members.
  • CWA will eventually have at least one physical location on each continent, associated with a leading university or organization that is also a Founding Member.
  • Founding and Associate member organizations (the latter joining CWA after its initial establishment) will be allowed to grant eligible faculty members a formal CWA affiliation in addition to their university or institution affiliations.
  • The Board of Directors may, at its discretion, also grant formal affiliation to individual scientists whose home institution is not a CWA member.
  • A dedicated Web 2.0 community (with sub-working groups) will be formed in existing community software environments such as LinkedIn and Epernicus, and will be ‘semantically supported’ by Concept Web technology to form natural communities of expertise.

Specific activities

  • CWA offers advice to its partners, on request, on Open Source and Open Access issues
  • CWA will use its partners’ state-of-the-art (bio)informatics and semantic tools to achieve its content harmonization aims and to create the Concept Web
  • Where partners create unique ‘edges’ in the ‘triple index’ of the Concept Web, whether through proprietary (bio)informatics technology, through human curation of informatics output or through laboratory confirmation of associations between concepts such as protein-protein interactions, those partners retain the IP (mainly copyright) on the ‘triple collections’ they produce and maintain, but make them available in open access for non-commercial purposes. The Concept Web can thus be viewed as a large collection: the sum of all edges contributed so far between each pair of concepts.
  • The way in which the Concept Web can be stored, maintained, expanded, analyzed, used for reasoning and knowledge discovery and exchanged is the realm of the research and development activities of the Concept Web Alliance.
  • Individual data sources will be invited to ‘connect’ to the Concept Web through user-friendly interfaces, APIs or plug-ins, so that dispersed data sources can be concept-indexed and become interoperable via the triple index.
  • Everyone will have full open access to the common triple index for non-commercial purposes.
  • Tools that can efficiently operate on the Concept Web can be ‘Concept Web certified’.
  • The final aim is full interoperability of data sources and informatics tools across natural languages, to support knowledge discovery, description and exchange in science.
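The idea that the Concept Web is the sum of all edges contributed so far between each pair of concepts, while each partner keeps its own triple collection, can be sketched as a merged index that records who asserted each edge. This is a minimal sketch under stated assumptions: the function names, the `P:1`-style identifiers and the partner names are hypothetical, and licensing terms are not modelled.

```python
from collections import defaultdict


def build_triple_index(collections):
    """Merge partner triple collections into one index keyed by concept pair.

    collections maps a contributor name to an iterable of
    (subject_id, predicate, object_id) tuples. The contributors' own
    collections are left untouched; the index only records which
    contributor asserted which edge.
    """
    index = defaultdict(lambda: defaultdict(set))
    for contributor, triples in collections.items():
        for subj, pred, obj in triples:
            # Use an unordered pair so that all edges between two concepts are
            # stored together, whatever direction they were asserted in.
            pair = tuple(sorted((subj, obj)))
            index[pair][(subj, pred, obj)].add(contributor)
    return index


def edges_between(index, a, b):
    """Return every edge contributed so far between concepts a and b,
    mapped to the sorted list of contributors that asserted it."""
    pair = tuple(sorted((a, b)))
    # .get() avoids creating empty entries in the defaultdict for unknown pairs.
    return {edge: sorted(who) for edge, who in index.get(pair, {}).items()}


# Hypothetical example: two partners contribute overlapping edges.
collections = {
    "partner_a": [("P:1", "interacts with", "P:2")],
    "partner_b": [("P:2", "interacts with", "P:1"),
                  ("P:1", "interacts with", "P:2")],
}
index = build_triple_index(collections)
for edge, contributors in edges_between(index, "P:1", "P:2").items():
    print(edge, contributors)
```

An edge asserted by both partners appears once and lists both contributors, so the merged view stays non-redundant while provenance for each partner's collection is preserved.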

cwa-summary-15-3-09

CWA initiating group:
Amos Bairoch, SIB and UniProt (Switzerland)
Katy Börner, Indiana University (USA)
Carole Goble, University of Manchester (UK)
Henning Hermjakob, EBI (UK)
Mark Musen, NCBO and Stanford Biomedical Informatics group (USA)
Gert Jan van Ommen, LUMC and CMSB (Netherlands)
Abel Packer, Bireme and SciELO (Brazil)
Jan Velterop, Concept Web Alliance preparatory group (UK)
Mark Wilkinson, University of Vancouver (Canada)
Frank van Harmelen, Free University of Amsterdam (Netherlands)

Organising committee:
Albert Mons
Ruben Kok, Director NBIC (Netherlands)
Jacintha van Beemen

Support for the meeting and start up of CWA:
Netherlands Bioinformatics Centre, Netherlands
Bill Melton Foundation, USA
Centre for Medical Systems Biology, Netherlands

11 Responses to Groups*

  1. From what I can tell about the proposed alliance, it isn’t intended to be based on any particular technological implementation, but rather on the concept of certainty of identity of concepts? That is to be applauded.

    I am puzzled though. As I understand it, ontologies are currently most valuable at specific domain level (e.g. finance, or biological data), where there are clear boundaries/constraints, and less so if one attempts to join these all at the level of a universal or common foundation ontology.

    Yet CWA reads as though it has the potential to be another of the recurrent proposals for a “Foundation Ontology”, SUMO, or Cyc. If so, I would raise John Sowa’s three questions (in his ontolog forum post http://ontolog.cim3.net/forum/ontolog-forum/2009-02/msg00423.html):

    “Whenever anybody proposes a project to build a large formal ontology, I bring up the Cyc project….

    1. What are you proposing that is different from Cyc?
    2. What makes you think that you can be more successful than Cyc?
    3. Why don’t you start with Cyc as a foundation?”

    I think it would be useful to see CWA’s answers to that “Devil’s Advocate” position as an aid to understanding and promoting your intended work. If CWA is not intended to be that, then how is it constrained in scope?

  2. catherinelyons says:

    The CWA material I have read implies that the concepts (i.e., nodes) are a given: scientifically stable and conforming to technical standards; it is the edges that are in flux. In my understanding of the state of bioinformatics, the OBO initiatives have not arrived at that point yet. Is CWA proposing a social model that would motivate, and therefore hasten, full interoperability across all of the life sciences?

    Catherine Lyons
    Names for Life, LLC

  3. barendmons says:

    Let me react to Catherine’s and Norman’s comments in one response. Indeed, concepts in biology (the ‘nodes’) should be as well-defined and as stable as possible and it is the ‘edges’ that are in ‘flux’ (I like that term).
    I meet a lot of ontology people nowadays (I would not consider myself one), and it seems to me that they tend to think of ontologies as more or less static, curated ‘concept maps’ with curated ‘edges’. That, in my view, is the level at which the three questions are very pertinent. However, the world cannot be described in such an ontological format. Besides being unmanageable, even concepts are ‘fuzzy at the boundaries’ and may have quite different semantic environments across disciplines. That is why we have introduced two additional layers of triples: ‘observational’ connections (such as co-occurrence in a sentence or co-expression of genes in certain experiments) and even ‘hypothetical’ ones (inferred relationships that have never been made explicit, such as predicted protein-protein interactions or a statistical correlation between a mutation and a disease). My personal view (but again, CWA is not about anyone’s view in particular) is that sub-ontologies can be derived from the massive triple store in which these triples can ‘move freely’. A very formal ontology can be derived by selecting only curated triples of certain classes, and when reasoning is the goal, observational and hypothetical triples can be added. Also, given a certain domain, certain triples can be excluded. There can be a triple with a connection between the concepts ‘Lion’ and ‘Zoo’, which as such is perfectly ok. However, when an ontology of the migration of lions in Africa is made, that triple is probably quite out of context and can be removed by pre-filtering on attributes or manually.
    Hope this helps?
    Barend Mons
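The approach described in this reply, deriving sub-ontologies from one triple store by selecting triple classes and pre-filtering out-of-context triples, can be illustrated as a simple filter. In this minimal sketch the `kind` and `context` fields, the tag values and the toy triples are all hypothetical; the reply does not specify how triple classes or domain attributes are encoded.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    kind: str            # hypothetical class: 'curated', 'observational' or 'hypothetical'
    context: frozenset   # hypothetical domain tags used for pre-filtering


def derive_view(triples, kinds, exclude_contexts=frozenset()):
    """Select a sub-ontology: keep triples of the requested kinds and drop any
    triple tagged with a context that is out of scope for the domain."""
    return [
        t for t in triples
        if t.kind in kinds and not (t.context & exclude_contexts)
    ]


# Toy triple store for illustration only.
store = [
    Triple("Lion", "is a", "Mammal", "curated", frozenset({"taxonomy"})),
    Triple("Lion", "co-occurs with", "Zoo", "observational", frozenset({"captivity"})),
    Triple("Lion", "co-occurs with", "Serengeti", "observational", frozenset({"wild"})),
    Triple("Lion", "inferred association", "Savanna", "hypothetical", frozenset({"wild"})),
]

# A strict, formal view: curated triples only.
formal = derive_view(store, {"curated"})

# A view for reasoning about lions in the wild: add observational and
# hypothetical triples, but pre-filter the out-of-context 'captivity' edges.
wild = derive_view(store, {"curated", "observational", "hypothetical"},
                   exclude_contexts=frozenset({"captivity"}))

print([t.obj for t in formal])   # ['Mammal']
print([t.obj for t in wild])     # ['Mammal', 'Serengeti', 'Savanna']
```

The store itself is never modified; each view is just a different selection over the same triples, which matches the idea of triples that ‘move freely’ between sub-ontologies.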

    • I may be missing something. In the triple you use as an example: “triple with a connection between the concepts ‘Lion’ and ‘Zoo’, which as such is perfectly ok.” what is the middle term X (Lion – X – Zoo)? Are all triples simply of the form A “is connected to” B, or does the middle term have different values (A is part of B; A is superset of B)? How would “Lion X Zoo” relate to “Dodo X Zoo”?

      Can you point us to a concise paper describing the triple concept?

  4. Thanks, Jan, for asking me to join this exciting initiative. Obviously, this is a huge task.
    Let’s start by transforming peer-reviewed journals, adding machine-processable triples to article narratives. For example, in the long term, the Cochrane Collaboration should publish its systematic reviews in a machine-processable format.

    As an editor of an open access medical journal (http://www.jmir.org), I am also more than willing to participate in any experiments to publish articles with machine-processable triples (let’s start with randomized trials). We “just” need the language/ontologies, and applications that demonstrate that the additional effort required to add this information is worthwhile.

  5. Researchers not only need access to information/knowledge, but they also need “guidance” on how to
    a) select a research topic
    b) search the web
    c) analyze research results
    d) present the research results

    Would this be covered somewhere in CWA? It is very important to guide prospective researchers, and I am in the process of preparing a proposal on ICT Support for Health and Medical Research in India. Among the most important elements we have covered are access to information, the development of “research education tools” and a registry of research being conducted in India. I hope to be able to share this with the CWA members.

  6. Dan Bolser says:

    Sounds good, but what can it do for me?

    Concrete examples of specific use cases and ‘happy day’ scenarios are badly needed.

    WHAT HAS … “the transition from text to new publication and reasoning formats should be coached by international thought leaders to ensure optimal formats for mind-machine interaction” … EVER DONE FOR ME?

    I mean it’s a nice idea, but how is it practically relevant to scientific research right now?

    • Dan Bolser says:

      Some related questions:

      How is a ‘concept’ implemented technically? i.e. what is the data structure of a concept?

      Is a concept data only, or do concepts implement methods?

      How does this project relate to the various ‘minimal information for describing … x’ projects? For example, where people have set down standards for the ‘minimal information required’ to describe protein-protein interaction.

      If concepts are distributed and not federated, how will a concept store be located?

      Will it be possible to optimize queries over distributed concept stores?

  7. Dan,

    There may be examples of “happy day” scenarios that have not been recorded or blogged anywhere. We have to work towards a solution in which researchers can, without having to tax their brains, get properly analyzed information that helps them with their research. So let’s work towards that goal.

    • Dan Bolser says:

      Of course it’s good to have a clearly defined goal, but demonstrating the utility of the technology that has been inspired by that goal is a critical step before any new system of this kind can become accepted.

      You mentioned a need for clinical ontologies and support for reasoning over those with respect to health care informatics. Surely this is a well-defined field with clear requirements. What we need is to compare the potential of CWA to meet those requirements relative to existing technology (for example).

      To be clear, I really like the overarching vision that is described here, but I’m also aware that this is not the first proposal of this nature (so-called ‘expert systems’ for health care were very popular topics in the 1960s and 1970s, for example). The key point is, how will this system succeed where others have failed or only achieved marginal success?
