“Concept Web Alliance Hits Ground Running in Bid to Harness Semantic Web for Life Sciences”
Vivien Marx has written a detailed report about the CWA’s inaugural conference in New York. Her full story can be found on BioInform (May 15, 2009).
Here are some excerpts:
At the conference, which attracted around 60 attendees from research institutions, pharmaceutical companies, and publishers, the CWA’s 11 founders explained that the alliance aims to pull together existing academic projects, as well as seek new ideas and methods, to address the challenges associated with high-volume scholarly and professional data production, storage, interoperability, and analyses.
CWA founders said that they also plan to reach out to commercial entities, including publishers and software vendors, to assure access, reliability, acceptance, distribution, and growth strategies for the alliance, its methods, and technology.
While the CWA is initially focused on the life sciences, co-founder Barend Mons of the Erasmus Medical Centre in Rotterdam and the Leiden University Medical Centre said in his presentation that membership and involvement are open to other disciplines as well.
The CWA seeks to assure that “the data can talk to each other,” but the motivation is “down-to-Earth” and not intended to appeal to only “a few highbrow scientists,” Mons said.
“Ideally triples representing concept/relation-concept ‘facts’ of curated observational and hypothetical connections form a dynamic Concept Web,” the founding declaration states.
A triple consists of two concepts and an explanation of the link between them, much like a sentence with a subject, predicate, and object, CWA founders explained at the meeting.
Mons said that the concept web builds on the way “we essentially think as humans, by connecting concepts.” Each concept is “a unit of thought” and the edge is the relation between the two concepts, which will “ideally” carry unique identifiers so that computers can search them and humans can read them, he said.
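To make the subject/predicate/object idea concrete, here is a minimal Python sketch of a triple in which each concept, and the relation itself, carries an identifier. The report does not describe any CWA data format, so the class names and the `C:`/`R:` identifier scheme are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A 'unit of thought' with a unique identifier (hypothetical ID scheme)."""
    concept_id: str
    label: str


@dataclass(frozen=True)
class Triple:
    """Two concepts plus the relation (edge) that links them."""
    subject: Concept
    predicate: Concept  # the relation is identified too, so machines can search on it
    obj: Concept


def as_sentence(t: Triple) -> str:
    """Render the triple the way a human reader would say it."""
    return f"{t.subject.label} {t.predicate.label} {t.obj.label}"


# Hypothetical identifiers, for illustration only.
insulin = Concept("C:0001", "insulin")
lowers = Concept("R:0001", "lowers")
glucose = Concept("C:0002", "blood glucose")

fact = Triple(insulin, lowers, glucose)
print(as_sentence(fact))  # insulin lowers blood glucose
```

Making the classes frozen means identical triples compare equal and can be deduplicated in a set, which is a natural property for a store of "facts".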
The alliance is not trying to “make the ontology of the world,” Mons said, but instead seeks to capture concept relations. Triples can be curated, observed, or hypothetically derived.
Barend Mons’ brother, linguist Albert Mons, who is on the CWA organizing committee, told BioInform via e-mail that the concept web is “an interpretation and concrete implementation of the rather more generic notion of the semantic web.”
Semantically rich triples include facts assembled in curated databases such as Swiss-Prot, which are “as true as you can get in science”; assertions based on established experimental results or peer-reviewed journal articles; triples of concepts mined with the help of computational tools; triples based on new data; and hypothetical triples extrapolated from experimental or simulated data, Mons said.
Scientists can add insights they gather in the course of their research to the “triple store,” Mons said. As Jan Velterop said in his talk, publishers might find it useful to “tripletize” material or help create triple stores from material contained in journal articles.
In one discussion segment, Harvard Medical School systems biology researcher Walter Fontana said that triple stores might offer the chance for scientists to catalog and publish scientific results for experiments “that did not work out.” Velterop agreed that negative data are an “underused” data source.
The links or edges between concepts can be “richly annotated,” Mons explained: they can include time-date stamps that record the provenance of the data, additional facts that support or contradict a concept, or quantitative information, such as the number of co-occurrences of triples in the literature, to help scientists “build hypotheses on their computer screens.”
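The kind of edge annotation Mons describes could be modeled along the following lines. This is a hypothetical sketch, not a CWA specification: the field names, the free-text provenance labels, and the simple supporting-minus-contradicting tally are assumptions chosen for illustration, and “gene X” and “disease Y” are placeholders rather than real findings.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Evidence:
    source: str                 # hypothetical free-text provenance label
    asserted_at: datetime       # time-date stamp recording provenance
    supports: bool = True       # False when the evidence contradicts the triple


@dataclass
class AnnotatedTriple:
    subject: str
    predicate: str
    obj: str
    cooccurrences: int = 0      # e.g. how often the triple co-occurs in the literature
    evidence: list[Evidence] = field(default_factory=list)

    def add_evidence(self, source: str, supports: bool = True,
                     when: datetime | None = None) -> None:
        """Attach a timestamped piece of supporting or contradicting evidence."""
        stamp = when if when is not None else datetime.now(timezone.utc)
        self.evidence.append(Evidence(source, stamp, supports))

    def support_balance(self) -> int:
        """Number of supporting minus number of contradicting annotations."""
        return sum(1 if e.supports else -1 for e in self.evidence)


# Placeholder concepts; nothing here is a real finding.
t = AnnotatedTriple("gene X", "associated with", "disease Y", cooccurrences=12)
t.add_evidence("curated database entry")
t.add_evidence("peer-reviewed article")
t.add_evidence("unpublished negative result", supports=False)
print(t.support_balance())  # 1
```

Keeping contradicting evidence next to supporting evidence is one way negative results, like those Fontana mentioned, could be retained rather than discarded.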
CWA plans to advocate for efforts to develop and refine ways of harvesting triples as well as methods to reason over them, Mons said at the meeting.
Gert Jan van Ommen from Leiden University Medical Center said in his talk that the CWA might benefit from current projects that are aiming to overcome fragmentation of information in biobanks.
The rising interest in pharmacogenomics and the increase in resources being allocated to integrate biobanks across national, cultural, and language borders in Europe might serve as a use case for the application of triples to organize, classify, and link clinical and -omics data, he said.
“I think biobanking is one of the fields [where the CWA] should look at for applications,” van Ommen said. Storing data in triples would “significantly reduce the complexity” of tasks facing biobanks, he said, because concepts could be translated into numbers that would correspond to the same concepts in Finnish, Greek, or Italian, for example.
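Van Ommen's point about translating concepts into numbers can be sketched as a lookup table keyed by language-independent IDs. In this minimal Python example the numeric IDs and the hard-coded dictionary are hypothetical; a real deployment would rely on a shared concept registry.

```python
from __future__ import annotations

# Hypothetical numeric concept IDs with labels in several languages.
LABELS: dict[int, dict[str, str]] = {
    1001: {"en": "diabetes", "fi": "diabetes", "el": "διαβήτης", "it": "diabete"},
    1002: {"en": "heart", "fi": "sydän", "el": "καρδιά", "it": "cuore"},
}


def to_concept_id(label: str, lang: str) -> int:
    """Map a label in a given language to its language-independent ID."""
    for cid, names in LABELS.items():
        if names.get(lang) == label:
            return cid
    raise KeyError(f"no concept labelled {label!r} in language {lang!r}")


def to_label(cid: int, lang: str) -> str:
    """Render a stored concept ID in the reader's language."""
    return LABELS[cid][lang]


# A record entered in Italian is stored only as a number...
stored = to_concept_id("diabete", "it")
# ...and displayed to Finnish or Greek users without a translation step.
print(stored, to_label(stored, "fi"), to_label(stored, "el"))
```

Because the stored value is the ID rather than a word, supporting another language means adding labels, not re-coding existing records.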
Other speakers said that the Concept Web Alliance could also draw on existing efforts such as the open-access OmegaWiki, which provides lexical and ontological information on many of the world’s languages, and the BioExpert project, a pilot knowledgebase on the peroxisome whose information is linked graphically via semantic web architecture.
In a blog entry posted this week, John Wilbanks of Creative Commons noted that he signed the CWA declaration because he agrees with its “core ideas of cooperation and coordination of methods and infrastructure.” However, he urged the group to follow the model that made the World Wide Web successful, which he attributed to the fact that the Web was “technically open from the very start.”
“I think the CWA has the opportunity to be a tremendous public good,” Wilbanks wrote.
The World Wide Web Consortium has also formed a group tasked with looking into the application of semantic technology for life science research. The group, the W3C Semantic Web for Health Care and Life Sciences Interest Group, fully supports the CWA “and shares the spirit and intention” of its goals and welcomes its “new allies,” Scott Marshall, a researcher at the Free University of Amsterdam who co-chairs the interest group, told BioInform via e-mail.
The concept web and the semantic web are not “essentially different,” he said. The CWA is “an effective form of communication of the potential applications of the semantic web,” he added, underscoring that “the time is ripe to coordinate how we will share certain types of knowledge.”
Ruben Kok, managing director of the Netherlands Bioinformatics Centre, told BioInform at the meeting that the CWA approach is built upon the way research is communicated in the Netherlands. In a small country, he said, a networked approach to research is common.
As Kok told BioInform via e-mail, NBIC believes the CWA will “develop, advocate, and implement generic solutions for data interoperability, which we expect will greatly benefit the life sciences.” NBIC also believes the CWA needs to be established as an international organization, and it has not only taken the lead to “get this off the ground” but also plans to be “one of the driving forces for the next phase as well,” he said.
Triples can help overcome many barriers that hinder knowledge sharing, CWA co-founder Abel Packer told BioInform. He is director of Bireme, the Latin American and Caribbean Center on Health Sciences Information, headquartered in Brazil. “With triples you can easily transfer scientific knowledge from English to Spanish to Portuguese,” he said.
CrossRef, a cross-publisher membership organization that provides digital object identifiers to link scholarly articles across journals from different publishing houses, is also interested in the CWA because the organization pursues efforts to enable researchers “to easily identify and use trustworthy electronic content,” Geoffrey Bilder, CrossRef’s director of strategic initiatives, said in his presentation.
Current scientific publishing faces a “ridiculous situation” in which researchers take data, convert it into narrative, and then mine the corpus “with complex text-mining tools” to turn it into data again, Bilder said.
Triples can help circumvent this problem, he said, by linking objects and their properties. If two of those objects obtain URIs, “you have a database that transcends web sites,” Bilder said. Using RDF-enabled triples, both humans and computers will be able to mine scientific articles.
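Bilder's point that shared URIs turn separate sites into one database can be illustrated with plain Python sets. The article and author URIs below are hypothetical addresses under the reserved example domains; `http://purl.org/dc/terms/creator` is the Dublin Core “creator” property. The serializer emits only URI-to-URI triples in the W3C N-Triples line format and ignores literals and escaping, so it is a sketch rather than a full RDF implementation.

```python
from __future__ import annotations

# Hypothetical URIs published by two independent (example) sites.
DCT_CREATOR = "http://purl.org/dc/terms/creator"  # Dublin Core "creator" property
AUTHOR = "http://people.example.net/researcher/7"

site_a = {("http://journal-a.example.org/article/42", DCT_CREATOR, AUTHOR)}
site_b = {("http://journal-b.example.com/paper/1001", DCT_CREATOR, AUTHOR)}

# Because identity lives in the URIs, merging the two sources is a set union.
merged = site_a | site_b


def articles_by(graph: set[tuple[str, str, str]], author: str) -> list[str]:
    """Every subject whose creator is the given author URI, across all sources."""
    return sorted(s for s, p, o in graph if p == DCT_CREATOR and o == author)


def to_ntriples(graph: set[tuple[str, str, str]]) -> str:
    """Serialize URI-only triples as N-Triples lines (no literals, no escaping)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(graph))


print(articles_by(merged, AUTHOR))
print(to_ntriples(merged))
```

Neither site knows about the other, yet a query over the merged set finds both articles, because both point at the same author URI.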
CWA could help scientists practice “reading avoidance” — figuring out what they don’t have to read and “what they need to know,” Bilder told BioInform.
Going forward, he said it’s unlikely that authors will help publishers build triples from their texts and mark them up in RDF, but noted that it’s nevertheless “a great opportunity” for publishers, akin to creating a table of contents or an index.
Bilder explained that maintaining the scholarly record is also about “provenance,” understanding how information came to be and who created it, since trustworthiness tells researchers whether it is worth the risk to use the information and base further science on it.
Many publishers, librarians, and authors are interested in finding new ways to authenticate articles, to assure persistence of scholarly literature, or to disambiguate authors with similar names or track names that change over the course of careers. Using triples can help in this quest, he said.
Carole Goble’s slide presentation on Sustainability and Governance of the CWA is online.
The following webcasts were recorded during the proceedings on Friday, May 8. There are multiple recordings because of breaks due either to the meeting structure or to accidental stoppages, and unfortunately they do not cover the entire meeting. The nine recordings are listed in chronological order from the start of the meeting:
First webcast (Eric Siegel and Barend Mons)
Second (Barend Mons, continued)
Third (Geoffrey Bilder and Gert-Jan van Ommen)
Fourth (Gert-Jan van Ommen, continued; Jan Velterop; answers to questions from the webcast audience; and Mark Wilkinson)
Fifth (Gerard Meijssen)
Sixth (Andrew Gibson, Katy Wolstencroft, Marco Roos, and Scott Marshall)
Seventh (Barend Mons again)
Eighth (Carole Goble and Stephen Uzzo)
Ninth (Stephen Uzzo, continued, and Abel Packer)
6-8 May, 2009, New York, USA. Inaugural Meeting of the Concept Web Alliance. 6 and 7 May for ‘Magnet Group’ only; 8 May for all prospective members. Location: New York Hall of Science
See list of participants below.
Evening: Welcome cocktail reception (sponsored by the Netherlands Bioinformatics Centre)
06.00-06.30 pm – Welcome to all participants, distribution of the draft declaration
06.30-08.00 pm – Cocktail Reception with Passed Hors d’Oeuvres at NYHoS (Viscusi Gallery)
08.00-09.00 am – Arrival at NYHoS (Viscusi Gallery), registration and breakfast
09.00-09.15 am – Introduction by Day Chair: presentation of the results of the initiating group's discussions on May 7
09.15-10.45 am – Presentations by chairs and/or discussants of May 7 working groups (Chair: Barend Mons)
10.45-11.00 am – Coffee break
11.00-12.00 pm – General discussion with and feedback from participants (Chairs: Mark Wilkinson and Gert Jan van Ommen)
12.00-12.30 pm – Brief introductions to demos and prototypes
12.30-02.00 pm – Lunch, exposition and further demos (time for networking/“time to talk”)
02.00-03.30 pm – Commitment and sustainability (specific discussion on sustainability with goal to get consensus and commitment) (Chairs: Carole Goble and Jan Velterop)
03.30-04.00 pm – Coffee break, and filling out of questionnaires/comment forms
04.00-04.15 pm – Presentation “The Needs of Science Education” by Stephen Uzzo, NYHoS
04.15-04.30 pm – Closing statements
04.30-05.30 pm – Informal Networking
Location supported by New York Hall of Science
Catering and CWA start-up costs supported by NBIC, the Netherlands Bioinformatics Centre
The conference was attended by:
Abel Packer – Bireme, initiating group member
Barend Mons – Concept Web Alliance, initiating group member
Carole Goble – MyGrid, initiating group member
Frederique Lisacek – SIB (standing in for Amos Bairoch, initiating group member)
Gert Jan van Ommen – LUMC, initiating group member
Jan Velterop – Concept Web Alliance–Knewco, initiating group member
Katy Börner – Indiana University, initiating group member
Mark Musen – Stanford NCBO, initiating group member
Mark Wilkinson – SADI / iCAPTURE , initiating group member
Antoine van Kampen – NBIC, organizing committee
Ruben Kok – NBIC, organizing committee
Albert Mons – Concept Web Alliance, organizing committee
Jacintha van Beemen – Concept Web Alliance, organizing committee
Benjamin Good – University of British Columbia
Bill Melton – Melton Foundation, organizing committee
Stephen Uzzo – New York Hall of Science, organizing committee
Anders Söderbäck – Swedish National Library (Kungliga biblioteket)
Bruce Gomes – Pfizer
Rajiv Salimath – Melton Foundation, Knewco
Adam Bly – SEED Media
Andrew P. Gibson – University of Amsterdam
Andrew Su – GNF Novartis
Bruce Kiesel – Thomson Reuters
Chris Lawrence – New York Hall of Science
Chris C. Wood – Santa Fe Institute
Eero Vuorio – University of Turku
Eric Marshall – New York Hall of Science
Eric Siegel – New York Hall of Science
Erik A. Schultes – Hedgehog Research
Geoffrey Bilder – CrossRef
Gerard Meijssen – Open Progress
Herbert Gruttemeier – French National Institute for Scientific and Technical Information
Izja Lederhendler – NIH
Jan-Eric Litton – Karolinska Institute
Jeffrey Grethe – Neuroscience Information Framework
Jill Sorensen – Innovation Institute
John Wilbanks – Creative Commons
Joseph Jackson – Freedom of Science
Julie Steele – O’Reilly Media
Karin Wagemakers – Carelliance
Karsten Uil – Charta
Katy Wolstencroft – University of Manchester
Kei-Hoi Cheung – Yale University
Laura Fregonese – LUMC
Lisa Denissen-Sidebotham – Shearman and Sterling
MacKenzie Smith – MIT
Magali Roux – CNRS
Marcela Tello-Ruiz – CSH Reactome
Marcia Rudy – New York Hall of Science
Marco Roos – LUMC/University of Amsterdam
Martijn Schuemie – Erasmus Biosemantics Group
Martin Krallinger – Spanish National Cancer Centre
Mike Pollard – Discovery Logic
Misha Kapushesky – EBI
Naina Pandita – National Informatics Centre, New Delhi
Peter-Jan Roes – Charta
Peter Walgemoed – Carelliance
Richard Cave – PLoS
Richard Gallagher – The Scientist
Rick Johnson – SPARC
Rick Verhoeven – Consultant
Roy Kaplan – SEED Media
Scott Marshall – University of Amsterdam
Ségolène Aymé – Orphanet
Susanna-Assunta Sansone – EBI/ELIXIR
Timo Hannay – Nature
Veronica Olazabal – Rockefeller Foundation
Vivien Marx – Publisher
Walter Fontana – Harvard University
Wim van der Stelt – Springer
Ying Ding – Indiana University
Yoran Koren – Referata