Monday, March 9, 2009

What is curated data?

The term curated data simply means data that has been collected and organized under the supervision of one or more people considered qualified to do so. "Data" is meant loosely here and may include even the most sophisticated computational formats for structured information, media, and even knowledge. The curator may be a scientist or other expert in a relevant field, or merely someone acting in a clerical capacity who verifies that the data is "acceptable" according to the requirements of whoever commissioned the curation. The implication is that the resulting database (or data series or data set) is of high quality. The contrast is with data that has been gathered through some automated process or by low-skilled or unskilled workers, such that its quality is unverified and possibly unreliable.

An example of curated data is the extensive computable data supplied with Mathematica from Wolfram Research.
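As a rough illustration of the distinction between raw and curated data, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Record type, the reviewed_by field, the curate function, and the acceptance rule are hypothetical and are not taken from Mathematica or any real curation system.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Record:
    name: str
    value: float
    # Hypothetical field: who approved the record (None means unreviewed).
    reviewed_by: Optional[str] = None


def curate(records: List[Record],
           is_acceptable: Callable[[Record], bool],
           curator: str) -> List[Record]:
    """Keep only records that pass the curator's check, stamped with the curator's name."""
    curated = []
    for record in records:
        if is_acceptable(record):
            curated.append(Record(record.name, record.value, reviewed_by=curator))
    return curated


# Raw data: gathered without any verification.
raw = [
    Record("boiling point of water (C)", 100.0),
    Record("boiling point of water (C)", -5.0),
]

# The acceptance rule stands in for the judgment of a qualified person.
curated = curate(raw, lambda r: r.value > 0, curator="example curator")
```

The point is only that curated data carries some evidence that a responsible party looked at it, while the raw list does not.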

-- Jack Krupansky

Sunday, March 8, 2009

The DIKW hierarchy - Data, Information, Knowledge, Wisdom

The DIKW hierarchy is a rough model for relating data, information, knowledge, and wisdom. Granted, the model lacks scientific precision and may not have a lot of functional utility, among numerous criticisms, but I personally find that it helps to clarify what level of refinement and robustness we are dealing with.

There is implicitly a lower level beneath the DIKW hierarchy: the signal level, where we have raw sensor readings before they are formatted into data.

Data (or a data item) has little meaning directly attached to it. We have primitive data types such as integers, floating point numbers, character strings, boolean flags, etc., and we have streams of data. We speak of data values or the value of a data item.

Information takes data and applies structure and rudimentary meaning. We have records, structures, database tables, and other methods and mechanisms for organizing raw data items into somewhat abstract structures. These structures may be coupled with methods for manipulating the information and rules that constrain the information and structures or define relationships among subsets of the information. This is the meat and potatoes of computation as we know it today.
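To make the step from data to information concrete, here is a small Python sketch. The TemperatureReading type, its fields, and the station code are hypothetical names chosen for illustration; the sketch only shows one way bare values might gain structure, a constraining rule, and a method.

```python
from dataclasses import dataclass

# Data level: bare values with no meaning attached to them.
raw_values = [72.5, "2009-03-08", True]


@dataclass
class TemperatureReading:
    """Information level: values given names, types, a rule, and a method."""
    station: str
    date: str
    fahrenheit: float

    def __post_init__(self) -> None:
        # A rule that constrains the information: nothing is colder than absolute zero.
        if self.fahrenheit < -459.67:
            raise ValueError("temperature below absolute zero")

    def celsius(self) -> float:
        # A method for manipulating the information.
        return (self.fahrenheit - 32.0) * 5.0 / 9.0


reading = TemperatureReading(station="KNYC", date="2009-03-08", fahrenheit=72.5)
```

The number 72.5 in raw_values could be anything; inside the record it is a Fahrenheit temperature at a named station on a given date, which is about as far as the information level goes.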

Knowledge moves towards representation of meaning that may begin to approximate human knowledge. Knowledge has a meaning structure but may or may not be based on information structures as well. We may also approximate knowledge using semi-structured information.

Wisdom corresponds to judgment in the application of knowledge, but is not yet readily achievable in a typical computational environment.

This DIKW model is surely very limited and may not ultimately give us a lot of intellectual leverage, but it is a decent starting point.

I see the current Semantic Web as mostly focused on the information level, trying to give us the computational power at the Web level that we currently have within individual computers and individual applications.

The hope is that once we have mastered information at the Web level, maybe then we can layer knowledge on top of that. And then maybe wisdom can be laid on top of that. Or so the fantasy goes.

-- Jack Krupansky

Wolfram Alpha - computational knowledge engine

Wolfram Research (Stephen Wolfram) is on the verge of unveiling a new project called "Alpha" which is billed as a "computational knowledge engine." It combines the computational power of Mathematica with tools to "explicitly curate all data so that it is immediately computable" to be able to "take questions people ask in natural language, and represent them in a precise form that fits into the computations one can do" and "handle all the shorthand notations that people in every possible field use." Wolfram says:

... I'm happy to say that with a mixture of many clever algorithms and heuristics, lots of linguistic discovery and linguistic curation, and what probably amount to some serious theoretical breakthroughs, we're actually managing to make it work.

He does add the caveat that:

And -- like Mathematica, or NKS -- the project will never be finished.

But he triumphantly announces that:

... I'm happy to say that we've almost reached the point where we feel we can expose the first part of it.

It's going to be a website: With one simple input field that gives access to a huge system, with trillions of pieces of curated data and millions of lines of algorithms.

Having a simple Google-like search engine box is all well and good, but the real question is the extent to which the engine is "open", both in terms of programmatic API and Web Services access and integrating with external data.

How it compares with and meshes with the Semantic Web remains to be seen.

In any case, this does sound like a significant leap forward.

-- Jack Krupansky

Thursday, March 5, 2009

Check out Knoodl, which facilitates community-oriented development of OWL-based ontologies and RDF knowledgebases

This is mostly just a note to myself to look into Knoodl:

Knoodl facilitates community-oriented development of OWL based ontologies and RDF knowledgebases. It also serves as a semantic technology platform, offering a service based interface so that communities can build their own semantic applications using their ontologies and knowledgebases. Knoodl is a product of Revelytix, Inc. and is hosted in the Amazon EC2 cloud and is available for free.

According to their web site, Knoodl offers:

  • Cloud-based application (Amazon EC2)
  • Ontology editing
  • Ontology import/export
  • Collaboration
  • Role-based security
  • Scalable RDF store (Mulgara)
  • SPARQL endpoints (new)
  • SPARQL query wizard (March '09)
  • Ontology guided search (March '09)
  • Graphical ontology mapping wizard (March '09)
  • User designed widgets and gadgets for viewing data (March '09)
  • User designed widgets and gadgets for entering data and submitting queries (March '09)

They tell us that:

All content in Knoodl is organized into Communities. You can browse the list of Communities by clicking on the Community menu at the top of the screen and selecting Directory. Within Communities, there are regular Wikis and there are Vocabularies. A Vocabulary is a combination of an OWL based ontology editor and a wiki. Wikitext in Knoodl is not semantic, it is there to provide users with the ability to collaborate more effectively and add rich documentation. Each Vocabulary represents an ontology. Every resource (class, property, and instance) in the ontology has its own page in the Vocabulary.

To get started, you can take the tour and see how to get started, then dive in and check out some of the example vocabularies, and see what vocabularies people have already uploaded. Better yet, register for an account, create or join a community, and start contributing!

Sounds quite interesting.

One question I have: Is the "k" in "Knoodl" enunciated, or is it more like the silent "k" in "knowledge"? I am guessing that it is pronounced "noodle" rather than "ka-noodle", but who knows.

Hmmm... I wonder if any of the ontologies for vocabularies include pronunciations?! Anybody have an ontology for natural language speech, the spoken word?

-- Jack Krupansky