Tuesday, December 30, 2008

SKOS and term definitions

I had been considering the Simple Knowledge Organization System (SKOS) for representing the term definitions that I currently write on HTML Web pages. After reading about it, trying it, and following the discussion on the SKOS mailing list, I have to reconsider. SKOS is designed to publish knowledge hierarchies or thesauri, but it is not actually designed to be the underlying model for them. Most of my terms are distinct and unorganized, too new and fluid to be nailed down into a fixed hierarchy. Yes, I could use SKOS to publish them, but I would be better off finding a Semantic Web scheme for representing them at the source.

I want a way to create a Semantic Web resource whose sole purpose is to be the semantic anchor for a concept. Many terms may refer to that single concept, each of those terms may have many definitions, and those definitions may have many authors and sources. Rather than creating a single SKOS hierarchy, I want to work with concepts, terms, and definitions as independent entities. Yes, maybe they can eventually be collected into a SKOS list or hierarchy, but that is neither my initial nor my ultimate goal.

So, these are the four concepts I am looking at:

  1. A concept "anchor" resource. May also refer to another anchored concept to refine it or to combine multiple concepts. Some "concepts" may in fact be flagged as being domains, and concepts and domains can be linked.
  2. A term text resource. Refers to an anchored concept. May also link to a domain if no anchored concept exists yet. May also be a "link" term, which simply links to another term but with some changes applied.
  3. A term definition text resource. Refers to a term text resource, or even possibly multiple terms such as synonyms.
  4. A glossary resource. A list of term references.
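To make the four resource types concrete, here is a minimal in-memory sketch. Everything in it is hypothetical: the class names, fields, the `definitions_for` helper, and the example.org URIs are assumptions for illustration, not SKOS and not any existing vocabulary. It shows two synonymous terms anchored to one concept and sharing one definition, plus a minimal-glossary term that has text and a definition but no concept resource.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical in-memory model of the four resource types; every class, field,
# and example.org URI here is made up for illustration (this is not SKOS).

@dataclass
class Concept:
    uri: str                                          # the concept "anchor" resource
    is_domain: bool = False                           # flag: this concept is a domain
    refines: list[str] = field(default_factory=list)  # concepts it refines or combines
    domains: list[str] = field(default_factory=list)  # domains it is linked to

@dataclass
class Term:
    uri: str
    text: str
    concept: Optional[str] = None   # anchored concept URI, if one exists
    domain: Optional[str] = None    # domain URI, used when no concept exists yet
    link_to: Optional[str] = None   # for a "link" term: the term it points to
    changes: dict[str, str] = field(default_factory=dict)  # changes a link term applies

@dataclass
class Definition:
    uri: str
    text: str
    terms: list[str]                # one or more term URIs, e.g. synonyms
    author: Optional[str] = None
    source: Optional[str] = None

@dataclass
class Glossary:
    uri: str
    terms: list[str] = field(default_factory=list)  # references to term resources

def definitions_for(term_uri: str, definitions: list[Definition]) -> list[Definition]:
    """Collect every definition that refers to the given term resource."""
    return [d for d in definitions if term_uri in d.terms]

# Two synonymous terms anchored to one concept, sharing one definition.
anchor = Concept(uri="http://example.org/concept/42")
t1 = Term(uri="http://example.org/term/1", text="concept anchor", concept=anchor.uri)
t2 = Term(uri="http://example.org/term/2", text="semantic anchor", concept=anchor.uri)
d1 = Definition(uri="http://example.org/def/1",
                text="A resource that exists only to identify a concept.",
                terms=[t1.uri, t2.uri])

# A minimal-glossary term: text and definition, but no concept resource.
bare = Term(uri="http://example.org/term/3", text="island of excellence")
d2 = Definition(uri="http://example.org/def/2",
                text="A coherent, well-constructed concept mapping for one domain.",
                terms=[bare.uri])

glossary = Glossary(uri="http://example.org/glossary/1", terms=[t1.uri, t2.uri, bare.uri])
print([d.uri for d in definitions_for(t2.uri, [d1, d2])])  # ['http://example.org/def/1']
```

Because each resource only holds URIs of the others, concepts, terms, definitions, and glossaries can live in separate places and be authored independently.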

Although a term needs all of the first three to be complete, there is no requirement that all three be colocated or designed together. Terms for a minimal glossary may simply have term text and a definition, without a concept resource for the term.

A term definition can either be anchored to a specific term text resource or refer to the term by its text plus some context anchor (e.g., a domain). This supports terms that have distinct definitions in distinct contexts.
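The two ways of anchoring a definition can be sketched as a lookup that matches either a term resource URI or a (term text, context) pair. The `Definition` shape, the `lookup` function, and the finance/geography domain URIs are hypothetical; matching on exact term text is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch: a definition is anchored either to a specific term
# resource (term_uri) or to the term's text plus a context anchor such as a
# domain URI. Names and example.org URIs are assumptions for illustration.
@dataclass(frozen=True)
class Definition:
    text: str
    term_uri: Optional[str] = None
    term_text: Optional[str] = None
    context: Optional[str] = None

def lookup(definitions, term_text, context, term_uri=None):
    """Return definitions anchored to term_uri, or to (term_text, context)."""
    matches = []
    for d in definitions:
        if term_uri is not None and d.term_uri == term_uri:
            matches.append(d)
        elif term_text is not None and d.term_text == term_text and d.context == context:
            matches.append(d)
    return matches

FINANCE = "http://example.org/domain/finance"
GEOGRAPHY = "http://example.org/domain/geography"

definitions = [
    Definition("An institution that accepts deposits and makes loans.",
               term_text="bank", context=FINANCE),
    Definition("The sloping land alongside a river.",
               term_text="bank", context=GEOGRAPHY),
    Definition("A resource standing for a single concept.",
               term_uri="http://example.org/term/1"),
]

print(lookup(definitions, "bank", GEOGRAPHY)[0].text)  # The sloping land alongside a river.
```

The same text, "bank", resolves to different definitions depending on the context anchor, which is the point of allowing text-plus-context anchoring alongside direct term references.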

-- Jack Krupansky

Sunday, December 28, 2008

Concepts, islands, and archipelagos

The heart of a true semantic web is the concept or a collection of concepts. The heart of the Semantic Web is the resource, which is represented by a URI. A resource can be anything. It can be a document on the Web, a reference to a physical real-world object or phenomenon, or even an idea or abstract concept. The Semantic Web itself does not recognize concepts per se, only resources that users associate with concepts in their minds. So, if we want to refer to a concept in the Semantic Web, we need to assign it a resource and URI.

A given application domain, such as astronomy or auto repair or health care, would encompass a collection of concepts. The users of a given domain would need to agree on the terms to be used for the concepts and how they are mapped to resources and URIs. In other words, for a given application domain, the users share knowledge of the resource URI to be used for each concept in that domain.
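At its simplest, a domain's shared knowledge can be pictured as a table from the terms its users agreed on to concept URIs. This sketch is hypothetical: the auto-repair terms, the example.org URIs, the `concept_uri` helper, and the case-insensitive matching are all assumptions for illustration.

```python
from typing import Optional

# Hypothetical term -> concept URI mapping agreed on by the users of one domain.
# The domain, terms, and example.org URIs are made up for illustration.
AUTO_REPAIR: dict[str, str] = {
    "carburetor": "http://example.org/auto-repair/concept/carburetor",
    "brake pad": "http://example.org/auto-repair/concept/brake-pad",
    "timing belt": "http://example.org/auto-repair/concept/timing-belt",
}

def concept_uri(mapping: dict[str, str], term: str) -> Optional[str]:
    """Return the agreed URI for a term, or None if the domain does not map it."""
    return mapping.get(term.strip().lower())

print(concept_uri(AUTO_REPAIR, "Brake Pad"))  # http://example.org/auto-repair/concept/brake-pad
print(concept_uri(AUTO_REPAIR, "comet"))      # None
```

A term from another domain, such as "comet" from astronomy, simply has no mapping here; each domain's users control only their own table.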

Different domains may or may not have different users and they may or may not have different concepts, but the users of different domains are free to assign terms and concepts and resources and URIs differently than how they are assigned in other domains. Sometimes concepts in different domains will be distinct and separate and sometimes they will overlap or even be virtually identical.

Different organizations or groups may also have their own distinct concept and resource mappings for a given domain, so that there may be multiple mappings for the same domain concepts to different URIs.

There is no requirement that all concepts in an application domain be present in a given concept and resource mapping. Sometimes only a subset is needed. Sometimes representing all concepts would be impractical.

Granted, there are clearly benefits to agreeing to share concept and resource mappings for each domain, but there are sometimes benefits to having the freedom to exercise full control over the mapping.

Ultimately, each concept and resource mapping for a given domain is essentially an island in this "sea" called the Semantic Web. If coherent and constructed well, call it an island of excellence. Anyone can visit and utilize the resources of a given island, but only to the extent that they agree to accept the concept and resource mappings.

Each island is a land to itself, but sometimes it makes sense for two or more islands to interact and define and make use of shared concepts and resources. These distinct islands may not share and agree on all of their concepts and resource mappings, but enough to make a collaboration of some sort worthwhile. We can think of these collaborating islands as an archipelago.
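One way to picture an archipelago is as the subset of mappings on which its islands agree. In this hypothetical sketch, each island is a term-to-URI table, and two islands "agree" on a concept when they map the same term to the same URI. Keying agreement on term text is a simplifying assumption; real collaborations might instead declare URI equivalences (for example with OWL's `owl:sameAs`). The island contents and helper names are made up for illustration.

```python
# Hypothetical sketch: each island is its own term -> URI mapping for a domain.
# Islands "agree" on a concept when they map the same term to the same URI;
# keying agreement on term text is a simplifying assumption.
island_a = {
    "planet": "http://example.org/shared/planet",
    "star": "http://example.org/shared/star",
    "moon": "http://example.org/island-a/moon",
}
island_b = {
    "planet": "http://example.org/shared/planet",
    "star": "http://example.org/shared/star",
    "moon": "http://example.org/island-b/natural-satellite",
    "nebula": "http://example.org/island-b/nebula",
}

def shared_mappings(*islands: dict[str, str]) -> dict[str, str]:
    """Terms that every island maps to the very same URI: the archipelago's common ground."""
    first, *rest = islands
    return {term: uri for term, uri in first.items()
            if all(other.get(term) == uri for other in rest)}

def conflicting_terms(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Terms both islands map, but to different URIs."""
    return sorted(term for term in a.keys() & b.keys() if a[term] != b[term])

print(sorted(shared_mappings(island_a, island_b)))  # ['planet', 'star']
print(conflicting_terms(island_a, island_b))        # ['moon']
```

The islands keep their own mappings for "moon" and "nebula"; only the agreed subset is what their collaboration can rely on.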

There may be many archipelagos in the "sea" of the Semantic Web. Any number of them may also choose to share subsets of concept and resource mappings. Sometimes very large archipelagos will make perfect sense, while at other times smaller island groups or even single islands will. Sharing concepts and resource mappings can present many valuable opportunities, but sometimes sharing can be a significant burden or may not even be practical at all.

But unless each island is truly "excellent", connecting them together in a network would be futile.

In short, constructing a Semantic Web means carefully mapping concepts to resources, constructing domain islands of excellence, and then interconnecting those domain islands of excellence so that collaboration is enabled and empowered.

-- Jack Krupansky

Monday, December 8, 2008

XML Schema definition language

I am biting the bullet and diving deep to get a full handle on the XML Schema definition language, which defines the rules for writing schemas; a schema, in turn, defines how a particular kind of XML document is constructed. In other words, if you want to define a new XML document format, you need to construct a schema for it, and to construct a schema, you need to understand the rules of XML Schema. There are tools to simplify the task of constructing XML schemas, but for my purposes I really do need to become proficient, if not expert, in the finest details of XML schemas.
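A useful first observation is that a schema is itself an XML document whose vocabulary (`xs:schema`, `xs:element`, `xs:complexType`, and so on) lives in the XML Schema namespace, `http://www.w3.org/2001/XMLSchema`. The sketch below parses a tiny, hypothetical schema for a `note` format with Python's standard `xml.etree.ElementTree` and lists its element declarations. The `note`, `to`, and `body` names and the `declared_elements` helper are made up for illustration. The standard library does not validate documents against a schema; a third-party library such as lxml (its `XMLSchema` class) is needed for that.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The XML Schema namespace URI that schema vocabulary (xs:element, etc.) lives in.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A tiny hypothetical schema for a <note> document format; the element names
# are made up for illustration. Note that the schema is itself XML.
NOTE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def declared_elements(xsd_text: str) -> list[tuple[str, Optional[str]]]:
    """List (name, type) for every xs:element declaration, in document order."""
    root = ET.fromstring(xsd_text)
    return [(el.get("name"), el.get("type"))
            for el in root.iter(f"{{{XSD_NS}}}element")]

print(declared_elements(NOTE_XSD))
# [('note', None), ('to', 'xs:string'), ('body', 'xs:string')]
```

The `note` declaration has no `type` attribute because its type is given inline by the anonymous `xs:complexType`, which requires a `to` element followed by a `body` element.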

I am starting with the primer, XML Schema Part 0: Primer Second Edition, W3C Recommendation 28 October 2004. This is very dense material, but the primer includes lots of examples.

-- Jack Krupansky