Thursday, March 20, 2008

RDFa and microformats - embedding Semantic Web data within HTML Web pages

For all of its raw power, one difficulty with the Semantic Web and its RDF resources is that it is separate from the traditional Web of HTML pages that we all can view and interact with. The only way that you and I can readily access the Semantic Web is if some computer program does the access for us and maps between RDF and HTML or some other format suitable for a user interface. Now comes RDFa, which is an emerging standard for allowing RDF resources to be embedded directly in HTML (or XHTML) Web pages, a hybrid solution designed to provide the best of both worlds. Sometimes the term microformats is used loosely to refer to how structured data can be packaged and embedded in Web pages, although microformats are a separate convention built on ordinary HTML attributes such as class, and are neither limited to nor dependent on RDFa.
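
To make the idea of embedding concrete, here is a minimal sketch in Python. It assumes a made-up XHTML fragment (the example.org URL and the choice of Dublin Core property names are my own illustration) and reads only the about and property attributes. It is a toy reader for a tiny subset of RDFa, not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical XHTML fragment; the URL and property choices are illustrative.
SAMPLE = """
<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="http://example.org/post/1">
  <h1 property="dc:title">RDFa and microformats</h1>
  <span property="dc:creator">Jack Krupansky</span>
</div>
"""


class RDFaSketch(HTMLParser):
    """Collects (subject, property, text) from `about` and `property` attributes."""

    def __init__(self):
        super().__init__()
        self.subject = None      # most recent `about` value seen
        self.pending = None      # property name waiting for its text content
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        # The first non-blank text after a `property` attribute is its value.
        if self.pending and data.strip():
            self.statements.append((self.subject, self.pending, data.strip()))
            self.pending = None


def extract_statements(markup):
    parser = RDFaSketch()
    parser.feed(markup)
    parser.close()
    return parser.statements


for statement in extract_statements(SAMPLE):
    print(statement)
```

The same page still renders as ordinary HTML in a browser, while a program can pull out the statements, which is the "best of both worlds" point.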

The initial thrust of RDFa is for XHTML Web pages, but there is an effort underway for HTML web pages as well.

How it all plays out remains to be seen and the effort is still nascent, but it does show promise.

OTOH, even success on the RDFa/microformat front will not automatically bridge The Semantic Abyss, but it certainly does help to bridge the gap between the HTML world of the Web and the RDF world of the Semantic Web.

You can read the RDFa Wikipedia article.

You can read the Microformats Wikipedia article.

You can also read the W3C RDFa Primer - Embedding Structured Data in Web Pages.

-- Jack Krupansky

Danny Ayers' weekly update on the Semantic Web

Semantic Web expert Danny Ayers of Talis does a weekly roundup of news about the Semantic Web on the Nodalities - From Semantic Web to Web of Data blog. Alas, all of this info is great if you are a Semantic Web insider, but little of it really helps us to grapple with the chasm of The Semantic Abyss. OTOH, every little bit does help and some days there are actually breakthroughs even if only on a very modest scale.

So, if you want to take a peek behind the curtain of the Semantic Web, check out Danny Ayers' latest This Week's Semantic Web.

-- Jack Krupansky

Monday, March 17, 2008

Yahoo takes a run at the Semantic Web

A post on a Semantic Web email list just alerted me to a new initiative by Yahoo for "supporting semantic web standards." Now, the question is what that really means.

The article is on the BBC web site and is entitled "Yahoo makes semantic search shift."

The article references a recent post on the Yahoo Search Blog entitled "The Yahoo! Search Open Ecosystem." The post is little more than a teaser, but tells us:

The Data Web in Action
While there has been remarkable progress made toward understanding the semantics of web content, the benefits of a data web have not reached the mainstream consumer. Without a killer semantic web app for consumers, site owners have been reluctant to support standards like RDF, or even microformats. We believe that app can be web search.

By supporting semantic web standards, Yahoo! Search and site owners can bring a far richer and more useful search experience to consumers. For example, by marking up its profile pages with microformats, LinkedIn can allow Yahoo! Search and others to understand the semantic content and the relationships of the many components of its site. With a richer understanding of LinkedIn's structured data included in our index, we will be able to present users with more compelling and useful search results for their site. The benefit to LinkedIn is, of course, increased traffic quality and quantity from sites like Yahoo! Search that utilize its structured data.

In the coming weeks, we'll be releasing more detailed specifications that will describe our support of semantic web standards. Initially, we plan to support a number of microformats, including hCard, hCalendar, hReview, hAtom, and XFN. Yahoo! Search will work with the web community to evolve the vocabulary framework for embedding structured data. For starters, we plan to support vocabulary components from Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, and others based on feedback. And, we will support RDFa and eRDF markup to embed these into existing HTML pages. Finally, we are announcing support for the OpenSearch specification, with extensions for structured queries to deep web data sources.

We believe that our open approach will let each of these formats evolve within their own passionate communities, while providing the necessary incentive to site owners (increased traffic from search) for more widespread adoption. Site owners interested in learning more about the open search platform can sign up here.
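
As a rough illustration of what "marking up profile pages with microformats" might look like to a consumer of the data, here is a minimal Python sketch. The profile markup, the person, and the company are made up, and only two hCard property names (fn and org) are handled. I have no knowledge of how Yahoo or LinkedIn actually implement this.

```python
from html.parser import HTMLParser

# Hypothetical profile markup using hCard class names; the data is invented.
PROFILE = """
<div class="vcard">
  <span class="fn">Jane Example</span>
  <span class="org">Example Corp</span>
</div>
"""

HCARD_FIELDS = {"fn", "org"}  # a small subset of hCard property names


class HCardSketch(HTMLParser):
    """Maps recognized hCard class names to the text inside the element."""

    def __init__(self):
        super().__init__()
        self.current = None   # hCard field whose text we are waiting for
        self.card = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        matches = [name for name in classes if name in HCARD_FIELDS]
        self.current = matches[0] if matches else None

    def handle_data(self, data):
        if self.current and data.strip():
            self.card[self.current] = data.strip()
            self.current = None


def parse_hcard(markup):
    parser = HCardSketch()
    parser.feed(markup)
    parser.close()
    return parser.card


print(parse_hcard(PROFILE))
```

Note that the markup relies on nothing more than the ordinary class attribute, which is why microformats can be added to existing pages without new syntax.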

So-called microformats as well as RDFa are very useful technologies, but they do not automatically open up a vast new market. Google succeeded not because they identified a new technology that they convinced web page designers to use, but because they excel at "mining" existing web pages.

In my own view, the Semantic Web will not be ready for prime-time use by consumers until we have tools that can automatically annotate data with the necessary Semantic Web annotations, rather than each user having to do extra work on their own.

I wish the Yahooligans good luck and do sincerely hope that they make good progress, but they have a steep uphill climb ahead of them.

That said, this Yahoo initiative can only help to focus even more attention on the Semantic Web and to spur further innovation to make the Semantic Web ever-more pervasive. Only after businesses such as Yahoo push forward and show people how far we still have to go will people begin to recognize the vastness of The Semantic Abyss.

-- Jack Krupansky

The fable of the blind men and the elephant as applied to the Semantic Web

The Semantic Web is a very sophisticated animal, or "beast" if you will, an elephant in some sense. The level of sophistication is so extreme that even a lot of well-meaning software professionals cannot agree on what exactly the Semantic Web is and is not. This situation is in fact a great real-world example of the fable of the blind men and the elephant, with each participant focused on only their particular perspective and none able to sense the totality of the phenomenon they were experiencing.

And even if some guru were to inform the blind men of the holistic phenomenon that they were experiencing, albeit in their own disjoint fragments, none of them would be in a position to either comprehend or disagree with the holistic view.

Alas, this does appear to be the case with the Semantic Web. Personally I still believe that many of us professionals can actually make sense of the whole elephant, but the evidence to date does appear tilted against me at this moment.

So, what are some of the diverse perspectives on what the Semantic Web is or is not:

  • The Semantic Web is simply XML
  • The Semantic Web is simply XML "technologies", such as XML Schema
  • The Semantic Web is based on RDF
  • The Semantic Web requires RDF
  • The Semantic Web does not require RDF
  • The Semantic Web is based on XML Schemas
  • The Semantic Web is primarily about interfaces for Web Services
  • The Semantic Web is not primarily about interfaces for Web Services
  • The Semantic Web is exemplified by RSS/Atom, web feeds, and Web 2.0
  • The Semantic Web is not exemplified by RSS/Atom, web feeds, and Web 2.0
  • The Semantic Web is capable of representing human knowledge
  • The Semantic Web is not capable of representing human knowledge
  • The Semantic Web is capable of representing human meaning
  • The Semantic Web is not capable of representing human meaning
  • The Semantic Web is about meaning
  • The Semantic Web is about inference, not human meaning
  • The Semantic Web requires ontologies
  • The Semantic Web does not require ontologies
  • The Semantic Web is nothing without OWL
  • The Semantic Web is perfectly usable without OWL
  • The Semantic Web is well-suited for representing data from relational databases
  • The Semantic Web is ill-suited for representing data from relational databases
  • The Semantic Web can represent any form of data, including non-Web data
  • The Semantic Web is unable to represent non-Web data
  • And so on...

Over time I will seek to address all of these "perspectives", and more.

-- Jack Krupansky

Thursday, March 6, 2008

Computational Logic and Cognitive Science

As if to speak directly to the core meaning of this blog, there is an academic summer school coming up at the end of the summer entitled Computational Logic and Cognitive Science at Technische Universität Dresden:

The summer academy will focus on the long-lasting controversy of the relationship between modern formal logic (including its use for automated reasoning and computation) and, on the other hand, the rationality and common sense underlying human reasoning. Traditionally, a huge gap is perceived between the symbolic representation of knowledge used in modern logic and the sub-symbolic representation considered dominant in human reasoning. Psychological experiments of the past even suggested that people often don't reason logically and, in general, that logic seems to play only a minor role in human reasoning.  However, recently, new ways of explaining human reasoning seem to revive its relatedness to logic.  For this reason this summer academy attempts to bring together researchers from both sides for an exchange of views.

In short, we are making good progress at filling and bridging the semantic abyss, but if anybody thinks we are close to being "there" any time soon, they should guess again.


Call for Participation

   ICCL Summer School 2008
   Technische Universität Dresden
   August 24 -- September 6, 2008

-- Jack Krupansky

Tuesday, March 4, 2008

Vision for Factpad - A Consumer Tool for Storing Facts for the Semantic Web

I have just posted a description on one of my web sites for what I call Vision for Factpad - A Consumer Tool for Storing Facts for the Semantic Web, which is a summary of the kind of tool I think is needed to enable average consumers to actively participate in creating a truly consumer-centric Semantic Web. Put simply, there needs to be an easy way for even average consumers to create, manage, maintain, publish, and share relatively simple facts in the context of the global Semantic Web in a form that is independent of and not under the control of any vendor. No more walled-garden web sites.

The current description for Factpad is still quite rough around the edges, but I will be incrementally evolving it over time.

I currently have no intentions of turning the Factpad concept into a commercial product or service by myself, but I am hoping that others will read about my vision/concept and develop products and services along those same lines.

Please let me know if you run across similar efforts so that I can keep track of them.

-- Jack Krupansky

Sunday, March 2, 2008

The Semantic Web in a nutshell - RDF - heart of the Semantic Web

The heart and soul of the Semantic Web is RDF, short for Resource Description Framework, which is a language for describing entities, known as resources, in terms of their identity, their attributes or properties, and the relationships between these resources and classes of resources.

At its core, the Semantic Web, and hence RDF, is all about resources, which can be absolutely anything in the universe, either real or imagined. Typically a resource is either a class of resources or an instance of a class of resources.

Webs or networks of resources can be constructed and are known as RDF graphs. Many RDF graphs will be interconnected in larger webs or networks or graphs, but some may exist as smaller, isolated graphs.

Each resource is named with an identifier, which is a URI (Uniform Resource Identifier, which is very similar to a URL or Uniform Resource Locator), known as an RDF URI. An RDF graph refers to resources using RDF URI references.

The properties or attributes of resources and classes of resources are represented using a collection of statements about the resources, known as RDF statements. Relationships between resources and classes of resources are also represented using RDF statements.

Each RDF statement is represented in a form known as an RDF triple or simply triple. An RDF triple has three components: an RDF subject, an RDF predicate, and an RDF object. The subject and predicate are RDF URI references (a subject may also be a blank node, which is an unnamed resource), while the object may be an RDF URI reference, a blank node, or a literal value. Typically, the subject is the identifier for a resource, the predicate is the name of a property, and the object is either a reference to another resource or a property value.
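
A triple is easy to picture as a small data structure. The following Python sketch is only an illustration: the URIRef, Literal, and Triple classes are my own stand-ins rather than any particular RDF library's API, and the example.org identifiers are hypothetical.

```python
from typing import NamedTuple, Union


class URIRef(str):
    """A resource named by a URI reference."""


class Literal(str):
    """A plain property value such as a title or a date."""


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    obj: Union[URIRef, Literal]   # another resource or a property value


# Hypothetical identifiers for a blog post and its author.
POST = URIRef("http://example.org/blog#post1")
AUTHOR = URIRef("http://example.org/people#jack")
DC_TITLE = URIRef("http://purl.org/dc/elements/1.1/title")
DC_CREATOR = URIRef("http://purl.org/dc/elements/1.1/creator")

statements = [
    Triple(POST, DC_TITLE, Literal("The Semantic Web in a nutshell")),
    Triple(POST, DC_CREATOR, AUTHOR),
]


def describe(triple):
    """Render a triple and say whether its object is a resource or a value."""
    kind = "resource" if isinstance(triple.obj, URIRef) else "value"
    return f"{triple.subject} --{triple.predicate}--> {triple.obj} ({kind})"


for statement in statements:
    print(describe(statement))
```

The first statement gives the post a property value, while the second relates the post to another resource, which covers both uses of a statement described above.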

RDF statements are collected together to form an RDF document. Typically an RDF document will be stored on the Web and the URL for that document, known as an absolute URI, will be used as the base for the RDF URIs of all resources in that file.

Since even simple resources may require many statements and there may be many resources, software may access and pre-process a large number of RDF documents and store the pre-processed triples in a database known as a triple store. That is not a requirement for the Semantic Web, but it can dramatically improve performance.
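
To show why pre-processing helps, here is a minimal in-memory sketch of a triple store in Python. The TripleStore class, its method names, and the short prefixed names such as ex:post1 are all hypothetical; real triple stores are far more elaborate. The point is only that indexing triples once lets later lookups avoid rescanning every document.

```python
from collections import defaultdict


class TripleStore:
    """Keeps triples in memory, indexed by subject and by predicate."""

    def __init__(self):
        self._triples = set()
        self._by_subject = defaultdict(set)
        self._by_predicate = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self._triples.add(triple)
        self._by_subject[s].add(triple)
        self._by_predicate[p].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        if s is not None:
            candidates = self._by_subject.get(s, set())
        elif p is not None:
            candidates = self._by_predicate.get(p, set())
        else:
            candidates = self._triples
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("ex:post1", "dc:creator", "ex:jack")
store.add("ex:post1", "dc:title", "RDF in a nutshell")
store.add("ex:post2", "dc:creator", "ex:jack")

# Everything known about one resource, then every use of one property.
print(store.match(s="ex:post1"))
print(store.match(p="dc:creator"))
```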

Each resource will typically have a label in an RDF document. A reference to a label is known as a fragment identifier. Typically an RDF URI reference consists of the absolute URI for the RDF document followed by the fragment identifier for the specific resource.
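
The composition of document URI and fragment identifier can be sketched with Python's standard urllib.parse module. The document URI and the label "jack" below are hypothetical.

```python
from urllib.parse import urldefrag, urljoin

DOCUMENT_URI = "http://example.org/data/people.rdf"  # hypothetical document


def resource_uri(document_uri, label):
    """Build a URI reference from a document URI and a resource label."""
    return urljoin(document_uri, "#" + label)


def split_reference(uri_reference):
    """Split a URI reference into its document URI and fragment identifier."""
    document, fragment = urldefrag(uri_reference)
    return document, fragment


reference = resource_uri(DOCUMENT_URI, "jack")
print(reference)
print(split_reference(reference))
```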

Conceptually, RDF is independent of the lower level XML language, but in practice an RDF graph is most often serialized into XML (the RDF/XML syntax), although other serializations such as N-Triples exist.

Because URIs can be quite long and cumbersome, RDF statements typically utilize XML namespaces, which allow a short name to be used as a synonym for the full URI associated with the namespace within the scope of a resource in an RDF document file. XML supports a qualified name or QName, which permits the full RDF URI reference to be represented by a QName consisting of an XML namespace name and the name within the namespace. It is also possible to declare a default namespace so that even the XML namespace need not be specified on every URI reference.
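
The expansion of a short name into a full URI is simple string concatenation, as this Python sketch suggests. The rdf, dc, and foaf namespace URIs are the well-known ones, while the default namespace and the expand_qname helper are hypothetical. Real XML processors apply additional scoping rules that are ignored here.

```python
NAMESPACES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": "http://example.org/vocab#",  # hypothetical default namespace
}


def expand_qname(qname, namespaces=NAMESPACES):
    """Expand 'prefix:local' (or a bare 'local') into a full URI reference."""
    prefix, separator, local = qname.partition(":")
    if not separator:
        # No prefix given, so fall back to the default namespace.
        prefix, local = "", qname
    if prefix not in namespaces:
        raise KeyError(f"undeclared namespace prefix: {prefix!r}")
    return namespaces[prefix] + local


print(expand_qname("dc:title"))
print(expand_qname("foaf:name"))
print(expand_qname("Person"))
```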

People casually refer to the Semantic Web and resource properties as metadata, but that term is not a formal aspect of the Semantic Web.

There is a lot more to the Semantic Web than what I have said here and I may have oversimplified or misrepresented some aspects of the Semantic Web, but this should give you a good start at understanding what the Semantic Web is about.

In summary:

  • XML is the lower-level language in which RDF resources and graphs are represented as text documents
  • An RDF document contains the XML serialization of RDF statements about one or more resources
  • Every resource has a name represented as a URI
  • XML namespaces simplify references to resources
  • RDF statements (triples) specify the attributes and classes of resources
  • RDF statements specify the relationships between resources and classes of resources
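
The points in this summary can be tied together with a small Python sketch that reads triples out of an XML serialization using the standard xml.etree.ElementTree module. The document content and example.org URIs are hypothetical, and the reader handles only flat rdf:Description elements with rdf:about, which is a small fraction of what the RDF/XML syntax allows.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical RDF/XML document describing one resource.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/blog#post1">
    <dc:title>The Semantic Web in a nutshell</dc:title>
    <dc:creator rdf:resource="http://example.org/people#jack"/>
  </rdf:Description>
</rdf:RDF>
"""


def read_triples(rdf_xml):
    """Return (subject, predicate, object) tuples from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; join the parts
            # to recover the full URI that the namespace prefix abbreviated.
            predicate = prop.tag.replace("{", "").replace("}", "")
            target = prop.get(f"{{{RDF_NS}}}resource")
            obj = target if target is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


for triple in read_triples(RDF_XML):
    print(triple)
```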

-- Jack Krupansky

Twine - a new service based on Semantic Web technologies that helps users share, organize, and find information with people they trust

Twine, from Radar Networks, is billed as "a new service that gives users a smarter way to share, organize, and find information with people they trust." Radar states that "Twine is one of the first mainstream applications of the Semantic Web, or what is sometimes referred to as Web 3.0." Radar summarizes Twine as follows:

Twine Ties it all Together

Twine is a new service that intelligently helps you share, organize and find information with people you trust.

Share more productively. In Twine you can safely share information and knowledge, and collaborate around common interests, activities and goals. Twine helps you better leverage and contribute to the collective intelligence of your network. Use Twine to share more productively with friends, colleagues, groups and teams.

Get more organized. Twine provides one place to tie everything together: emails, bookmarks, documents, contacts, photos, videos, product info, data records, and more. And, because Twine actually understands the meaning of any information you add in, it helps you organize all your stuff automatically. Finally, you can search and browse everything and everyone you know, about anything, in one convenient place.

Find and be found. You are like a snowflake – you are totally one-of-a-kind. Twine recognizes what makes you special: your unique interests, personality, knowledge and relationships, to help you find and discover things, and be found by others, more relevantly.

Who is Twine For?

Friends. Colleagues. Groups. Teams. Anyone who needs help dealing with the growing array of information and relationships on the Web today. Whether you just need to organize and share with friends, or you need to collaborate better with teams, Twine provides the smartest way to tie it all together.

How Does Twine work?

We thought you'd never ask! Well, in a nutshell Twine uses the Semantic Web, natural language processing, and machine learning to make your information and relationships smarter. But if that's all Greek to you, just think of Twine as your very own intelligent personal Web assistant, working for you behind the scenes so you can be more productive.

Radar more tersely summarizes Twine as:

A revolutionary new way to share, organize, and find information.

Use Twine to better leverage and contribute to the collective intelligence of your friends, colleagues, groups and teams. Twine ties it all together.

As for which capabilities of the Semantic Web they actually use and how they use them, the details are not public, as far as I could tell from a brief scan of the web site.

There is an interesting article about Twine from the International Herald Tribune by Anne Eisenberg entitled "Twine ties things together on Web." It notes that:

Twine, at, can scan almost any electronic document for the names of people, places, businesses and many other entities that its algorithms recognize.

Then it does something unusual: it automatically tags or marks all of these items in orange and transfers them to an index on the right side of the screen. This index grows with every document you view, as the program adds subjects that it can recognize or infer from their context.

Customers have individual accounts on Twine's Web site, where they save URLs or other information. They can make their collections, or "twines," private, share them in groups with other members having common interests like politics or fashion, or even make the twines public.

The article notes that you can sign up for the free beta program, but that there is a waiting list of 30,000 people.

I would be interested in hearing any overall or detailed feedback about Twine.

A quick scan of did not reveal whether their Semantic Web resources (RDF files, schemas, etc.) are "open" and accessible to other Semantic Web software agents. Superficially, it sounds like yet another "walled garden" web site, but I simply do not know for sure.

In any case, Twine certainly does look like an interesting application of Semantic Web technologies.

-- Jack Krupansky

What is taking me so long?

I had hoped to have posted a lot more material to this blog by now, but in the interest of being as accurate as possible I have been reading more deeply about the underpinnings of the Semantic Web, and the more I read, the less I feel that I know the material well enough to blog about it credibly, causing me to read more, in a vicious cycle. But, I am making progress and I should begin to post more shortly.

Just to give you a feel for how deep this journey is taking me, take a look at my current placeholder for a Semantic Web glossary. I still need to understand the Semantic Web at a much deeper level before adding definitions for all of these terms, but the list already has 172 terms, acronyms, and key phrases, and I am just getting started.

As if I didn't already know, the Semantic Abyss is indeed quite deep. In some ways it is even deeper than I had imagined, but I do suspect that there are plenty of "shallows" and shortcuts that can and do enable some degree of practical alignment between the Semantic Web and the real world.

-- Jack Krupansky

Saturday, March 1, 2008

Semantic Web glossary

As my own explorations of the Semantic Web progress I will be incrementally building up a Semantic Web glossary. My glossary is currently only a placeholder, merely a list of terms and phrases used when writing about the Semantic Web, but eventually my intention is to flesh out meaningful definitions for all of those terms.

Please let me know of terms you feel I should add, as well as other Semantic Web glossary efforts.

See: Semantic Web glossary.

-- Jack Krupansky