Applying Semantic Web technologies to extract intelligence from Twitter data
Morton Swimmer, Senior Threat Researcher at Trend Micro, Inc., has an interesting slide presentation on using Semantic Web technologies to analyze Twitter data for "intelligence", particularly to identify malware threats. See "Twarfing: Gathering Intelligence from Twitter Data." The slides accompany a recent talk at the New York Semantic Web Meetup.
Tweets are analyzed, mapped into RDF, stored in an RDF quadstore, and then queried with SPARQL. His approach uses the SIOC (Semantically-Interlinked Online Communities), FOAF (Friend of a Friend), GeoOWL, and Dublin Core ontologies.
JSON and CouchDB are currently used to process tweets.
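To make the tweet-to-RDF step concrete, here is a minimal sketch of how a tweet stored as JSON might be turned into N-Quads ready for loading into a quadstore. It is not taken from the slides. It assumes a simplified tweet layout modeled loosely on Twitter API tweet objects (`id`, `text`, `created_at`, `user.screen_name`, `entities.urls[].expanded_url`), and the `http://example.org/twitter/` base IRI and the `tweet_to_nquads` function are hypothetical. The SIOC, FOAF, and Dublin Core term IRIs are the published ones.

```python
import json

# Published vocabulary namespaces.
SIOC = "http://rdfs.org/sioc/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base IRI for resources minted from tweets.
BASE = "http://example.org/twitter/"


def iri(value):
    # Wrap an IRI in angle brackets (no validation; sketch only).
    return f"<{value}>"


def literal(text):
    # Escape a plain string literal per N-Triples/N-Quads rules.
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def tweet_to_nquads(tweet_json, graph):
    """Hypothetical mapping of one JSON tweet to N-Quads lines in `graph`."""
    tweet = json.loads(tweet_json)
    screen_name = tweet["user"]["screen_name"]
    post = iri(f"{BASE}status/{tweet['id']}")
    user = iri(f"{BASE}user/{screen_name}")
    quads = [
        (post, iri(RDF_TYPE), iri(SIOC + "Post")),
        (post, iri(SIOC + "content"), literal(tweet["text"])),
        (post, iri(DCTERMS + "created"), literal(tweet["created_at"])),
        (post, iri(SIOC + "has_creator"), user),
        (user, iri(RDF_TYPE), iri(SIOC + "UserAccount")),
        (user, iri(FOAF + "accountName"), literal(screen_name)),
    ]
    # Each expanded link in the tweet becomes a sioc:links_to edge.
    for entry in tweet.get("entities", {}).get("urls", []):
        quads.append((post, iri(SIOC + "links_to"), iri(entry["expanded_url"])))
    return [f"{s} {p} {o} {iri(graph)} ." for s, p, o in quads]
```

The fourth element (the named graph) is what distinguishes a quadstore from a plain triple store. One plausible use, again an assumption, is to give each crawl batch its own graph so that provenance survives into the SPARQL layer.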
He mentions the "probable" use of Lucene in future work. A "cocktail napkin" block diagram includes Lucene, but it is not clear whether Lucene is part of the current architecture or only of a future design.
The presentation includes a couple of SPARQL examples of "patterns" that identify both users who promote malware sites and the sites themselves, based on past references to sites already identified as malware sites.
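The slides' actual queries are not reproduced here. The sketch below shows one hypothetical pattern of this kind. Users who linked to a known malware site are flagged as promoters, and the other sites those users linked to are flagged as suspects. The SPARQL version appears in the comment, and the Python function performs the same join over in-memory triples. It assumes tweets were mapped with `sioc:has_creator` and `sioc:links_to`. The function name and the known-malware set are illustrative.

```python
from collections import defaultdict

SIOC = "http://rdfs.org/sioc/ns#"

# Hypothetical SPARQL equivalent (not from the slides):
#   PREFIX sioc: <http://rdfs.org/sioc/ns#>
#   SELECT DISTINCT ?user ?site WHERE {
#     VALUES ?known { <http://bad.example/> }
#     ?p1 sioc:has_creator ?user ; sioc:links_to ?known .
#     ?p2 sioc:has_creator ?user ; sioc:links_to ?site .
#     FILTER (?site NOT IN (<http://bad.example/>))
#   }


def promoters_and_suspects(triples, known_malware):
    """Return (users who linked to a known malware site,
    other sites linked by those same users)."""
    creator = {}                 # post -> user
    links = defaultdict(set)     # post -> sites it links to
    for s, p, o in triples:
        if p == SIOC + "has_creator":
            creator[s] = o
        elif p == SIOC + "links_to":
            links[s].add(o)

    # Collect every site each user has linked to, across all their posts.
    user_links = defaultdict(set)
    for post, sites in links.items():
        if post in creator:
            user_links[creator[post]] |= sites

    promoters = {u for u, sites in user_links.items() if sites & known_malware}
    suspects = set()
    for u in promoters:
        suspects |= user_links[u] - known_malware
    return promoters, suspects
```

This is guilt by association. A "suspect" site is only a lead for further analysis, not a confirmed malware site.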
He also mentions the use of "text signatures" to identify similar references across a wide range of tweets.
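The summary does not say how the "text signatures" are computed. The sketch below is therefore only one simple possibility, an assumed technique rather than the author's. Each tweet is normalized by dropping URLs and @mentions, lowercasing, and stripping punctuation. The result is hashed with SHA-1, and tweets that share a hash are grouped. The function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^\w\s]")


def text_signature(text):
    """Hypothetical signature: SHA-1 of the tweet text with URLs,
    mentions, case, punctuation, and extra whitespace removed."""
    t = URL_RE.sub(" ", text)          # links vary per campaign; drop them
    t = MENTION_RE.sub(" ", t)         # so do the targeted @users
    t = NON_WORD_RE.sub(" ", t.lower())
    normalized = " ".join(t.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def group_by_signature(tweets):
    """Group (tweet_id, text) pairs by signature; keep only repeated ones."""
    groups = defaultdict(list)
    for tweet_id, text in tweets:
        groups[text_signature(text)].append(tweet_id)
    return {sig: ids for sig, ids in groups.items() if len(ids) > 1}
```

An exact hash of normalized text catches only templated messages that differ in links, mentions, case, or punctuation. Catching looser variations would need a fuzzier scheme such as shingling with MinHash.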
-- Jack Krupansky