Exploring the IBM Watson Concept Insights service using watsongraph
Netflix and Amazon are two examples of companies with famously precise "recommendation engines": their business models rely on being able to present users with attractive products related to the ones they have already viewed. Though each specific implementation differs, the core idea (providing a way to map objects to and from one another) is always the same.
The watsongraph Python library makes it easy to map text to concepts and
concepts to other concepts, freeing up those working with concept mining or relationship analysis to focus on
their actual datasets, not the analytical tools that they are using. To do this it uses the IBM Watson Concept
Insights service and IBM's wikipedia/en-20120601 cognitive graph, a
machine-learned model of the entire Wikipedia corpus, to construct flexible
ConceptMap objects amongst concepts of interest. These maps can in turn be measured for their relative
overlap, creating a flexible yet durable way to model the relationships within almost any dataset.
The advantage of doing things this way is that you don't need to sink any time into the development of any sophisticated probabilistic models: the system "just works" right out of the box. Creating relationships is as simple as:
    >>> from watsongraph.conceptmodel import ConceptModel
    >>> ibm = ConceptModel(['IBM'])
    >>> ibm.explode()
    >>> ibm.concepts()
    ['.NET Framework', 'ARM architecture', 'Advanced Micro Devices', ...]
    >>> len(ibm.concepts())
    37
    >>> 'Server (computing)' in ibm.concepts()
    True
    >>> ibm.augment('Server (computing)')
    >>> len(ibm.concepts())
    58
    >>> ibm.edges()
    [(0.89564085, 'IBM', 'Digital Equipment Corporation'),
     (0.8793883, 'Solaris (operating system)', 'Server (computing)'),
     '...' ]
And modeling users (or user agents) using this system is easy too:
    >>> from watsongraph.user import User
    >>> from watsongraph.item import Item
    >>> Bob = User(user_id="Bob")
    >>> Bob.input_interests(["Data science", "Machine learning", "Big data", "Cognitive science"])
    >>> meetup = Item("Meetup", "This is a description of a pretty awesome event...")
    >>> relay = Item("Relay", "This is a description of another pretty awesome event...")
    >>> Bob.interest_in(meetup)
    1.633861635
    >>> Bob.interest_in(relay)
    1.54593405
    # Update the "Bob" model to account for our new information on Bob's preferences.
    >>> Bob.express_interest(meetup)
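To give a rough feel for what "overlap" between two concept maps can mean, here is a deliberately simplified sketch. It is not watsongraph's actual scoring code: the `overlap` function, the dictionary representation, and the weights are all illustrative assumptions.

```python
def overlap(a, b):
    """Weighted overlap between two concept maps.

    Each map is assumed (for illustration only) to be a plain dict of
    {concept name: relevance weight}. Concepts present in both maps
    contribute the smaller of their two weights to the score.
    """
    shared = set(a) & set(b)
    return sum(min(a[c], b[c]) for c in shared)


# Made-up weights, purely for demonstration.
user_concepts = {"Machine learning": 0.9, "Big data": 0.8, "Statistics": 0.6}
item_concepts = {"Machine learning": 0.7, "Statistics": 0.9, "Cooking": 0.5}

# Only "Machine learning" and "Statistics" are shared, so only they count.
score = overlap(user_concepts, item_concepts)
print(score)
```

The more concepts two maps share, and the more strongly each side weights them, the higher the score; maps with nothing in common score zero.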
Try it out yourself!
Run pip install watsongraph, and take a look at the source repository on GitHub for more information.
Since this library talks to IBM's Bluemix cloud services, the only requirement for using it is that you will need
to generate an access token; see the documentation for more
information on that.
In this post I will focus on a related but sort of meta-analytical issue: parametrizing Concept Insights output. After all, in order to use a "black box" system you will need to have some sense of what kind of output it will give you for the domain you are applying it to. In order to demonstrate the range of such output I have visualized relative conceptual interdependence, as Watson sees it, of several small but nevertheless interesting datasets.
First up, here is the conceptual output for the seven planets. Line width and weight are scaled by the strength of the conceptual relationship, on a scale of 0.5 to 1. Hover over the individual nodes to get names, and over the individual edges to get relationship strengths.
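A minimal sketch of how data for a graph like this could be prepared from (strength, source, target) triples shaped like the ones edges() returned in the earlier example. The helper names `line_width` and `to_node_link`, the pixel range, and the node-link dictionary layout are assumptions made for illustration, not part of watsongraph.

```python
def line_width(strength, lo=0.5, hi=1.0, min_px=1.0, max_px=8.0):
    """Linearly map a relationship strength in [lo, hi] to a width in pixels.

    Values outside [lo, hi] are clamped to the nearest bound.
    """
    clamped = max(lo, min(hi, strength))
    return min_px + (clamped - lo) / (hi - lo) * (max_px - min_px)


def to_node_link(edges):
    """Convert (strength, source, target) triples into a node-link dict."""
    # Collect every concept that appears at either end of an edge.
    names = sorted({name for _, a, b in edges for name in (a, b)})
    index = {name: i for i, name in enumerate(names)}
    return {
        "nodes": [{"name": n} for n in names],
        "links": [
            {
                "source": index[a],
                "target": index[b],
                "strength": s,
                "width": line_width(s),
            }
            for s, a, b in edges
        ],
    }


# The two edges shown in the earlier ConceptModel example.
edges = [
    (0.89564085, "IBM", "Digital Equipment Corporation"),
    (0.8793883, "Solaris (operating system)", "Server (computing)"),
]
graph = to_node_link(edges)
```

The resulting dictionary can be serialized with json.dumps and handed to whichever force-graph renderer you prefer.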
The data here is kind of hard to interpret, but it does give us a sense of the scale of the relations that Watson tends to generate. Earth seems to be relatively unique, for a planet, having only relatively weak (<0.70) conceptual relatedness to the rest of the solar system. The same is true of Mercury. It's unsurprising that Watson highly correlates the gas giants. Somewhat more surprisingly Venus, thanks to its atmosphere, is also considered relatively highly correlated to them. The relationships amongst the rocky planets, meanwhile, are much weaker than those amongst the gas giants.
Next let's take a look at what Watson has to say about my top 30 contributions on Wikipedia in terms of edit volume. This time nodes are scaled in size and intensity against the number of edits (which is also visible in the tooltip):
It doesn't take very long to see that the vast majority of my most highly edited articles are very strongly correlated! The density of the cloud surprised me. But you can also see that Watson is silent on the connections of a number of points here, all of which were "one-off" diversions for me in terms of editing activity. As for Asinara: this is a small Italian island off the northwestern tip of Sardinia, while Ferdinandea is a seamount and former volcanic island claimed by Italy, which succeeds in conceptually "linking" the island back into the rest of the data.
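The unconnected points can also be picked out programmatically rather than by eye. The sketch below assumes edges arrive as (strength, source, target) triples and treats 0.5 as the cutoff below which a relationship is not drawn; the `isolated` helper, the cutoff, and the sample data are all hypothetical.

```python
def isolated(nodes, edges, threshold=0.5):
    """Return the nodes that have no edge at or above the threshold."""
    connected = set()
    for strength, a, b in edges:
        if strength >= threshold:
            connected.update((a, b))
    # Preserve the caller's node order in the result.
    return [n for n in nodes if n not in connected]


# Placeholder article names and made-up strengths, for demonstration only.
nodes = ["Article A", "Article B", "Article C", "Article D"]
edges = [
    (0.82, "Article A", "Article B"),
    (0.31, "Article B", "Article C"),  # too weak to count as a connection
]
print(isolated(nodes, edges))
```

Here "Article C" is reported because its only relationship falls under the cutoff, and "Article D" because it has no relationships at all.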
Here is probably the most interesting of the force graphs that I made. Here we measure the conceptual interdependence of the fifty largest US companies (taken from the Fortune 500 list). Nodes are colored according to industry.
The relationships here are very interesting, and I recommend playing around with the graph for a while to see what you can learn. The Verizon-AT&T-Comcast substructure is unique. Oil companies are considered to be very strongly related to one another, while technology companies and car companies are considered to be quite distinct (IBM itself is amongst the points floating around on the outside). General Electric and Boeing are also surprisingly unique (and what is this United Technologies company?). Fannie Mae and Freddie Mac are, unsurprisingly, placed very close to one another. And so on.
watsongraph is still a beta library and not quite ready for prime time: there are a number
of important structural improvements
that remain to be carried out to take it to its first full release,
0.3.0. But there are already
quite a few lessons to be learned from visualizing its output. And, ok, it's also just plain old good fun!