OnPoint by Keith Ng


Wolfram Alpha: Tech journos FAIL

Wolfram Alpha is not a search engine. The creator of Alpha, Stephen Wolfram, subtly hinted at this when he said:

“We are not a search engine.”

And in its FAQ:

“Is Wolfram|Alpha a search engine?

No. It's a computational knowledge engine: it generates output by doing computations from its own internal knowledge base, instead of searching the web and returning links.”

Not only are some tech journalists unable to semantically parse the sentence “not a search engine”, but they don’t seem to know what a search engine – arguably the most important invention of the internet since the internet itself – actually is.

A search engine uses “web crawlers” to visit every webpage it can access, sticks the information in a catalogue and allows users to search through it.
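For the code-minded, here is a minimal sketch of that crawl, catalogue, search loop. It assumes the crawling has already happened: the `PAGES` dictionary, its URLs and the function names are all made up for illustration, and a real engine does vastly more (ranking, for a start).

```python
from collections import defaultdict

# Hypothetical stand-in for pages a web crawler has already fetched.
PAGES = {
    "a.example/volcano": "volcano eruption history",
    "b.example/pets": "cat and dog pictures",
    "c.example/geology": "volcano rock and lava",
}

def build_catalogue(pages):
    """Map each word to the set of page URLs that contain it."""
    catalogue = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            catalogue[word].add(url)
    return catalogue

def search(catalogue, query):
    """Return the URLs containing every word in the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    # .get() avoids creating empty entries for words we never saw.
    hits = set.intersection(*(catalogue.get(w, set()) for w in words))
    return sorted(hits)

catalogue = build_catalogue(PAGES)
print(search(catalogue, "volcano lava"))
```

The point of the sketch is that the catalogue only knows which words appear where. It has no idea what any of them mean.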

Alpha is a closed semantic database. It is filled with entries that are semantically linked to each other. It has nothing to do with searching the web.

(Think of search engines as a crack team of hot-shit robot librarians. By night they go read through all the books, figure out what’s in each one, and write it all down in a central catalogue. By day, when you come in and ask them for information, they go through the catalogue, figure out which ones you might want, then dump a few million books on your lap. Oh, and the library is full of porn.)

Wolfram Alpha, on the other hand, is like an encyclopedia. It is a closed database. It has what’s inside, and it has references to source material, but the volume of knowledge that it holds is a tiny, tiny fraction of what’s available in the whole library.

A semantic search engine has the properties of both. It can scurry around the internet cataloguing and indexing information, and automatically discern semantic relationships based on the information it gathers. Alpha does not do this. It merely searches its own database using semantic relationships that are defined by human staff.

Treating Alpha as just an answer box (from Associated Press. FACEPALM.) misses the point entirely. It’s not a knowledge base for humans. (Most) humans understand semantic relationships the old-fashioned way. With words. Alpha barely does this. It gives you graph titles in lieu of explanations. Not so helpful for humans.

Alpha's promise lies in the fact that other computers can ask it questions and get answers that are meaningful to them; those machines can then use the answers and pass them along with the semantic relationships (i.e. what the answer means) intact.

It’s a knowledge base for machines. And it’s a really important building block to allow machines to understand things about the real world. This would, in turn and in time, allow us to understand more things about the world.

In theory, you should be able to ask a computer about global warming, and it will find that it is caused by greenhouse gases, which include carbon dioxide; at the same time, under entries for fossil fuel combustion, it will list carbon dioxide as an output, and it will link to the total amount of emissions resulting from fossil fuel combustion. With the semantic relationships joining all these facts together, a computer should be able to put them together and understand a link.
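A rough sketch of what "putting them together" could look like, assuming the facts are stored as (subject, relation, object) triples. The triples, the relation names and the helper functions are invented for this example; nothing here reflects how Alpha actually stores or queries its data.

```python
# Hypothetical facts as (subject, relation, object) triples.
TRIPLES = [
    ("global warming", "caused_by", "greenhouse gases"),
    ("greenhouse gases", "includes", "carbon dioxide"),
    ("fossil fuel combustion", "outputs", "carbon dioxide"),
]

def objects(subject, relation):
    """Everything that `subject` points at via `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

def subjects(relation, obj):
    """Everything that points at `obj` via `relation`."""
    return [s for s, r, o in TRIPLES if r == relation and o == obj]

def sources_of(effect):
    """Chain effect -> caused_by -> includes, then look up what outputs it."""
    found = []
    for cause in objects(effect, "caused_by"):
        for member in objects(cause, "includes"):
            for source in subjects("outputs", member):
                found.append((source, member, cause))
    return found

print(sources_of("global warming"))
```

Three separate entries, one chain of relationships, and the machine gets from global warming to fossil fuel combustion without anyone having written that sentence down.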

In theory. Right now, it’s kinda shit.

It knows that a World War II happened between 1939 and 1945, that an Adolf Hitler was involved and a Nazi Germany was involved. But it has no idea what Adolf Hitler did apart from the fact that he was a head of state. It doesn’t know what country Adolf Hitler was a head of state of, or whether Nazi Germany participated in any wars.

It doesn’t work because the data simply isn’t there. Its sum knowledge of WW2 is a start date, an end date, a dozen countries and a dozen heads of state. It’s not much.

Worse, not only does it not know much, but it doesn’t even know what it knows. It knows that WW2 involved Nazi Germany, but it doesn’t know that Nazi Germany was involved in WW2. This means that its understanding of its own semantic relationships is broken. Very bad.
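To make the failure concrete, here is a small sketch under the assumption that the fact is stored in one direction only. The relation names, the `INVERSE` table and both lookup functions are hypothetical, not anything Alpha exposes.

```python
# Hypothetical: the fact is recorded in one direction only.
FACTS = {("World War II", "involved", "Nazi Germany")}

# Hypothetical table saying which relations mirror each other.
INVERSE = {"involved": "was_involved_in", "was_involved_in": "involved"}

def holds_naive(subject, relation, obj):
    """Only finds a fact if it was stored exactly this way round."""
    return (subject, relation, obj) in FACTS

def holds(subject, relation, obj):
    """Also checks the mirrored fact via the inverse relation."""
    if (subject, relation, obj) in FACTS:
        return True
    inverse = INVERSE.get(relation)
    return inverse is not None and (obj, inverse, subject) in FACTS

question = ("Nazi Germany", "was_involved_in", "World War II")
print(holds_naive(*question))  # the one-directional lookup misses it
print(holds(*question))        # the inverse-aware lookup finds it
```

The fix is one lookup table and two extra lines, which is what makes the one-directional behaviour so worrying.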

Its natural language processing is also lousy, but NLP search is for chumps and charlatans (that’s the fancy name for “consultants”). NLP is useful for improving web crawling and indexing, but FFS people, it’s 2009 – users can make a goddamn search query without the aid of complex NLP algorithms.

On the bright side, Alpha works as a pretty awesome statistics calculator. It’s basically the Mathematica suite as an online tool, and knows things like the indefinite integral of the Fibonacci sequence. Which… pretty much means that it has virtually zero value for the average user in the short term. But when it actually works, and its database expands exponentially, and people start building tools to work with it, and it starts connecting with other services… then it’ll rise up and destroy us all.

Still not particularly useful for your average user, but more interesting, anyway.
