By Steven Levy and first posted on medium.com.
For the “Cliff Notes” version, check out this video by Rand Fishkin from MOZ.com.
“I need to know a bit about your background,” says Geoffrey Hinton. “Did you get a science degree?”
Hinton, a sinewy, dry-witted Englishman by way of Canada, is standing at a white board in Mountain View, California, on the campus of Google, the company he joined in 2013 as a Distinguished Researcher. Hinton is perhaps the world’s premier expert on neural network systems, an artificial intelligence technique that he helped pioneer in the mid 1980s. (He once remarked he’s been thinking about neural nets since he was sixteen.) For much of the period since then, neural nets — which roughly simulate the way the human brain does its learning— have been described as a promising means for computers to master difficult things like vision and natural language. After years of waiting for this revolution to arrive, people began to wonder whether the promises would ever be kept.
But about ten years ago, in Hinton’s lab at the University of Toronto, he and some other researchers made a breakthrough that suddenly made neural nets the hottest thing in AI. Not only Google but other companies such as Facebook, Microsoft and IBM began frantically pursuing the relatively minuscule number of computer scientists versed in the black art of organizing several layers of artificial neurons so that the entire system could be trained, or even train itself, to divine coherence from random inputs, much in a way that a newborn learns to organize the data pouring into his or her virgin senses. With this newly effective process, dubbed Deep Learning, some of the long-standing logjams of computation (like being able to see, hear, and be unbeatable at Breakout) would finally be untangled. The age of intelligent computers systems — long awaited and long feared — would suddenly be breathing down our necks. And Google search would work a whole lot better.
This breakthrough will be crucial in Google Search’s next big step: understanding the real world to make a huge leap in accurately giving users the answers to their questions as well as spontaneously surfacing information to satisfy their needs. To keep search vital, Google must get even smarter.
This is very much in character for the Internet giant. From its earliest days, the company’s founders have been explicit that Google is an artificial intelligence company. It uses its AI not just in search — though its search engine is positively drenched with artificial intelligence techniques — but in its advertising systems, its self-driving cars, and its plans to put nanoparticles in the human bloodstream for early disease detection. As Larry Page told me in 2002:
We don’t always produce what people want. That’s what we work on really hard. It’s really difficult. To do that you have to be smart, you have to understand everything in the world, you have to understand the query. What we’re trying to do is artificial intelligence…the ultimate search engine would be smart. And so we work to get closer and closer to that.
Google was already well along that path when Geoff Hinton made his breakthrough. Over the years, the company has been a leader in using a more traditional form of what is called machine learning to make its search engine smarter. Only a few years into the company’s history, it hired a group of AI-savvy engineers and scientists who jiggered the search engine to learn things like synonyms. When millions of users used a certain word interchangeably with another (dog or puppy, for instance), Google would quickly utilize that knowledge to understand queries better. And when Google took on the task of translating web sites to deliver results from sites in different languages, its scientists made use of a process that fed massive amounts of translated documents and their sources into the system. That way, Google’s search engine “learned” how one language mapped to another. Using that AI procedure, Google could translate web sites into languages not spoken by any of its engineers.
Deep learning is now viewed as a step beyond that more straightforward variety of machine learning. Since it is based on the architecture of the human brain, its adherents argue that, in theory, deep learning is the launch pad for computer-based feats of intelligence not possible — at least not easily—with previous approaches. That’s why Hinton’s breakthrough is so important to Google, as well as every other company dealing in search and related problems. Google has worked hard in the past few years to reshape its search engine to generate a conversational experience. But to truly attain the skills of even a very young human being, the frontiers of AI must be expanded, and Deep Learning is the tool du jour for accomplishing this.
Explaining the circumstances by which neural nets earned the sobriquet Deep Learning isn’t easy. But Hinton is game to try, though I felt I detected a hopeless sigh when he learned he was addressing an English major.
Neural nets are modeled on the way biological brains learn. When you attempt a new task, a certain set of neurons will fire. You observe the results, and in subsequent trials your brain uses feedback to adjust which neurons get activated. Over time, the connections between some pairs of neurons grow stronger and other links weaken, laying the foundation of a memory.
A neural net essentially replicates this process in code. But instead of duplicating the dazzlingly complex tangle of neurons in a human brain, a neural net, which is much smaller, has its neurons organized neatly into layers. In the first layer (or first few layers) are feature detectors, a computational version of the human senses. When a computer feeds input into a neural net—say, a database of images, sounds or text files—the system learns what those files are by detecting the presence or absence of what it determines as key features in them. For example, if the task was to characterize emails as either spam or legitimate messages, neural net researchers might feed the system many messages, along with the label of either SPAM or NOT_SPAM. The network would automatically intuit complex features of words (“Nigerian prince,” “Viagra”), patterns of words, and information in the message header that would be useful in determining whether a message should be labeled spam or not.
In early neural net experiments, computers were unable to design features by themselves, so features had to be designed by hand. Hinton’s original contribution was helping establish a technique called “back propagation,” a form of feedback that allowed the system to more efficiently learn from its mistakes and assign its own features.
“Back in 1986, when we first developed back propagation, we were excited by the fact you could learn multiple layers of feature detectors, and we thought we solved the problem,” says Hinton. “And it was very disappointing that we didn’t make huge breakthroughs in practical problems. We were completely wrong in our guess about how much computation was needed and how many labeled examples were needed.”
But even though many researchers had lost faith in neural nets over the years, Hinton felt strongly that they would eventually be practical. In 1995, he and his students tried losing the labels, at least in the earlier parts of the learning process. This technique was called “unsupervised pre-training.” meaning that the system figures out how to organize input on its own. But Hinton says that the real key to making this work was a mathematical trick, an approximation that saved computation time as the information moved through the layers of neurons — this allowed for many more iterations to refine the network. As often happens, speed becomes transformative, in this case making it possible to perform learning that previous neural nets couldn’t attempt. It was as if a person could suddenly cram in, say, the equivalent of five hours of skiing practice in ten minutes.
With unsupervised learning, only in the latter stages would the system’s human masters intervene, by labeling the more desirable outputs and rewarding successful outcomes. “Think about little kids, when they learn to recognize cows,” says Hinton. “It’s not like they had a million different images and their mothers are labeling the cows. They just learn what cows are by looking around, and eventually they say, ‘What’s that?’ and their mother says, ‘That’s a cow’ and then they’ve got it. This works much more like that.” (Later, researchers would master an effective alternative to unsupervised learning that relied on better initializing techniques and the use of larger datasets.)
When Hinton’s group tested this model, it had the benefit of something unavailable at the time neural nets were first conceived — super fast GPUs (Graphic Processing Units). Though those chips were designed to churn out the formulae for advanced graphics, they were also ideal for the calculations required in neural nets. Hinton bought a bunch of GPUs for his lab and got two students to operate the system. They ran a test to see if they could get the neural network to recognize phonemes in speech. This, of course, was a task that many technology companies — certainly including Google — had been trying to master. Since speech was going to be the input in the coming age of mobile, computers simply had to learn to listen better
How did it do?
“They got dramatic results,” says Hinton. “Their very first results were about as good as the state of the art that had been fine-tuned for 30 years, and it was clear that if we could get results that good on the first serious try, we were going to end up getting much better results.” Over the next few years, the Hinton team made additional serious tries. By the time they published their results, the system, says Hinton, had matched the best performance of the existing commercial models. “The point is, this was done by two students in a lab,” he says.
Deep Learning was born.
In 2007, in the midst of this work, Hinton gave a Google Tech Talk in Mountain View about Deep Learning, which galvanized the geeks in attendance, and won a huge following on YouTube. It helped spread the news that neural nets were finally going to be a powerful tool. And the rush was on to hire people who understood this new technique. Hinton’s students went to IBM, Microsoft, and, of course, Google. That represented three of the four major companies working in the field (the other one, Nuance, includes Apple among its suppliers). All were free to use the work from Hinton’s lab in the systems each would help refine in his respective company. “We basically gave it away because we were very concerned to prove we had the goods,” says Hinton. “What was interesting was that MSR [Microsoft Research] and IBM got it before Google but Google turned it into a product faster than anyone else.”
Hinton’s arrival at Google was only one of a series of huge hires in that season. Only a few months earlier, Ray Kurzweil, the Panglossian philosopher of AI, joined a team that already included AI legends like Peter Norvig (who wrote the standard textbook for AI courses), and Sebastian Thrun (a key inventor of the self-driving car).
But now the company was intoxicated by deep learning, apparently convinced that it would produce the big breakthroughs in the next generation of search. Already the advent of mobile computing had forced the company to change the very character of its search engine. To go farther, it had to know the world in the same sense that a human would know the world — but also of course perform the superhuman task of knowing everything in the world and being able to find it in less than half a second.
So it was probably only a matter of time before Jeff Dean would get involved in this.
Dean is a Google legend. He was already well known in computer science circles when he came to Google in 1999, and hiring him was a milestone for what was a relatively obscure Internet company with a double-figure headcount. In the intervening years, Dean became a leader in creating Google’s software infrastructure. In the process, a geek underground of Dean fans emerged, creating a comical meme about the engineer’s prowess called “Jeff Dean Facts.” Most of them reference super-geeky coding arcana, but some of the more intelligible ones are
- Jeff Dean can beat you at connect four. In three moves.
- One day Jeff Dean grabbed his Etch-a-Sketch instead of his laptop on his way out the door. On his way back home to get his real laptop, he programmed the Etch-a-Sketch to play Tetris.
- Jeff Dean is still waiting for mathematicians to discover the joke he hid in the digits of Pi.
Dean, now 46, had long known about neural nets — his undergraduate thesis project made use of them. In the intervening years, though, he had come to the conclusion of most of his peers that they were not ready for prime time. “There was a lot of promise back then but they faded away for a while because we didn’t have enough computational power to make them sing,” he says, stretching his lanky frame in a Googleplex conference room last fall. In 2011, though, Dean ran into Andrew Ng in one of Google’s many snack pantries. Ng was a Stanford AI professor — one of the giants in the field— who’d been spending a day a week at the search company. When Dean asked Ng what he was up to, he was surprised at the answer: “We’re trying to train neural nets.” Ng told Dean that things had changed — after the deep learning breakthrough, they worked pretty well, and if Google could figure out how to train really big nets, amazing things would happen.
Dean thought this sounded like fun, and began “dabbling with it” for about six months, and then became convinced that a project to build a massive neural net system could very quickly bring concrete results. So he and Ng made it a full time project. (Ng has since left Google, and has recently joined Baidu — to develop the Chinese search leader’s own AI projects.)
For about a year, the project was known informally as “The Google Brain” and based within Google X, the company’s long-range, high-ambition research department. “It’s a kind of joking internal name, but we tried not to use it externally because it sounds a little strange,” says Dean. In 2012, results began to accrue, the team moved out of the purely experimental Google X division and situated itself in the search organization. It also began to avoid using the term “brain.” The preferred term for outsiders is “Google’s Deep Learning Project,” which does not have the same ring but is less likely to incite pitchfork gatherings at the gates of the Googleplex.
Dean says that the team started by experimenting with unsupervised learning, because “we have way more unsupervised data in the world than supervised data.” That resulted in the first publication from Dean’s team, an experiment where the Google Brain (spread over 16,000 microprocessors, creating a neural net of a billion connections) was exposed to 10 million YouTube images in an attempt to see if the system could learn to identify what it saw. Not surprisingly, given YouTube content, the system figured out on its own what a cat was, and got pretty good at doing what a lot of users did — finding videos with feline stars. “We never told it during training, ‘This is a cat,’” Dean told the New York Times. “It basically invented the concept of a cat.”
And that was just a test to see what the system could do. Very quickly, the Deep Learning Project built a mightier neural net and began taking on tasks like speech recognition. “We have a nice portfolio of research projects, some of which are short and medium term — fairly well understood things that can really help products soon — and some of which are long term objectives. Things for which we don’t have a particular product in mind, but we know would be incredibly useful.”
One example of this appeared not long after I spoke to Dean, when four Google deep learning scientists published a paper entitled “Show and Tell.” It not only marked a scientific breakthrough but produced a direct application to Google search. The paper introduced a “neural image caption generator” (NIC) designed to provide captions for images without any human invention. Basically, the system was acting as if it were a photo editor at a newspaper. It was a humongous experiment involving vision and language. What made this system unusual is that it layered a learning system for visual images onto a neural net capable of generating sentences in natural language.
Nobody is saying that this system has exceeded the human ability to classify photos; indeed, if a human hired to write captions performed at the level of this neural net, the newbie wouldn’t last until lunchtime. But it did shockingly, shockingly well for a machine. Some of the dead-on hits included “a group of young people playing a game of frisbee,” “a person riding a motorcycle on a dirt road,” and “a herd of elephants walking across a dry grass field.” Considering that the system “learned” on its own concepts like a Frisbee, road, and herd of elephants, that’s pretty impressive. So we can forgive the system when it mistakes a X-games bike rider for a skateboarder, or misidentifies a canary yellow sports car for a school bus. It’s only the first stirrings of a system that knows the world.
And that’s only the beginning for the Google Brain. Dean isn’t prepared to say that Google has the world’s biggest neural net system, but he concedes, “It’s the biggest of the ones I know about.”
While Hinton’s hiring and Dean’s brain were major steps in pushing the company towards deep learning, perhaps the biggest move yet occurred in 2013, when Google spent $400 million to acquire DeepMind, a London-based artificial intelligence company. DeepMind has its own take on deep learning, based on a closer study of the brain itself. To make the purchase, Google aced out its key competitors, who also had designs on the company. And for good reason: DeepMind may well turn out to be as big a bargain as the $1.7 billion Google paid for YouTube or the mere $50 million for a fledging open-source mobile operating system called Android.
The CEO and co-founder is Demis Hassabis. A compact, dark-haired man of 38, Hassabis speaks quickly, as if he were a podcast played at double speed. “My whole career has been leading up to the AI company,” he says, taking a break in the company’s vertically sprawling new headquarters in central London, near the St Pancras train station. DeepMind recently moved here from a small office building in Bloomsbury. It’s an unusual setup where a new structure was merged with an existing wing of the old hospital, causing a sort of time-travel whiplash. The conference rooms are named after philosophers, writers and artists associated with vast intellectual leaps, like DaVinci, Gödel and Shelly (ominously, Mary, not Percy). The team has recently grown to take on two Oxford University-based companies that DeepMind (and of course its parent Google) acquired. One is Dark Blue Labs, which uses deep learning for natural language understanding; the other, Vision Factory, uses the technique for object recognition.
At 14, Hassabis was an avid computer game programmer as well as a chess prodigy. Working under the mentorship of game wizard Peter Molyneux, he had key roles in landmark titles such as Black and White and Theme Park.Then he started his own game company, eventually employing 60 people, while still in his twenties. But gaming, he says, was a means to an end, the end being the development of an intelligent general purpose artificial intelligence machine. By 2004, he felt that he had taken gaming AI as far as he could in that field. But it was too soon to start an AI company — the computer power he needed wasn’t cheap and plentiful enough. So he studied for a doctorate in cognitive neuroscience at the University College London.
In 2007, he co-authored an article on the neural basis of memory that the journal Science named one of the year’s ten biggest breakthroughs. He became a fellow at the Gatsby Computational Neuroscience Unit and also was affiliated with UCL, MIT and Harvard. In 2010, though, he decided it was finally time to form a company to do advanced AI, and he co-founded it with Gatsby colleague Shane Legg and Mustafa Suleyman, a serial entrepreneur who dropped out of Oxford at 19. Funders included Peter Theil’s Founders Fund and Elon Musk (who later expressed concerns about the downside of AI). Geoffrey Hinton was one of its advisors.
DeepMind operated in stealth, with only one result released publicly before the Google purchase. It was enough to cause a frenzy of speculation with a dash of uneducated derision. The paper described DeepMind’s success atpassively training a neural net to play vintage Atari computer games. The neural-net system was left to its own deep learning devices to learn game rules — the system simply tried its hand at millions of sessions of Pong, Space Invaders, Beam Rider and other classics, and taught itself to do equal or surpass an accomplished adolescent. (Take notice, Twitch!) Even more intriguing, some of its more successful strategies were ones that no humans had ever envisioned. “This is a particular potential of this type of technology,” says Hassabis. “We’re imbuing it with the ability to learn for itself from experience just like a human would do and therefore it can master things that maybe we don’t know how to program. It’s exciting to see that when it comes up with a new strategy in an Atari game that the programmers didn’t know about.”
It’s a small step towards Hassabis’s big goal of a brain that will not only know a lot of facts, but it will know what to do next. DeepMind is not satisfied to build only an engine for limited domains, like Atari games, commuting, or handling appointments. It wants to create a general artificial intelligence machine that will process information anywhere it can get it, and then do pretty much everything. “The general AI that we work on here is a process that automatically converts unstructured information into useful, actionable knowledge,” he says. “We have a prototype of this — the human brain. We can tie our shoelaces, we can ride cycles and we can do physics with the same architecture. So we know this is possible and then the idea for our research program is to slowly widen and widen those domains.”
Does it sound scary to you that Hassabis is envisioning a giant artificial brain that sucks up the world’s information, structures it into a form it understands, and then takes action? Well, it’s kind of scary to Hassabis, too. At least to the point where he acknowledges that the advanced techniques his own group is pioneering may lead to a problem where AI gets out of human control, or at least becomes so powerful that its uses might best be constrained. (Hassabis’ DeepMind co-founder Shane Legg is even more emphatic: he considers a human extinction due to artificial intelligence the top threat in this century. And DeepMind investor Elon Musk has just dropped $10 million to study AI dangers.) That’s why, as a condition of the DeepMind purchase, Hassabis and his co-founders demanded that Google set up an outside board of advisors to monitor the progress of the company’s AI efforts. DeepMind had already decided that it would never license its technology to the military or spy agencies, and it got Google to agree to that as well.
Less comforting is that Hassabis won’t reveal the makeup of that board, except to say that it consists of “top professors in computation, neuroscience and machine learning.” Since DeepMind’s work is still in the early stages — no Singularities in sight as of yet — he assures us there’s no need to make the committee members public. “There’s no issues here currently but in the next five or ten years maybe there will be,” he says. “So really it’s just getting ahead of the game.”
But the game is moving fast. Last fall, DeepMind published another major paper, describing a project that synthesizes some ideas from neuroscience memory techniques to create a neural network with the properties of a Turing Machine, which is synonymous for a universal computing device. This means that such a system, given enough time and memory, can in theory compute anything. The paper focused on the practical: with the ability to “record” information and draw on it later—kind of an artificial version of a person’s “working memory”—the Neural Turing Machine was not only able to learn faster and to perform more complex tasks than previous neural nets, but “to generalise well outside its training regime,” write the DeepMind authors. One can’t help but feel that it’s a significant step taken towards that general purpose AI engine that Hassabis dreams about.
Indeed, as of now, all Google’s deep learning work has yet to make a big mark on Google search or other products. But that’s about to change.
Since Jeff Dean’s deep learning project has moved from Google X to the Knowledge division (which includes search), his team has been working closely with a number of search-related teams, including language and image recognition. The Google Brain has become sort of an AI utility in the company. “It’s like an internal service,” says Dean. “If people in our group are really interested in a particular problem, we’ll find the right outlets for something if we’re able to do something good.” Dean says that around 35 to 40 groups are using it at Google now. Besides search and speech, he says, “We have stuff in ads, street view, and some stuff in the self-driving cars.”
As for longer-range projects, Dean talks of an attempt to do a better form of real-time translation. That’s a high bar these days — besides Google’s own current, well-regarded system, Microsoft’s Skype has impressed observers with instant voice translation. But Dean is excited about his own team’s effort to push things forward. “This is a model that uses only neural nets to do end-to-end language translation,” he says. “You train on pairs of sentences in one language or another that mean the same thing. French to English say. You feed in English sentences one word at a time, boom, boom, boom… and then you feed in a special ‘end of English’ token. All of a sudden, the model starts spitting out French.”
Dean shows a head-to-head comparison between the neural model and Google’s current system — and his deep learning newcomer one is superior in picking up nuances in diction that are key to conveying meaning. “I think it’s indicative that if we scale this up, it’s going to do pretty powerful things,” says Dean.
DeepMind is also ready for production. Hassabis says within six months or so, its technology will find their way into Google products. His organization is broken up into divisions, and one — headed by his co-founder Mustafa Suleyman—is devoted to applied uses of the AI, working closely with Google to see what might be of use.
Hassabis has some ideas how DeepMind tech might enhance people’s lives. He believes that a more proactive version of search — not only finding things for people but making decisions for them — would be a valuable provider of the most precious commodity imaginable — time. “There’s more books in the world that I would find fascinating than I could possibly read in my lifetime,” says Hassabis. “So why is it that any time I’m on a long haul flight or on a rare holiday somewhere that I might be thinking what book should I read? That should never happen. I think a lot of those things will be better automated.”
Down the road, Hassabis envisions DeepMind’s work finding its way into more exotic Google projects like the self-driving car, and even Calico, a spinoff company devoted to extending human lifespan.
It is ultimately significant that DeepMind and Google Brain — along with Hinton’s deep learning group — are all in Google’s search organization. Many years ago, Larry Page and Sergey Brin spoke, maybe only half jokingly, of search being an implant in our brains. No one talks about implants now. Instead of tapping our brains to make search better, Google is building brains of its own.