The very first digital computers were called “electronic brains.” Although their initial practical application, and their claim to human-like or even superhuman cognitive ability, was limited to the speed at which they calculated numbers, the open-ended nature of computer technology promised much more. The computer scientists who gave birth to the concept of “artificial intelligence” in the mid-1950s were confident that machines capable of human-level understanding and reasoning would arrive in the near future, because all intelligence, they argued, can be reduced to manipulating symbols and the mathematical formulas computers were so good at processing.
It didn’t work as advertised. Intelligence, it turned out, involves much more than defining concepts with logical rules. Expert systems, an offshoot of this approach that succeeded in certain practical applications in the 1970s and 1980s, eventually withered away because they did not scale: extracting knowledge from experts and encoding it as rules proved too labor-intensive and failure-prone to be viable for most problems.
An alternative approach to computerizing cognitive abilities was also born in the 1950s. It was called “machine learning,” a decidedly less sexy and attention-grabbing name. While the “artificial intelligence” approach was rooted in symbolic logic, a branch of mathematics, the “machine learning” approach was rooted in statistics. And there was another important distinction between the two: The “artificial intelligence” approach followed the dominant computer science paradigm of a programmer defining what the computer had to do by coding an algorithm, a model, or a program in a programming language. By contrast, the “machine learning” approach relied on statistical procedures to find patterns in data. The data, rather than a program, determined what the process did next.
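The contrast can be made concrete with a toy sketch. In the code below, all data and function names are hypothetical, invented purely for illustration: the first function encodes a decision rule written by a programmer, while the second derives its decision boundary statistically from labeled examples.

```python
# Symbolic / rule-based: the programmer encodes the decision directly.
def rule_based_is_tall(height_cm):
    return height_cm > 180  # threshold chosen by the programmer


# Machine learning: the threshold is estimated from labeled examples.
def learn_threshold(examples):
    """examples: list of (height_cm, is_tall) pairs."""
    tall = [h for h, label in examples if label]
    short = [h for h, label in examples if not label]
    # Place the boundary midway between the two class means.
    return (sum(tall) / len(tall) + sum(short) / len(short)) / 2


# Hypothetical labeled data; in practice this would come from the real world.
data = [(150, False), (160, False), (170, False),
        (185, True), (190, True), (200, True)]
threshold = learn_threshold(data)


def learned_is_tall(height_cm):
    return height_cm > threshold
```

Feed the learned classifier different data and it produces a different rule, with no change to the code; that is the essence of the statistical approach the article describes.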
With the advent of the Web and the vast amounts of data it generated, “Big Data” gave rise to a new wave of machine learning: “deep learning.” Like artificial intelligence and machine learning, deep learning had been around for many years. It is a variant of machine learning largely based on the concept of artificial neural networks, which in turn were inspired by a computational model of biological neural networks developed in 1943.
In the late 1960s, this approach was torpedoed by leading researchers from the symbolist school of artificial intelligence, who demonstrated the limitations of artificial neural networks with a limited number of layers. These limitations could be overcome by adding layers, each “layer” representing a higher level of abstraction, but computers at the time simply didn’t have enough processing power to efficiently handle the requirements of large neural networks.
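The textbook example of this limitation is the XOR function, which no single-layer network can compute but which a network with one hidden layer handles easily. The sketch below uses weights set by hand for illustration; a real network would learn them from data.

```python
def step(x):
    """Threshold activation: fires (1) when the weighted input exceeds zero."""
    return 1 if x > 0 else 0


def xor_two_layer(a, b):
    # Hidden layer: one unit computes "a OR b", another computes "a AND b".
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    # Output layer: OR minus AND yields XOR.
    return step(h_or - h_and - 0.5)
```

The hidden units give the network an intermediate level of abstraction that a single layer lacks, which is exactly the role the article attributes to added layers.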
Still, a few out-of-the-mainstream researchers, mostly in Canada and Switzerland, continued investigating and experimenting with artificial neural networks, which finally became all the rage in 2012 when, among other developments, Google’s “Brain Team” trained a cluster of 16,000 computers to recognize an image of a cat after processing 10 million digital images taken from YouTube videos.
Deep neural networks became more feasible as the amount of available data mushroomed, as algorithms improved, and as computer power increased. Twenty years of experimenting with various techniques for optimizing the performance of artificial neural networks resulted in more sophisticated algorithms. But these algorithms required a lot of data, especially labeled data, to be “trained.”
The advent of the Web brought about an explosion of readily available data. Even more important, the time-consuming and expensive task of labeling the data became both automated and crowdsourced. The image of a cat that a user shared on a social network was both available as a digital object and already labeled “cat.”
Finally, computers became more powerful, in part through the development of graphics processing units, or GPUs, specialized chips originally designed for manipulating images. Their highly parallel structure makes them more efficient than general-purpose CPUs for algorithms that apply the same operation to large blocks of data in parallel.
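A conceptual sketch of why that parallelism matters: in image work, the same simple operation is applied independently to every pixel, so a GPU can process thousands of pixels simultaneously. The example below (function and variable names are illustrative) runs sequentially in plain Python, but because no call depends on any other, the work could in principle all happen at once.

```python
def brighten(pixel):
    # The same arithmetic for every pixel; no dependence on neighbors.
    return min(pixel + 40, 255)


pixels = list(range(0, 256, 16))  # a stand-in for a block of image data

# Each call is independent of every other, so on a GPU all of them
# could execute simultaneously; here Python applies them one at a time.
brightened = [brighten(p) for p in pixels]
```

This "same operation, many independent data elements" pattern is the workload GPUs were built for, and it is also the shape of the matrix arithmetic at the heart of neural network training.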