When you think of artificial intelligence (AI), do you envision C-3PO or matrix multiplication? HAL 9000 or pruning decision trees? The term is ambiguous, and for a field that has gained so much traction in recent years, it’s particularly important that we think about and define what we mean by artificial intelligence – especially when managers, salespeople, and engineers need to communicate with one another. These days, AI is often used as a synonym for deep learning, perhaps because both ideas entered popular tech consciousness at the same time. In this article I’ll go over the big-picture definition of AI and how it differs from machine learning and deep learning.
One way of thinking about practical AI is as an umbrella term for efforts to give computers the ability to perform intelligent tasks as well as humans can, or to exhibit behaviors we associate with humanity. That’s why we see artificial intelligence applied to tasks like handwriting and speech recognition, image identification, evaluating the meaning of sentences, and music composition. Tasks like these have historically been difficult for computers, but search for image misclassification rates today and you’ll find reports of algorithms outperforming humans. You may also have heard about algorithms beating highly skilled humans at complex games like chess and Go.
These specific tasks are examples of weak, or narrow, AI. Of course, we could imagine a stronger kind of AI capable not just of performing specific tasks but of actually having a form of consciousness or mind – but this remains in the realm of philosophy and science fiction. There is also the concept of artificial general intelligence, able to apply problem solving in a variety of ways rather than in just one predefined manner. This, too, is not yet considered feasible: even the most sophisticated personal assistant algorithms just boil queries down into narrow tasks under the hood.
So let’s focus on the narrow objectives. There are a number of ways to accomplish them, each illustrated with a short code sketch after the list:
Rule-based systems are essentially a collection of rules plus an engine capable of performing actions (and resolving conflicts between competing actions) based on the input it sees. It’s easy to imagine building a ruleset for tic-tac-toe: the inputs would be the locations of the plays made so far and the turn number, and the output would be the location to play next. Systems like this don’t scale well with complexity, though. As the behavior gets more complicated, the ruleset must grow too, and it becomes more and more difficult to build the logic that resolves conflicts.
Simulations also utilize rules, but rather than just matching the inputs, a simulation can extrapolate into potential future states and can incorporate randomness into its decisions to account for input instability. Simulations can also exhibit emergence: behavior of the system as a whole, at large scales, that isn’t apparent in any of the individual rules. In our tic-tac-toe example, the AI would now be able to consider chains of moves and the potential outcomes of particular sequences of decisions.
If we go one step more abstract, we get to AI that can learn rules from observations. Deductive reasoning takes a series of inputs and derives a new rule using logic – often these new rules are more generalizable or more succinct ways of representing seemingly complex behavior. Examples include expert systems and automated theorem proving. Inductive reasoning usually involves statistics to generate rules that are likely to be true. For both types of reasoning, the confidence we have in the learned rules depends on the confidence we have in the observations. Machine learning (ML) is an umbrella term for a number of statistical inference methods for generating rules (aka models) from large amounts of data (aka observations). Those methods can use a variety of techniques to learn rules, a process called training the model. The techniques might be based on minimizing prediction error (supervised learning) or just on identifying patterns or trends (unsupervised learning). No matter what, though, if the training data is not representative of the true behavior, the resulting models can report very good accuracy metrics and still be completely misguided.
Deep learning is a subset of machine learning, and while you’ll often see it given the headline in newspaper articles, it’s really just one possible way to apply machine learning ideas. Specifically, deep learning models stack hidden layers, each of which learns features derived from the layer before. After several hidden layers, the features the model has created are far removed from the original inputs, in terms of both complexity and interpretability. In going deep, these so-called black box models end up dealing with mathematical abstractions that don’t relate back to ideas comprehensible by humans. How else could these algorithms surpass human performance on AI tasks but by going beyond what humans can understand?
Any mixture of the above
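To make these approaches concrete, here is a minimal sketch of a rule-based tic-tac-toe player. Everything about it – the board representation, the rule priorities, the helper names – is my own illustration rather than a prescribed design: rules are checked in a fixed priority order, and the first one that matches the board decides the move.

```python
# Minimal rule-based tic-tac-toe player (illustrative sketch).
# The board is a list of 9 cells containing "X", "O", or " ".

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winning_move(board, player):
    """Return a cell that completes a line for `player`, or None."""
    for line in WIN_LINES:
        values = [board[i] for i in line]
        if values.count(player) == 2 and values.count(" ") == 1:
            return line[values.index(" ")]
    return None

def rule_based_move(board, me="X", opponent="O"):
    # Rule 1: win immediately if we can.
    move = winning_move(board, me)
    if move is not None:
        return move
    # Rule 2: block the opponent's winning move.
    move = winning_move(board, opponent)
    if move is not None:
        return move
    # Rule 3: otherwise prefer center, then corners, then edges.
    for cell in (4, 0, 2, 6, 8, 1, 3, 5, 7):
        if board[cell] == " ":
            return cell
    return None  # board is full

board = ["X", "O", "X",
         " ", "O", " ",
         " ", " ", " "]
print(rule_based_move(board))  # 7 – blocks O's column through cells 1, 4, 7
```

Note how the fixed priority order doubles as conflict resolution; adding subtler behavior means adding (and carefully ordering) more rules, which is exactly the scaling problem described above.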
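A simulation-based player, sketched below under the same assumptions (it reuses WIN_LINES and the board representation from the rule-based sketch), goes a step further: for each legal move it plays many random games to completion and keeps the move whose simulated outcomes look best – a bare-bones Monte Carlo rollout.

```python
import random

def winner(board):
    """Return "X" or "O" if a line is complete, else None."""
    for a, b, c in WIN_LINES:  # defined in the rule-based sketch above
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_playout(board, player):
    """Finish the game with uniformly random moves; return the winner or None."""
    board = board[:]
    while winner(board) is None and " " in board:
        empty = [i for i, v in enumerate(board) if v == " "]
        board[random.choice(empty)] = player
        player = "O" if player == "X" else "X"
    return winner(board)

def simulated_move(board, me="X", n_playouts=200):
    opponent = "O" if me == "X" else "X"
    def score(cell):
        trial = board[:]
        trial[cell] = me
        results = [random_playout(trial, opponent) for _ in range(n_playouts)]
        return results.count(me) - results.count(opponent)  # net wins
    # Choose the empty cell whose random futures favor us the most.
    return max((i for i, v in enumerate(board) if v == " "), key=score)
```

Unlike the ruleset, this player’s strength scales with n_playouts – more simulated futures, better decisions – rather than with hand-written logic, and the randomness makes it robust to positions no rule anticipated.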
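For learned rules, here is an inductive-learning sketch using scikit-learn (an assumed dependency; the data and feature names are invented for illustration). Instead of hand-writing rules, we let a decision tree infer them from labeled observations.

```python
# Inductive learning sketch: a decision tree infers rules from observations.
from sklearn.tree import DecisionTreeClassifier

# Toy observations: [hour_of_day, is_weekend] -> did the user open the app?
X = [[8, 0], [9, 0], [22, 0], [23, 0], [10, 1], [11, 1], [2, 1], [3, 1]]
y = [1, 1, 0, 0, 1, 1, 0, 0]

model = DecisionTreeClassifier(max_depth=2).fit(X, y)  # "training the model"
print(model.predict([[9, 1]]))  # [1] – the learned rule covers unseen input

# Caveat from the text: if X is not representative of real behavior, the
# tree can score well on data like this and still be completely misguided.
```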
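Finally, a bare-bones deep learning sketch in plain NumPy (again an assumed setup, not a production recipe): two hidden layers whose learned features are increasingly removed from the raw inputs, trained by gradient descent on XOR, a mapping no single linear rule captures.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR outputs

# Two hidden layers of 4 units each, then one output unit.
W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 4))
W3 = rng.normal(size=(4, 1))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # Forward pass: each hidden layer builds features from the layer before.
    h1 = sigmoid(X @ W1)
    h2 = sigmoid(h1 @ W2)
    out = sigmoid(h2 @ W3)
    # Backward pass: gradient descent on squared error.
    d3 = (out - y) * out * (1 - out)
    d2 = (d3 @ W3.T) * h2 * (1 - h2)
    d1 = (d2 @ W2.T) * h1 * (1 - h1)
    W3 -= 0.5 * h2.T @ d3
    W2 -= 0.5 * h1.T @ d2
    W1 -= 0.5 * X.T @ d1

print(out.round(3).ravel())  # typically converges to roughly [0, 1, 1, 0]
# The trained weights W1..W3 are the "black box": nothing in them maps
# cleanly back to a human-readable rule.
```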
When thinking about ML in the context of AI, it’s important to remember that machine learning is just a tool meant to help you make decisions. What we’ve seen is that developments in machine learning have led to better AI in certain circumstances. For example, an advanced game-playing AI might use simulations as part of its machine learning algorithm to decide which branches of a decision tree to prune first. Or deep learning – a kind of machine learning that processes observations in a more abstract, layered way – might be used in conjunction with historical user data to determine which word you just spoke into the microphone. In this scenario, the AI’s goal is to identify words; machine learning (often via deep learning algorithms) helps it achieve that goal, but other inputs also factor into the final decision. As AI becomes an increasingly important part of doing business and generating value, it’s worth remembering that we can make advances not just in each contributing component of an AI but also in the ways we connect those components to each other.