Machine learning and artificial intelligence have exploded in the last few years. Google bases nearly all of its activities around powerful techniques like LSTM networks, which can analyse language with almost giddying facility; image and speech recognition are outstripping human performance on some tasks; and eBay's VP of engineering, Japjit Tulsi, has provocatively said, "If you’re not doing AI today, don’t expect to be around in a few years."
This has all naturally led a lot of people to wonder: how do you do AI?
A short trip through the history of AI
The discipline of AI is frequently associated with Alan Turing, who in 1950 wrote his astonishing and far-sighted paper, Computing Machinery and Intelligence, which opens with the question "Can machines think?" He argued that they could.
Artificial intelligence was conceived of as the problem of letting computers (rigid, rule-following machines capable of making stupid mistakes faster than ever before) handle human vagueness more imaginatively and flexibly. This desire crystallised into the 1956 Dartmouth Summer Research Project on Artificial Intelligence, which many consider the beginning of AI as a formal discipline.
In 1957, the perceptron algorithm was developed. This was the progenitor of modern neural networks. It was limited, however: a single perceptron can only separate two classes with a straight line (or a flat plane), so it cannot represent even a simple relationship like exclusive-or. At the time, symbolic AI (constructing intelligent behaviour from programmed logical rules) seemed like the answer, but ultimately this proved too inflexible and difficult to program to live up to its early promise.
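To make the perceptron's limitation concrete, here is a minimal sketch of the classic perceptron learning rule in plain Python. The function names, the learning rate and the number of epochs are my own choices for illustration, not anything from the original 1957 work. It learns AND, which a straight line can separate, and fails on exclusive-or, which no straight line can.

```python
from itertools import product


def predict(w, b, x1, x2):
    """Fire (1) when the weighted sum is positive, otherwise stay silent (0)."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=1):
    """Perceptron rule: nudge the weights in proportion to each error."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(w, b, x1, x2)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b


def accuracy(samples, w, b):
    """Fraction of samples the trained perceptron classifies correctly."""
    hits = sum(predict(w, b, x1, x2) == target for (x1, x2), target in samples)
    return hits / len(samples)


inputs = list(product([0, 1], repeat=2))
AND = [(x, x[0] & x[1]) for x in inputs]
XOR = [(x, x[0] ^ x[1]) for x in inputs]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

AND is learned perfectly, while XOR never is, however long you train: that is the drawback that pushed attention towards symbolic AI.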
It wasn't until 1974 that the back-propagation algorithm was first described (it was popularised in the mid-1980s), allowing more powerful multi-layer neural networks to be trained and setting the scene for deep learning and machines that could teach themselves. Truly deep networks nevertheless had to wait for data in the kind of volumes produced by mass adoption of the internet, and for the enormous, cheap power of GPUs, which are vector processors like a Cray supercomputer on a chip.
Off the peg and bespoke
AI is now mature enough that standardised systems are available to perform some simple tasks. Google's Vision API can automatically tag photos and frames of video; IBM Watson has a sentiment classifier that will identify each of the five main characters from Inside Out, in the same colours but with Fear and Disgust swapped; Microsoft's LUIS has a powerful suite of natural language features. There are many others.
Not only that, but AI frameworks are available to accelerate your own AI modelling, too. TensorFlow is probably the best-known, and includes some wonderful things like Inception, a neural image classifier pre-trained on ImageNet that you can retune with transfer learning, but Keras is also very full-featured and can be used to drive TensorFlow.
TensorFlow will now run natively on a mobile device, which brings us to Apple's own native AI system, Core ML, which can also run models built with Keras. There's a huge opportunity to get your hands dirty and build something cool. Just how easy this is can be seen in the wonderful Not Hotdog, which demonstrates several of the things mentioned here (particularly transfer learning).
Applications: Image classification
Humans are, for the most part, very visual. Using pictures instead of descriptions to discover things about the world is a capability we're only just learning to exploit. Indeed, one could argue that image classification is in roughly the same state as search engines in about 1995: we have some powerful, usable working systems, but there's clearly so much more to come.
Understanding users through their photostream
Given this propensity for the visual, it's not surprising that many people express themselves pictorially, through services like Instagram and Snapchat. This can tell us a lot about that person.
For example, looking at a user's pictures of themselves can tell us what they like to wear. If you sell any kind of fashion, the application is obvious. Simply do a Not Hotdog on your company lookbook and then use it to predict which style to guide your users towards.
Lots of other applications are possible. Does your user take a lot of pictures of food? Possibly they are interested in restaurant recommendations. Are they taking a lot of photos with blue in them? They might like the Picasso exhibition near them.
Direct visual search
Sometimes a user might not know what something is called, but they can see it.
Visual search has been a clear use-case for vision systems since the beginning of their existence, but it is now straightforward to take a system like Inception and use transfer learning to identify a catalogue of objects you want to help your users find. If you run a parts catalogue for appliances, for example, a photo of the part (or the appliance) may be all you need to guide your users to the right product.
Application: Speech-to-text audio tagging
As mentioned earlier, services now exist that can identify images frame-by-frame in video, so you can find only the bits of The A-Team with Hannibal in them. Wouldn't it be great if you could do the same, but get only the bits where he says "I love it when a plan comes together"?
Speech-to-text is now an incredibly powerful tool, and could clearly be applied to the soundtrack of a video to find specific phrases. The video could then be tagged with its transcription and searched.
Automated subtitling for the hearing-impaired is surely another powerful use case.
Add in machine translation, and you can watch any video recorded in any language as soon as it's published.
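As a rough sketch of the phrase search described above, suppose a speech-to-text service has already handed back a list of timestamped transcript segments. The Segment type, the find_phrase helper and the sample transcript below are all made up for illustration; real services return their own formats.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str     # what the (hypothetical) speech-to-text service heard


def normalise(text):
    """Lower-case and collapse whitespace so matching is forgiving."""
    return " ".join(text.lower().split())


def find_phrase(segments, phrase):
    """Return (start, end) times of every segment containing the phrase."""
    needle = normalise(phrase)
    return [(s.start, s.end) for s in segments if needle in normalise(s.text)]


# Invented transcript, standing in for real speech-to-text output.
transcript = [
    Segment(12.0, 15.5, "Get the van ready"),
    Segment(15.5, 19.0, "I love it when a plan comes together"),
    Segment(40.2, 43.0, "I  LOVE it when a plan comes together"),
]

matches = find_phrase(transcript, "I love it when a plan comes together")
print(matches)
```

This only finds phrases that fall inside a single segment; a phrase split across two segments would need the neighbouring texts joined before searching.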
Application: RNN autosuggestions
Recurrent neural networks (the LSTM networks mentioned earlier are one kind) are an astonishingly powerful class of network designed to analyse sequences, such as text or time series. These are used extensively in machine translation, but notably, an RNN will often work on incomplete data: it is a probabilistic prediction model, and having a couple of letters of a word in context can be enough to guess the whole word correctly.
This has a clear application in autosuggestion. We all have a bit of a love-hate relationship with it (though I was recently rather pleased when my phone incorrected "reckon" to "RNN") but used judiciously, it can dramatically speed up input flows.
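To show the shape of the idea without training a network, here is a toy sketch in which a simple table of word-pair counts stands in for the RNN. To be clear, this is not an RNN: it only looks one word back, and the corpus, function names and ranking rule are all invented for illustration. The principle is the same, though: use context plus a couple of typed letters to rank likely completions.

```python
from collections import Counter, defaultdict


def build_bigram_counts(corpus):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, word in zip(words, words[1:]):
        counts[prev][word] += 1
    return counts


def suggest(counts, previous_word, prefix, k=3):
    """Rank words seen after previous_word that start with the typed prefix."""
    candidates = counts.get(previous_word.lower(), Counter())
    matching = [(w, n) for w, n in candidates.items()
                if w.startswith(prefix.lower())]
    # Most frequent first; ties broken alphabetically for a stable order.
    matching.sort(key=lambda item: (-item[1], item[0]))
    return [w for w, _ in matching[:k]]


# Tiny invented corpus; a real system would learn from far more text.
corpus = (
    "i reckon the plan works . i reckon the plan holds . "
    "i recall the plan . we reckon so"
)
counts = build_bigram_counts(corpus)
suggestions = suggest(counts, "i", "re")
print(suggestions)
```

A trained RNN replaces the count table with a learned model that can take the whole preceding sentence into account, which is what makes its guesses so much better.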
It can also be used to suggest entire replies to routine questions in, say, a helpdesk flow. This can free up your helpdesk workers to deal with only the more complex queries, while still giving your customers a conversation-like flow (which most prefer to the large pages of FAQs we've all had to wade through hitherto).
Add in speech-to-text capabilities and a slight update to your "calls may be monitored and recorded for training purposes" boilerplate and you can do this on the phone, too.
Application: Sentiment classifying furious customers
While we're making customer service employees' lives better (and, hopefully, reducing some of the legendary churn that industry experiences), the helpdesk employee's worst nightmare could be made a little better using sentiment classification.
Text can be analysed for emotions like anger to find out how someone feels. Identifying an angry customer more quickly lets the helpdesk employee start de-escalation immediately, hopefully reducing frustration for both parties.
In addition, furious customers can be routed more effectively: no one person need endure an entire day of anger, and the angriest customers can be sent towards more experienced and resilient operatives who can deal with their problem more quickly.
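Here is a deliberately crude sketch of that flagging step. The word list, weights and threshold are invented for illustration, and a hand-written lexicon is standing in for a trained sentiment classifier, which is what you would actually use in production.

```python
import re

# Hypothetical anger lexicon; a real system would use a trained classifier.
ANGER_WEIGHTS = {
    "furious": 3,
    "unacceptable": 3,
    "ridiculous": 2,
    "angry": 2,
    "annoyed": 1,
    "disappointed": 1,
}


def anger_score(message):
    """Add up the weights of any anger words found in the message."""
    words = re.findall(r"[a-z']+", message.lower())
    return sum(ANGER_WEIGHTS.get(word, 0) for word in words)


def needs_deescalation(message, threshold=3):
    """Flag the message when its score reaches the (assumed) threshold."""
    return anger_score(message) >= threshold


calm = "Hi, could you help me reset my password?"
angry = "This is ridiculous, I am furious about the third failed delivery."

print(anger_score(calm), needs_deescalation(calm))
print(anger_score(angry), needs_deescalation(angry))
```

The useful part is the interface rather than the scoring: whatever produces the score, the helpdesk tool only needs a number and a threshold to decide when to raise a flag.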
This latter course does, however, slightly risk creating an incentive for customers to be livid when calling a helpline, which brings us neatly to our next point.
Machines that teach as they learn
AI has allowed computers to learn how to do human-like tasks by imitation. This process, technically called training, allows a machine learning model to infer general relationships between concepts from a large set of specific examples.
The challenge in that is that there may be relationships we don't know about and actively want to avoid. The law of unintended consequences runs strong in this field.
In fact, machine learning generally is an experimental process. The outcome cannot be predicted with certainty at the beginning. For these two reasons (unknown relationships and unpredictable outcomes), machine learning projects should always be agile.
The anatomy of a machine learning project
Firstly, every machine learning project starts with an insight that leads to the choice to train a specific model with specific data. Most of what we've been talking about so far falls into this part of the cycle.
However, as just mentioned, model choice is a complex process, and it's likely the project will need to iterate through several variations of a model before settling on the right one. The success of a model is determined by validation, using data of the same type as the training data, but separate from the original set. Testing on different data is vital because we need to be sure the algorithm isn't over-specific (called "overfitting"): a facial recognition algorithm might only be able to recognise faces it's seen before, for example.
This refinement process is a necessary part of any serious machine learning project, and so at each iteration, the model grows in suitability.
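The reason for validating on separate data can be shown with a small sketch. Everything here is invented for illustration: the synthetic data, the two toy models and their names. One model simply memorises its training examples (the over-specific case), while the other learns a single cut-off value.

```python
import random


def make_data(n, rng):
    """Points x in [0, 1) labelled 1 when x > 0.5, with 10% of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.1:
            label = 1 - label  # label noise
        data.append((x, label))
    return data


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


class Memoriser:
    """Over-specific model: remembers every training point exactly."""

    def fit(self, data):
        self.table = dict(data)
        return self

    def predict(self, x):
        return self.table.get(x, 0)  # unseen input: fall back to 0


class Threshold:
    """Simple model: picks the cut-off that best fits the training data."""

    def fit(self, data):
        cuts = [i / 20 for i in range(21)]
        self.cut = max(cuts, key=lambda c: sum(int(x > c) == y for x, y in data))
        return self

    def predict(self, x):
        return int(x > self.cut)


rng = random.Random(0)
data = make_data(2000, rng)
train, validation = data[:1000], data[1000:]  # validation is never trained on

memoriser = Memoriser().fit(train)
threshold = Threshold().fit(train)

mem_train, mem_val = accuracy(memoriser, train), accuracy(memoriser, validation)
thr_train, thr_val = accuracy(threshold, train), accuracy(threshold, validation)
print("memoriser: train", mem_train, "validation", mem_val)
print("threshold: train", thr_train, "validation", thr_val)
```

The memoriser looks perfect if you only score it on data it has already seen, and falls apart on the held-out set, which is exactly the face-recognition failure described above. The simpler model scores a little worse on training data but holds up on validation.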
The need for evaluation
There is a larger sense in which we need to perform this re-evaluation, too. Since we haven't instructed the model to infer only useful relationships between things (and this is very hard to do), it will infer every relationship between things. This can and does lead to unintended consequences, and if these go unchecked, they can make (or fake) the news.
Evaluating a machine learning project broadly and with an open mind is, therefore, equally vital to its success. We are not alone in believing that organisations must transform and adopt an agile culture to fully benefit from AI. AI is as much a teacher as a pupil.
A brief warning about GDPR
The General Data Protection Regulation is a good and worthwhile piece of legislation, protecting users' data from misuse and preventing cavalier overreach by developers. Unfortunately, data is what turns the engines of AI. If you're going to funnel your users' data into a machine learning system, you need to think about GDPR. (Note: this paragraph does not constitute legal advice. You have to figure this out with your data controller. Sorry.)
On the whole, while this may seem like an annoying hoop to jump through, I think this will lead to better systems. Often, having to slow down and think more carefully about what they are doing and why has led researchers to deeper insights and better solutions. My favourite example of this is Mitchell Feigenbaum, who in 1975 was running a computer physics model on a very, very slow calculator; he had so long to wait that he started looking more closely at the patterns in his results, and in the process discovered the Feigenbaum constants. Don't fear having to think more deeply about what you're going to use that data for. It may benefit you in surprising ways.
A clockwork AI
I think it can be argued that the first machine learning system in history was actually Breguet's astonishing Pendule Sympathique, a system composed of a clock and watch. When the watch was docked with the clock, it would be wound, set to the correct time and regulated, that is, made to run faster or slower in line with the error measured by the clock. While this is clearly a very simple system compared to a modern machine learning application, the use of feedback to correct error is essentially the same method we employ now. Steampunk-tastic.
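The feedback idea can be sketched in a few lines. This is not a model of Breguet's actual mechanism: correcting in proportion to the measured error, the gain of 0.5 and the starting error of 60 seconds a day are all assumptions I've made for illustration.

```python
def regulate(rate_error, gain=0.5, dockings=10):
    """Simulate repeated dockings of a watch with its clock.

    rate_error is in seconds gained per day (negative means losing time).
    At each docking the rate is corrected in proportion to the measured error.
    """
    history = [rate_error]
    for _ in range(dockings):
        rate_error -= gain * rate_error  # feedback: correct by a share of the error
        history.append(rate_error)
    return history


history = regulate(60.0)
print(history)
```

Swap "rate error" for "prediction error" and "regulator" for "weights", and this is recognisably the loop at the heart of training a modern model.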
Wild optimism, symbolic AI and structured neural nets
In the 1960s, the inability of the perceptron algorithm to model complex information was seen as a grave drawback to machine learning systems, and the focus was mainly on symbolic AI—using logical rules to model intelligence directly. Some of the progress in that area was frankly astonishing, and in 1970 Marvin Minsky claimed, apparently with a straight face, that by the end of the seventies artificial general intelligence would be a solved problem.
Alas, it was not to be, and symbolic AI largely hit a dead end, but most of the really incredible developments in deep learning use neural networks with a little structure built in—for example, convolutional networks are shaped for vision, recurrent networks are shaped for time series and so on. I'd argue this is the true endpoint of symbolic AI: understanding the ways we can use our logical understanding of problems to enhance the power of deep learning. I fully expect more and more complex logic to be embedded in the topologies of neural networks as the field develops.
Chatbots are not what you think
Although this situation will not persist forever, conversational agents (also known as chatbots) are mostly not powered by AI. Some have natural language capability to understand what a user means, but the majority simply navigate a directed graph of possible meanings—a web of possibilities exactly like the web of possibilities in a webpage.
Turing is almost certainly the reason these agents are often identified as AI: they are conversational, and designed to perform reasonably well at his test. However, they are very specialised, and cannot cope with any concept outside the web of specifically programmed possibilities available to them.
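To see how little intelligence is involved, here is a minimal sketch of a graph-driven agent. The graph, its prompts and its keywords are invented for illustration; real chatbot platforms have their own formats, but the principle of following labelled edges between nodes is the same.

```python
import re

# Hypothetical conversation graph: each node has a prompt and keyword-labelled edges.
GRAPH = {
    "start": {"prompt": "Do you need help with an order or a refund?",
              "edges": {"order": "order", "refund": "refund"}},
    "order": {"prompt": "Is the order late or damaged?",
              "edges": {"late": "late", "damaged": "damaged"}},
    "refund": {"prompt": "Refunds take five working days.", "edges": {}},
    "late": {"prompt": "Sorry about that. We will chase the courier.", "edges": {}},
    "damaged": {"prompt": "Please send us a photo of the damage.", "edges": {}},
}


def step(node, user_text):
    """Follow the first edge whose keyword appears in the text; otherwise stay put."""
    words = re.findall(r"[a-z]+", user_text.lower())
    for keyword, target in GRAPH[node]["edges"].items():
        if keyword in words:
            return target
    return node


def run(messages, node="start"):
    """Return the list of nodes visited while processing the messages in order."""
    path = [node]
    for text in messages:
        node = step(node, text)
        path.append(node)
    return path


on_script = run(["my order has not arrived", "it is late"])
off_script = run(["tell me a joke"])
print(on_script)
print(off_script)
```

The second conversation goes nowhere: with no matching edge, the agent can only repeat its prompt, which is the specialisation problem in miniature.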
The singularity is still some way off
While artificial intelligence is really beginning to live up to the name, artificial general intelligence (making a machine that can deal with general situations like a human) is a long way off. Such a system will likely combine existing techniques and some new ones, and systems like AlphaGo show that machine creativity is a real and possible goal, but a system that can understand the world around us with the facility of a human mind is not in clear sight.
I find the physicist David Deutsch's essay, Creative Blocks, to be particularly interesting in this regard.
Author: Andrew Wyld