Before you start pitching to your investors about how amazing the world will be once you get some AI in your product, spare a thought for the engineering team behind Tay, the Microsoft AI chatbot.
Unfortunately, the conversations didn’t stay playful for long. Pretty soon after Tay launched, people started tweeting the bot with all sorts of misogynistic, racist, and Donald Trumpist remarks. And Tay, being essentially a robot parrot with an internet connection, started repeating these sentiments back to users.
AI/ML technology is amazing, and it’s changing and evolving at a rapid pace. It’s gone from a hard, obscure technology to a simple service that you can quickly integrate into your product.
But it can often be quite hard to get a handle on how well it’s working. Or whether it’s actually working at all. In a recent project, we decided to use AI to help us choose a good rescue pet to adopt: basically a recommendation engine driven by which animals you liked or didn’t like.
So if you looked at this dog and liked it, which animal should we show you next?
We used image analysis AI to generate more information about each animal. For example, we already knew from the rescue shelter data whether the animal was a dog or a cat. But image analysis helped us understand if it had black and white spots, or was mainly brown, or had short hair. We used text analysis AI to better understand what was written about the animal. For example, rescue pets can be rescued by one shelter but fostered somewhere else, which would kind of suck if you turned up to the shelter expecting to introduce your kids to the pet of their dreams, only to discover they were located somewhere else. That would be a long trip home.
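As a rough sketch of that attribute step (the label names, confidence scores, and threshold here are invented for illustration, not the real shelter data or API), the label/confidence pairs a typical image analysis service returns can be reduced to simple searchable attributes:

```python
# Reduce raw image-analysis labels to boolean pet attributes.
# `labels` stands in for the (label, confidence) pairs a typical
# vision API returns; attribute names are our own invention.

def derive_attributes(labels, threshold=0.8):
    """Keep only high-confidence labels and map them to pet attributes."""
    confident = {name for name, score in labels if score >= threshold}
    return {
        "spotted": "spots" in confident,
        "short_hair": "short hair" in confident,
        "mainly_brown": "brown" in confident,
    }

# Hypothetical output from analysing one dog photo
labels = [("dog", 0.98), ("short hair", 0.91), ("brown", 0.85), ("spots", 0.40)]
print(derive_attributes(labels))
# {'spotted': False, 'short_hair': True, 'mainly_brown': True}
```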
That all worked. The problem we hit was around recommendations. What to show you next. And specifically around testing this functionality.
To explain: when we write code, it’s good practice to also write some test code to check that it works as intended. So say the feature you’re coding is a doorbell. Your tests might be:
- when I press the red button with my finger, I should hear a chime
- when I press the side with my finger, I should not hear a chime
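Those two tests could be sketched roughly like this, assuming a hypothetical `Doorbell` class with a `chimed` flag:

```python
# A minimal sketch of the doorbell tests above. The Doorbell class
# here is invented purely so the tests have something to run against.
import unittest

class Doorbell:
    def __init__(self):
        self.chimed = False

    def press_button(self):
        self.chimed = True  # the red button rings the chime

    def press_side(self):
        pass  # pressing the casing does nothing

class DoorbellTest(unittest.TestCase):
    def test_button_press_chimes(self):
        bell = Doorbell()
        bell.press_button()
        self.assertTrue(bell.chimed)

    def test_side_press_does_not_chime(self):
        bell = Doorbell()
        bell.press_side()
        self.assertFalse(bell.chimed)

if __name__ == "__main__":
    unittest.main()
```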
These tests get fed into Continuous Integration, and get run each time a developer makes changes or deploys code. They can even be run in the live environment to monitor ongoing performance. That’s how we can be certain that our code is working as planned.
But how do you create a test to see if the recommendation you’ve made is the right one? We’ve asked the AI engine
“Hey - what should we recommend now?”
and the AI has replied with something like “pet id #42”.
It can be hard to know whether pet id #42 is in fact the right pet to show this person or not.
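One partial workaround is to test properties of the answer rather than the answer itself: you may not be able to assert that pet #42 is the *best* pick, but you can check it isn’t an obviously wrong one. A sketch, with hypothetical field names:

```python
# You can't easily assert recommendation == 42, but you can assert
# properties any sane recommendation must satisfy. The pet and
# preference fields here are hypothetical.

def check_recommendation(pet, user_prefs, already_seen):
    assert pet["available"], "never recommend an already-adopted pet"
    assert pet["species"] in user_prefs["species"], "respect species preference"
    assert pet["id"] not in already_seen, "don't repeat a recommendation"

pet = {"id": 42, "species": "dog", "available": True}
check_recommendation(pet, {"species": {"dog"}}, already_seen={7, 13})
print("recommendation passes sanity checks")
```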
ML works really well in some situations. Say you needed to make a recommendation on what I should watch next on Netflix. You could test to see if I started watching the recommended show. Even better would be to test whether I watched to the end, or gave it a thumbs up.
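That outcome-based testing could be sketched like this (the event format and scoring scale are invented for illustration):

```python
# Score a recommendation by what the user actually did with it.
# Event records are hypothetical; in practice they'd come from logs.

def recommendation_score(events, recommended_id):
    """0 = ignored, 1 = started, 2 = finished, 3 = thumbs up."""
    weights = {"started": 1, "finished": 2, "thumbs_up": 3}
    score = 0
    for event in events:
        if event["show_id"] != recommended_id:
            continue  # activity on other shows doesn't count
        score = max(score, weights.get(event["type"], 0))
    return score

events = [{"show_id": "s1", "type": "started"},
          {"show_id": "s1", "type": "finished"}]
print(recommendation_score(events, "s1"))  # 2
```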
To explain why this works well: every play, pause, and thumbs up is a feedback signal, so there’s a constant stream of outcomes to measure each recommendation against. The data set we were working with was quite different. Adopting a pet is a one-off decision, so there was no steady stream of outcomes to check our recommendations against.
The problem we faced was not knowing whether our fancy AI-based Machine Learning < insert trendy keyword here > recommendation engine was actually making any difference at all. How different were the results it recommended from simple random pick code?
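One way to answer that question is to compare a simple engagement metric, such as the rate at which users liked the pets shown, between the engine and a random baseline. A sketch with made-up data (real numbers would come from logged user likes):

```python
# Compare the engine's like-rate against random picks.
# All data here is fabricated for illustration.
import random

def like_rate(recommendations, liked_ids):
    hits = sum(1 for pet_id in recommendations if pet_id in liked_ids)
    return hits / len(recommendations)

all_pets = list(range(100))
liked_ids = set(range(10))  # pretend this user likes pets 0-9

random.seed(1)
random_picks = random.sample(all_pets, 20)
engine_picks = [0, 1, 2, 3, 11, 12, 4, 5, 13, 6]  # hypothetical engine output

print("random baseline:", like_rate(random_picks, liked_ids))
print("engine:", like_rate(engine_picks, liked_ids))
```

If the engine’s rate isn’t meaningfully above the baseline over many users, the AI isn’t earning its keep.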
Before you rush out and AI all the Things, get some inspiration here
AI and Machine Learning are amazing technologies. But in some cases it can be hard to know whether they are actually working.
26 Nov 2018