Building Microsoft's What-Dog AI in under 100 Lines of Code

Microsoft recently released an app that uses AI to detect a dog’s breed. When I tested it on my beagle, though…

[Image: The app identifies a beagle as a Saluki]

Hmm, not quite, app. Not quite.

In my non-SitePoint time, I also work for Diffbot, the startup you may have heard of over the past few weeks, which also dabbles in AI. To see how the two compare, in this tutorial we’ll recreate Microsoft’s application using Diffbot’s technology and check whether it does a better job of recognizing the adorable beasts we throw at it!

We’ll build a very primitive single-file “app” that lets us upload an image and prints information about the breed below the form.

Prerequisites

If you’d like to follow along and don’t have a Diffbot account yet, register for a free 14-day token at Diffbot.com.

To install the client, we use the following composer.json file:

{
    "require": {
        "swader/diffbot-php-client": "^2",
        "php-http/guzzle6-adapter": "^1.0"
    },
    "minimum-stability": "dev",
    "prefer-stable": true,
    "require-dev": {
        "symfony/var-dumper": "^3.0"
    }
}

Then, we run composer install.

The minimum-stability flag is needed because part of the Puli package is still in beta, and Puli is now a dependency of the PHP HTTP project. The prefer-stable directive makes sure the highest stable version of each package is used when one is available. We also need an HTTP client. Here I opted for Guzzle 6, but the Diffbot PHP client supports any modern HTTP client via HTTPlug, so feel free to use your favorite.

Once these items have been installed, we can create an index.php file, which will contain all of our application’s logic. But first, bootstrapping:

<?php

require 'vendor/autoload.php';

$token = 'my_token';

The Upload

Let’s build a primitive upload form above the PHP content of our index.php file.

<form action="/" method="post" enctype="multipart/form-data">
    <h2>Please either paste in a link to the image, or upload the image directly.</h2>
    <h3>URL</h3>
    <input type="text" name="url" id="url" placeholder="Image URL">
    <h3>Upload</h3>
    <input type="file" name="file" id="file">
    <input type="submit" value="Analyze">
</form>

<?php

...

We’re focusing on the PHP side only here, so we’ll leave out the CSS. I apologize to your eyes.

[Image: Ugly form]

We’ll use Imgur to host uploaded images. Diffbot has to fetch each image from a public URL, so hosting the images on Imgur means we don’t need to deploy the application itself: the images will be public even if our app isn’t, which saves us hosting costs. First, let’s register an application on Imgur:

[Image: Imgur registration]

Registration produces a client ID and a client secret. We’ll only be doing anonymous uploads, which need just the client ID, so let’s add it to our file:

$token = 'my_token';
$imgur_client = 'client';
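Hard-coding both credentials is fine for a throwaway prototype, but if index.php ever lands in a repository, the secrets go with it. A minimal sketch of reading them from environment variables instead, assuming hypothetical variable names `DIFFBOT_TOKEN` and `IMGUR_CLIENT_ID` and a hypothetical `credential()` helper that falls back to the placeholder values above:

```php
<?php

// Hypothetical helper: read a credential from the environment,
// falling back to a placeholder when the variable is missing or empty.
function credential($name, $fallback)
{
    $value = getenv($name);
    // getenv() returns false when the variable isn't set
    return ($value === false || $value === '') ? $fallback : $value;
}

// DIFFBOT_TOKEN and IMGUR_CLIENT_ID are assumed names, not part of the tutorial
$token = credential('DIFFBOT_TOKEN', 'my_token');
$imgur_client = credential('IMGUR_CLIENT_ID', 'client');
```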

Analyzing the Images

So, how will the analysis happen, anyway?

As described in the docs, Diffbot’s Image API accepts a URL and scans the page at that address for images. Every image it finds is then analyzed, and data about each one is returned.
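The Diffbot PHP client builds the request for us, but roughly speaking a call boils down to a GET request carrying the token and the target URL as query parameters. A minimal sketch of building that request URL by hand, assuming the v3 endpoint `https://api.diffbot.com/v3/image` with `token` and `url` parameters (check the docs for the current version and optional parameters); `diffbotImageUrl()` and the Imgur link are hypothetical:

```php
<?php

// Assumed endpoint for Diffbot's v3 Image API; verify against the current docs.
const DIFFBOT_IMAGE_ENDPOINT = 'https://api.diffbot.com/v3/image';

// Hypothetical helper: build the Image API request URL for a page or image URL.
function diffbotImageUrl($token, $targetUrl)
{
    // http_build_query() percent-encodes the target URL so it survives as one parameter
    return DIFFBOT_IMAGE_ENDPOINT . '?' . http_build_query([
        'token' => $token,
        'url'   => $targetUrl,
    ]);
}

// prints https://api.diffbot.com/v3/image?token=my_token&url=https%3A%2F%2Fi.imgur.com%2Fexample.jpg
echo diffbotImageUrl('my_token', 'https://i.imgur.com/example.jpg'), PHP_EOL;
```

Encoding the target URL matters when it has its own query string: without it, parameters such as `?size=large` would be read as Diffbot parameters rather than part of the image address.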

The data we need are the tags Diffbot attaches to each image entry. The tags field is an array of JSON objects, each containing an id, a tag label, and a uri pointing to the related resource on http://dbpedia.org. We won’t need these links in this tutorial, but we’ll look into them in a later piece. The tags array takes a form similar to this:

"tags": [
        {
          "id": 4368,
          "label": "Beagle",
          "uri": "http://dbpedia.org/resource/Beagle"
        },
        {
          "id": 2370241,
          "label": "Treeing Walker Coonhound",
          "uri": "http://dbpedia.org/resource/Treeing_Walker_Coonhound"
        }
      ]

Every tag carries these three values. If there’s only one tag, the array holds a single object. By default, Diffbot returns up to 5 tags per image entry, and the tags don’t have to be directly related (for example, an image of a running shoe might be tagged both Nike and shoe).
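Since only the labels matter here, pulling them out of a decoded entry takes one call to `array_column()`. A minimal sketch, assuming a single image entry shaped like the example above and decoded with `json_decode(..., true)`; `tagLabels()` is a hypothetical helper, not part of the Diffbot client:

```php
<?php

// Hypothetical helper: collect the label of every tag on an image entry.
// A missing "tags" key (nothing recognized) yields an empty array.
function tagLabels(array $image)
{
    $tags = isset($image['tags']) ? $image['tags'] : [];
    return array_column($tags, 'label');
}

// Sample entry built from the tags example above
$json = '{"tags": [
    {"id": 4368, "label": "Beagle", "uri": "http://dbpedia.org/resource/Beagle"},
    {"id": 2370241, "label": "Treeing Walker Coonhound", "uri": "http://dbpedia.org/resource/Treeing_Walker_Coonhound"}
]}';

print_r(tagLabels(json_decode($json, true)));
```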

These tag labels are what we’ll use as guesses at the dog’s breed. Once the request goes through and the response contains tags, we’ll print the suggested labels below the image.

Processing Submissions

To process the form, we’ll add some basic logic below the token declarations:

if ($_SERVER['REQUEST_METHOD'] == 'POST') {
    if (isset($_FILES['file']['tmp_name']) && !empty($_FILES['file']['tmp_name'])) {
        $filename = $_FILES['file']['tmp_name'];

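        // Upload the image anonymously to Imgur so Diffbot can fetch it from a public URL
        $c = new GuzzleHttp\Client();
        $response = $c->request('POST', 'https://api.imgur.com/3/image.json', [
            'headers' => [
                'authorization' => 'Client-ID ' . $imgur_client
            ],
            'form_params' => [
                // Imgur accepts the image data as a base64-encoded string
                'image' => base64_encode(file_get_contents($filename))
            ]
        ]);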

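        $body = json_decode($response->getBody()->getContents(), true);
        // On success, Imgur returns the public image URL in data.link
        $url = isset($body['data']['link']) ? $body['data']['link'] : null;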
        if (empty($url)) {
            echo "<h2>Upload failed</h2>";
            die($body['data']['error']);
        }
    }

    if (!isset($url) && isset($_POST['url'])) {
        $url = $_POST['url'];
    }

    if (empty($url)) {
        die("That's not gonna work.");
    }

    $d = new Swader\Diffbot\Diffbot($token);
    /** @var Image $imageDetails */
    $imageDetails = $d->createImageAPI($url)->call();
    $tags = $imageDetails->getTags();

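    // Escape the user-supplied URL before echoing it into the page
    echo "<img width='500' src='" . htmlspecialchars($url, ENT_QUOTES) . "'>";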

    switch (count($tags)) {
        case 0:
            echo "<h4>We couldn't figure out the breed :(</h4>";
            break;
        case 1:
            echo "<h4>The breed is probably " . labelSearchLink($tags[0]['label']) . "</h4>";
            echo iframeSearch($tags[0]['label']);
            break;
        default:
            echo "<h4>The breed could be any of the following:</h4>";
            echo "<ul>";
            foreach ($tags as $tag) {
                echo "<li>" . labelSearchLink($tag['label']) . "</li>";
            }
            echo "</ul>";
            echo iframeSearch($tags[0]['label']);
            break;
    }
}

We first check if a file was selected for upload. If so, it takes precedence over a link-based submission. The image is uploaded to Imgur, and the URL Imgur returns is then passed to Diffbot. If only a URL was provided, it’s used directly.
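The precedence rule is easier to test once it’s pulled out of the request handling. A minimal sketch with a hypothetical `chooseImageSource()` that takes the `$_FILES` and `$_POST` arrays as plain arguments; unlike the tutorial’s code, it also checks PHP’s upload error code and trims the pasted link:

```php
<?php

// Hypothetical helper: decide which image source to analyze.
// Returns ['file', tmp path], ['url', link], or null when neither is usable.
function chooseImageSource(array $files, array $post)
{
    $upload = isset($files['file']) ? $files['file'] : null;

    // An uploaded file wins, but only if PHP reports a successful upload
    if ($upload !== null
        && isset($upload['error']) && $upload['error'] === UPLOAD_ERR_OK
        && !empty($upload['tmp_name'])) {
        return ['file', $upload['tmp_name']];
    }

    // Otherwise fall back to the pasted link, ignoring surrounding whitespace
    $url = isset($post['url']) ? trim($post['url']) : '';
    if ($url !== '') {
        return ['url', $url];
    }

    return null;
}

// In index.php this would be called as chooseImageSource($_FILES, $_POST)
```

When the form is submitted without a file, PHP sets the error code to `UPLOAD_ERR_NO_FILE`, so the link is used instead.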

We call Guzzle directly for the Imgur upload because it’s already installed as the HTTP client the Diffbot PHP client uses for its own API calls.
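For reference, the Guzzle upload amounts to a form-encoded POST with a `Client-ID` authorization header. A rough, dependency-free sketch of the pieces Guzzle assembles, using a hypothetical `imgurUploadRequest()` helper that only builds the method, headers, and body and sends nothing:

```php
<?php

// Hypothetical helper: build (but not send) the anonymous Imgur upload request
// that the Guzzle call in index.php performs.
function imgurUploadRequest($clientId, $path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }

    return [
        'method'  => 'POST',
        'url'     => 'https://api.imgur.com/3/image.json',
        'headers' => [
            // Anonymous uploads authenticate with the client ID only
            'Authorization' => 'Client-ID ' . $clientId,
            // Guzzle's form_params option sends the body with this content type
            'Content-Type'  => 'application/x-www-form-urlencoded',
        ],
        // The image travels as a base64 string in the "image" field
        'body'    => http_build_query(['image' => base64_encode($contents)]),
    ];
}
```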

Once the image data is returned, we grab the tags from the Image object and print them, each linked to Bing image search results for that breed, followed by an iframe that shows the results for the top suggestion right on the page.

The functions that build the search link and the iframe element are below:

// Bing image search URL for a breed label (browser-session parameters dropped)
function bingSearchUrl($label) {
    return 'https://www.bing.com/images/search?q=' . urlencode($label);
}

function labelSearchLink($label) {
    return '<a href="' . bingSearchUrl($label) . '" target="_blank">' . htmlspecialchars($label) . '</a>';
}

// <iframe> is not a void element, so it needs an explicit closing tag
function iframeSearch($label) {
    return '<iframe width="100%" height="400" src="' . bingSearchUrl($label) . '"></iframe>';
}

[Image: Beagle slightly misidentified]

Again, please excuse the design of both the code and the web page. This is just a quick prototype, and CSS or a framework would only have been a distraction.

Testing and Comparison

As the image above shows, Diffbot misidentified the hound as well, though not as badly as Microsoft did. To be fair, my beagle really does look more like a treeing walker coonhound than a typical beagle.

Let’s see some more examples.

[Image: Diffbot fails, MS succeeds]

Ah, curses! Microsoft wins this round. Diffbot hedged its bets between a basset hound and a treeing walker coonhound, but missed on both. How about another?

[Image: Bingo!]

Bingo! Both are spot on, though Diffbot once again plays it safe by suggesting the walker as an alternative. Okay, that one was a bit too obvious. How about a hard one?

[Image: Derps]

Hilariously, this derpy image seems to remind both AIs of a Welsh corgi!

What if there’s more than one dog in the image, though?

[Image: Whoops, Diffbot got it very wrong]

Adorable, Diffbot, but no cigar. Well done, Microsoft!

Okay, last one.

[Image: Sleeping beagle]

Excellent work on both fronts. Obviously, “dog-detecting AI” is maxed out! Granted, Diffbot has a small advantage in that it can also detect faces, text, brands, other animal types, and more in images, but when it comes to dog recognition, the two are toe to toe.

Conclusion

In this tutorial, we saw how easy it is to harness the power of modern AI to identify dog breeds at least somewhat accurately. While both engines have much room to improve, the more content we feed them, the better they’ll become.

This was a demonstration of how easy it is to use powerful remote machine learning algorithms, and an introduction to a more complex topic we’ll be exploring soon. Stay tuned!

Frequently Asked Questions about Building Microsoft’s What-Dog AI

How does the What-Dog AI work?

The What-Dog AI works by using a machine learning model that has been trained on a large dataset of dog images. The model has learned to recognize different breeds of dogs by identifying unique features and patterns in the images. When you input a new image, the AI compares it to the patterns it has learned and makes a prediction about the breed of the dog in the image.

Can I use the What-Dog AI on any device?

The version built in this tutorial runs entirely on the server in PHP and serves a plain HTML form, so once it’s hosted, any device with a modern web browser can use it. It doesn’t rely on JavaScript at all.

How accurate is the What-Dog AI?

The accuracy of the What-Dog AI depends on the quality of the image you input and the diversity of the training data. The AI has been trained on a large dataset of dog images, so it should be able to accurately identify most common breeds. However, it may struggle with rare breeds or mixed breeds that it has not been trained on.

Can the What-Dog AI identify mixed breed dogs?

The What-Dog AI can sometimes identify mixed breed dogs, but its accuracy may be lower than for purebred dogs. This is because mixed breed dogs can have a wide range of physical characteristics, making them harder to classify accurately.

How can I improve the accuracy of the What-Dog AI?

You can improve the accuracy of the What-Dog AI by providing clear, well-lit images of the dog. The AI needs to be able to clearly see the dog’s features in order to make an accurate prediction. Images where the dog is looking directly at the camera are often the most effective.

Is the What-Dog AI free to use?

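The code in this tutorial is free, but it relies on two external services: Diffbot, which requires an API token (new accounts get a free 14-day trial token), and Imgur, whose anonymous uploads only need a registered client ID. Continued use beyond the trial depends on Diffbot’s plans.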

Can I use the What-Dog AI for commercial purposes?

That depends on the terms of the services involved. The version built here calls Diffbot’s and Imgur’s APIs, so check both services’ terms before using it commercially. Microsoft’s own app is covered by Microsoft’s terms.

How can I contribute to the development of the What-Dog AI?

The app built in this tutorial is a single index.php file, so you can extend it however you like. The Diffbot PHP client it depends on (swader/diffbot-php-client) is an open-source package, so feedback, bug reports, and code contributions can go to its repository.

Can the What-Dog AI identify other animals besides dogs?

The What-Dog AI is specifically designed to identify dog breeds. It may not be able to accurately identify other animals.

How can I learn more about the technology behind the What-Dog AI?

The tutorial delegates the actual image recognition to Diffbot’s Image API, so Diffbot’s API documentation is the best place to learn what the service returns for each image, including the tags used here. The source of the Diffbot PHP client shows how the requests themselves are made.