Today we're going to write a program that tells you whether an image is a hotdog or not!
If you didn't get the reference, here's the scene from HBO's Silicon Valley
Before we can start working on this, we need to ask how we can even get a computer to tell us what's in an image. The long answer is machine learning, a field of computer science that involves a lot of research and computational resources. The short answer is we'll use an API.
There are plenty of image recognition APIs and services, such as Google Cloud's Cloud Vision API or AWS's Amazon Rekognition. However, we're going to use Clarifai, since we can easily use it for free with no hassle. If you want to delve into their docs, you can find them right here. For this tutorial, I'll be sticking with the code provided in the "Images" section of their API Guide.
To begin, let's start up a repl on Repl.it
Then, in a new tab, create a Clarifai account.
Once you've done so, go to the application dashboard and select "Create New Application"
Put in a name for your application and get the API key for that application
That's it for stuff outside the editor. Now it's time to get working!
In the Python repl, create a .env file and put in your API key like so
API_KEY=afadee25a3fa4ce192d66cae17f7200b
We're creating a .env file because it is only accessible to the owner of the repl, making it a smart location for API keys or other secrets that you wouldn't want others to have access to. It's important that you do not put the key in quotes; otherwise it will not work. For more info on .env files in Repl.it, check out this post.
Next, let's create a requirements.txt file for our dependencies and put the following into its contents. We'll need dotenv so we can access the .env file, as well as clarifai to use the Clarifai API via Python
dotenv
clarifai
Let's start main.py by grabbing the API key from our .env file
import os
API_KEY = os.getenv("API_KEY")
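If the .env file is missing or the variable name is misspelled, os.getenv quietly returns None and the problem only surfaces later, when the request to Clarifai fails. A minimal sketch of a guard is below. The helper name require_env is hypothetical (it is not part of Repl.it or Clarifai), and it accepts the environment as a parameter only to make it easy to try out.

```python
import os


def require_env(name, env=os.environ):
    """Return the value of an environment variable, or fail with a clear message.

    `require_env` is a hypothetical helper written for this tutorial.
    """
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; check your .env file")
    return value


# In main.py this would replace the os.getenv call:
# API_KEY = require_env("API_KEY")
```

Failing right at startup makes a missing key much easier to spot than an authentication error a few lines later.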
Then we follow it with the imports we need for Clarifai
from clarifai.rest import ClarifaiApp
from clarifai.rest import Image
Using ClarifaiApp and our API key, we create our app
app = ClarifaiApp(api_key=API_KEY)
Next, we'll need to obtain a "model" for the app. The model represents the type of information to get from the image. In this case, we want to get a particular type of food so we'll use the "food items" model. If you're interested in the other models offered, check out Clarifai's models page
model = app.models.get("food-items-v1.0")
As for the image, I'll use a Wikipedia image of a hot dog
And, to use it with our model, we wrap its URL in Clarifai's Image class
url = "https://upload.wikimedia.org/wikipedia/commons/thumb/b/b1/Hot_dog_with_mustard.png/1200px-Hot_dog_with_mustard.png"
image = Image(url=url)
Lastly, we predict
result = model.predict([image])
The returned result is a nested dictionary. Feel free to poke around and see what specific structures are in the object, but the loop that gives us the tags Clarifai returns is
for concept in result["outputs"][0]["data"]["concepts"]:
    print(concept["name"])
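To make the nesting in that loop easier to picture, here is a trimmed-down stand-in for the part of the response we use. The concept names and confidence values are made up for illustration, the "value" field is included on the assumption that each concept carries a confidence score, and a real response has extra fields (status information, ids and so on) that are left out here.

```python
# Trimmed-down stand-in for a Clarifai prediction response.
# The names and values below are illustrative, not real API output.
sample_result = {
    "outputs": [
        {
            "data": {
                "concepts": [
                    {"name": "hot dog", "value": 0.99},
                    {"name": "mustard", "value": 0.95},
                    {"name": "bun", "value": 0.90},
                ]
            }
        }
    ]
}


def concept_names(result):
    """Collect the tag names from the first output of a response."""
    return [c["name"] for c in result["outputs"][0]["data"]["concepts"]]


print(concept_names(sample_result))  # prints ['hot dog', 'mustard', 'bun']
```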
Since we only care if the displayed food is a hot dog, let's check each tag for being either "hotdog" or "hot dog"
hotdog = False
for concept in result["outputs"][0]["data"]["concepts"]:
    if concept["name"] == "hotdog" or concept["name"] == "hot dog":
        hotdog = True
        break
if hotdog:
    print("Hotdog")
else:
    print("Not hotdog")
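The same check can be wrapped in a small function, which makes it easy to try against hand-written dictionaries without calling the API. This is a sketch under a few assumptions: is_hotdog and min_confidence are hypothetical names, and the optional threshold relies on each concept carrying a "value" confidence between 0 and 1.

```python
HOTDOG_NAMES = {"hotdog", "hot dog"}


def is_hotdog(result, min_confidence=0.0):
    """Return True if any concept in the response is tagged as a hot dog.

    Assumes concepts look like {"name": ..., "value": ...}; a concept with
    no "value" is treated as fully confident.
    """
    concepts = result["outputs"][0]["data"]["concepts"]
    return any(
        c["name"].lower() in HOTDOG_NAMES
        and c.get("value", 1.0) >= min_confidence
        for c in concepts
    )


def fake_result(*concepts):
    """Build a minimal response-shaped dict from (name, value) pairs."""
    data = {"concepts": [{"name": n, "value": v} for n, v in concepts]}
    return {"outputs": [{"data": data}]}


print(is_hotdog(fake_result(("hot dog", 0.98), ("bun", 0.9))))  # True
print(is_hotdog(fake_result(("hamster", 0.97))))                # False
```

In main.py, hotdog = is_hotdog(result) would then stand in for the loop.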
And, with that, we can now determine whether or not an image is a hot dog. Go ahead and try out different URLs and see it work!
Of course, we can do a lot better than just a script that prints "Hotdog" or "Not hotdog". If you remember from the Silicon Valley clip, they actually returned an image with the text "Hotdog" or "Not hotdog" on it. So, let's make that happen.
To do that with Python, we'll use the Python Imaging Library (PIL), which is installed through its maintained fork, Pillow. I'll be using this custom font, but feel free to use your own. If you use this one, you'll download a zip file with the ttf as its contents. Once you extract the ttf file or get your own, upload it to your repl.
In order to use PIL, we'll need to add pillow to our requirements.txt
dotenv
clarifai
pillow
With that in, let's add the needed imports in main.py. The name Image already refers to Clarifai's Image class, so we import PIL's Image module as PILImage to avoid overwriting it
from PIL import Image as PILImage, ImageDraw, ImageFont
Since we've been retrieving the image by a URL, we don't actually have the image saved locally. However, we can use requests to obtain the image
import requests
from io import BytesIO
response = requests.get(url)
base = PILImage.open(BytesIO(response.content)).convert('RGBA')
Since I named my ttf file font.ttf, we'll prepare the text like so
size = width, height = base.size
draw = ImageDraw.Draw(base, "RGBA")
font = ImageFont.truetype("font.ttf", 100)
We only want to write "Hotdog" if it's a hotdog and "Not hotdog" if it isn't, so we need to distinguish between the two. I also shifted the x position in each case so that the text is more centered
if hotdog:
    draw.text((275, 10), "Hotdog", (0, 0, 0, 255), font=font)
else:
    draw.text((50, 10), "Not hotdog", (0, 0, 0, 255), font=font)
Since we used RGBA for coloring, we're going to have to save the file as a png
base.save("result.png")
With that, we've put the characters from Silicon Valley to shame. However, we now need to resize the images because, if you provide a smaller image, the text will go out of view. So, right after the line assigning size (and before the line creating draw, since resizing returns a new image), let's put the following
if size[0] < 1200:
    # resize() needs whole-number dimensions, hence the integer division
    base = base.resize((1200, size[1] * 1200 // size[0]))
size = width, height = base.size
The reason for the second size assignment is that we may have changed the size of the image in the line before.
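The arithmetic behind the resize is simple proportional scaling: the width is bumped up to 1200 and the height is multiplied by the same factor, so the image keeps its aspect ratio. A small sketch of just that calculation follows. The function name scaled_size and the min_width parameter are hypothetical, and integer division is used because pixel dimensions have to be whole numbers.

```python
def scaled_size(size, min_width=1200):
    """Return the size to resize to so the image is at least min_width wide.

    Images that are already wide enough are left unchanged.
    """
    width, height = size
    if width >= min_width:
        return (width, height)
    # Scale the height by the same factor as the width (min_width / width).
    return (min_width, height * min_width // width)


print(scaled_size((600, 400)))   # (1200, 800)
print(scaled_size((1600, 900)))  # (1600, 900)
```

A 600x400 image is doubled in both directions, while an image that is already 1600 pixels wide is left alone.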
Running now should resolve the issue with size. The next thing to tackle is the color of the text: our black text is fine on a bright image, but what about when the image is dark too? To solve that, we should place the text inside a rectangle, like subtitles.
First, let's make the determined string "Hotdog" or "Not hotdog" be a variable
text = "Hotdog" if hotdog else "Not hotdog"
Next, I'm going to move the coordinates for the text into variables too
x = 275 if hotdog else 50
y = 10
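With these variables, we can get the width and height of the rectangle needed and then draw it. Make sure the rectangle is drawn before the call to draw.text; otherwise it will cover the text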
# font.getsize() was removed in Pillow 10; getbbox() returns (left, top, right, bottom)
_, _, w, h = font.getbbox(text)
draw.rectangle((x, y, x + w, y + h), fill='white')
Now we should always be able to read the text. As one final thing, let's make the background of the text green or red based on whether or not it's a hotdog
draw.rectangle((x, y, x + w, y + h), fill="green" if hotdog else "red")
As you can see, a hamster is not a hot dog and you made "Not hotdog" using Python!
As always, if you have any questions or want to say hi, feel free to send me an email at [email protected] and till next time!