Bounding Boxes Field is None in Ultralytics YOLO Model Results

When working with object detection using Ultralytics YOLOv8 in Python and attempting to draw bounding boxes for detected objects on the camera image, you may find that the boxes field of each result is undefined (equal to None).

The solution is to make sure you are using the yolov8n.pt model and not yolov8n-cls.pt: the latter does not set this field.

The -cls version is a classification model: it returns class probabilities (in the probs field), not bounding boxes.

In short the solution is to load the model using:

YOLO("yolov8n.pt")

instead of:

YOLO("yolov8n-cls.pt")

The following code shows a complete example of detecting objects with YOLO and drawing their bounding boxes.
Comments indicate where the problem with boxes being equal to None appears.

import cv2

from ultralytics import YOLO

captureObject = cv2.VideoCapture(0)
captureObject.set(cv2.CAP_PROP_FRAME_WIDTH, 840)
captureObject.set(cv2.CAP_PROP_FRAME_HEIGHT, 780)

# Do not use yolov8n-cls.pt unless you do not need bounding boxes.
yoloModel = YOLO("yolov8n.pt")

# Get all class labels.
classLabels = list(yoloModel.names.values())

# Main loop.
while True:
  ret, img = captureObject.read()
  # Stop if a frame could not be read.
  if not ret:
    break

  # Classify objects.
  results = yoloModel(img, stream=True)

  for r in results:
    boundingBoxes = r.boxes
    # The value of boxes is None if using yolov8n-cls.pt
    if boundingBoxes is not None:
      for box in boundingBoxes:
        # Get coordinates.
        x1, y1, x2, y2 = box.xyxy[0]
        # Convert to integer types.
        x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

        # Draw bounding box rectangle inside camera image.
        cv2.rectangle(img, 
          (x1, y1), 
          (x2, y2), 
          (255, 0, 255), 
          3)

        # Add classification label on top of bounding box.
        classIndex = int(box.cls[0])
        label = classLabels[classIndex]
        cv2.putText(img, 
          label, 
          (x1, y1), 
          cv2.FONT_HERSHEY_SIMPLEX, 
          1, 
          (255, 0, 0), 
          2)

  # Show the frame with the overlay rectangles.
  cv2.imshow("webcam", img)

  # Exit with 'q' key.
  if cv2.waitKey(1) == ord("q"):
    break

captureObject.release()
cv2.destroyAllWindows()
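
Conversely, if we only need class labels and not boxes, the -cls model's results carry the prediction in a probs field instead. A minimal sketch of reading it (the image path here is just a placeholder):

from ultralytics import YOLO

classifier = YOLO("yolov8n-cls.pt")

# Placeholder image path.
results = classifier("images/sample.jpeg")

result = results[0]
print(result.boxes)  # None: the classification model produces no boxes.
print(result.names[int(result.probs.top1)])  # The most likely class label.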

 

Get All Values in a Python Dictionary

Sometimes we need to get a list of all of the values only (without keys) from a Python dictionary.

Suppose we have a dictionary numbers defined as follows:

numbers = dict()

numbers["a"] = 1
numbers["b"] = 2
numbers["c"] = 3
numbers["d"] = 4

To return a list of just the values at all of the keys from this data structure, we can use the values() method.
Note that it returns a view object, which needs to be converted to a regular list.

result = list(numbers.values())

The result is:

[1, 2, 3, 4]

Another option is to use a list comprehension. This is especially useful if we want to do some further computations on all of the values immediately.

result = [numbers[key] for key in numbers.keys()]

The result is:

[1, 2, 3, 4]

The dictionary method keys() returns a view of all keys in the dictionary.
Then, numbers[key] is called for each key to get the value at that key.
Finally, the list comprehension results in a list of all values.
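
For example, to double every value in one step:

result = [value * 2 for value in numbers.values()]

The result is:

[2, 4, 6, 8]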

 

Simple RAG with a Locally Running LLM

This is a simple example of a RAG (Retrieval-Augmented Generation) application with a locally running LLM (Large Language Model).

For this example we will use Mistral running with Ollama on macOS.

See this post for more details on how to get it up and running.

First, ensure the model is running and responding to queries over HTTP:

$ curl -X POST http://localhost:11434/api/generate \
       -d '{"model":"mistral", "prompt":"Hello"}'

This should reply with a stream of tokens.

The idea of Retrieval-Augmented Generation is to add information to the prompt that is not otherwise available to the model.

A simple example piece of data is the current system time. Normally, a language model does not have access to that information. If we ask:

>>> What time is it?
I don't have access to the current time, 
but you can use a world clock website or app 
to find out the current time in your location.

The following script uses RAG to append the current time to the prompt, so the LLM can answer with this new context.

simple-rag-request.py:

import requests

from datetime import datetime

# Function to get extra data for RAG.
def getRAGData():
  currentTime = datetime.now().strftime("%I:%M %p")
  return "Current time is: " + currentTime + ". "

# Main program.
inputPrompt = input("Prompt: ")

API_URI = "http://localhost:11434/api/generate"

# API request body.
postBody = dict()
postBody["model"] = "mistral"

combinedPrompt = getRAGData() + inputPrompt
postBody["prompt"] = combinedPrompt
postBody["stream"] = False

result = requests.post(API_URI, json=postBody)

jsonResult = result.json()
finalResponse = jsonResult["response"]

print(finalResponse)

Now we can run the script and see how the extra information informs the result:

$ python simple-rag-request.py
Prompt: what time is it?

The current time is 9:23 PM.

This idea is easily extended to querying proprietary data in our own databases, or any other data we wish to inject.
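
For instance, the retrieval step could select the most relevant snippet from our own documents before building the prompt. The sketch below uses a hard-coded list and naive keyword matching as stand-ins for a real database or search index:

# Stand-in corpus; in practice this would be a database
# or search index.
documents = [
  "Our office is open Monday to Friday, 9am to 5pm.",
  "The support email address is support@example.com.",
  "Refunds are processed within 14 days.",
]

# Naive keyword retrieval: return the document sharing
# the most words with the question.
def getRAGData(question):
  questionWords = set(question.lower().split())
  return max(documents,
    key=lambda doc: len(questionWords & set(doc.lower().split()))) + " "

The selected snippet is then prepended to the prompt exactly as the current time was above.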

 

Run the Mistral 7B LLM Locally

We can run the Mistral 7B (seven billion parameter) Large Language Model locally easily using Ollama. In this example we assume running on macOS.

First, install Ollama.

Download the installer from:

https://github.com/jmorganca/ollama

Double-click the app to install the binary command.

Now, in a terminal, run:

$ ollama --version

The output should be similar to:

ollama version 0.1.13

If the command is successfully installed, we can download the Mistral 7B model with:

$ ollama run mistral

This will download and start the model.

Once loaded, we should see:

>>> Send a message (/? for help)

Now, try a test prompt:

>>> What is the capital of Estonia?

The capital of Estonia is Tallinn.
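
To see which models have already been downloaded locally, we can also run:

$ ollama list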

 

Empty Error When Running Llama with llama-cpp

When running various open source Llama Large Language Models (LLMs) from the command line with llama-cpp and the llm tool, we may encounter an empty error such as:

$ llm -m modelName "test"
Error:

The empty error provides no clues, but it can happen when the installed version of llama-cpp does not match the model we are using.
Different models may use different, mutually incompatible file formats internally, so we must ensure we have the correct version of llama-cpp for the given model.

For example, for Llama-2 Uncensored, we can use llama-cpp-python version 0.1.78.

For Llama-2, we can use version 0.2.11.

The following installed versions work at the time of writing:

For Llama-2 Uncensored, install using:

$ pip install llama-cpp-python==0.1.78

For Llama-2, use:

$ pip install llama-cpp-python==0.2.11

We can check which version of llama-cpp-python is installed using:

$ pip show llama-cpp-python

To see all the models installed use:

$ llm models

To run a test again after switching versions:

$ llm -m modelName "test prompt"

 

Classify an Object in an Image in Python Using the YOLO Model

To perform object classification on an image file using Python, we can use the open source pre-trained YOLO model from Ultralytics.

First, install the library using:

$ pip install ultralytics

For example, assume we have an image of a tractor in a local file tractor.jpeg under images/.

Note that we can also run the model from the command line using:

$ yolo predict source='images/tractor.jpeg'

In Python, we need to extract the result from all of the model output, which requires a bit more code.

The model’s predict function returns a list of results; each result holds the class probabilities as well as a mapping of label numbers to label names.

The code below will extract the highest probability label and print it.

from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")

# Path to an image file assumed to exist.
results = model.predict("images/tractor.jpeg")

# Overall results is a list.
result = results[0]

probabilities = result.probs

# Top1 is the most likely result.
topLabelNumber = probabilities.top1

# Now look up the label name for that label number.
resultLabel = result.names[topLabelNumber]

print("Classification result:")
print(resultLabel)
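
If we want more than the single best guess, the probs object also exposes a top5 field holding the indices of the five most likely classes (an assumption worth verifying against the installed ultralytics version):

from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")
results = model.predict("images/tractor.jpeg")
result = results[0]

# top5 is assumed to hold the five most likely class indices.
for labelNumber in result.probs.top5:
  print(result.names[int(labelNumber)])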

 

Synthesize Speech in a Different Language using Python

To synthesize speech in Python in a language other than English using pyttsx3, we need to find which voice is available for the desired language.

First, we can print out the list of all available voices.
Each of the voice objects will include a list of languages that the voice supports (usually one).

In this example we will synthesize a string in Polish. For any other language, simply find a voice that supports it in the full list of voices.


import pyttsx3

synthesizer = pyttsx3.init()

voices = synthesizer.getProperty("voices")

for voice in voices:
  if "zosia" in voice.id: # The Polish voice.
    print(voice.id) # Full ID string.
    print("Languages for voice:")
    print(voice.languages)

synthesizer.setProperty("language", "pl_PL")

synthesizer.setProperty("voice", 
  "com.apple.speech.synthesis.voice.zosia"
)

synthesizer.say("Cześć, jak się masz?")

synthesizer.runAndWait()
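
To search for a voice by language rather than by name, we can filter the voice list on its reported language codes. A minimal sketch, assuming the driver populates voice.languages (how languages are reported varies by platform and driver):

import pyttsx3

synthesizer = pyttsx3.init()

# Print every voice that reports Polish ("pl") support.
for voice in synthesizer.getProperty("voices"):
  languages = [str(language) for language in voice.languages]
  if any("pl" in language.lower() for language in languages):
    print(voice.id, languages)

 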

API Design: Paginated Responses by Default

One important way to reduce performance issues and potential abuse in an API is to paginate responses by default.

For example, suppose we have a call like:

GET /items

Conceptually, this REST resource represents a list of all items available.

In a real production API, however, this should default to actually getting the first page only. Specifically:

GET /items

should be equivalent to:

GET /items?page=1

This is because, as the items collection grows, the list of all items can become extremely large.

If /items attempts to return all items at once, the endpoint becomes a performance problem and a potential API security issue: it opens the API up to Resource Exhaustion Attacks. Attackers can abuse the API by requesting very large lists repeatedly, in parallel, potentially depleting server resources and causing denial-of-service to legitimate users.
Implementing pagination-by-default helps prevent this abuse.

There should also be a hard limit on the maximum page size, for calls where the page size is specified.
For example, 500 items could be a hard maximum.
For any larger page sizes, we can return an error such as the following:

GET /items?pageSize=501
400 Bad Request
{
  "error": "Page size too large"
}
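
As a rough sketch, the defaults and the hard limit might look like this in a small Flask handler (Flask, the in-memory item list, and the parameter names are assumptions for illustration):

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical data source.
ITEMS = [{"id": i} for i in range(10000)]

DEFAULT_PAGE_SIZE = 50
MAX_PAGE_SIZE = 500

@app.route("/items")
def list_items():
  # Default to the first page when none is specified.
  page = request.args.get("page", default=1, type=int)
  pageSize = request.args.get("pageSize", default=DEFAULT_PAGE_SIZE, type=int)

  # Enforce the hard maximum page size.
  if pageSize > MAX_PAGE_SIZE:
    return jsonify({"error": "Page size too large"}), 400

  start = (page - 1) * pageSize
  return jsonify(ITEMS[start:start + pageSize])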

Following these guidelines will help an API be more performant and resistant to abuse.

 

Read Header Values from a File in a cURL Request

It can be cumbersome to type many different header names and values when composing a cURL command.

We can read all of the headers sent with a request from a file using the syntax below.

NOTE: make sure to have curl version 7.55.0 or higher; the @file syntax for -H was added in 7.55.0.

To check:

$ curl --version

To cURL with headers read from a file:

$ curl -H @headers_file.txt http://somesite.com

Here is an example file:

headers_file.txt:

Accept: application/json
Content-type: application/json

We can confirm the headers are sent correctly using verbose mode (-v).

$ curl -H @headers_file.txt http://somesite.com -v

As another test, we can see exactly what is sent to a remote server by first receiving the request locally with netcat. In a terminal, start listening:

$ nc -l 9090

Then launch the request in a second terminal:

$ curl -H @headers_file.txt http://localhost:9090

In the terminal listening with netcat, we should receive a request with the headers specified in the file:

GET / HTTP/1.1
Host: localhost:9090
User-Agent: curl/7.77.0
Accept: application/json
Content-type: application/json

Note that curl's default values for these headers (such as Accept) are overridden by the values from the file.

 

Synthesize Speech using Python

We can perform text-to-speech in Python using the pyttsx3 speech synthesis library.

Install the pyttsx3 library:

$ pip install pyttsx3

The following example script will synthesize the audio for speaking “hello”.

synthesize-hello.py:

import pyttsx3

synthesizer = pyttsx3.init()

synthesizer.say("hello")

synthesizer.runAndWait()
synthesizer.stop()
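
pyttsx3 also exposes rate and volume properties that can be adjusted before speaking:

import pyttsx3

synthesizer = pyttsx3.init()

# Speaking rate in words per minute.
synthesizer.setProperty("rate", 150)
# Volume in the range 0.0 to 1.0.
synthesizer.setProperty("volume", 0.8)

synthesizer.say("hello at a slower rate")
synthesizer.runAndWait()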

To perform speech synthesis with a specific voice, use the following.
This is specific to macOS.

synthesize-by-voice.py:

import pyttsx3

synthesizer = pyttsx3.init()

voices = synthesizer.getProperty("voices")
for voice in voices:
  print(voice.id)

voiceChoice = input("Enter name: ")

synthesizer.setProperty("voice",
  "com.apple.speech.synthesis.voice." + str(voiceChoice))

stringToSay = input("Enter text to read: ")

synthesizer.say(stringToSay)
synthesizer.runAndWait()
synthesizer.stop()

The example run below shows the available voices and the input used to choose a specific voice.
The input string is then synthesized as speech.

com.apple.speech.synthesis.voice.Alex
com.apple.speech.synthesis.voice.alice
com.apple.speech.synthesis.voice.alva

...

com.apple.speech.synthesis.voice.yuri
com.apple.speech.synthesis.voice.zosia
com.apple.speech.synthesis.voice.zuzana

Enter name: yuri
Enter text to read: this is a fake voice

The output is audio.