Object detection

This section describes the components used for object detection.

Object detection components that are independent of the implementation are separated from specific models. Examples of the first category include visual annotators and components that save detections to CSV format or load precalculated detections. Examples of specific implementations are TensorFlow, OpenCV, or other deep learning framework ports of the Darknet YOLOv4 model or any of its variants. Consuming object detection as a service through gRPC or another protocol is also an implementation alternative, one that could reduce the dependencies of the videoanalytics library.

API reference

The main module contains classes and methods for tasks related to object detection.

Format conventions

There are different conventions for representing bounding boxes, including:

  • Each bounding box is represented by its top left coordinates and width and height in absolute values (pixels). This is the most convenient format for extracting patches or annotating.

  • Each bounding box is represented by its center coordinates and width and height in normalized values (0.0-1.0). This is the format used by YOLO.

The utils and evaluation modules contain utilities for working with the different formats.
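To make the two conventions concrete, the following is a minimal sketch of the conversion between them. The function names are hypothetical and are not part of the utils module; the image size is passed explicitly because the normalized format does not carry it.

```python
def yolo_to_abs_xywh(xc, yc, w, h, img_w, img_h):
    """Normalized center format -> absolute top-left format (pixels)."""
    abs_w = w * img_w
    abs_h = h * img_h
    # The top-left corner is the center shifted by half the box size.
    x = xc * img_w - abs_w / 2.0
    y = yc * img_h - abs_h / 2.0
    return x, y, abs_w, abs_h


def abs_xywh_to_yolo(x, y, w, h, img_w, img_h):
    """Absolute top-left format (pixels) -> normalized center format."""
    xc = (x + w / 2.0) / img_w
    yc = (y + h / 2.0) / img_h
    return xc, yc, w / img_w, h / img_h


# A box centered in a 640x480 image, a quarter as wide and half as tall.
box = yolo_to_abs_xywh(0.5, 0.5, 0.25, 0.5, img_w=640, img_h=480)
round_trip = abs_xywh_to_yolo(*box, img_w=640, img_h=480)
```

Here box holds the top-left corner and size in pixels, and round_trip recovers the original normalized values.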

The adopted format for representing detections in the global context is a tuple stored under the entry name “DETECTIONS”, with the following components:

  • out_boxes: a list of boxes in absolute coordinates (top left, width, height, in pixels).

  • out_scores: a list of the scores (confidence) for each predicted box (0.0-1.0).

  • out_classes: a list with the numeric class identifier of each box (\(0, \ldots, n_{classes}-1\)).

  • num_boxes: the number of boxes, that is, the length of each of the lists above.
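As an illustration, the sketch below stores and reads such a tuple using a plain dict as the context. It assumes that the tuple order matches the order of the list above and that plain Python lists are acceptable; the helper iter_detections is hypothetical.

```python
def iter_detections(context, context_name="DETECTIONS"):
    """Yield one dict per detection stored in the context."""
    out_boxes, out_scores, out_classes, num_boxes = context[context_name]
    for i in range(num_boxes):
        x, y, w, h = out_boxes[i]
        yield {
            "class_idx": out_classes[i],
            "score": out_scores[i],
            "box": (x, y, w, h),  # top left, width, height, in pixels
        }


context = {}
out_boxes = [[240, 120, 160, 240], [10, 20, 50, 80]]
out_scores = [0.91, 0.47]
out_classes = [0, 2]
context["DETECTIONS"] = (out_boxes, out_scores, out_classes, len(out_boxes))

detections = list(iter_detections(context))
```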

The convention used for the CSV format in components is to store each detection as a row. The columns are:

  • frame_num: frame number, counted from the value of the “START_FRAME” variable at iteration zero.

  • class_idx: numeric identifier of the detected class.

  • x,y: top left bounding box coordinate in pixels.

  • w,h: width and height of the bounding box in pixels.

  • score: confidence for the detection.

  • filename (optional): this field is filled with the “IMG_FILENAME” variable, if present in the context.
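The row layout can be sketched with the standard csv module. This is not the library's writer: the column order is assumed to follow the list above, the default of 0 for a missing “START_FRAME” is an assumption, and detections_to_rows is a hypothetical helper.

```python
import csv
import io


def detections_to_rows(context, iteration, context_name="DETECTIONS"):
    """Build one CSV row per detection for the current iteration."""
    out_boxes, out_scores, out_classes, num_boxes = context[context_name]
    # frame_num equals START_FRAME when iteration is zero.
    frame_num = context.get("START_FRAME", 0) + iteration
    rows = []
    for i in range(num_boxes):
        x, y, w, h = out_boxes[i]
        row = [frame_num, out_classes[i], x, y, w, h, out_scores[i]]
        # The filename column is only written when IMG_FILENAME is set.
        if "IMG_FILENAME" in context:
            row.append(context["IMG_FILENAME"])
        rows.append(row)
    return rows


context = {
    "START_FRAME": 10,
    "DETECTIONS": ([[240, 120, 160, 240]], [0.91], [0], 1),
}
rows = detections_to_rows(context, iteration=2)

buffer = io.StringIO()  # stands in for the output file
csv.writer(buffer).writerows(rows)
```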

class videoanalytics.pipeline.sinks.object_detection.DetectionsAnnotator(name, context, class_names_filename, context_name='DETECTIONS', show_label=True)

Annotates the detections in a frame displaying a bounding box around each identified object.

This component READS the following entries in the global context:

  • DETECTIONS – Output of an object detection model.

  • FRAME – Numpy array representing the frame.

This component WRITES the following entries in the global context:

  • FRAME – Numpy array representing the frame.

Parameters
  • name (str) – the component unique name.

  • context (dict) – The global context.

  • class_names_filename (str) – text file with class names.

  • show_label (bool) – display class name in bounding box.

  • context_name (str) – name of the variable in the context containing detections.

process()

This method is called for each active component in the pipeline.

setup()

This method is called after all components in the pipeline have been instantiated.

shutdown()

This method is called after processing has finished.
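The setup, process, and shutdown calls can be illustrated with a small stand-in component. This is a sketch only: LabelSketch is hypothetical, it does not draw on the frame, it stores text labels under a made-up “LABELS” entry, and it assumes the class names file holds one name per line.

```python
import os
import tempfile


class LabelSketch:
    """Hypothetical component; not the library's DetectionsAnnotator."""

    def __init__(self, name, context, class_names_filename,
                 context_name="DETECTIONS", show_label=True):
        self.name = name
        self.context = context
        self.class_names_filename = class_names_filename
        self.context_name = context_name
        self.show_label = show_label
        self.class_names = []

    def setup(self):
        # Assumption: one class name per line, blank lines ignored.
        with open(self.class_names_filename) as f:
            self.class_names = [line.strip() for line in f if line.strip()]

    def process(self):
        boxes, scores, classes, num_boxes = self.context[self.context_name]
        labels = []
        for i in range(num_boxes):
            text = ""
            if self.show_label:
                name = self.class_names[classes[i]]
                text = "{} {:.2f}".format(name, scores[i])
            labels.append((tuple(boxes[i]), text))
        self.context["LABELS"] = labels  # hypothetical context entry

    def shutdown(self):
        pass  # nothing to release in this sketch


with tempfile.TemporaryDirectory() as tmp_dir:
    names_path = os.path.join(tmp_dir, "classes.names")
    with open(names_path, "w") as f:
        f.write("person\nbicycle\ncar\n")

    context = {
        "DETECTIONS": ([[240, 120, 160, 240], [10, 20, 50, 80]],
                       [0.91, 0.47], [0, 2], 2),
    }
    component = LabelSketch("labels", context, names_path)
    component.setup()     # once, after all components exist
    component.process()   # once per frame
    component.shutdown()  # once, after processing has finished
```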

class videoanalytics.pipeline.sinks.object_detection.DetectionsCSVWriter(name, context, filename, context_name='DETECTIONS')

Writes the detections to a CSV file.

This component READS the following entries in the global context:

  • DETECTIONS – Output of an object detection model.

  • FRAME – Numpy array representing the frame.

  • START_FRAME – Initial frame index.

  • IMG_FILENAME (optional) – Image filename (for image sequences).

Parameters
  • name (str) – the component unique name.

  • context (dict) – The global context.

  • filename (str) – CSV output file.

  • context_name (str) – name of the variable in the context containing detections.

process()

This method is called for each active component in the pipeline.

setup()

This method is called after all components in the pipeline have been instantiated.

shutdown()

This method is called after processing has finished.

class videoanalytics.pipeline.sinks.object_detection.ObjectDetectorCSV(name, context, filename, context_name='DETECTIONS')

This component reads precomputed detections from a CSV file.

This component READS the following entries in the global context:

  • START_FRAME – Initial frame index.

This component WRITES the following entries in the global context:

  • DETECTIONS – Output of an object detection model.

Parameters
  • name (str) – the component unique name.

  • context (dict) – The global context.

  • filename (str) – CSV input file with the precomputed detections.

  • context_name (str) – name of the variable in the context containing detections.

process()

This method is called for each active component in the pipeline.

setup()

This method is called after all components in the pipeline have been instantiated.

shutdown()

This method is called after processing has finished.
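Reading precomputed detections back can be sketched as grouping CSV rows by frame number and rebuilding the detections tuple for the frame being processed. The helpers below are hypothetical and assume the column order of the CSV convention described earlier, with no header row.

```python
import csv
import io
from collections import defaultdict


def load_detections_by_frame(csv_file):
    """Group CSV rows into (boxes, scores, classes) lists per frame."""
    by_frame = defaultdict(lambda: ([], [], []))
    for row in csv.reader(csv_file):
        frame_num, class_idx = int(row[0]), int(row[1])
        x, y, w, h = (float(v) for v in row[2:6])
        boxes, scores, classes = by_frame[frame_num]
        boxes.append([x, y, w, h])
        scores.append(float(row[6]))
        classes.append(class_idx)
    return by_frame


def detections_for_frame(by_frame, frame_num):
    """Return the detections tuple for one frame (empty if none)."""
    boxes, scores, classes = by_frame.get(frame_num, ([], [], []))
    return boxes, scores, classes, len(boxes)


data = io.StringIO(  # stands in for the CSV file
    "10,0,240,120,160,240,0.91\n"
    "10,2,10,20,50,80,0.47\n"
    "12,1,5,5,20,20,0.8\n"
)
by_frame = load_detections_by_frame(data)
frame_10 = detections_for_frame(by_frame, 10)
frame_11 = detections_for_frame(by_frame, 11)
```

Frames without rows in the file yield an empty tuple with num_boxes equal to zero, so downstream components can treat every frame uniformly.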

YOLOv4 implementation in TensorFlow

This module contains a TensorFlow implementation of the YOLOv4 object detector.

class videoanalytics.pipeline.sinks.object_detection.yolo4.YOLOv4DetectorTF(name, context, weights_filename, allowed_classes=None, yolo_input_size=416, yolo_max_output_size_per_class=50, yolo_max_total_size=50, yolo_iou_threshold=0.45, yolo_score_threshold=0.4, context_name='DETECTIONS')

TensorFlow implementation of the YOLOv4 object detector.

This component READS the following entries in the global context:

  • FRAME – Numpy array representing the frame.

This component UPDATES the following entries in the global context:

  • DETECTIONS – List holding numpy array with bounding boxes.

Parameters
  • name (str) – the component unique name.

  • context (dict) – The global context.

  • weights_filename (str) – model weights filename.

  • allowed_classes (list) – set of allowed classes. This option is to restrict the detections to a subset of classes relevant to the application domain. If None, all classes are allowed.

  • yolo_input_size (int) – size in pixels of the network input. The input image is resized using OpenCV.

  • yolo_max_output_size_per_class (int) – maximum number of detections per class.

  • yolo_max_total_size (int) – maximum number of detections.

  • yolo_iou_threshold (float) – minimum IoU to accept detection.

  • yolo_score_threshold (float) – minimum score to accept detected class as valid.

  • context_name (str) – variable name used for storing detections in the context.

process()

This method is called for each active component in the pipeline.

setup()

This method is called after all components in the pipeline have been instantiated.

shutdown()

This method is called after processing has finished.
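The effect of allowed_classes and yolo_score_threshold can be sketched as a filter over a detections tuple. This is an illustration rather than the detector's code: it assumes allowed_classes holds numeric class identifiers and that a score equal to the threshold is kept, and filter_detections is a hypothetical name.

```python
def filter_detections(detections, allowed_classes=None, score_threshold=0.4):
    """Keep detections that pass the score and class restrictions."""
    boxes, scores, classes, num_boxes = detections
    kept = [
        i for i in range(num_boxes)
        if scores[i] >= score_threshold
        # None means every class is allowed.
        and (allowed_classes is None or classes[i] in allowed_classes)
    ]
    return (
        [boxes[i] for i in kept],
        [scores[i] for i in kept],
        [classes[i] for i in kept],
        len(kept),
    )


detections = (
    [[240, 120, 160, 240], [10, 20, 50, 80], [5, 5, 20, 20]],
    [0.91, 0.47, 0.30],
    [0, 2, 0],
    3,
)
all_classes = filter_detections(detections)
only_class_0 = filter_detections(detections, allowed_classes=[0])
```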

Utilities

videoanalytics.pipeline.sinks.object_detection.utils.convert_detections(df, box_format='xyxy')

Converts a dataframe in YOLO normalized format to absolute coordinates (x0,y0,x1,y1).

Parameters
  • df (pandas.DataFrame) – input dataframe with columns x,y,w,h.

  • box_format (str) – output box format; defaults to “xyxy”.

Returns

A dataframe with columns x0,y0,x1,y1.

videoanalytics.pipeline.sinks.object_detection.utils.load_detections_from_file_list(det_list, box_format='xyxy')

Given a list of files, constructs a dataframe with the bounding boxes for each image. The list of files is typically obtained from the test directory text annotations (YOLO normalized format is assumed).

Parameters
  • det_list (list) – list of filenames.

  • box_format (str) – Currently only “xyxy” is supported.

Returns

A dataframe (see format below).

The returned dataframe contains the following columns:
  • filename: name of the file.

  • frame_num: set to the index of the file in the list (this field is reserved for videos).

  • class_idx: class index.

  • x,y: bounding box center coordinates (normalized)

  • w,h: bounding box dimensions (normalized).

  • img_w,img_h: image width and height in pixels.

If the box_format is ‘xyxy’ then the following columns will be transformed/added:
  • x,y,w,h: will be transformed to pixels

  • x0,y0,x1,y1: will be set to the bounding box top-left, bottom-right coordinates in pixels.
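The ‘xyxy’ transformation described above can be sketched with pandas as follows. This is not the library's implementation: to_xyxy_sketch is a hypothetical function, and it assumes that x,y remain the box center after being scaled to pixels and that the img_w,img_h columns are present.

```python
import pandas as pd


def to_xyxy_sketch(df):
    """Scale normalized x,y,w,h to pixels and add corner columns."""
    out = df.copy()
    # Normalized values become pixels using the per-row image size.
    out["x"] = out["x"] * out["img_w"]
    out["y"] = out["y"] * out["img_h"]
    out["w"] = out["w"] * out["img_w"]
    out["h"] = out["h"] * out["img_h"]
    # Corners are the center shifted by half the box size.
    out["x0"] = out["x"] - out["w"] / 2.0
    out["y0"] = out["y"] - out["h"] / 2.0
    out["x1"] = out["x"] + out["w"] / 2.0
    out["y1"] = out["y"] + out["h"] / 2.0
    return out


df = pd.DataFrame([{
    "filename": "img_0001.txt", "frame_num": 0, "class_idx": 0,
    "x": 0.5, "y": 0.5, "w": 0.25, "h": 0.5,
    "img_w": 640, "img_h": 480,
}])
converted = to_xyxy_sketch(df)
corners = converted.loc[0, ["x0", "y0", "x1", "y1"]].tolist()
```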

videoanalytics.pipeline.sinks.object_detection.utils.plot_predictions_vs_ground_truth(df_gt_dets, df_pred_dets, img_path, img_name, class_idx, ax)

Given an image and a class id present in two dataframes containing bounding boxes for a set of images in x0,y0,x1,y1 format, plots the bounding boxes corresponding to the ground truth and the predictions.

Parameters
  • df_gt_dets (pandas.DataFrame) – ground truth detections as returned by videoanalytics.pipeline.sinks.object_detection.utils.load_detections_from_file_list().

  • df_pred_dets (pandas.DataFrame) – predictions.

  • img_path (str) – path of the parent directory containing the input images.

  • img_name (str) – name of the image.

  • class_idx (int) – class index to plot (only one class is supported).

  • ax (Axes) – matplotlib axes instance.

Detection performance evaluation

videoanalytics.pipeline.sinks.object_detection.evaluation.evaluate_object_detection_predictions(df_gt_dets, df_pred_dets, classes, model_name)

Evaluate the predictions returned by an object detection model.

Returns

A dataframe containing the results.
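The matching criterion used by this function is not documented here. A measure commonly used to compare a predicted box with a ground truth box is intersection over union (IoU); the sketch below computes it for boxes in x0,y0,x1,y1 format. Whether and how the evaluation function applies it is an assumption, and iou_xyxy is a hypothetical helper.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    # Corners of the overlapping rectangle, if any.
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)
overlap = iou_xyxy(ground_truth, prediction)
```

In this example the boxes share half of their area, so the intersection is 50 square pixels and the union is 150.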