Object detection

This section describes the components used for object detection.

Object detection components that are independent of the implementation are separated from specific models. Examples of the first category include visual annotators and components that save detections in CSV format or load precalculated detections. Examples of specific implementations are TensorFlow, OpenCV or other deep learning framework ports of the Darknet YOLOv4 model or any of its variants. Consuming object detection as a service through gRPC or another protocol is also an implementation alternative, which could reduce the dependencies of the videoanalytics library.

API reference

The main module contains classes and methods for tasks related to object detection.

Format conventions

There are different conventions for representing bounding boxes, among them:

Each bounding box is represented by its top left coordinates and width and height in absolute values (pixels). This is the most convenient format for extracting patches or annotating.
Each bounding box is represented by its center coordinates and width and height in normalized values (0.0-1.0). This is the format used by YOLO.
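The two conventions are related by a change of origin (center versus top-left corner) and of scale (normalized versus pixels). A minimal sketch, assuming the image width and height in pixels are known; `yolo_to_abs` and `abs_to_yolo` are hypothetical helper names, not functions of the library:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def yolo_to_abs(box: Box, img_w: int, img_h: int) -> Box:
    """(x_center, y_center, w, h) normalized to 0.0-1.0 -> (x, y, w, h) in pixels,
    where (x, y) is the top-left corner."""
    xc, yc, w, h = box
    w_px = w * img_w
    h_px = h * img_h
    return (xc * img_w - w_px / 2, yc * img_h - h_px / 2, w_px, h_px)


def abs_to_yolo(box: Box, img_w: int, img_h: int) -> Box:
    """Inverse conversion: top-left corner and size in pixels -> normalized center and size."""
    x, y, w, h = box
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


# A box covering the central quarter of a 640x480 image.
print(yolo_to_abs((0.5, 0.5, 0.5, 0.5), 640, 480))  # (160.0, 120.0, 320.0, 240.0)
```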

The utils and evaluation modules contain utilities for working with the different formats.

Detections are stored in the global context as a tuple under the entry name “DETECTIONS”, with the following components:

out_boxes: a list of boxes in absolute coordinates (top left, width, height, in pixels).
out_scores: a list of the scores (confidence) for each predicted box (0.0-1.0).
out_classes: a list of the class numeric identifier for each box (\(0,...,n_{classes}-1\))
num_boxes: the number of detected boxes (the length of the lists above).
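A sketch of how a component might write and read the “DETECTIONS” entry. It assumes the context is a plain dict and that the tuple components appear in the order listed above; whether the library uses Python lists or NumPy arrays is not stated, so plain lists are used. `store_detections` and `iter_detections` are hypothetical helpers:

```python
def store_detections(context, boxes, scores, classes, name="DETECTIONS"):
    """Store detections as (out_boxes, out_scores, out_classes, num_boxes)."""
    if not (len(boxes) == len(scores) == len(classes)):
        raise ValueError("boxes, scores and classes must have the same length")
    context[name] = (list(boxes), list(scores), list(classes), len(boxes))


def iter_detections(context, name="DETECTIONS"):
    """Yield (box, score, class_idx) triples for each stored detection."""
    out_boxes, out_scores, out_classes, num_boxes = context[name]
    for i in range(num_boxes):
        yield out_boxes[i], out_scores[i], out_classes[i]


context = {}
store_detections(context, [(10, 20, 50, 80)], [0.9], [0])
for box, score, class_idx in iter_detections(context):
    print(box, score, class_idx)  # (10, 20, 50, 80) 0.9 0
```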

Components that use the CSV format store each detection as a row, with the following columns:

frame_num: frame number, starting at the value of the “START_FRAME” variable at iteration zero and incremented at each iteration.
class_idx: numeric identifier of the detected class.
x,y: top left bounding box coordinate in pixels.
w,h: width and height of the bounding box in pixels.
score: confidence for the detection.
filename (optional): filled with the “IMG_FILENAME” variable, if present in the context.
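The row layout can be illustrated with Python's standard `csv` module. This sketch assumes the column order listed above and writes a header row; whether the library's writer emits a header is not stated. `detections_to_rows` is a hypothetical helper:

```python
import csv
import io

COLUMNS = ["frame_num", "class_idx", "x", "y", "w", "h", "score"]


def detections_to_rows(frame_num, detections, filename=None):
    """Turn a (boxes, scores, classes, num_boxes) tuple into one CSV row per detection."""
    out_boxes, out_scores, out_classes, num_boxes = detections
    rows = []
    for i in range(num_boxes):
        x, y, w, h = out_boxes[i]
        row = [frame_num, out_classes[i], x, y, w, h, out_scores[i]]
        if filename is not None:  # optional column, taken from IMG_FILENAME
            row.append(filename)
        rows.append(row)
    return rows


buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(COLUMNS + ["filename"])
dets = ([(10, 20, 50, 80)], [0.9], [0], 1)
writer.writerows(detections_to_rows(0, dets, "img_0001.jpg"))
print(buf.getvalue())
```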

class videoanalytics.pipeline.sinks.object_detection.DetectionsAnnotator(name, context, class_names_filename, context_name='DETECTIONS', show_label=True)

Annotates the detections in a frame displaying a bounding box around each identified object.

This component READS the following entries in the global context:

Variable name	Description
DETECTIONS	Output of an object detection model.
FRAME	Numpy array representing the frame.

This component WRITES the following entries in the global context:

Variable name	Description
FRAME	Numpy array representing the frame.

Parameters

name (str) – the component unique name.
context (dict) – The global context.
class_names_filename (str) – text file with class names.
show_label (bool) – display class name in bounding box.
context_name (str) – name of the variable in the context containing detections.

process(): This method is called for each active component in the pipeline.

setup(): This method is called after all components in the pipeline are instantiated.

shutdown(): This method is called after processing has finished.
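The annotation step can be sketched with NumPy alone by painting the edges of each box into the frame array. This is an illustration under stated assumptions, not the component's code: it assumes the class names file holds one name per line (as Darknet `.names` files do), boxes use the top-left/width/height pixel format, and the frame is an H×W×3 `uint8` array. Label rendering (show_label) is omitted; a real annotator would more likely use OpenCV drawing functions.

```python
import numpy as np


def load_class_names(path):
    """Read one class name per line (assumed file layout)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def draw_box(frame, box, color=(0, 255, 0), thickness=2):
    """Paint the outline of an (x, y, w, h) pixel box into frame, in place."""
    h_img, w_img = frame.shape[:2]
    x, y, w, h = (int(round(v)) for v in box)
    # Clip the box to the frame so partially visible objects do not raise errors.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, w_img), min(y + h, h_img)
    if x0 >= x1 or y0 >= y1:
        return frame  # box entirely outside the frame
    t = thickness
    frame[y0:min(y0 + t, y1), x0:x1] = color  # top edge
    frame[max(y1 - t, y0):y1, x0:x1] = color  # bottom edge
    frame[y0:y1, x0:min(x0 + t, x1)] = color  # left edge
    frame[y0:y1, max(x1 - t, x0):x1] = color  # right edge
    return frame


def annotate(frame, detections):
    """Draw every box of a (boxes, scores, classes, num_boxes) tuple."""
    out_boxes, _, _, num_boxes = detections
    for i in range(num_boxes):
        draw_box(frame, out_boxes[i])
    return frame


frame = np.zeros((100, 100, 3), dtype=np.uint8)
annotate(frame, ([(10, 10, 30, 20)], [0.9], [0], 1))
```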

class videoanalytics.pipeline.sinks.object_detection.DetectionsCSVWriter(name, context, filename, context_name='DETECTIONS')

Writes the detections to a CSV file.

This component READS the following entries in the global context:

Variable name	Description
DETECTIONS	Output of an object detection model.
FRAME	Numpy array representing the frame.
START_FRAME	Initial frame index.
IMG_FILENAME (*)	Image filename (for image sequences)

(*) Optional.

Parameters

name (str) – the component unique name.
context (dict) – The global context.
filename (str) – CSV output file.
context_name (str) – name of the variable in the context containing detections.

process(): This method is called for each active component in the pipeline.

setup(): This method is called after all components in the pipeline are instantiated.

shutdown(): This method is called after processing has finished.
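The setup/process/shutdown lifecycle maps naturally onto file handling: open the file once, append rows on every frame, close it at the end. The class below is a hypothetical stand-in written for illustration, not the library's DetectionsCSVWriter; the frame counter (START_FRAME plus one per process() call) and the empty filename field when IMG_FILENAME is absent are assumptions.

```python
import csv
import os
import tempfile


class MinimalDetectionsCSVWriter:
    """Hypothetical component following the setup/process/shutdown lifecycle."""

    COLUMNS = ["frame_num", "class_idx", "x", "y", "w", "h", "score", "filename"]

    def __init__(self, name, context, filename, context_name="DETECTIONS"):
        self.name = name
        self.context = context
        self.filename = filename
        self.context_name = context_name
        self._file = None
        self._writer = None
        self._iteration = 0

    def setup(self):
        # Open the output file once, after the pipeline has been built.
        self._file = open(self.filename, "w", newline="")
        self._writer = csv.writer(self._file)
        self._writer.writerow(self.COLUMNS)

    def process(self):
        frame_num = self.context.get("START_FRAME", 0) + self._iteration
        img_filename = self.context.get("IMG_FILENAME", "")  # empty when absent
        boxes, scores, classes, num_boxes = self.context[self.context_name]
        for i in range(num_boxes):
            x, y, w, h = boxes[i]
            self._writer.writerow([frame_num, classes[i], x, y, w, h, scores[i], img_filename])
        self._iteration += 1

    def shutdown(self):
        # Flush and release the file handle when the pipeline stops.
        if self._file is not None:
            self._file.close()
            self._file = None


# Usage: two iterations starting at frame 100.
path = os.path.join(tempfile.mkdtemp(), "detections.csv")
ctx = {"START_FRAME": 100, "DETECTIONS": ([(1, 2, 3, 4)], [0.8], [5], 1)}
component = MinimalDetectionsCSVWriter("csv_writer", ctx, path)
component.setup()
component.process()
component.process()
component.shutdown()
```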

class videoanalytics.pipeline.sinks.object_detection.ObjectDetectorCSV(name, context, filename, context_name='DETECTIONS')

This component reads precomputed detections from a CSV file.

This component READS the following entries in the global context:

Variable name	Description
START_FRAME	Initial frame index.

This component WRITES the following entries in the global context:

Variable name	Description
DETECTIONS	Output of an object detection model.

Parameters

name (str) – the component unique name.
context (dict) – The global context.
filename (str) – CSV file with the precomputed detections.
context_name (str) – name of the variable in the context containing detections.

process(): This method is called for each active component in the pipeline.

setup(): This method is called after all components in the pipeline are instantiated.

shutdown(): This method is called after processing has finished.
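Reading precomputed detections amounts to grouping the CSV rows by frame_num and rebuilding the DETECTIONS tuple for the current frame. A sketch with the standard `csv` module, assuming the file has a header with the column names described in Format conventions; `load_detections_by_frame` and `detections_for_frame` are hypothetical names:

```python
import csv
import io
from collections import defaultdict


def load_detections_by_frame(f):
    """Group CSV rows into {frame_num: (boxes, scores, classes, num_boxes)}."""
    grouped = defaultdict(lambda: ([], [], []))
    for row in csv.DictReader(f):
        boxes, scores, classes = grouped[int(row["frame_num"])]
        boxes.append(tuple(float(row[k]) for k in ("x", "y", "w", "h")))
        scores.append(float(row["score"]))
        classes.append(int(row["class_idx"]))
    return {k: (b, s, c, len(b)) for k, (b, s, c) in grouped.items()}


def detections_for_frame(by_frame, frame_num):
    """Frames without detections get an empty tuple instead of a KeyError."""
    return by_frame.get(frame_num, ([], [], [], 0))


data = io.StringIO(
    "frame_num,class_idx,x,y,w,h,score\n"
    "0,2,10,20,30,40,0.9\n"
    "0,0,5,5,10,10,0.6\n"
    "2,1,1,1,2,2,0.5\n"
)
by_frame = load_detections_by_frame(data)
print(detections_for_frame(by_frame, 1))  # ([], [], [], 0)
```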

YOLOv4 implementation in TensorFlow

This module contains a YOLOv4 object detector implementation in TensorFlow.

class videoanalytics.pipeline.sinks.object_detection.yolo4.YOLOv4DetectorTF(name, context, weights_filename, allowed_classes=None, yolo_input_size=416, yolo_max_output_size_per_class=50, yolo_max_total_size=50, yolo_iou_threshold=0.45, yolo_score_threshold=0.4, context_name='DETECTIONS')

YOLOv4 object detector implemented in TensorFlow.

This component READS the following entries in the global context:

Variable name	Description
FRAME	Numpy array representing the frame.

This component UPDATES the following entries in the global context:

Variable name	Description
DETECTIONS	List holding numpy array with bounding boxes.

Parameters

name (str) – the component unique name.
context (dict) – The global context.
weights_filename (str) – model weights filename.
allowed_classes (list) – set of allowed classes. Use this option to restrict the detections to the subset of classes relevant to the application domain. If None, all classes are allowed.
yolo_input_size (int) – size in pixels of the square network input. The input image is resized to this size using OpenCV.
yolo_max_output_size_per_class (int) – maximum number of detections per class.
yolo_max_total_size (int) – maximum total number of detections.
yolo_iou_threshold (float) – IoU threshold for non-maximum suppression: a box whose overlap (IoU) with a higher-scoring box of the same class exceeds this value is discarded.
yolo_score_threshold (float) – minimum score for a detection to be accepted.
context_name (str) – variable name used for storing detections in the context.

process(): This method is called for each active component in the pipeline.

setup(): This method is called after all components in the pipeline are instantiated.

shutdown(): This method is called after processing has finished.
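The threshold parameters correspond to a common YOLO post-processing scheme: drop low-score boxes, optionally keep only allowed classes, then apply per-class non-maximum suppression (NMS) with caps per class and in total. The plain-Python sketch below illustrates that technique on top-left/width/height boxes; it is not the component's TensorFlow code, the function names are hypothetical, and it assumes allowed_classes holds class indexes rather than names.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax1, ay1 = a[0] + a[2], a[1] + a[3]
    bx1, by1 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax1, bx1) - max(a[0], b[0]))
    ih = max(0.0, min(ay1, by1) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, classes, score_threshold=0.4, iou_threshold=0.45,
                max_per_class=50, max_total=50, allowed_classes=None):
    """Score filter + greedy per-class NMS; returns (boxes, scores, classes, num_boxes)."""
    candidates = [
        (s, b, c) for b, s, c in zip(boxes, scores, classes)
        if s >= score_threshold and (allowed_classes is None or c in allowed_classes)
    ]
    candidates.sort(key=lambda t: t[0], reverse=True)  # highest score first
    kept = []
    per_class = {}
    for s, b, c in candidates:
        if per_class.get(c, 0) >= max_per_class:
            continue
        # Suppress boxes that overlap an already kept box of the same class too much.
        if any(kc == c and iou(b, kb) > iou_threshold for _, kb, kc in kept):
            continue
        kept.append((s, b, c))
        per_class[c] = per_class.get(c, 0) + 1
        if len(kept) == max_total:
            break
    return ([b for _, b, _ in kept], [s for s, _, _ in kept],
            [c for _, _, c in kept], len(kept))


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
print(postprocess(boxes, [0.9, 0.8, 0.3], [0, 0, 0]))  # ([(0, 0, 10, 10)], [0.9], [0], 1)
```

The second box is suppressed because its IoU with the first (81/119, about 0.68) exceeds 0.45, and the third is dropped by the score threshold.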

Utilities

videoanalytics.pipeline.sinks.object_detection.utils.convert_detections(df, box_format='xyxy')

Convert a dataframe in YOLO normalized format to absolute coordinates (x0,y0,x1,y1).

Parameters: df (pandas.DataFrame) – input dataframe with columns x,y,w,h.

Returns: A dataframe with columns x0,y0,x1,y1.
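Converting normalized boxes to pixels requires the image size. The sketch below is a hypothetical pandas equivalent, not the library function; it assumes the dataframe also carries img_w and img_h columns, as produced by load_detections_from_file_list.

```python
import pandas as pd


def yolo_df_to_xyxy(df: pd.DataFrame) -> pd.DataFrame:
    """Normalized center x,y,w,h -> pixel corners x0,y0,x1,y1 (hypothetical helper).

    Assumes img_w and img_h columns hold the image size in pixels.
    """
    out = df.copy()
    w_px = out["w"] * out["img_w"]
    h_px = out["h"] * out["img_h"]
    out["x0"] = out["x"] * out["img_w"] - w_px / 2
    out["y0"] = out["y"] * out["img_h"] - h_px / 2
    out["x1"] = out["x0"] + w_px
    out["y1"] = out["y0"] + h_px
    return out


df = pd.DataFrame({"x": [0.5], "y": [0.5], "w": [0.25], "h": [0.5],
                   "img_w": [640], "img_h": [480]})
print(yolo_df_to_xyxy(df)[["x0", "y0", "x1", "y1"]])
```

Working on whole columns keeps the conversion vectorized, so it scales to dataframes holding the boxes of many images at once.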

videoanalytics.pipeline.sinks.object_detection.utils.load_detections_from_file_list(det_list, box_format='xyxy')

Given a list of files, constructs a dataframe with the bounding boxes for each image. The list typically contains the text annotation files of the test directory (YOLO normalized format is assumed).

Parameters

det_list (list) – list of filenames.
box_format (str) – Currently only “xyxy” is supported.

Returns

A dataframe (see format below).

The returned dataframe contains the following columns:

filename: name of the file.
frame_num: set to the index of the file in the list (this field is reserved for videos).
class_idx: class index.
x,y: bounding box center coordinates (normalized)
w,h: bounding box dimensions (normalized).
img_w,img_h: image width and height in pixels.

If box_format is ‘xyxy’, the following columns are transformed or added:

x,y,w,h: will be transformed to pixels
x0,y0,x1,y1: will be set to the bounding box top-left, bottom-right coordinates in pixels.
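A YOLO text annotation usually holds one line per object, `class_idx x_center y_center width height`, with normalized coordinates. The sketch below parses the content of one such file into rows with the columns listed above, applying the ‘xyxy’ transformation. It assumes the image size is already known (reading it from the image file is omitted) and that x,y remain center coordinates once converted to pixels; `parse_yolo_annotation` is a hypothetical name.

```python
def parse_yolo_annotation(text, filename, frame_num, img_w, img_h):
    """Parse 'class x y w h' lines (normalized) into row dicts with pixel
    coordinates and x0,y0,x1,y1 corners ('xyxy' box format)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        class_idx = int(parts[0])
        x, y, w, h = (float(v) for v in parts[1:5])
        # Center and size converted from normalized units to pixels.
        x, y, w, h = x * img_w, y * img_h, w * img_w, h * img_h
        rows.append({
            "filename": filename, "frame_num": frame_num, "class_idx": class_idx,
            "x": x, "y": y, "w": w, "h": h, "img_w": img_w, "img_h": img_h,
            "x0": x - w / 2, "y0": y - h / 2, "x1": x + w / 2, "y1": y + h / 2,
        })
    return rows


rows = parse_yolo_annotation("0 0.5 0.5 0.25 0.5\n", "img_0001.txt", 0, 640, 480)
print(rows[0]["x0"], rows[0]["y0"], rows[0]["x1"], rows[0]["y1"])  # 240.0 120.0 400.0 360.0
```

Collecting the rows of every file (with frame_num set to the file's index in the list) and passing them to `pandas.DataFrame` would give a dataframe with the layout described above.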

videoanalytics.pipeline.sinks.object_detection.utils.plot_predictions_vs_ground_truth(df_gt_dets, df_pred_dets, img_path, img_name, class_idx, ax)

Given two dataframes containing bounding boxes in x0,y0,x1,y1 format for a set of images, one with the ground truth and one with the predictions, plot the ground truth and predicted bounding boxes of the given class in the given image.

Parameters

df_gt_dets (pandas.DataFrame) – ground truth detections as returned by videoanalytics.pipeline.sinks.object_detection.utils.load_detections_from_file_list().
df_pred_dets (pandas.DataFrame) – predictions.
img_path (str) – path of the parent directory containing the input images.
img_name (str) – name of the image.
class_idx (int) – class index to plot (only one class is supported).
ax (Axes) – matplotlib axes instance.
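The core of such a plot is selecting the rows of one image and one class and drawing each box as an unfilled rectangle. A sketch with matplotlib's `Rectangle` patch; `plot_boxes` is a hypothetical helper, it assumes a filename column identifies the image, and loading and showing the image itself (for example with `ax.imshow`) is omitted so the example runs without files.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Rectangle


def plot_boxes(ax, df, img_name, class_idx, color, label):
    """Draw the x0,y0,x1,y1 boxes of one class in one image; return how many were drawn."""
    sel = df[(df["filename"] == img_name) & (df["class_idx"] == class_idx)]
    for i, r in enumerate(sel.itertuples()):
        ax.add_patch(Rectangle((r.x0, r.y0), r.x1 - r.x0, r.y1 - r.y0,
                               fill=False, edgecolor=color,
                               label=label if i == 0 else "_nolegend_"))
    return len(sel)


gt = pd.DataFrame({"filename": ["a.jpg"], "class_idx": [0],
                   "x0": [10], "y0": [10], "x1": [60], "y1": [40]})
pred = pd.DataFrame({"filename": ["a.jpg", "a.jpg"], "class_idx": [0, 1],
                     "x0": [12, 0], "y0": [8, 0], "x1": [58, 5], "y1": [42, 5]})
fig, ax = plt.subplots()
ax.set_xlim(0, 100)
ax.set_ylim(100, 0)  # image coordinates: y grows downwards
n_gt = plot_boxes(ax, gt, "a.jpg", 0, "green", "ground truth")
n_pred = plot_boxes(ax, pred, "a.jpg", 0, "red", "prediction")
ax.legend()
```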

Detection performance evaluation

videoanalytics.pipeline.sinks.object_detection.evaluation.evaluate_object_detection_predictions(df_gt_dets, df_pred_dets, classes, model_name)

Evaluate the predictions returned by an object detection model.

Parameters

df_gt_dets (pandas.DataFrame) – ground truth detections as returned by videoanalytics.pipeline.sinks.object_detection.utils.load_detections_from_file_list().
df_pred_dets (pandas.DataFrame) – predictions.
classes (list) – list of class indexes to evaluate.
model_name (str) – model name to use in the returned dataframe.

Returns

A dataframe containing the results.
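The metrics contained in the returned dataframe are not specified here. As a hedged illustration of the usual basis for detection metrics, the sketch below greedily matches predictions to ground-truth boxes of one class in one image by IoU (a 0.5 threshold is assumed) and derives precision and recall. It is not the library's evaluation code, and it does not compute average precision.

```python
def iou_xyxy(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_counts(gt_boxes, pred_boxes, pred_scores, iou_threshold=0.5):
    """Return (tp, fp, fn) for one image and one class, boxes as x0,y0,x1,y1."""
    unmatched = list(range(len(gt_boxes)))
    tp = fp = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for p in sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True):
        best, best_iou = None, iou_threshold
        for g in unmatched:
            v = iou_xyxy(pred_boxes[p], gt_boxes[g])
            if v >= best_iou:
                best, best_iou = g, v
        if best is None:
            fp += 1  # no ground-truth box left with enough overlap
        else:
            unmatched.remove(best)
            tp += 1
    return tp, fp, len(unmatched)


tp, fp, fn = match_counts(
    gt_boxes=[(10, 10, 60, 40), (100, 100, 150, 150)],
    pred_boxes=[(12, 8, 58, 42), (0, 0, 5, 5)],
    pred_scores=[0.9, 0.4],
)
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
print(tp, fp, fn, precision, recall)  # 1 1 1 0.5 0.5
```

Summing the counts over all images of a class gives per-class precision and recall; repeating the matching over a sweep of score thresholds is the usual route to average precision.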