Object detection pipeline¶
This example shows a basic pipeline that performs object detection on a video fragment, then proposes a method for evaluating different detectors.
Contents:
A minimal object detection pipeline.
A pipeline for evaluating model performance (mAP) and execution time.
Model selection.
[1]:
%load_ext autoreload
%autoreload 2
[2]:
from videoanalytics.pipeline import Pipeline
from videoanalytics.pipeline.sources import VideoReader
from videoanalytics.pipeline.sinks import VideoWriter
A minimal object detection pipeline¶
Configuration¶
Input and output video¶
We use the same video as in the previous examples. Note: the video used in this example was downloaded from YouTube.
[3]:
DATA_PATH = "../data/"
# Input
INPUT_VIDEO = DATA_PATH+"/input/test_video.mp4"
START_FRAME = 0
MAX_FRAMES = 100
[4]:
%%HTML
<div style="text-align: center">
<video width="600" height="400" controls>
<source src="../data/input/test_video.mp4" type="video/mp4">
</video>
</div>
[5]:
# Output
OUTPUT_VIDEO = DATA_PATH+ "/output/test_output.avi"
Configuration of object detection components¶
Object detection needs a trained model. This example uses a TensorFlow implementation of YOLOv4, whose weights must be provided as a checkpoint or frozen graph.
To visualize bounding boxes, a text file with the class names is also provided. Detected bounding boxes are also stored in a CSV file for later analysis or for use with other algorithms such as SORT.
[6]:
# Specific components for object detection
from videoanalytics.pipeline.sinks.object_detection import DetectionsAnnotator, DetectionsCSVWriter
from videoanalytics.pipeline.sinks.object_detection.yolo4 import YOLOv4DetectorTF
[7]:
# Detector
# Object Detector model weights (Tensorflow)
DETECTOR_WEIGHTS_FILENAME = DATA_PATH+ "object_detection/checkpoints/yolov4-416-tf"
#DETECTOR_WEIGHTS_FILENAME = DATA_PATH+ "object_detection/checkpoints/yolov4-tiny-416"
# Classes names for Detections Annotator
DETECTOR_CLASSES_FILENAME = DATA_PATH+"object_detection/classes_definitions/coco.txt"
# CSV with Detections filename
DETECTIONS_FILENAME = DATA_PATH+"/output/detections.csv"
Pipeline instantiation and execution¶
[8]:
# 1. Create the global context
context = {}
# 2. Create the pipeline
pipeline = Pipeline()
# 3. Add components
# 3.1 Source
pipeline.add_component( VideoReader( "input",context,
video_path=INPUT_VIDEO,
start_frame=START_FRAME,
max_frames=MAX_FRAMES))
[9]:
# 3.2 Detector
pipeline.add_component( YOLOv4DetectorTF("detector",context,weights_filename=DETECTOR_WEIGHTS_FILENAME) )
[10]:
# 3.3 Save detections to CSV
pipeline.add_component( DetectionsCSVWriter("det_csv_writer",context,filename=DETECTIONS_FILENAME) )
[11]:
# 3.4 Annotate detections in output video
pipeline.add_component( DetectionsAnnotator("annotator",context,
class_names_filename=DETECTOR_CLASSES_FILENAME,
show_label=True) )
[12]:
# 3.5 Sink
pipeline.add_component(VideoWriter("writer",context,filename=OUTPUT_VIDEO))
[13]:
# 4. Define connections
pipeline.set_connections([
("input", "detector"),
("detector", "det_csv_writer"),
("detector", "annotator"),
("annotator", "writer")
])
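The connection list above is a set of directed edges between named components. As a rough illustration of how a valid execution order can be derived from such a list, the sketch below sorts the same edges topologically with Python's standard `graphlib` module. This is only an assumption about the general idea; it is not necessarily how `Pipeline` schedules its components internally.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Same edges as passed to pipeline.set_connections(...)
connections = [
    ("input", "detector"),
    ("detector", "det_csv_writer"),
    ("detector", "annotator"),
    ("annotator", "writer"),
]

# TopologicalSorter expects a mapping: node -> set of predecessors
predecessors = {}
for src, dst in connections:
    predecessors.setdefault(src, set())
    predecessors.setdefault(dst, set()).add(src)

# Any order returned here runs every component after its upstream components
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

A cycle in the connection list would make `static_order()` raise `graphlib.CycleError`, which is a convenient sanity check before running a pipeline.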
[14]:
import matplotlib.pyplot as plt
fig,axes = plt.subplots(1,1,figsize=(22,8))
pipeline.plot(ax=axes)
[15]:
# 5. Execute
pipeline.execute()
print("Total execution time [s]:", pipeline.get_total_execution_time())
Total execution time [s]: 51.311861097000474
Evaluation of execution time¶
Most of the execution time is spent in the object detector. In the setup being tested, the detector averages about 0.5 seconds per frame (~2 FPS), which makes it unsuitable for real-time use.
The following section proposes a different scheme to evaluate detection performance. The final section considers both detection performance and inference time to help configure an optimal pipeline for a given scenario.
[16]:
import pandas as pd
# 6. Report (optional)
metrics_df = pd.DataFrame.from_dict(pipeline.get_metrics(), orient='index',columns=["time [s]"])
metrics_df
[16]:
| | time [s] |
|---|---|
| input_avg_dt | 0.006368 |
| detector_avg_dt | 0.482754 |
| annotator_avg_dt | 0.000493 |
| writer_avg_dt | 0.021735 |
| det_csv_writer_avg_dt | 0.000040 |
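The per-component averages in the table can be turned into an approximate throughput and a share of the per-frame time. The sketch below copies the values from the table and assumes that the components run one after another for each frame, so that their averages simply add up; the variable names are illustrative only.

```python
# Average time per frame for each component, copied from the table above [s]
avg_dt = {
    "input_avg_dt": 0.006368,
    "detector_avg_dt": 0.482754,
    "annotator_avg_dt": 0.000493,
    "writer_avg_dt": 0.021735,
    "det_csv_writer_avg_dt": 0.000040,
}

# Assumption: components run sequentially, so per-frame time is the sum
frame_time = sum(avg_dt.values())
fps = 1.0 / frame_time

# Fraction of the per-frame time spent in each component
share = {name: dt / frame_time for name, dt in avg_dt.items()}

print("Approximate throughput [FPS]: {:.2f}".format(fps))
for name, fraction in sorted(share.items(), key=lambda kv: kv[1], reverse=True):
    print("{:<24s} {:6.2%}".format(name, fraction))
```

Under that assumption the detector accounts for well over 90% of the time per frame, so swapping the detector is the only change likely to move the pipeline toward real-time rates.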
[17]:
!lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping: 3
CPU MHz: 2606.753
CPU max MHz: 3500,0000
CPU min MHz: 800,0000
BogoMIPS: 5199.98
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
Exploration of results¶
Display the output video with annotated bounding boxes.
Note: playback of the XVID format is currently not supported in Jupyter.
[18]:
%%HTML
<div style="text-align: center">
<video width="600" height="400" controls>
<source src="../data/output/test_output.avi" type="video/mp4">
</video>
</div>
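By default DetectionsCSVWriter stores each box as (x,y,w,h). In the output below the values are pixel coordinates and (x,y) is the top-left corner of the box rather than a normalized center (xc,yc): converting to corners gives x1 = x + w and y1 = y + h.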
[19]:
df_pred_dets = pd.read_csv(DETECTIONS_FILENAME,names=["frame_num","class_idx", "x","y","w","h","score","filename"])
df_pred_dets.head(5)
[19]:
| | frame_num | class_idx | x | y | w | h | score | filename |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 860.0 | 361.0 | 140.0 | 137.0 | 0.742915 | NaN |
| 1 | 0 | 0 | 696.0 | 387.0 | 111.0 | 104.0 | 0.620522 | NaN |
| 2 | 1 | 0 | 860.0 | 361.0 | 140.0 | 137.0 | 0.742915 | NaN |
| 3 | 1 | 0 | 696.0 | 387.0 | 111.0 | 104.0 | 0.620522 | NaN |
| 4 | 2 | 0 | 862.0 | 361.0 | 137.0 | 137.0 | 0.657860 | NaN |
We can convert this to (x0,y0,x1,y1) format, which is more convenient for visualizing and interpreting results, with convert_detections(box_format="xyxy").
[20]:
from videoanalytics.pipeline.sinks.object_detection.utils import convert_detections
df_pred_dets = convert_detections(df_pred_dets,box_format="xyxy")
# Keep relevant columns only
df_pred_dets = df_pred_dets[['frame_num','class_idx','x0','y0','x1','y1','score']]
df_pred_dets.head(5)
[20]:
| | frame_num | class_idx | x0 | y0 | x1 | y1 | score |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 860.0 | 361.0 | 1000.0 | 498.0 | 0.742915 |
| 1 | 0 | 0 | 696.0 | 387.0 | 807.0 | 491.0 | 0.620522 |
| 2 | 1 | 0 | 860.0 | 361.0 | 1000.0 | 498.0 | 0.742915 |
| 3 | 1 | 0 | 696.0 | 387.0 | 807.0 | 491.0 | 0.620522 |
| 4 | 2 | 0 | 862.0 | 361.0 | 999.0 | 498.0 | 0.657860 |
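To make the box conversion concrete, here is a minimal sketch of the arithmetic on a single box. The two helper functions are hypothetical stand-ins written for this illustration; they are not the implementation of `convert_detections`. The first matches the numbers in the tables above (corner plus size), the second shows the center-based variant for comparison.

```python
def xywh_to_xyxy(box):
    """(x, y, w, h) with (x, y) the top-left corner -> (x0, y0, x1, y1)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def xcycwh_to_xyxy(box):
    """(xc, yc, w, h) with (xc, yc) the box center -> (x0, y0, x1, y1)."""
    xc, yc, w, h = box
    return (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)


# First detection from the CSV preview above
first_detection = (860.0, 361.0, 140.0, 137.0)
print(xywh_to_xyxy(first_detection))    # (860.0, 361.0, 1000.0, 498.0)

# The same numbers read as a center-based box give different corners
print(xcycwh_to_xyxy(first_detection))  # (790.0, 292.5, 930.0, 429.5)
```

Only the corner-based reading reproduces the first row of the converted table, which is a quick way to check which convention a detections file follows.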
A pipeline for evaluation of results (frame-by-frame evaluation)¶
The previous example showed how to obtain detections from a video and the average inference time per frame, but it did not provide a standard metric for evaluating the model's detection accuracy.
Although the task is to identify objects in a video, and the relationship between contiguous frames could improve detection quality, the object detector works on individual frames. This example therefore modifies the previous pipeline to read images from a directory, where the expected (ground truth) detections for each image are provided in a separate text file.
[21]:
from glob import glob
import os
# Each .jpg in test directory has a matching .txt file with the annotations
TEST_IMG_SEQ_PATH = DATA_PATH+"/input/test_img_seq"
glob(TEST_IMG_SEQ_PATH+"/img*.*")[:5]
[21]:
['../data//input/test_img_seq/img8243.jpg',
'../data//input/test_img_seq/img7463.txt',
'../data//input/test_img_seq/img16689.txt',
'../data//input/test_img_seq/img1603.jpg',
'../data//input/test_img_seq/img16322.txt']
[22]:
img_list = glob(TEST_IMG_SEQ_PATH+"/img*.jpg")
img_list
[22]:
['../data//input/test_img_seq/img8243.jpg',
'../data//input/test_img_seq/img1603.jpg',
'../data//input/test_img_seq/img10956.jpg',
'../data//input/test_img_seq/img11071.jpg',
'../data//input/test_img_seq/img365.jpg',
'../data//input/test_img_seq/img2334.jpg',
'../data//input/test_img_seq/img8722.jpg',
'../data//input/test_img_seq/img16322.jpg',
'../data//input/test_img_seq/img12419.jpg',
'../data//input/test_img_seq/img8240.jpg',
'../data//input/test_img_seq/img7463.jpg',
'../data//input/test_img_seq/img9859.jpg',
'../data//input/test_img_seq/img16689.jpg',
'../data//input/test_img_seq/img5320.jpg',
'../data//input/test_img_seq/img10989.jpg',
'../data//input/test_img_seq/img10606.jpg',
'../data//input/test_img_seq/img15269.jpg',
'../data//input/test_img_seq/img2264.jpg',
'../data//input/test_img_seq/img12115.jpg']
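Since the evaluation relies on every image having a matching annotation file, it can be worth checking the pairing before running the pipeline. The helpers below are a small sketch written for this purpose (their names are hypothetical and not part of videoanalytics); the `exists` argument is injectable so the example runs without touching the file system.

```python
import os


def annotation_path(img_path):
    """Return the path of the .txt annotation file next to an image."""
    return os.path.splitext(img_path)[0] + ".txt"


def images_missing_annotations(img_paths, exists=os.path.exists):
    """Return the images whose annotation file cannot be found."""
    return [p for p in img_paths if not exists(annotation_path(p))]


# Self-contained demonstration with a fake directory listing
fake_files = {
    "../data//input/test_img_seq/img8243.jpg",
    "../data//input/test_img_seq/img8243.txt",
    "../data//input/test_img_seq/img1603.jpg",
}
fake_images = sorted(f for f in fake_files if f.endswith(".jpg"))
missing = images_missing_annotations(fake_images, exists=fake_files.__contains__)
print(missing)  # ['../data//input/test_img_seq/img1603.jpg']
```

In the notebook itself, `images_missing_annotations(img_list)` would be expected to return an empty list.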
Pipeline instantiation and execution¶
This step is similar to the other examples; the only difference is that the input and output are sequences of images.
ImageSequenceReader and ImageWriter are used instead of VideoReader and VideoWriter.
[23]:
from videoanalytics.pipeline.sources import ImageSequenceReader
from videoanalytics.pipeline.sinks import ImageWriter
[24]:
# 1. Create the global context
context = {}
# 2. Create the pipeline
pipeline = Pipeline()
# 3. Add components
# 3.1 Source
pipeline.add_component( ImageSequenceReader( "input",context, img_seq=img_list))
# 3.2 Detector
pipeline.add_component( YOLOv4DetectorTF("detector",context,weights_filename=DETECTOR_WEIGHTS_FILENAME) )
# 3.3 Save detections to CSV
pipeline.add_component( DetectionsCSVWriter("det_csv_writer",context,filename=DETECTIONS_FILENAME) )
# 3.4 Annotate detections in output images
pipeline.add_component( DetectionsAnnotator("annotator",context,
class_names_filename=DETECTOR_CLASSES_FILENAME,
show_label=True) )
# 3.5 Sink
pipeline.add_component(ImageWriter("writer",context,output_path= DATA_PATH+"/output/"))
[25]:
# 4. Define connections
pipeline.set_connections([
("input", "detector"),
("detector", "det_csv_writer"),
("detector", "annotator"),
("annotator", "writer")
])
[26]:
pipeline.optimize()
[27]:
import matplotlib.pyplot as plt
fig,axes = plt.subplots(1,1,figsize=(22,8))
pipeline.plot(ax=axes)
[28]:
pipeline.execute()
print("Total execution time [s]:", pipeline.get_total_execution_time())
Total execution time [s]: 11.51667647999966
Evaluation of results¶
Steps:
Build a dataframe with the bounding boxes for the ground truth.
Build a dataframe with the bounding boxes for the predictions.
Evaluate using different variants of mAP.
Build a dataframe with the bounding boxes for the ground truth.
[29]:
det_list = [os.path.splitext(filename)[0]+'.txt' for filename in img_list]
det_list
[29]:
['../data//input/test_img_seq/img8243.txt',
'../data//input/test_img_seq/img1603.txt',
'../data//input/test_img_seq/img10956.txt',
'../data//input/test_img_seq/img11071.txt',
'../data//input/test_img_seq/img365.txt',
'../data//input/test_img_seq/img2334.txt',
'../data//input/test_img_seq/img8722.txt',
'../data//input/test_img_seq/img16322.txt',
'../data//input/test_img_seq/img12419.txt',
'../data//input/test_img_seq/img8240.txt',
'../data//input/test_img_seq/img7463.txt',
'../data//input/test_img_seq/img9859.txt',
'../data//input/test_img_seq/img16689.txt',
'../data//input/test_img_seq/img5320.txt',
'../data//input/test_img_seq/img10989.txt',
'../data//input/test_img_seq/img10606.txt',
'../data//input/test_img_seq/img15269.txt',
'../data//input/test_img_seq/img2264.txt',
'../data//input/test_img_seq/img12115.txt']
[30]:
from videoanalytics.pipeline.sinks.object_detection.utils import load_detections_from_file_list
df_gt_dets = load_detections_from_file_list(det_list)
df_gt_dets.head(5)
[30]:
| | filename | frame_num | class_idx | x | y | w | h | img_w | img_h | x0 | x1 | y0 | y1 | difficult | crowd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | img8243 | 0 | 0 | 1287.99936 | 804.99960 | 338.00064 | 525.99996 | 1920 | 1080 | 1118.99904 | 1456.99968 | 541.99962 | 1067.99958 | 0 | 0 |
| 1 | img8243 | 0 | 0 | 612.00000 | 460.00008 | 110.00064 | 216.00000 | 1920 | 1080 | 556.99968 | 667.00032 | 352.00008 | 568.00008 | 0 | 0 |
| 2 | img8243 | 0 | 0 | 706.99968 | 426.99960 | 120.00000 | 185.99976 | 1920 | 1080 | 646.99968 | 766.99968 | 333.99972 | 519.99948 | 0 | 0 |
| 3 | img8243 | 0 | 0 | 1134.00000 | 456.99984 | 150.00000 | 253.99980 | 1920 | 1080 | 1059.00000 | 1209.00000 | 329.99994 | 583.99974 | 0 | 0 |
| 4 | img8243 | 0 | 1 | 1651.49952 | 737.49960 | 301.00032 | 388.99980 | 1920 | 1080 | 1500.99936 | 1801.99968 | 542.99970 | 931.99950 | 0 | 0 |
Build a dataframe with the bounding boxes for the predictions.
[31]:
df_pred_dets = pd.read_csv(DETECTIONS_FILENAME,names=["frame_num","class_idx", "x","y","w","h","score","filename"])
df_pred_dets = convert_detections(df_pred_dets)
df_pred_dets.head(5)
[31]:
| | frame_num | class_idx | x | y | w | h | score | filename | x0 | y0 | x1 | y1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1125.0 | 559.0 | 305.0 | 483.0 | 0.718696 | img8243 | 1125.0 | 559.0 | 1430.0 | 1042.0 |
| 1 | 0 | 0 | 641.0 | 341.0 | 130.0 | 127.0 | 0.672495 | img8243 | 641.0 | 341.0 | 771.0 | 468.0 |
| 2 | 1 | 0 | 1191.0 | 365.0 | 90.0 | 134.0 | 0.589164 | img1603 | 1191.0 | 365.0 | 1281.0 | 499.0 |
| 3 | 2 | 0 | 0.0 | 537.0 | 362.0 | 481.0 | 0.862434 | img10956 | 0.0 | 537.0 | 362.0 | 1018.0 |
| 4 | 2 | 0 | 1202.0 | 800.0 | 625.0 | 280.0 | 0.629002 | img10956 | 1202.0 | 800.0 | 1827.0 | 1080.0 |
Test with a sample image for class 0 (person).
[32]:
from videoanalytics.pipeline.sinks.object_detection.utils import plot_predictions_vs_ground_truth
import matplotlib.pyplot as plt
fig,axes=plt.subplots(1,1,figsize=(22,12))
plot_predictions_vs_ground_truth(df_gt_dets,df_pred_dets,
img_path='../data//input/test_img_seq/',
img_name='img8243',
class_idx=0,
ax=axes);
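Whether a predicted box counts as a match for a ground-truth box is commonly decided with intersection over union (IoU). The function below is a generic sketch of that computation for boxes in (x0,y0,x1,y1) format; it is an illustration only and may differ from what the evaluation code in videoanalytics does. The two boxes are the first ground-truth row and the first prediction for img8243 from the tables above.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b

    # Width and height of the overlap; zero when the boxes do not intersect
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h

    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (1118.99904, 541.99962, 1456.99968, 1067.99958)
prediction = (1125.0, 559.0, 1430.0, 1042.0)
print("IoU: {:.3f}".format(iou_xyxy(ground_truth, prediction)))
```

With the usual PASCAL VOC threshold of 0.5, an IoU of this size would count the prediction as a true positive.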
Evaluate using different variants of mAP.
[33]:
from videoanalytics.pipeline.sinks.object_detection.evaluation import evaluate_object_detection_predictions
df_od_metrics = evaluate_object_detection_predictions(
df_gt_dets, df_pred_dets, classes = [0],model_name="yolov4")
df_od_metrics
[33]:
| | VOC PASCAL | VOC PASCAL (all points) | COCO |
|---|---|---|---|
| yolov4 | 0.545455 | 0.5 | 0.254455 |
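The column names suggest the usual average precision (AP) variants: PASCAL VOC 11-point interpolation, PASCAL VOC all-point interpolation, and the COCO convention of averaging over IoU thresholds from 0.50 to 0.95. The sketch below implements the first two from a list of detections sorted by descending score; it is a generic illustration under those assumptions, and `evaluate_object_detection_predictions` may compute them differently.

```python
def precision_recall(matches, num_gt):
    """Cumulative precision and recall.

    matches: booleans sorted by descending score (True = true positive).
    num_gt:  number of ground-truth boxes for the class.
    """
    precisions, recalls = [], []
    true_positives = 0
    for rank, is_tp in enumerate(matches, start=1):
        true_positives += int(is_tp)
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_gt)
    return precisions, recalls


def ap_11_point(precisions, recalls):
    """Mean of the best precision at recall >= 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for k in range(11):
        threshold = k / 10
        reachable = [p for p, r in zip(precisions, recalls) if r >= threshold - 1e-9]
        total += max(reachable) if reachable else 0.0
    return total / 11


def ap_all_points(precisions, recalls):
    """Area under the monotonically decreasing precision envelope."""
    area, previous_recall = 0.0, 0.0
    for i, recall in enumerate(recalls):
        if recall > previous_recall:
            area += (recall - previous_recall) * max(precisions[i:])
            previous_recall = recall
    return area


# Toy case: 4 ground-truth boxes, 2 detections, both correct
precisions, recalls = precision_recall([True, True], num_gt=4)
print(round(ap_11_point(precisions, recalls), 6))    # 0.545455
print(round(ap_all_points(precisions, recalls), 6))  # 0.5
```

A detector that finds half of the ground-truth boxes without any false positives scores 6/11 under 11-point interpolation and 0.5 under all-point interpolation. That happens to be the same pair of values as in the table, although the notebook does not state how its numbers arose.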
Model selection¶
The following example defines a function that instantiates a minimal object detection pipeline given the weights and classes files. The function is then used to test a set of models and collect their metrics of interest (the mAPs and the average per-frame and total execution times).
[34]:
def make_pipeline(context, eval_img_list, weights_filename,detections_csv_filename,detector_classes_filename,
                  save_imgs=False,output_img_path=None):
    pipeline = Pipeline()
    connections = [
        ("input", "detector"),
        ("detector", "det_csv_writer")
    ]
    pipeline.add_component( ImageSequenceReader( "input",context, img_seq=eval_img_list))
    pipeline.add_component( YOLOv4DetectorTF("detector",context,weights_filename=weights_filename) )
    pipeline.add_component( DetectionsCSVWriter("det_csv_writer",context,filename=detections_csv_filename) )
    if save_imgs:
        pipeline.add_component( DetectionsAnnotator("annotator",context,
                                                    class_names_filename=detector_classes_filename,
                                                    show_label=True) )
        pipeline.add_component(ImageWriter("writer",context,output_path=output_img_path))
        connections+=[
            ("detector", "annotator"),
            ("annotator", "writer")
        ]
    pipeline.set_connections(connections)
    return pipeline
Reload the ground truth dataframe as in the previous example.
[35]:
# Ground truth
det_list = [os.path.splitext(filename)[0]+'.txt' for filename in img_list]
df_gt_dets = load_detections_from_file_list(det_list)
df_gt_dets.head(5)
[35]:
| | filename | frame_num | class_idx | x | y | w | h | img_w | img_h | x0 | x1 | y0 | y1 | difficult | crowd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | img8243 | 0 | 0 | 1287.99936 | 804.99960 | 338.00064 | 525.99996 | 1920 | 1080 | 1118.99904 | 1456.99968 | 541.99962 | 1067.99958 | 0 | 0 |
| 1 | img8243 | 0 | 0 | 612.00000 | 460.00008 | 110.00064 | 216.00000 | 1920 | 1080 | 556.99968 | 667.00032 | 352.00008 | 568.00008 | 0 | 0 |
| 2 | img8243 | 0 | 0 | 706.99968 | 426.99960 | 120.00000 | 185.99976 | 1920 | 1080 | 646.99968 | 766.99968 | 333.99972 | 519.99948 | 0 | 0 |
| 3 | img8243 | 0 | 0 | 1134.00000 | 456.99984 | 150.00000 | 253.99980 | 1920 | 1080 | 1059.00000 | 1209.00000 | 329.99994 | 583.99974 | 0 | 0 |
| 4 | img8243 | 0 | 1 | 1651.49952 | 737.49960 | 301.00032 | 388.99980 | 1920 | 1080 | 1500.99936 | 1801.99968 | 542.99970 | 931.99950 | 0 | 0 |
Collect results in this dataframe.
[36]:
columns = list(df_od_metrics.columns)+["avg_exec_time","total_exec_time"]
[37]:
df_model_sel_results = pd.DataFrame(columns=columns)
df_model_sel_results
[37]:
| | VOC PASCAL | VOC PASCAL (all points) | COCO | avg_exec_time | total_exec_time |
|---|---|---|---|---|---|
Define paths with model parameters (for this example only YOLOv4 tensorflow implementation variants are considered) and temporary outputs.
[38]:
CHECKPOINTS_PATH = "../data/object_detection/checkpoints/"
OUTPUT_PATH = DATA_PATH+"/output/"
Dictionary containing models to test.
[39]:
models_to_test = {
"yolov4": {
"weights": CHECKPOINTS_PATH + "yolov4-416-tf"
},
"yolov4-tiny": {
"weights": CHECKPOINTS_PATH + "yolov4-tiny-416"
},
}
Run each model on the same test set.
[40]:
for model_name,model_params in models_to_test.items():
    print("Evaluating model {}".format(model_name))
    detections_csv_filename=OUTPUT_PATH+model_name+".csv"
    # 1. Build pipeline
    context = {}
    pipeline = make_pipeline(context, img_list, weights_filename=model_params["weights"],
                             detections_csv_filename=detections_csv_filename,
                             detector_classes_filename=DETECTOR_CLASSES_FILENAME,
                             save_imgs=False)
    # 2. Execute pipeline
    pipeline.execute()
    total_exec_time = pipeline.get_total_execution_time()
    df_pipeline_metrics = pd.DataFrame.from_dict(pipeline.get_metrics(), orient='index',columns=["time [s]"])
    # 3. Read predictions
    df_pred_dets = pd.read_csv(detections_csv_filename,
                               names=["frame_num","class_idx", "x","y","w","h","score","filename"])
    df_pred_dets = convert_detections(df_pred_dets)
    # 4. Evaluate and append results
    df_od_metrics = evaluate_object_detection_predictions( df_gt_dets, df_pred_dets,
                                                           classes = [0],model_name=model_name)
    # Select the detector's average time by row and column label
    df_od_metrics['avg_exec_time'] = df_pipeline_metrics.loc['detector_avg_dt', 'time [s]']
    df_od_metrics['total_exec_time'] = total_exec_time
    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported replacement
    df_model_sel_results = pd.concat([df_model_sel_results, df_od_metrics])
Evaluating model yolov4
Evaluating model yolov4-tiny
[41]:
df_model_sel_results
[41]:
| | VOC PASCAL | VOC PASCAL (all points) | COCO | avg_exec_time | total_exec_time |
|---|---|---|---|---|---|
| yolov4 | 0.545455 | 0.5 | 0.254455 | 0.494261 | 10.047086 |
| yolov4-tiny | 0.000000 | 0.0 | 0.000000 | 0.065422 | 1.896999 |
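One way to turn this table into a decision is to fix a per-frame time budget and pick the most accurate model that fits within it. The sketch below does that with the values from the table; the function name, the dictionary layout and the choice of the "VOC PASCAL" column as the ranking metric are assumptions made for illustration.

```python
# Values copied from the results table above
results = {
    "yolov4": {"VOC PASCAL": 0.545455, "avg_exec_time": 0.494261},
    "yolov4-tiny": {"VOC PASCAL": 0.000000, "avg_exec_time": 0.065422},
}


def select_model(results, max_avg_exec_time, metric="VOC PASCAL"):
    """Return the best-scoring model within the time budget, or None."""
    candidates = {
        name: row for name, row in results.items()
        if row["avg_exec_time"] <= max_avg_exec_time
    }
    if not candidates:
        return None
    return max(candidates, key=lambda name: candidates[name][metric])


print(select_model(results, max_avg_exec_time=1.0))   # yolov4
print(select_model(results, max_avg_exec_time=0.1))   # yolov4-tiny
print(select_model(results, max_avg_exec_time=0.01))  # None
```

On this test set the faster model fits a tighter budget but scores 0 on every metric for class 0, so a time budget alone is not enough: a minimum acceptable mAP would also need to be part of the selection rule.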