import argparse
from functools import partial

# Schedule, sched_onecycle and prune are assumed to be defined or imported earlier in the notebook.
class Args(argparse.Namespace):
    model = 'yolov8l.pt'
    cfg = 'default.yaml'
    iterative_steps = 15
    target_prune_rate = 0.15
    max_map_drop = 0.2
    sched = Schedule(partial(sched_onecycle, α=10, β=4))

args = Args()
prune(args)

Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 43,668,288 parameters, 0 gradients, 165.2 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3846.6±2037.7 MB/s, size: 53.3 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.731 0.768 0.828 0.66
Speed: 0.7ms preprocess, 3.3ms inference, 0.0ms loss, 2.2ms postprocess per image
Results saved to runs/detect/val7
Before Pruning: MACs= 82.72641 G, #Params= 43.69152 M, mAP= 0.66035
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train7, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/train7, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3560.8±1144.6 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1679.0±396.4 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/train7/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/train7
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 17.6G 0.8369 0.7191 1.072 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.774 0.763 0.839 0.674
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 17.1G 0.8351 0.665 1.061 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.826 0.783 0.85 0.689
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 17.2G 0.8322 0.6222 1.066 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.858 0.794 0.86 0.704
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 17.2G 0.8023 0.5615 1.029 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.896 0.793 0.87 0.717
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 17.1G 0.7755 0.521 1.012 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.879 0.824 0.89 0.731
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 17.1G 0.7552 0.5039 1.011 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.869 0.84 0.892 0.738
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 17.1G 0.7342 0.4821 0.9817 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.885 0.835 0.896 0.749
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 17.2G 0.7389 0.4766 0.9989 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.884 0.855 0.904 0.762
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 17.2G 0.7197 0.4778 0.9785 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.875 0.866 0.909 0.767
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 17.2G 0.7149 0.457 1.007 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.882 0.867 0.911 0.768
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/train7/weights/last.pt, 175.3MB
Optimizer stripped from runs/detect/train7/weights/best.pt, 175.3MB
Validating runs/detect/train7/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 43,668,288 parameters, 0 gradients, 165.2 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.883 0.867 0.911 0.768
Speed: 0.1ms preprocess, 2.7ms inference, 0.0ms loss, 0.3ms postprocess per image
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 43,668,288 parameters, 0 gradients, 165.2 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5500.5±982.8 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.901 0.849 0.904 0.769
Speed: 0.1ms preprocess, 5.4ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/baseline_val
Before Pruning: MACs= 82.72641 G, #Params= 43.69152 M, mAP= 0.76904
Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.0027046189978777607
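The bare numbers printed just before each "After Pruning" line in this log (about 0.0027, 0.0052, 0.0098 and 0.0179) look consistent with a logistic one-cycle ramp toward target_prune_rate = 0.15 over iterative_steps = 15, using α=10 and β=4. The following is a minimal sketch under those assumptions, not the library's implementation: the function name `one_cycle`, the start value of 0, and the position `step / STEPS` with steps counted from 0 are all guesses made to match the printed values.

```python
import math

def one_cycle(start, end, pos, alpha=10.0, beta=4.0):
    """Logistic ramp: slightly above `start` at pos=0, exactly `end` at pos=1.

    Hypothetical helper; alpha and beta stand in for the α and β passed to
    sched_onecycle in the Args class.
    """
    out = (1 + math.exp(-alpha + beta)) / (1 + math.exp(-alpha * pos + beta))
    return start + (end - start) * out

TARGET = 0.15  # target_prune_rate from Args
STEPS = 15     # iterative_steps from Args

# Assumption: the schedule is evaluated at pos = step / STEPS, step = 0, 1, 2, ...
ratios = [one_cycle(0.0, TARGET, step / STEPS) for step in range(4)]

for step, ratio in enumerate(ratios, start=1):
    print(f"iter {step}: scheduled value = {ratio:.10f}")
```

If the assumption holds, the schedule starts very gently, which would explain why the first iterations remove only around one filter from the first convolution.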
After Pruning
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 43,081,939 parameters, 74,176 gradients, 162.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5675.7±1427.5 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.878 0.863 0.903 0.748
Speed: 0.1ms preprocess, 6.8ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_0_pre_val
After post-pruning Validation
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 1: MACs=81.5020432 G, #Params=43.105009 M, mAP=0.7480799419444839, speed up=1.0150224847369225
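The "speed up" figure in the summary line above appears to be the ratio of the baseline MACs to the pruned MACs, so it is a theoretical compute ratio rather than a measured latency gain. A small sketch that checks this against the values reported in this log (the dictionary and function names are illustrative, and the baseline is the rounded value printed on the "Before Pruning" line):

```python
BASE_MACS_G = 82.72641  # "Before Pruning" MACs in G, as printed (rounded)

# Pruned MACs in G, copied from the "After pruning iter N" summary lines.
pruned_macs_g = {
    1: 81.5020432,
    2: 80.7933916,
    3: 79.5541908,
    4: 77.3600192,
}

def speed_up(base_macs, pruned_macs):
    """Theoretical speed-up: how many times fewer MACs the pruned model needs."""
    return base_macs / pruned_macs

speedups = {it: speed_up(BASE_MACS_G, macs) for it, macs in pruned_macs_g.items()}

for it, value in speedups.items():
    print(f"iter {it}: speed up = {value:.6f}")
```

The per-image inference times in the "Speed:" lines do not follow this ratio closely, which is expected for such small reductions on a fast GPU.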
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4199.8±1440.5 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1993.0±415.7 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_0_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_0_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 17.3G 0.6682 0.4222 0.9629 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.901 0.849 0.908 0.756
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 17.3G 0.6351 0.3917 0.9467 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.907 0.847 0.915 0.757
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 17.5G 0.6704 0.4248 0.9809 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.904 0.854 0.918 0.762
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 17.4G 0.6577 0.3918 0.955 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.901 0.857 0.919 0.768
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 17.6G 0.6374 0.3958 0.9421 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.892 0.868 0.917 0.775
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 17.6G 0.6424 0.4056 0.9488 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.903 0.867 0.917 0.776
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 17.4G 0.628 0.3976 0.9314 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.923 0.856 0.921 0.783
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 17.5G 0.6647 0.3993 0.963 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.867 0.926 0.79
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 17.4G 0.6561 0.4047 0.9421 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.907 0.881 0.929 0.793
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 17.5G 0.6618 0.416 0.9685 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.913 0.88 0.931 0.794
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/step_0_finetune/weights/last.pt, 173.0MB
Optimizer stripped from runs/detect/step_0_finetune/weights/best.pt, 173.0MB
Validating runs/detect/step_0_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 43,081,939 parameters, 0 gradients, 162.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.913 0.88 0.931 0.794
Speed: 0.1ms preprocess, 3.1ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 43,081,939 parameters, 0 gradients, 162.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5127.1±1250.8 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.919 0.875 0.928 0.79
Speed: 0.1ms preprocess, 7.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_0_post_val
After fine tuning mAP=0.7902131736934158
After post fine-tuning validation
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.005179586515491673
After Pruning
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 42,712,366 parameters, 74,160 gradients, 161.3 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5252.0±1282.8 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.855 0.926 0.784
Speed: 0.1ms preprocess, 7.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_1_pre_val
After post-pruning Validation
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 2: MACs=80.7933916 G, #Params=42.735334 M, mAP=0.7843893707557463, speed up=1.0239254072854147
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3475.7±1351.6 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1459.9±423.3 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_1_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_1_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 17.1G 0.5668 0.3537 0.9157 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.933 0.866 0.93 0.789
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 17.2G 0.5344 0.3429 0.9029 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.92 0.886 0.937 0.797
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 17.2G 0.5649 0.3446 0.9291 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.918 0.885 0.936 0.796
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 17.3G 0.5479 0.3429 0.9087 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.925 0.875 0.938 0.8
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 17.5G 0.5515 0.3491 0.8995 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.875 0.938 0.799
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 17.5G 0.5535 0.3455 0.9062 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.903 0.879 0.936 0.799
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 17.3G 0.5605 0.353 0.8941 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.91 0.881 0.94 0.804
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 17.4G 0.6074 0.3693 0.9276 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.89 0.944 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 17.4G 0.5933 0.3803 0.9049 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.895 0.945 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 17.6G 0.6217 0.3959 0.9434 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.896 0.946 0.817
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/step_1_finetune/weights/last.pt, 171.5MB
Optimizer stripped from runs/detect/step_1_finetune/weights/best.pt, 171.5MB
Validating runs/detect/step_1_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 42,712,366 parameters, 0 gradients, 161.3 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.896 0.946 0.817
Speed: 0.1ms preprocess, 3.1ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 42,712,366 parameters, 0 gradients, 161.3 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5559.9±1213.1 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.925 0.887 0.939 0.807
Speed: 0.2ms preprocess, 6.9ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_1_post_val
After fine tuning mAP=0.807224186903875
After post fine-tuning validation
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.009769531739708686
After Pruning
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 42,094,706 parameters, 74,160 gradients, 158.8 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4357.7±655.2 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.879 0.942 0.802
Speed: 0.1ms preprocess, 7.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_2_pre_val
After post-pruning Validation
Model Conv2d(3, 63, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 3: MACs=79.5541908 G, #Params=42.117503 M, mAP=0.8017145052594012, speed up=1.0398749024796818
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3830.3±1473.9 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1639.3±480.6 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_2_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_2_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 16.9G 0.5199 0.3244 0.8907 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.881 0.944 0.812
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 17.2G 0.5038 0.3259 0.8853 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.922 0.887 0.941 0.81
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 17.2G 0.5075 0.3171 0.9042 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.914 0.895 0.948 0.813
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 17.2G 0.5008 0.3164 0.8908 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.92 0.887 0.944 0.812
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 17.2G 0.4901 0.3191 0.8742 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.88 0.945 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 17.3G 0.4969 0.3177 0.8799 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.936 0.887 0.947 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 17.1G 0.5126 0.3256 0.8695 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.912 0.904 0.95 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 17.3G 0.5631 0.3562 0.9061 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.918 0.904 0.953 0.821
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 17.3G 0.5603 0.3584 0.8904 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.924 0.898 0.952 0.823
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 17.6G 0.6014 0.3852 0.9412 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.929 0.897 0.952 0.826
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/step_2_finetune/weights/last.pt, 169.0MB
Optimizer stripped from runs/detect/step_2_finetune/weights/best.pt, 169.0MB
Validating runs/detect/step_2_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 42,094,706 parameters, 0 gradients, 158.8 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.929 0.897 0.952 0.826
Speed: 0.1ms preprocess, 3.1ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 42,094,706 parameters, 0 gradients, 158.8 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1953.9±892.9 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.893 0.95 0.82
Speed: 0.2ms preprocess, 7.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_2_post_val
After fine tuning mAP=0.8196362847789926
After post fine-tuning validation
Model Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.017924759478681728
After Pruning
Model Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 40,919,781 parameters, 74,160 gradients, 154.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5161.9±848.4 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.914 0.87 0.936 0.783
Speed: 0.1ms preprocess, 6.9ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_3_pre_val
After post-pruning Validation
Model Conv2d(3, 62, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 4: MACs=77.3600192 G, #Params=40.942254 M, mAP=0.782782444051276, speed up=1.0693690003634333
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3738.8±1510.8 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1659.4±471.2 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_3_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_3_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 16.7G 0.533 0.3392 0.8902 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.929 0.865 0.938 0.799
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 17.4G 0.4804 0.31 0.871 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.891 0.943 0.815
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 17.4G 0.4873 0.3176 0.8843 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.925 0.891 0.942 0.817
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 17G 0.4908 0.3098 0.8743 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.886 0.943 0.821
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 16.9G 0.4684 0.3018 0.8614 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.916 0.894 0.944 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 17.1G 0.4781 0.3192 0.862 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.917 0.891 0.944 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 17G 0.5015 0.3257 0.8657 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.893 0.951 0.826
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 17G 0.5618 0.3555 0.8989 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.935 0.893 0.953 0.832
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 17.1G 0.5484 0.3455 0.88 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.928 0.895 0.953 0.832
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 17.4G 0.5878 0.3835 0.9243 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.925 0.892 0.952 0.834
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/step_3_finetune/weights/last.pt, 164.3MB
Optimizer stripped from runs/detect/step_3_finetune/weights/best.pt, 164.3MB
Validating runs/detect/step_3_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 40,919,781 parameters, 0 gradients, 154.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.925 0.892 0.952 0.834
Speed: 0.1ms preprocess, 3.0ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 40,919,781 parameters, 0 gradients, 154.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4332.0±1294.7 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.923 0.89 0.948 0.833
Speed: 0.2ms preprocess, 7.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_3_post_val
After fine tuning mAP=0.8329326575889358
After post fine-tuning validation
Model Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.03136884242508382
After Pruning
Model Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 39,455,305 parameters, 74,160 gradients, 149.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4983.8±422.4 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.915 0.877 0.937 0.794
Speed: 0.1ms preprocess, 6.9ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_4_pre_val
After post-pruning Validation
Model Conv2d(3, 61, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 5: MACs=74.8418608 G, #Params=39.477376 M, mAP=0.7937988689018066, speed up=1.1053494062777232
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_4_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_4_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3944.1±1349.5 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 851.4±239.0 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_4_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_4_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 16.4G 0.5412 0.3505 0.8826 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.884 0.942 0.803
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 16.6G 0.4801 0.311 0.862 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.906 0.894 0.947 0.811
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 16.6G 0.4775 0.3041 0.872 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.913 0.893 0.948 0.816
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 16.7G 0.4767 0.3017 0.8603 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.909 0.894 0.947 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 16.7G 0.4872 0.3068 0.8659 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.887 0.947 0.815
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 16.7G 0.4826 0.3129 0.86 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.878 0.943 0.816
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 16.7G 0.5067 0.3249 0.8598 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.938 0.881 0.945 0.817
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 16.7G 0.5403 0.3384 0.8883 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.941 0.885 0.946 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 16.8G 0.5609 0.3507 0.8826 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.94 0.888 0.948 0.824
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 16.6G 0.5955 0.3752 0.9273 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.937 0.889 0.948 0.825
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/step_4_finetune/weights/last.pt, 158.5MB
Optimizer stripped from runs/detect/step_4_finetune/weights/best.pt, 158.5MB
Validating runs/detect/step_4_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 39,455,305 parameters, 0 gradients, 149.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.937 0.889 0.948 0.825
Speed: 0.1ms preprocess, 3.0ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 39,455,305 parameters, 0 gradients, 149.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4391.9±1982.1 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.931 0.892 0.948 0.827
Speed: 0.2ms preprocess, 7.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_4_post_val
After fine tuning mAP=0.8272230343997624
After post fine-tuning validation
Model Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.051012679818528694
After Pruning
Model Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 37,708,749 parameters, 74,160 gradients, 143.2 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4918.3±657.9 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.901 0.86 0.925 0.767
Speed: 0.1ms preprocess, 6.0ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_5_pre_val
After post-pruning Validation
Model Conv2d(3, 60, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 6: MACs=71.732976 G, #Params=37.730325 M, mAP=0.7673209592104678, speed up=1.1532549046898597
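Note on the `speed up` column: the figures in the "After pruning iter N" lines appear to be the ratio of the baseline MACs (the "Before Pruning" line, 82.72641 G) to the MACs after pruning. A minimal sketch that checks this reading against the values logged so far; the helper name `macs_speed_up` is hypothetical and not part of the pruning script:

```python
# Baseline MACs (in G) copied from the "Before Pruning" log line.
BASE_MACS_G = 82.72641

# (iteration, MACs in G, logged "speed up") copied from the log lines above.
LOGGED = [
    (5, 74.8418608, 1.1053494062777232),
    (6, 71.732976, 1.1532549046898597),
]


def macs_speed_up(base_macs: float, pruned_macs: float) -> float:
    """Theoretical speed-up, assuming it is defined as base MACs / pruned MACs."""
    return base_macs / pruned_macs


for iteration, macs, logged in LOGGED:
    computed = macs_speed_up(BASE_MACS_G, macs)
    # The baseline is printed with only 5 decimals, so compare with a tolerance.
    print(f"iter {iteration}: computed={computed:.6f} logged={logged:.6f}")
```

This is a ratio of operation counts, not a measured latency gain; the per-image inference times in the `Speed:` lines are the place to look for wall-clock effects.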
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_5_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_5_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4051.3±1310.0 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1346.5±283.8 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_5_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_5_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 16.1G 0.5751 0.3595 0.8923 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.916 0.875 0.932 0.782
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 16.3G 0.5115 0.3291 0.8669 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.922 0.887 0.939 0.791
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 16.3G 0.4856 0.3229 0.878 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.882 0.941 0.792
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 16.3G 0.4941 0.3111 0.8656 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.929 0.888 0.947 0.804
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 16.3G 0.4775 0.3146 0.8614 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.931 0.887 0.944 0.805
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 16.4G 0.5039 0.3229 0.8672 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.897 0.942 0.811
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 16.3G 0.5039 0.3256 0.8601 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.933 0.885 0.941 0.813
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 16.4G 0.552 0.351 0.8934 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.936 0.884 0.946 0.819
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 16.4G 0.5808 0.3612 0.891 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.929 0.895 0.948 0.819
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 16.3G 0.6055 0.3872 0.936 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.898 0.949 0.822
10 epochs completed in 0.009 hours.
Optimizer stripped from runs/detect/step_5_finetune/weights/last.pt, 151.5MB
Optimizer stripped from runs/detect/step_5_finetune/weights/best.pt, 151.5MB
Validating runs/detect/step_5_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 37,708,749 parameters, 0 gradients, 143.2 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.898 0.949 0.822
Speed: 0.1ms preprocess, 2.9ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 37,708,749 parameters, 0 gradients, 143.2 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4543.0±2621.0 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.917 0.896 0.945 0.821
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_5_post_val
After fine tuning mAP=0.8206992215945592
After post fine-tuning validation
Model Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.07518590641324997
After Pruning
Model Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 35,995,675 parameters, 74,160 gradients, 136.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5429.6±1306.8 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.896 0.825 0.912 0.749
Speed: 0.1ms preprocess, 6.4ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_6_pre_val
After post-pruning Validation
Model Conv2d(3, 59, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 7: MACs=68.4860368 G, #Params=36.016747 M, mAP=0.7488644175882014, speed up=1.207930992438447
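Note on the unlabeled values printed just before each "After Pruning" marker (0.0313..., 0.0510..., 0.0751...): they are consistent with a logistic one-cycle schedule scaled by `target_prune_rate = 0.15` over `iterative_steps = 15`, with `α=10` and `β=4` as set in `Args`. The sketch below is an assumption about how `sched_onecycle` is defined, inferred from the logged numbers rather than from the library source; the function names and the zero-based step index are hypothetical:

```python
import math


def onecycle(pct: float, alpha: float = 10.0, beta: float = 4.0) -> float:
    """Assumed logistic one-cycle curve: close to 0 at pct=0 and exactly 1 at pct=1."""
    return (1 + math.exp(-alpha + beta)) / (1 + math.exp(-alpha * pct + beta))


def prune_rate_at(step: int, total_steps: int = 15, target: float = 0.15) -> float:
    """Cumulative prune rate at a zero-based step, under the assumed schedule."""
    return target * onecycle(step / total_steps)


# Values copied from the log, keyed by the zero-based step that precedes
# "After pruning iter 5", "iter 6" and "iter 7" respectively.
LOGGED_RATES = {
    4: 0.03136884242508382,
    5: 0.051012679818528694,
    6: 0.07518590641324997,
}

for step, logged in LOGGED_RATES.items():
    print(f"step {step}: computed={prune_rate_at(step):.8f} logged={logged:.8f}")
```

Under this reading the schedule prunes slowly at first, fastest around the midpoint, and reaches the full target rate at the final step.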
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_6_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_6_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3739.6±1602.1 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1726.2±473.0 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_6_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_6_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 15.6G 0.5731 0.3602 0.8969 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.917 0.852 0.929 0.781
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15.8G 0.5205 0.3361 0.8819 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.902 0.884 0.937 0.798
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 15.9G 0.4968 0.3452 0.8811 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.907 0.892 0.942 0.805
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15.9G 0.5077 0.3303 0.8692 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.9 0.894 0.94 0.809
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15.9G 0.5099 0.3369 0.8692 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.916 0.889 0.937 0.802
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15.9G 0.5154 0.3385 0.8712 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.92 0.893 0.939 0.801
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 15.9G 0.5223 0.3358 0.8692 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.898 0.904 0.939 0.807
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 16G 0.5637 0.354 0.8967 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.906 0.898 0.938 0.809
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 16G 0.5919 0.3694 0.901 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.914 0.897 0.939 0.813
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 15.8G 0.6332 0.4071 0.943 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.917 0.893 0.94 0.813
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_6_finetune/weights/last.pt, 144.6MB
Optimizer stripped from runs/detect/step_6_finetune/weights/best.pt, 144.6MB
Validating runs/detect/step_6_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 35,995,675 parameters, 0 gradients, 136.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.917 0.893 0.94 0.813
Speed: 0.1ms preprocess, 2.8ms inference, 0.0ms loss, 0.4ms postprocess per image
After fine-tuning
Model Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 35,995,675 parameters, 0 gradients, 136.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5162.0±828.0 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.922 0.887 0.946 0.815
Speed: 0.1ms preprocess, 6.5ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_6_post_val
After fine tuning mAP=0.8150353523112258
After post fine-tuning validation
Model Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.09935913300797124
After Pruning
Model Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 34,583,399 parameters, 74,160 gradients, 131.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5541.5±1410.7 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.843 0.884 0.92 0.766
Speed: 0.2ms preprocess, 6.2ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_7_pre_val
After post-pruning Validation
Model Conv2d(3, 57, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 8: MACs=65.8289424 G, #Params=34.604045 M, mAP=0.7662475456515344, speed up=1.2566874597092115
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_7_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_7_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3985.9±1454.0 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 812.7±173.0 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_7_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_7_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 15.4G 0.5617 0.3587 0.8919 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.883 0.867 0.924 0.781
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15.5G 0.5 0.3217 0.8684 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.922 0.863 0.927 0.791
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 15.5G 0.4909 0.3294 0.884 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.924 0.869 0.93 0.793
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15.6G 0.4929 0.3229 0.8705 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.882 0.903 0.934 0.795
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15.6G 0.4975 0.3312 0.8646 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.889 0.895 0.935 0.801
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15.6G 0.5017 0.3367 0.8697 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.913 0.889 0.935 0.797
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 15.6G 0.5345 0.3396 0.8669 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.923 0.881 0.936 0.8
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15.5G 0.5751 0.3614 0.8977 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.924 0.882 0.937 0.8
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15.5G 0.5991 0.3909 0.8971 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.916 0.889 0.94 0.808
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 15.5G 0.6296 0.407 0.937 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.892 0.941 0.81
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_7_finetune/weights/last.pt, 139.0MB
Optimizer stripped from runs/detect/step_7_finetune/weights/best.pt, 139.0MB
Validating runs/detect/step_7_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 34,583,399 parameters, 0 gradients, 131.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.921 0.891 0.941 0.81
Speed: 0.1ms preprocess, 2.7ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 34,583,399 parameters, 0 gradients, 131.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5065.6±1461.7 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.913 0.894 0.943 0.813
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_7_post_val
After fine tuning mAP=0.8132784771857411
After post fine-tuning validation
Model Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.11900297040141611
After Pruning
Model Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 33,747,610 parameters, 74,160 gradients, 128.5 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5359.4±1196.1 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.931 0.859 0.932 0.786
Speed: 0.1ms preprocess, 5.9ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_8_pre_val
After post-pruning Validation
Model Conv2d(3, 56, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 9: MACs=64.3900056 G, #Params=33.768007 M, mAP=0.7864229772631458, speed up=1.2847709148203583
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_8_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_8_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4027.1±1719.2 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1657.3±416.8 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_8_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_8_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 15.3G 0.5136 0.3353 0.8737 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.914 0.88 0.938 0.803
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15.3G 0.4621 0.2981 0.8555 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.888 0.9 0.941 0.809
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 15.2G 0.4527 0.3111 0.8674 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.957 0.858 0.939 0.808
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15.2G 0.4709 0.312 0.8606 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.901 0.898 0.941 0.811
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15.4G 0.4727 0.3065 0.8574 95 640: 100%|█████
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__>
Traceback (most recent call last):
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1654, in __del__
self._shutdown_workers()
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1637, in _shutdown_workers
if w.is_alive():
^^^^^^^^^^^^
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/multiprocessing/process.py", line 160, in is_alive
assert self._parent_pid == os.getpid(), 'can only test a child process'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.893 0.9 0.942 0.808
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15.4G 0.4873 0.3299 0.8622 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.884 0.941 0.807
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 15.3G 0.5022 0.3266 0.8596 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.931 0.88 0.943 0.804
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15.3G 0.5583 0.3447 0.8908 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.888 0.945 0.808
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15.3G 0.58 0.3647 0.8901 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.932 0.881 0.942 0.811
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 15.2G 0.6333 0.4065 0.9388 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.926 0.883 0.941 0.811
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_8_finetune/weights/last.pt, 135.6MB
Optimizer stripped from runs/detect/step_8_finetune/weights/best.pt, 135.6MB
Validating runs/detect/step_8_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 33,747,610 parameters, 0 gradients, 128.5 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.931 0.88 0.942 0.811
Speed: 0.1ms preprocess, 2.6ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 33,747,610 parameters, 0 gradients, 128.5 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4779.0±951.4 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.92 0.892 0.943 0.806
Speed: 0.1ms preprocess, 5.9ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_8_post_val
After fine tuning mAP=0.8060252598377426
After post fine-tuning validation
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.1324470533478182
After Pruning
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 33,209,910 parameters, 74,160 gradients, 126.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5056.1±1859.8 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.917 0.88 0.942 0.785
Speed: 0.1ms preprocess, 5.3ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_9_pre_val
After post-pruning Validation
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 10: MACs=63.4942128 G, #Params=33.230145 M, mAP=0.784538139397141, speed up=1.302896795658832
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_9_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_9_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4710.4±1405.8 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 1436.4±536.2 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_9_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_9_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 15.1G 0.5083 0.3199 0.8726 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.923 0.878 0.94 0.795
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15.2G 0.4453 0.2858 0.8463 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.922 0.877 0.94 0.805
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 15.1G 0.4374 0.2928 0.8633 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.904 0.88 0.942 0.807
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15.1G 0.4528 0.2921 0.854 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.889 0.904 0.942 0.806
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15.2G 0.4514 0.2977 0.8483 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.885 0.919 0.951 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15.2G 0.4622 0.3078 0.8546 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.877 0.917 0.95 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 15.1G 0.4877 0.3142 0.8533 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.903 0.909 0.952 0.811
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15.1G 0.5348 0.3373 0.8856 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.913 0.909 0.951 0.813
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15.2G 0.5581 0.3507 0.8842 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.918 0.912 0.951 0.816
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 15.1G 0.6125 0.3898 0.9361 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.918 0.911 0.952 0.817
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_9_finetune/weights/last.pt, 133.4MB
Optimizer stripped from runs/detect/step_9_finetune/weights/best.pt, 133.4MB
Validating runs/detect/step_9_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 33,209,910 parameters, 0 gradients, 126.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.918 0.911 0.952 0.817
Speed: 0.1ms preprocess, 2.5ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 33,209,910 parameters, 0 gradients, 126.7 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 2069.3±576.5 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.922 0.904 0.952 0.817
Speed: 0.1ms preprocess, 5.4ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_9_post_val
After fine tuning mAP=0.816555667499456
After post fine-tuning validation
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.14060228108679124
After Pruning
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,703,049 parameters, 74,160 gradients, 124.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5530.2±1394.7 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.941 0.865 0.939 0.795
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_10_pre_val
After post-pruning Validation
Model Conv2d(3, 55, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 11: MACs=62.4345712 G, #Params=32.723122 M, mAP=0.7950424487460809, speed up=1.3250096030130178
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_10_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_10_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4167.3±1478.2 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 773.9±123.2 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_10_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_10_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 15.1G 0.4828 0.3114 0.8639 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.877 0.942 0.799
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15.1G 0.4196 0.2732 0.8407 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.899 0.904 0.943 0.809
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 15G 0.4306 0.2861 0.8556 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.959 0.861 0.944 0.811
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15G 0.4263 0.2811 0.8445 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.952 0.865 0.944 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15G 0.4355 0.2887 0.8438 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.944 0.873 0.946 0.813
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15G 0.4476 0.2933 0.8465 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.915 0.9 0.947 0.81
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 15G 0.4656 0.3029 0.8469 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.92 0.895 0.945 0.812
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15.1G 0.5244 0.3296 0.8832 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.946 0.882 0.948 0.816
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15G 0.5514 0.3476 0.8831 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.943 0.894 0.951 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 15G 0.6126 0.378 0.933 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.943 0.895 0.951 0.817
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_10_finetune/weights/last.pt, 131.4MB
Optimizer stripped from runs/detect/step_10_finetune/weights/best.pt, 131.4MB
Validating runs/detect/step_10_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,703,049 parameters, 0 gradients, 124.6 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.943 0.894 0.951 0.818
Speed: 0.1ms preprocess, 2.5ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,703,049 parameters, 0 gradients, 124.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4396.4±2530.2 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.937 0.896 0.95 0.814
Speed: 0.2ms preprocess, 6.3ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_10_post_val
After fine tuning mAP=0.8142403277090464
After post fine-tuning validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.14519222631100823
After Pruning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,669,140 parameters, 74,160 gradients, 124.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5125.9±924.0 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.9 0.949 0.814
Speed: 0.2ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_11_pre_val
After post-pruning Validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 12: MACs=62.4070664 G, #Params=32.689204 M, mAP=0.8135646119344967, speed up=1.325593577332454
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_11_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_11_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3971.3±1585.4 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 723.8±135.6 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_11_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_11_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 14.9G 0.396 0.2713 0.8408 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.945 0.893 0.95 0.819
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15G 0.3555 0.2442 0.8256 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.94 0.896 0.949 0.819
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 14.9G 0.3572 0.2509 0.8368 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.945 0.895 0.949 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 14.9G 0.3804 0.2563 0.8346 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.933 0.903 0.949 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 14.9G 0.3835 0.267 0.8318 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.933 0.899 0.948 0.815
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15G 0.4049 0.2751 0.8348 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.946 0.887 0.95 0.816
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 15G 0.4375 0.2882 0.8372 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.924 0.895 0.949 0.816
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15G 0.5142 0.3154 0.8795 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.927 0.898 0.951 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15G 0.5171 0.3301 0.8704 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.897 0.95 0.82
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 15.1G 0.6157 0.3821 0.9382 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.939 0.9 0.952 0.82
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_11_finetune/weights/last.pt, 131.3MB
Optimizer stripped from runs/detect/step_11_finetune/weights/best.pt, 131.3MB
Validating runs/detect/step_11_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,669,140 parameters, 0 gradients, 124.6 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.933 0.903 0.949 0.82
Speed: 0.1ms preprocess, 2.5ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,669,140 parameters, 0 gradients, 124.6 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5264.4±896.4 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.936 0.901 0.95 0.819
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_11_post_val
After fine tuning mAP=0.818715380662013
After post fine-tuning validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.14766719382862217
After Pruning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 74,160 gradients, 123.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5427.5±1166.9 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.941 0.871 0.944 0.809
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_12_pre_val
After post-pruning Validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 13: MACs=61.8488912 G, #Params=32.436843 M, mAP=0.8090287265949643, speed up=1.3375568226839933
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_12_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_12_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4055.7±1294.9 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 878.0±228.5 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_12_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_12_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 15G 0.4056 0.2754 0.8413 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.936 0.875 0.945 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 15G 0.367 0.2491 0.8286 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.935 0.888 0.948 0.821
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 14.9G 0.3679 0.2586 0.8393 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.943 0.881 0.947 0.826
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15.1G 0.3758 0.2559 0.8318 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.949 0.883 0.947 0.822
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15.1G 0.3951 0.2704 0.8339 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.941 0.885 0.947 0.821
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15.1G 0.4162 0.2774 0.8396 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.941 0.892 0.95 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 14.9G 0.4546 0.2918 0.8406 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.936 0.901 0.951 0.815
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15G 0.4872 0.3136 0.8662 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.94 0.903 0.952 0.814
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15G 0.5297 0.3309 0.8719 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.941 0.897 0.947 0.818
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 14.9G 0.608 0.3775 0.9298 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.939 0.895 0.948 0.819
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_12_finetune/weights/last.pt, 130.3MB
Optimizer stripped from runs/detect/step_12_finetune/weights/best.pt, 130.3MB
Validating runs/detect/step_12_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.943 0.881 0.947 0.827
Speed: 0.1ms preprocess, 2.5ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4576.3±660.3 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.944 0.882 0.946 0.823
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_12_post_val
After fine tuning mAP=0.8229267970026074
After post fine-tuning validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.14897095513156428
After Pruning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 74,160 gradients, 123.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 5147.7±1080.5 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.942 0.9 0.949 0.815
Speed: 0.1ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_13_pre_val
After post-pruning Validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 14: MACs=61.8488912 G, #Params=32.436843 M, mAP=0.8153406959934129, speed up=1.3375568226839933
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_13_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_13_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4073.6±1476.9 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 909.8±193.4 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_13_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_13_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 14.9G 0.3599 0.2503 0.8299 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.94 0.898 0.948 0.821
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 14.9G 0.3407 0.2283 0.8201 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.937 0.902 0.951 0.826
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 14.8G 0.3487 0.2399 0.8281 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.906 0.951 0.828
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 15G 0.3497 0.2389 0.8248 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.938 0.902 0.949 0.827
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15G 0.3522 0.2418 0.8247 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.95 0.895 0.953 0.819
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15G 0.3821 0.2589 0.8274 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.948 0.894 0.951 0.819
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 14.9G 0.4137 0.2715 0.8289 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.952 0.89 0.949 0.824
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15G 0.4671 0.2988 0.8616 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.954 0.895 0.95 0.822
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 15G 0.4944 0.3161 0.8602 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.949 0.901 0.95 0.822
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 14.8G 0.5943 0.3612 0.9235 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.95 0.901 0.951 0.826
10 epochs completed in 0.008 hours.
Optimizer stripped from runs/detect/step_13_finetune/weights/last.pt, 130.3MB
Optimizer stripped from runs/detect/step_13_finetune/weights/best.pt, 130.3MB
Validating runs/detect/step_13_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.934 0.906 0.951 0.828
Speed: 0.1ms preprocess, 2.5ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4735.7±1747.0 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.931 0.909 0.949 0.823
Speed: 0.1ms preprocess, 6.2ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_13_post_val
After fine tuning mAP=0.82278884829967
After post fine-tuning validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
0.14964931342467439
After Pruning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 74,160 gradients, 123.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4596.6±1966.1 MB/s, size: 44.7 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.945 0.904 0.949 0.819
Speed: 0.2ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_14_pre_val
After post-pruning Validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
After pruning iter 15: MACs=61.8488912 G, #Params=32.436843 M, mAP=0.8194257834184517, speed up=1.3375568226839933
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
engine/trainer: agnostic_nms=False, amp=False, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=coco128.yaml, degrees=0.0, deterministic=True, device=None, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=10, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=640, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8l.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=step_14_finetune, nbs=64, nms=False, opset=None, optimize=False, optimizer=auto, overlap_mask=True, patience=100, perspective=0.0, plots=True, pose=12.0, pretrained=True, profile=False, project=None, rect=False, resume=False, retina_masks=False, save=True, save_conf=False, save_crop=False, save_dir=runs/detect/step_14_finetune, save_frames=False, save_json=False, save_period=-1, save_txt=False, scale=0.5, seed=0, shear=0.0, show=False, show_boxes=True, show_conf=True, show_labels=True, simplify=True, single_cls=False, source=None, split=val, stream_buffer=False, task=detect, time=None, tracker=botsort.yaml, translate=0.1, val=True, verbose=False, vid_stride=1, visualize=False, warmup_bias_lr=0.1, warmup_epochs=3.0, warmup_momentum=0.8, weight_decay=0.0005, workers=8, workspace=None
Freezing layer 'model.22.dfl.conv.weight'
train: Fast image access ✅ (ping: 0.0±0.0 ms, read: 4126.6±1397.7 MB/s, size: 50.9 KB)
train: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco12
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 887.1±278.1 MB/s, size: 52.5 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Plotting labels to runs/detect/step_14_finetune/labels.jpg...
optimizer: 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 105 weight(decay=0.0), 112 weight(decay=0.0005), 111 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs/detect/step_14_finetune
Starting training for 10 epochs...
Closing dataloader mosaic
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 14.9G 0.332 0.2378 0.8243 121 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.947 0.901 0.951 0.832
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 14.9G 0.3191 0.2212 0.8139 113 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.937 0.913 0.952 0.832
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 14.8G 0.3273 0.2297 0.8235 118 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.937 0.909 0.95 0.829
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 14.9G 0.3607 0.2413 0.8252 68 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.94 0.91 0.95 0.823
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 15G 0.3739 0.2451 0.8254 95 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.945 0.907 0.949 0.822
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 15G 0.3695 0.2499 0.8252 122 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.933 0.907 0.944 0.823
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 14.9G 0.4027 0.2608 0.8271 75 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.94 0.895 0.948 0.825
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 15G 0.4442 0.2861 0.8574 142 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.947 0.896 0.95 0.83
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 14.9G 0.4736 0.2997 0.8551 104 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.946 0.903 0.95 0.83
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 14.8G 0.5683 0.3475 0.9165 164 640: 100%|█████
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.946 0.905 0.952 0.829
10 epochs completed in 0.007 hours.
Optimizer stripped from runs/detect/step_14_finetune/weights/last.pt, 130.3MB
Optimizer stripped from runs/detect/step_14_finetune/weights/best.pt, 130.3MB
Validating runs/detect/step_14_finetune/weights/best.pt...
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.947 0.901 0.951 0.832
Speed: 0.1ms preprocess, 2.5ms inference, 0.0ms loss, 0.3ms postprocess per image
After fine-tuning
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CUDA:0 (NVIDIA GeForce RTX 5090, 32109MiB)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
val: Fast image access ✅ (ping: 0.0±0.0 ms, read: 3499.3±1609.6 MB/s, size: 53.4 KB)
val: Scanning /home/nathan/Developer/FasterAI-Labs/Projects/ALX Systems/datasets/coco128/
Class Images Instances Box(P R mAP50 mAP50-95):
all 128 929 0.951 0.897 0.95 0.829
Speed: 0.2ms preprocess, 6.1ms inference, 0.0ms loss, 0.4ms postprocess per image
Results saved to runs/detect/step_14_post_val
After fine tuning mAP=0.8288516438034145
After post fine-tuning validation
Model Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Pruner Conv2d(3, 54, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
Ultralytics 8.3.162 🚀 Python-3.12.11 torch-2.9.1+cu128 CPU (Intel Core(TM) i9-14900KS)
YOLOv8l summary (fused): 121 layers, 32,416,863 parameters, 0 gradients, 123.4 GFLOPs
PyTorch: starting from 'runs/detect/step_14_finetune/weights/best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) (1, 84, 8400) (124.2 MB)
ONNX: starting export with onnx 1.17.0 opset 10...
W0202 14:45:52.878000 36862 site-packages/torch/onnx/_internal/exporter/_compat.py:114] Setting ONNX exporter to use operator set version 18 because the requested opset_version 10 is a lower version than we have implementations for. Automatic version conversion will be performed, which may not be successful at converting to the requested version. If version conversion is unsuccessful, the opset version of the exported model will be kept at 18. Please consider setting opset_version >=18 to leverage latest ONNX features
The model version conversion is not supported by the onnxscript version converter and fallback is enabled. The model will be converted using the onnx C API (target version: 10).
Failed to convert the model to the target version 10 using the ONNX C API. The model was not modified
Traceback (most recent call last):
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/onnxscript/version_converter/__init__.py", line 127, in call
converted_proto = _c_api_utils.call_onnx_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/onnxscript/version_converter/_c_api_utils.py", line 65, in call_onnx_api
result = func(proto)
^^^^^^^^^^^
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/onnxscript/version_converter/__init__.py", line 122, in _partial_convert_version
return onnx.version_converter.convert_version(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/onnx/version_converter.py", line 38, in convert_version
converted_model_str = C.convert_version(model_str, target_version)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: /github/workspace/onnx/version_converter/BaseConverter.h:70: adapter_lookup: Assertion `false` failed: No Adapter To Version $17 for Resize
Applied 1 of general pattern rewrite rules.
ONNX: slimming with onnxslim 0.1.59...
ONNX: export success ✅ 2.8s, saved as 'runs/detect/step_14_finetune/weights/best.onnx' (123.8 MB)
Export complete (3.3s)
Results saved to /home/nathan/Developer/FasterAI-Labs/gh/fasterai/nbs/tutorials/prune/runs/detect/step_14_finetune/weights
Predict: yolo predict task=detect model=runs/detect/step_14_finetune/weights/best.onnx imgsz=640
Validate: yolo val task=detect model=runs/detect/step_14_finetune/weights/best.onnx imgsz=640 data=/home/nathan/miniconda3/envs/dev/lib/python3.12/site-packages/ultralytics/cfg/datasets/coco128.yaml
Visualize: https://netron.app
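
The "speed up" figure printed on the "After pruning iter ..." lines appears to be the ratio of the baseline MACs to the pruned MACs. The sketch below re-derives it from the values printed in this log (82.72641 G before pruning, 61.8488912 G after iterations 13 to 15). The helper name `macs_speedup` is hypothetical and is not part of fasterai or Ultralytics, and the result may differ in the last digits because the baseline MACs value is printed rounded.

```python
# Minimal sketch: recompute the reported "speed up" from values in the log.
# Assumption: speed up = baseline MACs / pruned MACs (inferred from the log).
# `macs_speedup` is a hypothetical helper, not a library function.

def macs_speedup(base_macs_g: float, pruned_macs_g: float) -> float:
    """Return the theoretical speed-up as a ratio of MAC counts (in G)."""
    if pruned_macs_g <= 0:
        raise ValueError("pruned MACs must be positive")
    return base_macs_g / pruned_macs_g

# Values copied from the log lines.
BASE_MACS_G = 82.72641        # "Before Pruning: MACs= 82.72641 G"
PRUNED_MACS_G = 61.8488912    # "After pruning iter 15: MACs=61.8488912 G"
BASE_PARAMS_M = 43.69152      # "#Params= 43.69152 M"
PRUNED_PARAMS_M = 32.436843   # "#Params=32.436843 M"

speedup = macs_speedup(BASE_MACS_G, PRUNED_MACS_G)
param_reduction = 1.0 - PRUNED_PARAMS_M / BASE_PARAMS_M

print(f"MACs speed-up:       {speedup:.4f}x")
print(f"Parameter reduction: {param_reduction:.1%}")
```

This is a MACs-based estimate only. The measured inference times in the "Speed:" lines depend on hardware and batch settings and do not have to follow the same ratio.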