
YOLO Family Overview

Model Definition

YOLO: a detector that implements Unified Detection, performing bounding box coordinate regression and classification simultaneously within a single neural network.

Architecture

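1) Divide the image into an S x S grid

2) Feed the entire image through the network and extract features to produce the Prediction Tensor

(The Prediction Tensor holds, for each grid cell, the bounding box information, the confidence scores, and the class probabilities.)

3) Adjust the bounding boxes and classify objects based on the per-cell predictions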

Unified Detection

• Region proposal, feature extraction, classification, and bbox regression are merged into one-stage detection

→ Because the whole image is fed in at once, detection can take global context into account

• $S \times S$ grid on input

◦ Each grid cell predicts B bounding boxes and a confidence score for each bbox

◦ At test time, the class-specific confidence score expresses both the probability that an object of a given class appears in the bbox and how well the predicted bbox fits that object

$Pr(Class_i|Object) \times Pr(Object) \times IOU_{pred}^{truth} = Pr(Class_i) \times IOU_{pred}^{truth}$

◦ The final output tensor has shape $S \times S \times (B*5+C)$
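The tensor shape and the class-specific confidence score above can be worked through with a small sketch. The values S = 7, B = 2, C = 20 are the PASCAL VOC setting, and the function names and the per-cell numbers are hypothetical, chosen only for illustration.

```python
# Assumed PASCAL VOC configuration: 7 x 7 grid, 2 boxes per cell, 20 classes.
S, B, C = 7, 2, 20


def output_shape(s, b, c):
    """Each cell holds b boxes * (x, y, w, h, confidence) plus c class probabilities."""
    return (s, s, b * 5 + c)


def class_specific_scores(class_probs, box_confidences):
    """Multiply Pr(Class_i | Object) by each box confidence Pr(Object) * IOU.

    Returns one list of per-class scores for every box in the cell.
    """
    return [[p * conf for p in class_probs] for conf in box_confidences]


shape = output_shape(S, B, C)
print(shape)  # (7, 7, 30)

# Hypothetical predictions for one cell: 3 classes and 2 boxes, for brevity.
scores = class_specific_scores([0.7, 0.2, 0.1], [0.9, 0.5])
print(scores)
```

Note that the class probabilities are shared by all boxes in a cell, so only the box confidence differs between the rows of `scores`.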

Network Design

• YOLO is designed as a single CNN whose structure is inspired by GoogLeNet

◦ 24 conv layers + 2 fc layers (Fast YOLO: 9 conv layers + 2 fc layers)

▪ 20 conv layers: pretrained on 1000-class ImageNet (input image: 224 x 224)

▪ 4 conv layers + 2 fc layers: fine-tuned on PASCAL VOC (input image: 448 x 448)

◦ 1 x 1 reduction layers in between reduce the amount of computation
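To see why a 1 x 1 reduction layer lowers the cost, it helps to count weights with and without it. The sketch below is only an illustration: the channel counts (512 in, 256 after reduction, 1024 out) are assumed values, not a layer-by-layer copy of the network, and `conv_weights` is a hypothetical helper that ignores biases.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_ch * out_ch


# Direct 3 x 3 convolution from 512 to 1024 channels.
direct = conv_weights(3, 512, 1024)

# Same output width, but a 1 x 1 layer first squeezes 512 channels down to 256.
reduced = conv_weights(1, 512, 256) + conv_weights(3, 256, 1024)

print(direct)   # 4718592
print(reduced)  # 2490368
```

Under these assumed sizes the reduced variant needs roughly half the weights, and the multiply-accumulate count per spatial position shrinks by the same ratio.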

Training

stage1)

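• The cell responsible for a given object, $cell_i$, is the cell that contains the center of the GT box

• YOLO predicts multiple bboxes per cell, but during training only the bbox with the highest IOU with the GT box is used

→ The j-th bbox that is responsible within $cell_i$ is marked and reflected in the loss function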
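The two assignment rules in stage 1 can be sketched as follows. This is a minimal illustration, assuming boxes are given as corner coordinates (x1, y1, x2, y2) in pixels; the function names and the sample boxes are hypothetical.

```python
def responsible_cell(cx, cy, img_w, img_h, s=7):
    """Return (row, col) of the grid cell that contains the GT box center."""
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_box(pred_boxes, gt_box):
    """Index j of the predicted box with the highest IOU against the GT box."""
    ious = [iou(p, gt_box) for p in pred_boxes]
    return max(range(len(ious)), key=ious.__getitem__)


# Hypothetical GT box in a 448 x 448 image; its center is (200, 250).
gt = (100, 150, 300, 350)
cell = responsible_cell(200, 250, 448, 448)

# Hypothetical predictions of the two boxes in that cell.
preds = [(90, 140, 280, 360), (150, 200, 400, 400)]
j = responsible_box(preds, gt)
print(cell, j)  # (3, 3) 0
```

Only the box at index `j` in the responsible cell receives the coordinate and object-confidence terms of the loss for this object.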

stage2)

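• Loss Function

◦ Multi Loss = Coordinate Loss + Confidence-Score Loss + No-object Penalties + Classification Loss

⇒ Coordinate and confidence errors are learned only for the box selected as the predictor, and classification error only for grid cells that contain an object; every other box contributes only the no-object penalty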
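For reference, the four terms can be written out as the sum-squared loss from the original YOLO paper. The notation is not defined elsewhere in this post, so treat it as an assumption: $\mathbb{1}_{ij}^{obj}$ is 1 when the j-th bbox of $cell_i$ is the responsible predictor, $\mathbb{1}_{i}^{obj}$ is 1 when $cell_i$ contains an object, hatted symbols are predictions, and the paper sets $\lambda_{coord} = 5$ and $\lambda_{noobj} = 0.5$.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
  &+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
  &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( C_i - \hat{C}_i \right)^2 \\
  &+ \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( C_i - \hat{C}_i \right)^2 \\
  &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The first two lines are the Coordinate Loss, the third is the Confidence-Score Loss, the fourth is the No-object Penalty, and the last is the Classification Loss. The square roots on width and height soften the effect of size errors on large boxes relative to small ones.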

stage3)

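• On the PASCAL VOC dataset, 98 bboxes are generated per image (7 x 7 grid cells, 2 bboxes per cell), and a prediction is computed for each class

• Since many bboxes are produced per object, NMS is applied

◦ NMS is run separately for each class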
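Per-class NMS can be sketched as a greedy loop. This is a minimal illustration rather than the exact procedure of any particular implementation: the detection format (box, score, class_id), the function names, and the IOU threshold of 0.5 are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_per_class(detections, iou_threshold=0.5):
    """Greedy NMS applied independently to each class.

    detections: list of (box, score, class_id) tuples.
    The threshold value is an assumed default, not taken from the post.
    """
    kept = []
    for cls in sorted({d[2] for d in detections}):
        # Highest-scoring boxes of this class come first.
        candidates = sorted(
            (d for d in detections if d[2] == cls),
            key=lambda d: d[1],
            reverse=True,
        )
        while candidates:
            best = candidates.pop(0)
            kept.append(best)
            # Drop remaining boxes that overlap the kept box too strongly.
            candidates = [
                d for d in candidates if iou(best[0], d[0]) <= iou_threshold
            ]
    return kept


# Hypothetical detections: two near-duplicates of class 0, one distant box
# of class 0, and one overlapping box that belongs to class 1.
dets = [
    ((10, 10, 110, 110), 0.9, 0),
    ((12, 12, 112, 112), 0.8, 0),
    ((200, 200, 300, 300), 0.7, 0),
    ((11, 11, 111, 111), 0.6, 1),
]
kept = nms_per_class(dets)
print([d[1] for d in kept])  # [0.9, 0.7, 0.6]
```

The class 1 box survives even though it overlaps the top class 0 box, because suppression only compares boxes of the same class.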

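Limitations of YOLO

Detection performance is low for small objects

→ When an object is large, the IOU values of the candidate bboxes differ widely, so a suitable predictor can be selected; when an object is small, the predictor is decided by a narrow margin

Detection performance drops when objects appear with aspect ratios different from those learned during training, since the model generalizes poorly to unfamiliar ratios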

YOLO Family Overview

YOLO V1

YOLO V2