@mShuaiZhao
2018-01-17T07:16:37.000000Z
字数 1585
阅读 332
PaperReading
2017.12
ObjectDetection
2015 CVPR
Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities.
A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation.
advantages
prior work
take a classifier for that object and evaluate it at various locations and scales in a test image
region proposal methods
receive the whole image, get a large context
YOLO learns generalizable representations of objects.
Unified Detection
Each bounding box consists of 5 predictions : and confidence.
the confidence prediction represents the IOU between the predicted box and any ground truth box.
Each grid cell also predicts conditional class probabilities.
grids
for each grid cell predicts bounding boxes, confidence for those boxes
class probabilities
encoded as an
总结的来说,就是将图片划分为个栅格,其实栅格的划分对于bounding box并没有太大意义。根据划分好的栅格确定最后的输出形式为。
前面 个output units负责预测的是bounding box的坐标。
后面个output units负责预测的是物体出现的概率和物体的类别。
Training
disadvantages
small objects that appear in groups, such as flocks of birds
这是由于划分grid所产生的问题。有优点也有缺点,trade-off。
it struggle to generalize to objects in new or unusual aspect ratios or configurations. 毕竟是从数据中直接预测bounding box,好难...
comparison
对其他方法也不是很熟,就不比较了。
Notes
感觉这篇paper也是CNN的典型应用,设计好你想要的输出,当然这也确定了你的groundtruth label的形式,最后通过CNN强大的学习能力,来从输出到输入学习到你想要的映射,最后完成你想要实现的功能。
这篇paper输入的是整个图像,层数也很多,所以最后学习到是high-level representations,泛化能力也比较强。但是同样是因为输入的是整个图像,图像其他地方的点对object而言或许就是一种噪声,也许这就是这篇paper的accuracy比较低的原因吧。