AprilTags paper walkthrough

1. AprilTag improves on ARToolkit and ARTag.

1.1 Disadvantages of ARToolkit:

A major disadvantage of this approach is the computational cost associated with decoding tags, since each template required a separate, slow correlation operation. A second disadvantage is that it is difficult to generate templates that are approximately orthogonal to each other.


The tag detection scheme used by ARToolkit is based on a simple binarization of the input image based on a user-specified threshold.


This scheme is very fast, but not robust to changes in illumination.


In general, ARToolkit’s detections cannot handle even modest occlusions of the tag’s border.


1.2 ARTag's improvements over ARToolkit:

In ARTag, the detection mechanism was based on the image gradient, making it robust to changes in lighting.


While the details of the detector algorithm are not public, ARTag’s detection mechanism is able to detect tags whose border is partially occluded.

In short: ARTag's detection algorithm is not public, but it can detect tags whose border is partially occluded.

ARTag also provided the first coding system based on forward error correction, which made tags easier to generate, faster to correlate, and provided greater orthogonality between tags.

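In short: ARTag introduced the first coding system based on forward error correction, which makes tags easier to generate and faster to match, and makes individual tags easier to tell apart.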


2.1 Overview:

We describe the detector whose job is to estimate the position of possible tags in an image. Loosely speaking, the detector attempts to find four-sided regions (“quads”) that have a darker interior than their exterior. The tags themselves have black and white borders in order to facilitate this.


2.2 Detecting line segments

Our approach begins by detecting lines in the image. Our approach, similar in basic approach to the ARTag detector, computes the gradient direction and magnitude at every pixel and agglomeratively clusters the pixels into components with similar gradient directions and magnitudes.

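Roughly speaking, the approach resembles ARTag's detector: compute the gradient direction and magnitude at every pixel of the image, then cluster pixels with similar gradient direction and magnitude into one component.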
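The per-pixel gradient step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes NumPy and uses `np.gradient` (central differences) as the derivative filter, and the function name `gradient_polar` is made up for this example.

```python
import numpy as np

def gradient_polar(image):
    """Return (magnitude, direction) of the image gradient at every pixel.

    Assumption: central differences via np.gradient; the paper may use a
    different derivative filter.
    """
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # d/drow, d/dcol
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)  # radians, in (-pi, pi]
    return magnitude, direction

# A dark-to-light vertical step edge: the gradient points toward +x.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
mag, theta = gradient_polar(img)
```

At the edge the gradient direction points from the dark side toward the light side, which is the information the later steps use to orient line segments.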

2.3 Early processing steps

First: The tag detection algorithm begins by computing the gradient at every pixel and computing their magnitudes (this yields a gradient magnitude image).

Second: compute the gradient direction at every pixel.

Third: pixels with similar gradient directions and magnitudes are clustered into components.


The clustering algorithm is similar to the graph-based method of Felzenszwalb: a graph is created in which each node represents a pixel.



Edges are added between adjacent pixels with an edge weight equal to the pixels’ difference in gradient direction. These edges are then sorted and processed in terms of increasing edge weight: for each edge, we test whether the connected components that the pixels belong to should be joined together.
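A minimal sketch of this edge-sorted clustering, assuming NumPy and a union-find structure. The quoted text does not state the join test, so the sketch borrows the criterion from the original Felzenszwalb-Huttenlocher segmentation method (join when the edge weight does not exceed either component's internal variation plus `k / size`). The constant `k` and all function names are hypothetical, and the paper's actual test may differ.

```python
import numpy as np

def angle_diff(a, b):
    """Smallest absolute difference between two angles in radians."""
    d = abs(a - b) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

def cluster_by_direction(theta, k=0.5):
    """Label pixels so that neighbours with similar gradient direction share a label."""
    h, w = theta.shape
    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # largest edge weight inside each component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # One edge per pair of 4-connected neighbours, weighted by direction difference.
    edges = []
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w:
                edges.append((angle_diff(theta[y, x], theta[y, x + 1]), i, i + 1))
            if y + 1 < h:
                edges.append((angle_diff(theta[y, x], theta[y + 1, x]), i, i + w))

    edges.sort()  # process in order of increasing edge weight
    for wgt, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Assumed join test (Felzenszwalb-Huttenlocher criterion).
        if wgt <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = max(internal[ra], internal[rb], wgt)

    return np.array([find(i) for i in range(h * w)]).reshape(h, w)

# Two regions with perpendicular gradient directions stay separate.
theta = np.zeros((4, 6))
theta[:, 3:] = np.pi / 2
labels = cluster_by_direction(theta)
```

The `k / size` term makes small components easy to merge and large ones progressively stricter, which is why the edges are processed in sorted order.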



This gradient-based clustering method is sensitive to noise in the image: even modest amounts of noise will cause local gradient directions to vary, inhibiting the growth of the components. The solution to this problem is to low-pass filter the image.


Unlike other problem domains where this filtering can blur useful information in the image, the edges of a tag are intrinsically large-scale features (particularly in comparison to the data field), and so this filtering does not cause information loss. We recommend a value of σ = 0.8.
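The low-pass step can be illustrated with a separable Gaussian blur at the recommended σ = 0.8. This is a sketch assuming NumPy; the kernel radius (three standard deviations) and the edge-replicating padding are choices made for this example, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(sigma=0.8):
    """Normalized 1D Gaussian kernel; radius of 3 sigma is an assumption."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def low_pass(image, sigma=0.8):
    """Blur rows, then columns, with the same 1D kernel (separable filter)."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

# A flat image is unchanged; a noisy one has its pixel variance reduced.
flat = low_pass(np.full((5, 7), 3.0))
rng = np.random.default_rng(0)
noisy = rng.normal(0.0, 1.0, size=(32, 32))
smoothed = low_pass(noisy)
```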


Fourth: Using weighted least squares, a line segment is then fit to the pixels in each component.

The direction of the line segment is determined by the gradient direction, so that segments are dark on the left, light on the right. The direction of the lines is visualized by short perpendicular “notches” at their midpoint; note that these “notches” always point towards the lighter region.
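One way to fit a segment to a component's pixels is a weighted orthogonal fit: take the weighted centroid, then the principal axis of the weighted covariance. The sketch below assumes NumPy and assumes each pixel is weighted by its gradient magnitude; both the weighting and the orthogonal formulation are assumptions for illustration. It also leaves out the dark-on-the-left orientation step described above.

```python
import numpy as np

def fit_line_segment(points, weights):
    """Fit a segment to 2D points; returns its two endpoints.

    Assumption: weights would be gradient magnitudes in a real detector.
    """
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    centroid = (w[:, None] * pts).sum(axis=0) / w.sum()
    d = pts - centroid
    # Weighted 2x2 covariance of the points about the centroid.
    cov = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0) / w.sum()
    _, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    direction = evecs[:, -1]            # axis of largest spread
    t = d @ direction                   # position of each point along the axis
    return centroid + t.min() * direction, centroid + t.max() * direction

# Points lying exactly on y = 2x + 1.
xs = np.arange(5.0)
p0, p1 = fit_line_segment(np.column_stack([xs, 2 * xs + 1]), [1, 2, 3, 2, 1])
lo, hi = sorted([p0.tolist(), p1.tolist()])
```

The sign of the eigenvector is arbitrary, so the two endpoints may come back in either order.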


2.4 Summary of line segment detection

The segmentation algorithm is the slowest phase in our detection scheme. As an option, this segmentation can be performed at half the image resolution with a 4x improvement in speed. The sub-sampling operation can be efficiently combined with the recommended low-pass filter. The consequence of this optimization is a modestly reduced detection range, since very small quads may no longer be detected.


2.5 Quad detection

Our approach is based on a recursive depth-first search with a depth of four: each level of the search tree adds an edge to the quad. At depth one, we consider all line segments. At depths two through four, we consider all of the line segments that begin “close enough” to where the previous line segment ended and which obey a counter-clockwise winding order.


Robustness to occlusions and segmentation errors is handled by adjusting the “close enough” threshold: by making the threshold large, significant gaps around the edges can be handled. Our threshold for “close enough” is twice the length of the line plus five additional pixels. This is a large threshold which leads to a low false negative rate, but also results in a high false positive rate.
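The depth-four search can be sketched as below. Several details are assumptions made for illustration: segments are directed pairs of points, "the length of the line" in the threshold is taken to be the length of the previous segment, a positive 2D cross product stands in for the counter-clockwise test (the sign depends on the image axis convention), and candidate segments are scanned by brute force. The function name `find_quads` is hypothetical.

```python
import math

def find_quads(segments, extra_px=5.0):
    """segments: list of ((x0, y0), (x1, y1)). Returns sorted index 4-tuples."""

    def length(seg):
        return math.dist(seg[0], seg[1])

    def direction(seg):
        return (seg[1][0] - seg[0][0], seg[1][1] - seg[0][1])

    def can_follow(prev, nxt):
        # "Close enough": twice the (assumed: previous) segment length plus 5 px.
        close = math.dist(prev[1], nxt[0]) <= 2 * length(prev) + extra_px
        (ax, ay), (bx, by) = direction(prev), direction(nxt)
        # Assumed winding test: consecutive edges must turn the same way.
        return close and (ax * by - ay * bx) > 0

    quads = set()

    def search(path):
        if len(path) == 4:
            quads.add(tuple(sorted(path)))  # drop duplicates from other start edges
            return
        for i, seg in enumerate(segments):
            if i not in path and can_follow(segments[path[-1]], seg):
                search(path + [i])

    for start in range(len(segments)):  # depth one: every segment
        search([start])
    return sorted(quads)

square = [((0, 0), (10, 0)), ((10, 0), (10, 10)),
          ((10, 10), (0, 10)), ((0, 10), (0, 0))]
stray = [((100, 100), (101, 100))]
quads = find_quads(square + stray)
```

Because the threshold is deliberately loose, a real detector relies on the later payload decoding stage to reject the false positives this search produces.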


We populate a two-dimensional lookup table to accelerate queries for line segments that begin near a point in space.



3.1 Homography and extrinsics estimation

3.1.1 Computing the homography matrix with the DLT

We compute the 3×3 homography matrix that projects 2D points in homogeneous coordinates from the tag’s coordinate system (in which [0 0 1]^T is at the center of the tag and the tag extends one unit in the x̂ and ŷ directions) to the 2D image coordinate system. The homography is computed using the Direct Linear Transform (DLT) algorithm. Note that since the homography projects points in homogeneous coordinates, it is defined only up to scale.

In short: the homography maps homogeneous 2D points from tag coordinates (origin at the tag center, tag spanning one unit along x̂ and ŷ) to image coordinates. It is computed with the DLT and is determined only up to a scale factor.
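A minimal DLT sketch, assuming NumPy: each point correspondence contributes two linear equations in the nine entries of the homography, and the solution is the right singular vector belonging to the smallest singular value. The helper names and the example matrix are made up for illustration, and the sketch omits the coordinate normalization that is often added for numerical stability.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, up to scale) such that dst ~ H @ src for 2D points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # null-space vector of the system

def project(H, pts):
    """Apply H to 2D points and dehomogenize."""
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

# Tag corners in tag coordinates (one unit from the center in x and y).
corners = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])
H_true = np.array([[50.0, 5.0, 320.0],
                   [-3.0, 60.0, 240.0],
                   [0.01, 0.02, 1.0]])
H_est = homography_dlt(corners, project(H_true, corners))
H_est = H_est / H_est[2, 2]  # fix the free scale for comparison
```

Dividing by the bottom-right entry is only one way to pin down the free scale; the estimate itself is valid for any nonzero multiple.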

3.1.2 Computation method

Computation of the tag’s position and orientation requires additional information: the camera’s focal length and the physical size of the tag.


The 3 × 3 homography matrix (computed by the DLT) can be written as the product of the 3 × 4 camera projection matrix P (which we assume is known) and the 4 × 3 truncated extrinsics matrix E.

That is, the 3 × 3 homography factors into the known 3 × 4 camera projection matrix P times the 4 × 3 truncated extrinsics matrix E.

The truncated extrinsics matrix E:

Extrinsics matrices are typically 4 × 4, but every position on the tag is at z = 0 in the tag’s coordinate system. Thus, we can rewrite every tag coordinate as a 2D homogeneous point with z implicitly zero, and remove the third column of the extrinsics matrix, forming the truncated extrinsics matrix.

That is, every point on the tag has z = 0 in tag coordinates, so tag coordinates can be written as 2D homogeneous points and the third column of the 4 × 4 extrinsics matrix can be dropped.

We represent the rotation components of P as Rij and the translation components as Tk. We also represent the unknown scale factor as s.

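Notation: Rij are the rotation entries, Tk the translation entries, and s the unknown scale factor. (The rotation and translation entries belong to the extrinsics matrix E; P holds only the known camera parameters.)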

Note that we cannot directly solve for E because P is rank deficient. We can expand the right hand side of Eqn. 2, and write the expression for each hij as a set of simultaneous equations.

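That is, E cannot be obtained directly because P is rank deficient. Instead, the right-hand side of Eqn. 2 is expanded and each hij is written out, giving a set of simultaneous equations.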
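The equation referred to as Eqn. 2 is not reproduced in these notes. A reconstruction consistent with the definitions above could look like the following. It assumes P is a pinhole projection with focal lengths f_x and f_y and the principal point at the origin; that form of P is an assumption, since the text only says P is known.

```latex
\begin{equation}
\begin{bmatrix}
h_{00} & h_{01} & h_{02} \\
h_{10} & h_{11} & h_{12} \\
h_{20} & h_{21} & h_{22}
\end{bmatrix}
= sPE
= s
\begin{bmatrix}
f_x & 0 & 0 & 0 \\
0 & f_y & 0 & 0 \\
0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
R_{00} & R_{01} & T_x \\
R_{10} & R_{11} & T_y \\
R_{20} & R_{21} & T_z \\
0 & 0 & 1
\end{bmatrix}
\end{equation}

% Expanding the product entry by entry:
\begin{align}
h_{00} &= s f_x R_{00}, & h_{01} &= s f_x R_{01}, & h_{02} &= s f_x T_x, \\
h_{10} &= s f_y R_{10}, & h_{11} &= s f_y R_{11}, & h_{12} &= s f_y T_y, \\
h_{20} &= s R_{20},     & h_{21} &= s R_{21},     & h_{22} &= s T_z.
\end{align}
```

Under this assumed P, dividing the first row of the homography by f_x and the second by f_y leaves s times the first two rotation columns and the translation.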

These are all easily solved for the elements of Rij and Tk except for the unknown scale factor s. However, since the columns of a rotation matrix must all be of unit magnitude, we can constrain the magnitude of s. We have two columns of the rotation matrix, so we compute s as the geometric average of their magnitudes. The sign of s can be recovered by requiring that the tag appear in front of the camera, i.e., that Tz < 0. The third column of the rotation matrix can be recovered by computing the cross product of the two known columns, since the columns of a rotation matrix must be orthonormal.

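In short: every Rij and Tk follows directly from these equations except for the scale s. Its magnitude is the geometric mean of the magnitudes of the two known rotation columns, which must each have unit length. Its sign is chosen so that the tag lies in front of the camera (Tz < 0). The third rotation column is the cross product of the first two, because the columns of a rotation matrix are orthonormal.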

The DLT procedure and the normalization procedure above do not guarantee that the rotation matrix is strictly orthonormal. To correct this, we compute the polar decomposition of R, which yields a proper rotation matrix while minimizing the Frobenius matrix norm of the error.

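In short: neither the DLT nor the normalization above guarantees a strictly orthonormal rotation matrix. The polar decomposition of R gives a proper rotation matrix while minimizing the Frobenius norm of the error.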
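The recovery steps described in this section can be sketched together as follows, assuming NumPy. The sketch assumes P = [[fx, 0, 0, 0], [0, fy, 0, 0], [0, 0, 1, 0]] (a pinhole camera with the principal point at the origin), which is an assumption about the form of P, and it obtains the orthogonal factor of the polar decomposition from an SVD. The function name is hypothetical.

```python
import numpy as np

def extrinsics_from_homography(H, fx, fy):
    """Recover rotation R (3x3) and translation T (3,) from H = s * P @ E."""
    # Undo the assumed P: rows scaled by 1/fx, 1/fy, 1 leave s * [r0 r1 T].
    M = np.diag([1.0 / fx, 1.0 / fy, 1.0]) @ np.asarray(H, dtype=float)
    # |s| is the geometric mean of the two rotation-column magnitudes.
    s = np.sqrt(np.linalg.norm(M[:, 0]) * np.linalg.norm(M[:, 1]))
    # Choose the sign of s so that the tag is in front of the camera (Tz < 0).
    if M[2, 2] / s > 0:
        s = -s
    M = M / s
    r0, r1, T = M[:, 0], M[:, 1], M[:, 2]
    R = np.column_stack([r0, r1, np.cross(r0, r1)])  # third column by cross product
    # Orthogonal factor of the polar decomposition: nearest rotation in Frobenius norm.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, T

# Synthetic check: build H from a known pose, then recover the pose.
cz, sz, cx, sx = np.cos(0.3), np.sin(0.3), np.cos(0.2), np.sin(0.2)
Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
R_true = Rz @ Rx
T_true = np.array([0.1, -0.2, -3.0])
E = np.vstack([np.column_stack([R_true[:, :2], T_true]), [0.0, 0.0, 1.0]])
P = np.array([[500.0, 0, 0, 0], [0, 500.0, 0, 0], [0, 0, 1.0, 0]])
R_pos, T_pos = extrinsics_from_homography(2.5 * P @ E, 500.0, 500.0)
R_neg, T_neg = extrinsics_from_homography(-2.5 * P @ E, 500.0, 500.0)
```

The two calls differ only in the sign of the arbitrary homography scale, and the Tz < 0 rule resolves both to the same pose.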


3.2.1 Overview

The final task is to read the bits from the payload field. We do this by computing the tag-relative coordinates of each bit field, transforming them into image coordinates using the homography, and then thresholding the resulting pixels. In order to be robust to lighting (which can vary not only from tag to tag, but also within a tag), we use a spatially-varying threshold.

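In short: for each bit, compute its tag-relative coordinates, map them into the image with the homography, and threshold the pixel found there. Because lighting can vary between tags and even within one tag, the threshold varies spatially.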

We build a spatially-varying model of the intensity of “black” pixels, and a second model for the intensity of “white” pixels. We use the border of the tag, which contains known examples of both white and black pixels.

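That is, one model predicts how bright a black pixel should look at each position and another does the same for white pixels; both are learned from the tag border, whose black and white pixels are known in advance.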

A fourth quad is detected around one of the payload bits of the larger tag. These two extraneous detections are eventually discarded because their payload is invalid. The white dots correspond to samples around the tags border which are used to fit a linear model of intensity of “white” pixels; a model is similarly fit for the black pixels. These two models are used to threshold the data payload bits, shown as yellow dots.

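In the figure this passage describes, an extra, fourth quad is found around one payload bit of the larger tag. The extraneous detections are eventually discarded because their payload is invalid. White dots mark samples around the tag border used to fit the linear intensity model for white pixels; a model is fit for black pixels in the same way. The two models together set the threshold for the payload bits, drawn as yellow dots.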

This model has four parameters which are easily computed using least squares regression. We build two such models, one for black, the other for white. The threshold used when decoding data bits is then just the average of the predicted intensity values of the black and white models.

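In short: the model's four parameters come from a least-squares regression. One model is built for black and one for white, and the decoding threshold for a data bit is the average of the two predicted intensities.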
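A sketch of the two-model threshold, assuming NumPy. The model formula itself does not appear in these notes; the sketch assumes a four-parameter form I(x, y) = A·x + B·x·y + C·y + D, which matches the stated parameter count but should be read as an assumption. All function names and sample values are illustrative.

```python
import numpy as np

def fit_intensity_model(xs, ys, intensities):
    """Least-squares fit of the assumed model I = A*x + B*x*y + C*y + D."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    design = np.column_stack([xs, xs * ys, ys, np.ones_like(xs)])
    params, *_ = np.linalg.lstsq(design, np.asarray(intensities, dtype=float), rcond=None)
    return params

def predict(params, x, y):
    a, b, c, d = params
    return a * x + b * x * y + c * y + d

def threshold_at(black, white, x, y):
    """Decoding threshold: average of the black and white predictions."""
    return 0.5 * (predict(black, x, y) + predict(white, x, y))

# Synthetic border samples in tag coordinates, with lighting that varies across the tag.
bx = np.array([-1.0, 0.0, 1.0, 1.0, 1.0, 0.0, -1.0, -1.0])
by = np.array([-1.0, -1.0, -1.0, 0.0, 1.0, 1.0, 1.0, 0.0])
white_true = np.array([10.0, 2.0, -5.0, 200.0])
black_true = np.array([3.0, 1.0, 2.0, 40.0])
white = fit_intensity_model(bx, by, predict(white_true, bx, by))
black = fit_intensity_model(bx, by, predict(black_true, bx, by))
center_threshold = threshold_at(black, white, 0.0, 0.0)
```

A payload pixel at (x, y) would then be read as a white bit if its intensity exceeds `threshold_at(black, white, x, y)`, and as black otherwise.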

3.2.2 Coding system (decides whether a detected quad is valid)

The goals of a coding system are to:

• Maximize the number of distinguishable codes

• Maximize the number of bit errors that can be detected or corrected

• Minimize the false positive/inter-tag confusion rate

• Minimize the total number of bits per tag (and thus the size of the tag)

These goals are often in conflict, and so a given code represents a trade-off.




In short: more codes, more correctable bit errors, fewer false positives, and smaller tags pull against each other, so every code is a compromise.

We describe a new coding system based on lexicodes that provides significant advantages over previous methods. Our procedure can generate lexicodes with a variety of properties, allowing the user to use a code that best fits their needs.

In short: the new coding system is built on lexicodes, and the generator can be tuned so that users pick the code that best fits their needs.

We use a lexicode system that can generate codes for any arbitrary tag size (e.g., 3x3, 4x4, 5x5, 6x6) and minimum Hamming distance. Our approach explicitly guarantees the minimum Hamming distance for all four rotations of each tag and eliminates tags which are of low geometric complexity. Computing the tags can be an expensive operation, but is done offline. Small tags (5x5) can be easily computed in seconds or minutes, but larger tags (6x6) can take several days of CPU time.

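In short: the lexicode generator works for any tag size (e.g., 3x3, 4x4, 5x5, 6x6) and any minimum Hamming distance. The distance is guaranteed across all four rotations of each tag, and tags of low geometric complexity are rejected. Generation is expensive but happens offline: seconds to minutes for small tags (5x5), several days of CPU time for larger ones (6x6).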
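A greedy lexicode generator with the rotation guarantee can be sketched as below, assuming NumPy. Candidates are visited in plain ascending order and accepted only if they keep the minimum Hamming distance to every accepted code under all four rotations, and to their own rotations. The visiting order is an assumption, the geometric-complexity filter is omitted, and the function names are hypothetical. The brute-force search is only practical for tiny tags such as 3x3.

```python
import numpy as np

def rotations(code, n):
    """The four 90-degree rotations of an n x n bit pattern packed into an int."""
    grid = np.array([(code >> i) & 1 for i in range(n * n)]).reshape(n, n)
    out = []
    for k in range(4):
        bits = np.rot90(grid, k).flatten()
        out.append(sum(int(b) << i for i, b in enumerate(bits)))
    return out

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def generate_lexicode(n, min_dist):
    """Greedily collect codes whose rotations all stay min_dist apart."""
    codes = []
    for cand in range(1 << (n * n)):
        rots = rotations(cand, n)  # rots[0] is the candidate itself
        # A tag must not be confusable with a rotated copy of itself.
        if any(hamming(cand, r) < min_dist for r in rots[1:]):
            continue
        # Every rotation must stay far from every code accepted so far.
        if all(hamming(c, r) >= min_dist for c in codes for r in rots):
            codes.append(cand)
    return codes

codes = generate_lexicode(3, 3)
```

Checking a candidate's rotations against the accepted codes is enough, because rotating both patterns by the same amount does not change their Hamming distance.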


