# caffe Source Code Study Notes (5): Convolution

## How caffe implements convolution

``````
for w in 1..W
  for h in 1..H
    for x in 1..K
      for y in 1..K
        for m in 1..M
          for d in 1..D
            output(w, h, m) += input(w+x, h+y, d) * filter(m, x, y, d)
          end
        end
      end
    end
  end
end
``````
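
Taken literally, the loop above already defines the forward pass. Here is a minimal runnable sketch of it (my own illustration, not caffe code): valid convolution, stride 1, no bias, dimension names W, H, K, M, D taken from the pseudocode, but with 0-based indexing.

```cpp
#include <cassert>
#include <vector>

// Naive valid convolution, stride 1, following the pseudocode:
// input is D x (H+K-1) x (W+K-1), filter is M x D x K x K,
// output is M x H x W (all stored as flat row-major vectors).
std::vector<float> conv_naive(const std::vector<float>& input,
                              const std::vector<float>& filter,
                              int W, int H, int K, int M, int D) {
  const int inW = W + K - 1, inH = H + K - 1;
  std::vector<float> output(M * H * W, 0.f);
  for (int w = 0; w < W; ++w)
    for (int h = 0; h < H; ++h)
      for (int x = 0; x < K; ++x)
        for (int y = 0; y < K; ++y)
          for (int m = 0; m < M; ++m)
            for (int d = 0; d < D; ++d)
              output[(m * H + h) * W + w] +=
                  input[(d * inH + h + y) * inW + w + x] *
                  filter[((m * D + d) * K + x) * K + y];
  return output;
}
```

Note how every loop is independent of the others: this freedom to reorder is exactly what the matrix-multiplication rewrite below exploits.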

Caffe convolves by reduction to matrix multiplication. This achieves high throughput and generality across input and filter dimensions, but at the cost of extra memory for the intermediate matrices. It also lets Caffe lean on the efficiency of optimized BLAS GEMM routines.

The input is "im2col"-transformed into a K' x (H·W) data matrix for multiplication with the N x K' filter matrix, yielding an N x (H·W) output matrix that is then reshaped back to N x H x W. K' is input channels x kernel height x kernel width, so the im2col matrix has one column per input patch to be filtered. col2im is the inverse unrolling, which caffe uses in the backward pass to accumulate column gradients back into the input layout.

The im2col and col2im in caffe's src/util/im2col.cpp exist for exactly this optimization (turning convolution into matrix multiplication). Their internals feel too detailed to be worth expanding on here.
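
The trick itself is easy to sketch, though: unroll every K x K x D input patch into one column (im2col), and the whole convolution collapses into a single matrix product between the M x (D·K·K) filter matrix and the (D·K·K) x (H·W) patch matrix. The following is an illustrative reimplementation, not the actual code in src/util/im2col.cpp:

```cpp
#include <cassert>
#include <vector>

// Illustrative im2col: input is D x inH x inW (flat, row-major); the result
// is a (D*K*K) x (H*W) matrix whose columns are the unrolled K x K patches
// (valid convolution, stride 1, so H = inH-K+1 and W = inW-K+1).
std::vector<float> im2col(const std::vector<float>& input,
                          int D, int inH, int inW, int K) {
  const int H = inH - K + 1, W = inW - K + 1;
  std::vector<float> col(D * K * K * H * W);
  for (int d = 0; d < D; ++d)
    for (int ky = 0; ky < K; ++ky)
      for (int kx = 0; kx < K; ++kx)
        for (int h = 0; h < H; ++h)
          for (int w = 0; w < W; ++w)
            col[(((d * K + ky) * K + kx) * H + h) * W + w] =
                input[(d * inH + h + ky) * inW + w + kx];
  return col;
}

// Convolution as a plain GEMM: filters is M x Kprime (Kprime = D*K*K),
// col is the matrix above, and the result is M x HW, i.e. the M output maps.
std::vector<float> conv_gemm(const std::vector<float>& filters,
                             const std::vector<float>& col,
                             int M, int Kprime, int HW) {
  std::vector<float> out(M * HW, 0.f);
  for (int m = 0; m < M; ++m)
    for (int k = 0; k < Kprime; ++k)
      for (int p = 0; p < HW; ++p)
        out[m * HW + p] += filters[m * Kprime + k] * col[k * HW + p];
  return out;
}
```

The memory cost mentioned above is visible here: col holds each input pixel up to K*K times. In caffe the hand-rolled GEMM loop is of course replaced by a cblas_sgemm/cblas_dgemm call.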

``````
message ConvolutionParameter {
  optional uint32 num_output = 1; // The number of outputs for the layer
  optional bool bias_term = 2 [default = true]; // whether to have bias terms

  // Pad, kernel size, and stride are all given as a single value for equal
  // dimensions in all spatial dimensions, or once per spatial dimension.
  repeated uint32 pad = 3; // The padding size; defaults to 0
  repeated uint32 kernel_size = 4; // The kernel size
  repeated uint32 stride = 6; // The stride; defaults to 1
  // Factor used to dilate the kernel, (implicitly) zero-filling the resulting
  // holes. (Kernel dilation is sometimes referred to by its use in the
  // algorithme à trous from Holschneider et al. 1987.)
  repeated uint32 dilation = 18; // The dilation; defaults to 1

  // For 2D convolution only, the *_h and *_w versions may also be used to
  // specify both spatial dimensions.
  optional uint32 pad_h = 9 [default = 0]; // The padding height (2D only)
  optional uint32 pad_w = 10 [default = 0]; // The padding width (2D only)
  optional uint32 kernel_h = 11; // The kernel height (2D only)
  optional uint32 kernel_w = 12; // The kernel width (2D only)
  optional uint32 stride_h = 13; // The stride height (2D only)
  optional uint32 stride_w = 14; // The stride width (2D only)

  optional uint32 group = 5 [default = 1]; // The group size for group conv

  optional FillerParameter weight_filler = 7; // The filler for the weight
  optional FillerParameter bias_filler = 8; // The filler for the bias
  enum Engine {
    DEFAULT = 0;
    CAFFE = 1;
    CUDNN = 2;
  }
  optional Engine engine = 15 [default = DEFAULT];

  // The axis to interpret as "channels" when performing convolution.
  // Preceding dimensions are treated as independent inputs;
  // succeeding dimensions are treated as "spatial".
  // With (N, C, H, W) inputs, and axis == 1 (the default), we perform
  // N independent 2D convolutions, sliding C-channel (or (C/g)-channels, for
  // groups g>1) filters across the spatial axes (H, W) of the input.
  // With (N, C, D, H, W) inputs, and axis == 1, we perform
  // N independent 3D convolutions, sliding (C/g)-channels
  // filters across the spatial axes (D, H, W) of the input.
  optional int32 axis = 16 [default = 1];

  // Whether to force use of the general ND convolution, even if a specific
  // implementation for blobs of the appropriate number of spatial dimensions
  // is available. (Currently, there is only a 2D-specific convolution
  // implementation; for input blobs with num_axes != 2, this option is
  // ignored and the ND implementation will be used.)
  optional bool force_nd_im2col = 17 [default = false];
}
``````

## Varieties of convolution

### 2D convolution

2D convolution can be understood as the filter sliding only along the two spatial directions, height and width.

• same: this does not mean the input and output have the same size (that holds only when the stride is 1); it means that when the filter would run past the edge of the feature map, padding is added automatically.
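
The underlying arithmetic is simple: along one axis with input size n, kernel size k, padding p, and stride s, the output size is floor((n + 2p - k) / s) + 1, and "same" chooses p = (k - 1) / 2 for odd k, which reproduces n only when s = 1. A quick sketch (my own helper, not caffe code, though caffe's compute_output_shape evaluates the same formula):

```cpp
#include <cassert>

// Output size of a convolution along one spatial axis.
int conv_out_size(int n, int k, int p, int s) {
  return (n + 2 * p - k) / s + 1;
}
```

For example, a 3x3 kernel with "same" padding p = 1 maps a size-5 input to size 5 at stride 1, but to size 3 at stride 2.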

### Transposed Convolution (Deconvolution)

"Transposed Convolution" is really the more accurate name, but people often call it "Deconvolution" as well.

A transposed convolution is simply one way of doing upsampling.
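
One way to see why it upsamples: a transposed convolution scatters each input value over a K x K output window, placing the windows stride apart, so a size-n input grows to (n - 1) * stride + K (with no padding). A minimal single-channel sketch (illustrative only, not caffe's DeconvolutionLayer):

```cpp
#include <cassert>
#include <vector>

// 2D transposed convolution, single channel, no padding: each input pixel
// scatters input * kernel into a K x K output window, windows placed
// stride apart. Output is (n-1)*stride + K per side.
std::vector<float> conv_transpose(const std::vector<float>& input, int n,
                                  const std::vector<float>& kernel, int K,
                                  int stride) {
  const int out = (n - 1) * stride + K;
  std::vector<float> output(out * out, 0.f);
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
      for (int y = 0; y < K; ++y)
        for (int x = 0; x < K; ++x)
          output[(i * stride + y) * out + (j * stride + x)] +=
              input[i * n + j] * kernel[y * K + x];
  return output;
}
```

With a 2x2 all-ones kernel and stride 2, a 2x2 input becomes a 4x4 output in which each input value fills one 2x2 block, i.e. nearest-neighbor-style upsampling; overlapping windows (stride < K) instead blend neighboring values.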

### Dilated Convolution (atrous convolution)

caffe's conv layer supports this structure directly. It seems most common in the semantic segmentation literature; I have rarely run into it in my own work.
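
The key size fact about dilation: a k-tap kernel with dilation d samples every d-th input pixel, so it covers an effective extent of d * (k - 1) + 1 pixels while still using only k weights per axis. caffe's ConvolutionLayer computes this same quantity (kernel_extent) when deriving output shapes. A trivial check of the arithmetic:

```cpp
#include <cassert>

// Effective extent of a k-tap kernel with dilation d: the taps sit
// d pixels apart, spanning d*(k-1)+1 input pixels in total.
int dilated_extent(int k, int d) {
  return d * (k - 1) + 1;
}
```

So a 3x3 kernel with dilation 2 sees a 5x5 window, which is why stacks of dilated convolutions grow the receptive field quickly without extra parameters.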

### Grouped Convolution

``````
  group_ = this->layer_param_.convolution_param().group();
  CHECK_EQ(channels_ % group_, 0);
  CHECK_EQ(num_output_ % group_, 0)
      << "Number of output should be multiples of group.";
``````
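
These checks exist because group convolution splits both the input and output channels into group_ equal parts: each output group sees only channels/group input channels, so the weight blob shrinks from num_output * channels * k_h * k_w to num_output * (channels/group) * k_h * k_w values. A quick sketch of that saving (my own helper, not caffe code):

```cpp
#include <cassert>

// Number of weights in a grouped convolution (bias excluded): each of the
// num_output filters spans only channels/group input channels.
int conv_weight_count(int num_output, int channels, int group,
                      int kh, int kw) {
  return num_output * (channels / group) * kh * kw;
}
```

For a 256-in, 256-out 3x3 layer, group = 2 halves the weights (and the multiply-accumulates) relative to group = 1, which is exactly how AlexNet originally fit across two GPUs.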

``````
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
    const Dtype* weights, Dtype* output, bool skip_im2col) {
  const Dtype* col_buff = input;
  if (!is_1x1_) {
    if (!skip_im2col) {
      conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
    }
    col_buff = col_buffer_.cpu_data();
  }
  for (int g = 0; g < group_; ++g) {
    caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
        group_, conv_out_spatial_dim_, kernel_dim_,
        (Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
        (Dtype)0., output + output_offset_ * g);
  }
}
``````
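
The loop works because the three offsets slice each buffer into group_ independent blocks. On my reading of base_conv_layer.cpp, kernel_dim_ is the per-group (C/g) * k_h * k_w, weight_offset_ = (N/g) * kernel_dim_, col_offset_ = kernel_dim_ * conv_out_spatial_dim_, and output_offset_ = (N/g) * conv_out_spatial_dim_. A sketch of those formulas with AlexNet-conv2-like numbers (the member names are caffe's, the helper is mine):

```cpp
#include <cassert>

// Per-group block sizes used by forward_cpu_gemm, computed as I read them
// from BaseConvolutionLayer: each GEMM call multiplies an (N/g) x kernel_dim
// weight block by a kernel_dim x out_spatial col block.
struct GroupOffsets {
  int weight;  // weight_offset_
  int col;     // col_offset_
  int output;  // output_offset_
};

GroupOffsets group_offsets(int out_channels, int channels, int group,
                           int kh, int kw, int out_spatial) {
  const int kernel_dim = (channels / group) * kh * kw;  // per-group K'
  return {out_channels / group * kernel_dim,
          kernel_dim * out_spatial,
          out_channels / group * out_spatial};
}
```

With 96-in, 256-out, 5x5 kernels, group 2, and a 27x27 output (729 spatial positions), each of the two GEMMs is a (128 x 1200) * (1200 x 729) product, and the offsets step the weight, col-buffer, and output pointers past exactly one such block.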