deep-learning

Caffe Source Code Reading Notes

Jun 30, 2020 · 1 min read · caffe

Caffe is the GOAT for deployment! Topics: blob, layer, net, activation functions, convolution, reshape, slice, loss function, reduce, eltwise, argmax

[WIP] How the import order of cupy and torch affects computation results

Dec 17, 2023 · 10 min read · pytorch cupy python

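Background: An internal torch-based toolbox at my company started producing shifted results after a certain version. After a series of investigations, it turned out that the results differ depending on the order in which cupy and torch are imported. In other words, the following two snippets lead to different results in model training and related stages:

    import cupy as cp
    import torch

    import torch
    import cupy as cp

Minimal reproduction: After some effort, I isolated the problem from the internal framework. Below is the resulting minimal reproduction code. Changing the relative order of `import cupy` and `import torch` produces different results.

    # import cupy as cp
    import torch
    import …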

Build Onnxruntime With Bazel

Jan 16, 2023 · 1 min read · onnxruntime bazel

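Background: I need to build onnxruntime with bazel, but onnxruntime itself does not ship any Bazel configuration. As a standalone repo: download and extract the onnxruntime package. The main pitfall is that the shared library must be referenced with its full version number, otherwise it cannot be imported successfully. The complete BUILD.bazel file is:

    cc_import(
        name = "ort_lib",
        hdrs = glob(["onnxruntime-linux-x64-1.13.1/include/**/*.h"]),
        # FIXME: the shared library must be given with its full version number here, otherwise you get error: undefined reference to …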

[WIP] Caffe Source Code Study Notes (11): softmax

Aug 6, 2022 · 2 min read · caffe

Background: In 2022 I was surprised to find that I had never written notes on softmax, so this post fills the gap. proto: As usual, start with the proto.

    // Message that stores parameters used by SoftmaxLayer, SoftmaxWithLossLayer
    message SoftmaxParameter {
      enum Engine {
        DEFAULT = 0;
        CAFFE = 1;
        CUDNN = 2;
      }
      optional Engine engine = 1 [default = DEFAULT];

      // The axis along which to perform the …
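To make the "axis" idea concrete, here is a minimal pure-Python sketch of softmax applied independently to each row of a 2-D input. It is not Caffe's implementation; the function names `softmax` and `softmax_along_axis1` are made up for illustration, and subtracting the maximum is just the usual trick for numerical stability.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of scores (illustrative helper)."""
    peak = max(values)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_along_axis1(rows):
    """Apply softmax to each row separately, i.e. normalise along axis 1."""
    return [softmax(row) for row in rows]


probs = softmax_along_axis1([[1.0, 2.0, 3.0], [1000.0, 1000.0, 1000.0]])
```

Each row of `probs` sums to 1, and the second row stays finite even though `exp(1000.0)` on its own would overflow.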

(CSE 599W) Reverse Mode Autodiff

Apr 5, 2021 · 2 min read · DL-SYS CSE599W

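Background: How do we compute derivatives? There are usually three approaches: Symbolic Differentiation, Numerical Differentiation, and Automatic Differentiation (auto diff). The two mainstream forms of auto diff are forward-mode and reverse-mode. In forward-mode the time complexity is O(n), where n is the number of input parameters, whereas in reverse-mode it is O(m), where m is the number of output nodes. In a DNN, n is usually very large and far greater than m, so this post focuses on reverse-mode auto diff. backprop and reverse mode auto …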
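As a rough illustration of reverse mode, the sketch below records each operation's local gradients during the forward pass and then walks the graph once from the output back to the inputs. The `Var` class and `backward` function are hypothetical names for this example only, not the course's reference code, and only `+` and `*` are supported.

```python
class Var:
    """A scalar node in the computation graph (illustrative, not a library API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, local_gradient) pairs
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first traversal that yields a topological order (inputs first).
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Process nodes from the output towards the inputs, applying the chain rule.
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += node.grad * local_grad


x = Var(2.0)
y = Var(3.0)
z = x * y + x  # z = x*y + x, so dz/dx = y + 1 and dz/dy = x
backward(z)
```

A single call to `backward` fills in the gradients for all inputs at once, which is why reverse mode suits networks with many parameters and one scalar loss.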

[Recommender Systems] Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions

Jan 23, 2021 · 1 min read · Recommender Systems

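To make a living, I am starting to study recommender systems today, beginning with a survey of the field: Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. Since my current work leans toward serving and training development for recommender systems, I will probably only skim these papers. I won't translate them sentence by sentence; instead I'll record the concepts I consider most important. INTRODUCTION: The recommendation problem can be summarized simply as scoring the items a user has not yet seen; this score is usually called the rating. With ratings in hand, recommending the top k best …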
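The "score unseen items, then take the top k" framing can be sketched in a few lines. This is only an illustration: the function `recommend_top_k` and the sample ratings are made up, and a real system would get the predicted ratings from a model.

```python
import heapq


def recommend_top_k(predicted_ratings, seen_items, k):
    """Return the k unseen items with the highest predicted rating (illustrative)."""
    # Only items the user has not seen yet are candidates for recommendation.
    candidates = {
        item: rating
        for item, rating in predicted_ratings.items()
        if item not in seen_items
    }
    # nlargest iterates over the item ids and ranks them by their rating.
    return heapq.nlargest(k, candidates, key=candidates.get)


# Hypothetical predicted ratings for one user.
ratings = {"a": 4.5, "b": 3.0, "c": 5.0, "d": 4.0}
top_items = recommend_top_k(ratings, seen_items={"c"}, k=2)
```

Here item "c" has the highest rating but is excluded because the user has already seen it.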

[WIP] torch2trt Study Notes

Sep 18, 2020 · 1 min read · Jetson Nano, model conversion

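Preface: I stumbled upon torch2trt, a model conversion approach whose idea is to map PyTorch ops directly onto the TensorRT Python API. As PyTorch runs each op's forward, TensorRT adds the corresponding op to its network. This post first covers how to use torch2trt; notes on the converter's code will be added later. Using torch2trt: torch2trt. pytorch can be installed directly, but according to pytorch-for-jetson-version-1-6-0-now-available, torchvision has to be built from source:

    git clone https://github.com/pytorch/vision

Then switch to tag …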

Jetson Nano Pitfalls Log

Sep 8, 2020 · 5 min read · Jetson Nano

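Preamble: I need to do model conversion on the Jetson Nano, so this post records the pitfalls I ran into. There are currently two paths. The first is our existing conversion path, pytorch->onnx(->caffe)->trt. I hit quite a few problems on this path and have shelved it for now; the most direct reason is that the cudnn 8.0 upgrade changed its interfaces, which caused many problems when compiling caffe. Within this path I actually tried two parallel routes: building the environment directly on the Nano, and using docker (including setting up a cross-compilation environment to speed up builds). The second path is based on torch2trt, a direct pytorch->trt path. This post mainly records the pitfalls on the first path. Environment setup: First skim the developer guide, which mainly introduces the Nano's hardware …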

Caffe Source Code Study Notes (11): argmax layer

May 6, 2020 · 3 min read · caffe

Background: There is not much background here; I am just continuing to read the Caffe code. argmax returns the top_k indices (or (index, value) pairs) along a given dimension of a blob, or over the dimensions after batch_size. proto: As usual, start with the proto.

    message ArgMaxParameter {
      // If true produce pairs (argmax, maxval)
      optional bool out_max_val = 1 [default = false];
      optional uint32 top_k = 2 [default = 1];
      // The axis along which to maximise -- may …
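The behaviour described above can be approximated for a single 1-D vector as follows. This is a simplified sketch rather than Caffe's code: `argmax_top_k` is a hypothetical function whose parameter names merely mirror the proto fields, and it ignores batching and the axis option.

```python
def argmax_top_k(values, top_k=1, out_max_val=False):
    """Return the indices of the top_k largest entries of a flat list (illustrative).

    With out_max_val=True, return (index, value) pairs instead of bare indices.
    """
    # Sort indices by their value, largest first, and keep the first top_k.
    ranked = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranked = ranked[:top_k]
    if out_max_val:
        return [(i, values[i]) for i in ranked]
    return ranked


scores = [0.1, 0.7, 0.2]
indices = argmax_top_k(scores, top_k=2)
pairs = argmax_top_k(scores, top_k=2, out_max_val=True)
```

For `scores` above, `indices` is `[1, 2]` and `pairs` is `[(1, 0.7), (2, 0.2)]`.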

Caffe Source Code Study Notes (10): eltwise layer

May 3, 2020 · 3 min read · caffe

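Background: This layer is somewhat similar to the reduce layer, so I am covering them together. It takes at least two blobs as input, applies an operation element-wise across them, and produces a single output blob. Caffe supports three operations: "PROD", "SUM", and "MAX". As an aside, TensorRT supports a few more:

    enum class ElementWiseOperation : int
    {
        kSUM = 0, //!< Sum of the two elements.
        kPROD = 1, //!< Product of the two elements.
        kMAX = 2, …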
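A small sketch of the three operations over flat lists may help. It is not the Caffe layer: the `eltwise` function and the `OPS` table are illustrative names, and blobs are modelled as plain Python lists of equal length.

```python
import operator
from functools import reduce

# Map the operation names mentioned above to binary Python functions.
OPS = {"PROD": operator.mul, "SUM": operator.add, "MAX": max}


def eltwise(blobs, operation="SUM"):
    """Combine two or more equally sized lists element by element (illustrative)."""
    if len(blobs) < 2:
        raise ValueError("eltwise needs at least two input blobs")
    if len({len(blob) for blob in blobs}) != 1:
        raise ValueError("all input blobs must have the same number of elements")
    op = OPS[operation]
    # zip(*blobs) groups the i-th element of every blob; reduce folds each group.
    return [reduce(op, elements) for elements in zip(*blobs)]


a = [1, 5, 3]
b = [4, 2, 6]
summed = eltwise([a, b], "SUM")
product = eltwise([a, b], "PROD")
maximum = eltwise([a, b], "MAX")
```

Because each group is folded with `reduce`, the same function also handles three or more input blobs.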
