'update'

Commit c5dc095b authored Oct 24, 2020 by 108

Showing with 102 additions and 0 deletions

20201024/test.py
+73 -0

README.md
+29 -0

--- a/20201024/test.py
+++ b/20201024/test.py
+import numba as nb
+import numpy as np
+import time
+from numba import jit, cuda, float32
+import cupy as cp
+# gpu = cp.ones((1024,512,4,4),dtype=cp.int32)
+# cpu  =np.ones((1024,512,4,4),dtype=np.int32)
+#
+#
+# ctime1 = time.time()
+#
+# for c in range(1024):
+#     cpu = np.add(cpu,cpu)
+#
+# ctime2 = time.time()
+#
+# ctotal = ctime2 - ctime1
+# print('CPU-only time:', ctotal)
+#
+# # CuPy-only GPU test:
+# gtime1 = time.time()  # start the wall-clock timer for the GPU test
+# for g in range(1024):
+#     gpu = cp.add(gpu,gpu)   # built-in add function
+# gtime2 = time.time()
+# gtotal = gtime2 - gtime1
+# print('GPU-only time:', gtotal)
+#
+#
+# # "Mixed" GPU/CPU variant:
+# ggtime1 = time.time()
+# for g in range(1024):
+#     gpu = gpu + gpu         # operator +; on CuPy arrays this also dispatches to cupy.add on the GPU, not the CPU
+# ggtime2 = time.time()
+# ggtotal = ggtime2 - ggtime1
+# print('Mixed time:', ggtotal)
+TPB = 16
+@cuda.jit
+def matmul_gpu(A,B,C):
+    pass
+@nb.jit(nopython=True)
+def matmul_cpu(A, B, C):
+    # C = A @ B: rows of C follow A's rows, columns of C follow B's columns.
+    for y in range(B.shape[1]):
+        for x in range(A.shape[0]):
+            tmp = 0.
+            for k in range(A.shape[1]):
+                tmp += A[x, k] * B[k, y]
+            C[x, y] = tmp
+@cuda.jit
+def matmul_shared_memary(A,B,C):
+    pass
+A = np.full((TPB * 50, TPB * 50), 3, dtype=np.float64)
+B = np.full((TPB * 50, TPB * 50), 4, dtype=np.float64)
+C_cpu = np.full((A.shape[0], B.shape[1]), 0, dtype=np.float64)
+print("start processing on CPU")
+start_cpu = time.time()
+matmul_cpu(A, B, C_cpu)
+end_time = time.time() - start_cpu  # elapsed seconds, including JIT compilation on this first call
+print("cpu time: {} s".format(round(end_time, 3)))
+# start on GPU
+A_global_mem = cuda.to_device(A)
+B_global_mem = cuda.to_device(B)
+C_global_mem = cuda.device_array((A.shape[0],B.shape[1]))
+C_shared_mem = cuda.device_array((A.shape[0],B.shape[1]))
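Note on the shared-memory kernel stub: the diff leaves both CUDA kernels as `pass`, so the intended implementation is not shown. As a rough illustration only, the tiling idea that a shared-memory matmul usually relies on can be emulated on the CPU with NumPy. The helper names `blocks_per_grid` and `matmul_tiled` are hypothetical and do not appear in the commit. Each `(by, bx)` pair stands in for one thread block, and the `a_tile` / `b_tile` slices stand in for what a real kernel would stage in shared memory.

```python
import numpy as np

TPB = 16  # tile edge, matching the threads-per-block constant in test.py


def blocks_per_grid(shape, tpb=TPB):
    """Blocks of size tpb x tpb needed to cover an output of `shape` (ceiling division)."""
    rows, cols = shape
    return ((rows + tpb - 1) // tpb, (cols + tpb - 1) // tpb)


def matmul_tiled(A, B, tpb=TPB):
    """CPU emulation of a tiled matmul; hypothetical helper, not the author's kernel."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=np.result_type(A, B))
    grid_rows, grid_cols = blocks_per_grid(C.shape, tpb)
    for by in range(grid_rows):
        for bx in range(grid_cols):
            r0, c0 = by * tpb, bx * tpb
            # Edge blocks are smaller than tpb x tpb; slicing truncates them.
            acc = np.zeros(C[r0:r0 + tpb, c0:c0 + tpb].shape, dtype=C.dtype)
            # Walk along the shared dimension one tile at a time.
            for t0 in range(0, k, tpb):
                a_tile = A[r0:r0 + tpb, t0:t0 + tpb]  # stand-in for a shared-memory tile of A
                b_tile = B[t0:t0 + tpb, c0:c0 + tpb]  # stand-in for a shared-memory tile of B
                acc += a_tile @ b_tile
            C[r0:r0 + tpb, c0:c0 + tpb] = acc
    return C


if __name__ == "__main__":
    A = np.full((TPB * 3 + 5, TPB * 2), 3, dtype=np.float64)
    B = np.full((TPB * 2, TPB * 4 + 1), 4, dtype=np.float64)
    C = matmul_tiled(A, B)
    print(blocks_per_grid(C.shape))
    print(np.allclose(C, A @ B))
```

The same ceiling division gives the grid size a kernel launch would need for the 800 x 800 matrices in the script (`TPB * 50` per side, so 50 x 50 blocks). Comparing against `A @ B` is a cheap correctness check to keep once the real kernels are filled in.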
--- a/README.md
+++ b/README.md
+- ```Live-Lecture```: core concepts
+- ```Live-Workshop```: hands-on coding, review sessions, topic talks, and paper walkthroughs
+
+| Date | Topic | Instructor | Details | Slides | Related Reading | Other | Homework |
+|---------|---------|---------|---------|---------|---------|---------|---------|
+| PART 1<br>Prerequisite review|||<br>[Course document](https://shimo.im/docs/9w8Q8wkPrxGxWPcq)<br>|[D2 course materials](http://47.94.6.102/Architect/Data/blob/master/D2%E8%AF%BE%E7%A8%8B%E8%B5%84%E6%96%99%E5%8C%85%20(1).zip)<br><br>[tacotron2](http://47.94.6.102/Architect/Paper/blob/master/tacotron2%20(2).pdf)||||
+|Oct 18<br>Sun<br>10:00|lecture1<br>Fundamentals and course introduction|Ren|• Course schedule and introduction to the three projects (image, speech, recommendation)<br>• Overview of high-performance computing in deep learning applications<br>• Matrix computation fundamentals and applications<br>• CuPy matrix acceleration<br>• Numba compilation acceleration|[lecture slides](http://47.94.6.102/Architect/course-info/blob/master/D2%E8%AF%BE%E7%A8%8B%E8%B5%84%E6%96%99%E5%8C%85.zip)||||
+|Oct 18<br>Sun<br>19:00|review1<br>|Ren||[Paper](http://47.94.6.102/Architect/course-info/blob/master/tacotron2.pdf)||||
+|Oct 25<br>Sun<br>10:00|lecture2<br>Overview of parallel and distributed frameworks|Ren|• Classic parallelization schemes<br>• OpenMP in detail<br>• MPI in detail<br>• NVIDIA collective communication with NCCL|[Slides to be posted Saturday]()||||
+|Oct 25<br>Sun<br>21:00|Paper<br>How to read a paper|Zhang|Selecting and reading NLP literature<br>Content and goals of the Paper sessions|[]()||||
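+| PART 2<br>Object detection project|||Project goal:<br>Through this first CV project, learn to implement convolution and other common neural network operators by hand, and deploy an object detection project built with deep learning.|||||
+|<br>|Lecture3<br>Review of classic convolutional networks|Ren|• Classic convolutional network models, from LeNet to ResNet<br>• Convolution parameter design and usage in detail (grouped convolution, dilated convolution, etc.)<br>• Forward and backward propagation for convolutional layers<br>• Computational optimizations for convolutional layers|[]()||||
+|<br>|Lecture4<br>Object detection algorithms|Ren|• R-CNN family<br>• YOLO family<br>• SSD<br>• Getting started with TensorRT|[]()||||
+|<br>|Lecture5<br>NVIDIA TensorRT core algorithms and plugin development|Ren|• TensorRT quantization<br>• Principles of TensorRT mixed inference<br>• TensorRT plugin development workflow|[]()||||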
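+| PART 3<br>Hands-on personalized speech synthesis project||||||||
+|<br>|Lecture6<br>Overview of the personalized speech synthesis project|Ren|• Introduction to the overall synthesis project<br>• Voiceprint extraction network architecture<br>• Tacotron/Tacotron2 architecture<br>• WaveNet, WaveRNN, and WaveGlow architectures|[]()||||
+|<br>|Lecture7<br>Computation graph representation and optimization|Ren|• ONNX computation graph representation<br>• Common ONNX graph optimization techniques<br>• Converting PyTorch models to ONNX|[]()||||
+|<br>|Lecture8<br>Attention-Based <br>Seq2Seq model Tacotron2|Ren|• Overall architecture of Tacotron2<br>• Review of attention<br>• Teacher forcing<br>• TensorRT implementation of GRU|[]()||||
+|<br>|Lecture9<br>Vocoders: hands-on waveform sequence generation|Ren|• Review of vocoder techniques<br>• WaveNet model walkthrough<br>• WaveRNN model walkthrough<br>• WaveGlow model walkthrough|[]()||||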
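+| PART 4<br>Distributed recommender system||||||||
+|<br>|Lecture10<br>Recommender system overview|Ren|• Overview of common recommendation algorithms<br>• A simple LR-based news recommender<br>• Forward and backward propagation for common layers, using fc and pooling as examples|[]()||||
+|<br>|Lecture11<br>Distributed parameter server|Ren|• Parameter server overview<br>• SGD in a distributed setting<br>• Range query techniques|[]()||||
+|<br>|Lecture12<br>Distributed recommender system in practice|Ren|• FM algorithm in detail<br>• DeepFM algorithm in detail<br>• Global parameter update algorithm for sparse matrices|[]()||||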
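+| PART 5<br>Advanced deep learning topics||||||||
+|<br>|Lecture13<br>Evolution of deep learning architectures|Ren|• Comparison of the core design ideas behind mainstream deep learning frameworks<br>First-generation systems<br>Second-generation systems<br>Third-generation systems|[]()||||
+|<br>|Lecture14<br>Advanced training acceleration 1|Ren|• Local SGD principles<br>• Parallel executor design|[]()||||
+|<br>|Lecture15<br>Advanced training acceleration 2|Ren|• Distributed communication in deep learning frameworks<br>• Computation graph fusion in deep learning frameworks|[]()||||
+|<br>|Lecture16<br>Final project defense|Ren||[]()||||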