OneFlow Releases v0.2.0
This article is reposted from: OneFlow Releases v0.2.0
GitHub: https://github.com/Oneflow-Inc/oneflow
Changelog
v0.2.0 (09/10/2020)
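OneFlow has now been open source for 72 days, and version 0.2.0 is here. This release brings many performance optimizations to OneFlow as it keeps pushing toward the fastest distributed performance. The build experience has also been optimized for users in China, so downloading and building OneFlow is now much faster.
Other highlights of this release:
1. grpc has been upgraded to the latest version, removing the build obstacles caused by the old grpc version;
2. TensorFlow XLA has been upgraded to the latest version;
3. We did a lot of work to improve stability and the speed of graph compilation and execution. A deep learning framework benchmark report will be released soon; follow the Oneflow-Inc/DLPerf project for an early look;
4. You may have noticed that we are working hard to optimize OneFlow's dynamic graph mechanism. The C++ and Python interop is migrating from SWIG to Pybind11, at which point the OneFlow API will be at least as easy to use as PyTorch's (if not easier).
5. The API documentation has been further improved, with more examples that are easier to understand.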
Op fixes and performance optimizations
Support fusing the binary add op into its predecessor node
FuseAddToOutput #3524
Dropout support add_to_output #3569
Dev matmul add to output #3581
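To illustrate what fusing an add op into its predecessor can look like, here is a minimal sketch on a toy op graph. It is not OneFlow's implementation: the `Op` class, the `add_to_output` field, the `FUSABLE_KINDS` set, and the `fuse_add_to_output` function are all hypothetical names chosen for this example. The idea is that when `add(a, b)` is the only consumer of a producer that can accumulate into its output, the add op is removed and the producer adds the other operand itself.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass
class Op:
    """Toy op node (hypothetical, for illustration only)."""
    name: str
    kind: str                       # e.g. "matmul", "dropout", "add"
    inputs: List[str] = field(default_factory=list)
    add_to_output: Optional[str] = None  # tensor accumulated into the output


# Assumption: these op kinds can accumulate an extra tensor into their output.
FUSABLE_KINDS = {"matmul", "dropout"}


def fuse_add_to_output(ops: List[Op]) -> List[Op]:
    """Fold add(a, b) into the producer of a (or b) when it is safe.

    `ops` must be in topological order. The input list is not modified.
    """
    ops = [replace(op, inputs=list(op.inputs)) for op in ops]
    by_name: Dict[str, Op] = {op.name: op for op in ops}
    position = {op.name: i for i, op in enumerate(ops)}

    # Count how many times each tensor is consumed.
    consumers: Dict[str, int] = {}
    for op in ops:
        for src in op.inputs:
            consumers[src] = consumers.get(src, 0) + 1

    renamed: Dict[str, str] = {}    # removed add op -> producer holding the sum
    result: List[Op] = []
    for op in ops:
        op.inputs = [renamed.get(src, src) for src in op.inputs]
        if op.kind == "add" and len(op.inputs) == 2:
            fused = False
            for producer_name, other in (op.inputs, op.inputs[::-1]):
                producer = by_name.get(producer_name)
                if (producer is not None
                        and producer.kind in FUSABLE_KINDS
                        and producer.add_to_output is None
                        and consumers.get(producer_name, 0) == 1
                        # the other operand must exist before the producer runs
                        and position.get(other, -1) < position[producer_name]):
                    producer.add_to_output = other
                    renamed[op.name] = producer.name
                    fused = True
                    break
            if fused:
                continue            # the add op disappears from the graph
        result.append(op)
    return result
```

For a graph `matmul -> add(residual) -> relu`, the pass leaves two ops: the matmul with `add_to_output="residual"` and the relu reading the matmul output directly. A producer with more than one consumer is left alone, because its un-added output is still needed elsewhere.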
Kernel performance optimizations
Fused BatchNormAddRelu #3519
bn_add_relu use bit mask #3645
layer_norm param grad #3604
Fused layer norm #3591
BiasAdd Row Col Half2 #3636
MaskAndScaleHalf2 #3643
Optimize CudaAsyncMemoryCopier #3543
Avoid using local memory in CropMirrorNormalizeGpuKernel #3539
LayerNormGpuKernel use fused InstanceScaleCenter #3573
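As background for the "bn_add_relu use bit mask" item, the sketch below shows the general bit-mask technique for ReLU in plain Python: the forward pass records one bit per element (1 where the input was positive), and the backward pass uses that bit to decide whether to pass the gradient through. This is only an illustration of the idea, not the actual CUDA kernel; the function names and the 32-bit word size are assumptions made for the example.

```python
from typing import List

WORD_BITS = 32  # assumption: the mask is packed into 32-bit words


def pack_relu_mask(values: List[float]) -> List[int]:
    """Return a packed mask with bit i set when values[i] > 0."""
    words = [0] * ((len(values) + WORD_BITS - 1) // WORD_BITS)
    for i, v in enumerate(values):
        if v > 0:
            words[i // WORD_BITS] |= 1 << (i % WORD_BITS)
    return words


def relu_backward(grad: List[float], mask: List[int]) -> List[float]:
    """Pass grad[i] through only where bit i of the mask is set."""
    return [
        g if (mask[i // WORD_BITS] >> (i % WORD_BITS)) & 1 else 0.0
        for i, g in enumerate(grad)
    ]
```

Compared with keeping a full copy of the activation for the backward pass, a packed mask needs one bit per element.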
Implement model update ops as user ops, and support fusion for model update ops
Add model update user ops #3546
Migrate L1L2RegularizeGradientOp to UserOp Framework #3527
model update fuse scalar_mul_by_tensor #3635
Dev indexed slices model update user ops #3561
Dev adam xla and rm sys op #3584
NCCL supports setting the maximum number of fused ops
Add nccl_fusion_max_ops #3567
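A cap on the number of fused ops can be pictured as follows. This is a simplified sketch under the assumption that fusion groups are formed from consecutive ops and that the cap is the only constraint; the function name `group_ops_for_fusion` and its parameters are hypothetical and do not reflect how OneFlow actually builds its NCCL groups.

```python
from typing import List


def group_ops_for_fusion(op_names: List[str], max_ops: int) -> List[List[str]]:
    """Split a sequence of ops into consecutive groups of at most max_ops."""
    if max_ops < 1:
        raise ValueError("max_ops must be at least 1")
    return [op_names[i:i + max_ops] for i in range(0, len(op_names), max_ops)]
```

With five all-reduce ops and `max_ops=2`, this yields three groups of sizes 2, 2, and 1. A smaller cap gives more, smaller groups; a larger cap gives fewer, larger ones.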
New ops
[feature] Fused ImageDecoderRandomCropResize #3644
Add AmpWhiteIdentityOp #3658
Add ImageDecoderRandomCropResizeOp::InferParallelSignature #3646
Dev add op tril #3511
add masked fill op #3515
Global cache for cuDNN algorithm inference
Add CudnnConvAlgoCache #3649
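The general idea of a global algorithm cache can be sketched as a thread-safe memoization table: the expensive algorithm search runs once per distinct convolution configuration, and later lookups with the same configuration reuse the result. The sketch below is an assumption-laden illustration in Python, not the C++ `CudnnConvAlgoCache` itself; the class name `ConvAlgoCache`, the method `get_or_infer`, and the shape of the key are hypothetical.

```python
import threading
from typing import Any, Callable, Dict, Hashable


class ConvAlgoCache:
    """Thread-safe cache mapping a conv configuration to a chosen algorithm."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._cache: Dict[Hashable, Any] = {}

    def get_or_infer(self, key: Hashable, infer: Callable[[Hashable], Any]) -> Any:
        """Return the cached result for key, calling infer(key) on a miss."""
        with self._lock:
            if key not in self._cache:
                self._cache[key] = infer(key)
            return self._cache[key]


# A single process-wide instance plays the role of the "global" cache.
GLOBAL_CONV_ALGO_CACHE = ConvAlgoCache()
```

The key would typically combine everything that affects the choice, for example input shape, filter shape, strides, padding, and data type, expressed as a hashable tuple.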
Bugfixes and others
fix broadcast div grad #3525
fix optimizer copy-paste bug #3508
fix bug about pad value #3640
Optimize some default values #3648
Fix cuda runtime #3621
Fix reshape inplace #3545
Refactor rmsprop mean_square and add unit tests for optimizers #3523
Remove cuDNN fields from OperatorConf #3536
Add UserOpConfWrapperBuilder::ScopeSymbolId #3528
Fix NcclCollectiveBoxing builder_name #3563
rm conv2d cpu testcase #3574
fix broadcast_to_compatible_with grad bug #3609
Add inline for half #3600
Fix converter half #3599
Fix gpu_atomic_max double overload use fmaxf #3578
fix upsample #3579
Eager Execution
Added more comments to the eager-related code; fine-tuned the stateless_call instruction to distinguish two kinds of arguments, mutable_input and output; implemented the broadcast instruction.
fix fmt cuda_copy_d2h_stream_type #3606
add comments for cuda_copy_d2h_stream_type.cpp #3603
Fix TopoForEachNode in GenCollectiveBoxingPlan #3566
Split call_op_kernel instruction args into const_input/mutable_input/output #3562
split BlobObject and EagerBlobObject #3485
remove unused code under vm/ #3585
Dev broadcast instruction #3555
Broadcast instruction #3552
pybind11 integration
SWIG and pybind11 currently coexist in OneFlow; the codebase will gradually switch over to pybind11.
pybind11 integration #3517
upgrad to pybind11 master and pass exe path #3522
Update rel script for pybind11 #3526
Dev oneflow pybind api #3625
Build tool optimizations and fixes
Fixed some unreasonable configurations that caused builds to fail or run slowly, sped up dependency downloads, and fixed the Ubuntu Dockerfile.
[bug] fix ubuntu docker build #3504
change link order to fix the cpu+openblas build #3634
[bug] fix bug: oneflow cpu-only lib flags #3615
add convert_url_to_oss_https_url and DCN flag #3595
Add cn url in readme #3583
make absl use tar not git #3570
Optimize nvcc gencode flag #3577
Transport: the network transmission subsystem
Supports P2P dynamic network transmission
[feature] Transport #3549
CFG tool integration
CFG is a tool, based on proto syntax, that generates code for exchanging data between Python and C++
Dev integrate cfg #3597
Less usage of PbMessage in Operator #3651
XLA support improvements
Upgraded to the latest TF version
upgrade XRT XLA to TF 2.3.0 #3531
Fix XLA crash #3548
GRPC upgrade
Upgraded to the latest GRPC version
Upgrade grpc #3551
[bug] [bugfix] GRPC: control server CompletionQueue shutdown. #3589
CI and test optimizations
Added XLA to CI, improved the op test cases, and enabled automatic upload of the latest master commit
Parallel unit tests (Step 1, refactor existing unit tests) #3632
Add build type for pr oss upload #3627
XLA ci support #3564
Auto upload tar to aliyun oss #3592
Don't pack source code if it is not master #3593
move fmt to github hosted #3559
refactor ci #3557
CtrlTest find available port for ctrl port instead of handwriting #3610
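Finding a free port instead of hard-coding one is commonly done by binding to port 0 and letting the operating system choose. The sketch below shows that general technique with Python's standard `socket` module; it is not the CtrlTest code, and the function name `find_available_port` is an assumption for this example.

```python
import socket


def find_available_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))        # port 0 lets the OS pick a free port
        return sock.getsockname()[1]
```

The port is released when the socket closes, so another process could in principle claim it before the test binds it again. For unit tests this small race is usually acceptable.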
ONNX support
Optimized the IR and updated the test scripts
onnx update #3495
Documentation additions and revisions
Add api docs zzk #3505
Add api docs zzk #3533
Add api docs zzk #3514
fix masked_fill op doc #3560
Python frontend fixes
Fix the bug of using op_module_builder in namespace scope #3513
Comment release global for now to avoid random crash in python #3629
update lib name in link flags #3623
rm spaces in rm_spaces optimizer.py #3619
Optimizations and fixes for common system components
[enhancement] flat ErrorProto error_type #3474
[enhancement] Added user_op_conf getter for BatchAxisContext/KernelInitContext/SbpContext #3506
[bug] Fix UserOpConfWrapper::has_input/has_output #3507
support reflecting cfg message #3655
Refactor scope #3652
Refactor placement scope #3650
Bugfix split config proto and session job set #3637
[Bug fix] Release global variables #3624
Add OpRegistry::SetAreaId #3608
Dev converter #3580
Tensor::dptr support half #3582
Use InferOutBlobDescsIf instead of InferBlobDescsIf in InferOpNodeLogicalBlobDesc #3535
Add ctrl_in_op_name only when unreachable #3537
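Of course, there are more new features waiting to be discovered in the change log. See you in the next release!
Scan the QR code at the end of the article to join the discussion group and get the latest news!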
If you cannot join the group, contact the OneFlow assistant for help
QQ:3119703778
WeChat: OneFlowXZS
