Notes on XLA (HLO) memory allocation, prompted by a CUDA OOM

Published: December 20, 2023

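While testing llama7B on a single card (A800, 80 GB), I hit a CUDA OOM. The log shows that the OOM occurred while allocating the preallocated temp allocation. In effect, the temporary memory an XLA module needs has to be allocated in one shot, so the OOM is raised before the graph has actually started executing.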

Analysis

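Each XLA module corresponds to one BufferAssignment object, and each BufferAssignment holds multiple BufferAllocation objects.
BufferAssignment is mainly responsible for four kinds of memory, which add up to the total allocation: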

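  • parameter allocation: the parameters of the computation
  • maybe_live_out allocation: buffers whose lifetime extends beyond the computation, such as the returned result; see tensorflow/compiler/xla/service/buffer_assignment.h:333
  • preallocated temp allocation: temporary memory used by intermediate ops
  • constant allocation: constants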
Buffer Assignment Stats for Conv2D.9
BufferAssignment stats:
             parameter allocation:    1.25MiB
              constant allocation:         0B
        maybe_live_out allocation:    9.00MiB
     preallocated temp allocation:   19.25MiB
  preallocated temp fragmentation:       112B (0.00%)
                 total allocation:   29.50MiB
              total fragmentation:    8.00MiB (27.12%)
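
As a quick sanity check on the claim that the four categories add up to the total, the stats block above can be parsed and summed. This is only a sketch: `parse_stats` and `UNITS` are hypothetical helper names introduced here, and the regular expression assumes the exact text layout shown above, which may differ between XLA versions.

```python
import re

# Stats text copied from the block above.
STATS = """\
             parameter allocation:    1.25MiB
              constant allocation:         0B
        maybe_live_out allocation:    9.00MiB
     preallocated temp allocation:   19.25MiB
  preallocated temp fragmentation:       112B (0.00%)
                 total allocation:   29.50MiB
              total fragmentation:    8.00MiB (27.12%)
"""

UNITS = {"B": 1, "KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}


def parse_stats(text):
    """Map each 'name: size' line to a size in bytes (hypothetical helper)."""
    sizes = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?):\s+([\d.]+)(B|KiB|MiB|GiB|TiB)\b", line)
        if m:
            sizes[m.group(1)] = float(m.group(2)) * UNITS[m.group(3)]
    return sizes


sizes = parse_stats(STATS)
categories = [
    "parameter allocation",
    "constant allocation",
    "maybe_live_out allocation",
    "preallocated temp allocation",
]
total = sum(sizes[name] for name in categories)

# The four categories should account for the reported total.
print(total / 2**20, sizes["total allocation"] / 2**20)
```

Comparing `sizes["total allocation"]` against the free device memory gives a rough idea of whether the module can fit at all, since the temp portion is requested up front.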

Mapping to the HLO graph

The statistics above can be compared against the HLO in the next section.

HLO (after opt)

HloModule Conv2D.9, entry_computation_layout={(f32[512,2,2,128]{3,2,1,0},f32[1,1,128,512]{3,2,1,0})->f32[512,3,3,512]{3,2,1,0}}

ENTRY main.4 {
  Arg_0.1 = f32[512,2,2,128]{3,2,1,0} parameter(0)
  copy = f32[512,2,2,128]{2,1,0,3} copy(Arg_0.1)
  Arg_1.2 = f32[1,1,128,512]{3,2,1,0} parameter(1)
  bitcast = f32[1,1,128,512]{1,0,3,2} bitcast(Arg_1.2)
  cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(copy, bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"2\":\"0\",\"5\":\"2\",\"14\":\"3\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
  get-tuple-element = f32[512,3,3,512]{2,1,0,3} get-tuple-element(cudnn-conv-bw-filter), index=0, metadata={op_name="Conv2D"}
  ROOT copy.2 = f32[512,3,3,512]{3,2,1,0} copy(get-tuple-element), metadata={op_name="Conv2D"}
}
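
To relate the HLO shapes to the allocation stats, the byte size of each shape can be computed directly from its type and dimensions. The sketch below is an illustration only: `shape_bytes` and `DTYPE_BYTES` are hypothetical names, only a few element types are covered, and it ignores any alignment or padding XLA may add.

```python
import math
import re

# Bytes per element for a few HLO element types (not exhaustive).
DTYPE_BYTES = {"u8": 1, "f16": 2, "bf16": 2, "f32": 4, "s32": 4}


def shape_bytes(shape):
    """Byte size of an HLO array shape such as 'f32[512,2,2,128]{3,2,1,0}'."""
    m = re.fullmatch(r"(\w+)\[([\d,]*)\](?:\{[\d,]*\})?", shape)
    if m is None:
        raise ValueError(f"unsupported shape string: {shape}")
    dtype, dims = m.group(1), m.group(2)
    count = math.prod(int(d) for d in dims.split(",")) if dims else 1
    return count * DTYPE_BYTES[dtype]


MIB = 2**20
params = shape_bytes("f32[512,2,2,128]{3,2,1,0}") + shape_bytes("f32[1,1,128,512]{3,2,1,0}")
result = shape_bytes("f32[512,3,3,512]{3,2,1,0}")
conv_out = shape_bytes("f32[512,3,3,512]{2,1,0,3}")
workspace = shape_bytes("u8[10747920]{0}")

print(params / MIB)                  # the two parameters
print(result / MIB)                  # the ROOT result
print((conv_out + workspace) / MIB)  # conv output plus cuDNN workspace
```

The two parameters come to 1.25 MiB and the ROOT result to 9 MiB, matching the parameter and maybe_live_out figures. The cuDNN output and its workspace together come to roughly 19.25 MiB, which is close to the reported temp allocation; this is an observation about the numbers, not a statement about exactly which buffers XLA placed in that allocation.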

Example script

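import tensorflow as tf

# Inputs matching the parameter shapes in the HLO above.
x = tf.ones(shape=(512,2,2,128), dtype=tf.float32)
kernel = tf.ones(shape=(1,1,128,512), dtype=tf.float32)

# Run the convolution on the XLA device.
# $XLA_DEVICE_STR appears to be a placeholder for the XLA device type; substitute your own.
with tf.device("/$XLA_DEVICE_STR:0"):
  lhs = tf.nn.conv2d(
    input=x,
    filters=kernel,
    strides=[1,1],
    padding=[[0,0],[1,0],[1,0],[0,0]],
    dilations=[2,2],
  )

# Run the same convolution on the default device as a reference.
rhs = tf.nn.conv2d(
  input=x,
  filters=kernel,
  strides=[1,1],
  padding=[[0,0],[1,0],[1,0],[0,0]],
  dilations=[2,2],
)

# The two results should agree within floating-point tolerance.
tf.debugging.assert_near(lhs, rhs)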
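
The 3x3 spatial size of the result in the HLO follows from the script's explicit padding and dilation. A minimal sketch of the standard convolution output-size formula, with `conv_out_size` being a hypothetical helper name:

```python
def conv_out_size(in_size, kernel, stride, dilation, pad_lo, pad_hi):
    """Output length along one spatial dimension of a convolution."""
    # A dilated kernel spans (kernel - 1) * dilation + 1 input positions.
    effective_kernel = (kernel - 1) * dilation + 1
    return (in_size + pad_lo + pad_hi - effective_kernel) // stride + 1


# Script values: input 2, 1x1 kernel, stride 1, dilation 2, padding [1, 0].
height = conv_out_size(in_size=2, kernel=1, stride=1, dilation=2, pad_lo=1, pad_hi=0)
width = conv_out_size(in_size=2, kernel=1, stride=1, dilation=2, pad_lo=1, pad_hi=0)
print((512, height, width, 512))
```

With a 1x1 kernel the dilation does not widen the kernel, so the only change to the spatial size comes from the one element of low padding on each axis.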

TensorFlow log

Based on TensorFlow v2.11.1.

TF_CPP_VMODULE=gpu_compiler=2,gpu_executable=2,buffer_assignment=3 python conv2d.py
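
The same per-module verbosity can be set from Python by passing the environment variable to a child process. The helper names `vlog_env` and `run_with_vlog` are made up for this sketch; only `TF_CPP_VMODULE` and the module names come from the command above.

```python
import os
import subprocess
import sys


def vlog_env(vmodule, base_env=None):
    """Return a copy of the environment with TF_CPP_VMODULE set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TF_CPP_VMODULE"] = ",".join(f"{mod}={lvl}" for mod, lvl in vmodule.items())
    return env


def run_with_vlog(script, vmodule):
    """Run a script in a child interpreter with verbose logging enabled."""
    return subprocess.run([sys.executable, script], env=vlog_env(vmodule), check=True)


# Same settings as the shell command above.
LEVELS = {"gpu_compiler": 2, "gpu_executable": 2, "buffer_assignment": 3}
print(vlog_env(LEVELS, base_env={})["TF_CPP_VMODULE"])
```

Calling `run_with_vlog("conv2d.py", LEVELS)` would then behave like the shell command. A child process is used because the variable has to be in place before TensorFlow initializes its logging.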

root@4f07101da743:/data/jack/workspace# TF_CPP_VMODULE=gpu_compiler=2,gpu_executable=2,buffer_assignment=3 python conv2d.py
2023-12-20 09:11:16.272031: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-20 09:11:17.076645: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-12-20 09:11:18.633291: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-20 09:11:18.691364: I tensorflow/compiler/xla/service/service.cc:173] XLA service 0x1fa40c0 initialized for platform Poplar (this does not guarantee that XLA will be used). Devices:
2023-12-20 09:11:18.691438: I tensorflow/compiler/xla/service/service.cc:181]   StreamExecutor device (0): Poplar,
2023-12-20 09:11:25.048037: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 77507 MB memory:  -> device: 0, name: NVIDIA Graphics Device, pci bus id: 0000:10:00.0, compute capability: 8.0
2023-12-20 09:11:25.049987: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 77507 MB memory:  -> device: 1, name: NVIDIA Graphics Device, pci bus id: 0000:16:00.0, compute capability: 8.0
2023-12-20 09:11:25.051409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 77507 MB memory:  -> device: 2, name: NVIDIA Graphics Device, pci bus id: 0000:49:00.0, compute capability: 8.0
2023-12-20 09:11:25.052813: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 77507 MB memory:  -> device: 3, name: NVIDIA Graphics Device, pci bus id: 0000:4d:00.0, compute capability: 8.0
2023-12-20 09:11:25.054282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 77507 MB memory:  -> device: 4, name: NVIDIA Graphics Device, pci bus id: 0000:8a:00.0, compute capability: 8.0
2023-12-20 09:11:25.055458: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 77507 MB memory:  -> device: 5, name: NVIDIA Graphics Device, pci bus id: 0000:8f:00.0, compute capability: 8.0
2023-12-20 09:11:25.056505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 77507 MB memory:  -> device: 6, name: NVIDIA Graphics Device, pci bus id: 0000:c6:00.0, compute capability: 8.0
2023-12-20 09:11:25.057600: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1613] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 76606 MB memory:  -> device: 7, name: NVIDIA Graphics Device, pci bus id: 0000:ca:00.0, compute capability: 8.0
2023-12-20 09:11:27.962269: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:428] Loaded cuDNN version 8100
2023-12-20 09:11:49.371683: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:656] HLO Fusion Stats:
Number of fusion ops: 0
Number of kLoop fusions: 0

Number of kInput fusions: 0

2023-12-20 09:11:49.382673: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:848] GpuCompiler::RunHloPasses time: 22.5 s (cumulative: 22.5 s, max: 22.5 s, #called: 1)
2023-12-20 09:11:49.382728: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1388] Starting to compile HLO module Conv2D.9
2023-12-20 09:11:49.383544: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1406] HLO memory read+written: 21.25MiB
2023-12-20 09:11:49.389876: I tensorflow/compiler/xla/service/buffer_assignment.cc:1693] Assigning buffers to module Conv2D.9
2023-12-20 09:11:49.390982: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694] HloModule Conv2D.9, entry_computation_layout={(f32[512,2,2,128]{3,2,1,0},f32[1,1,128,512]{3,2,1,0})->f32[512,3,3,512]{3,2,1,0}}
2023-12-20 09:11:49.391008: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]
2023-12-20 09:11:49.391016: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694] ENTRY %main.4 (Arg_0.1: f32[512,2,2,128], Arg_1.2: f32[1,1,128,512]) -> f32[512,3,3,512] {
2023-12-20 09:11:49.391048: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   %Arg_0.1 = f32[512,2,2,128]{3,2,1,0} parameter(0)
2023-12-20 09:11:49.391075: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   %copy = f32[512,2,2,128]{2,1,0,3} copy(f32[512,2,2,128]{3,2,1,0} %Arg_0.1)
2023-12-20 09:11:49.391108: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   %Arg_1.2 = f32[1,1,128,512]{3,2,1,0} parameter(1)
2023-12-20 09:11:49.391116: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   %bitcast = f32[1,1,128,512]{1,0,3,2} bitcast(f32[1,1,128,512]{3,2,1,0} %Arg_1.2)
2023-12-20 09:11:49.391133: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   %cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.391170: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   %get-tuple-element = f32[512,3,3,512]{2,1,0,3} get-tuple-element((f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) %cudnn-conv-bw-filter), index=0, metadata={op_name="Conv2D"}
2023-12-20 09:11:49.391209: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]   ROOT %copy.2 = f32[512,3,3,512]{3,2,1,0} copy(f32[512,3,3,512]{2,1,0,3} %get-tuple-element), metadata={op_name="Conv2D"}
2023-12-20 09:11:49.391217: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694] }
2023-12-20 09:11:49.391223: I tensorflow/compiler/xla/service/buffer_assignment.cc:1694]
2023-12-20 09:11:49.391438: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695] HloAliasAnalysis, module Conv2D.9
2023-12-20 09:11:49.391454: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]   Buffers at each position:
2023-12-20 09:11:49.391464: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     Arg_0.1:
2023-12-20 09:11:49.391473: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       HloBuffer 0, values: <0 Arg_0.1>
2023-12-20 09:11:49.391484: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     Arg_1.2:
2023-12-20 09:11:49.391493: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       HloBuffer 2, values: <2 Arg_1.2>
2023-12-20 09:11:49.391500: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     copy:
2023-12-20 09:11:49.391506: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       HloBuffer 1, values: <1 copy>
2023-12-20 09:11:49.391514: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     copy.2:
2023-12-20 09:11:49.391528: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       HloBuffer 6, values: <6 copy.2>
2023-12-20 09:11:49.391534: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     bitcast:
2023-12-20 09:11:49.391542: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       HloBuffer 2, values: <2 Arg_1.2>
2023-12-20 09:11:49.391555: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     cudnn-conv-bw-filter:
2023-12-20 09:11:49.391563: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       tuple index {}:
2023-12-20 09:11:49.391571: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         HloBuffer 3, values: <3 cudnn-conv-bw-filter{}>
2023-12-20 09:11:49.391578: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       tuple index {0}:
2023-12-20 09:11:49.391584: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         HloBuffer 4, values: <4 cudnn-conv-bw-filter{0}>
2023-12-20 09:11:49.391591: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       tuple index {1}:
2023-12-20 09:11:49.391602: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         HloBuffer 5, values: <5 cudnn-conv-bw-filter{1}>
2023-12-20 09:11:49.391611: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     get-tuple-element:
2023-12-20 09:11:49.391617: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       HloBuffer 4, values: <4 cudnn-conv-bw-filter{0}>
2023-12-20 09:11:49.391626: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]   Buffers:
2023-12-20 09:11:49.391634: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 0, values: <0 Arg_0.1>
2023-12-20 09:11:49.391644: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391662: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         Arg_0.1
2023-12-20 09:11:49.391674: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 1, values: <1 copy>
2023-12-20 09:11:49.391681: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391689: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         copy
2023-12-20 09:11:49.391701: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 2, values: <2 Arg_1.2>
2023-12-20 09:11:49.391714: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391727: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         Arg_1.2
2023-12-20 09:11:49.391735: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         bitcast
2023-12-20 09:11:49.391751: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 3, values: <3 cudnn-conv-bw-filter{}>
2023-12-20 09:11:49.391760: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391769: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         cudnn-conv-bw-filter {}
2023-12-20 09:11:49.391786: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 4, values: <4 cudnn-conv-bw-filter{0}>
2023-12-20 09:11:49.391798: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391808: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         cudnn-conv-bw-filter {0}
2023-12-20 09:11:49.391823: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         get-tuple-element
2023-12-20 09:11:49.391835: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 5, values: <5 cudnn-conv-bw-filter{1}>
2023-12-20 09:11:49.391843: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391853: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         cudnn-conv-bw-filter {1}
2023-12-20 09:11:49.391872: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]     HloBuffer 6, values: <6 copy.2>
2023-12-20 09:11:49.391879: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]       positions:
2023-12-20 09:11:49.391891: I tensorflow/compiler/xla/service/buffer_assignment.cc:1695]         copy.2
2023-12-20 09:11:49.392151: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] HloDataflowAnalysis, module Conv2D.9
2023-12-20 09:11:49.392165: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   Instruction value sets:
2023-12-20 09:11:49.392172: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392183: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   Arg_0.1:
2023-12-20 09:11:49.392195: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       <0 Arg_0.1> (def)
2023-12-20 09:11:49.392202: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392208: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   Arg_1.2:
2023-12-20 09:11:49.392215: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       <2 Arg_1.2> (def)
2023-12-20 09:11:49.392224: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392237: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   copy:
2023-12-20 09:11:49.392246: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       <1 copy> (def)
2023-12-20 09:11:49.392253: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392262: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   copy.2:
2023-12-20 09:11:49.392278: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       <6 copy.2> (def)
2023-12-20 09:11:49.392288: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392300: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   bitcast:
2023-12-20 09:11:49.392311: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       <2 Arg_1.2>
2023-12-20 09:11:49.392322: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392329: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   cudnn-conv-bw-filter:
2023-12-20 09:11:49.392335: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       tuple index {}:
2023-12-20 09:11:49.392345: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]         <3 cudnn-conv-bw-filter{}> (def)
2023-12-20 09:11:49.392353: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       tuple index {0}:
2023-12-20 09:11:49.392360: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]         <4 cudnn-conv-bw-filter{0}> (def)
2023-12-20 09:11:49.392374: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       tuple index {1}:
2023-12-20 09:11:49.392383: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]         <5 cudnn-conv-bw-filter{1}> (def)
2023-12-20 09:11:49.392392: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696] Instruction:
2023-12-20 09:11:49.392403: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   get-tuple-element:
2023-12-20 09:11:49.392415: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       <4 cudnn-conv-bw-filter{0}>
2023-12-20 09:11:49.392423: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]   HloValues:
2023-12-20 09:11:49.392432: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <0 Arg_0.1>
2023-12-20 09:11:49.392442: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392449: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       Arg_0.1
2023-12-20 09:11:49.392455: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392468: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       copy, operand 0
2023-12-20 09:11:49.392477: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%Arg_0.1 = f32[512,2,2,128]{3,2,1,0} parameter(0)
2023-12-20 09:11:49.392487: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <1 copy>
2023-12-20 09:11:49.392497: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392510: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       copy
2023-12-20 09:11:49.392518: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392524: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       cudnn-conv-bw-filter, operand 0
2023-12-20 09:11:49.392535: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%copy = f32[512,2,2,128]{2,1,0,3} copy(f32[512,2,2,128]{3,2,1,0} %Arg_0.1)
2023-12-20 09:11:49.392542: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <2 Arg_1.2>
2023-12-20 09:11:49.392549: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392556: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       Arg_1.2
2023-12-20 09:11:49.392566: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       bitcast
2023-12-20 09:11:49.392578: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392593: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       bitcast, operand 0
2023-12-20 09:11:49.392604: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       cudnn-conv-bw-filter, operand 1
2023-12-20 09:11:49.392612: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%Arg_1.2 = f32[1,1,128,512]{3,2,1,0} parameter(1)
2023-12-20 09:11:49.392624: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <3 cudnn-conv-bw-filter{}>
2023-12-20 09:11:49.392636: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392649: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       cudnn-conv-bw-filter {}
2023-12-20 09:11:49.392657: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392668: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       get-tuple-element, operand 0 {}
2023-12-20 09:11:49.392678: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.392694: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <4 cudnn-conv-bw-filter{0}>
2023-12-20 09:11:49.392702: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392716: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       cudnn-conv-bw-filter {0}
2023-12-20 09:11:49.392727: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       get-tuple-element
2023-12-20 09:11:49.392739: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392749: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       copy.2, operand 0
2023-12-20 09:11:49.392756: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.392764: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <5 cudnn-conv-bw-filter{1}>
2023-12-20 09:11:49.392771: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392783: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       cudnn-conv-bw-filter {1}
2023-12-20 09:11:49.392791: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392808: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.392819: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]     <6 copy.2>
2023-12-20 09:11:49.392831: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      positions:
2023-12-20 09:11:49.392838: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]       copy.2
2023-12-20 09:11:49.392848: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      uses:
2023-12-20 09:11:49.392870: I tensorflow/compiler/xla/service/buffer_assignment.cc:1696]      from instruction:%copy.2 = f32[512,3,3,512]{3,2,1,0} copy(f32[512,3,3,512]{2,1,0,3} %get-tuple-element), metadata={op_name="Conv2D"}
2023-12-20 09:11:49.392887: I tensorflow/compiler/xla/service/buffer_assignment.cc:1697] Number of buffers to assign: 7
2023-12-20 09:11:49.392927: I tensorflow/compiler/xla/service/buffer_assignment.cc:1709] After coloring:
2023-12-20 09:11:49.393153: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] HloDataflowAnalysis, module Conv2D.9
2023-12-20 09:11:49.393161: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   Instruction value sets:
2023-12-20 09:11:49.393165: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393169: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   Arg_0.1:
2023-12-20 09:11:49.393173: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       <0 Arg_0.1 @0> (def)
2023-12-20 09:11:49.393177: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393180: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   Arg_1.2:
2023-12-20 09:11:49.393185: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       <2 Arg_1.2 @0> (def)
2023-12-20 09:11:49.393189: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393193: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   copy:
2023-12-20 09:11:49.393196: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       <1 copy @0> (def)
2023-12-20 09:11:49.393200: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393204: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   copy.2:
2023-12-20 09:11:49.393208: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       <6 copy.2 @0> (def)
2023-12-20 09:11:49.393212: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393216: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   bitcast:
2023-12-20 09:11:49.393220: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       <2 Arg_1.2 @0>
2023-12-20 09:11:49.393224: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393228: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   cudnn-conv-bw-filter:
2023-12-20 09:11:49.393232: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       tuple index {}:
2023-12-20 09:11:49.393236: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]         <3 cudnn-conv-bw-filter{} @0> (def)
2023-12-20 09:11:49.393240: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       tuple index {0}:
2023-12-20 09:11:49.393244: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]         <4 cudnn-conv-bw-filter{0} @0> (def)
2023-12-20 09:11:49.393249: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       tuple index {1}:
2023-12-20 09:11:49.393252: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]         <5 cudnn-conv-bw-filter{1} @0> (def)
2023-12-20 09:11:49.393256: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710] Instruction:
2023-12-20 09:11:49.393260: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   get-tuple-element:
2023-12-20 09:11:49.393264: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.393269: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]   HloValues:
2023-12-20 09:11:49.393273: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <0 Arg_0.1 @0>
2023-12-20 09:11:49.393277: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393281: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       Arg_0.1
2023-12-20 09:11:49.393285: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393289: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       copy, operand 0
2023-12-20 09:11:49.393293: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%Arg_0.1 = f32[512,2,2,128]{3,2,1,0} parameter(0)
2023-12-20 09:11:49.393297: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <1 copy @0>
2023-12-20 09:11:49.393301: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393305: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       copy
2023-12-20 09:11:49.393309: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393313: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       cudnn-conv-bw-filter, operand 0
2023-12-20 09:11:49.393317: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%copy = f32[512,2,2,128]{2,1,0,3} copy(f32[512,2,2,128]{3,2,1,0} %Arg_0.1)
2023-12-20 09:11:49.393322: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <2 Arg_1.2 @0>
2023-12-20 09:11:49.393326: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393330: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       Arg_1.2
2023-12-20 09:11:49.393333: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       bitcast
2023-12-20 09:11:49.393337: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393341: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       bitcast, operand 0
2023-12-20 09:11:49.393345: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       cudnn-conv-bw-filter, operand 1
2023-12-20 09:11:49.393349: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%Arg_1.2 = f32[1,1,128,512]{3,2,1,0} parameter(1)
2023-12-20 09:11:49.393354: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <3 cudnn-conv-bw-filter{} @0>
2023-12-20 09:11:49.393358: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393362: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       cudnn-conv-bw-filter {}
2023-12-20 09:11:49.393366: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393370: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       get-tuple-element, operand 0 {}
2023-12-20 09:11:49.393393: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.393399: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.393403: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393407: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       cudnn-conv-bw-filter {0}
2023-12-20 09:11:49.393413: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       get-tuple-element
2023-12-20 09:11:49.393417: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393422: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       copy.2, operand 0
2023-12-20 09:11:49.393427: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.393434: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <5 cudnn-conv-bw-filter{1} @0>
2023-12-20 09:11:49.393439: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393444: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       cudnn-conv-bw-filter {1}
2023-12-20 09:11:49.393448: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393452: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.393458: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]     <6 copy.2 @0>
2023-12-20 09:11:49.393468: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      positions:
2023-12-20 09:11:49.393481: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]       copy.2
2023-12-20 09:11:49.393493: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      uses:
2023-12-20 09:11:49.393501: I tensorflow/compiler/xla/service/buffer_assignment.cc:1710]      from instruction:%copy.2 = f32[512,3,3,512]{3,2,1,0} copy(f32[512,3,3,512]{2,1,0,3} %get-tuple-element), metadata={op_name="Conv2D"}
2023-12-20 09:11:49.394199: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.394235: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 5, values: <5 cudnn-conv-bw-filter{1} @0>
2023-12-20 09:11:49.394289: I tensorflow/compiler/xla/service/buffer_assignment.cc:1232] Delaying assignment of temp buffer: <5 cudnn-conv-bw-filter{1} @0>
2023-12-20 09:11:49.394301: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.394312: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 6, values: <6 copy.2 @0>
2023-12-20 09:11:49.394437: I tensorflow/compiler/xla/service/buffer_assignment.cc:526] HloBuffer lives out: HloBuffer 6, values: <6 copy.2 @0>
2023-12-20 09:11:49.394486: I tensorflow/compiler/xla/service/buffer_assignment.cc:527] Set maybe live out: allocation 0: size 9437184, output shape is |f32[512,3,3,512]|, preallocated-temp:
 value: <6 copy.2 @0> (size=9437184,offset=0): f32[512,3,3,512]{3,2,1,0}

2023-12-20 09:11:49.394514: I tensorflow/compiler/xla/service/buffer_assignment.cc:1242] New allocation #0 for: HloBuffer 6, values: <6 copy.2 @0>
2023-12-20 09:11:49.394528: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.394541: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 4, values: <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.394615: I tensorflow/compiler/xla/service/buffer_assignment.cc:1232] Delaying assignment of temp buffer: <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.394628: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.394639: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 0, values: <0 Arg_0.1 @0>
2023-12-20 09:11:49.394737: I tensorflow/compiler/xla/service/buffer_assignment.cc:1154] New allocation #1 marked as entry computation parameter: HloBuffer 0, values: <0 Arg_0.1 @0>
2023-12-20 09:11:49.394751: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.394762: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 1, values: <1 copy @0>
2023-12-20 09:11:49.394888: I tensorflow/compiler/xla/service/buffer_assignment.cc:1203] Reusing allocation #0 for: HloBuffer 1, values: <1 copy @0>
2023-12-20 09:11:49.394903: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.394913: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 2, values: <2 Arg_1.2 @0>
2023-12-20 09:11:49.395007: I tensorflow/compiler/xla/service/buffer_assignment.cc:1154] New allocation #2 marked as entry computation parameter: HloBuffer 2, values: <2 Arg_1.2 @0>
2023-12-20 09:11:49.395022: I tensorflow/compiler/xla/service/buffer_assignment.cc:1350] =================================================
2023-12-20 09:11:49.395034: I tensorflow/compiler/xla/service/buffer_assignment.cc:1351] Assigning buffer for HloBuffer 3, values: <3 cudnn-conv-bw-filter{} @0>
2023-12-20 09:11:49.395091: I tensorflow/compiler/xla/service/buffer_assignment.cc:1174] New allocation #3 for tuple-shaped buffer: HloBuffer 3, values: <3 cudnn-conv-bw-filter{} @0>
2023-12-20 09:11:49.395127: I tensorflow/compiler/xla/service/buffer_assignment.cc:1732] Running whole module heap simulation: 1
2023-12-20 09:11:49.395138: I tensorflow/compiler/xla/service/buffer_assignment.cc:1736] Multiheap per heap size limit: -1
2023-12-20 09:11:49.395148: I tensorflow/compiler/xla/service/buffer_assignment.cc:1451] Running whole-module heap simulation
2023-12-20 09:11:49.395230: I tensorflow/compiler/xla/service/buffer_assignment.cc:1467] Simulating heap for color 0
2023-12-20 09:11:49.396053: I tensorflow/compiler/xla/service/buffer_assignment.cc:1648] Result size from heap simulator: 20185216
2023-12-20 09:11:49.396218: I tensorflow/compiler/xla/service/buffer_assignment.cc:1529] Compute peak memory logical buffers
2023-12-20 09:11:49.396361: I tensorflow/compiler/xla/service/buffer_assignment.cc:1664] allocation 4: size 20185216, preallocated-temp:
2023-12-20 09:11:49.396378: I tensorflow/compiler/xla/service/buffer_assignment.cc:1664]  value: <4 cudnn-conv-bw-filter{0} @0> (size=9437184,offset=10748032): f32[512,3,3,512]{2,1,0,3}
2023-12-20 09:11:49.396386: I tensorflow/compiler/xla/service/buffer_assignment.cc:1664]  value: <5 cudnn-conv-bw-filter{1} @0> (size=10747920,offset=0): u8[10747920]{0}
2023-12-20 09:11:49.396451: I tensorflow/compiler/xla/service/buffer_assignment.cc:1763] maybe_live_out LogicalBuffer: HloBuffer 6, values: <6 copy.2 @0>
2023-12-20 09:11:49.396500: I tensorflow/compiler/xla/service/buffer_assignment.cc:1768] maybe_live_out BufferAllocation: allocation 0: size 9437184, output shape is |f32[512,3,3,512]|, maybe-live-out:
 value: <1 copy @0> (size=1048576,offset=0): f32[512,2,2,128]{2,1,0,3}
 value: <6 copy.2 @0> (size=9437184,offset=0): f32[512,3,3,512]{3,2,1,0}

2023-12-20 09:11:49.396524: I tensorflow/compiler/xla/service/buffer_assignment.cc:549] CombineTempAllocations()
2023-12-20 09:11:49.396565: I tensorflow/compiler/xla/service/buffer_assignment.cc:574] Combined temp allocation for color 0 is: allocation 3: size 16, preallocated-temp:
 value: <3 cudnn-conv-bw-filter{} @0> (size=16,offset=0): (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0})

2023-12-20 09:11:49.396651: I tensorflow/compiler/xla/service/buffer_assignment.cc:592] Combined allocation absorbing temp allocation: allocation 4: size 20185216, preallocated-temp:
 value: <4 cudnn-conv-bw-filter{0} @0> (size=9437184,offset=10748032): f32[512,3,3,512]{2,1,0,3}
 value: <5 cudnn-conv-bw-filter{1} @0> (size=10747920,offset=0): u8[10747920]{0}

2023-12-20 09:11:49.397043: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] BufferAssignment:
2023-12-20 09:11:49.397064: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] allocation 0: size 9437184, output shape is |f32[512,3,3,512]|, maybe-live-out:
2023-12-20 09:11:49.397072: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <1 copy @0> (size=1048576,offset=0): f32[512,2,2,128]{2,1,0,3}
2023-12-20 09:11:49.397083: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <6 copy.2 @0> (size=9437184,offset=0): f32[512,3,3,512]{3,2,1,0}
2023-12-20 09:11:49.397099: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] allocation 1: size 1048576, parameter 0, shape |f32[512,2,2,128]| at ShapeIndex {}:
2023-12-20 09:11:49.397110: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <0 Arg_0.1 @0> (size=1048576,offset=0): f32[512,2,2,128]{3,2,1,0}
2023-12-20 09:11:49.397117: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] allocation 2: size 262144, parameter 1, shape |f32[1,1,128,512]| at ShapeIndex {}:
2023-12-20 09:11:49.397128: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <2 Arg_1.2 @0> (size=262144,offset=0): f32[1,1,128,512]{3,2,1,0}
2023-12-20 09:11:49.397140: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] allocation 3: size 20185344, preallocated-temp:
2023-12-20 09:11:49.397150: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <3 cudnn-conv-bw-filter{} @0> (size=16,offset=0): (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0})
2023-12-20 09:11:49.397163: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <4 cudnn-conv-bw-filter{0} @0> (size=9437184,offset=10748160): f32[512,3,3,512]{2,1,0,3}
2023-12-20 09:11:49.397175: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  value: <5 cudnn-conv-bw-filter{1} @0> (size=10747920,offset=128): u8[10747920]{0}
2023-12-20 09:11:49.397187: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]
2023-12-20 09:11:49.397194: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] Total bytes used: 30933248 (29.50MiB)
2023-12-20 09:11:49.397204: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]
2023-12-20 09:11:49.397221: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] Used values:
2023-12-20 09:11:49.397231: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <0 Arg_0.1 @0>
2023-12-20 09:11:49.397238: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397249: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   Arg_0.1
2023-12-20 09:11:49.397258: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397269: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   copy, operand 0
2023-12-20 09:11:49.397276: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%Arg_0.1 = f32[512,2,2,128]{3,2,1,0} parameter(0)
2023-12-20 09:11:49.397285: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <1 copy @0>
2023-12-20 09:11:49.397296: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397304: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   copy
2023-12-20 09:11:49.397312: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397323: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   cudnn-conv-bw-filter, operand 0
2023-12-20 09:11:49.397333: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%copy = f32[512,2,2,128]{2,1,0,3} copy(f32[512,2,2,128]{3,2,1,0} %Arg_0.1)
2023-12-20 09:11:49.397344: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <2 Arg_1.2 @0>
2023-12-20 09:11:49.397356: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397368: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   Arg_1.2
2023-12-20 09:11:49.397379: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   bitcast
2023-12-20 09:11:49.397391: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397399: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   bitcast, operand 0
2023-12-20 09:11:49.397411: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   cudnn-conv-bw-filter, operand 1
2023-12-20 09:11:49.397422: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%Arg_1.2 = f32[1,1,128,512]{3,2,1,0} parameter(1)
2023-12-20 09:11:49.397433: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <3 cudnn-conv-bw-filter{} @0>
2023-12-20 09:11:49.397444: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397456: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   cudnn-conv-bw-filter {}
2023-12-20 09:11:49.397469: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397476: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   get-tuple-element, operand 0 {}
2023-12-20 09:11:49.397486: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.397500: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.397513: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397524: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   cudnn-conv-bw-filter {0}
2023-12-20 09:11:49.397536: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   get-tuple-element
2023-12-20 09:11:49.397548: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397557: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   copy.2, operand 0
2023-12-20 09:11:49.397564: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.397576: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <5 cudnn-conv-bw-filter{1} @0>
2023-12-20 09:11:49.397587: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397597: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   cudnn-conv-bw-filter {1}
2023-12-20 09:11:49.397609: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397618: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%cudnn-conv-bw-filter = (f32[512,3,3,512]{2,1,0,3}, u8[10747920]{0}) custom-call(f32[512,2,2,128]{2,1,0,3} %copy, f32[1,1,128,512]{1,0,3,2} %bitcast), window={size=3x3 stride=2x2 pad=1_1x1_1}, dim_labels=f01b_i01o->01bf, custom_call_target="__cudnn$convBackwardFilter", metadata={op_name="Conv2D"}, backend_config="{\"algorithm\":{\"algo_id\":\"22\",\"math_type\":\"DEFAULT_MATH\",\"tuning_knobs\":{\"5\":\"2\",\"14\":\"3\",\"2\":\"0\"},\"is_cudnn_frontend\":true,\"workspace_size\":\"10747920\"},\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"
2023-12-20 09:11:49.397630: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779] <6 copy.2 @0>
2023-12-20 09:11:49.397640: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  positions:
2023-12-20 09:11:49.397652: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]   copy.2
2023-12-20 09:11:49.397662: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  uses:
2023-12-20 09:11:49.397674: I tensorflow/compiler/xla/service/buffer_assignment.cc:1779]  from instruction:%copy.2 = f32[512,3,3,512]{3,2,1,0} copy(f32[512,3,3,512]{2,1,0,3} %get-tuple-element), metadata={op_name="Conv2D"}
2023-12-20 09:11:49.399450: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781] BufferAssignment stats:
2023-12-20 09:11:49.399473: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]              parameter allocation:    1.25MiB
2023-12-20 09:11:49.399481: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]               constant allocation:         0B
2023-12-20 09:11:49.399491: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]         maybe_live_out allocation:    9.00MiB
2023-12-20 09:11:49.399504: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]      preallocated temp allocation:   19.25MiB
2023-12-20 09:11:49.399514: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]   preallocated temp fragmentation:       112B (0.00%)
2023-12-20 09:11:49.399527: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]                  total allocation:   29.50MiB
2023-12-20 09:11:49.399539: I tensorflow/compiler/xla/service/buffer_assignment.cc:1781]               total fragmentation:    8.00MiB (27.12%)
2023-12-20 09:11:49.399553: I tensorflow/compiler/xla/service/buffer_assignment.cc:1782] Buffer assignment done.
2023-12-20 09:11:49.399597: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1049] Buffer Assignment Stats for Conv2D.9
BufferAssignment stats:
             parameter allocation:    1.25MiB
              constant allocation:         0B
        maybe_live_out allocation:    9.00MiB
     preallocated temp allocation:   19.25MiB
  preallocated temp fragmentation:       112B (0.00%)
                 total allocation:   29.50MiB
              total fragmentation:    8.00MiB (27.12%)

2023-12-20 09:11:49.407472: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for copy.2 [{}]
2023-12-20 09:11:49.407521: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <6 copy.2 @0>
2023-12-20 09:11:49.407542: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.408162: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for bitcast [{}]
2023-12-20 09:11:49.408205: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <2 Arg_1.2 @0>
2023-12-20 09:11:49.408228: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.408249: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for Arg_1.2 [{}]
2023-12-20 09:11:49.408266: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <2 Arg_1.2 @0>
2023-12-20 09:11:49.408296: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.408474: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for Arg_0.1 [{}]
2023-12-20 09:11:49.408503: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <0 Arg_0.1 @0>
2023-12-20 09:11:49.408521: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.408924: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for copy [{}]
2023-12-20 09:11:49.408966: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <1 copy @0>
2023-12-20 09:11:49.408983: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.410060: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for bitcast [{}]
2023-12-20 09:11:49.410110: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <2 Arg_1.2 @0>
2023-12-20 09:11:49.410132: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.410497: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for cudnn-conv-bw-filter [{0}]
2023-12-20 09:11:49.410542: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.410559: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.410890: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for cudnn-conv-bw-filter [{1}]
2023-12-20 09:11:49.410933: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <5 cudnn-conv-bw-filter{1} @0>
2023-12-20 09:11:49.410950: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.411450: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for get-tuple-element [{}]
2023-12-20 09:11:49.411493: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <4 cudnn-conv-bw-filter{0} @0>
2023-12-20 09:11:49.411511: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.411821: I tensorflow/compiler/xla/service/buffer_assignment.cc:413] Trying to find unique slice for copy.2 [{}]
2023-12-20 09:11:49.411863: I tensorflow/compiler/xla/service/buffer_assignment.cc:418] Examining value <6 copy.2 @0>
2023-12-20 09:11:49.411880: I tensorflow/compiler/xla/service/buffer_assignment.cc:420] Has allocation
2023-12-20 09:11:49.425292: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1103] GpuCompiler::RunBackend - IR emission time: 9.68 ms (cumulative: 9.68 ms, max: 9.68 ms, #called: 1)
2023-12-20 09:11:49.427762: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1168] GpuCompiler::RunBackend - Running LLVM verifier time: 1.63 ms (cumulative: 1.63 ms, max: 1.63 ms, #called: 1)
2023-12-20 09:11:50.128260: I tensorflow/compiler/xla/service/gpu/gpu_compiler.cc:1389] GpuCompiler::RunBackend time: 745 ms (cumulative: 745 ms, max: 745 ms, #called: 1)
2023-12-20 09:11:50.129347: I tensorflow/compiler/jit/xla_compilation_cache.cc:477] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
2023-12-20 09:11:50.135349: I tensorflow/compiler/xla/service/gpu/gpu_executable.cc:755] Buffer 0 -> 0x7fafda655200 (1048576 B)Buffer 1 -> 0x7fafda755200 (262144 B)Buffer 2 -> 0x7fafdb200000 (9437184 B)Buffer 3 -> 0x7fafdbc00000 (20185344 B)
2023-12-20 09:11:50.135767: I tensorflow/compiler/xla/service/gpu/gpu_executable.cc:365] Executing the thunk for Thunk:#hlo_op=copy,hlo_module=Conv2D.9,program_id=1#
2023-12-20 09:11:50.135863: I tensorflow/compiler/xla/service/gpu/gpu_executable.cc:365] Executing the thunk for Thunk:#hlo_op=cudnn-conv-bw-filter,hlo_module=Conv2D.9,program_id=1#
2023-12-20 09:11:50.136771: I tensorflow/compiler/xla/service/gpu/gpu_executable.cc:365] Executing the thunk for Thunk:#hlo_op=copy.2,hlo_module=Conv2D.9,program_id=1#
2023-12-20 09:11:50.137562: I tensorflow/compiler/xla/service/gpu/gpu_executable.cc:724] GpuExecutable::ExecuteAsyncOnStreamImpl(Conv2D.9) time: 4.88 ms (cumulative: 4.88 ms, max: 4.88 ms, #called: 1)
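
The numbers in the log above can be cross-checked by hand. The sketch below is only an illustration, not XLA code: the sizes are copied from the final `BufferAssignment:` dump and from the heap simulator lines, while the 128-byte alignment (`ALIGN`) and the helper names `round_up`, `totals_by_kind` and `fmt_mib` are assumptions introduced here. The alignment value is inferred from the offsets printed in the log, not taken from the source.

```python
MIB = 1024 * 1024
ALIGN = 128  # assumption: inferred from the offsets 10748032 and 128 in the log

# (index, size in bytes, kind), copied from the final "BufferAssignment:" dump.
allocations = [
    (0, 9437184, "maybe_live_out"),
    (1, 1048576, "parameter"),
    (2, 262144, "parameter"),
    (3, 20185344, "preallocated_temp"),
]


def round_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def totals_by_kind(allocs):
    """Sum allocation sizes per kind."""
    totals = {}
    for _, size, kind in allocs:
        totals[kind] = totals.get(kind, 0) + size
    return totals


def fmt_mib(n):
    """Format a byte count the way the stats block prints it."""
    return f"{n / MIB:.2f}MiB"


totals = totals_by_kind(allocations)
total_bytes = sum(size for _, size, _ in allocations)

# Heap simulation for the temp buffers of cudnn-conv-bw-filter:
workspace = 10747920   # <5 cudnn-conv-bw-filter{1}>, u8[10747920]
conv_out = 9437184     # <4 cudnn-conv-bw-filter{0}>, f32[512,3,3,512]
tuple_buf = 16         # <3 cudnn-conv-bw-filter{}>, the tuple itself

conv_out_offset = round_up(workspace)                     # offset of value 4
heap_size = conv_out_offset + conv_out                    # "Result size from heap simulator"
heap_fragmentation = heap_size - (workspace + conv_out)   # padding between the two values

# CombineTempAllocations(): the 16-byte tuple buffer goes first, padded to ALIGN,
# so every offset from the heap simulation shifts by that amount.
shift = round_up(tuple_buf)
combined_size = shift + heap_size

for kind, size in totals.items():
    print(f"{kind:>18}: {fmt_mib(size)}")
print(f"{'total':>18}: {fmt_mib(total_bytes)} ({total_bytes} bytes)")
print(f"heap size {heap_size}, fragmentation {heap_fragmentation}B")
print(f"combined temp allocation {combined_size}, "
      f"offsets {shift} and {conv_out_offset + shift}")
```

Under these assumptions the results line up with the log: parameters 1.25MiB, maybe_live_out 9.00MiB, preallocated temp 19.25MiB, total 30933248 bytes (29.50MiB), heap size 20185216 with 112B of fragmentation, and a combined temp allocation of 20185344 bytes with the values at offsets 128 and 10748160. The reported total fragmentation of 8.00MiB is likewise 27.12% of the 29.50MiB total.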
Source: https://blog.csdn.net/liuzonrze/article/details/135115212