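GCN and GAT only alleviate the over-smoothing problem; neither network reaches real depth. SGC uses the k-th power of the graph convolution matrix in a single-layer network to try to capture higher-order neighbourhood information. PPNP and APPNP replace the graph convolution matrix with a personalized PageRank matrix to overcome over-smoothing. However, these methods aggregate neighbourhood representations linearly at every layer and so lose the expressive power of deep nonlinear architectures, which means they are still shallow networks. Can graph neural networks really go deep? The answer is yes!

The DAGNN model introduced in this post addresses over-smoothing and lets the network go deeper. DAGNN decouples transformation from propagation. After k propagation steps, information from both local and large neighbourhoods is available, and an adaptive adjustment mechanism balances the local neighbourhood information against the global neighbourhood information.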
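To make the over-smoothing effect concrete, the following is a minimal sketch, not taken from the original post, that propagates random features repeatedly over a small graph and measures how far apart the node representations remain. The 4-node ring graph, the helper name `spread_after`, and the chosen values of k are all assumptions made for illustration. On this regular toy graph the representations collapse towards a single vector.

```python
import torch

torch.manual_seed(0)

# Assumed toy graph: a 4-node ring. With self-loops every node has degree 3,
# so D^{-1/2} (A + I) D^{-1/2} reduces to (A + I) / 3.
ring = torch.tensor([[0., 1., 0., 1.],
                     [1., 0., 1., 0.],
                     [0., 1., 0., 1.],
                     [1., 0., 1., 0.]])
adj_norm = (ring + torch.eye(4)) / 3.0


def spread_after(k: int, feats: torch.Tensor) -> float:
    """Largest per-column gap between any two nodes after k propagation steps."""
    h = feats
    for _ in range(k):
        h = adj_norm @ h
    return (h.max(dim=0).values - h.min(dim=0).values).max().item()


feats = torch.randn(4, 3)
spreads = {k: spread_after(k, feats) for k in (0, 1, 5, 50)}
print(spreads)  # the gap shrinks towards 0 as k grows
```

On graphs with unequal degrees, the symmetric normalization does not drive all rows to the same vector, but the rows still become increasingly hard to tell apart, which is the behaviour the term over-smoothing refers to.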
Most graph convolution operations aggregate neighbourhood representations through neighbourhood propagation and then apply a transformation. The $l$-th layer of an ordinary graph convolution can be described as:

$$a_i^{(l)} = \text{PROPAGATION}^{(l)}\left(\left\{x_i^{(l-1)}, \left\{x_j^{(l-1)} \mid j \in \mathcal{N}_i\right\}\right\}\right)$$

$$x_i^{(l)} = \text{TRANSFORMATION}^{(l)}\left(a_i^{(l)}\right)$$
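The two steps above can be sketched as a single GCN-style layer. This is an illustrative sketch rather than code from the post: the class name `CoupledGCNLayer`, the row-normalized toy adjacency `adj_norm`, and the choice of ReLU are assumptions.

```python
import torch
import torch.nn as nn


class CoupledGCNLayer(nn.Module):
    """One layer that propagates and then transforms (hypothetical name)."""

    def __init__(self, in_dim: int, out_dim: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj_norm: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        a = adj_norm @ x                     # PROPAGATION: aggregate node i and its neighbours
        return torch.relu(self.linear(a))    # TRANSFORMATION: learnable map plus nonlinearity


torch.manual_seed(0)
# Assumed toy input: path graph 0-1-2 with self-loops, rows normalized to sum to 1.
adj_norm = torch.tensor([[1 / 2, 1 / 2, 0.0],
                         [1 / 3, 1 / 3, 1 / 3],
                         [0.0, 1 / 2, 1 / 2]])
x = torch.randn(3, 4)
layer = CoupledGCNLayer(in_dim=4, out_dim=2)
out = layer(adj_norm, x)
print(out.shape)
```

Because each layer carries its own weight matrix, stacking such layers to reach a k-hop receptive field also stacks k sets of parameters.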
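Given these operations, what is the problem? First, intuitively, representation transformation and representation propagation are entangled: the parameters of the transformation are tied to the receptive field of the propagation. One hop of neighbourhood requires one transformation function, so covering a larger neighbourhood requires more parameters, which makes a deep graph neural network with many parameters hard to train. Second, in node classification, a two-layer multilayer perceptron that ignores the graph structure and takes only the initial features $X$ as input can already achieve decent performance. Propagation over the graph structure then eases the classification task by making the representations of nodes in the same class more similar. From the viewpoint of features and graph structure, therefore, transformation and propagation play different roles. DAGNN adopts the strategy of decoupling propagation from transformation, which gives the following model: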

$$Z = \text{MLP}(X)$$

$$X_{out} = \text{softmax}(\hat{A}^k Z)$$
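A minimal sketch of this decoupled model, assuming a dense normalized adjacency and made-up dimensions, shows that the number of trainable parameters does not depend on k. The toy ring graph and all variable names here are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_nodes, in_dim, hid_dim, num_classes, k = 4, 5, 8, 3, 10

# Assumed toy graph: 4-node ring. With self-loops all degrees are 3,
# so the symmetric normalization reduces to (A + I) / 3.
ring = torch.tensor([[0., 1., 0., 1.],
                     [1., 0., 1., 0.],
                     [0., 1., 0., 1.],
                     [1., 0., 1., 0.]])
adj_norm = (ring + torch.eye(num_nodes)) / 3.0

mlp = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.ReLU(),
                    nn.Linear(hid_dim, num_classes))

X = torch.randn(num_nodes, in_dim)
Z = mlp(X)                      # transformation: uses only node features
H = Z
for _ in range(k):              # propagation: k parameter-free steps, i.e. A_hat^k Z
    H = adj_norm @ H
X_out = torch.softmax(H, dim=1)

n_params = sum(p.numel() for p in mlp.parameters())
print(X_out.shape, n_params)    # the parameter count is the same for any k
```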
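However, using only the representation after k propagation steps and ignoring the intermediate representations makes it hard to capture enough of the key neighbourhood information, and it also brings in more global information that dilutes the local information. For this reason, after propagation DAGNN uses an adaptive adjustment mechanism to balance local and global neighbourhood information, deciding how much of the information gathered at each propagation step should be retained in the final representation of each node. The DAGNN model is formulated as follows, and the model architecture is shown in the figure below.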

$$Z = \text{MLP}(X)$$

$$H_l = \hat{A}^l Z, \quad l = 1, 2, \ldots, k$$

$$H = \text{stack}(Z, H_1, \ldots, H_k)$$

$$S = \sigma(Hs)$$

$$\tilde{S} = \text{reshape}(S)$$

$$X_{out} = \text{softmax}(\text{squeeze}(\tilde{S}H))$$
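The tensor shapes in these equations are easy to lose track of, so here is a small shape walk-through. It is a sketch under assumptions: the projection vector `s` is random rather than trained, and the toy ring graph and sizes are made up.

```python
import torch

torch.manual_seed(0)
n, c, k = 4, 3, 2                          # nodes, classes, propagation steps

# Assumed toy graph: 4-node ring with self-loops, normalized as (A + I) / 3.
ring = torch.tensor([[0., 1., 0., 1.],
                     [1., 0., 1., 0.],
                     [0., 1., 0., 1.],
                     [1., 0., 1., 0.]])
adj_norm = (ring + torch.eye(n)) / 3.0

Z = torch.randn(n, c)                      # stands in for MLP(X)
hops = [Z]
for _ in range(k):
    hops.append(adj_norm @ hops[-1])       # H_l = A_hat^l Z
H = torch.stack(hops, dim=1)               # (n, k+1, c)

s = torch.randn(c, 1)                      # trainable in the real model; random here
S = torch.sigmoid(H @ s)                   # (n, k+1, 1): one retention score per node and hop
S_tilde = S.transpose(1, 2)                # (n, 1, k+1)
out = (S_tilde @ H).squeeze(1)             # (n, c): score-weighted sum over hops
X_out = torch.softmax(out, dim=1)

# The batched matmul equals an explicit weighted sum over the hop dimension.
manual = (S * H).sum(dim=1)
print(H.shape, S.shape, S_tilde.shape, out.shape)
```

Since the scores come from a sigmoid rather than a softmax over hops, they are not forced to sum to 1 for a node; each hop is gated independently.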
This section reproduces DAGNN in code. Only the model code is shown below; for the complete source, see the Baidu Cloud link.
Link: https://pan.baidu.com/s/1DofAHZbSp5Zf4uKMlriaBA
Extraction code: 6666
```python
import torch
import torch.nn as nn
from torch.nn import functional as F


class DAGNN(nn.Module):
    def __init__(self, input_dim, hid_dim, output_dim, model, k, dropout):
        super(DAGNN, self).__init__()
        self.input_dim = input_dim
        self.hid_dim = hid_dim
        self.output_dim = output_dim
        self.model = model      # transformation network, e.g. the MLP below
        self.k = k              # number of propagation steps
        self.dropout = dropout
        # Option 1: an explicit projection vector s
        # self.s = nn.Parameter(torch.empty(size=(output_dim, 1)))
        # nn.init.xavier_uniform_(self.s.data, gain=1.414)
        # Option 2: a linear layer that maps each hop representation to a score
        self.project = nn.Linear(output_dim, 1)
        self.init_param()

    def init_param(self):
        # nn.Linear's default (Kaiming-uniform) initialization
        self.project.reset_parameters()

    def forward(self, feature, adj):
        Z = self.model(feature)                  # (N, C)
        prop_matrix = [Z]
        for _ in range(self.k):
            Z = torch.mm(adj, Z)                 # one propagation step
            prop_matrix.append(Z)
        H = torch.stack(prop_matrix, dim=1)      # (N, k+1, C)
        # S = torch.sigmoid(torch.matmul(H, self.s))
        S = torch.sigmoid(self.project(H))       # (N, k+1, 1)
        # transpose to (N, 1, k+1)
        S = S.transpose(2, 1)
        # squeeze only dim 1 so that N == 1 or C == 1 is not squeezed away
        out = torch.matmul(S, H).squeeze(1)      # (N, C)
        return F.log_softmax(out, dim=1)


class MLP(nn.Module):
    def __init__(self, input_dim, hid_dim, output_dim, dropout):
        super(MLP, self).__init__()
        self.input_dim = input_dim
        self.hid_dim = hid_dim
        self.output_dim = output_dim
        self.dropout = dropout
        self.layer1 = nn.Linear(input_dim, hid_dim)
        self.layer2 = nn.Linear(hid_dim, output_dim)
        self.init_param()

    def init_param(self):
        self.layer1.reset_parameters()
        self.layer2.reset_parameters()

    def forward(self, X):
        X = F.dropout(X, self.dropout, training=self.training)
        X = self.layer1(X)
        X = F.relu(X)
        X = F.dropout(X, self.dropout, training=self.training)
        X = self.layer2(X)
        return X

    def __repr__(self) -> str:
        return self.__class__.__name__
```
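The `forward` method above takes an `adj` argument, but the post does not show how it is built. A minimal sketch follows, assuming `adj` is the symmetrically normalized adjacency with self-loops, $\hat{A} = D^{-1/2}(A + I)D^{-1/2}$, as in the usual GCN convention. The helper name `normalized_adjacency` and the `edge_index` layout (a 2 x E tensor of undirected edges without self-loops) are hypothetical.

```python
import torch


def normalized_adjacency(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Dense D^{-1/2} (A + I) D^{-1/2} for an undirected graph (hypothetical helper).

    Assumes edge_index has shape (2, E) and contains no self-loops.
    """
    src, dst = edge_index
    adj = torch.zeros(num_nodes, num_nodes)
    adj[src, dst] = 1.0
    adj[dst, src] = 1.0                          # make the graph symmetric
    adj = adj + torch.eye(num_nodes)             # add self-loops
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)      # degrees are >= 1 thanks to self-loops
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)


# Path graph 0-1-2: edges (0, 1) and (1, 2).
edge_index = torch.tensor([[0, 1],
                           [1, 2]])
adj_hat = normalized_adjacency(edge_index, num_nodes=3)
print(adj_hat)
```

A dense matrix is only practical for small graphs; for larger ones a sparse representation would be the natural choice.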
```python
# DGL-based implementation
import os
os.environ["DGLBACKEND"] = "pytorch"
import dgl
import dgl.function as fn
import torch
import torch.nn as nn
import torch.nn.functional as F


class DAGNNLayer(nn.Module):
    def __init__(self, infeat, k) -> None:
        super(DAGNNLayer, self).__init__()
        self.k = k
        self.s = nn.Linear(infeat, 1, bias=False)
        self.s.reset_parameters()

    def forward(self, feat, g):
        # add self-loops first so that local_scope() guards the graph actually used
        g = dgl.add_self_loop(g)
        with g.local_scope():
            results = [feat]
            # normalization coefficients D^{-1/2}
            degs = g.in_degrees().to(feat).clamp(min=1)
            norm = torch.pow(degs, -0.5).unsqueeze(1)
            for _ in range(self.k):
                feat = feat * norm
                g.ndata['h'] = feat
                # sum the source features over incoming edges
                g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h'))
                feat = g.ndata['h']
                feat = feat * norm
                results.append(feat)
            H = torch.stack(results, dim=1)        # (N, k+1, C)
            S = torch.sigmoid(self.s(H))           # (N, k+1, 1)
            S = S.permute(0, 2, 1)                 # (N, 1, k+1)
            out = torch.matmul(S, H).squeeze(1)    # (N, C)
            return out


class DAGNN(nn.Module):
    def __init__(self, infeat, hidfeat, outfeat, dropout, k) -> None:
        super(DAGNN, self).__init__()
        self.dropout = dropout
        self.layer1 = nn.Linear(infeat, hidfeat, bias=False)
        self.layer2 = nn.Linear(hidfeat, outfeat, bias=False)
        self.DAGNNLayer = DAGNNLayer(outfeat, k)
        self.init_param()

    def init_param(self):
        self.layer1.reset_parameters()
        self.layer2.reset_parameters()

    def forward(self, x, g):
        x = F.dropout(x, self.dropout, training=self.training)
        x = F.relu(self.layer1(x))
        x = F.dropout(x, self.dropout, training=self.training)
        x = self.layer2(x)
        x = self.DAGNNLayer(x, g)
        return F.log_softmax(x, dim=1)
```
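The propagation loop in `DAGNNLayer.forward` scales the features by `norm`, sums them over incoming edges, and scales by `norm` again. The sketch below checks, with plain PyTorch and no DGL dependency, that one such step matches multiplication by $D^{-1/2}(A + I)D^{-1/2}$. It assumes that the `copy_u`/`sum` aggregation can be modelled as a product with a dense 0/1 adjacency matrix; the toy graph is made up.

```python
import torch

torch.manual_seed(0)

# Assumed toy graph: path 0-1-2 with self-loops already added, as a dense 0/1 matrix.
adj = torch.tensor([[1., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 1.]])
feat = torch.randn(3, 2)

degs = adj.sum(dim=1).clamp(min=1)           # in-degrees, as in the DGL code
norm = degs.pow(-0.5).unsqueeze(1)           # (N, 1)

# One loop iteration: scale, sum over in-neighbours, scale again.
step = norm * (adj @ (norm * feat))

# The same step written as a single normalized adjacency matrix.
adj_hat = norm * adj * norm.T
reference = adj_hat @ feat

print(torch.allclose(step, reference, atol=1e-6))
```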