今日对数据库检查发现如下错误
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 6 of thread 1
ORA-00312: online log 6 thread 1: '+REDO/orcldb/ONLINELOG/group_6.266.1070296359'
ORA-17503: ksfdopn:2 Failed to open file +REDO/orcldb/ONLINELOG/group_6.266.1070296359
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:32.788217+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 7 of thread 1
ORA-00312: online log 7 thread 1: '+DGDATA4/orcldb/ONLINELOG/group_7.262.1066409025'
ORA-17503: ksfdopn:2 Failed to open file +DGDATA4/orcldb/ONLINELOG/group_7.262.1066409025
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:42.795004+08:00
Errors in file /u01/app/oracle/diag/rdbms/orcldb/orcldb1/incident/incdir_4510778/orcldb1_m004_322434_i4510778.trc:
ORA-00313: open failed for members of log group 8 of thread 2
ORA-00312: online log 8 thread 2: '+REDO/orcldb/ONLINELOG/group_8.267.1070296539'
ORA-17503: ksfdopn:2 Failed to open file +REDO/orcldb/ONLINELOG/group_8.267.1070296539
ORA-15064: communication failure with ASM instance
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
2023-12-19T03:03:52.801743+08:00
?
[root@orcl01 ~]# numactl --hardware
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 0 size: 195001 MB
node 0 free: 1759 MB
node 1 cpus: 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
node 1 size: 196608 MB
node 1 free: 629 MB
node 2 cpus: 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139
node 2 size: 131072 MB
node 2 free: 613 MB
node 3 cpus: 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159
node 3 size: 131072 MB
node 3 free: 465 MB
node distances:
node ? 0 ? 1 ? 2 ? 3?
? 0: ?10 ?21 ?21 ?21?
? 1: ?21 ?10 ?21 ?21?
? 2: ?21 ?21 ?10 ?21?
? 3: ?21 ?21 ?21 ?10?
[root@orcl01 ~]# ip ad sh
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
? ? link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
? ? inet 127.0.0.1/8 scope host lo
? ? ? ?valid_lft forever preferred_lft forever
? ? inet6 ::1/128 scope host?
? ? ? ?valid_lft forever preferred_lft forever
2: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
? ? link/ether c4:b8:b4:2e:b6:f0 brd ff:ff:ff:ff:ff:ff
3: ens3f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
? ? link/ether 34:0a:98:9c:ed:37 brd ff:ff:ff:ff:ff:ff
? ? inet 10.2.0.101/24 brd 10.2.0.255 scope global ens3f0
? ? ? ?valid_lft forever preferred_lft forever
? ? inet 169.254.22.17/19 brd 169.254.31.255 scope global ens3f0:1
? ? ? ?valid_lft forever preferred_lft forever
? ? inet6 fe80::360a:98ff:fe9c:ed37/64 scope link?
? ? ? ?valid_lft forever preferred_lft forever
4: ens3f1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
? ? link/ether 34:0a:98:9c:ed:38 brd ff:ff:ff:ff:ff:ff
5: eno3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
? ? link/ether c4:b8:b4:2e:b6:f1 brd ff:ff:ff:ff:ff:ff
? ? inet 10.228.224.4/26 brd 10.228.224.63 scope global eno3
? ? ? ?valid_lft forever preferred_lft forever
? ? inet 10.228.224.10/26 brd 10.228.224.63 scope global secondary eno3:1
? ? ? ?valid_lft forever preferred_lft forever
? ? inet 10.228.224.12/26 brd 10.228.224.63 scope global secondary eno3:2
? ? ? ?valid_lft forever preferred_lft forever
? ? inet6 2409:8002:5a06:120:10:0:2:d00c/116 scope global deprecated?
? ? ? ?valid_lft forever preferred_lft 0sec
? ? inet6 2409:8002:5a06:120:10:0:2:d00a/116 scope global deprecated?
? ? ? ?valid_lft forever preferred_lft 0sec
? ? inet6 2409:8002:5a06:120:10:0:2:d004/116 scope global?
? ? ? ?valid_lft forever preferred_lft forever
? ? inet6 fe80::c6b8:b4ff:fe2e:b6f1/64 scope link?
? ? ? ?valid_lft forever preferred_lft forever
6: eno1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
? ? link/ether c4:b8:b4:2e:b6:f2 brd ff:ff:ff:ff:ff:ff
7: eno2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
? ? link/ether c4:b8:b4:2e:b6:f3 brd ff:ff:ff:ff:ff:ff
8: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
? ? link/ether 52:54:00:76:94:f3 brd ff:ff:ff:ff:ff:ff
? ? inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
? ? ? ?valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc fq_codel master virbr0 state DOWN group default qlen 1000
? ? link/ether 52:54:00:76:94:f3 brd ff:ff:ff:ff:ff:ff
[root@orcl01 ~]#?
很早以前遇到过IPC这样的问题无数,
这个就是咱们现在大内存的主机下,缓冲区碎片整理单元有点小,并且回环地址的mtu值有问题。我们按照官方MOS文档指示进行更改即可;
On servers with High Physical Memory, the parameter vm.min_free_kbytes should be set in the order of 0.4% of total Physical Memory. This helps in keeping a larger range of defragmented memory pages available for network buffers reducing the probability of a low-buffer-space conditions.
*** For example, on a server which is having 256GB RAM, the parameter vm.min_free_kbytes should be set to 1073742 ***
On NUMA Enabled Systems, the value of vm.min_free_kbytes should be multiplied by the number of NUMA nodes since the value is to be split across all the nodes.
On NUMA Enabled Systems, the value of vm.min_free_kbytes = n * 0.4% of total Physical Memory. Here ‘n’ is the number of NUMA nodes.
Additionally, the MTU value should be modified as below
#ifconfig lo mtu 16436
To make the change persistent over reboot add the following line in the file /etc/sysconfig/network-scripts/ifcfg-lo :
MTU=16436
Save the file and restart the network service to load the changes
#service network restart
Note : While making the changes in CRS nodes, if network is restarted while CRS is up, it can hung CRS. So cluster services should be stopped prior to the network restart.
mtype: 61 process 14152 failed because of a resource problem in the OS. The OS has most likely run out of buffers (rval: 4)
Errors in file /oracle/admin/diag/rdbms/<DB>/<SID>/trace/<SID>_ora_14152.trc ?(incident=249601):
ORA-00603: ORACLE server session terminated by fatal error
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:sendmsg failed with status: 105
ORA-27301: OS failure message: No buffer space available
ORA-27302: failure occurred at: sskgxpsnd2
Incident details in: /oracle/admin/diag/rdbms/<DB>/<SID>/incident/incdir_249601/<SID>_ora_14152_i249601.trc
This happens due to less space available for network buffer reservation.根据查询数据库metalink发现需要对主机参数进行调整优化。 ???????