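The scenario is the one described in the title. The main symptom is that networking drops on both the host and its VMs, and the PVE system log shows Detected Hardware Unit Hang errors: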
Jun 01 16:08:56 pve kernel: e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
TDH <f6>
TDT <19>
next_to_use <19>
next_to_clean <f5>
buffer_info[next_to_clean]:
time_stamp <100d9db25>
next_to_watch <f6>
jiffies <100dfacc0>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Jun 01 16:08:58 pve kernel: e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
TDH <f6>
TDT <19>
next_to_use <19>
next_to_clean <f5>
buffer_info[next_to_clean]:
time_stamp <100d9db25>
next_to_watch <f6>
jiffies <100dfb480>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Jun 01 16:09:00 pve kernel: e1000e 0000:00:1f.6 eno2: Detected Hardware Unit Hang:
TDH <f6>
TDT <19>
next_to_use <19>
next_to_clean <f5>
buffer_info[next_to_clean]:
time_stamp <100d9db25>
next_to_watch <f6>
jiffies <100dfbc40>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
...
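Before applying any workaround, it helps to confirm which interface is actually reporting the hang. The following is a minimal sketch, assuming the kernel messages look like the ones quoted above; the helper name hung_ifaces is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print the
# unique interface names that reported "Detected Hardware Unit Hang".
hung_ifaces() {
    # The interface name is the space-free token right before ": Detected".
    sed -n 's/.* \([^ ]*\): Detected Hardware Unit Hang:.*/\1/p' | sort -u
}

# Usage on a real host (not executed here):
#   journalctl -k | hung_ifaces
```

For the log excerpt above this should print eno2, which is the name to pass to ethtool.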
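This problem is related to the NIC's TCP checksum offload feature. The fix is to turn checksum offload off with the ethtool utility (replace eth0 with the affected interface, which is eno2 in the log above):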
ethtool -K eth0 tx off rx off
# Or disable TSO directly
ethtool -K eth0 tso off
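The commands above disable checksum offload on the given NIC only temporarily; the setting is lost on reboot.
To make it persist across reboots, write the command to the file /etc/network/if-up.d/ethtool_distable_tso and make that file executable: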
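To check that the change took effect, the current offload state can be read back with ethtool -k (lowercase k). The sketch below is only an illustration: show_offloads is a hypothetical helper name, and it filters text from stdin so that it can be tried without a real NIC.

```shell
#!/bin/sh
# Hypothetical helper: keep only the top-level offload settings that
# this workaround touches from `ethtool -k <iface>` output on stdin.
show_offloads() {
    grep -E '^(rx-checksumming|tx-checksumming|tcp-segmentation-offload):'
}

# Usage on a real host (eth0 is a placeholder, not executed here):
#   ethtool -k eth0 | show_offloads
```

After the workaround is applied, the lines printed for the interface should read off instead of on.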
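#!/bin/sh
# Disable TX/RX checksum offload whenever an interface is brought up
# (eth0 is a placeholder; use the affected interface)
ethtool -K eth0 tx off rx off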
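Scripts in /etc/network/if-up.d are run for every interface that comes up, so a slightly more careful variant would act only on the affected NIC. This is a sketch, not the original author's script. It assumes the network tooling exports the interface name in the IFACE environment variable, as classic ifupdown does. TARGET_IFACE and ETHTOOL are hypothetical variables introduced here for illustration.

```shell
#!/bin/sh
# Variant of the hook that only touches one interface.
# TARGET_IFACE and ETHTOOL are illustrative names, not standard settings.
TARGET_IFACE="${TARGET_IFACE:-eth0}"   # placeholder; eno2 in the log above
ETHTOOL="${ETHTOOL:-ethtool}"          # overridable, e.g. ETHTOOL=echo for a dry run

disable_offload() {
    # $1 is the interface that was just brought up; skip all others.
    [ "$1" = "$TARGET_IFACE" ] || return 0
    "$ETHTOOL" -K "$1" tx off rx off
}

# ifupdown passes the interface name via IFACE (assumption, see above).
disable_offload "${IFACE:-}"
```

Running it with ETHTOOL=echo and IFACE=eth0 prints the ethtool arguments instead of changing anything, which is a cheap way to test the hook before rebooting.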
References:
Fixing host NIC hangs caused by using a Virtio NIC with FreeNAS under KVM (解决FreeNAS under KVM使用Virtio网卡导致宿主机网卡Hang的问题)
Cluster losing network connection when a single VM is heavily using the network
e1000e Reset adapter unexpectedly / Detected Hardware Unit Hang
Network outage caused by heavy SMB traffic on a NAS system deployed on PVE
https://ailitonia.com/archives/pve-%e9%83%a8%e7%bd%b2-nas-%e7%b3%bb%e7%bb%9f%e4%b8%ad-smb-%e5%a4%a7%e6%b5%81%e9%87%8f%e5%af%bc%e8%87%b4%e7%bd%91%e7%bb%9c%e4%b8%ad%e6%96%ad%e7%9a%84%e9%97%ae%e9%a2%98/