Dealing with the igb NIC driver dropping the network

The problem looks like the device "disappears" from the PCI bus and becomes inaccessible to the driver. If it happens early, the driver will not load; if it happens later, the driver may fail with sporadic register access errors.
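
A quick way to tell whether the device has dropped off the bus is to check if its sysfs entry still exists. This is only a sketch: `nic_on_bus` is a hypothetical helper name, and the address 0000:07:00.0 is the one used in the reset script further down, so it has to be adjusted to the actual NIC (see `lspci`).

```shell
#!/bin/bash

# Hypothetical helper: succeed if the given PCI address is still visible in sysfs.
# The second argument (sysfs directory) is optional and exists only to make testing easy.
nic_on_bus() {
    local addr="$1"
    local sysfs_root="${2:-/sys/bus/pci/devices}"
    [ -e "$sysfs_root/$addr" ]
}

# 0000:07:00.0 is assumed from the reset script in this note; change it for your system.
if nic_on_bus "0000:07:00.0"; then
    echo "NIC is present on the PCI bus"
else
    echo "NIC is missing from the PCI bus"
fi
```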

cat /sys/module/pcie_aspm/parameters/policy
grubby --update-kernel ALL --args "pcie_aspm.policy=performance pcie_port_pm=off pcie_aspm=off"
dracut -f
ethtool -K eno1 tso off gso off gro off
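
The kernel arguments only take effect after a reboot, so it may be worth confirming that the running kernel actually received them. A minimal sketch, assuming the three arguments from the grubby command above; `cmdline_has_args` is a hypothetical helper, not part of any tool.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if every expected argument appears
# as a whole word in the given kernel command line string.
cmdline_has_args() {
    local cmdline="$1"; shift
    local arg
    for arg in "$@"; do
        case " $cmdline " in
            *" $arg "*) ;;                          # argument found, keep checking
            *) echo "missing: $arg"; return 1 ;;    # report the first missing argument
        esac
    done
    return 0
}

# /proc/cmdline holds the arguments the running kernel was booted with.
if cmdline_has_args "$(cat /proc/cmdline 2>/dev/null)" \
        pcie_aspm.policy=performance pcie_port_pm=off pcie_aspm=off; then
    echo "all ASPM arguments are active"
fi
```

The offload settings from the ethtool line can be inspected in a similar spirit with `ethtool -k eno1` (lowercase `-k` lists the current feature state).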

cat /root/resetnic.sh

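#!/bin/bash

# Watch the journal and re-probe the NIC whenever the register read failure shows up.
gg_intel() {
    # -n 0 skips existing entries, so only new errors trigger a reset
    journalctl -f -n 0 | while IFS= read -r line; do
        if echo "$line" | grep -q "Failed to read reg 0xc030!"; then
            # Remove the device from the PCI bus, then rescan to bring it back
            echo 1 > "/sys/bus/pci/devices/0000:07:00.0/remove"
            echo 1 > "/sys/bus/pci/rescan"

            echo "[$(date)] NIC Was reset!" >> /root/resetnic.log
        fi
    done
}
gg_intel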
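
The string match used by the script can be tried without touching the hardware. The sketch below counts matching lines on stdin; `count_reg_failures` is a hypothetical name, and the sample input lines are made up for the demonstration, not real driver output.

```shell
#!/bin/bash

# Hypothetical helper: count stdin lines containing the message the reset script watches for.
# -F treats the pattern as a fixed string; "|| true" keeps the exit status 0 when the count is 0.
count_reg_failures() {
    grep -c -F 'Failed to read reg 0xc030!' || true
}

# Made-up sample input: one unrelated line and one line carrying the message.
printf '%s\n' 'unrelated line' 'test prefix: Failed to read reg 0xc030!' | count_reg_failures
# prints 1
```

Piping the kernel messages of the current boot into it (`journalctl -k -b | count_reg_failures`) gives a rough idea of how often the failure has occurred.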

cat /etc/systemd/system/resetnic.service

[Unit]
Description=Reset NIC Service
After=network.target

[Service]
Type=simple
ExecStart=/bin/bash /root/resetnic.sh

[Install]
WantedBy=multi-user.target

systemctl daemon-reload

sudo systemctl enable resetnic.service
sudo systemctl start resetnic.service