16. nfs not responding,8BD问题

16.1. 复现过程

8DB问题复现的条件是:

nfs服务端 nfs客户端 现象
cpu20-RHEL7.6 kernel-alt-4.14.0-115.el7a cpu16 ubuntu 18.04 出现
cpu20-RHEL7.6 kernel-alt-4.14.0-115.7.1.el7 a cpu16 ubuntu 18.04 不出现
cpu16-RHEL7.6 kernel-alt-4.14.0-115.el7a cpu20 RHEL7.6 kernel-alt-4.14.0-115.el7a 出现
cpu16-RHEL7.6 kernel-alt-4.14.0-115.el7a cpu20 RHEL7.6 kernel-alt-4.14.0-115.7.1.el 7a 不出现

系统信息:

[root@readhat76 ~]#cat /etc/os-release
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.6"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.6 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.6:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"

REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.6"
[root@readhat76 ~]#

16.2. nfs 服务端设置

[root@readhat76 ~]#cat /etc/exports
/root/nfs-test-dir *(rw,sync,no_root_squash)

16.3. nfs 客户端设置

[root@readhat76 ~]#mount -o vers=3 root@192.168.1.215:/root/nfs-test-dir /root/nfs-client-dir

[root@readhat76 ~]#df
Filesystem                       1K-blocks     Used  Available Use% Mounted on
devtmpfs                         267835008        0  267835008   0% /dev
tmpfs                            267845760        0  267845760   0% /dev/shm
tmpfs                            267845760    41728  267804032   1% /run
tmpfs                            267845760        0  267845760   0% /sys/fs/cgroup
/dev/mapper/rhel_readhat76-root   52403200 12393324   40009876  24% /
/dev/sdb2                          1038336   127428     910908  13% /boot
/dev/sdb1                           204580     7944     196636   4% /boot/efi
/dev/mapper/rhel_readhat76-home 3847258716    33008 3847225708   1% /home
tmpfs                             53569216        0   53569216   0% /run/user/0
/dev/loop0                         3109414  3109414          0 100% /mnt/cd_redhat7.6
localhost:/root/nfs-test-dir      52403200 12392448   40010752  24% /root/nfs-client-dir

16.4. 在nfs客户段编译内核源码

源码需要位于挂载的目录下

wget https://cdn.kernel.org/pub/linux/kernel/v5.x/linux-5.0.3.tar.xz
xz -d linux-5.0.3.tar.xz

make defconfig

make -j48

16.5. 复现成功

在nfs客户端编译停止

me@ubuntu:~/nfs-client-dir/linux-5.0.3$ sudo make -j48
  WRAP    arch/arm64/include/generated/uapi/asm/kvm_para.h
  WRAP    arch/arm64/include/generated/uapi/asm/errno.h
  WRAP    arch/arm64/include/generated/uapi/asm/ioctl.h
  WRAP    arch/arm64/include/generated/uapi/asm/ioctls.h
  WRAP    arch/arm64/include/generated/uapi/asm/ipcbuf.h
  WRAP    arch/arm64/include/generated/uapi/asm/mman.h
  WRAP    arch/arm64/include/generated/uapi/asm/msgbuf.h
  WRAP    arch/arm64/include/generated/uapi/asm/poll.h
  WRAP    arch/arm64/include/generated/uapi/asm/resource.h
  WRAP    arch/arm64/include/generated/uapi/asm/sembuf.h
  WRAP    arch/arm64/include/generated/uapi/asm/shmbuf.h
  WRAP    arch/arm64/include/generated/uapi/asm/siginfo.h
  UPD     include/config/kernel.release
  WRAP    arch/arm64/include/generated/uapi/asm/socket.h
  WRAP    arch/arm64/include/generated/uapi/asm/sockios.h
  WRAP    arch/arm64/include/generated/uapi/asm/swab.h
  WRAP    arch/arm64/include/generated/uapi/asm/termbits.h
  WRAP    arch/arm64/include/generated/uapi/asm/termios.h
  WRAP    arch/arm64/include/generated/uapi/asm/types.h
  UPD     include/generated/uapi/linux/version.h

在nfs客户端出现

me@ubuntu:~$ dmesg -T
[Thu Mar 21 15:17:02 2019] nfsacl: server 192.168.1.215 not responding, still trying
[Thu Mar 21 15:17:02 2019] nfsacl: server 192.168.1.215 not responding, still trying

在nfs服务端出现

[root@redhat76 linux-5.0.3]# dmesg -T
[Thu Mar 21 15:19:36 2019] rpc-srv/tcp: nfsd: got error -11 when sending 116 bytes - shutting down socket
[Thu Mar 21 15:21:15 2019] rpc-srv/tcp: nfsd: got error -11 when sending 116 bytes - shutting down socket

其中make的call stack是:

[Sat Apr 13 17:50:11 2019] [<ffff000008085e24>] __switch_to+0x8c/0xa8
[Sat Apr 13 17:50:11 2019] [<ffff000008828f18>] __schedule+0x328/0x860
[Sat Apr 13 17:50:11 2019] [<ffff000008829484>] schedule+0x34/0x8c
[Sat Apr 13 17:50:11 2019] [<ffff000000ef009c>] rpc_wait_bit_killable+0x2c/0xb8 [sunrpc]
[Sat Apr 13 17:50:11 2019] [<ffff000008829a7c>] __wait_on_bit+0xac/0xe0
[Sat Apr 13 17:50:11 2019] [<ffff000008829b58>] out_of_line_wait_on_bit+0xa8/0xcc
[Sat Apr 13 17:50:11 2019] [<ffff000000ef132c>] __rpc_execute+0x114/0x468 [sunrpc]
[Sat Apr 13 17:50:11 2019] [<ffff000000ef1a58>] rpc_execute+0x7c/0x10c [sunrpc]
[Sat Apr 13 17:50:11 2019] [<ffff000000ee1150>] rpc_run_task+0x118/0x168 [sunrpc]
[Sat Apr 13 17:50:11 2019] [<ffff000000ee3b44>] rpc_call_sync+0x6c/0xc0 [sunrpc]
[Sat Apr 13 17:50:11 2019] [<ffff000000de09dc>] nfs3_rpc_wrapper.constprop.11+0x78/0xd4 [nfsv3]
[Sat Apr 13 17:50:11 2019] [<ffff000000de1fd4>] nfs3_proc_getattr+0x70/0xec [nfsv3]
[Sat Apr 13 17:50:11 2019] [<ffff000002c7c114>] __nfs_revalidate_inode+0xf8/0x384 [nfs]
[Sat Apr 13 17:50:11 2019] [<ffff000002c755dc>] nfs_do_access+0x194/0x430 [nfs]
[Sat Apr 13 17:50:11 2019] [<ffff000002c75a48>] nfs_permission+0x15c/0x21c [nfs]
[Sat Apr 13 17:50:11 2019] [<ffff0000082adf08>] __inode_permission+0x98/0xf4
[Sat Apr 13 17:50:11 2019] [<ffff0000082adf94>] inode_permission+0x30/0x6c
[Sat Apr 13 17:50:11 2019] [<ffff0000082b10e4>] link_path_walk+0x7c/0x4ac
[Sat Apr 13 17:50:11 2019] [<ffff0000082b164c>] path_lookupat+0xac/0x230
[Sat Apr 13 17:50:11 2019] [<ffff0000082b29a4>] filename_lookup+0x90/0x158
[Sat Apr 13 17:50:11 2019] [<ffff0000082b2b9c>] user_path_at_empty+0x58/0x64
[Sat Apr 13 17:50:11 2019] [<ffff0000082a7b08>] vfs_statx+0x98/0x108
[Sat Apr 13 17:50:11 2019] [<ffff0000082a810c>] SyS_newfstatat+0x50/0x88

获取call_stack的办法是:

echo "w" > /proc/sysrq-trigger
dmesg

完整的log可以查看[8DB]

16.6. 编译内核进行验证

根据 [redhat 编译内核] 编译新内核并安装。

16.7. 重新验证

成功编译内核

  LD [M]  sound/soc/meson/snd-soc-meson-axg-tdm-formatter.ko
  LD [M]  sound/soc/meson/snd-soc-meson-axg-tdm-interface.ko
  LD [M]  sound/soc/meson/snd-soc-meson-axg-tdmin.ko
  LD [M]  sound/soc/meson/snd-soc-meson-axg-tdmout.ko
  LD [M]  sound/soc/meson/snd-soc-meson-axg-toddr.ko
  LD [M]  sound/soc/rockchip/snd-soc-rk3399-gru-sound.ko
  LD [M]  sound/soc/rockchip/snd-soc-rockchip-i2s.ko
  LD [M]  sound/soc/rockchip/snd-soc-rockchip-pcm.ko
  LD [M]  sound/soc/rockchip/snd-soc-rockchip-rt5645.ko
  LD [M]  sound/soc/rockchip/snd-soc-rockchip-spdif.ko
  LD [M]  sound/soc/sh/rcar/snd-soc-rcar.ko
me@ubuntu:~/nfs-client-dir/linux-5.0.3$
me@ubuntu:~/nfs-client-dir/linux-5.0.3$
me@ubuntu:~/nfs-client-dir/linux-5.0.3$
me@ubuntu:~/nfs-client-dir/linux-5.0.3$ ls
arch   built-in.a  COPYING  crypto         drivers   fs       init  Kbuild   kernel  LICENSES     Makefile  modules.builtin  Module.symvers  README   scripts   sound       tools  virt     vmlinux.o
block  certs       CREDITS  Documentation  firmware  include  ipc   Kconfig  lib     MAINTAINERS  mm        modules.order    net             samples  security  System.map  usr    vmlinux

没有出现nfs server not respond

me@ubuntu:~/nfs-client-dir/linux-5.0.3$ dmesg -T
me@ubuntu:~/nfs-client-dir/linux-5.0.3$

复现问题过程的问题 ## 问题1 plex not found

me@ubuntu:~/nfs-client-dir/linux-5.0.3$ sudo make defconfig
  LEX     scripts/kconfig/zconf.lex.c
/bin/sh: 1: flex: not found
scripts/Makefile.lib:193: recipe for target 'scripts/kconfig/zconf.lex.c' failed
make[1]: *** [scripts/kconfig/zconf.lex.c] Error 127
Makefile:538: recipe for target 'defconfig' failed
make: *** [defconfig] Error 2

解决办法是:

apt install plex

16.8. 问题2 bison: not found

apt install bison

16.9. 问题3 openssl not found

scripts/extract-cert.c:21:25: fatal error: openssl/bio.h: No such file or directory
 #include <openssl/bio.h>
                         ^
compilation terminated.

解决办法