Common Commands for Managing an HPC Cluster

Check the load on every node

pssh -i -h node-list -o outputs/ uptime
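
With `-o outputs/`, pssh also saves each host's standard output to a file named after the host in that directory. As a minimal sketch, assuming that file layout and the usual `load average: a, b, c` format of `uptime` in the C locale, the hypothetical helper `summarize_load` below prints one `host load` line per node so the busiest nodes can be sorted to the top.

```shell
# Print "<host> <1-minute load>" for every per-host file in a pssh output directory.
# summarize_load is an illustrative helper, not part of pssh.
summarize_load() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Keep only the first number after "load average:" (the 1-minute value).
        load=$(sed -n 's/.*load average[s]*: *\([0-9.]*\).*/\1/p' "$f" | head -n 1)
        printf '%s %s\n' "$(basename "$f")" "${load:-unknown}"
    done
}

# Example: ten busiest nodes first
#   summarize_load outputs | sort -k2 -nr | head
```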

Check the number of logged-in users on each node

pssh -i -h node-list -o outputs/ 'w -h | wc -l'
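
Note that `w` prints one row per login session, so a user with several terminals open is counted several times. If the number of distinct users is what matters, a sketch like the following may be closer; `count_unique_users` is a hypothetical helper that reads `who` output on standard input.

```shell
# Read `who` output on stdin and print the number of distinct user names.
# The user name is the first space-delimited field of each line.
count_unique_users() {
    cut -d' ' -f1 | sort -u | wc -l | tr -d ' '
}

# Local use:
#   who | count_unique_users
# The same pipeline can be passed to pssh as a quoted string:
#   pssh -i -h node-list -o outputs/ 'who | cut -d" " -f1 | sort -u | wc -l'
```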

Check the power status of a node

ipmitool -H 172.10.10.45 -U admin -P admin chassis power status
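
Passing the password with `-P` leaves it visible in the process list and in the shell history. One possible alternative is ipmitool's `-E` option, which reads the password from the `IPMI_PASSWORD` environment variable. The sketch below wraps that in a hypothetical `power_state` function that reduces the reply to `on`, `off`, or `unknown`; the default user name `admin` is taken from the command above, and the parsing assumes the usual `Chassis Power is on` / `Chassis Power is off` reply.

```shell
# Print "on", "off", or "unknown" for one BMC address.
# Assumes IPMI_PASSWORD is exported beforehand, e.g.:  export IPMI_PASSWORD=...
# power_state is an illustrative wrapper, not an ipmitool feature.
power_state() {
    host=$1
    user=${2:-admin}
    # -E tells ipmitool to take the password from $IPMI_PASSWORD.
    out=$(ipmitool -H "$host" -U "$user" -E chassis power status 2>/dev/null) || out=
    case $out in
        *"Power is on"*)  echo on ;;
        *"Power is off"*) echo off ;;
        *)                echo unknown ;;
    esac
}

# Example:
#   power_state 172.10.10.45
```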

Start services or load kernel modules on many nodes at once

pssh -i -h gpu-node-list -o outputs/ 'modprobe ipmi_watchdog;modprobe ipmi_si;modprobe ipmi_poweroff;modprobe ipmi_msghandler;modprobe ipmi_devintf;service ipmi restart'

Query the power status of a range of cluster nodes

for i in {1..20}; do num=$((i + 306)); echo "----------------------------node$num------------------------"; ipmitool -H 172.10.12.1$i -U admin -P admin chassis power status; echo ''; done
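
The loop above derives both the node name (i + 306) and the BMC address (172.10.12.1 followed by i) from the same counter. As a rough sketch, the hypothetical helper `list_bmc_targets` prints those pairs without contacting any BMC, which makes it easy to confirm the mapping against the real address plan first (for example, that i = 10 really should map to 172.10.12.110).

```shell
# Print "node<N> <BMC address>" pairs using the same numbering as the loop above:
# node number = i + 306, address = 172.10.12.1<i>.
# list_bmc_targets is an illustrative helper; the defaults 1..20 match the loop.
list_bmc_targets() {
    first=${1:-1}
    last=${2:-20}
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'node%d 172.10.12.1%s\n' "$((i + 306))" "$i"
        i=$((i + 1))
    done
}

# Once the mapping looks right, feed it to ipmitool:
#   list_bmc_targets | while read -r name addr; do
#       printf '%s: ' "$name"
#       ipmitool -H "$addr" -U admin -P admin chassis power status </dev/null
#   done
```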