To do this exercise, you will need about 50-100 pieces of paper or sticky notes.
Divide the pieces of paper or sticky notes among the “users” and have them fill out each page with a simple math problem and their name. Tell everyone that these are the jobs that need to be done; they correspond to the users’ computing research problems.
Point out that we now have jobs and we have “compute nodes” (the people still sitting
down) that can solve these problems. How are the jobs going to get to the nodes?
The answer is the scheduling program that will take the jobs from the users and deliver
them to open compute nodes.
Have all the “compute nodes” raise their hands. Have the users “submit” their jobs by handing them to the schedulers. The schedulers should then deliver the jobs to “open” (hands-raised) compute nodes, collect the finished problems, and return them to the appropriate users.
Wait until most of the problems are done and then re-seat everyone.
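In Slurm, the paper problems correspond to batch jobs: small scripts handed to the scheduler with sbatch. A minimal sketch of such a job (the file name add.sbatch is made up here, and the debug partition matches the one shown later in this lesson):
$ cat add.sbatch
#!/bin/bash
#SBATCH --job-name=math-problem
#SBATCH --partition=debug
#SBATCH --ntasks=1
# Solve one "simple math problem"; the answer lands in the job's output file
echo $((21 + 21))
$ sbatch add.sbatch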
To check the status of the Slurm controller (slurmctld):
# scontrol ping
If the primary or the backup Slurm controller is down, check the slurmctld service and start it if necessary:
# systemctl status slurmctld.service
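If the service is not running, restart it on the controller node and verify again, for example (a sketch assuming the standard slurmctld systemd unit):
# systemctl restart slurmctld.service
# scontrol ping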
Display the partition configuration:
$ scontrol show partition
PartitionName=debug
AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=compute1,compute2,compute3,compute4,compute5,compute6,compute7,compute8,compute9,compute10,compute11,compute12,compute13,compute14,compute15,compute16
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=640 TotalNodes=16 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
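As a quick sanity check that the debug partition accepts work, you can run a trivial command on one of its nodes, for example:
$ srun --partition=debug -N 1 hostname
srun blocks until a node is allocated and prints the hostname of the node that ran the command.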
Display the accounts created:
$ sacctmgr show account
$ sacctmgr show account -s # Show also associations in the accounts
Account    Descr      Org
---------- ---------- ----------
abaineh    abaineh    abaineh
abebaw     abebaw     abebaw
abebayehu  abebayehu  abebayehu
abebe      abebe      abebe
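For scripting, the same listing can be requested without the header and in parsable form by selecting the columns explicitly (a sketch; the columns are the defaults shown above):
$ sacctmgr --noheader --parsable2 show account format=Account,Descr,Org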
List users with:
$ sacctmgr show user
$ sacctmgr show user -s
User Def Acct Admin Cluster Account Partition Share MaxJobs MaxNodes MaxCPUs MaxSubmit MaxWall MaxCPUMins QOS Def QOS
---------- ---------- --------- ---------- ---------- ---------- --------- ------- -------- -------- --------- ----------- ----------- --------------
abaineh abaineh None chess abaineh 1 normal normal
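The same information can be narrowed to the columns of interest with format=, which is often easier to read than the full -s listing (a sketch):
$ sacctmgr show user withassoc format=User,DefaultAccount,Account,QOS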
Display all association records with:
$ sacctmgr show associations
Cluster Account User Partition Share GrpJobs GrpTRES GrpSubmit GrpWall GrpTRESMins MaxJobs MaxTRES MaxTRESPerNode MaxSubmit MaxWall MaxTRESMins QOS Def QOS GrpTRESRunMin
---------- ---------- ---------- ---------- --------- ------- ------------- --------- ----------- ------------- ------- ------------- -------------- --------- ----------- ------------- -------------------- --------- -------------
chess root 1 normal
chess root root 1 normal
chess abaineh 1 normal
chess abaineh abaineh 1 normal normal
chess abebaw 1 normal
chess abebaw abebaw 1 normal
chess abebayehu 1 normal
chess abebayehu abebayehu 1 normal normal
chess abebe 1 normal
chess abebe abebe 1 normal normal
chess abebe2 1 normal
chess abebe2 abebe2 1 normal normal
chess abebee 1 normal
chess abraham 1 normal
chess abraham abraham 1 normal normal
chess abrha 1 normal
chess abrha abrha 1 normal
chess abrham 1 normal
chess abrham abrham 1 normal normal
chess adane 1 normal
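To inspect a single user's associations rather than the whole table, filter with a where clause, for example (abaineh is one of the users shown above):
$ sacctmgr show associations where user=abaineh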
Display the current accounts in Slurm:
$ sacctmgr show account
Create a hierarchical organization of accounts using:
# sacctmgr add account username Descr=username Org=username
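Accounts can also be nested under a parent account to build a real hierarchy. A sketch with hypothetical account names (chemistry and chem-lab1 are made up for illustration):
# sacctmgr add account chemistry Descr="chemistry dept" Org=university   # hypothetical parent account
# sacctmgr add account chem-lab1 parent=chemistry Descr="lab 1" Org=university   # hypothetical child account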
Create a user named username with a default account (required):
# sacctmgr add user name=username Account=username
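A user may belong to more than one account; adding the same user to a second account simply creates another association. A sketch (project2 is a hypothetical account name):
# sacctmgr add user name=username Account=project2   # project2 is hypothetical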
Synchronize the Slurm users by reconfiguring the controller:
# scontrol reconfig
Show problems with Slurm entities (e.g., accounts):
# sacctmgr show problem
We can modify database items using SQL-like where and set clauses, for example:
# sacctmgr modify account where name=username set cluster=chess
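The same where/set pattern works for user associations, e.g. to cap the number of running and queued jobs (the limits below are just example values):
# sacctmgr modify user where name=username set MaxJobs=10 MaxSubmitJobs=20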
We can delete a user's account association from the Slurm database with:
# sacctmgr remove user name=username where account=accountname
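Accounts themselves can be removed the same way once their users have been removed, for example (projectx is a hypothetical account name):
# sacctmgr remove account where name=projectx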