Basic Slurm cluster in Docker
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
.gitsecret/keys/random_seed
!*.secret
slurm-base/files/munge.key
BIN
.gitsecret/keys/pubring.kbx
Normal file
Binary file not shown.
BIN
.gitsecret/keys/pubring.kbx~
Normal file
Binary file not shown.
BIN
.gitsecret/keys/trustdb.gpg
Normal file
Binary file not shown.
1
.gitsecret/paths/mapping.cfg
Normal file
@@ -0,0 +1 @@
slurm-base/files/munge.key:c1969b6105adce0e62d71877a77bb7a69762d1be8dae8d7ffe92156663f3ee22
75
README.md
Normal file
@@ -0,0 +1,75 @@
# A mini slurm cluster as a docker compose environment

## Introduction

We are currently investigating whether docker containers can be used
to build a slurm environment in which the Deltares waqua and d-hydro
models can run.

This cluster is a proof of concept (PoC) of slurm in docker containers.
The goals are:

- Find out whether slurm can run in docker
- Find out whether we can run the Deltares models in it
- Knowledge transfer about slurm

## Build instructions

The submit and compute nodes depend on a base image, slurm-base.
It already contains a number of files that must be present on both the
compute nodes and the submit node.

This image must be built first with a docker command:
```
docker build -t slurm-base:latest slurm-base
```

After that, the cluster can be built with:
```
docker-compose build
```

And started with:
```
docker-compose up -d
```
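
The slurm-base image copies `files/munge.key`, a file that `.gitignore` excludes and that `.gitsecret/paths/mapping.cfg` lists as encrypted. A fresh checkout therefore presumably needs the key decrypted before the first `docker build`. The following is a minimal sketch, assuming git-secret is installed and your GPG key is allowed to decrypt; the function name `ensure_munge_key` is hypothetical and not part of this repository:

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): make sure the munge key
# exists before building the slurm-base image.
# Assumption: git-secret is installed and you are allowed to decrypt.
ensure_munge_key() {
    key="${1:-slurm-base/files/munge.key}"
    if [ -f "$key" ]; then
        # The key is already decrypted; nothing to do.
        return 0
    fi
    # Decrypt the files listed in .gitsecret/paths/mapping.cfg.
    git secret reveal
}
```

It could be used as `ensure_munge_key && docker build -t slurm-base:latest slurm-base`.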

## Testing

### Cluster status

The following commands give you status information:
- sinfo
- squeue
- scontrol ping
- scontrol show nodes
- scontrol show partition
- scontrol show job
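
These commands run inside the containers. The sketch below runs a few of them from the host. It assumes the Compose service name `submit` from docker-compose.yml and that the tools live in `/opt/slurm/bin`, the path used by the submit health check. The function `cluster_status` and the `RUNNER` override are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run a few Slurm status commands in the submit container.
# RUNNER can be overridden (for example with "echo") for a dry run.
RUNNER="${RUNNER:-docker-compose exec -T submit}"

cluster_status() {
    for cmd in "sinfo" "squeue" "scontrol ping" "scontrol show nodes"; do
        # Word splitting of $RUNNER and $cmd is intentional here.
        $RUNNER /opt/slurm/bin/$cmd || return 1
    done
}
```

Calling `cluster_status` after `docker-compose up -d` should print the output of each command in turn and stop at the first one that fails.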

### A simple shell script

Place the following shell script somewhere in `/home`:
```
#!/bin/sh

hostname
sleep $(( ${RANDOM}%40+40 ))
```

You can then submit this script 8 times with sbatch.
You will see that each node starts 2 scripts, since each node is configured with 2 CPUs.

The remaining 4 stay in the queue.
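
The eight submissions can be scripted. This is a sketch only, assuming the script above was saved under a name such as `/home/test.sh` (a hypothetical path); the function `submit_n` and the `SBATCH` override are also hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: submit the same job script a number of times.
# SBATCH can be overridden (for example with "echo") for a dry run.
SBATCH="${SBATCH:-sbatch}"

submit_n() {
    script="$1"
    count="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        # Stop at the first submission that fails.
        "$SBATCH" "$script" || return 1
        i=$((i + 1))
    done
}
```

After `submit_n /home/test.sh 8`, running `squeue` should show the behaviour described above: with two nodes of 2 CPUs each, four jobs run and four wait in the queue.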

## Todo
### Now

- Integrate mpi.
- Integrate waqua/d-hydro.
- Write a workshop.

### For production

- Make the job administration persistent.
- Redundant master nodes.
- Submit nodes that are not a master node.
19
cal/Dockerfile
Normal file
@@ -0,0 +1,19 @@
# Start with docker base
FROM slurm-base

LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"

RUN "/usr/bin/yum" "-y" "install" \
    slurm-slurmd

#
# Startup
#
ADD files/startup /sbin/startup
RUN chown root:root /sbin/startup
RUN chmod 700 /sbin/startup


HEALTHCHECK CMD ps -e | grep -q slurmd

CMD ["/sbin/startup"]
4
cal/files/startup
Normal file
@@ -0,0 +1,4 @@
#!/bin/sh

su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmd -D /opt/slurm/etc/slurm.conf
21
docker-compose.yml
Normal file
@@ -0,0 +1,21 @@
---
version: '3.7'
services:
  submit:
    build: submit
    hostname: submit
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
  cal01:
    build: cal
    hostname: cal01
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
  cal02:
    build: cal
    hostname: cal02
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
40
slurm-base/Dockerfile
Normal file
@@ -0,0 +1,40 @@
# Start with docker base
FROM centos:7

LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"

#
# Install and update
#
ADD files/slurm.repo /etc/yum.repos.d/slurm.repo

RUN "/usr/bin/yum" "-y" "update"

RUN "/usr/bin/yum" "-y" "install" \
    https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

RUN "/usr/bin/yum" "-y" "install" \
    slurm

RUN "/usr/sbin/groupadd" "-g" "1000" "marceln"
RUN "/usr/sbin/useradd" \
    "-c" "Marcel Nijenhof" \
    "-u" "1000" \
    "-g" "marceln" \
    "-G" "wheel" \
    "-p" '$6$noVPG3snbYoJqcpO$7ii6A0GJPLzKS1cwjypUkSSID8uHG2rA3plQQifLONh9gtHpq1QY08Wako7wzFE7jMbkbFSgB3a3xlhQkvTQ00' \
    "marceln"

#
# Munge config
#
ADD files/munge.key /etc/munge/munge.key
RUN chown munge:munge /etc/munge/munge.key
RUN chmod 600 /etc/munge/munge.key

#
# Slurm config
#
RUN mkdir /opt/slurm/etc /var/log/slurm/
ADD files/slurm.conf /opt/slurm/etc/slurm.conf
ADD files/slurm.sh /etc/profile.d/slurm.sh
BIN
slurm-base/files/munge.key.secret
Normal file
Binary file not shown.
36
slurm-base/files/slurm.conf
Normal file
@@ -0,0 +1,36 @@
#
# https://slurm.schedmd.com/slurm.conf.html
#
ClusterName=slurmcluster
SlurmctldHost=submit
#
AuthType=auth/munge
InactiveLimit=120
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp
ProctrackType=proctrack/linuxproc
KillWait=30
MaxJobCount=10000
MinJobAge=3600
ReturnToService=0
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldPort=7002
SlurmdPort=7003
SlurmdSpoolDir=/var/spool/slurmd.spool
StateSaveLocation=/var/spool/slurm.state
SwitchType=switch/none
TmpFS=/tmp
WaitTime=30
#
# Node Configurations
#
NodeName=cal01 CPUs=2 RealMemory=2000 TmpDisk=64000
NodeName=cal02 CPUs=2 RealMemory=2000 TmpDisk=64000
#
# Partition Configurations
#
PartitionName=queue Nodes=ALL Default=YES
5
slurm-base/files/slurm.repo
Normal file
@@ -0,0 +1,5 @@
[slurm]
name=Slurm CentOS7
baseurl=https://marceln.org/CentOS7
gpgcheck=0
enabled=1
1
slurm-base/files/slurm.sh
Normal file
@@ -0,0 +1 @@
PATH=${PATH}:/opt/slurm/bin
4
slurm-base/files/startup
Normal file
@@ -0,0 +1,4 @@
#!/bin/sh

su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmctld -D /opt/slurm/etc/slurm.conf
BIN
slurm-base/files/wait
Executable file
Binary file not shown.
18
submit/Dockerfile
Normal file
@@ -0,0 +1,18 @@
# Start with docker base
FROM slurm-base

LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"

RUN "/usr/bin/yum" "-y" "install" \
    slurm-slurmctld \
    slurm-torque

#
# Startup
#
ADD files/startup /sbin/startup
RUN chown root:root /sbin/startup
RUN chmod 700 /sbin/startup

HEALTHCHECK CMD /opt/slurm/bin/scontrol ping | grep -q UP
CMD ["/sbin/startup"]
4
submit/files/startup
Normal file
@@ -0,0 +1,4 @@
#!/bin/sh

su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmctld -D /opt/slurm/etc/slurm.conf