Basic slurm cluster in docker

Author: Marcel Nijenhof
Date: 2020-05-31 06:58:17 -04:00
Commit: 5808ac15b0
18 changed files with 231 additions and 0 deletions

.gitignore (new file, 3 additions)

@@ -0,0 +1,3 @@
.gitsecret/keys/random_seed
!*.secret
slurm-base/files/munge.key

.gitsecret/keys/pubring.kbx (new binary file)

Binary file not shown.

Binary file not shown.

.gitsecret/keys/trustdb.gpg (new binary file)

Binary file not shown.


@@ -0,0 +1 @@
slurm-base/files/munge.key:c1969b6105adce0e62d71877a77bb7a69762d1be8dae8d7ffe92156663f3ee22

README.md (new file, 75 additions)

@@ -0,0 +1,75 @@
# A mini Slurm cluster as a docker-compose environment
## Introduction
We are currently investigating whether we can use Docker containers to
build a Slurm environment in which the Deltares waqua and d-hydro
models can run.
This cluster is a proof of concept (PoC) of Slurm in Docker containers.
Its goals are:
- Find out whether Slurm can run in Docker
- Find out whether we can run the Deltares models in it
- Transfer knowledge of Slurm
## Build instructions
The submit and compute nodes depend on a base image, slurm-base.
It contains files that must be present on both the compute nodes and
the submit node.
Build this image first with the following docker command:
```
docker build -t slurm-base:latest slurm-base
```
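This build copies `files/munge.key` into the image, but `.gitignore` keeps that file out of git and the commit manages it with git-secret instead (see `.gitsecret/`). Anyone whose GPG key is authorized can restore it with `git secret reveal`. Otherwise, a sketch like the following could create a fresh key before the build; since every node is built from the same base image, all nodes still share one key. `ensure_munge_key` is a hypothetical helper, and the sketch assumes it is run from the repository root.

```shell
#!/bin/bash
# Hypothetical helper: make sure slurm-base/files/munge.key exists before
# "docker build -t slurm-base:latest slurm-base" is run.
# An existing, non-empty key (e.g. restored with "git secret reveal") is kept;
# only a missing or empty key is replaced by 1024 random bytes.
ensure_munge_key() {
    local key_file=${1:-slurm-base/files/munge.key}
    if [ -s "$key_file" ]; then
        echo "keeping existing $key_file"
        return 0
    fi
    # umask 077 in a subshell: the new key is only readable by its owner
    ( umask 077 && dd if=/dev/urandom of="$key_file" bs=1 count=1024 2>/dev/null ) || return 1
    echo "created $key_file"
}
```

For example: `ensure_munge_key && docker build -t slurm-base:latest slurm-base`. The Dockerfile then sets the owner to munge and the mode to 600 inside the image.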
Then build the cluster with:
```
docker-compose build
```
Start it with:
```
docker-compose up -d
```
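Right after `docker-compose up -d`, slurmctld may not answer yet. The submit image defines a HEALTHCHECK (`scontrol ping | grep -q UP`), so a sketch like the one below could poll Docker's health status of the submit service before starting the tests. `wait_for_healthy` and its retry parameters are hypothetical; the sketch assumes it is run from the directory containing docker-compose.yml. Docker runs the first health check only after the default 30-second interval, so expect to wait at least that long.

```shell
#!/bin/bash
# Hypothetical helper: poll Docker's health status of a compose service until
# it reports "healthy" or the number of tries runs out.
# DOCKER_COMPOSE_CMD and DOCKER_CMD can be overridden (e.g. "docker compose").
: "${DOCKER_COMPOSE_CMD:=docker-compose}"
: "${DOCKER_CMD:=docker}"

wait_for_healthy() {
    local service=${1:-submit} tries=${2:-60} interval=${3:-2}
    local cid status
    # Container ID of the service in the compose project of the current directory
    cid=$($DOCKER_COMPOSE_CMD ps -q "$service") || return 1
    [ -n "$cid" ] || return 1
    while [ "$tries" -gt 0 ]; do
        # Health status as maintained by the image's HEALTHCHECK
        status=$($DOCKER_CMD inspect --format '{{.State.Health.Status}}' "$cid")
        if [ "$status" = "healthy" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "$interval"
    done
    return 1
}
```

For example: `docker-compose up -d && wait_for_healthy submit && echo "cluster ready"`. With the defaults (60 tries, 2 seconds apart) it gives up after roughly two minutes. The commands are kept in variables so that Compose v2 (`docker compose`) works too.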
## Testing
### Cluster status
The following commands show status information about the cluster:
- sinfo
- squeue
- scontrol ping
- scontrol show nodes
- scontrol show partition
- scontrol show job
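The Slurm client binaries live in /opt/slurm/bin, which /etc/profile.d/slurm.sh adds to PATH; a plain `docker-compose exec submit sinfo` does not read that file. A minimal sketch that runs all the commands above from the host through a login shell in the submit container; `slurm_status` is a hypothetical helper name and the service name `submit` comes from docker-compose.yml.

```shell
#!/bin/bash
# Hypothetical helper: run the status commands listed above inside the
# submit container. "bash -l" starts a login shell, so /etc/profile reads
# /etc/profile.d/slurm.sh and /opt/slurm/bin ends up in PATH.
# DOCKER_COMPOSE_CMD can be overridden, e.g. with "docker compose".
: "${DOCKER_COMPOSE_CMD:=docker-compose}"

slurm_status() {
    local service=${1:-submit}
    local cmd
    for cmd in "sinfo" "squeue" "scontrol ping" \
               "scontrol show nodes" "scontrol show partition" \
               "scontrol show job"; do
        echo "=== $cmd"
        # -T: no pseudo-TTY, so the output can be piped or redirected
        $DOCKER_COMPOSE_CMD exec -T "$service" bash -lc "$cmd"
    done
}
```

For example: `slurm_status submit | less`.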
### A simple shell script
Place the following shell script somewhere under `/home`, which all
three containers share through the bind mount in docker-compose.yml:
```
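#!/bin/bash
# Print the name of the node that runs the job
hostname
# Keep the job busy for a random 40 to 79 seconds ($RANDOM requires bash)
sleep $(( RANDOM % 40 + 40 ))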
```
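Then submit this script 8 times with sbatch.
You will see that each compute node starts 2 of the scripts, while the
remaining 4 stay in the queue: slurm.conf gives each node `CPUs=2`, and
with `SelectTypeParameters=CR_CPU` every job occupies one CPU, so at most
4 jobs run at the same time.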
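The experiment can be scripted. In this hypothetical sketch (not part of this commit), `submit_copies` submits a job script a given number of times, and `total_cpus` sums the `CPUs=` values of the `NodeName` lines in a slurm.conf, which under `CR_CPU` bounds how many single-CPU jobs can run at once. It assumes a shell inside a container where `sbatch` is on PATH (for example a login shell, so /etc/profile.d/slurm.sh applies). `SUBMIT_CMD` is a hypothetical override, useful for a dry run.

```shell
#!/bin/bash
# Hypothetical helpers for the sbatch experiment described above.
# SUBMIT_CMD defaults to sbatch and can be overridden (e.g. with echo).
: "${SUBMIT_CMD:=sbatch}"

# Submit the job script "$2" exactly "$1" times; stop at the first failure.
submit_copies() {
    local count=$1 script=$2 i
    for ((i = 1; i <= count; i++)); do
        $SUBMIT_CMD "$script" || return 1
    done
}

# Sum the CPUs= values of all NodeName lines in a slurm.conf file.
total_cpus() {
    awk '/^NodeName=/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^CPUs=/) { split($i, kv, "="); sum += kv[2] }
         }
         END { print sum + 0 }' "$1"
}
```

For example, `submit_copies 8 /home/test.sh` (hypothetical path) followed by `squeue`. With the slurm.conf in this commit, `total_cpus /opt/slurm/etc/slurm.conf` prints 4, so 8 - 4 = 4 jobs should stay pending.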
## Todo
### Now
- Integrate MPI.
- Integrate waqua/d-hydro.
- Write a workshop.
### Before production
- Make the job accounting persistent.
- Redundant master nodes.
- Submit nodes that are not master nodes.

cal/Dockerfile (new file, 19 additions)

@@ -0,0 +1,19 @@
# Start with docker base
FROM slurm-base
LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"
RUN "/usr/bin/yum" "-y" "install" \
    slurm-slurmd
#
# Startup
#
ADD files/startup /sbin/startup
RUN chown root:root /sbin/startup
RUN chmod 700 /sbin/startup
HEALTHCHECK CMD ps -e | grep -q slurmd
CMD ["/sbin/startup"]

cal/files/startup (new file, 4 additions)

@@ -0,0 +1,4 @@
#!/bin/sh
su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmd -D -f /opt/slurm/etc/slurm.conf

docker-compose.yml (new file, 21 additions)

@@ -0,0 +1,21 @@
---
version: '3.7'
services:
  submit:
    build: submit
    hostname: submit
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
  cal01:
    build: cal
    hostname: cal01
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
  cal02:
    build: cal
    hostname: cal02
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"

slurm-base/Dockerfile (new file, 40 additions)

@@ -0,0 +1,40 @@
# Start with docker base
FROM centos:7
LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"
#
# Install and update
#
ADD files/slurm.repo /etc/yum.repos.d/slurm.repo
RUN "/usr/bin/yum" "-y" "update"
RUN "/usr/bin/yum" "-y" "install" \
    https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
RUN "/usr/bin/yum" "-y" "install" \
    slurm
RUN "/usr/sbin/groupadd" "-g" "1000" "marceln"
RUN "/usr/sbin/useradd" \
    "-c" "Marcel Nijenhof" \
    "-u" "1000" \
    "-g" "marceln" \
    "-G" "wheel" \
    "-p" '$6$noVPG3snbYoJqcpO$7ii6A0GJPLzKS1cwjypUkSSID8uHG2rA3plQQifLONh9gtHpq1QY08Wako7wzFE7jMbkbFSgB3a3xlhQkvTQ00' \
    "marceln"
#
# Munge config
#
ADD files/munge.key /etc/munge/munge.key
RUN chown munge:munge /etc/munge/munge.key
RUN chmod 600 /etc/munge/munge.key
#
# Slurm config
#
RUN mkdir /opt/slurm/etc /var/log/slurm/
ADD files/slurm.conf /opt/slurm/etc/slurm.conf
ADD files/slurm.sh /etc/profile.d/slurm.sh

Binary file not shown.

slurm-base/files/slurm.conf (new file, 36 additions)

@@ -0,0 +1,36 @@
#
# https://slurm.schedmd.com/slurm.conf.html
#
ClusterName=slurmcluster
SlurmctldHost=submit
#
AuthType=auth/munge
InactiveLimit=120
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp
ProctrackType=proctrack/linuxproc
KillWait=30
MaxJobCount=10000
MinJobAge=3600
ReturnToService=0
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldPort=7002
SlurmdPort=7003
SlurmdSpoolDir=/var/spool/slurmd.spool
StateSaveLocation=/var/spool/slurm.state
SwitchType=switch/none
TmpFS=/tmp
WaitTime=30
#
# Node Configurations
#
NodeName=cal01 CPUs=2 RealMemory=2000 TmpDisk=64000
NodeName=cal02 CPUs=2 RealMemory=2000 TmpDisk=64000
#
# Partition Configurations
#
PartitionName=queue Nodes=ALL Default=YES

slurm-base/files/slurm.repo (new file, 5 additions)

@@ -0,0 +1,5 @@
[slurm]
name=Slurm CentOS7
baseurl=https://marceln.org/CentOS7
gpgcheck=0
enabled=1

slurm-base/files/slurm.sh (new file, 1 addition)

@@ -0,0 +1 @@
PATH=${PATH}:/opt/slurm/bin

slurm-base/files/startup (new file, 4 additions)

@@ -0,0 +1,4 @@
#!/bin/sh
su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmctld -D -f /opt/slurm/etc/slurm.conf

slurm-base/files/wait (new executable binary file)

Binary file not shown.

submit/Dockerfile (new file, 18 additions)

@@ -0,0 +1,18 @@
# Start with docker base
FROM slurm-base
LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"
RUN "/usr/bin/yum" "-y" "install" \
    slurm-slurmctld \
    slurm-torque
#
# Startup
#
ADD files/startup /sbin/startup
RUN chown root:root /sbin/startup
RUN chmod 700 /sbin/startup
HEALTHCHECK CMD /opt/slurm/bin/scontrol ping | grep -q UP
CMD ["/sbin/startup"]

submit/files/startup (new file, 4 additions)

@@ -0,0 +1,4 @@
#!/bin/sh
su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmctld -D -f /opt/slurm/etc/slurm.conf