Basic Slurm cluster in Docker
3
.gitignore
vendored
Normal file
@@ -0,0 +1,3 @@
.gitsecret/keys/random_seed
!*.secret
slurm-base/files/munge.key
BIN
.gitsecret/keys/pubring.kbx
Normal file
Binary file not shown.
BIN
.gitsecret/keys/pubring.kbx~
Normal file
Binary file not shown.
BIN
.gitsecret/keys/trustdb.gpg
Normal file
Binary file not shown.
1
.gitsecret/paths/mapping.cfg
Normal file
@@ -0,0 +1 @@
slurm-base/files/munge.key:c1969b6105adce0e62d71877a77bb7a69762d1be8dae8d7ffe92156663f3ee22
75
README.md
Normal file
@@ -0,0 +1,75 @@
# A mini Slurm cluster as a Docker Compose environment

## Introduction

We are currently investigating whether Docker containers can provide
a Slurm environment in which the Deltares waqua and d-hydro
models can run.

This cluster is a proof of concept (POC) of Slurm in Docker containers.
Its goals are:

- Find out whether Slurm can run in Docker
- Find out whether we can run the Deltares models in it
- Knowledge transfer about Slurm

## Build instructions

The submit and compute nodes depend on a base image, slurm-base.
It already contains a number of files that must be present on both
the compute nodes and the submit node.

Build this image first with a docker command:
```
docker build -t slurm-base:latest slurm-base
```

After that, the cluster can be built with:
```
docker-compose build
```

And started with:
```
docker-compose up -d
```

## Testing

### The status of the cluster

The following commands give status information:
- sinfo
- squeue
- scontrol ping
- scontrol show nodes
- scontrol show partition
- scontrol show job

### A simple shell script

Place the following shell script somewhere in `/home`:
```
#!/bin/sh

hostname
sleep $(( ${RANDOM}%40+40 ))
```

You can then submit this script 8 times with sbatch.
You will see that each node starts 2 scripts (the nodes are configured with 2 CPUs each).

The remaining 4 stay in the queue.

## Todo
### Now

- MPI integration.
- waqua/d-hydro integration.
- Write a workshop.

### For production

- Make the job administration persistent.
- Redundant master nodes.
- Submit nodes that are not master nodes.
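The test described in the README (submit the same script 8 times with sbatch) can be sketched as a small loop. This is only an illustration and not part of the commit: the function name `submit_many`, the `SBATCH` override variable and the example path `/home/test.sh` are assumptions.

```shell
#!/bin/sh
# Sketch only: submit_many, SBATCH and the example path are illustrative
# names, not part of this commit.

# submit_many COUNT SCRIPT
# Calls sbatch COUNT times on SCRIPT and stops at the first failure.
# Set SBATCH=echo for a dry run that prints the script path instead of
# submitting anything.
submit_many() {
    count="$1"
    script="$2"
    i=1
    while [ "$i" -le "$count" ]; do
        "${SBATCH:-sbatch}" "$script" || return 1
        i=$((i + 1))
    done
}
```

On the submit node, `submit_many 8 /home/test.sh` followed by `squeue` should show the behaviour the README describes: some jobs running and the rest pending.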
19
cal/Dockerfile
Normal file
@@ -0,0 +1,19 @@
# Start with docker base
FROM slurm-base

LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"

RUN "/usr/bin/yum" "-y" "install" \
    slurm-slurmd

#
# Startup
#
ADD files/startup /sbin/startup
RUN chown root:root /sbin/startup
RUN chmod 700 /sbin/startup


HEALTHCHECK CMD ps -e | grep -q slurmd

CMD ["/sbin/startup"]
4
cal/files/startup
Normal file
@@ -0,0 +1,4 @@
#!/bin/sh

su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmd -D /opt/slurm/etc/slurm.conf
21
docker-compose.yml
Normal file
@@ -0,0 +1,21 @@
---
version: '3.7'
services:
  submit:
    build: submit
    hostname: submit
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
  cal01:
    build: cal
    hostname: cal01
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
  cal02:
    build: cal
    hostname: cal02
    volumes:
      - "/dev/log:/dev/log"
      - "/var/lib/docker/bindmounts/test/home:/home"
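Note on build order: the `submit` and `cal` services both build from `FROM slurm-base`, so the base image has to exist before `docker-compose build` runs. A minimal sketch of the three README steps as one function follows. The function name `build_and_start` and the `DOCKER` and `COMPOSE` override variables are assumptions for illustration and are not part of the commit.

```shell
#!/bin/sh
# Sketch only: build_and_start and the DOCKER / COMPOSE override variables
# are illustrative names, not part of this commit.

# Runs the three build steps from the README in order and stops at the
# first step that fails.
build_and_start() {
    # 1. Base image first: submit/Dockerfile and cal/Dockerfile use FROM slurm-base.
    "${DOCKER:-docker}" build -t slurm-base:latest slurm-base || return 1
    # 2. Build the submit, cal01 and cal02 services.
    "${COMPOSE:-docker-compose}" build || return 1
    # 3. Start the cluster in the background.
    "${COMPOSE:-docker-compose}" up -d
}
```

Run `build_and_start` from the repository root. Setting `DOCKER=echo COMPOSE=echo` first prints the arguments of each step without running Docker.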
40
slurm-base/Dockerfile
Normal file
@@ -0,0 +1,40 @@
# Start with docker base
FROM centos:7

LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"

#
# Install and update
#
ADD files/slurm.repo /etc/yum.repos.d/slurm.repo

RUN "/usr/bin/yum" "-y" "update"

RUN "/usr/bin/yum" "-y" "install" \
    https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm

RUN "/usr/bin/yum" "-y" "install" \
    slurm

RUN "/usr/sbin/groupadd" "-g" "1000" "marceln"
RUN "/usr/sbin/useradd" \
    "-c" "Marcel Nijenhof" \
    "-u" "1000" \
    "-g" "marceln" \
    "-G" "wheel" \
    "-p" '$6$noVPG3snbYoJqcpO$7ii6A0GJPLzKS1cwjypUkSSID8uHG2rA3plQQifLONh9gtHpq1QY08Wako7wzFE7jMbkbFSgB3a3xlhQkvTQ00' \
    "marceln"

#
# Munge config
#
ADD files/munge.key /etc/munge/munge.key
RUN chown munge:munge /etc/munge/munge.key
RUN chmod 600 /etc/munge/munge.key

#
# Slurm config
#
RUN mkdir /opt/slurm/etc /var/log/slurm/
ADD files/slurm.conf /opt/slurm/etc/slurm.conf
ADD files/slurm.sh /etc/profile.d/slurm.sh
BIN
slurm-base/files/munge.key.secret
Normal file
Binary file not shown.
36
slurm-base/files/slurm.conf
Normal file
@@ -0,0 +1,36 @@
#
# https://slurm.schedmd.com/slurm.conf.html
#
ClusterName=slurmcluster
SlurmctldHost=submit
#
AuthType=auth/munge
InactiveLimit=120
JobCompType=jobcomp/filetxt
JobCompLoc=/var/log/slurm/jobcomp
ProctrackType=proctrack/linuxproc
KillWait=30
MaxJobCount=10000
MinJobAge=3600
ReturnToService=0
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdLogFile=/var/log/slurm/slurmd.log
SlurmctldPort=7002
SlurmdPort=7003
SlurmdSpoolDir=/var/spool/slurmd.spool
StateSaveLocation=/var/spool/slurm.state
SwitchType=switch/none
TmpFS=/tmp
WaitTime=30
#
# Node Configurations
#
NodeName=cal01 CPUs=2 RealMemory=2000 TmpDisk=64000
NodeName=cal02 CPUs=2 RealMemory=2000 TmpDisk=64000
#
# Partition Configurations
#
PartitionName=queue Nodes=ALL Default=YES
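Note on capacity: with `SelectType=select/cons_res` and `SelectTypeParameters=CR_CPU`, CPUs are the resource that gets allocated, and both nodes are defined with `CPUs=2`. Assuming each job requests a single CPU (which should be the case for a plain `sbatch` call without task or CPU options), at most 4 jobs run at once. That matches the README test, where 8 submissions leave 4 jobs in the queue. The sketch below adds up the `CPUs=` values of a config file. The helper name `total_cpus` is an assumption for illustration, and the parser is deliberately simple: it does not expand node ranges such as `cal[01-02]`.

```shell
#!/bin/sh
# Sketch only: total_cpus is an illustrative helper, not part of this commit.

# total_cpus FILE
# Prints the sum of the CPUs= values on all NodeName= lines of FILE.
# Each NodeName line is counted once; node ranges are not expanded.
total_cpus() {
    awk '
        /^NodeName=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^CPUs=/) {
                    split($i, kv, "=")
                    sum += kv[2]
                }
        }
        END { print sum + 0 }
    ' "$1"
}
```

For the file in this commit, `total_cpus slurm-base/files/slurm.conf` prints `4`.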
5
slurm-base/files/slurm.repo
Normal file
@@ -0,0 +1,5 @@
[slurm]
name=Slurm CentOS7
baseurl=https://marceln.org/CentOS7
gpgcheck=0
enabled=1
1
slurm-base/files/slurm.sh
Normal file
@@ -0,0 +1 @@
PATH=${PATH}:/opt/slurm/bin
4
slurm-base/files/startup
Normal file
@@ -0,0 +1,4 @@
#!/bin/sh

su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmctld -D /opt/slurm/etc/slurm.conf
BIN
slurm-base/files/wait
Executable file
Binary file not shown.
18
submit/Dockerfile
Normal file
@@ -0,0 +1,18 @@
# Start with docker base
FROM slurm-base

LABEL maintainer="Marcel Nijenhof <marceln@pion.xs4all.nl>"

RUN "/usr/bin/yum" "-y" "install" \
    slurm-slurmctld \
    slurm-torque

#
# Startup
#
ADD files/startup /sbin/startup
RUN chown root:root /sbin/startup
RUN chmod 700 /sbin/startup

HEALTHCHECK CMD /opt/slurm/bin/scontrol ping | grep -q UP
CMD ["/sbin/startup"]
4
submit/files/startup
Normal file
@@ -0,0 +1,4 @@
#!/bin/sh

su -s /bin/sh munge -c /usr/sbin/munged
exec /opt/slurm/sbin/slurmctld -D /opt/slurm/etc/slurm.conf