Licenses Guide
Licenses Overview
Slurm can help with software license management by assigning available licenses to jobs at scheduling time. If the licenses are not available, jobs are kept pending until licenses become available. Licenses in Slurm are essentially shared resources, meaning configured resources that are not tied to a specific host but are associated with the entire cluster.
Licenses in Slurm can be configured in two ways:
- Local Licenses: Local licenses are local to the cluster whose slurm.conf defines them.
- Remote Licenses: Remote licenses are served by the database and are configured using the sacctmgr command. They are dynamic: when sacctmgr changes them, slurmdbd updates every cluster the licenses are assigned to.
Local Licenses
Local licenses are defined in the slurm.conf using the Licenses option.
slurm.conf:
Licenses=fluent:30,ansys:100
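The Licenses= value is a comma-separated list of name[:count] entries. The bash function below splits such a value into name/count pairs as a minimal sketch of that format. The name parse_licenses is hypothetical. The sketch assumes, as the slurm.conf documentation describes, that a license listed without :count has a count of one. It illustrates the syntax only and is not how slurmctld parses the file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "name count" line per license in a
# slurm.conf Licenses= value such as "fluent:30,ansys:100".
# Assumption: an entry without ":count" has a count of one.
parse_licenses() {
    local spec="$1" item name count
    local -a items
    IFS=',' read -r -a items <<< "$spec"
    for item in "${items[@]}"; do
        name="${item%%:*}"
        if [[ "$item" == *:* ]]; then
            count="${item#*:}"
        else
            count=1
        fi
        printf '%s %s\n' "$name" "$count"
    done
}

parse_licenses "fluent:30,ansys:100"
# fluent 30
# ansys 100
```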
Configured licenses can be viewed using the scontrol command.
$ scontrol show lic
LicenseName=ansys
    Total=100 Used=0 Free=100 Remote=no
LicenseName=fluent
    Total=30 Used=0 Free=30 Remote=no
Requesting licenses is done by using the -L, or --licenses, submission option.
$ sbatch -L ansys:2 script.sh
Submitted batch job 5212

$ scontrol show lic
LicenseName=ansys
    Total=100 Used=2 Free=98 Remote=no
LicenseName=fluent
    Total=30 Used=0 Free=30 Remote=no
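For local licenses without reservations, the scontrol numbers follow simple bookkeeping. Free is Total minus Used. A request can be granted only while it fits in Free; otherwise the job stays pending. The bash sketch below models only that arithmetic and ignores reservations and everything else the scheduler considers. The names lic_free and lic_fits are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical model of local license bookkeeping (not Slurm's code).

# Free = Total - Used, as in the scontrol output.
lic_free() {
    local total="$1" used="$2"
    echo $(( total - used ))
}

# Succeed (status 0) if a request for N licenses fits in what is free;
# otherwise the job would stay pending.
lic_fits() {
    local total="$1" used="$2" request="$3"
    (( request <= total - used ))
}

# ansys: Total=100, Used=0, and job 5212 asks for 2.
if lic_fits 100 0 2; then echo "job can start"; else echo "job stays pending"; fi
echo "Free after allocation: $(lic_free 100 2)"   # 98
```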
Licenses may also be requested using the --tres-per-task option for job submission. If this approach is used, the license must also be defined in the AccountingStorageTRES option of the slurm.conf.
slurm.conf:
Licenses=fluent:30
AccountingStorageTRES=license/fluent
Requesting licenses with the --tres-per-task submission option.
$ sbatch --tres-per-task=license/fluent:4 script.sh
Submitted batch job 6482

$ scontrol show lic
LicenseName=fluent
    Total=30 Used=4 Free=26 Reserved=0 Remote=no
Remote Licenses
Use Case
A site has two license servers: one serves 100 Nastran licenses provided by FlexNet, and the other serves 50 Matlab licenses from Reprise License Management. The site has two clusters named "fluid" and "pdf" dedicated to running simulation jobs with both products. The managers want to split the Nastran licenses equally between the clusters, but assign 70% of the Matlab licenses to cluster "pdf" and the remaining 30% to cluster "fluid".
Configuring Slurm for the use case
Here we assume that both clusters have been configured correctly in the slurmdbd using the sacctmgr command.
$ sacctmgr show clusters format=cluster,controlhost
   Cluster     ControlHost
---------- ---------------
     fluid      143.11.1.3
       pdf      144.12.3.2
The licenses are added using the sacctmgr command, specifying the total count of licenses and the percentage that should be allocated to each cluster. This can be done either in one step or through a multi-step process.
One step:
$ sacctmgr add resource name=nastran cluster=fluid,pdf \
    count=100 allowed=50 server=flex_host servertype=flexlm type=license
 Adding Resource(s)
  nastran@flex_host
   Cluster - fluid     50
   Cluster - pdf       50
 Settings
  Name           = nastran
  Server         = flex_host
  Description    = nastran
  ServerType     = flexlm
  Count          = 100
  Type           = License
Multi-step:
$ sacctmgr add resource name=matlab count=50 server=rlm_host \
    servertype=rlm type=license
 Adding Resource(s)
  matlab@rlm_host
 Settings
  Name           = matlab
  Server         = rlm_host
  Description    = matlab
  ServerType     = rlm
  Count          = 50
  Type           = License

$ sacctmgr add resource name=matlab server=rlm_host \
    cluster=pdf allowed=70
 Adding Resource(s)
  matlab@rlm_host
   Cluster - pdf       70
 Settings
  Name           = matlab
  Server         = rlm_host
  Count          = 50
  LastConsumed   = 0
  Flags          = (null)
  Type           = License

$ sacctmgr add resource name=matlab server=rlm_host \
    cluster=fluid allowed=30
 Adding Resource(s)
  matlab@rlm_host
   Cluster - fluid     30
 Settings
  Name           = matlab
  Server         = rlm_host
  Count          = 50
  LastConsumed   = 0
  Flags          = (null)
  Type           = License
The sacctmgr command will now display the grand total of licenses.
$ sacctmgr show resource
      Name     Server     Type  Count LastConsumed Allocated ServerType                Flags
---------- ---------- -------- ------ ------------ --------- ---------- --------------------
   nastran  flex_host  License    100            0       100     flexlm
    matlab   rlm_host  License     50            0       100        rlm

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
   nastran  flex_host  License    100            0       100     flexlm      fluid       50
   nastran  flex_host  License    100            0       100     flexlm        pdf       50
    matlab   rlm_host  License     50            0       100        rlm      fluid       30
    matlab   rlm_host  License     50            0       100        rlm        pdf       70
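In these listings, the Allocated column matches the sum of the per-cluster Allowed values: 30 + 70 = 100 for matlab and 50 + 50 = 100 for nastran. The bash sketch below expresses that reading, which is inferred from the output rather than taken from Slurm's source. The name allocated is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sum the per-cluster Allowed values to reproduce the
# Allocated column of "sacctmgr show resource withclusters".
allocated() {
    local sum=0 allowed
    for allowed in "$@"; do
        sum=$(( sum + allowed ))
    done
    echo "$sum"
}

allocated 30 70   # matlab: fluid=30, pdf=70 -> 100
allocated 50 50   # nastran: fluid=50, pdf=50 -> 100
```

The same sum matches the later listings in this guide. For example, the total is 30 + 60 = 90 after the pdf allowance for matlab is lowered to 60.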
The configured licenses are now visible on both clusters using the scontrol command.
# On cluster "pdf":
$ scontrol show lic
LicenseName=matlab@rlm_host
    Total=35 Used=0 Free=35 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
LicenseName=nastran@flex_host
    Total=50 Used=0 Free=50 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44

# On cluster "fluid":
$ scontrol show lic
LicenseName=matlab@rlm_host
    Total=15 Used=0 Free=15 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
LicenseName=nastran@flex_host
    Total=50 Used=0 Free=50 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
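Each cluster's Total is derived from Count and that cluster's Allowed percentage. For matlab, pdf gets 50 × 70 / 100 = 35 and fluid gets 50 × 30 / 100 = 15. For nastran, each cluster gets 100 × 50 / 100 = 50. The bash sketch below reproduces that arithmetic. The helper name is hypothetical. It rounds down with integer division, which is an assumption, since every split in this example divides evenly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: licenses a cluster sees when Allowed is a percentage.
# Integer division rounds down; whether Slurm rounds the same way is an
# assumption, since the example counts all divide evenly.
cluster_total() {
    local count="$1" allowed_pct="$2"
    echo $(( count * allowed_pct / 100 ))
}

echo "matlab on pdf:       $(cluster_total 50 70)"    # 35
echo "matlab on fluid:     $(cluster_total 50 30)"    # 15
echo "nastran per cluster: $(cluster_total 100 50)"   # 50
```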
When requesting remote licenses at job submission, specify both the license name and its server, in the form name@server.
$ sbatch -L nastran@flex_host script.sh
Submitted batch job 5172
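The sbatch documentation gives the general request form as name[@server][:count], with the count defaulting to one (for example, --licenses=nastran@slurmdb:12). A wrapper script can take such a spec apart with plain parameter expansion, as in the bash sketch below. The name split_license_spec is hypothetical. The sketch handles a single spec, not a comma-separated list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one license request of the form
# name[@server][:count] into "name server count".
# A missing count defaults to 1; a missing server is printed as "-"
# (a local license).
split_license_spec() {
    local spec="$1" name server="-" count=1
    if [[ "$spec" == *:* ]]; then
        count="${spec##*:}"
        spec="${spec%:*}"
    fi
    if [[ "$spec" == *@* ]]; then
        server="${spec#*@}"
        spec="${spec%%@*}"
    fi
    name="$spec"
    printf '%s %s %s\n' "$name" "$server" "$count"
}

split_license_spec "nastran@flex_host"    # nastran flex_host 1
split_license_spec "nastran@flex_host:4"  # nastran flex_host 4
split_license_spec "ansys:2"              # ansys - 2
```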
License percentages and counts can be modified as shown below:
$ sacctmgr modify resource name=matlab server=rlm_host set \
    count=200
 Modified server resource ...
  matlab@rlm_host
   Cluster - fluid - matlab@rlm_host
   Cluster - pdf - matlab@rlm_host

$ sacctmgr modify resource name=matlab server=rlm_host \
    cluster=pdf set allowed=60
 Modified server resource ...
   Cluster - pdf - matlab@rlm_host

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
   nastran  flex_host  License    100            0       100     flexlm      fluid       50
   nastran  flex_host  License    100            0       100     flexlm        pdf       50
    matlab   rlm_host  License    200            0        90        rlm      fluid       30
    matlab   rlm_host  License    200            0        90        rlm        pdf       60
Licenses can be deleted either for a single cluster or altogether, as shown:
$ sacctmgr delete resource where name=matlab server=rlm_host cluster=fluid
 Deleting resource(s)...
 Deleting resource(s)...
   Cluster - fluid - matlab@rlm_host

$ sacctmgr delete resource where name=nastran server=flex_host
 Deleting resource(s)...
  nastran@flex_host
   Cluster - fluid - nastran@flex_host
   Cluster - pdf - nastran@flex_host

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
    matlab   rlm_host  License    200            0        60        rlm        pdf       60
Starting with Slurm 23.02, a new Absolute flag is available. It indicates that each cluster's allowed value is an absolute license count rather than a percentage.
Some brief examples of license management using this flag:
$ sacctmgr -i add resource name=deluxe cluster=fluid,pdf count=150 allowed=70 \
    server=flex_host servertype=flexlm flags=absolute
 Adding Resource(s)
  deluxe@flex_host
   Cluster - fluid     70
   Cluster - pdf       70
 Settings
  Name           = deluxe
  Server         = flex_host
  Description    = deluxe
  ServerType     = flexlm
  Count          = 150
  Flags          = Absolute
  Type           = Unknown

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
    deluxe  flex_host  License    150            0       140     flexlm      fluid       70             Absolute
    deluxe  flex_host  License    150            0       140     flexlm        pdf       70             Absolute

$ sacctmgr -i update resource deluxe set allowed=25 where cluster=fluid
 Modified server resource ...
   Cluster - fluid - deluxe@flex_host

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
    deluxe  flex_host  License    150            0        95     flexlm      fluid       25             Absolute
    deluxe  flex_host  License    150            0        95     flexlm        pdf       70             Absolute
The Absolute flag can also be made the default for all newly created licenses by adding AllResourcesAbsolute=yes to slurmdbd.conf and restarting SlurmDBD so that the change takes effect.
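With the Absolute flag, a cluster's Allowed value is its license count. That is why deluxe shows Allocated=140 (70 + 70) against a Count of 150, and why setting Allowed=25 for fluid drops Allocated to 95. The bash sketch below contrasts the two interpretations of Allowed. The function cluster_share and its yes/no argument are hypothetical, and rounding down in the percentage branch is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: licenses a cluster receives from one resource.
#   count    - the resource's Count
#   allowed  - the cluster's Allowed value
#   absolute - "yes" if the resource has the Absolute flag
cluster_share() {
    local count="$1" allowed="$2" absolute="$3"
    if [[ "$absolute" == yes ]]; then
        echo "$allowed"                     # Allowed is a license count
    else
        echo $(( count * allowed / 100 ))   # Allowed is a percentage
    fi
}

cluster_share 150 70 yes   # deluxe on pdf: 70 licenses
cluster_share 150 25 yes   # deluxe on fluid after the update: 25
cluster_share 150 70 no    # same numbers read as a percentage: 105
```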
Dynamic licenses
Starting with Slurm 23.02, the LastConsumed field for remote licenses is designed to be updated periodically with the active use count from a license server. An example script for FlexLM's lmstat command is provided below; similar scripts can be written for other license management stacks.
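#!/bin/bash
set -euxo pipefail
# lmstat binary and license feature to track (site-specific values)
LMSTAT=/opt/foobar/bin/lmstat
LICENSE=foobar
# Extract N from the feature's "Total of N licenses in use)" summary line;
# the trailing ':' avoids matching features that share a name prefix, and
# 'licenses\?' also accepts the singular "license"
consumed=$("${LMSTAT}" | grep "Users of ${LICENSE}:" | sed "s/.*Total of \([0-9]\+\) licenses\? in use).*/\1/")
# Report the count; slurmdbd pushes the update to the controllers
sacctmgr -i update resource "${LICENSE}" set lastconsumed="${consumed}"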
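The grep/sed extraction can be checked without a license server by feeding it sample text. The bash sketch below assumes lmstat prints a summary line of the form "Users of foobar:  (Total of 100 licenses issued;  Total of 30 licenses in use)". Both sample lines are illustrative, not captured output, and the singular "license" on the second line is an assumption. The name licenses_in_use is hypothetical. GNU sed is assumed, as in the script above, because \+ and \? are GNU extensions in basic regular expressions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lmstat-style text on stdin and print the
# "in use" count for one feature (GNU sed assumed for \+ and \?).
licenses_in_use() {
    local feature="$1"
    grep "Users of ${feature}:" \
        | sed "s/.*Total of \([0-9]\+\) licenses\? in use).*/\1/"
}

# Illustrative sample lines (assumed format, not real lmstat output).
sample='Users of foobar:  (Total of 100 licenses issued;  Total of 30 licenses in use)
Users of foobar2:  (Total of 5 licenses issued;  Total of 1 license in use)'

licenses_in_use foobar  <<< "$sample"   # 30
licenses_in_use foobar2 <<< "$sample"   # 1
```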
When the LastConsumed value is changed through sacctmgr, an update is automatically pushed to the Slurm controllers. They use this value to calculate a LastDeficit value, which indicates how many licenses have "gone missing" from the cluster's perspective and must be set aside temporarily.
For example, suppose 100 "foobar" licenses exist in total, and 80 of them are allocated to the "blackhole" cluster:
$ sacctmgr add resource foobar count=100 flags=absolute cluster=blackhole allowed=80
 Adding Resource(s)
  foobar@slurmdb
   Cluster - blackhole 80
 Settings
  Name           = foobar
  Server         = slurmdb
  Description    = foobar
  Count          = 100
  Flags          = Absolute
  Type           = Unknown
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y

$ scontrol show license
LicenseName=foobar@slurmdb
    Total=80 Used=0 Free=80 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T16:36:55
Now, our cron job comes in and updates the LastConsumed value to 30, while the cluster has yet to allocate any licenses to jobs:
$ sacctmgr -i update resource foobar set lastconsumed=30
 Modified server resource ...
  foobar@slurmdb
   Cluster - blackhole - foobar@slurmdb

$ scontrol show license
LicenseName=foobar@slurmdb
    Total=80 Used=0 Free=70 Reserved=0 Remote=yes
    LastConsumed=30 LastDeficit=10 LastUpdate=2023-02-28T16:39:27
Note that the cluster has now calculated a deficit of 10 licenses and will schedule at most 70 licenses for the time being. The cluster expects up to 20 licenses (the 100 in total minus its own 80) to be in use by other clusters or external users. However, since LastConsumed was set to 30, an additional 10 licenses have "gone rogue" and their usage cannot be accounted for. The cluster therefore must not assign those licenses to any pending jobs, because such a job would likely fail to acquire them.
If a further update (likely driven through cron) reduces the LastConsumed count to 20, the deficit disappears and the cluster makes all 80 assigned licenses available again:
$ sacctmgr -i update resource foobar set lastconsumed=20
 Modified server resource ...
  foobar@slurmdb
   Cluster - blackhole - foobar@slurmdb

$ scontrol show license
LicenseName=foobar@slurmdb
    Total=80 Used=0 Free=80 Reserved=0 Remote=yes
    LastConsumed=20 LastDeficit=0 LastUpdate=2023-02-28T16:44:26
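The numbers in this example are consistent with the following arithmetic. Count minus the cluster's Total (100 - 80 = 20) is use the cluster expects from elsewhere. Anything LastConsumed reports beyond that, and beyond the cluster's own Used count, is the deficit, which is withheld from Free. So LastConsumed=30 gives a deficit of 30 - 20 - 0 = 10 and Free=70, while LastConsumed=20 gives no deficit. The bash sketch below encodes this reading. It is inferred from the example output rather than taken from Slurm's source, the function names are hypothetical, and reservations are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical model of LastDeficit, inferred from the example output.
#   count    - Count of the resource in slurmdbd (100)
#   total    - licenses allotted to this cluster (80)
#   used     - licenses held by this cluster's jobs (Used)
#   consumed - LastConsumed reported from the license server
last_deficit() {
    local count="$1" total="$2" used="$3" consumed="$4"
    local external=$(( count - total ))   # use expected outside this cluster
    local deficit=$(( consumed - external - used ))
    if (( deficit < 0 )); then
        deficit=0
    fi
    echo "$deficit"
}

# Free as shown by scontrol: Total minus Used minus the deficit.
free_licenses() {
    local count="$1" total="$2" used="$3" consumed="$4" deficit
    deficit=$(last_deficit "$count" "$total" "$used" "$consumed")
    echo $(( total - used - deficit ))
}

last_deficit 100 80 0 30    # 10
free_licenses 100 80 0 30   # 70
free_licenses 100 80 0 20   # 80
```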
Last modified 25 April 2024