Licenses Guide

Licenses Overview

Slurm can help with software license management by assigning available licenses to jobs at scheduling time. If the licenses are not available, jobs are kept pending until licenses become available. Licenses in Slurm are essentially shared resources, meaning configured resources that are not tied to a specific host but are associated with the entire cluster.

Licenses in Slurm can be configured in two ways:

  • Local Licenses: Local licenses are defined in a cluster's slurm.conf and are available only to that cluster.
  • Remote Licenses: Remote licenses are served by the database and are configured using the sacctmgr command. They are dynamic: when a license is changed with sacctmgr, the slurmdbd pushes the update to every cluster the license is assigned to.

Local Licenses

Local licenses are defined in the slurm.conf using the Licenses option.

slurm.conf:

Licenses=fluent:30,ansys:100

Configured licenses can be viewed using the scontrol command.

$ scontrol show lic
LicenseName=ansys
    Total=100 Used=0 Free=100 Remote=no
LicenseName=fluent
    Total=30 Used=0 Free=30 Remote=no

Requesting licenses is done by using the -L, or --licenses, submission option.

$ sbatch -L ansys:2 script.sh
Submitted batch job 5212

$ scontrol show lic
LicenseName=ansys
    Total=100 Used=2 Free=98 Remote=no
LicenseName=fluent
    Total=30 Used=0 Free=30 Remote=no
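For scripting, the Free count can be extracted from this output. The following is a minimal sketch, assuming the layout shown above (a LicenseName= line followed by an indented line of key=value fields). The helper name license_free is hypothetical and not part of Slurm.

```shell
# license_free NAME
# Reads "scontrol show lic"-style text on stdin and prints the Free
# count for the license called NAME. Prints nothing if NAME is absent.
license_free() {
    local name="$1"
    awk -v want="LicenseName=${name}" '
        # A LicenseName= line starts a new record; remember whether it is ours.
        /^LicenseName=/ { in_block = ($1 == want); next }
        # Inside our record, look for the Free=<n> field and print <n>.
        in_block {
            for (i = 1; i <= NF; i++)
                if (index($i, "Free=") == 1) { print substr($i, 6); exit }
        }
    '
}
```

It could be used as "scontrol show lic | license_free ansys", which for the output above would print 98.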

Licenses may also be requested using the --tres-per-task option for job submission. If this approach is used, the license must also be defined in the AccountingStorageTRES option of the slurm.conf.

slurm.conf:

Licenses=fluent:30
AccountingStorageTRES=license/fluent

Licenses can then be requested with the --tres-per-task submission option.

$ sbatch --tres-per-task=license/fluent:4 script.sh
Submitted batch job 6482

$ scontrol show lic
LicenseName=fluent
    Total=30 Used=4 Free=26 Reserved=0 Remote=no
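Because --tres-per-task only works for licenses that are also listed in AccountingStorageTRES, a wrapper script might check for the entry before submitting. The sketch below assumes the value is a comma-separated list and that entries are written in lowercase as license/NAME. The helper name tres_has_license is hypothetical.

```shell
# tres_has_license TRES_LIST NAME
# Succeeds (exit status 0) if TRES_LIST, a comma-separated
# AccountingStorageTRES value, contains the entry license/NAME.
tres_has_license() {
    local tres_list="$1" name="$2" entry
    local IFS=','            # split the list on commas only
    for entry in $tres_list; do
        if [ "$entry" = "license/${name}" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, tres_has_license "license/fluent" fluent succeeds, while tres_has_license "gres/gpu" fluent fails.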

Remote Licenses

Use Case

A site has two license servers: one serves 100 Nastran licenses provided by FlexNet, and the other serves 50 Matlab licenses from Reprise License Management. The site has two clusters named "fluid" and "pdf" dedicated to running simulation jobs using both products. The managers want to split the Nastran licenses equally between the clusters, but assign 70% of the Matlab licenses to cluster "pdf" and the remaining 30% to cluster "fluid".

Configuring Slurm for the use case

Here we assume that both clusters have been configured correctly in the slurmdbd using the sacctmgr command.

$ sacctmgr show clusters format=cluster,controlhost
   Cluster     ControlHost
---------- ---------------
     fluid     143.11.1.3
       pdf     144.12.3.2

The licenses are added using the sacctmgr command, specifying the total count of licenses and the percentage that should be allocated to each cluster. This can be done either in one step or through a multi-step process.

One step:

$ sacctmgr add resource name=nastran cluster=fluid,pdf \
  count=100 allowed=50 server=flex_host servertype=flexlm type=license
 Adding Resource(s)
  nastran@flex_host
   Cluster - fluid	50
   Cluster - pdf	50
 Settings
  Name           = nastran
  Server         = flex_host
  Description    = nastran
  ServerType     = flexlm
  Count          = 100
  Type           = License

Multi-step:

$ sacctmgr add resource name=matlab count=50 server=rlm_host \
  servertype=rlm type=license
 Adding Resource(s)
  matlab@rlm_host
 Settings
  Name           = matlab
  Server         = rlm_host
  Description    = matlab
  ServerType     = rlm
  Count          = 50
  Type           = License

$ sacctmgr add resource name=matlab server=rlm_host \
  cluster=pdf allowed=70
 Adding Resource(s)
  matlab@rlm_host
   Cluster - pdf	70
 Settings
  Name           = matlab
  Server         = rlm_host
  Count          = 50
  LastConsumed   = 0
  Flags          = (null)
  Type           = License

$ sacctmgr add resource name=matlab server=rlm_host \
  cluster=fluid allowed=30
 Adding Resource(s)
  matlab@rlm_host
   Cluster - fluid	30
 Settings
  Name           = matlab
  Server         = rlm_host
  Count          = 50
  LastConsumed   = 0
  Flags          = (null)
  Type           = License

The sacctmgr command will now display the grand total of licenses.

$ sacctmgr show resource
      Name     Server     Type  Count LastConsumed Allocated ServerType                Flags
---------- ---------- -------- ------ ------------ --------- ---------- --------------------
   nastran  flex_host  License    100            0       100     flexlm
    matlab   rlm_host  License     50            0       100        rlm
$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
   nastran  flex_host  License    100            0       100     flexlm      fluid       50
   nastran  flex_host  License    100            0       100     flexlm        pdf       50 
    matlab   rlm_host  License     50            0       100        rlm      fluid       30
    matlab   rlm_host  License     50            0       100        rlm        pdf       70

The configured licenses are now visible on both clusters using the scontrol command.

# On cluster "pdf":
$ scontrol show lic
LicenseName=matlab@rlm_host
    Total=35 Used=0 Free=35 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
LicenseName=nastran@flex_host
    Total=50 Used=0 Free=50 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44

# On cluster "fluid":
$ scontrol show lic
LicenseName=matlab@rlm_host
    Total=15 Used=0 Free=15 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
LicenseName=nastran@flex_host
    Total=50 Used=0 Free=50 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T17:01:44
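The per-cluster totals above follow from the configured percentages: 70% of 50 Matlab licenses is 35, 30% is 15, and 50% of 100 Nastran licenses is 50. A minimal sketch of that arithmetic is shown below. The helper name allowed_count is hypothetical, and truncating fractional results is an assumption of this sketch, not documented Slurm behavior.

```shell
# allowed_count TOTAL PERCENT
# Prints the number of licenses a cluster receives when it is allowed
# PERCENT percent of TOTAL licenses (integer division, fractions dropped).
allowed_count() {
    local total="$1" percent="$2"
    echo $(( total * percent / 100 ))
}
```

For the use case, allowed_count 50 70 gives the Matlab total on "pdf" and allowed_count 50 30 the total on "fluid".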

When submitting jobs that request remote licenses, both the license name and the server must be specified, in the form name@server.

$ sbatch -L nastran@flex_host script.sh
Submitted batch job 5172
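A count can be appended to a remote license in the same way as for a local one, giving the form name@server:count. The small helper below builds such a string. It is only a sketch, and the name license_spec is hypothetical.

```shell
# license_spec NAME SERVER [COUNT]
# Prints a remote license request string of the form NAME@SERVER:COUNT.
# COUNT defaults to 1 when omitted.
license_spec() {
    local name="$1" server="$2" count="${3:-1}"
    printf '%s@%s:%s\n' "$name" "$server" "$count"
}
```

A submission could then look like sbatch -L "$(license_spec nastran flex_host 2)" script.sh.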

License percentages and counts can be modified as shown below:

$ sacctmgr modify resource name=matlab server=rlm_host set \
  count=200
 Modified server resource ...
  matlab@rlm_host
  Cluster - fluid	- matlab@rlm_host
  Cluster - pdf	- matlab@rlm_host

$ sacctmgr modify resource name=matlab server=rlm_host \
  cluster=pdf set allowed=60
 Modified server resource ...
  Cluster - pdf	- matlab@rlm_host

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
   nastran  flex_host  License    100            0       100     flexlm      fluid       50
   nastran  flex_host  License    100            0       100     flexlm        pdf       50
    matlab   rlm_host  License    200            0        90        rlm      fluid       30
    matlab   rlm_host  License    200            0        90        rlm        pdf       60
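In this output the Allocated column matches the sum of the Allowed values across clusters (30 + 60 = 90 for matlab). The sketch below recomputes that sum from pipe-delimited name|cluster|allowed lines. That input format is an assumption chosen for the example, and sum_allowed is a hypothetical helper name.

```shell
# sum_allowed NAME
# Reads lines of the form name|cluster|allowed on stdin and prints the
# sum of the allowed values for the resource called NAME (0 if absent).
sum_allowed() {
    local name="$1"
    awk -F'|' -v want="$name" '
        $1 == want { total += $3 }
        END { print total + 0 }
    '
}
```

With percentage-based resources, a result below 100 means part of the license pool is not assigned to any cluster.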

Licenses can be removed from a single cluster or deleted entirely, as shown:

$ sacctmgr delete resource where name=matlab server=rlm_host cluster=fluid
 Deleting resource(s)...
  Cluster - fluid	- matlab@rlm_host

$ sacctmgr delete resource where name=nastran server=flex_host
 Deleting resource(s)...
  nastran@flex_host
  Cluster - fluid	- nastran@flex_host
  Cluster - pdf	- nastran@flex_host

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- --------------------
    matlab   rlm_host  License    200            0        60        rlm        pdf       60

Starting with Slurm 23.02, an Absolute flag is available. When it is set, the allowed values for each cluster are treated as absolute license counts rather than percentages.

The following brief examples show license management using this flag.

$ sacctmgr -i add resource name=deluxe cluster=fluid,pdf count=150 allowed=70 \
  server=flex_host servertype=flexlm flags=absolute
 Adding Resource(s)
  deluxe@flex_host
   Cluster - fluid	70
   Cluster - pdf	70
 Settings
  Name           = deluxe
  Server         = flex_host
  Description    = deluxe
  ServerType     = flexlm
  Count          = 150
  Flags          = Absolute
  Type           = Unknown

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags 
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- 
    deluxe  flex_host  License    150            0       140     flexlm      fluid       70             Absolute 
    deluxe  flex_host  License    150            0       140     flexlm        pdf       70             Absolute

$ sacctmgr -i update resource deluxe set allowed=25 where cluster=fluid
 Modified server resource ...
  Cluster - fluid	- deluxe@flex_host

$ sacctmgr show resource withclusters
      Name     Server     Type  Count LastConsumed Allocated ServerType    Cluster  Allowed                Flags 
---------- ---------- -------- ------ ------------ --------- ---------- ---------- -------- -------------------- 
    deluxe  flex_host  License    150            0        95     flexlm      fluid       25             Absolute 
    deluxe  flex_host  License    150            0        95     flexlm        pdf       70             Absolute 
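With the Absolute flag, the licenses not assigned to any cluster are simply Count minus the sum of the Allowed values: 150 - (70 + 70) = 10 before the update and 150 - (25 + 70) = 55 after it. A minimal sketch of that calculation follows. The helper name absolute_headroom is hypothetical.

```shell
# absolute_headroom COUNT ALLOWED...
# Prints how many of COUNT licenses remain unassigned after subtracting
# each cluster's absolute ALLOWED value. A negative result would mean
# more licenses were handed out than exist.
absolute_headroom() {
    local count="$1"
    shift
    local allocated=0 allowed
    for allowed in "$@"; do
        allocated=$(( allocated + allowed ))
    done
    echo $(( count - allocated ))
}
```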

This can also be established as the default for all newly created licenses by adding AllResourcesAbsolute=yes to slurmdbd.conf (and restarting SlurmDBD to make the change take effect).

Dynamic licenses

Starting with Slurm 23.02, the LastConsumed field for remote licenses is designed to be updated periodically with the active use count from a license server. An example script for FlexLM's lmstat command is provided below. Similar scripts can be constructed for other license management stacks.

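#!/bin/bash

# Abort on errors, unset variables, and pipeline failures; -x traces each command.
set -euxo pipefail

LMSTAT=/opt/foobar/bin/lmstat
LICENSE=foobar

# Extract the in-use count from the "Users of <feature>" summary line.
# Depending on the site, lmstat may need options such as -a to print this line.
consumed=$("${LMSTAT}" | grep "Users of ${LICENSE}" | sed "s/.*Total of \([0-9]\+\) licenses in use)/\1/")

# Push the observed usage into the Slurm database.
sacctmgr -i update resource "${LICENSE}" set lastconsumed="${consumed}"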
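The parsing step can be isolated into a function that is easy to test without a license server. The sketch below assumes a summary line of the form "Users of foobar:  (Total of 100 licenses issued;  Total of 30 licenses in use)", which is what the script's sed pattern expects. The helper name lmstat_in_use is hypothetical, and the feature name is inserted into the pattern unescaped, so it should not contain regular expression metacharacters.

```shell
# lmstat_in_use FEATURE
# Reads lmstat-style text on stdin and prints the in-use count for
# FEATURE. Prints nothing when no matching summary line is found, so a
# caller can refuse to update Slurm with an empty or garbage value.
lmstat_in_use() {
    local feature="$1"
    # -n with the p flag prints only lines where the substitution matched.
    sed -n -E "s/^Users of ${feature}:.*Total of ([0-9]+) licenses? in use\)\$/\1/p"
}
```

A caller could check that the result is non-empty before passing it to sacctmgr as the lastconsumed value.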

When the LastConsumed value is changed through sacctmgr, an update is automatically pushed to the Slurm controllers. They use this value to calculate LastDeficit, which indicates how many licenses have "gone missing" from the cluster's perspective and need to be set aside temporarily.

For example, 100 "foobar" licenses are available in total, and access to 80 of them is allocated to the "blackhole" cluster:

$ sacctmgr add resource foobar count=100 flags=absolute cluster=blackhole allowed=80
 Adding Resource(s)
  foobar@slurmdb
   Cluster - blackhole	80
 Settings
  Name           = foobar
  Server         = slurmdb
  Description    = foobar
  Count          = 100
  Flags          = Absolute
  Type           = Unknown
Would you like to commit changes? (You have 30 seconds to decide)
(N/y): y
$ scontrol show license
LicenseName=foobar@slurmdb
    Total=80 Used=0 Free=80 Reserved=0 Remote=yes
    LastConsumed=0 LastDeficit=0 LastUpdate=2023-02-28T16:36:55

Now, our cron job comes in and updates the LastConsumed value to 30, while the cluster has yet to allocate any licenses to jobs:

$ sacctmgr -i update resource foobar set lastconsumed=30
 Modified server resource ...
  foobar@slurmdb
  Cluster - blackhole	- foobar@slurmdb
$ scontrol show license
LicenseName=foobar@slurmdb
    Total=80 Used=0 Free=70 Reserved=0 Remote=yes
    LastConsumed=30 LastDeficit=10 LastUpdate=2023-02-28T16:39:27

Note that the cluster has now calculated a deficit of 10 licenses and will schedule at most 70 licenses for the moment. Of the 100 licenses, 20 (100 - 80) are left for other clusters or external use, so the cluster expects up to 20 to be consumed outside its control. Since LastConsumed was set to 30 while the cluster itself has allocated none, an additional 10 licenses have "gone rogue" and their usage cannot be accounted for. The cluster must therefore not assign those to any pending jobs, as such a job would likely fail to acquire the desired licenses.
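The arithmetic in this example can be written out as a short sketch. It only models the situation shown here, where the cluster itself has no licenses in use (Used=0). How Slurm accounts for the cluster's own usage is not covered, and both helper names are hypothetical.

```shell
# license_deficit COUNT ALLOWED LAST_CONSUMED
# Prints how many consumed licenses exceed the share left for external
# use (COUNT - ALLOWED), or 0 if external use fits in that share.
license_deficit() {
    local count="$1" allowed="$2" last_consumed="$3"
    local external=$(( count - allowed ))
    local deficit=$(( last_consumed - external ))
    if [ "$deficit" -lt 0 ]; then
        deficit=0
    fi
    echo "$deficit"
}

# schedulable_licenses COUNT ALLOWED LAST_CONSUMED
# Prints the cluster's allowed licenses minus the current deficit.
schedulable_licenses() {
    local allowed="$2"
    echo $(( allowed - $(license_deficit "$1" "$2" "$3") ))
}
```

With the values above, license_deficit 100 80 30 yields the deficit of 10 and schedulable_licenses 100 80 30 yields the 70 free licenses reported by scontrol.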

If a further update (likely driven through cron) now reduces the LastConsumed count to 20, the deficit is considered to have disappeared, and the cluster makes all 80 assigned licenses available again:

$ sacctmgr -i update resource foobar set lastconsumed=20
 Modified server resource ...
  foobar@slurmdb
  Cluster - blackhole	- foobar@slurmdb
$ scontrol show license
LicenseName=foobar@slurmdb
    Total=80 Used=0 Free=80 Reserved=0 Remote=yes
    LastConsumed=20 LastDeficit=0 LastUpdate=2023-02-28T16:44:26

Last modified 25 April 2024