b. Create Your HPC Cluster

In this step, you create a cluster configuration that includes parameters for Amazon FSx for Lustre.

If you are not familiar with AWS ParallelCluster, we recommend that you first complete the AWS ParallelCluster lab before proceeding.

Create an Amazon S3 Bucket and Upload Files

First, create an Amazon S3 bucket and upload a file. Then, you can retrieve the file using Amazon FSx for Lustre.

  1. Open a terminal in your AWS Cloud9 instance.
  2. Run the following commands to create a new Amazon S3 bucket. These commands also download two example files and store them in this bucket: a matrix from the MatrixMarket collection and a velocity model from the Society of Exploration Geophysicists.

    # generate a unique postfix
    BUCKET_POSTFIX=$(uuidgen --random | cut -d'-' -f1)
    echo "Your bucket name will be mybucket-${BUCKET_POSTFIX}"
    aws s3 mb s3://mybucket-${BUCKET_POSTFIX}
    # retrieve local copies
    wget ftp://math.nist.gov/pub/MatrixMarket2/misc/cylshell/s3dkq4m2.mtx.gz
    wget http://s3.amazonaws.com/open.source.geoscience/open_data/seg_eage_salt/SEG_C3NA_Velocity.sgy
    # upload to your bucket
    aws s3 cp s3dkq4m2.mtx.gz s3://mybucket-${BUCKET_POSTFIX}/s3dkq4m2.mtx.gz
    aws s3 cp SEG_C3NA_Velocity.sgy s3://mybucket-${BUCKET_POSTFIX}/SEG_C3NA_Velocity.sgy
    # delete local copies
    rm s3dkq4m2.mtx.gz
    rm SEG_C3NA_Velocity.sgy

Before continuing to the next step, check the contents of your bucket, either in the AWS console or with the AWS CLI command aws s3 ls s3://mybucket-${BUCKET_POSTFIX}. Next, build your AWS ParallelCluster configuration.
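The bucket check can also be scripted. The following is a minimal sketch and not part of the original lab: check_listing is a hypothetical helper that reads the output of aws s3 ls on standard input and reports whether both example files are listed.

```shell
# Hypothetical helper (not part of the lab): reads `aws s3 ls` output on
# standard input and checks that both example files appear in it.
check_listing() {
    listing=$(cat)
    status=0
    for f in s3dkq4m2.mtx.gz SEG_C3NA_Velocity.sgy; do
        if printf '%s\n' "$listing" | grep -qF -- "$f"; then
            echo "found: $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    # 0 when both files are listed, 1 otherwise
    return "$status"
}

# Usage, assuming BUCKET_POSTFIX is still set in this terminal:
#   aws s3 ls "s3://mybucket-${BUCKET_POSTFIX}" | check_listing
```

The helper only inspects text, so it can be tried on saved listing output before pointing it at a real bucket.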

Create a Cluster Configuration File for Amazon FSx for Lustre

This section assumes that you are familiar with AWS ParallelCluster and the process of bootstrapping a cluster.

Generate a new key pair and a new default AWS ParallelCluster configuration.

The cluster configuration that you generate for Amazon FSx for Lustre includes the following settings:

  • A scratch Lustre partition of 1.2 TB that uses the Amazon S3 bucket created previously as the import and export path.
    • Lustre has two primary deployment options, scratch and persistent. Scratch is best for temporary storage and shorter-term processing of data. Scratch itself comes in two variants, SCRATCH_1 and SCRATCH_2. SCRATCH_1 is the default deployment type. SCRATCH_2 is the latest-generation scratch file system; it offers higher burst throughput above its baseline throughput and supports in-transit encryption of data.
  • Head node and compute nodes run on c5 instances, a compute-optimized instance family. This lab uses the c5.xlarge instance type, which has 4 vCPUs and 8 GB of memory, for both the head node and the compute nodes. In production, you will typically want a smaller instance type (e.g. c5.xlarge) for the head node and a larger one (e.g. c5.18xlarge or c5.24xlarge) for the compute nodes.

  • A placement group to maximize the bandwidth between instances and reduce the latency.

  • The cluster starts with 0 compute nodes, with a minimum size of 0 and a maximum size of 8 instances. It grows and shrinks automatically between the minimum and maximum limits based on cluster utilization and job queue backlog.

  • A gp2 Amazon EBS volume is attached to the head node and shared through NFS so that the compute nodes can mount it on /shared. This is generally a good location to store applications or scripts. Keep in mind that the /home directory is shared through NFS as well.

  • The job scheduler is Slurm.

For more details about the configuration options, see the AWS ParallelCluster User Guide, in particular its section on fsx parameters.
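As a rough sizing check, the settings above translate into a core count as follows. This is a sketch, not part of the original lab. It assumes two hardware threads per physical core on c5 instances, and it relies on the configuration file in this section setting disable_hyperthreading = true, so each c5.xlarge node contributes physical cores rather than vCPUs.

```shell
# Values taken from the settings described above. threads_per_core=2 is an
# assumption about c5 instances (each physical core exposes 2 vCPUs).
vcpus_per_node=4      # c5.xlarge
threads_per_core=2
max_nodes=8           # maximum cluster size (max_count)

# With hyperthreading disabled, only physical cores are available to jobs.
cores_per_node=$(( vcpus_per_node / threads_per_core ))
max_cores=$(( cores_per_node * max_nodes ))

echo "cores per node: ${cores_per_node}, cores at full scale: ${max_cores}"
# prints: cores per node: 2, cores at full scale: 16
```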

If you are using a different terminal than the one above, the BUCKET_POSTFIX variable is not set there. In that case, make sure that the Amazon S3 bucket name in the configuration is correct.
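One way to catch a missing bucket name before generating the configuration is a small guard like the one below. It is a sketch and not part of the original lab; bucket_name is a hypothetical helper.

```shell
# Hypothetical helper: prints the bucket name, or fails if BUCKET_POSTFIX
# is empty or unset in the current shell.
bucket_name() {
    if [ -z "${BUCKET_POSTFIX:-}" ]; then
        echo "BUCKET_POSTFIX is not set; set it again before generating the configuration" >&2
        return 1
    fi
    echo "mybucket-${BUCKET_POSTFIX}"
}

# Usage:
#   bucket_name || echo "look up the bucket with: aws s3 ls | grep mybucket-"
```

If the variable is lost, listing your buckets with aws s3 ls shows the full name, and the postfix can be set again from it.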

Paste the following commands into your terminal:

# generate a new key pair; remove these lines if you want to use the previous one
aws ec2 create-key-pair --key-name lab-3-key --query KeyMaterial --output text > ~/.ssh/lab-3-key
chmod 600 ~/.ssh/lab-3-key

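# create the cluster configuration
# query the EC2 instance metadata service for the network interface, subnet, VPC, and Region
IFACE=$(curl --silent http://169.254.169.254/latest/meta-data/network/interfaces/macs/)
SUBNET_ID=$(curl --silent http://169.254.169.254/latest/meta-data/network/interfaces/macs/${IFACE}/subnet-id)
VPC_ID=$(curl --silent http://169.254.169.254/latest/meta-data/network/interfaces/macs/${IFACE}/vpc-id)
# derive the Region from the Availability Zone by dropping the trailing letter
REGION=$(curl --silent http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/[a-z]$//')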

cat > my-fsx-cluster.ini << EOF
[aws]
aws_region_name = ${REGION}

[global]
cluster_template = default
update_check = false
sanity_check = true

[vpc public]
vpc_id = ${VPC_ID}
master_subnet_id = ${SUBNET_ID}

[cluster default]
key_name = lab-3-key
vpc_settings = public
base_os = alinux2
master_instance_type = c5.xlarge
ebs_settings = myebs
fsx_settings = myfsx
queue_settings = compute
scheduler = slurm
s3_read_resource = arn:aws:s3:::*

[queue compute]
compute_resource_settings = default
disable_hyperthreading = true
placement_group = DYNAMIC

[compute_resource default]
instance_type = c5.xlarge
min_count = 0
max_count = 8

[ebs myebs]
shared_dir = /shared
volume_type = gp2
volume_size = 20

[fsx myfsx]
shared_dir = /lustre
storage_capacity = 1200
import_path = s3://mybucket-${BUCKET_POSTFIX}
deployment_type = SCRATCH_2

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}
EOF

If you want to check the content of your configuration file, use the following command:

cat my-fsx-cluster.ini
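Because the file is generated from shell variables, an unset variable leaves an empty value behind. The sketch below, which is not part of the original lab, shows one way to spot that. check_config is a hypothetical helper that prints any suspicious lines and returns a non-zero status if it finds one.

```shell
# Hypothetical helper: returns non-zero if the generated file looks incomplete.
check_config() {
    cfg="$1"
    problems=0
    # A "key =" line with nothing after it means a variable expanded to empty.
    if grep -nE '^[a-z0-9_]+[[:space:]]*=[[:space:]]*$' "$cfg"; then
        problems=1
    fi
    # An import path ending in "mybucket-" means BUCKET_POSTFIX was empty.
    if grep -nE '^import_path[[:space:]]*=.*mybucket-[[:space:]]*$' "$cfg"; then
        problems=1
    fi
    return "$problems"
}

# Usage:
#   check_config my-fsx-cluster.ini && echo "configuration looks complete"
```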

Now, you are ready to create a cluster.

Generate a Cluster for Amazon FSx for Lustre

Create the cluster using the following command.

pcluster create my-fsx-cluster -c my-fsx-cluster.ini

This cluster includes additional resources for Amazon FSx for Lustre, so it takes a few minutes longer to create than the cluster in the previous AWS ParallelCluster workshop.

While the Lustre file system is being provisioned, you can look at the resources being created on your behalf on the AWS CloudFormation page of your AWS Console.
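Progress can also be followed from a second terminal. The sketch below is not part of the original lab. cluster_state is a hypothetical helper, and the "Status: <STATE>" line format it parses is an assumption about what pcluster status prints, so adjust the pattern if your version prints something different.

```shell
# Hypothetical helper: extracts the value of the first "Status:" line from
# text on standard input. The "Status: <STATE>" format is an assumption
# about the output of `pcluster status`.
cluster_state() {
    sed -n 's/^Status:[[:space:]]*//p' | head -n 1
}

# Usage (polls every 30 seconds while the cluster is still being created):
#   while [ "$(pcluster status my-fsx-cluster | cluster_state)" = "CREATE_IN_PROGRESS" ]; do
#       sleep 30
#   done
```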

Connect to Your Cluster

Once the cluster is created, connect to it.

pcluster ssh my-fsx-cluster -i ~/.ssh/lab-3-key

Next, take a deeper look at the Lustre file system.