Introduction
Amazon ParallelCluster
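Amazon ParallelCluster is an open source cluster management tool supported by Amazon Web Services that helps you deploy and manage high performance computing (HPC) clusters. It is built on the open source CfnCluster project and lets you build an HPC environment quickly, automatically setting up the required compute resources and shared file system. You can use Amazon Batch or Slurm as the batch scheduler in an Amazon ParallelCluster environment; earlier versions of ParallelCluster also supported PBS (Torque) and SGE.
Amazon ParallelCluster makes it easy to start both proof-of-concept and production deployments quickly. You can also build higher-level workflows on top of it, such as CFD high performance computing.
Amazon ParallelCluster can work with several Amazon HPC services, such as NICE DCV for remote graphics and the high-performance file system FSx for Lustre. DCV is useful for CFD pre- and post-processing: a typical scenario is an engineer opening the final computed model in CFD-Post over DCV to inspect and validate it, or doing pre-processing in ICEM. FSx for Lustre provides the bandwidth and latency that HPC workloads require.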
NICE DCV
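NICE DCV is a high-performance remote display protocol that gives customers a secure way to stream remote desktops and applications from any cloud or data center to any device, under varying network conditions. With NICE DCV and Amazon EC2, customers can run graphics-intensive applications remotely on EC2 instances and stream the results to a client machine, which removes the need for expensive dedicated workstations. Customers across a wide range of HPC workloads use NICE DCV for their remote visualization needs. Using NICE DCV on Amazon EC2 incurs no additional charge; you pay only for the EC2 resources used to run and store your workloads.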
FSx for Lustre
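FSx for Lustre makes it easy and cost-effective to launch and run the popular high-performance Lustre file system. You can use Lustre for workloads such as machine learning, high performance computing (HPC), video processing, and financial modeling.
The open source Lustre file system is designed for applications that need fast storage. Lustre was built to solve the problem of processing the world's ever-growing data sets quickly and cheaply. It is a widely used file system designed for the fastest computers in the world, and it provides sub-millisecond latencies, up to hundreds of GB/s of throughput, and up to millions of IOPS.
As a fully managed service, Amazon FSx makes it quick to use Lustre for workloads where storage speed matters. FSx for Lustre removes the traditional complexity of setting up and managing Lustre file systems, so you can launch a high-performance file system in minutes. It also offers multiple deployment options so you can optimize cost for your needs.
FSx for Lustre is POSIX-compliant, so you can use your current Linux-based applications without any changes. It behaves like any other file system on a Linux operating system. It also provides read-after-write consistency and supports file locking.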
ANSYS Fluent
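ANSYS Fluent is an internationally popular commercial CFD software package, with a reported market share of 60% in the United States. It can be used in any industry that involves fluid flow, heat transfer, or chemical reactions. It offers a rich set of physical models, advanced numerical methods, and powerful pre- and post-processing capabilities, and is widely used in aerospace, automotive design, oil and gas, and turbomachinery design.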
Slurm
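ParallelCluster 3 integrates the Slurm and Batch job schedulers; Slurm is the one suited to scheduling CFD jobs. Slurm (Simple Linux Utility for Resource Management, http://slurm.schedmd.com/ ) is an open source, fault-tolerant, and highly scalable resource manager and job scheduler for Linux clusters and supercomputing systems. A supercomputing system can use Slurm to manage resources and jobs so that they do not interfere with one another and run more efficiently. Every job, whether for debugging or production computation, can be submitted with the interactive parallel command srun, the batch command sbatch, or the allocation command salloc; after submission, related commands can be used to query the job status.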
Solution deployment
Install ParallelCluster
Prerequisites
Amazon ParallelCluster requires Python 3.6 or later. If it is not installed yet, download a compatible version from https://www.python.org/downloads/ and install it.
$ python3
Python 3.7.10 (default, Jun 3 2021, 00:02:01)
[GCC 7.3.1 20180712 (Red Hat 7.3.1-13)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
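The Python 3.6 requirement can also be checked from a script rather than by reading the interpreter banner. The following is a minimal sketch; the helper name version_at_least is made up for illustration, and it compares only the major and minor components of the version string.

```shell
# Hypothetical helper: succeed when version $1 (e.g. "3.7.10")
# is at least major.minor $2 (e.g. "3.6").
version_at_least() {
    have_major=${1%%.*}          # text before the first dot
    have_rest=${1#*.}            # text after the first dot
    have_minor=${have_rest%%.*}
    want_major=${2%%.*}
    want_rest=${2#*.}
    want_minor=${want_rest%%.*}
    [ "$have_major" -gt "$want_major" ] ||
        { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
}

# Example (needs python3 on the PATH):
# version_at_least "$(python3 -c 'import platform; print(platform.python_version())')" 3.6 \
#     || echo "Python 3.6 or later is required"
```

With the interpreter shown above (3.7.10) the check succeeds, so the remaining steps can proceed.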
Install virtualenv
$ python3 -m pip install --upgrade pip
Defaulting to user installation because normal site-packages is not writeable
Collecting pip
Downloading pip-22.2.1-py3-none-any.whl (2.0 MB)
|████████████████████████████████| 2.0 MB 44.7 MB/s
Installing collected packages: pip
Successfully installed pip-22.2.1
$ python3 -m pip install --user --upgrade virtualenv
Collecting virtualenv
Downloading virtualenv-20.16.2-py2.py3-none-any.whl (8.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.8/8.8 MB 89.1 MB/s eta 0:00:00
Collecting distlib<1,>=0.3.1
Downloading distlib-0.3.5-py2.py3-none-any.whl (466 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 467.0/467.0 kB 71.2 MB/s eta 0:00:00
Collecting importlib-metadata>=0.12
Downloading importlib_metadata-4.12.0-py3-none-any.whl (21 kB)
Collecting platformdirs<3,>=2
Downloading platformdirs-2.5.2-py3-none-any.whl (14 kB)
Collecting filelock<4,>=3.2
Downloading filelock-3.7.1-py3-none-any.whl (10 kB)
Collecting typing-extensions>=3.6.4
Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Collecting zipp>=0.5
Downloading zipp-3.8.1-py3-none-any.whl (5.6 kB)
Installing collected packages: distlib, zipp, typing-extensions, platformdirs, filelock, importlib-metadata, virtualenv
Successfully installed distlib-0.3.5 filelock-3.7.1 importlib-metadata-4.12.0 platformdirs-2.5.2 typing-extensions-4.3.0 virtualenv-20.16.2 zipp-3.8.
Create a named virtual environment
$ python3 -m virtualenv ~/apc-ve
created virtual environment CPython3.7.10.final.0-64 in 850ms
creator CPython3Posix(dest=/home/ec2-user/apc-ve, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/ec2-user/.local/share/virtualenv)
added seed packages: pip==22.2.1, setuptools==63.2.0, wheel==0.37.1
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
This creates the apc-ve directory under your home directory.
Activate the new virtual environment
$ source ~/apc-ve/bin/activate
Install Amazon ParallelCluster in the virtual environment
$ python3 -m pip install --upgrade "aws-parallelcluster"
Collecting aws-parallelcluster
Downloading aws_parallelcluster-3.2.0-py3-none-any.whl (424 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 425.0/425.0 kB 37.8 MB/s eta 0:00:00
Collecting aws-cdk.aws-batch!=1.153.0,~=1.137
Downloading aws_cdk.aws_batch-1.167.0-py3-none-any.whl (333 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 333.6/333.6 kB 52.3 MB/s eta 0:00:00
Collecting jmespath~=0.10
Downloading jmespath-0.10.0-py2.py3-none-any.whl (24 kB)
Collecting aws-cdk.aws-cloudwatch!=1.153.0,~=1.137
Downloading aws_cdk.aws_cloudwatch-1.167.0-py3-none-any.whl (379 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 379.1/379.1 kB 44.9 MB/s eta 0:00:00
Collecting aws-cdk.core!=1.153.0,~=1.137
Downloading aws_cdk.core-1.167.0-py3-none-any.whl (1.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.4/1.4 MB 95.1 MB/s eta 0:00:00
……
Collecting certifi>=2017.4.17
Downloading certifi-2022.6.15-py3-none-any.whl (160 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 160.2/160.2 kB 41.5 MB/s eta 0:00:00
Collecting exceptiongroup
Downloading exceptiongroup-1.0.0rc8-py3-none-any.whl (11 kB)
Collecting six>=1.5
Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: publication, zipp, urllib3, typing-extensions, typeguard, tabulate, six, PyYAML, pyrsistent, pyparsing, pkgutil-resolve-name, MarkupSafe, jmespath, itsdangerous, inflection, idna, exceptiongroup, charset-normalizer, certifi, attrs, werkzeug, requests, python-dateutil, packaging, jinja2, importlib-resources, importlib-metadata, cattrs, marshmallow, jsonschema, jsii, click, botocore, s3transfer, flask, constructs, clickclick, aws-cdk.region-info, aws-cdk.cloud-assembly-schema, connexion, boto3, aws-cdk.cx-api, aws-cdk.core, aws-cdk.aws-signer, aws-cdk.aws-sam, aws-cdk.aws-imagebuilder, aws-cdk.aws-iam, aws-cdk.aws-codestarnotifications, aws-cdk.aws-acmpca, aws-cdk.assets, aws-cdk.aws-kms, aws-cdk.aws-events, aws-cdk.aws-codeguruprofiler, aws-cdk.aws-cloudwatch, aws-cdk.aws-autoscaling-common, aws-cdk.aws-ssm, aws-cdk.aws-sqs, aws-cdk.aws-s3, aws-cdk.aws-ecr, aws-cdk.aws-applicationautoscaling, aws-cdk.aws-sns, aws-cdk.aws-s3-assets, aws-cdk.aws-ecr-assets, aws-cdk.aws-logs, aws-cdk.aws-codecommit, aws-cdk.aws-stepfunctions, aws-cdk.aws-kinesis, aws-cdk.aws-ec2, aws-cdk.aws-fsx, aws-cdk.aws-elasticloadbalancing, aws-cdk.aws-efs, aws-cdk.aws-lambda, aws-cdk.aws-sns-subscriptions, aws-cdk.aws-secretsmanager, aws-cdk.aws-cloudformation, aws-cdk.custom-resources, aws-cdk.aws-codebuild, aws-cdk.aws-route53, aws-cdk.aws-globalaccelerator, aws-cdk.aws-dynamodb, aws-cdk.aws-certificatemanager, aws-cdk.aws-elasticloadbalancingv2, aws-cdk.aws-cognito, aws-cdk.aws-cloudfront, aws-cdk.aws-servicediscovery, aws-cdk.aws-autoscaling, aws-cdk.aws-apigateway, aws-cdk.aws-route53-targets, aws-cdk.aws-autoscaling-hooktargets, aws-cdk.aws-ecs, aws-cdk.aws-batch, aws-parallelcluster
Successfully installed MarkupSafe-2.1.1 PyYAML-5.4.1 attrs-21.4.0 aws-cdk.assets-1.167.0 aws-cdk.aws-acmpca-1.167.0 aws-cdk.aws-apigateway-1.167.0 aws-cdk.aws-applicationautoscaling-1.167.0 aws-cdk.aws-autoscaling-1.167.0 aws-cdk.aws-autoscaling-common-1.167.0 aws-cdk.aws-autoscaling-hooktargets-1.167.0 aws-cdk.aws-batch-1.167.0 aws-cdk.aws-certificatemanager-1.167.0 aws-cdk.aws-cloudformation-1.167.0 aws-cdk.aws-cloudfront-1.167.0 aws-cdk.aws-cloudwatch-1.167.0 aws-cdk.aws-codebuild-1.167.0 aws-cdk.aws-codecommit-1.167.0 aws-cdk.aws-codeguruprofiler-1.167.0 aws-cdk.aws-codestarnotifications-1.167.0 aws-cdk.aws-cognito-1.167.0 aws-cdk.aws-dynamodb-1.167.0 aws-cdk.aws-ec2-1.167.0 aws-cdk.aws-ecr-1.167.0 aws-cdk.aws-ecr-assets-1.167.0 aws-cdk.aws-ecs-1.167.0 aws-cdk.aws-efs-1.167.0 aws-cdk.aws-elasticloadbalancing-1.167.0 aws-cdk.aws-elasticloadbalancingv2-1.167.0 aws-cdk.aws-events-1.167.0 aws-cdk.aws-fsx-1.167.0 aws-cdk.aws-globalaccelerator-1.167.0 aws-cdk.aws-iam-1.167.0 aws-cdk.aws-imagebuilder-1.167.0 aws-cdk.aws-kinesis-1.167.0 aws-cdk.aws-kms-1.167.0 aws-cdk.aws-lambda-1.167.0 aws-cdk.aws-logs-1.167.0 aws-cdk.aws-route53-1.167.0 aws-cdk.aws-route53-targets-1.167.0 aws-cdk.aws-s3-1.167.0 aws-cdk.aws-s3-assets-1.167.0 aws-cdk.aws-sam-1.167.0 aws-cdk.aws-secretsmanager-1.167.0 aws-cdk.aws-servicediscovery-1.167.0 aws-cdk.aws-signer-1.167.0 aws-cdk.aws-sns-1.167.0 aws-cdk.aws-sns-subscriptions-1.167.0 aws-cdk.aws-sqs-1.167.0 aws-cdk.aws-ssm-1.167.0 aws-cdk.aws-stepfunctions-1.167.0 aws-cdk.cloud-assembly-schema-1.167.0 aws-cdk.core-1.167.0 aws-cdk.custom-resources-1.167.0 aws-cdk.cx-api-1.167.0 aws-cdk.region-info-1.167.0 aws-parallelcluster-3.2.0 boto3-1.24.44 botocore-1.27.44 cattrs-22.1.0 certifi-2022.6.15 charset-normalizer-2.1.0 click-8.1.3 clickclick-20.10.2 connexion-2.13.1 constructs-3.4.58 exceptiongroup-1.0.0rc8 flask-2.2.0 idna-3.3 importlib-metadata-4.12.0 importlib-resources-5.9.0 inflection-0.5.1 itsdangerous-2.1.2 jinja2-3.1.2 jmespath-0.10.0 
jsii-1.63.2 jsonschema-4.9.0 marshmallow-3.17.0 packaging-21.3 pkgutil-resolve-name-1.3.10 publication-0.0.3 pyparsing-3.0.9 pyrsistent-0.18.1 python-dateutil-2.8.2 requests-2.28.1 s3transfer-0.6.0 six-1.16.0 tabulate-0.8.10 typeguard-2.13.3 typing-extensions-4.3.0 urllib3-1.26.11 werkzeug-2.2.1 zipp-3.8.
Install Node Version Manager and Node.js
The AWS Cloud Development Kit (AWS CDK) uses Node Version Manager and Node.js for template generation.
$ curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.38.0/install.sh | bash
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 14926 100 14926 0 0 469k 0 --:--:-- --:--:-- --:--:-- 485k
=> Downloading nvm as script to '/home/ec2-user/.nvm'
=> Appending nvm source string to /home/ec2-user/.bashrc
=> Appending bash_completion source string to /home/ec2-user/.bashrc
=> Close and reopen your terminal to start using nvm or run the following to use it now:
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
$ chmod ug+x ~/.nvm/nvm.sh
$ source ~/.nvm/nvm.sh
$ nvm install --lts
Installing latest LTS version.
Downloading and installing node v16.16.0...
Downloading https://nodejs.org/dist/v16.16.0/node-v16.16.0-linux-x64.tar.xz...
################################################################################################################################################################################## 100.0%
Computing checksum with sha256sum
Checksums matched!
Now using node v16.16.0 (npm v8.11.0)
Creating default alias: default -> lts/* (-> v16.16.0)
$ node --version
Verify that Amazon ParallelCluster is installed correctly
Activate the virtual environment
$ source ~/apc-ve/bin/activate
$ pcluster version
{
"version": "3.2.0"
}
Configure Amazon ParallelCluster
$ aws configure
AWS Access Key ID [None]: AKIA5OZOUQ4F2T4IMAOS
AWS Secret Access Key [None]: XXX
Default region name [None]: cn-northwest-1
Default output format [None]:
$ pcluster configure --config cluster-config.yaml
INFO: Configuration file cluster-config.yaml will be written.
Press CTRL-C to interrupt the procedure.
Allowed values for AWS Region ID:
1. cn-north-1
2. cn-northwest-1
AWS Region ID [cn-northwest-1]:
Allowed values for EC2 Key Pair Name:
1. LL-K2
EC2 Key Pair Name [LL-K2]:
Allowed values for Scheduler:
1. slurm
2. awsbatch
Scheduler [slurm]:
Allowed values for Operating System:
1. alinux2
2. centos7
3. ubuntu1804
4. ubuntu2004
Operating System [alinux2]: alinux2
Head node instance type [t2.micro]: c5.large
Number of queues [1]:
Name of queue 1 [queue1]:
Number of compute resources for queue1 [1]:
Compute instance type for compute resource 1 in queue1 [t2.micro]: c5.xlarge
Maximum instance count [10]:
Automate VPC creation? (y/n) [n]:
Allowed values for VPC ID:
# id name number_of_subnets
--- --------------------- ----------------- -------------------
1 vpc-003630feddf7d2417 EKS 2
2 vpc-013d1e62cfa405b8e ECS 2
3 vpc-0252e11202ae27e51 2
4 vpc-9b64d8f2 HPC 3
VPC ID [vpc-003630feddf7d2417]: vpc-9b64d8f2
Automate Subnet creation? (y/n) [y]:
Allowed values for Availability Zone:
1. cn-northwest-1a
2. cn-northwest-1b
3. cn-northwest-1c
Availability Zone [cn-northwest-1a]:
Allowed values for Network Configuration:
1. Head node in a public subnet and compute fleet in a private subnet
2. Head node and compute fleet in the same public subnet
Network Configuration [Head node in a public subnet and compute fleet in a private subnet]:
Creating CloudFormation stack...
Do not leave the terminal until the process has finished.
Stack Name: parallelclusternetworking-pubpriv-20220729030718 (id: arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/parallelclusternetworking-pubpriv-20220729030718/b846e230-0eeb-11ed-979c-0a9d1a8a4fe6)
Status: parallelclusternetworking-pubpriv-20220729030718 - CREATE_COMPLETE
The stack has been created.
Configuration file written to cluster-config.yaml
You can edit your configuration file or simply run 'pcluster create-cluster --cluster-configuration cluster-config.yaml --cluster-name cluster-name --region cn-northwest-1' to create your cluster.
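Create the CFD cluster
Configuration file
Edit cluster-config.yaml to match what HPC/CFD runs need: add NICE DCV remote visualization for pre- and post-processing, and the high-performance file system FSx for Lustre for the flow computation.
1. NICE DCV (set under the HeadNode section)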
Dcv:
  Enabled: true
2. FSx for Lustre
SharedStorage:
  - MountDir: /fsx
    Name: ParallelFileSystem
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 1200
      DeploymentType: PERSISTENT_1
      ImportedFileChunkSize: 1024
      ExportPath: s3://plljdi-fs1/export
      ImportPath: s3://plljdi-fs1
      PerUnitStorageThroughput: 200
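FSx for Lustre accepts only certain StorageCapacity values, so it can help to check the number before creating the cluster. The sketch below assumes the rule for SSD-backed PERSISTENT_1 file systems (1200 GiB, or a multiple of 2400 GiB); the helper name valid_fsx_capacity is hypothetical, and other deployment types follow different rules.

```shell
# Hypothetical helper: succeed when $1 (GiB) is 1200 or a positive
# multiple of 2400, the assumed rule for SSD PERSISTENT_1 file systems.
valid_fsx_capacity() {
    [ "$1" -eq 1200 ] || { [ "$1" -gt 0 ] && [ $(( $1 % 2400 )) -eq 0 ]; }
}

# Example:
# valid_fsx_capacity 1200 || echo "unsupported StorageCapacity"
```

The value 1200 used in the configuration above passes this check; a value such as 3600 would not.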
ANSYS Fluent currently supports the CentOS 7 operating system; Amazon Linux 2 is not on the list of operating systems officially certified by ANSYS.
Create the cluster
$ pcluster create-cluster --cluster-name cfd-cluster --cluster-configuration cfd-cluster-config.yaml
{
"cluster": {
"clusterName": "cfd-cluster",
"cloudformationStackStatus": "CREATE_IN_PROGRESS",
"cloudformationStackArn": "arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/test-cluster/348e1c40-0eed-11ed-b3f5-0a96b85a5424",
"region": "cn-northwest-1",
"version": "3.1.4",
"clusterStatus": "CREATE_IN_PROGRESS"
}
}
Query cluster information
$ pcluster describe-cluster --cluster-name cfd-cluster
{
"creationTime": "2022-07-29T10:31:33.608Z",
"headNode": {
"launchTime": "2022-07-29T10:40:14.000Z",
"instanceId": "i-0e3c4967953c806a7",
"publicIpAddress": "52.83.49.88",
"instanceType": "c5.large",
"state": "running",
"privateIpAddress": "172.31.48.96"
},
"version": "3.1.4",
"clusterConfiguration": {
"url": "https://parallelcluster-02fb13f6f8ec970c-v1-do-not-delete.s3.cn-northwest-1.amazonaws.com.cn/parallelcluster/3.1.4/clusters/cfd-cluster-7p51jnbemquummo3/configs/cluster-config.yaml?versionId=sf6OxDbpIGYPjmrRfSSArCU5YRUHzCqo&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIA5OZOUQ4F2T4IMAOS%2F20220805%2Fcn-northwest-1%2Fs3%2Faws4_request&X-Amz-Date=20220805T021305Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=f7bd1e1e31bdcc9d3bf7b260d68f418e39a7239fdf4baf0983cb1e399cdea35e"
},
"tags": [
{
"value": "3.1.4",
"key": "parallelcluster:version"
}
],
"cloudFormationStackStatus": "CREATE_COMPLETE",
"clusterName": "cfd-cluster",
"computeFleetStatus": "RUNNING",
"cloudformationStackArn": "arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/cfd-cluster/9dc591c0-0f29-11ed-a5cd-02357b891a1c",
"lastUpdatedTime": "2022-07-29T10:31:33.608Z",
"region": "cn-northwest-1",
"clusterStatus": "CREATE_COMPLETE"
}
$ pcluster list-clusters --query 'clusters[?clusterName==`cfd-cluster`]'
[
{
"clusterName": "cfd-cluster",
"cloudformationStackStatus": "CREATE_IN_PROGRESS",
"cloudformationStackArn": "arn:aws-cn:cloudformation:cn-northwest-1:925126395659:stack/cfd-cluster/f7316cd0-1464-11ed-8f62-0aa55a928096",
"region": "cn-northwest-1",
"version": "3.1.4",
"clusterStatus": "CREATE_IN_PROGRESS"
}
]
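Cluster creation takes several minutes, so instead of rerunning describe-cluster by hand, the status can be polled. This is a minimal sketch: the function name wait_for_cluster and the 30-second default interval are illustrative choices, and it relies on the --query option of the pcluster CLI to extract the clusterStatus field.

```shell
# Hypothetical helper: poll until the cluster leaves CREATE_IN_PROGRESS.
# $1 = cluster name, $2 = seconds between polls (default 30).
# Prints the final status; succeeds only on CREATE_COMPLETE.
wait_for_cluster() {
    name=$1
    interval=${2:-30}
    while :; do
        # --query yields a JSON string such as "CREATE_COMPLETE"; strip the quotes.
        status=$(pcluster describe-cluster --cluster-name "$name" \
            --query clusterStatus | tr -d '"')
        case $status in
            CREATE_IN_PROGRESS) sleep "$interval" ;;
            CREATE_COMPLETE) echo "$status"; return 0 ;;
            *) echo "$status"; return 1 ;;   # failure states or an empty reply
        esac
    done
}

# Example:
# wait_for_cluster cfd-cluster && pcluster ssh --cluster-name cfd-cluster -i ~/LL-K2.pem
```

Any status other than CREATE_IN_PROGRESS or CREATE_COMPLETE ends the loop with a non-zero exit code, so a failed creation does not poll forever.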
Log in to the cluster
$ pcluster ssh --cluster-name cfd-cluster -i ~/LL-K2.pem
Check the Slurm cluster status
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
queue1* up infinite 10 idle~ queue1-dy-c5xlarge-[1-10]
$ sinfo -l
Fri Aug 05 02:56:34 2022
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT OVERSUBS GROUPS NODES STATE NODELIST
queue1* up infinite 1-infinite no NO all 10 idle~ queue1-dy-c5xlarge-[1-10]
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
$ srun -n4 -l hostname
0: queue1-dy-c5xlarge-1
2: queue1-dy-c5xlarge-1
3: queue1-dy-c5xlarge-1
1: queue1-dy-c5xlarge-1
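The srun call above runs interactively. The same test can be submitted as a batch job with sbatch; the sketch below writes a small job script to a file. The function name write_hostname_job, the job name, and the output file pattern are illustrative assumptions; the partition queue1 is taken from the sinfo output above.

```shell
# Hypothetical helper: write a minimal Slurm batch script to the path in $1.
write_hostname_job() {
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=hostname-test
#SBATCH --partition=queue1
#SBATCH --ntasks=4
#SBATCH --output=hostname-%j.out
srun -l hostname
EOF
}

# Example:
# write_hostname_job hostname.sbatch && sbatch hostname.sbatch
```

Because the nodes are listed as idle~ (powered down until needed), the first job may stay pending for a while as the instances start.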
Log in with DCV
pcluster dcv-connect parameters
pcluster dcv-connect [-h]
--cluster-name CLUSTER_NAME
[--debug]
[--key-path KEY_PATH]
[--region REGION]
[--show-url]
$ pcluster dcv-connect --cluster-name cfd-cluster --key-path ~/LL-K2.pem --show-url
Please use the following one-time URL in your browser within 30 seconds:
https://52.83.49.88:8443?authToken=Xh92zh9pJ3bWK1Sn_2gzdUEnf4GwjYWYyMmh2bWSq4n8Pm4jUWWbqCOuBG6CdWBLFpPwZLmi7WC8PM7t44DWwL9Lr85Cu_QWTaEg-A9tywg3TjA2waXRzQhhI8-URnDWfTpC8l6Od5IkaUyiAjqybRfK2a41yYHNYSYUc3uWL_UNKYgjjoqCjvwFyBpKa0WGo88mODGpLkyWNhU6dqiWTK-BMqbSXl3SttPQOgge6YIwvSyKB28rmP0JoyC4SkvN#8DWPj4h0HXiKPbh1yZ69
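Open a browser and log in to the cluster head node through the link. CFD pre- and post-processing can be done on the head node over DCV, and you can choose a GPU instance for the head node depending on the resources that pre- and post-processing need.
Install the Fluent software
Obtain the installation media and license file from ANSYS, log in to the head node over DCV, and install the software into the shared FSx for Lustre directory so that every compute node can run the Fluent components. Then follow the installer prompts.
After installation, configure the license access ports: edit the ansyslmd.ini file and add the following two entries.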
SERVER=1055@licenseServer
ANSYSLI_SERVERS=2325@licenseServer
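Run Fluent and CFD-Post
Run Fluent
Log in through NICE DCV, then run /fsx/apps/ansys_inc/v195/fluent/bin/fluent
Users can run CFD simulations with Fluent. Because the Fluent GUI does not currently support Slurm scheduling, Fluent jobs can be submitted to Slurm with sbatch through a wrapper script.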
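A script-based submission of this kind could look like the following sketch, which generates a Slurm batch script that starts Fluent without its GUI. Everything here is an assumption to adapt rather than the author's own script: the function name write_fluent_job, the journal file argument, the 3ddp (3D double precision) solver mode, and the partition name queue1. MPI and interconnect options are left out.

```shell
# Hypothetical helper: write a Slurm batch script that runs Fluent in batch mode.
# $1 = output script path, $2 = Fluent journal file, $3 = number of tasks.
write_fluent_job() {
    cat > "$1" <<EOF
#!/bin/bash
#SBATCH --job-name=fluent
#SBATCH --partition=queue1
#SBATCH --ntasks=$3
#SBATCH --output=fluent-%j.out

# One line per allocated node, used as the hosts file for Fluent's -cnf option.
scontrol show hostnames "\$SLURM_JOB_NODELIST" > "hosts.\$SLURM_JOB_ID"

# 3ddp: solver mode (assumed); -g: no GUI; -t: process count; -i: journal file.
/fsx/apps/ansys_inc/v195/fluent/bin/fluent 3ddp -g -t"\$SLURM_NTASKS" -cnf="hosts.\$SLURM_JOB_ID" -i "$2"
EOF
}

# Example (run.jou is a placeholder journal file):
# write_fluent_job fluent.sbatch run.jou 8 && sbatch fluent.sbatch
```

The task count and journal name are filled in when the script is generated, while the escaped SLURM_ variables are left for Slurm to set when the job runs.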
Run CFD-Post
On Amazon Linux 2, the LD_LIBRARY_PATH environment variable must be set correctly, because some shared libraries have to be located explicitly at run time.
export LD_LIBRARY_PATH=/fsx/apps/ansys_inc/v195/commonfiles/CFX/support/fluentio/lib/linx64/:$LD_LIBRARY_PATH
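When LD_LIBRARY_PATH is empty beforehand, the export above leaves a trailing colon, which the dynamic loader treats as the current directory. A slightly more defensive variant is sketched here; the function name prepend_ld_path is made up for illustration.

```shell
# Hypothetical helper: put directory $1 at the front of LD_LIBRARY_PATH,
# skipping it if already listed and avoiding a stray ':' when the variable is empty.
prepend_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*) ;;   # already present, nothing to do
        *) LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export LD_LIBRARY_PATH
}

# Example:
# prepend_ld_path /fsx/apps/ansys_inc/v195/commonfiles/CFX/support/fluentio/lib/linx64
```

Calling it more than once with the same directory leaves the variable unchanged, so it is safe to put in a login script.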
Run /fsx/apps/ansys_inc/v195/CFD-Post/bin/cfdpost and use CFD-Post to view the simulation results, for example the perf_IndyCar.res result file.
Resource cleanup
When the compute environment is no longer needed, delete the CFD cluster.
pcluster delete-cluster --region cn-northwest-1 --cluster-name cfd-cluster
Use the Amazon Console to delete the CloudFormation networking stack.
Delete the VPC if it was newly created for this cluster.
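About the author
Lin Lei (林磊)
Senior expert in the high performance computing and SaaS industries. Graduated from the University of Science and Technology of China and the Institute of Software, Chinese Academy of Sciences. Before joining Amazon Web Services, worked at IBM and ANSYS China and led the construction of several supercomputing, EDA, and CAE high-performance systems. Took part in the development of the CAE Workspace platform (scheduling system) as a product manager. During graduate study, took part in a distributed cryptographic computing project supported by the National Natural Science Foundation of China.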