Cluster Administration using HPE SGI Management Suite

Training ID:
H6LM8S

Course Content

The HPE SGI Management Suite cluster administration course provides knowledge and practice in basic cluster administration areas such as cluster software installation, cluster configuration, administration commands, software repository and image management, provisioning, application installation, monitoring with Ganglia and Nagios, and troubleshooting the cluster.


After you successfully complete this course, expect to be able to:
  • Use the ipmitool command to set up for cluster admin node imaging
  • Set up Serial Over LAN for console access and power control
  • Troubleshoot startup problems
  • Configure a cluster using the SGI Management Center 3 (SMC3)
  • Image compute nodes
  • Run InfiniBand commands
  • Set up user accounts
  • Run MPI applications across the cluster
  • Monitor a running cluster with Ganglia and Nagios
  • Add and remove compute nodes
  • Install and set up a batch scheduler
  • Submit batch jobs with a batch scheduler

Target Audience

  • Administrators of HPE SGI Management Suite on HPE SGI 8600 clusters or of SGI Management Center 3 on SGI ICE clusters
  • Experienced Linux system administrators
  • Experienced Linux users who must maintain their own system

Prerequisites

  • Editing text with the vi editor
  • Recognizing regular expression syntax
  • Accessing documentation with man and info file viewers
  • Monitoring, managing and maintaining log files
  • Entering common commands at the bash command line; creating and interpreting basic bash shell scripts
  • Installing and configuring standard software components, services, and security features
  • Configuring basic communication protocols that support networked communications
  • Creating and modifying crontabs
  • Monitoring resource usage; familiarity with basic monitoring tools
  • Installing and configuring a Linux distribution on a server
  • Creating, modifying, and deleting user accounts and group accounts
  • Partitioning disks, managing filesystems and logical volumes
  • Using RPM package management
  • Installing and using virtualized systems
  • Understanding basic hardware and hardware troubleshooting

Detailed Course Outline

Overview
  • Identify flat and hierarchical cluster topologies
  • Explain the function of admin, rack leader, compute (service), and ice-compute node roles
  • Describe the network VLAN layout
  • Recognize the interface naming conventions

Installation
  • Install the admin node
  • Install HPE SGI Management Suite software
  • Copy distribution and HPE Performance Software - Message Passing Interface RPMs to the repository on the admin node
  • Specify the cluster domain name
  • Add patches or updates
  • Set up Network Time Protocol (NTP)
  • Build the database and the rack leader, compute (service), and ice-compute images

Discovery
  • Use the discover command to add lead and compute nodes to the cluster database
  • Use the discover command to image the lead and compute nodes
  • Use the discover command to monitor the automated addition of ice-compute nodes to the cluster
  • Review the structure of the discover configfile
  • Reset the cluster database

Data Networks
  • List data network interconnects
  • Identify key InfiniBand (IB) features
  • Identify IB fabric components and functions
  • Configure basic OpenSM software
  • Run basic IB diagnostics

Monitoring
  • Use the Ganglia web interface to monitor the cluster
  • Monitor the cluster with common utilities 

Customize the Cluster
  • Maintain repository and rebuild images with custom RPM lists
  • Configure cluster services
  • Use cimage to manage ice-compute node images
  • Use cinstallman to manage node images

Cluster User Environment
  • Use the pdsh commands
  • Use the module command
  • Compile and run test programs using the MPI environment

Post-install Scripts
  • Review the post-installation scripts feature for compute and lead nodes
  • Review the per-host customization scripts feature for ice-compute nodes
  • Use post-install scripts to ap

Maintenance
  • Identify if a node has failed
  • Get failure information
  • Disable the node
  • Re-enable the node
  • Review cadmin options
  • Monitor BMC/CMC/ECC environmental events
  • Update the cluster

Troubleshooting
  • Use system_info_gather and dbdump for system inventory
  • Review cluster log files
  • Obtain a traceback with nodetrace
  • Review lead node XFS project quotas 

From 3,360.-*

*Price per participant without additional options, excl. VAT.
