Cluster Administration using HPE SGI Management Suite - H6LM8S

Description

The HPE SGI Management Suite cluster administration course provides knowledge and practice in basic cluster administration areas: cluster software installation, cluster configuration, administration commands, software repository and image management, provisioning, application installation, monitoring with Ganglia and Nagios, and cluster troubleshooting.


After you successfully complete this course, expect to be able to:
  • Use the ipmitool command to set up cluster admin node imaging
  • Set up Serial over LAN for console access and power control
  • Troubleshoot startup problems
  • Configure a cluster using the SGI Management Center 3 (SMC3)
  • Image compute nodes
  • Run InfiniBand commands
  • Set up user accounts
  • Run MPI applications across the cluster
  • Monitor a running cluster with Ganglia and Nagios
  • Add and remove compute nodes
  • Install and set up a batch scheduler
  • Submit batch jobs with a batch scheduler

Target audience

Ideal candidates for this course:
  • Administrators of HPE SGI Management Suite on HPE SGI 8600 clusters or of SGI Management Center 3 on SGI ICE clusters
  • Experienced Linux system administrators
  • Experienced Linux users who must maintain their own systems

    Prerequisites

    The following knowledge is recommended for this course:
    • Editing text with the vi editor
    • Recognizing regular expression syntax
    • Accessing documentation with man and info file viewers
    • Monitoring, managing and maintaining log files
    • Entering common commands at the bash command line; creating and interpreting basic bash shell scripts
    • Installing and configuring standard software components, services, and security features
    • Configuring basic communication protocols that support networked communications
    • Creating and modifying crontabs
    • Monitoring resource usage with basic monitoring tools
    • Installing and configuring a Linux distribution on a server
    • Creating, modifying, and deleting user accounts and group accounts
    • Partitioning disks, managing filesystems and logical volumes
    • Using RPM package management
    • Installing and using virtualized systems
    • Understanding basic hardware and hardware troubleshooting

    Detailed contents

    Overview
    • Identify flat and hierarchical cluster topologies
    • Explain the function of admin, rack leader, compute (service), and ice-compute node roles
    • Describe the network VLAN layout
    • Recognize the interface naming conventions

    Installation
    • Install the admin node
    • Install HPE SGI Management Suite software
    • Copy distribution and HPE Performance Software - Message Passing Interface RPMs to the repository on the admin node
    • Specify the cluster domain name
    • Add patches or updates
    • Set up Network Time Protocol (NTP)
    • Build the database and the rack lead, compute (service), and ice-compute images

    Discovery
    • Use the discover command to add lead and compute nodes to the cluster database
    • Use the discover command to image the lead and compute nodes
    • Use the discover command to monitor the automated addition of ice-compute nodes to the cluster
    • Review the structure of the discover configfile
    • Reset the cluster database

    Data Networks
    • List data network interconnects
    • Identify key InfiniBand (IB) features
    • Identify IB fabric components and functions
    • Configure basic OpenSM software
    • Run basic IB diagnostics

    Monitoring
    • Use the Ganglia web interface to monitor the cluster
    • Monitor the cluster with common utilities 

    Customize the Cluster
    • Maintain repository and rebuild images with custom RPM lists
    • Configure cluster services
    • Use cimage to manage ice-compute node images
    • Use cinstallman to manage node images

    Cluster User Environment
    • Use the pdsh commands
    • Use the module command
    • Compile and run test programs using the MPI environment

    Post-install Scripts
    • Review the post-installation scripts feature for compute and lead nodes
    • Review the per-host customization scripts feature for ice-compute nodes
    • Use post-install scripts to ap

    Maintenance
    • Identify if a node has failed
    • Get failure information
    • Disable the node
    • Re-enable the node
    • Review cadmin options
    • Monitor BMC/CMC/ECC environmental events
    • Update the cluster

    Troubleshooting
    • Use system_info_gather and dbdump for system inventory
    • Review cluster log files
    • Obtain a traceback with nodetrace
    • Review lead node XFS project quotas 
