hpc
HPC Toolkit 4.2.1
HPC Toolkit is a tool for profile-based performance analysis of applications.
HPCToolkit is an open-source suite of multi-platform tools for profile-based performance analysis of applications. The figure provides an overview of the toolkit components and their relationships.
Main features:
- hpcrun: a tool for profiling executions of unmodified application binaries using statistical sampling of hardware performance counters.
- hpcprof & xprof: tools for interpreting sample-based execution profiles and relating them back to program source lines.
- bloop: a tool for analyzing application binaries to recover program structure; namely, to identify where loops are present and what program source lines they contain.
- hpcview: a tool for correlating program structure information, multiple sample-based performance profiles, and program source code to produce a performance database.
- hpcviewer: a Java-based GUI for exploring databases consisting of performance information correlated with program source.
A program called hpcview is at the toolkit's center. It takes performance profiles and program structure information and, under the direction of a configuration file, correlates them with application source code to produce a browsable performance database.
hpcview also enables the user to define expressions to compute derived metrics as functions of other metrics already defined (e.g. measured metrics read from data files or previously computed derived metrics).
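As a rough illustration of the derived-metric idea (not hpcview's actual expression syntax or configuration format, which is not shown here), the sketch below computes a new metric per scope from metrics that already exist, and then feeds that derived metric into a second one. The scope names, metric names and numbers are all hypothetical.

```python
# Hypothetical measured metrics per source scope; all names and values are
# made up for illustration and do not come from any real profile.
measured = {
    "main.c:42": {"cycles": 1_000_000, "l1_misses": 25_000},
    "main.c:57": {"cycles": 400_000, "l1_misses": 2_000},
}


def add_derived(table, name, formula):
    """Return a copy of `table` with one extra metric computed per scope."""
    result = {}
    for scope, metrics in table.items():
        row = dict(metrics)
        row[name] = formula(row)
        result[scope] = row
    return result


# A metric derived from two measured metrics.
with_ratio = add_derived(
    measured,
    "misses_per_kcycle",
    lambda m: 1000.0 * m["l1_misses"] / m["cycles"],
)

# A metric derived from a previously computed derived metric.
with_flag = add_derived(
    with_ratio,
    "miss_heavy",
    lambda m: m["misses_per_kcycle"] > 10.0,
)
```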
Performance databases are explored using our Java-based hpcviewer user interface, which enables one to explore an application's performance data in a top-down fashion and to navigate easily back and forth between performance data and source code.
The user interface presents performance data in a hierarchical display. At any time, you are looking at some program context (program, file, procedure, loop, or line). Also displayed is the data for both the parent and the children of the current context. Up and down arrows on the lines of the display are used to walk the hierarchy.
In order to speed up top-down analysis, the interface also provides "flatten" and "un-flatten" buttons. Their icons hint at their function. "Flatten" modifies the hierarchy by eliding non-leaf children of the current node and replacing them with the grandchildren.
"Un-flatten" reverses this. Since the tables are sorted, the flatten operation makes short work of diving into the program from the top to identify the most important files, procedures, loops and statements.
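The flatten step can be sketched on a small tree. This is only a model of the behavior described above, not hpcviewer's code: the `Node` class, the cost values and the choice to sort by descending cost are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cost: float
    children: list = field(default_factory=list)


def flatten(shown):
    """One flatten step over the currently displayed nodes.

    Non-leaf nodes are elided and replaced by their children; leaf nodes
    stay. The result is sorted by descending cost, mirroring a sorted table.
    """
    result = []
    for node in shown:
        result.extend(node.children if node.children else [node])
    return sorted(result, key=lambda n: n.cost, reverse=True)


# Hypothetical program: two files, three procedures, one loop.
program = Node("program", 100, [
    Node("a.c", 80, [
        Node("f", 70, [Node("loop@12", 60)]),
        Node("g", 10),
    ]),
    Node("b.c", 20, [Node("h", 20)]),
])

files = program.children
procedures = flatten(files)       # files elided, procedures shown
innermost = flatten(procedures)   # f elided, its loop shown next to g and h
```

Keeping the earlier lists (`files`, `procedures`) around is one simple way to model un-flatten: going back just means displaying the previous list again.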
Performance data manipulated by hpcview can come from any source, as long as the profile data can be translated or saved directly to a standard, profile-like input format. To date, the principal sources of input data for hpcview have been hardware performance counter profiles.
Such profiles are generated by setting up a hardware counter to monitor events of interest (e.g., primary cache misses), to generate a trap when the counter overflows, and then to histogram the program counter values at which these traps occur. For Linux, we developed the hpcrun tool to collect profiles by sampling hardware performance counters.
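The overflow-and-histogram scheme can be simulated in a few lines. The sketch below is a software model only: it does not touch real hardware counters or PAPI, and the trace format (pairs of program counter value and number of events caused at that instruction) is an assumption made for the example.

```python
from collections import Counter


def sample_profile(trace, period):
    """Histogram program counter values at simulated counter overflows.

    trace  -- iterable of (pc, events) pairs in execution order
    period -- number of events after which the counter overflows
    """
    histogram = Counter()
    counter = 0
    for pc, events in trace:
        counter += events
        while counter >= period:   # overflow: the "trap" records this pc
            histogram[pc] += 1
            counter -= period
    return histogram


# Hypothetical trace: 12 events in total, sampled once every 4 events.
trace = [(0x400, 3), (0x404, 0), (0x408, 5), (0x400, 4)]
profile = sample_profile(trace, period=4)
```

Instructions that cause many events collect many samples, so the histogram approximates where the events of interest occur without instrumenting the program.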
This tool uses UTK's PAPI library for access to hardware performance counters. A second tool, hpcprof, is used to map profiles collected using hpcrun back to program source lines. hpcprof is based on code from Curt Janssen's cprof/vprof profiler. On operating systems other than Linux, we use vendor-supplied tools to collect profile data. On MIPS+Irix platforms, we use SGI's ssrun tool to collect profiles. On Alpha+Tru64, we use either Compaq's uprofile or DCPI utilities for this purpose.
hpcview and hpcviewer can be used to view profile-like data of any type, not just data sampled from hardware performance counters. To analyze one program that contained many register spills, we built a Perl script to examine assembly code generated by the SGI compilers for MIPS+Irix and create profiles that map register spills back to source code lines.
To facilitate automation, the programs in HPCToolkit are intended to be run using scripts and configuration files. Once these are set up, rerunning the program to collect new data and all of the steps that go into generating a browsable dataset can be completely automated. The scripts automate the collection of data and the conversion of profile data into a common, XML-based format.
Other performance tools (e.g. SGI's ssrun) report performance data at the line, procedure, and program level. However, since much of the time in scientific programs is spent in loops, having data at the loop level as well is critical to facilitate performance tuning.
For this reason, HPCToolkit includes a binary analyzer, bloop, that extracts loop nesting structure from application binaries and uses symbol table line map information to map this structure back to the source program level. Because bloop works on binaries, this process is independent of the language used (though in practice it can be somewhat compiler dependent).
The loop nesting structure information produced by bloop enables hpcview to associate performance data with each loop in a program without incurring any additional overhead for data collection during program execution.
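To see why loop structure recovered after the fact adds no collection overhead, consider the sketch below: the samples are still gathered per source line, and loop totals are computed afterwards from the structure information. The representation of a loop as a (name, first line, last line) range is a simplifying assumption for the example. It is not bloop's output format, and real line maps need not form contiguous ranges.

```python
def attribute_to_loops(line_samples, loops):
    """Credit per-line sample counts to every loop whose range encloses them.

    line_samples -- dict mapping source line number to sample count
    loops        -- list of (name, first_line, last_line) tuples; nested
                    loops are simply overlapping ranges
    """
    totals = {name: 0 for name, _, _ in loops}
    for line, count in line_samples.items():
        for name, first, last in loops:
            if first <= line <= last:
                totals[name] += count
    return totals


# Hypothetical data: an outer loop on lines 10-30 with an inner loop on 15-20.
loops = [("outer", 10, 30), ("inner", 15, 20)]
line_samples = {12: 5, 17: 40, 35: 3}
loop_totals = attribute_to_loops(line_samples, loops)
```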
Supported platforms: Pentium+Linux, Opteron+Linux, Athlon+Linux, Itanium+Linux, Alpha+Tru64 and MIPS+Irix.
HPCToolkit is open-source software released with a BSD-like license.
Download (0.30MB)
Added: 2006-11-28 License: GPL (GNU General Public License) Price:
1066 downloads
HPC Challenge 1.2.0
HPC Challenge is a high-performance benchmark suite.
The HPC Challenge suite consists essentially of seven benchmarks:
1. HPL - the Linpack TPP benchmark which measures the floating point rate of execution for solving a linear system of equations.
2. DGEMM - measures the floating point rate of execution of double precision real matrix-matrix multiplication.
3. STREAM - a simple synthetic benchmark program that measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel.
4. PTRANS (parallel matrix transpose) - exercises the communications where pairs of processors communicate with each other simultaneously. It is a useful test of the total communications capacity of the network.
5. RandomAccess - measures the rate of integer random updates of memory (GUPS).
6. FFTE - measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT).
7. Communication bandwidth and latency - a set of tests to measure latency and bandwidth of a number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark).
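The "rates" in the list above are operation counts divided by elapsed time. The sketch below shows the arithmetic for two of them. It uses the conventional count of 2n^3 floating-point operations for an n-by-n matrix-matrix multiplication and simply divides updates by seconds for GUPS; it is not taken from the HPC Challenge source, and the function names are made up for the example.

```python
def dgemm_gflops(n, seconds):
    """Floating-point rate of an n x n matrix-matrix multiply, in Gflop/s.

    Assumes the conventional count of 2 * n**3 operations
    (n**3 multiplications and n**3 additions).
    """
    return 2 * n**3 / seconds / 1e9


def gups(updates, seconds):
    """Giga-updates per second: random memory updates divided by time."""
    return updates / seconds / 1e9


# Hypothetical timings, for illustration only.
rate = dgemm_gflops(n=1000, seconds=2.0)
update_rate = gups(updates=2_000_000_000, seconds=4.0)
```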
Compiling:
The first step is to create a configuration file that reflects the characteristics of your machine. The configuration file should be created in the hpl directory. This directory contains instructions (the files README and INSTALL) on how to create the configuration file. The directory hpl/setup contains many examples of configuration files. A good approach is to copy one of them to the hpl directory and, if it doesn't work, change it. This file is reused by all the components of the HPC Challenge suite.
When configuration is done, a file should exist in the hpl directory whose name starts with Make. and ends with the name for the system used for tests. For example, if the name of the system is Unix, the file should be named Make.Unix.
To build the benchmark executable (for the system named Unix) type: make arch=Unix. This command should be run in the top directory (not in the hpl directory). It will look in the hpl directory for the configuration file and use it to build the benchmark executable.
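The naming rule and build command can be wrapped in a small helper, for instance when scripting builds for several systems. This is a convenience sketch, not part of the distribution; the function names are hypothetical, and `top` is assumed to be the top directory of the unpacked suite.

```python
import subprocess
from pathlib import Path


def config_file(arch, top="."):
    """Path of the configuration file expected for the system named `arch`."""
    return Path(top) / "hpl" / f"Make.{arch}"


def build_command(arch):
    """The make invocation described above, as an argument list."""
    return ["make", f"arch={arch}"]


def build(arch, top="."):
    """Run the build from the top directory, after checking the config file."""
    path = config_file(arch, top)
    if not path.is_file():
        raise FileNotFoundError(f"missing configuration file: {path}")
    subprocess.run(build_command(arch), cwd=top, check=True)
```

Calling `build("Unix")` from the top directory would then check for hpl/Make.Unix and run `make arch=Unix`.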
Configuration:
The HPC Challenge is driven by a short input file named hpccinf.txt that is almost the same as the input file for HPL (customarily called HPL.dat). Refer to the file hpl/www/tuning.html for details about the input file for HPL. A sample input file is included with the HPC Challenge distribution.
The differences between the HPL input file and the HPC Challenge input file can be summarized as follows:
- Lines 3 and 4 are ignored. The output always goes to the file named hpccoutf.txt.
- There are additional lines (starting with line 33) that may (but do not have to) be used to customize the HPC Challenge benchmark. They are described below.
The additional lines in the HPC Challenge input file (compared to the HPL input file) are:
- Lines 33 and 34 describe additional matrix sizes to be used for running the PTRANS benchmark (one of the components of the HPC Challenge benchmark).
- Lines 35 and 36 describe additional blocking factors to be used for running the PTRANS benchmark.
Just for completeness, here is the list of lines of the HPC Challenge input file with brief descriptions of their meaning:
- Line 1: ignored
- Line 2: ignored
- Line 3: ignored
- Line 4: ignored
- Line 5: number of matrix sizes for HPL (and PTRANS)
- Line 6: matrix sizes for HPL (and PTRANS)
- Line 7: number of blocking factors for HPL (and PTRANS)
- Line 8: blocking factors for HPL (and PTRANS)
- Line 9: type of process ordering for HPL
- Line 10: number of process grids for HPL (and PTRANS)
- Line 11: numbers of process rows of each process grid for HPL (and PTRANS)
- Line 12: numbers of process columns of each process grid for HPL (and PTRANS)
- Line 13: threshold value not to be exceeded by scaled residual for HPL (and PTRANS)
- Line 14: number of panel factorization methods for HPL
- Line 15: panel factorization methods for HPL
- Line 16: number of recursive stopping criteria for HPL
- Line 17: recursive stopping criteria for HPL
- Line 18: number of recursion panel counts for HPL
- Line 19: recursion panel counts for HPL
- Line 20: number of recursive panel factorization methods for HPL
- Line 21: recursive panel factorization methods for HPL
- Line 22: number of broadcast methods for HPL
- Line 23: broadcast methods for HPL
- Line 24: number of look-ahead depths for HPL
- Line 25: look-ahead depths for HPL
- Line 26: swap methods for HPL
- Line 27: swapping threshold for HPL
- Line 28: form of L1 for HPL
- Line 29: form of U for HPL
- Line 30: value that specifies whether equilibration should be used by HPL
- Line 31: memory alignment for HPL
- Line 32: ignored
- Line 33: number of additional problem sizes for PTRANS
- Line 34: additional problem sizes for PTRANS
- Line 35: number of additional blocking factors for PTRANS
- Line 36: additional blocking factors for PTRANS
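Because the file is addressed by line number, the optional PTRANS lines can be picked out directly. The sketch below assumes, as in a typical HPL.dat, that each line starts with its whitespace-separated integer values and is followed by free-text description; that layout and the helper names are assumptions, so check them against the sample input file in the distribution.

```python
def leading_ints(line):
    """Integers at the start of a line, stopping at the first other token."""
    values = []
    for token in line.split():
        try:
            values.append(int(token))
        except ValueError:
            break
    return values


def counted_values(lines, count_line, values_line):
    """Read a count from one numbered line and that many values from another.

    `lines` holds the file's lines, so file line N is lines[N - 1].
    Returns an empty list when the optional lines are absent.
    """
    def line_at(number):
        return lines[number - 1] if number <= len(lines) else ""

    count = leading_ints(line_at(count_line))
    if not count:
        return []
    return leading_ints(line_at(values_line))[:count[0]]


def ptrans_extras(lines):
    """Additional PTRANS problem sizes (lines 33-34) and blocking factors (35-36)."""
    return counted_values(lines, 33, 34), counted_values(lines, 35, 36)
```

With a real file this would be called as `ptrans_extras(open("hpccinf.txt").read().splitlines())`.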
Enhancements:
- This version contains many bugfixes, major features, and minor enhancements, many of which were contributed by users.
- The major focus of this release was to improve accuracy of the reported performance results and ensure scalability of the code on the largest supercomputer installations with hundreds of thousands of computational cores.
Download (0.60MB)
Added: 2007-06-27 License: BSD License Price:
856 downloads
ComputeMode 1.5
ComputeMode is a system that builds a Virtual HPC Cluster using idle intranet resources.
The ComputeMode project is an Icatis product that extends an organization's Computational Grid through aggregation of unused computing resources.
For instance, ComputeMode lets a virtual cluster be built from employees' PCs during their idle times. In practice, most PCs in large companies are not used at night, on weekends, or during employee vacations, training periods or business trips.
So ComputeMode lets you benefit from the powerful computation infrastructure that you already have but are not aware of!
Another interesting use of ComputeMode is within computer classrooms, where PCs can be automatically harnessed into a virtual cluster when they are not being used (nights, weekends, holidays).
ComputeMode relies on a ComputeMode Server that keeps track of the availability of some dedicated PCs on the local network. Each PC Owner has a weekly availability Schedule, identifying periods when he or she is not using his/her PC. For instance, a PC owner can declare working periods from 8:00 to 18:00, Monday to Friday.
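The availability check implied by such a schedule can be sketched as follows. This is an illustration of the idea, not ComputeMode's implementation or schedule format: the constants encode the 8:00 to 18:00, Monday to Friday example, and treating 18:00 itself as idle is an assumption.

```python
from datetime import datetime, time

# Working periods from the example: 8:00 to 18:00, Monday to Friday.
# weekday() numbers Monday as 0 and Sunday as 6.
WORK_DAYS = {0, 1, 2, 3, 4}
WORK_START = time(8, 0)
WORK_END = time(18, 0)


def is_idle(moment):
    """True if the PC is outside its owner's declared working periods."""
    if moment.weekday() not in WORK_DAYS:
        return True
    return not (WORK_START <= moment.time() < WORK_END)


monday_noon = datetime(2024, 1, 1, 12, 0)     # a Monday
monday_night = datetime(2024, 1, 1, 22, 0)
saturday_noon = datetime(2024, 1, 6, 12, 0)   # a Saturday
```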
In the remainder of this document, we will refer to PCs handled by ComputeMode as Processing Nodes.
The ComputeMode Administrator can easily manage the computing PCs through a Web-based interface, accessed through the ComputeMode Server.
A Grid User can submit some computational Jobs to the system through the use of a classical Batch Manager (also called "Job Management System", "Distributed Resource Managers", or "Queuing System").
The Grid User can log onto the Batch Manager from any machine of the local network (usually through the "ssh" secure shell). The Batch Manager will then reserve appropriate resources for the computations, schedule the execution of the Jobs depending on the overall load on the Computing Grid, and allocate Jobs to available computing resources.
The OAR Batch Manager is installed by default with ComputeMode, even though other products such as Platform LSF, OpenPBS or Sun Grid Engine are supported by ComputeMode.
ComputeMode can monitor the load on the Batch Manager, and detect overloads. In such cases it can allocate available Processing Nodes to computational tasks.
Each Processing Node has two different operating modes:
- The User Mode, in which the machine is working in its standard way under Microsoft Windows for instance. The Owner of the machine will not even notice that his or her PC is managed as a ComputeMode Processing Node. In particular, the computational resources of the PC will not be used while in User Mode.
- The Computation Mode (this is where the name of our product comes from) is activated when the machine is within a time period in which the PC is declared as "idle". If the ComputeMode Server detects a computational peak, the PC can be remotely switched to Computation Mode. The switch from User Mode is done through an automatic reboot of the machine followed by a remote boot. The remote boot is handled by the ComputeMode Server with the "Preboot eXecution Environment" (PXE) protocol, which has been natively available in the BIOS of PCs since 1999. While in Computation Mode, the machine runs the Linux operating system and does not access any local hard disk.
When the ComputeMode Server detects that some given Processing Node is no longer available for computation, it restores the machine back to its User Mode, so that the Owner will not even notice that his or her PC has been used by ComputeMode.
The submitted Jobs can take advantage of the NFS distributed file system, which is made available from the ComputeMode Server. Each Grid User has his or her own private directory and can use it for data required by the computational Jobs, as well as to retrieve the produced output files.
A specific case may happen if a PC Owner comes back and needs his or her PC while a computational Job is being processed. This can happen, for instance, if the Owner has an urgent need and comes back at night to work on his/her PC. In such a situation, the owner always has priority over the machine. Just by using the keyboard, the Owner can abort any ongoing computational activity on his/her PC, and ComputeMode will restore the machine to its User Mode in about one minute.
Enhancements:
- UI usability improvements, code cleanup, and fixes were made.
- The client and server system were upgraded.
- New, simpler node import/export was added.
- It is available as a virtual appliance.
Download (453MB)
Added: 2006-11-10 License: Free To Use But Restricted Price:
1081 downloads
Los Alamos Message Passing Interface 1.5.16 RC1
Los Alamos Message Passing Interface is an implementation of the Message Passing Interface (MPI).
Los Alamos Message Passing Interface is an implementation of the Message Passing Interface (MPI) motivated by a growing need for fault tolerance at the software level in large high-performance computing (HPC) systems.
This need is caused by the vast number of components present in modern HPC systems, particularly clusters. The individual components -- processors, memory modules, network interface cards (NICs), etc. -- are typically manufactured to tolerances adequate for small or desktop systems.
When aggregated into a large HPC system, however, system-wide error rates may be too great to successfully complete a long application run. For example, a network device may have an error rate which is perfectly acceptable for a desktop system, but not in a cluster of thousands of nodes, which must run error free for many hours or even days to complete a scientific calculation.
LA-MPI has two primary goals: network fault tolerance and high performance.
Network fault tolerance is achieved by implementing a highly efficient checksum/retransmission protocol. The integrity of delivered data is (optionally) verified at the user level using a checksum or CRC. Data that is corrupt (or never delivered) is retransmitted.
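The general checksum/retransmission idea can be shown with a toy model. This is not LA-MPI's protocol or wire format: the frame layout (payload followed by a 4-byte CRC-32), the retry limit, and the simulated channel are all assumptions chosen to keep the example short.

```python
import zlib


def frame(payload):
    """Append a CRC-32 of the payload so the receiver can verify it."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(received):
    """Recompute the CRC-32 at the receiving side and compare."""
    body, tail = received[:-4], received[-4:]
    return zlib.crc32(body).to_bytes(4, "big") == tail


def send_reliably(payload, channel, max_attempts=5):
    """Retransmit until an intact frame arrives; return (payload, attempts)."""
    for attempt in range(1, max_attempts + 1):
        received = channel(frame(payload))
        if received is not None and verify(received):
            return received[:-4], attempt
    raise RuntimeError("fragment could not be delivered")


def flaky_channel():
    """Simulated link: corrupts the 1st frame, drops the 2nd, then works."""
    calls = {"n": 0}

    def channel(data):
        calls["n"] += 1
        if calls["n"] == 1:
            return bytes([data[0] ^ 0xFF]) + data[1:]   # corrupted in transit
        if calls["n"] == 2:
            return None                                  # never delivered
        return data

    return channel


data, attempts = send_reliably(b"hello", flaky_channel())
```

Both failure cases named above are covered: the corrupted frame fails verification and the lost frame never arrives, so each triggers a retransmission.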
As for high performance, LA-MPI's lightweight checksum/retransmission protocol allows us to achieve low-latency messaging. Furthermore, the flexible approach taken to the use of redundant data paths in a network-device-rich system leads to high network bandwidth, since different messages and/or message fragments can be sent in parallel along different paths. Also, since LA-MPI is developed for use on the large systems at Los Alamos National Laboratory, we have verified that LA-MPI is scalable to over 3,500 processes.
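Sending fragments of one message along different paths can be pictured with a round-robin split. The sketch below only models the bookkeeping; the fragment size, the round-robin policy and the use of sequence numbers for reassembly are assumptions for the example and are not taken from LA-MPI.

```python
def stripe(message, fragment_size, n_paths):
    """Split a message into fragments and deal them out across paths.

    Returns one list per path, each holding (sequence_number, fragment)
    pairs, so that fragments can travel independently and be reordered.
    """
    paths = [[] for _ in range(n_paths)]
    for seq, start in enumerate(range(0, len(message), fragment_size)):
        fragment = message[start:start + fragment_size]
        paths[seq % n_paths].append((seq, fragment))
    return paths


def reassemble(paths):
    """Merge fragments from all paths back into the original message."""
    fragments = sorted(pair for path in paths for pair in path)
    return b"".join(data for _, data in fragments)


striped = stripe(b"abcdefghij", fragment_size=3, n_paths=2)
restored = reassemble(striped)
```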
An alternative solution to the network fault tolerance problem is to use the TCP/IP protocol. We believe, however, that this protocol -- developed to handle unreliable, inhomogeneous and oversubscribed networks -- performs poorly and is overly complex for HPC system messaging, and that LA-MPIs lightweight checksum/retransmission protocol is a more appropriate choice.
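The general idea of a checksum/retransmission scheme can be sketched in a few lines. This is not LA-MPI's code or wire format: the frame layout, the simulated `lossy_link`, and every function name here are assumptions made for illustration, with CRC-32 from Python's standard `zlib` module standing in for whichever checksum is configured.

```python
import random
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload as a 4-byte big-endian trailer."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if its CRC matches the trailer, otherwise None."""
    payload, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") == trailer:
        return payload
    return None


def lossy_link(frame: bytes, rng: random.Random, error_rate: float):
    """Hypothetical link that may drop the frame or flip one bit in it."""
    roll = rng.random()
    if roll < error_rate / 2:
        return None  # frame never delivered
    if roll < error_rate:
        damaged = bytearray(frame)
        damaged[rng.randrange(len(damaged))] ^= 0x01  # corrupt a single bit
        return bytes(damaged)
    return frame


def send_reliably(payload: bytes, rng: random.Random,
                  error_rate: float = 0.3, max_tries: int = 50):
    """Retransmit until a frame arrives intact; return (payload, attempts)."""
    frame = make_frame(payload)
    for attempt in range(1, max_tries + 1):
        received = lossy_link(frame, rng, error_rate)
        if received is not None:
            data = check_frame(received)
            if data is not None:
                return data, attempt
    raise RuntimeError("link too unreliable: gave up after max_tries")


if __name__ == "__main__":
    data, attempts = send_reliably(b"halo exchange block", random.Random(0))
    print(data, "delivered after", attempts, "attempt(s)")
```

Because the check runs in the receiving process rather than in the network hardware, corruption introduced anywhere along the path is caught before the data reaches the application.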
Main features:
- Standard compliant (MPI version 1.2 integrated with ROMIO for MPI-IO)
- Highly portable
- Open source (LGPL)
- Thread safe
- Optimized for SMP systems, including NUMA architectures
- Network fault tolerant (data integrity checked at user level)
- Message-fragment striping across multiple network devices
Enhancements:
- Namespace conflicts have been fixed.
- Error detection and handling of fragments has been improved.
- Bugs in memory barriers and spinlocks for x86 and x86_64 architectures have been fixed.
- Profiling and backtracing support have been added.
- Asynchronous I/O has been disabled by default as a workaround for problems with some filesystems.
- Minor timeout bugs have been fixed.
Download (1.3MB)
Added: 2006-08-26 License: LGPL (GNU Lesser General Public License) Price:
1155 downloads
MUNGE Uid N Gid Emporium 0.5.8
MUNGE Uid N Gid Emporium is an authentication service for creating and validating credentials.
MUNGE Uid N Gid Emporium is an authentication service for creating and validating credentials. It is designed to be highly scalable for use in an HPC cluster environment.
It allows a process to authenticate the UID and GID of another local or remote process within a group of hosts having common users and groups. These hosts form a security realm that is defined by a shared cryptographic key.
Clients within this security realm can create and validate credentials without the use of root privileges, reserved ports, or platform-specific methods.
Rationale
The need for MUNGE arose out of the HPC cluster environment. Consider the scenario in which a local daemon running on a login node receives a client request and forwards it on to remote daemons running on compute nodes within the cluster. Since the user has already logged on to the login node, the local daemon just needs a reliable means of ascertaining the UID and GID of the client process. Furthermore, the remote daemons need a mechanism to ensure the forwarded authentication data has not been subsequently altered.
A common solution to this problem is to use Unix domain sockets to determine the identity of the local client, and then forward this information on to remote hosts via trusted rsh connections. But this presents several new problems. First, there is no portable API for determining the identity of a client over a Unix domain socket. Second, rsh connections must originate from a reserved port; the limited number of reserved ports available on a given host directly limits scalability. Third, root privileges are required in order to bind to a reserved port. Finally, the remote daemons have no means of determining whether the client identity is authentic.
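The portability problem is easy to see in practice. The sketch below reads the peer's identity from a Unix domain socket using `SO_PEERCRED`, an option that Linux provides and other systems generally do not; the helper name `peer_credentials` is an assumption for illustration, and the function simply returns None where the option is missing.

```python
import os
import socket
import struct


def peer_credentials(sock: socket.socket):
    """Return (pid, uid, gid) of the peer on a Unix domain socket.

    Relies on the Linux-only SO_PEERCRED option. On platforms that do not
    expose it, returns None, which is the portability gap in question.
    """
    so_peercred = getattr(socket, "SO_PEERCRED", None)
    if so_peercred is None:
        return None
    # struct ucred on Linux: pid_t pid; uid_t uid; gid_t gid;
    layout = "iII"
    raw = sock.getsockopt(socket.SOL_SOCKET, so_peercred, struct.calcsize(layout))
    return struct.unpack(layout, raw)


if __name__ == "__main__":
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with server_end, client_end:
        print("peer credentials:", peer_credentials(server_end))
        print("our own uid:", os.getuid())
```

Other Unix variants offer different mechanisms for the same purpose, so a daemon that must run everywhere ends up carrying one code path per platform.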
Overview
A process creates a credential by requesting one from the local MUNGE service. The encoded credential contains the UID and GID of the originating process. This process sends the credential to another process within the security realm as a means of proving its identity. The receiving process validates the credential with the use of its local MUNGE service. The decoded credential provides the receiving process with a reliable means of ascertaining the UID and GID of the originating process. This information can be used for accounting or access control decisions.
The contents of the credential (including any optional payload data) are encrypted with a key shared by all munged daemons within the security realm. The integrity of the credential is ensured by a message authentication code (MAC). The credential is valid for a limited time defined by its time-to-live (TTL). The daemon ensures unexpired credentials are not replayed on a particular host. Decoding of a credential can be restricted to a particular user and/or group ID. The payload data can be used for purposes such as embedding the destination's address to ensure the credential is only valid on a specific host. The internal format of the credential is encoded in a platform-independent manner, and the credential itself is base64 encoded to allow it to be transmitted over virtually any transport.
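The properties listed above (shared key, MAC, TTL, replay detection, base64 transport encoding) can be illustrated with a toy credential. This is not MUNGE's credential format or API: the JSON body, the `encode_credential` function, and the `Validator` class are assumptions for illustration, and the sketch authenticates the contents but does not encrypt them.

```python
import base64
import hashlib
import hmac
import json
import os
import time

MAC_LEN = hashlib.sha256().digest_size


def encode_credential(key: bytes, uid: int, gid: int,
                      ttl: int = 300, payload: str = "") -> str:
    """Build a base64 credential carrying UID, GID, TTL and optional payload."""
    body = json.dumps({
        "uid": uid, "gid": gid, "ttl": ttl, "payload": payload,
        "issued": time.time(),
        "nonce": os.urandom(8).hex(),  # makes every credential unique
    }).encode()
    mac = hmac.new(key, body, hashlib.sha256).digest()
    return base64.b64encode(mac + body).decode("ascii")


class Validator:
    """Per-host validator holding the shared key and a replay cache."""

    def __init__(self, key: bytes):
        self.key = key
        self.seen = set()  # a real service would prune expired entries

    def decode(self, credential: str, now=None) -> dict:
        raw = base64.b64decode(credential)
        mac, body = raw[:MAC_LEN], raw[MAC_LEN:]
        expected = hmac.new(self.key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            raise ValueError("credential failed integrity check")
        fields = json.loads(body)
        now = time.time() if now is None else now
        if now > fields["issued"] + fields["ttl"]:
            raise ValueError("credential expired")
        if mac in self.seen:
            raise ValueError("credential replayed on this host")
        self.seen.add(mac)
        return fields


if __name__ == "__main__":
    realm_key = os.urandom(32)  # stands in for the key shared across the realm
    cred = encode_credential(realm_key, os.getuid(), os.getgid(), payload="node42")
    print(Validator(realm_key).decode(cred))
```

Each host keeps its own `Validator`, so a credential is rejected the second time it is presented to the same host while remaining usable once on each other host in the realm until its TTL expires.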
Enhancements:
- A bug was fixed that caused stack corruption on AMD-64 when using Libgcrypt.
Download (0.35MB)
Added: 2007-02-06 License: GPL (GNU General Public License) Price:
991 downloads
BCCD - Bootable Cluster CD 2.2.1c
The BCCD was created to facilitate instruction of parallel computing aspects and paradigms.
BCCD - Bootable Cluster CD was created to facilitate instruction of parallel computing aspects and paradigms. Part of the difficulty instructors face is a lack of dedicated resources to explore distributed computing aspects and a lack of time to preconfigure and test the supporting environment.
The BCCD image addresses this problem by providing a non-destructive overlay for running a full-fledged parallel computing environment on just about any workstation-class system. We're happy to say that this now includes the Mac too!
The BCCD does share similarities with a few diskless solutions for clustering, such as the Warewulf project, the thin-OSCAR approach, Cluster Knoppix (only an openMosix system, no MPI/LAM/PVM build tools, ...), and so on. This is definitely the trend in HPC. But the main differences are that the BCCD will always fit in your pocket, be highly customizable for specific institutions' needs, and will always be geared toward education and not dedicated clusters.
The "gar" build system also sets the BCCD apart from other projects. "gar" is a mix between BSD's "ports" system, Linux From Scratch, and Gentoo Linux. With gar, you can build an entire BCCD image from net-fetched sources in about two hours (assuming you have a primed ccache!).
The BCCD is also distinctly different from NPACI-Rocks, OSCAR, Cluster in a box or other types of mass-imaging clustering projects for two reasons:
1. It's a non-destructive overlay on top of the current hardware. Once a system is rebooted, it reverts back to its original state. It is intended to be booted "over top" of a currently-configured Windows/Linux/BSD/etc. system.
2. Its focus is on educational aspects of High-Performance Computing (HPC) instead of the HPC core. Students will have a much better appreciation and understanding of how to tweak an MTU setting or wire the topology across a cluster if they understand how a distributed computation is laid out! Emphasis is placed upon building, configuring, and running distributed applications.
Download (200MB)
Added: 2006-03-26 License: GPL (GNU General Public License) Price:
1316 downloads
Gluster 1.2.2 (GlusterFS)
The GlusterFS package contains clustered file storage that can scale to petabytes.
The GlusterFS package contains clustered file storage that can scale to petabytes. GlusterFS is a programmable system. With a little thought, you can even redesign the GlusterFS file system by rearranging the GlusterFS components using the translator interface. This is all achieved through a volume specification file, which allows GlusterFS to be flexible enough for all kinds of storage needs. Even with all these advanced features, GlusterFS is very easy to set up and manage.
Gluster is a GNU cluster distribution aimed at commoditizing Supercomputing and Superstorage. The core of Gluster provides a platform for developing clustering applications tailored for specific tasks such as HPC Clustering, Storage Clustering, Enterprise Provisioning, Database Clustering, etc.
Download (0.26MB)
Added: 2007-01-17 License: GPL (GNU General Public License) Price:
1012 downloads
Other version of Gluster
License:GPL (GNU General Public License)
BioBrew Linux 3.0.2.04
BioBrew Linux is an open source Linux distribution based on the NPACI Rocks cluster software and enhanced for bioinformaticists.
BioBrew Linux is an open source Linux distribution that is enhanced for life scientists. It is customized for cluster and bioinformatics computing. It automates cluster installation, includes all the HPC software a cluster enthusiast needs, and contains popular bioinformatics applications.
BioBrew Linux is an open source Linux distribution based on the NPACI Rocks cluster software and enhanced for bioinformaticists and life scientists. While it looks, feels, and operates like ordinary Red Hat Linux, BioBrew Linux includes popular cluster software e.g. MPICH, LAM-MPI, PVM, Modules, PVFS, Myrinet GM, Sun Grid Engine, gcc, Ganglia, and Globus, *and* popular bioinformatics software e.g. the NCBI toolkit, BLAST, mpiBLAST, HMMER, ClustalW, GROMACS, PHYLIP, WISE, FASTA, and EMBOSS.
It runs on everything from notebook computers to large clusters. BioBrew Linux for the Itanium architecture is only available for purchase at this time through Callident. Please contact Brewmeister Glen Otero for information regarding BioBrew Linux on Itanium.
Download (1980MB)
Added: 2005-05-18 License: GPL (GNU General Public License) Price:
1622 downloads