LAMPI

LA-MPI is an implementation of the Message Passing Interface (MPI) motivated by a growing need for fault tolerance at the software level in large high-performance computing (HPC) systems. LA-MPI has two primary goals: network fault tolerance and high performance. Network fault tolerance is achieved by implementing a highly efficient checksum/retransmission protocol. The integrity of delivered data is (optionally) verified at the user level using a checksum or CRC. Data that is corrupted (or never delivered) is retransmitted. As for high performance, LA-MPI’s lightweight checksum/retransmission protocol allows us to achieve low-latency messaging. Furthermore, the flexible use of redundant data paths in a network-device-rich system leads to high network bandwidth, since different messages and/or message fragments can be sent in parallel along different paths. Also, since LA-MPI is developed for use on the large systems at Los Alamos National Laboratory, we have verified that LA-MPI is scalable to over 3,500 processes.
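The checksum/retransmission idea described above can be illustrated with a small sketch. This is not LA-MPI's code or API; it is a minimal Python model that assumes a CRC-32 trailer on each frame and a simple resend-until-verified loop. The names `make_frame`, `check_frame`, `send_reliably`, and `flaky_channel_factory` are hypothetical and exist only for this illustration.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload (assumed frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Return the payload if the trailing CRC matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        return None
    return payload


def send_reliably(payload: bytes, channel, max_attempts: int = 5):
    """Resend the frame until the receiver side verifies it.

    `channel` is a hypothetical callable that takes a frame and returns
    what arrived, or None when the frame was lost in transit.
    Returns (verified_payload, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        received = channel(make_frame(payload))
        if received is None:
            continue  # frame never delivered: retransmit
        data = check_frame(received)
        if data is not None:
            return data, attempt
        # CRC mismatch: frame was corrupted, so retransmit
    raise RuntimeError("delivery failed after %d attempts" % max_attempts)


def flaky_channel_factory():
    """Hypothetical channel: drops the 1st frame, corrupts the 2nd, then delivers."""
    state = {"sent": 0}

    def channel(frame: bytes):
        state["sent"] += 1
        if state["sent"] == 1:
            return None  # simulate a lost frame
        if state["sent"] == 2:
            return bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit
        return frame

    return channel


if __name__ == "__main__":
    data, attempts = send_reliably(b"hello fragment", flaky_channel_factory())
    print(data, attempts)
```

With the flaky channel above, the first frame is lost and the second fails its CRC check, so the payload is accepted on the third attempt. A real implementation would also need acknowledgements, timeouts, and sequence numbers, which this sketch leaves out.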
Find LAMPI at: http://public.lanl.gov/lampi/

