Scaling To Petascale Institute
| Friday, June 30, 2017
Software Libraries
Presenter: Lois Curfman McInnes, Argonne National Laboratory
Abstract: Software libraries, which are high-quality, encapsulated, documented, tested, and multiuse software collections, provide widely reusable capabilities that are robust, efficient, and scalable for high-performance computing (HPC). A rich variety of HPC numerical software libraries provide easy access to sophisticated mathematical algorithms and high-performance data structures that have been developed by experts. By using numerical libraries, researchers in computational science and engineering (CSE) do not need to write this complex code and can instead focus on their primary domain interests. CSE domain scientists will generally be able to employ much better algorithms through HPC numerical libraries than they could write themselves because their own expertise in this area does not match that of specialized library developers. Moreover, HPC libraries enable advanced users to customize and extend capabilities to exploit domain-specific knowledge. This presentation will introduce a variety of open-source HPC numerical libraries, with emphasis on design principles and functionality to support large-scale CSE simulations. We will also introduce work in the xSDK (Extreme-scale Scientific Software Development Kit), where community policies are improving software quality and interoperability, as we work toward productive and sustainable software ecosystems for extreme-scale science.
Parallel I/O
Presenter: Jialin Liu, NERSC
Abstract: "There is no physics without I/O." (Anonymous Physicist, SciDAC Conference 2009). Parallel I/O is an important topic in HPC. This tutorial covers the basics of parallel I/O in high-performance computing. Starting with the traditional HPC I/O stack, we will briefly go through each critical component from hardware (HDD, RAID, Burst Buffer) to software (Parallel File System, Lustre, DataWarp) and then from I/O middleware to application, and explain common causes of typical HPC I/O bottlenecks, e.g., mismatches between logical access and physical layout. We will then discuss how to scale I/O on a parallel file system (Lustre) with I/O middleware (MPI-IO). We will cover best practices for scaling with high-level I/O libraries, e.g., HDF5 and its Python interface, h5py. We will also briefly discuss the use of Darshan for profiling I/O performance. Finally, we will briefly introduce object stores.
HDF5
Presenter: Frank Willmore, HDF Group
Abstract: Data management is critical to the development, execution and supervision of practices that control, protect, deliver and enhance the value of science assets. This talk provides guidance on managing scientific data, including designing data models, optimizing performance, and choosing software packages, with a focus on using HDF5 and how HDF5 has been applied in a variety of science community data management scenarios. HDF5 is one of the most broadly used I/O middleware packages for scientific data storage in today’s computing ecosystem. It is designed to organize, store, discover, access, analyze, share, and preserve diverse, complex data in continuously evolving heterogeneous computing and storage environments. Primarily designed to enhance the process of managing scientific data, HDF5 enables scientists to stay focused on their research by taking over many of the time-consuming aspects of interacting with the storage system.
Globus: Simplifying Research Data Management via SaaS
Presenter: Greg Nawrocki, Argonne National Laboratory
Abstract: Globus is software-as-a-service for research data management. Our goal is to make it easy for researchers to manage their data throughout its lifecycle, using just a web browser to move, share, and publish data directly from their own storage systems. Globus provides secure, reliable, high-performance file transfer, the ability to share files with collaborators, and flexible workflows for identifying, describing, curating, and publishing data sets. Since its launch at SC10, the service has been deployed at hundreds of research institutions across the US and abroad. In this talk, we will provide an introductory overview and demonstration of Globus, and describe recent enhancements that bring additional capabilities to both researchers and research computing administrators.
Globus: Building the Modern Research Data Portal with Globus PaaS: Introduction and Transfer API
Presenter: Greg Nawrocki, Argonne National Laboratory
Abstract: We will introduce the Globus platform and describe how you can use Globus services to deliver unique data management capabilities in your applications. This will include:
• Overview of use cases: Common patterns like data publication/distribution, orchestration of data flows, etc.
• Overview of the Globus platform: Architecture and brief overview of available services
• Introduction to the Globus Transfer API: Make your first call and move data with Globus
• Introduction to the Python SDK for using Globus Auth and Transfer
You will use a Jupyter notebook to experiment with the Globus Transfer API, using it to manage endpoints and to transfer and share files. We will also demonstrate a simple yet fully functional application that leverages the Globus platform for data distribution and analysis.
Software Engineering
Presenter: Anshu Dubey, Argonne National Laboratory
Abstract: The computational science and engineering (CSE) communities develop complex applications to solve scientific and engineering challenges. These applications have many moving parts that need to interoperate with one another. These communities are facing new challenges created by the confluence of disruptive changes in computing architectures, demand for greater scientific reproducibility, and new opportunities for higher-fidelity multiphysics and multiscale simulations. Architecture changes require new software design and implementation strategies, and significant refactoring of existing code. Reproducibility demands require more rigor across the entire software endeavor. Code coupling requires aggregate team interactions, including integration of software processes and practices. These challenges demand large investments in scientific software development and improved practices. This presentation will provide a compilation of software engineering best practices that have generally been found to be useful by science communities. The topics covered will include the software lifecycle, including software design for longevity, and a software process designed for reproducibility and sustainability. |