The WoWoHa Summer Webinar Series is intended to bring together people in the larger High Performance Computing (HPC) community who are developing and using workflow technology for simulations, experiment analysis, and convergent workflows leveraging machine learning and data analytics.
Fri, June 5 2020 - 10am PT
Talk 00: The Role of Workflows in Credible High Consequence Computation Simulation
The foundation of computational simulation credibility is an evidence package that encompasses requirements definition, assessment of code capabilities to model the desired physics, and computational and subject-matter-expert (SME) judgment elements that demonstrate why the results can be trusted for the intended use and communicate gaps. The Eclipse-based Next Generation Workflow (NGW) has been developed to graphically author, communicate and reliably execute analysis and ensemble workflows over a heterogeneous set of computing platforms. The Credibility Framework (CF), which integrates with NGW, provides a platform to execute, collect and communicate credibility evidence. CF is configurable by non-programmers through simple Excel spreadsheets, so the credibility process can be tailored to the needs of analysis provider organizations, customers and regulatory agencies. An end-to-end use case is demonstrated through a realistic solid mechanics exemplar, with ensemble studies supporting computational aspects of credibility such as solution verification with respect to various numerical factors, sensitivity analysis, and uncertainty quantification.
Fri, June 12 2020 - 10am PT
Talk 01: The Themis Ensemble Generation Tool
We present Themis, a new generator for ensembles of simulations. Themis leverages a simulation submission batch script to create an ensemble with minimal setup time. Themis can be used to generate simple parameter studies, which can be scaled to million-member studies, or to generate complex design optimization or machine learning workflows in which users create dynamic and adaptive optimization loops using straightforward Python scripting. Themis has an easy-to-use command line interface (CLI) for fast study generation, and a Python API for building complex workflows. We will demonstrate how to evolve a batch submission script, which runs a single simulation, into a study using the Themis command line interface. The Themis CLI allows users to generate studies, dry-run studies, report study status, kill or restart individual simulations, and harvest simulation outputs. We will also show how the Themis Python API can be used to build a dynamic optimization workflow incorporating simulation, post-processing, and scikit-learn. Our new capability is free-standing, with a Python interface, allowing it to be incorporated into existing tools and workflows. We will present our path forward to supporting massive ensembles on the El Capitan system, to be sited in 2022, by discussing the results of scaling to a million-member ensemble and our ongoing collaboration with FLUX, the next generation scheduler team in Livermore Computing.
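The idea of turning a single-run batch script into a parameter study can be sketched with the Python standard library alone. This is not the Themis CLI or API: the template text, the `./simulate` executable, the parameter names, and the `generate_study` helper are all hypothetical, chosen only to illustrate a full-factorial study.

```python
import itertools
from string import Template

# Hypothetical batch script for one simulation; $-placeholders mark
# the values that vary across the study (not a real Themis template).
BATCH_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#SBATCH --job-name=run_$run_id\n"
    "./simulate --viscosity $viscosity --resolution $resolution\n"
)


def generate_study(parameter_space):
    """Render one batch script per point of the full-factorial grid."""
    names = sorted(parameter_space)
    value_lists = (parameter_space[name] for name in names)
    scripts = []
    for run_id, values in enumerate(itertools.product(*value_lists)):
        point = dict(zip(names, values))
        scripts.append(BATCH_TEMPLATE.substitute(run_id=run_id, **point))
    return scripts


# Three viscosities times two resolutions gives six ensemble members.
study = generate_study({"viscosity": [0.1, 0.2, 0.4], "resolution": [64, 128]})
print(len(study))
print(study[0])
```

A real ensemble tool would also submit, track, and restart these runs; the sketch covers only the generation step.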
Fri, June 19 2020 - 10am PT
Talk 02: Pegasus workflow execution environments at OLCF using Kubernetes
Workflows are a key technology for enabling complex scientific computations. They capture the interdependencies between data processing tasks as well as the mechanisms to execute those steps reliably and efficiently. Workflows can capture complex processes to promote sharing and reuse, and also provide provenance information necessary for the verification of scientific results and for scientific reproducibility. Pegasus (https://pegasus.isi.edu) is being used in production in a number of scientific domains. In 2016 the LIGO gravitational wave experiment used Pegasus to analyze instrumental data and confirm the first detection of a gravitational wave. The Southern California Earthquake Center (SCEC), based at USC, uses a Pegasus-managed workflow infrastructure called CyberShake to generate seismic hazard maps for the entire state of California. In March 2017, SCEC conducted a large-scale CyberShake study on DOE systems, ORNL’s Titan and NCSA’s Blue Waters, managing over 39 thousand jobs and 1.2 PB of data. Pegasus is also being used in astronomy, bioinformatics, civil engineering, climate modeling, earthquake science, molecular dynamics and other complex analyses. In this presentation we will give a brief overview of Pegasus and then focus on how to use it on current OLCF resources by creating a workflow execution environment as a service using Kubernetes and launching jobs on Summit and Rhea at OLCF.
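The statement that workflows capture interdependencies between tasks can be illustrated with a small dependency graph. The sketch below uses only Python's standard `graphlib` module (Python 3.9 or later) and is not the Pegasus API; the task names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "simulate": {"clean"},
    "analyze": {"clean"},
    "report": {"simulate", "analyze"},
}


def execution_order(deps):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Tasks with no dependency between them, such as `simulate` and `analyze` here, may appear in either order and could run in parallel; a workflow management system adds scheduling, data staging, and provenance on top of this ordering.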
Fri, June 26 2020 - 10am PT
Talk 03: BEE: A Scientific Application Workflow Engine
We present the capabilities of BEE, a workflow engine that executes and manages multi-step scientific simulations over a range of platforms, including High Performance Computing and Cloud systems. BEE adopts the open standard Common Workflow Language for workflow descriptions. The workflow engine consists of modular component servers that communicate through a RESTful API, whether running on a single computer or across the internet. A Neo4j graph database is used to manage and store workflows. BEE runs workflow tasks through common HPC resource managers such as Slurm, LSF, or PBS, and manages cloud resources such as private OpenStack clouds. BEE supports running scientific simulations and the associated workflow tasks in containers through container runtime systems such as Charliecloud, Singularity, and Podman.
Fri, July 10 2020 - 10am PT
Talk 04: Presentation of the methodology and results of a survey of existing and possible Simulation Data Management solutions for Sandia National Laboratories
Yann Matthew Le Balc
Yann Matthew Le Balc is in charge of market analysis for NexGen Analytics. A graduate of the French Army’s officers’ school and the French Joint War College, with a specialization in international relations, he completed an Executive MBA with EDHEC Business School. Having led units of various sizes in the French Marines and worked for NATO Transformation Headquarters in Norfolk, Virginia, he ran several multi-million acquisition projects in areas from infrastructure to information systems.
Yale Lee is an engineer with a Master’s Degree in Aeronautical Engineering from ISAE Supaéro (Higher Institute for Aeronautics and Space) and a Master of Research Degree in Automatics, Signal and Image Processing from Ecole Normale Supérieure of Paris-Saclay. She has experience in aeronautics simulation and, through NexGen Analytics, is also involved in SNL’s Automatic Report Generator project.
The session presents the results of a survey conducted for Sandia National Laboratories in January and February 2020 to review options for satisfying SNL’s Simulation Data Management (SDM) needs. The survey, which involved contributions from an SNL working group with LLNL representation, was facilitated by NexGen Analytics.
The presentation will look into specific SDM solutions as well as products embedded in wider offerings, often Product Lifecycle Management (PLM) software. It will also consider whether more general database management solutions can be tailored to SNL’s needs.
Starting with a use case scenario capturing SNL’s main requirements, the presentation will share the survey results by comparing possible SDM options against a list of key criteria, including how they satisfy the required security model, system requirements, cost, and support model.
Fri, July 17 2020 - 10am PT
Talk 05: funcX: a federated function serving fabric for workflows
Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX, a distributed function-as-a-service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. funcX’s endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX’s cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130,000 concurrent workers.
Fri, July 24 2020 - 10am PT
Talk 06: Sina and the State of Simulation Data Management
Our team has built Sina to bring sub-second, many-million-run-scale querying to an environment that has traditionally spawned many user-authored one-off solutions. We’re working to fold database technologies (SQLite, MySQL, Cassandra) into this environment with minimal disruption to users by unifying different backends behind a single, simple Python API. But we’re looking forward to another goal as well: unifying the outputs of many scientific simulations under a single file schema. We’ll be presenting on Sina’s capabilities, some successes and challenges, the delights of performant large-scale querying for large-scale data, and what interested groups can do to help move their own environments towards a more HPC- and user-time-friendly future.
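The approach of unifying different backends behind a single, simple Python API can be sketched as two interchangeable stores that expose the same methods. This is not Sina's actual API: the class names, method names, table layout, and the `load_and_query` helper are hypothetical, and only the standard `sqlite3` module is used.

```python
import sqlite3


class SqliteRunStore:
    """Hypothetical backend that keeps per-run scalar values in SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS scalars (run_id TEXT, name TEXT, value REAL)"
        )
        # An index on (name, value) keeps range queries fast as runs accumulate.
        self.conn.execute(
            "CREATE INDEX IF NOT EXISTS idx_name_value ON scalars (name, value)"
        )

    def insert(self, run_id, scalars):
        rows = [(run_id, name, value) for name, value in scalars.items()]
        self.conn.executemany("INSERT INTO scalars VALUES (?, ?, ?)", rows)
        self.conn.commit()

    def find_runs(self, name, minimum):
        cursor = self.conn.execute(
            "SELECT DISTINCT run_id FROM scalars "
            "WHERE name = ? AND value >= ? ORDER BY run_id",
            (name, minimum),
        )
        return [row[0] for row in cursor]


class DictRunStore:
    """Hypothetical in-memory backend exposing the same two methods."""

    def __init__(self):
        self.runs = {}

    def insert(self, run_id, scalars):
        self.runs.setdefault(run_id, {}).update(scalars)

    def find_runs(self, name, minimum):
        return sorted(
            run_id
            for run_id, scalars in self.runs.items()
            if name in scalars and scalars[name] >= minimum
        )


def load_and_query(store):
    """User code depends only on insert() and find_runs(), not on the backend."""
    store.insert("run_a", {"max_temperature": 310.0, "runtime": 12.5})
    store.insert("run_b", {"max_temperature": 455.0, "runtime": 30.1})
    store.insert("run_c", {"max_temperature": 502.5, "runtime": 28.7})
    return store.find_runs("max_temperature", 400.0)


sqlite_result = load_and_query(SqliteRunStore())
dict_result = load_and_query(DictRunStore())
print(sqlite_result, dict_result)
```

Because the calling code never touches SQL or dictionaries directly, a backend can be swapped without changing user scripts, which is the kind of minimal disruption the abstract describes.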
Fri, July 31 2020 - 10am PT
Talk 07: An Ontology for Scientific Workflows
Jay Jay Billings
Scientific workflows exist in many different domains and for many different computing platforms. As these systems have proliferated, they have also become increasingly complex and harder to maintain. Furthermore, these systems often exist as self-sufficient islands of capability that can be over-specialized and locked into a specific domain. Some commonality exists, and three major workflow types are readily apparent: (i) modeling and simulation, (ii) high-throughput data analysis, and (iii) optimization. A far more detailed understanding of different workflow types is required to determine how large, interdisciplinary workflows that span the types and multiple computing facilities can be created and executed. This work presents a new model of scientific workflows that attempts to create such an understanding with a formal, machine-readable ontology that can be used to answer design questions about interoperability for workflows that need to be executed across distributed workflow management systems. Example instances are presented for simple workflows, and a perspective on interoperability is shared.
Fri, August 7 2020 - 10am PT
Talk 08: Goal Centric Patterns for Knowledge Intensive Processes
Implementing business and technical processes requires coordination between technical implementation teams, business representatives and stakeholders. There have been a number of approaches proposed and described that attempt to simplify the process by either providing technical implementation details or by applying a form of functional decomposition to derive the steps. While useful, these approaches fail to adequately convey the fundamentals of the processes they are describing. This paper describes a novel approach to achieve this goal through three key criteria: initiating event, the goal, and the desired outcome, resulting in a new method of describing processes.
Fri, August 14 2020 - 10am PT
Talk 09: Data Processing Workflows at Scattering User Facilities
At Oak Ridge National Laboratory (ORNL) and Brookhaven National Laboratory (BNL), particle scattering user facilities like the Spallation Neutron Source and Relativistic Heavy Ion Collider collect increasingly large data sets accessed by thousands of users each year. The scientists who use these data have a wide variety of needs for data curation, analysis, and archival to advance their research programs. This presentation describes and discusses the workflow systems that support instrument operations at each scattering facility. It then turns to current work on redesigning the processing workflows for future facilities at both ORNL and BNL.
Fri, August 21 2020 - 10am PT
Talk 10: An Overview of Workflow Management Systems
Workflow Management Systems (WMS) provide a number of benefits to computational researchers, including orchestration of diverse tasks, optimal usage of compute resources, reproducibility of analysis, and more. Picking a WMS can be a daunting task, though, given the considerable number of options, each with a wide variety of features. The changing workflow landscape, with increasing use of containerization and cloud computing technologies, has also had a significant impact. This talk will discuss key aspects to consider when choosing a WMS, with examples from some common scientific and data science WMS options.
Fri, August 28 2020 - 10am PT
Talk 11: Nodeworks: An Open-source Visual Workflow Toolset for Uncertainty Quantification, Optimization, Machine Learning, …
Justin Weber, Aytekin Gel, and Charles Tong
Nodeworks is an open-source toolset written in Python and developed by the U.S. Department of Energy’s National Energy Technology Laboratory (NETL) for enabling users to visually construct scientific workflows. The interface provides a worksheet where collections of nodes with specific tasks can be assembled and connected. User input through widgets and user feedback through visualizations are directly embedded in each node. Nodeworks utilizes existing optimization and uncertainty quantification (UQ) libraries in the Python ecosystem, such as SciPy, SALib, and scikit-learn, and embeds them in specialized nodes so that the user can interact with the libraries through a graphical user interface. Users can generate parameter studies, create simulation files, submit jobs to HPC queues, construct surrogate models, and perform optimization, sensitivity analysis, and non-intrusive UQ analysis. Nodeworks has also been partially integrated with the PSUADE UQ Toolkit from LLNL to provide access to advanced features through the same GUI. These features have enabled geometric optimization of a cyclone using NETL’s multiphase flow computational fluid dynamics code MFIX, as well as deterministic calibration of model parameters. Current and future development will focus on incorporating Bayesian calibration and machine learning nodes to add more capabilities to Nodeworks.
Sandia National Laboratories
Lawrence Livermore National Laboratory
Lawrence Livermore National Laboratory
WoWoHa Organizing Committee
Robert Clay (Co-Chair)
Sandia National Laboratories
Dan Laney (Co-Chair)
Lawrence Livermore National Laboratory
Argonne National Laboratory
Lawrence Berkeley National Laboratory
Los Alamos National Laboratory
French Alternative Energies and Atomic Energy Commission (CEA)
Please use this form to contact us.