This page hosts a brief description and additional material for some workshops I organised myself. Please note that there is usually an official workshop page on the respective conference site. Links are provided.



The power of Lo(o)sing Control - When does a re-implementation of mature simulation fragments with HPC DSLs pay off?

Conference: ISC High Performance
Date: 28 June 2018, 2pm-6pm
Organisers: Tobias Weinzierl, Michael Bader
Venue: Frankfurt Marriott Hotel, room Lux (5th floor)

Exascale roadmaps tend to postulate a need for revolutionary software rewrites. Otherwise, so they claim, software will not be able to scale to the next level. One of these revolutions is to translate applications into tasks with dependencies; the task language here can be read as a DSL. Another one is the replacement of intrinsics, SIMD pragmas, stencil codes, and so forth with purpose-built DSLs. Such rewrites, however, are expensive in terms of development time.
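The tasks-with-dependencies model mentioned above can be illustrated with a minimal sketch: tasks declare their prerequisites, and a scheduler executes whatever is ready. All names and the scheduler structure here are illustrative, not taken from any particular task middleware.

```python
# Minimal sketch of a dependency-driven task runtime. Each task names the
# tasks that must complete before it may run; the scheduler repeatedly picks
# any "ready" task. A real runtime would execute ready tasks in parallel.

def run_task_graph(tasks, deps):
    """tasks: dict name -> callable; deps: dict name -> set of prerequisites."""
    done, order = set(), []
    pending = set(tasks)
    while pending:
        # A task is ready once all its prerequisites have completed.
        ready = [t for t in pending if deps.get(t, set()) <= done]
        if not ready:
            raise ValueError("cyclic dependency in task graph")
        for t in ready:  # a parallel runtime would dispatch these concurrently
            tasks[t]()
            done.add(t)
            order.append(t)
        pending -= set(ready)
    return order

# Hypothetical example: assemble -> (solve_x, solve_y) -> combine
log = []
tasks = {n: (lambda n=n: log.append(n))
         for n in ("assemble", "solve_x", "solve_y", "combine")}
deps = {"solve_x": {"assemble"}, "solve_y": {"assemble"},
        "combine": {"solve_x", "solve_y"}}
order = run_task_graph(tasks, deps)
```

The point of such a formulation is that the runtime, not the application, decides execution order and placement within the constraints of the dependency graph.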

We invite speakers developing exascale codes to summarise their experiences and expectations w.r.t. novel programming techniques. The project-driven talks are complemented by provocations from task middleware and DSL developers. Yet, we do not aim for a comprehensive overview of techniques. Instead, the workshop aims to sketch answers to the following questions:

  • Where is the pain barrier at which consortia are willing to rewrite major parts of their codes? Are these techniques well-suited for libraries, core components or the whole application?
  • To which degree are mature simulation codes willing to give up control of SIMDization, task scheduling, data placement, and so forth?
  • What metrics or characteristics should new paradigms expect their user codes to hand over to the DSL/runtime (arithmetic intensity, affinity, dependencies)?


We have a report/summary of the whole workshop available from this page.


Each speaker is given 20 minutes for a brief provocation. The individual experience reports, project presentations and opinions will then be the subject of a discussion among all speakers as well as the audience.

  • 14:00-14:10. Weinzierl: Opening remarks
  • 14:10-14:30. Fahringer: Provocation
  • 14:35-14:55. Bower: Provocation
  • 15:00-15:20. Ellingwood: Provocation
  • 15:25-15:45. Kronbichler: Provocation
  • 15:50-16:00. Questions from the audience to all four speakers (plenary discussion)
  • 16:30-16:50. Köstler: Provocation
  • 16:55-17:15. Bader: Provocation
  • 17:20-18:00. Weinzierl: Plenary discussion


Michael Bader (Technische Universität München): Predictive Load Balancing vs. Reactive Work Stealing - Parallel Adaptive Mesh Refinement and the Chameleon Project

One of the predictions on exascale hardware claims that equal work load will no longer lead to equal execution time. Hence, applications will need to react adaptively to load imbalances at runtime despite best efforts in predictive load distribution. The Chameleon project aims at providing a respective reactive programming environment through small augmentations to the standard MPI and OpenMP programming models. For applications, a crucial question will be how much of this reactive behaviour has to be triggered by the application itself, and to what extent it can be supported by the programming environment - constrained, of course, by how much extra programming effort and how many intrusive changes to the existing application are acceptable. In my talk I will present the interplay of predictive load balancing and reactive work stealing in the parallel AMR code sam(oa)², and how these approaches shall be supported by the intended Chameleon environment.
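The interplay the abstract describes can be sketched in a few lines: a predictive phase partitions work items across ranks using cost estimates, and a reactive phase lets underloaded ranks steal items once the actual costs turn out to differ. This is purely illustrative pseudocode-in-Python; it is not the Chameleon or sam(oa)² API.

```python
# Illustrative sketch: predictive partitioning followed by reactive stealing.
# "Ranks" are just lists of work-item indices here.

def predictive_partition(estimates, nranks):
    """Greedy partition: assign each item to the currently lightest rank."""
    parts = [[] for _ in range(nranks)]
    loads = [0.0] * nranks
    for item, cost in sorted(enumerate(estimates), key=lambda ic: -ic[1]):
        r = loads.index(min(loads))
        parts[r].append(item)
        loads[r] += cost
    return parts

def reactive_steal(parts, actual, max_steals=100):
    """Move cheap items from the most to the least loaded rank while it helps."""
    for _ in range(max_steals):
        loads = [sum(actual[i] for i in p) for p in parts]
        hi, lo = loads.index(max(loads)), loads.index(min(loads))
        if not parts[hi]:
            break
        item = min(parts[hi], key=lambda i: actual[i])  # cheapest item to move
        if loads[lo] + actual[item] >= loads[hi]:       # moving would not help
            break
        parts[hi].remove(item)
        parts[lo].append(item)                          # the "steal"
    return parts

# Estimates predict uniform cost, but one item turns out four times as heavy.
estimates = [1.0] * 8
actual = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
parts = reactive_steal(predictive_partition(estimates, 2), actual)
```

In a distributed-memory setting the stealing step is of course the hard part; the sketch only shows why a reactive correction on top of predictive balancing pays off.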

Richard Bower (Durham University): SWIFT - the benefits of re-implementing Gadget, the astrophysics benchmark

Nathan D. Ellingwood (Sandia): Lessons Learned: Experiences from Introducing the Kokkos Programming Model into Legacy Applications

For quite some time the computer science community has been working on new high-level parallel programming models, one of the main sales pitches being to make 'Next Generation Platforms' (NGP) accessible to a wide range of applications and their developers. With machines such as Sierra and Summit slated to come online this summer, prototypes of upcoming exascale platforms are now here, and views on adopting a parallel programming model have changed from 'Why would I want to deal with this?' to 'I need a viable solution now!' in order to enable legacy codes and large applications to utilize the new platforms. The Kokkos team offers one such solution: a parallel programming ecosystem in C++ which supports the major publicly available HPC platforms and promises to isolate application teams from future architecture changes.

In this talk, a short overview of the Kokkos EcoSystem (consisting of KokkosCore, KokkosKernels, KokkosTools and KokkosSupport) will be given, along with how it addresses the requirements of legacy applications transitioning to exascale architectures. The presentation will also discuss lessons learned from the ongoing porting efforts of the numerous applications adopting Kokkos, with a focus on issues that were not encountered in proxy app experiments.

Thomas Fahringer (University of Innsbruck): t.b.c.

Contemporary parallel programming approaches often rely on well-established parallel libraries and/or language extensions addressing specific hardware resources, which can lead to mixed parallel programming paradigms. In contrast, AllScale proposes a C++ template based approach to ease the development of scalable and efficient general-purpose parallel applications. Applications utilize a pool of parallel primitives and data structures for building solutions to their domain-specific problems. These parallel primitives are designed by HPC experts who provision high-level, generic operators and data structures for common use cases. The supported set of constructs may range from ordinary parallel loops, through stencil and distributed graph operations as well as frequently utilized data structures including (adaptive) multidimensional grids, trees, and irregular meshes, to combinations of data structures and operations such as entire linear algebra libraries. This set of parallel primitives is implemented in pure C++ and may be freely extended by third-party developers, similar to conventional libraries in C++ development projects. One of the peculiarities of AllScale is its main source of parallelism, which is based on nested recursive task parallelism. Sophisticated compiler analysis determines the data needed for every task, which is of paramount importance to achieve performance across a variety of parallel architectures.
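The nested recursive task parallelism mentioned in the abstract can be pictured as a divide-and-conquer recursion in which each split spawns a task. The sketch below is a minimal illustration in Python (threads near the top of the recursion stand in for runtime-scheduled tasks); it is in no way the AllScale API.

```python
# Sketch of nested recursive task parallelism: the problem is split
# recursively, and each half could be scheduled as an independent task.
# Here, Python threads emulate task spawning for the top recursion levels.
import threading

def recursive_reduce(data, lo, hi, out, slot, depth=0, spawn_depth=2):
    """Sum data[lo:hi] by recursive bisection; result lands in out[slot]."""
    if hi - lo <= 1:
        out[slot] = data[lo] if hi > lo else 0
        return
    mid = (lo + hi) // 2
    left, right = [0], [0]
    if depth < spawn_depth:
        # Spawn the left half as a "task"; a task runtime makes this cheap
        # and would also infer the accessed data range per task.
        t = threading.Thread(
            target=recursive_reduce,
            args=(data, lo, mid, left, 0, depth + 1, spawn_depth))
        t.start()
        recursive_reduce(data, mid, hi, right, 0, depth + 1, spawn_depth)
        t.join()
    else:
        recursive_reduce(data, lo, mid, left, 0, depth + 1, spawn_depth)
        recursive_reduce(data, mid, hi, right, 0, depth + 1, spawn_depth)
    out[slot] = left[0] + right[0]

data = list(range(1000))
result = [0]
recursive_reduce(data, 0, len(data), result, 0)
```

The recursive formulation is what gives the runtime freedom: it can cut off parallelism at any depth and reason about per-task data footprints, which is exactly the compiler analysis the abstract highlights.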

Harald Köstler (Universität Erlangen-Nürnberg): Code generation approaches for HPC

In recent years, various approaches for increasing the productivity and portability of HPC codes have been considered. Usually either external or embedded domain-specific languages are developed, where the performance of the resulting implementations is of course very important for many applications from computational science and engineering. In my talk I will give a short overview of some prominent existing code generation frameworks and discuss their advantages and disadvantages. Then, I will present in more detail our code generation framework ExaStencils, written in Scala. It allows whole-program generation with several back-ends such as C++ and CUDA for a restricted class of applications that can be described by partial differential equations on structured grids. Furthermore, applications from fluid dynamics and geosciences will be shown, including their performance on CPU and GPU clusters.
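The code generation idea can be illustrated in miniature: from an abstract stencil specification (offsets and coefficients), a generator emits a C-style kernel as a string. ExaStencils itself is far richer (whole-program generation, multiple back-ends); every name below is made up for illustration.

```python
# Tiny sketch of stencil code generation: a 1D stencil is described as
# (offset, coefficient) pairs, and a C-like kernel is emitted from it.

def generate_stencil_kernel(name, stencil, n="n"):
    """stencil: list of (offset, coefficient) pairs for a 1D stencil."""
    terms = " + ".join(
        f"{c} * u[i]" if o == 0 else f"{c} * u[i{o:+d}]"
        for o, c in stencil)
    return (f"void {name}(double *u, double *v, int {n}) {{\n"
            f"  for (int i = 1; i < {n} - 1; ++i)\n"
            f"    v[i] = {terms};\n"
            f"}}\n")

# Example: second-order finite differences for u'', i.e. u[i-1]-2u[i]+u[i+1].
kernel = generate_stencil_kernel("laplace_1d",
                                 [(-1, 1.0), (0, -2.0), (1, 1.0)])
print(kernel)
```

Because the generator owns the loop structure, it can later emit blocked, vectorized or CUDA variants of the same stencil without the application changing its specification, which is the productivity argument behind such frameworks.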

Martin Kronbichler (Technische Universität München): High-performance finite element computations with the deal.II finite element library

I will present HPC challenges from the perspective of high-order finite element computations within the generic deal.II finite element library. Besides a wide range of finite element functionality, the deal.II library comes with a flexible matrix-free operator evaluation infrastructure based on fast quadrature with sum factorization techniques, whose design has been guided by high-performance computing principles. The two central ingredients are efficient arithmetics via explicit SIMD instructions as well as memory-lean data structures. Our work has shown that these concepts clearly outperform sparse matrix kernels for applications using quadratic or higher-degree shape functions, making performance close to the underlying hardware limits possible for both continuous and discontinuous elements. Despite its efficiency, the deal.II implementation remains accessible to the programmer, with the full flexibility of the C++ programming language.
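The sum factorization technique behind such matrix-free kernels can be shown in a few lines: evaluating a tensor-product basis at quadrature points naively costs O((p+1)^4) operations per 2D cell, while factorizing into two 1D contractions costs only O((p+1)^3). The sketch below uses plain Python lists and is only a mathematical illustration; deal.II's actual kernels vectorize these loops across cells with SIMD.

```python
# Sum factorization sketch for 2D tensor-product basis interpolation.
# A[q][i] holds 1D shape function i evaluated at 1D quadrature point q;
# c[i][j] holds the coefficients on an (n x n) tensor-product grid.

def interpolate_naive(A, c):
    """Direct 2D evaluation: u[qx][qy] = sum_ij A[qx][i]*A[qy][j]*c[i][j]."""
    n, q = len(c), len(A)
    return [[sum(A[qx][i] * A[qy][j] * c[i][j]
                 for i in range(n) for j in range(n))
             for qy in range(q)] for qx in range(q)]

def interpolate_sumfac(A, c):
    """Same result via two 1D contractions (sum factorization)."""
    n, q = len(c), len(A)
    # First sweep: contract the x index only.
    tmp = [[sum(A[qx][i] * c[i][j] for i in range(n)) for j in range(n)]
           for qx in range(q)]
    # Second sweep: contract the y index only.
    return [[sum(A[qy][j] * tmp[qx][j] for j in range(n)) for qy in range(q)]
            for qx in range(q)]

A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 quadrature points, degree 1
c = [[1.0, 0.0], [0.0, 1.0]]
```

In d dimensions the same trick turns one d-dimensional contraction into d cheap 1D sweeps, which is what makes high polynomial degrees affordable.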

There are several challenges that we constantly need to address from a library perspective, such as the need to cover a wide range of differential operators, polynomial degrees, and mesh configurations, which prevents us from adopting problem-specific solutions. Furthermore, the high-performance operator evaluation must be integrated into a solver stack including explicit time integrators or multigrid linear and nonlinear solvers. This integration is particularly intricate because, in the typical memory-bandwidth-constrained setting, performance can often only be gained by merging operations across the boundaries of mathematical operations, such as merging vector updates and inner products with the operator evaluation. This goes somewhat against what many scientific software writers consider best practice today, namely the concept of splitting algorithms into different components or even different libraries to reduce the amount of code that must be maintained.
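The merging argument is easy to make concrete: a solver step that applies an operator, updates a vector and takes an inner product reads the vectors from memory three times when written as separate sweeps, but only once when fused. The toy operator below is a hypothetical three-point stencil, not a deal.II kernel; the numbers are identical either way, only the memory traffic differs.

```python
# Loop fusion sketch: three memory sweeps vs. one for the composite step
#   y = A x;  r = r - alpha*y;  return dot(r, r)
# with A a simple 1D three-point stencil (zero at the boundary).

def unfused_step(x, r, alpha):
    y = [0.0] * len(x)
    for i in range(1, len(x) - 1):           # sweep 1: operator application
        y[i] = 2.0 * x[i] - x[i - 1] - x[i + 1]
    for i in range(len(x)):                  # sweep 2: vector update
        r[i] -= alpha * y[i]
    return sum(v * v for v in r)             # sweep 3: inner product

def fused_step(x, r, alpha):
    dot = 0.0
    for i in range(len(x)):                  # one sweep does all three steps
        yi = 2.0 * x[i] - x[i - 1] - x[i + 1] if 0 < i < len(x) - 1 else 0.0
        ri = r[i] - alpha * yi
        r[i] = ri
        dot += ri * ri
    return dot

x = [float(i * i) for i in range(10)]
r1, r2 = [1.0] * 10, [1.0] * 10
d1, d2 = unfused_step(x, r1, 0.5), fused_step(x, r2, 0.5)
```

The fused variant cuts across the operator/vector/reduction interfaces, which is precisely the tension with modular library design that the abstract points out.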

When Gravitational Waves hit Durham - an ExaHyPE workshop

Date: 30 November 2017
Time: 14:00-16:30
Organiser: Tobias Weinzierl
Room: E360 (Engineering building just up the stairs)

From the ExaHyPE website.

The ExaHyPE (An Exascale Hyperbolic PDE Engine) project focuses on the development of new mathematical and algorithmic approaches for solvers for hyperbolic equation systems on exascale systems. It initially focuses on simulations in geophysics and astrophysics. For the latter, ExaHyPE will simulate systems of orbiting neutron stars that are merging. Not only are such systems suspected of being the greatest source of gravitational waves in the universe, but they could also be the cause of gamma-ray bursts - the most powerful known catastrophic events in the universe. Indeed, very recent observations have validated this prediction. Exascale simulations of such processes will allow us to study these long-standing mysteries of astrophysics and gain new insights.

The present workshop brings together talks from various ExaHyPE researchers and thus is both a tour through the project and a presentation of some recent astrophysical research highlights in the orbit of ExaHyPE. While it hosts some classic presentations, we encourage all participants to bring along their computers - a majority of the ExaHyPE developers will be around, which provides the opportunity to get some hands-on experience with the open source code, either after the talks or during the subsequent pub visit.

The workshop is open to the public, but we ask participants to drop Sarah Craggs an email if they plan to attend. This allows us to avoid any shortage of coffee.


ISC HPC 2016: Form Follows Function - Do algorithms and applications challenge or drag behind the hardware evolution?

Conference: ISC High Performance
Date: 23 June 2016
Organisers: Tobias Weinzierl, Michael Bader
Venue: Frankfurt Marriott Hotel, room Volt (5th floor)

Exascale roadmaps are dominated by predictions on hardware trends. At the same time, hardware-software co-design is a frequently cited phrase. It suggests that software development has an impact on the hardware evolution. Is this assumption valid? If so, what does this interplay of predicted hardware improvements and software development look like? We invite speakers from consortia developing exascale codes, i.e. designing software for hardware that does not yet exist. The project-driven talks are complemented by vendor-driven provocations. The aim of the workshop is to sketch answers to a couple of questions from an algorithm/application point of view:

  1. How do predictions on new hardware features impact the projects' research agenda? Notably, how do the hardware roadmaps shape algorithm development today?
  2. What characteristics make some applications exascale candidates? Do these characteristics stem from particular algorithmic ideas, and are there constraints on the type of algorithms and applications that we will see soon on the exascale level?
  3. Do statements on hardware-aware algorithm development and hardware-software co-design affect particular machine aspects, or is the latter term, in particular, a buzzword?
  4. To which degree can simulation codes have an impact on what machines are designed?



Tobias Weinzierl (Organiser)

Tobias Weinzierl is Lecturer at the School of Engineering and Computing Sciences at Durham University. He holds a PhD from Technische Universität München (TUM) and is a member of two research groups: Innovative Computing in computer science and Computational Mechanics in engineering. His work mainly focuses on the algorithm, data structure and software paradigm side of HPC. Tobias is particularly interested in dynamically adaptive multiscale methods based upon spacetrees that interact with multigrid solvers for partial differential equations, particle systems, on-the-fly visualisation and dynamic load balancing.
Michael Bader (Organiser)

Michael Bader is associate professor at Technische Universität München (TUM) and leads the research group Hardware-aware algorithms and software for HPC, which was jointly set up with the Leibniz Supercomputing Centre in 2011 as one of the measures accompanying the installation of the SuperMUC petascale machine. From 2009 to 2011, he was assistant professor at the SimTech Cluster of Excellence at University of Stuttgart. Michael investigates algorithms for parallel adaptive mesh refinement (in the context of tsunami simulation and porous media flow) and parallelization and load balancing of large-scale earthquake simulation on unstructured adaptive meshes. A special focus is on performance engineering of simulation software for heterogeneous supercomputers.

Michael and Tobias cooperate on performance engineering, adaptive mesh refinement (AMR), software infrastructure and light-weight MPI and task-based load balancing in the ExaHyPE project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 671698.



Jack Dongarra, Iain Duff, Laura Grigori and Bo Kagström: The NLAFET Project: Enabling Performance and Scalability for Linear Algebra Software on HPC Systems

Jack Dongarra holds an appointment at the University of Tennessee, Oak Ridge National Laboratory, and the University of Manchester. He specializes in numerical algorithms in linear algebra, parallel computing, use of advanced-computer architectures, programming methodology, and tools for parallel computers. He was awarded the IEEE Sid Fernbach Award in 2004; in 2008 he was the recipient of the first IEEE Medal of Excellence in Scalable Computing; in 2010 he was the first recipient of the SIAM Special Interest Group on Supercomputing's award for Career Achievement; in 2011 he was the recipient of the IEEE IPDPS Charles Babbage Award; and in 2013 he received the ACM/IEEE Ken Kennedy Award. He is a Fellow of the AAAS, ACM, IEEE, and SIAM and a member of the National Academy of Engineering.
Erwin Laure and Philipp Schlatter: Co-design in CFD - Initial Experiences from the ExaFLOW (Enabling Exascale Fluid Dynamics Simulations) Project

Erwin Laure is Director of PDC - Center for High Performance Computing at KTH, Stockholm. He is the Coordinator of the EC-funded ExaFLOW and EPiGRAM projects and actively involved in major e-infrastructure projects (EGI, PRACE, EUDAT) as well as exascale computing projects. He is also the coordinator of the HPC Center of Excellence for Computational Biomolecular Research (BioExcel). His research interests include programming environments, languages, compilers and runtime systems for parallel and distributed computing, with a focus on exascale computing.
Raphaël Léger: A feedback on approaching the DEEP-ER platform with a DGTD-based simulation software for Bioelectromagnetics applications

Raphaël Léger received his PhD degree in Applied Mathematics from the University Paris-Est Marne-la-Vallée in 2011. His doctoral research, conducted within the Computational Fluid Dynamics and Aeroacoustics group of Onera (the French aerospace lab), was concerned with the development of high-order numerical methodologies for complex acoustic wave propagation problems. In May 2012, he joined the Nachos project-team of Inria Sophia Antipolis - Méditerranée as a postdoctoral research fellow. He has since been involved in the development of flexible numerical methodologies based on Discontinuous Galerkin Time-Domain methods for electromagnetic wave propagation problems (mainly targeting nanophotonics and bioelectromagnetics applications) as well as their implementation in a massively parallel context. Since December 2013, he has been involved in the co-design process of the DEEP-ER European project on exascale computing, to which he contributes as an application developer.
Peter Messmer and Daniel Thiemert: The ESCAPE project - an NVIDIA point of view

Peter Messmer is the Director of the NVIDIA Co-Design Lab for Hybrid Multicore Computing at ETH Zurich and a Senior Software Engineer in the Compute Developer Technology Group, NVIDIA's consulting group for scientific computing projects on Graphics Processing Units (GPUs). Since joining NVIDIA in November 2011, Peter Messmer has been working with clients on accelerating a range of projects with GPUs, including ocean simulation, weather forecast codes, and seismic simulation codes both with CUDA and OpenACC.
Marie-Christine Sawley: Feedback from working on proto-applications to prepare for extreme scaling (Intel IPAG-EU, EXA2CT project)

Marie-Christine Sawley holds a degree in Physics and a PhD in Plasma Physics from EPFL (1985). After a postdoc at the University of Sydney, she joined EPFL in 1988 to lead the support group for HPC applications. She led a number of HPC initiatives for introducing new technology at EPFL, such as the PATP with the Cray T3D, the SwissTX prototype and the establishment of the Vital-IT partnership between HP, EPFL and the SIB. She joined ETH Zurich in 2003 to become the general manager of the Swiss National Supercomputing Centre, significantly expanding its capacities to serve a large scientific and technology portfolio. From 2008 until 2010 she worked as a senior scientist at the Institute for Particle Physics of ETH on the massive computing requirements of the LCG grid for the CMS detector. She joined Intel at the end of 2010 to become the Intel manager of the ECR lab in Paris. The ECR lab activities concentrate on software for exascale, tools, runtime models and applications for XEON and MIC. Marie-Christine drove the Intel participation in the EXA2CT project.
Michèle Weiland and Mark Parsons: Addressing the I/O bottleneck of HPC workloads (NEXTGenIO)

Mark Parsons is the Executive Director of EPCC and holds a Personal Chair in High Performance Computing. He is also the Associate Dean for e-Research at the University of Edinburgh. Mark is the Project Coordinator of the NEXTGenIO FET-HPC project, which looks at addressing the I/O bottleneck through the use of novel memory technologies. His research interests include Exascale software, data intensive computing and novel hardware design. He also has many interests in distributed computing ranging from its industrial use in projects such as FORTISSIMO to the provision of pan-European HPC services through the PRACE Research Infrastructure.

Michèle Weiland is a Project Manager at EPCC. She is the Project Coordinator for the EC Framework 7 funded project Adept (Addressing Energy in Parallel Technologies), which focuses on modelling the power usage and the performance of parallel software on a wide range of hardware architectures. More recently she has taken on the role of Project Manager of the FET-HPC NEXTGenIO project. Prior to this she was involved in the Exascale focused CRESTA project. She is the PI for a collaborative project with the UK Met Office, developing the next generation cloud model, and the Co-I for a UK-funded Software for the Future project that will develop a new simulation and optimisation platform for marine technology. Michèle's research interests are in the fields of energy efficiency, software performance analysis and novel programming models.
Xavier Vigouroux and Daniel Thiemert: The ESCAPE project - an Atos-Bull point of view

After a PhD in distributed computing, Xavier Vigouroux worked for several major companies in different positions, from investigator at Sun Labs to support engineer for HP. He has been working for Bull, now integrated within Atos, for several years. He led the HPC benchmarking team for the first five years and is now in charge of the Center for Excellence in Parallel Programming at Atos.


Jack's slides: Form Follows Function
Mark's slides: Form Follows Function
Peter's slides: ESCAPE
Philipp's slides: Co-design in CFD - Initial Experiences from the ExaFLOW Project
Raphaël's slides: A feedback on approaching the DEEP-ER platform with a DGTD-based simulation software for Bioelectromagnetics applications
Xavier's slides: The ESCAPE project - an Atos-Bull point of view

SIAM CSE 2015: Flooding the Cores - Computing Flooding Events with Many-Core and Accelerator Technologies

Conference: SIAM Conference on Computational Science and Engineering 2015
Date: 18 March 2015
Organisers: Michael Bader, Kyle T. Mandli, Tobias Weinzierl

Detailed simulation of flooding events, such as in storm surges, tsunamis or rain floods, requires substantial computing power and literally floods the computer with job tasks. Relevant supercomputing technology, on the other hand, is characterized by an increasing amount of parallelism on all scales. Hence, numerical methods, algorithms and software need to be tuned for data parallelism, many-core compute nodes (including accelerator technology) and large-scale parallelism. This minisymposium thus focuses on approaches to reduce time to solution for simulating complex flooding events on supercomputers and architectures characterized by many-core and accelerator technologies, in particular GPUs and Xeon Phis.


Oliver Meister (TUM): Parallelization Techniques for Tsunami Simulation Using Space-Filling-Curve Orders
Robert Weiss (Virginia Tech): Understanding Tsunami and Hurricane Deposits with a Meso-Scale Model for Sediment Dynamics
Tobias Weinzierl (Durham University): A Patchwork Family - Task Distribution Patterns for Shallow Water Equations on Patch-structured AMR Grids
Matthew G. Knepley (University of Chicago): FEM Integration with Quadrature on the GPU
Andre R. Brodtkorb (SINTEF): Flooding with Equelle: A Domain Specific Language for Finite Volume Methods
Rajesh Gandham (Rice): High Performance High Order Numerical Method for Tsunami Wave Propagation
Ralf-Peter Mundani (TUM): Thermal Comfort Simulations on Massive Parallel Systems