Why HSP2? - respec/HSPsquared GitHub Wiki

Challenges with Existing HSPF

When the developers created HSPF, they implemented a top-down approach following what was then the standard for structured design. The design was intended to make the system relatively easy to extend, so that users could add their own modules with relatively little disruption to the existing code. At that time, it was "hoped that HSPF will become a valuable tool for water resource planners." Today, over thirty years later, the model remains largely unchanged and is one of the premier watershed-scale environmental models in use. This is truly remarkable and a testament to the vision and expertise of its developers. However, the changes in technology over the last thirty years, together with advances in our scientific understanding of environmental processes, make upgrades to HSPF necessary if it is to evolve and remain the premier tool for water resource planners. The challenges that have emerged are described below. In general, they involve maintaining the legacy code, upgrading its functionality, and integrating it with modern technology.

The Complexity and Rigidity of Fortran-77 Memory Management and the “Punch Card” Style User Control Input (UCI) File Have Hindered Maintenance and Upgrading of Functionality.

The engineering code is intertwined with, and complicated by, memory management code that makes adding new modules overly burdensome.
Several aspects of memory management are largely undocumented and require knowledge of the SEQ file, which only a handful of individuals have.
The model size (i.e., number of operations) is fixed within a given model release.
The land use, translation factors, and many of the parameters are fixed. In some cases, the complicated Special Actions Module can be used to mimic real-world conditions that vary over time, but its complexity substantially increases the time needed to develop the input file and significantly slows model execution.
The ASCII text “Punch Card” style UCI is difficult for modern software to interact with, which makes it difficult to optimize calibration and to perform sensitivity and uncertainty analyses.

The Legacy Code and Data Model Have Limited Ability to Integrate with Modern Software and Hardware or Leverage Parallel Computing, Which Has Left Voids in Optimization, Pre-processing, and Post-processing Tools.

The 32-bit architecture of the Watershed Data Management (WDM) format limits the size of a WDM file.
WDM is not supported by any Commercial Off-The-Shelf (COTS) software, so pre- and post-processing of time series data can be cumbersome. The process currently relies on in-house solutions to augment the small number of compatible public-domain software products, which are becoming incompatible with today's operating systems.
The code is not MPI-enabled, so it cannot run in parallel on a cluster.

Project Goal and Objectives

The overall goal of this project is to mitigate the challenges described above so that HSPF will remain a premier watershed model into the foreseeable future. The major objectives for accomplishing this goal are listed below.

Retain all current functionality, from the user's point of view, and provide a migration path for legacy applications. Internal functionality (that which is invisible to the user) may be modified or deleted as necessary.
Provide documentation within new code to transparently show the translation path.
Elevate the engineering code so that the engineering and science are clear rather than lost within the memory management aspects of the code.
Restructure for maintainability, to remove fixed limits (e.g., operations, land use, parameters), and to maintain or improve execution time.
The code should be independent of the operating system and hardware.
The new code should be compatible with multiple cores and GPUs for acceleration, but should not require specific hardware.
Release the code as open source, freely distributed over the web.

The Solution - HSP2

The solution derived for this project consists of two primary tasks: (1) convert the code to a modern, widely accepted, open-source, high-performance computing (HPC) language; and (2) convert the model input and output files to a modern, widely accepted, open-source data model, library, and binary file format.

For the code conversion, Python was chosen as the new language. Python is an interpreted, object-oriented, high-level programming language with dynamic semantics that has become one of the most popular open-source languages. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for rapid application development, as well as for use as a scripting or glue language that connects existing components. Because Python is interpreted and dynamically typed, Python code can execute slowly compared with code written in compiled, statically typed languages such as C and Fortran. However, projects such as Numba now allow us to overcome this challenge. Numba is a just-in-time specializing compiler that compiles Python and NumPy functions, marked with decorators, to native machine code. It uses the LLVM compiler infrastructure to optimize and generate that machine code in much the same way as compilers for the fastest compiled languages, such as C and Fortran. Although Numba relies on very powerful libraries beneath Python for performance, the code you write is always pure Python, which makes it easier to author, debug, verify, and maintain. Some strengths of Python include the following.

Open source and cross-platform, running on a wide range of platforms, including supercomputers and other HPC environments.
Strong position in scientific computing, with a large community of users; help and documentation are easy to find.
Extensive scientific libraries and analysis packages.
Great performance due to close integration with time-tested and highly optimized codes written in C and Fortran.
Good support for parallel processing with processes and threads, MPI, and GPU computing.
Support for interactive work, including execution, visualization and debugging.
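As a rough illustration of the Numba approach described above, the sketch below compiles an explicit time-stepping loop of the kind found throughout HSPF-style process code. Loops like this are slow in plain Python. The function `linear_reservoir` and its parameters are hypothetical and are not taken from HSP2's source. The sketch assumes NumPy is installed. If Numba is missing, it falls back to an ordinary, uncompiled Python function, so it still runs, only more slowly.

```python
import numpy as np

try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Fallback so the sketch still runs without Numba (no compilation).
    def njit(func):
        return func


@njit
def linear_reservoir(inflow, k, s0):
    """Hypothetical example: route inflow through a linear reservoir.

    At each step the inflow is added to storage, then a fraction k of the
    storage leaves as outflow. Plain Python loop syntax; Numba compiles it
    to machine code on the first call for these argument types.
    """
    n = inflow.shape[0]
    outflow = np.empty(n)
    storage = s0
    for t in range(n):
        storage += inflow[t]
        q = k * storage
        storage -= q
        outflow[t] = q
    return outflow


# Usage: a single pulse of 10 units, half of storage released each step.
inflow = np.array([10.0, 0.0, 0.0])
outflow = linear_reservoir(inflow, 0.5, 0.0)  # first call triggers compilation
# outflow holds 5.0, 2.5, 1.25; the remaining 1.25 units stay in storage
```

The function body is ordinary Python and NumPy, so it can be read, tested, and debugged without Numba. The decorator is the only change needed to get compiled-loop performance.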

For conversion of the legacy model input and output files, HDF5 was chosen to store the model input and output. The HDF5 technology suite is designed to organize, store, discover, access, analyze, share, and preserve diverse, complex data in continuously evolving heterogeneous computing and storage environments.

HDF5 supports all types of digitally stored data, regardless of origin or size, including the GIS data, time series, and UCI files common to HSPF applications.
HDF5 includes tools and applications for managing, manipulating, viewing, and analyzing the data in the collection.
The HDF5 data model, file format, API, library, and tools are open and distributed without charge.
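As a minimal sketch of how HDF5's hierarchical groups, datasets, and attributes can hold HSPF-style inputs and outputs, the example below uses the h5py library. The choice of h5py is an assumption: HSP2 itself may access HDF5 through a different library, such as pandas/PyTables. The group and dataset paths (`TIMESERIES/TS001`, `PERLND/P001`) and all values are hypothetical and do not reflect HSP2's actual file layout. `LZSN` and `INFILT` are HSPF PERLND parameter names, used here only for illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical daily precipitation series (inches); illustrative values only.
precip = np.array([0.0, 0.12, 0.5, 0.0, 0.03])

path = os.path.join(tempfile.mkdtemp(), "example_model.h5")

# Write: groups act like folders, datasets hold arrays, attributes hold metadata.
with h5py.File(path, "w") as f:
    # Intermediate groups (e.g. "TIMESERIES") are created automatically.
    ts = f.create_dataset("TIMESERIES/TS001", data=precip)
    ts.attrs["units"] = "in"
    ts.attrs["start"] = "2000-01-01"
    ts.attrs["freq"] = "D"

    # Hypothetical parameters for one pervious land segment, stored as attributes.
    seg = f.create_group("PERLND/P001")
    seg.attrs["LZSN"] = 6.0
    seg.attrs["INFILT"] = 0.16

# Read back with path-style access; any HDF5-aware tool could do the same.
with h5py.File(path, "r") as f:
    data = f["TIMESERIES/TS001"][()]
    lzsn = f["PERLND/P001"].attrs["LZSN"]
```

Because the file is self-describing, the same contents can be browsed with general-purpose HDF5 tools, such as HDFView or `h5dump`, without custom WDM-style readers.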