Additional I/O Routines for Error Catching Implementation

Related to PR #673: Add capability to catch bort errors

This document provides a comprehensive list of additional I/O routines in NCEPLIBS-bufr that should have error catching capabilities added, following the pattern established in PR #673. The routines are organized by complexity level to facilitate phased implementation.


Level 1: Simple I/O Query Routines (Low Complexity)

These routines primarily query state or perform simple operations with minimal error paths:

  1. ufbcnt (src/openclosebf.F90)

    • Query message/subset counts for a logical unit
    • Returns current message number and subset number
  2. ufbqcd (src/cftbvs.F90)

    • Query the event program code for a category 63 (3-63-YYY) Table D mnemonic in an NCEP prepbufr file
    • Returns the event program code (the YYY value of the descriptor)
  3. ufbqcp (src/cftbvs.F90)

    • Query the mnemonic for an event program code in an NCEP prepbufr file
    • Returns the mnemonic of the corresponding category 63 Table D descriptor
  4. status (src/openclosebf.F90)

    • Query file connection status
    • Returns logical unit status information
  5. ufbpos (src/readwritesb.F90)

    • Position to specific message/subset within file
    • Allows random access to specific locations in BUFR file

Level 2: Message-Level I/O Routines (Low-Medium Complexity)

These routines handle complete BUFR messages but have straightforward error handling:

  1. readerme (src/readwritemg.F90)

    • Read BUFR message from memory array
    • Processes message from byte array instead of file
  2. rdmsgw (src/readwritemg.F90)

    • Read BUFR message wrapper function
    • Low-level message reading utility
  3. openmg (src/readwritemg.F90)

    • Open BUFR message for output without subset initialization
    • Alternative to openmb for specific use cases
  4. closmg (src/readwritemg.F90)

    • Close current BUFR message for output
    • Finalizes message and prepares for next
  5. msgwrt (src/readwritemg.F90)

    • Write BUFR message to file
    • Low-level message writing utility
  6. copymg (src/copydata.F90)

    • Copy complete BUFR message from one file to another
    • Transfers entire message including all subsets

Level 3: Subset-Level I/O Routines (Medium Complexity)

These routines handle individual data subsets within messages:

  1. writsb (src/readwritesb.F90)

    • Write data subset to output message
    • Complements readsb (already protected)
  2. copysb (src/copydata.F90)

    • Copy data subset from input to output file
    • Transfers single subset between files
  3. ufbmms (src/memmsgs.F90)

    • Access specific message/subset from memory
    • Direct access to memory-resident data
  4. ufbmns (src/memmsgs.F90)

    • Read next subset from memory
    • Sequential access through memory-resident messages
  5. readlc (src/readwriteval.F90)

    • Read long character strings from data subset
    • Handles strings longer than standard BUFR descriptors
  6. writlc (src/readwriteval.F90)

    • Write long character strings to data subset
    • Stores extended character data in subsets

Level 4: Value-Reading/Writing Routines (Medium-High Complexity)

These are the core UFB* routines that read and write data values. They are similar to ufbint, which already has error catching:

  1. ufbrep (src/readwriteval.F90)

    • Read/write replicated sequences
    • Handles repeated data structures within subset
  2. ufbstp (src/readwriteval.F90)

    • Read/write stacked replicated sequences
    • Processes nested/stacked replication patterns
  3. ufbseq (src/readwriteval.F90)

    • Read/write entire sequences
    • Operates on complete sequence descriptors
  4. ufbovr (src/readwriteval.F90)

    • Overwrite existing values in subset
    • Updates values without full subset reconstruction
  5. ufbevn (src/readwriteval.F90)

    • Read/write event stacks
    • Handles specialized event sequence structures
  6. ufbinx (src/readwriteval.F90)

    • Read values from specific message/subset without positioning
    • Random access to data values
  7. ufbget (src/readwriteval.F90)

    • Retrieve complete table of values
    • Bulk extraction of data elements

Level 5: Advanced Memory Operations (High Complexity)

These routines manage BUFR messages in internal memory arrays:

  1. ufbmem (src/memmsgs.F90)

    • Read entire BUFR file into internal memory
    • Enables random access to all messages in file
  2. ufbmex (src/memmsgs.F90)

    • Read and expand BUFR file into memory
    • Similar to ufbmem with additional processing
  3. readmm (src/memmsgs.F90)

    • Read message from internal memory arrays
    • Memory-based alternative to readmg
  4. ufbrms (src/memmsgs.F90)

    • Read values from specific message/subset in memory
    • Combines memory access with value extraction
  5. ufbtam (src/memmsgs.F90)

    • Build table of all messages in memory
    • Creates index/inventory of memory-resident data

Level 6: Bulk Copy/Transfer Operations (High Complexity)

These routines perform complex multi-step operations:

  1. copybf (src/copydata.F90)

    • Copy entire BUFR file
    • Complete file-to-file transfer operation
  2. cpymem (src/copydata.F90)

    • Copy messages from internal memory to output file
    • Transfers data from memory arrays to file
  3. ufbcpy (src/copydata.F90)

    • Copy all data from input to output subset
    • Complete subset-level data transfer
  4. ufbcup (src/copydata.F90)

    • Copy and update subset data
    • Combines copy with value modifications

Level 7: Specialized I/O Operations (Highest Complexity)

These routines have special requirements or complex internal state management:

  1. ufbtab (src/openclosebf.F90)

    • Build table of data from multiple messages
    • Aggregates data across message boundaries
    • Note: Already has some error handling
  2. ufbdmp (src/dumpdata.F90)

    • Dump formatted contents of data subset
    • Human-readable output of subset structure
  3. ufdump (src/dumpdata.F90)

    • Dump formatted contents of entire message
    • Complete message structure visualization
  4. dxdump (src/dxtable.F90)

    • Dump dictionary tables to output file
    • Outputs internal table definitions
  5. openbt (src/openbt.F90)

    • Open DX BUFR table file (user override point)
    • Special case: stub routine meant for user customization

Implementation Priority Recommendations

Phase 1 (Quick Wins): Level 1-2 Routines (11 routines)

  • Timeline: 1-2 months
  • Rationale: Low complexity, high user visibility
  • Benefit: Immediate improvement in error handling for common operations
  • Testing: Straightforward unit tests for each routine

Phase 2 (Core Protection): Level 3-4 Routines (13 routines)

  • Timeline: 2-4 months
  • Rationale: Most frequently used by applications
  • Benefit: Greatest impact on operational reliability
  • Pattern: Similar to already-implemented ufbint in PR #673
  • Testing: Extensive testing with real-world data scenarios

Phase 3 (Memory Safety): Level 5 Routines (5 routines)

  • Timeline: 2-3 months
  • Rationale: Critical for applications using memory-mode operations
  • Benefit: Prevents memory corruption from cascading errors
  • Testing: Focus on memory leak and corruption detection

Phase 4 (Comprehensive Coverage): Level 6-7 Routines (9 routines)

  • Timeline: 2-3 months
  • Rationale: Complete the safety net
  • Benefit: Handle edge cases and specialized workflows
  • Testing: Integration tests with complex operational scenarios

Key Implementation Considerations

1. Pattern Consistency

  • Follow the same catch_bort_*_c() wrapper pattern established in PR #673
  • Maintain consistent error message format and return codes
  • Use bort_target_is_unset flag to prevent nested error catching

2. Backward Compatibility

  • Maintain existing function signatures and behavior
  • Ensure applications without error catching still work unchanged
  • Return codes should match existing conventions (0 = success, -1 = error)

3. Test Coverage

  • Each addition should include test cases similar to intest14.F90
  • Test both successful operations and intentional error conditions
  • Verify error messages are captured correctly
  • Test nested call scenarios
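
The success and intentional-error checks above can be organized as a small table-driven harness. The sketch below is written in C for illustration only (the existing test, intest14.F90, is Fortran). The protected_fn signature, the test_case struct, and fake_protected are hypothetical names invented for this example and are not part of NCEPLIBS-bufr. It does not exercise nested calls.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signature for a protected routine: returns 0 on success,
   -1 on a caught error, and copies the captured message into errstr. */
typedef int (*protected_fn)(int lun, char *errstr, size_t errlen);

struct test_case {
    const char *name;
    int lun;                  /* input passed to the routine */
    int expect_rc;            /* expected return code */
    const char *expect_msg;   /* substring expected in errstr, NULL if none */
};

/* Fake routine used only to exercise the harness (an assumption). */
static int fake_protected(int lun, char *errstr, size_t errlen)
{
    errstr[0] = '\0';
    if (lun <= 0) {
        snprintf(errstr, errlen, "fake_protected: invalid unit %d", lun);
        return -1;
    }
    return 0;
}

/* Runs every case and returns the number of failed cases. */
static int run_cases(protected_fn fn, const struct test_case *cases, size_t n)
{
    int failures = 0;
    char errstr[128];

    for (size_t i = 0; i < n; i++) {
        int rc = fn(cases[i].lun, errstr, sizeof errstr);
        int ok = (rc == cases[i].expect_rc);

        if (cases[i].expect_msg != NULL)
            ok = ok && strstr(errstr, cases[i].expect_msg) != NULL;
        else
            ok = ok && errstr[0] == '\0';   /* no message on success */

        if (!ok) {
            printf("FAIL %s: rc=%d errstr=\"%s\"\n", cases[i].name, rc, errstr);
            failures++;
        }
    }
    return failures;
}
```

A caller would build an array of test_case entries (one valid input, one deliberately invalid input per routine) and treat a nonzero result from run_cases as a test failure.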

4. Documentation Updates

  • Update user guide (docs/user_guide.md) with list of protected routines
  • Add examples of error catching usage for each routine type
  • Document error message formats and return codes
  • Update API documentation in source code headers

5. Performance Impact

  • setjmp/longjmp should add only minimal overhead when no errors occur
  • Applications that do not use error catching should see no performance degradation
  • Consider profiling critical paths in operational systems
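
One rough way to check the overhead expectation is a micro-benchmark that runs the same workload with and without a setjmp call per iteration. The sketch below is only an illustration: work, run_direct, run_with_setjmp, and seconds_for are hypothetical names, and the trivial workload stands in for a real library routine.

```c
#include <setjmp.h>
#include <stdio.h>
#include <time.h>

static jmp_buf target;

/* Trivial workload standing in for a library routine (an assumption). */
static long long work(long long x)
{
    return x * 2 + 1;
}

/* Direct calls with no catch point. */
long long run_direct(long long n)
{
    long long sum = 0;
    for (long long i = 0; i < n; i++)
        sum += work(i);
    return sum;
}

/* Same calls, each preceded by setjmp as a catching wrapper would do.
   longjmp is never called here, so only the setjmp cost is added. */
long long run_with_setjmp(long long n)
{
    long long sum = 0;
    for (long long i = 0; i < n; i++) {
        if (setjmp(target) == 0)
            sum += work(i);
    }
    return sum;
}

/* Times one run using CPU time and stores the computed sum in *result. */
double seconds_for(long long (*fn)(long long), long long n, long long *result)
{
    clock_t t0 = clock();
    *result = fn(n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Comparing seconds_for(run_direct, n, &a) with seconds_for(run_with_setjmp, n, &b) for a large n gives a first estimate. An optimizing compiler may reduce the direct loop to almost nothing, so measurements on real routines with operational data are more meaningful than this toy workload.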

6. C Interface Layer

  • Each protected Fortran routine needs a corresponding C wrapper function
  • C wrapper uses setjmp to establish error return point
  • Fortran routine checks bort_target_is_unset flag before proceeding
  • Error string captured in moda_borts module arrays
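
The round trip between a protected routine and its C wrapper, including the flag check that prevents a second catch point in nested calls, might look like the sketch below. Both sides are simulated in C for illustration. demo_inner, demo_outer, catching_enabled, and targets_established are hypothetical names, and a single generic wrapper taking a function pointer replaces the one-wrapper-per-routine layout described above.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf bort_target;
static int bort_target_is_unset = 1;  /* 1 = no setjmp target established */
static int catching_enabled = 1;      /* hypothetical opt-in switch */
static int targets_established = 0;   /* instrumentation for this sketch only */

/* Stand-in for bort(): abort as before unless a catch point is active. */
static void demo_bort(const char *msg)
{
    if (bort_target_is_unset) {
        fprintf(stderr, "%s\n", msg);
        exit(EXIT_FAILURE);
    }
    longjmp(bort_target, 1);
}

/* Generic wrapper: establish the return point, then call back the routine. */
static void catch_bort_demo_c(void (*routine)(int, int *), int lun, int *iret)
{
    bort_target_is_unset = 0;
    targets_established++;
    if (setjmp(bort_target) == 0)
        routine(lun, iret);       /* re-enters the routine; flag is now clear */
    else
        *iret = -1;               /* arrived here via longjmp from demo_bort */
    bort_target_is_unset = 1;
}

/* Stand-in for a protected routine that can fail. */
void demo_inner(int lun, int *iret)
{
    if (catching_enabled && bort_target_is_unset) {
        catch_bort_demo_c(demo_inner, lun, iret);
        return;
    }
    if (lun <= 0)
        demo_bort("demo_inner: invalid logical unit");
    *iret = 0;
}

/* Stand-in for a protected routine that calls another protected routine. */
void demo_outer(int lun, int *iret)
{
    if (catching_enabled && bort_target_is_unset) {
        catch_bort_demo_c(demo_outer, lun, iret);
        return;
    }
    demo_inner(lun, iret);  /* nested call: flag already clear, no new target */
}
```

In this sketch a call to demo_outer establishes exactly one catch point even though demo_inner is also protected, because demo_inner sees the flag already cleared and runs its body directly.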

Technical Background

Error Catching Mechanism (from PR #673)

The error catching system implemented in PR #673 uses the following approach:

  1. C Language Layer (borts.c):

    • Uses setjmp/longjmp for non-local jumps
    • Provides catch_bort_*_c() wrapper functions for each protected routine
    • Catches errors from bort() calls via longjmp
  2. Fortran Interface (borts.F90, bufr_c2f_interface.F90):

    • Each protected routine checks bort_target_is_unset flag
    • If catching is active, calls corresponding C wrapper
    • C wrapper establishes return point and calls back to Fortran
    • Prevents nested error catching with flag check
  3. Error Storage (modules_arrs.F90):

    • moda_borts module stores error state
    • caught_str array holds error message text
    • caught_str_len indicates error presence (0 = none, >0 = error occurred)
    • bort_target_is_unset flag prevents recursive catching
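
The three pieces above can be sketched together in a few lines of C. This is a minimal illustration of the setjmp/longjmp technique under stated assumptions, not the code from PR #673: demo_bort, demo_routine, catch_bort_demo_c, and CAUGHT_STR_MAX are hypothetical, and the static variables only imitate the state the document places in moda_borts.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAUGHT_STR_MAX 256

/* Hypothetical stand-ins for the state kept in moda_borts. */
static jmp_buf bort_target;
static int bort_target_is_unset = 1;   /* 1 = no catch point is active */
static char caught_str[CAUGHT_STR_MAX];
static int caught_str_len = 0;         /* 0 = none, >0 = error occurred */

/* Stand-in for bort(): abort as before, or record the message and jump. */
static void demo_bort(const char *msg)
{
    size_t n;

    if (bort_target_is_unset) {        /* catching not active: legacy abort */
        fprintf(stderr, "%s\n", msg);
        exit(EXIT_FAILURE);
    }
    n = strlen(msg);
    if (n > sizeof caught_str - 1)
        n = sizeof caught_str - 1;     /* truncate to fit the buffer */
    memcpy(caught_str, msg, n);
    caught_str[n] = '\0';
    caught_str_len = (int)n;
    longjmp(bort_target, 1);           /* unwind to the wrapper's setjmp */
}

/* Stand-in for a library routine that fails on an invalid unit number. */
static void demo_routine(int lun)
{
    if (lun <= 0)
        demo_bort("demo_routine: invalid logical unit");
    /* normal processing would continue here */
}

/* Wrapper in the style of catch_bort_*_c(): 0 = success, -1 = error caught. */
int catch_bort_demo_c(int lun)
{
    int rc = 0;

    caught_str_len = 0;                /* clear any previous error */
    bort_target_is_unset = 0;
    if (setjmp(bort_target) != 0)
        rc = -1;                       /* arrived here via longjmp */
    else
        demo_routine(lun);
    bort_target_is_unset = 1;          /* always restore the flag */
    return rc;
}
```

After a call such as catch_bort_demo_c(-1), the caller would see a return value of -1, a positive caught_str_len, and the message text in caught_str, instead of the process terminating.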

Currently Protected Routines (PR #673)

The following routines already have error catching implemented:

  • openbf - Open BUFR file
  • closbf - Close BUFR file
  • readmg - Read BUFR message
  • readns - Read next subset
  • readsb - Read data subset
  • ufbint - Read/write data values

Testing Strategy

Unit Tests

  • Individual routine error conditions
  • Boundary cases and invalid inputs
  • Error message content verification

Integration Tests

  • Multi-routine workflows with error recovery
  • Real-world data processing scenarios
  • Performance benchmarking

Regression Tests

  • Ensure existing applications continue to work
  • Verify backward compatibility
  • Test with operational data streams

Platform Coverage

  • Linux (GNU, Intel compilers)
  • macOS
  • Windows (if applicable)
  • Various Fortran compiler versions

Success Metrics

  1. Coverage: Percentage of public API routines with error catching
  2. Reliability: Reduction in unexpected program terminations
  3. Usability: Developer feedback on error handling improvements
  4. Performance: Negligible overhead in production systems
  5. Adoption: Number of applications utilizing error catching

References

  • PR #673: https://github.com/NOAA-EMC/NCEPLIBS-bufr/pull/673
  • Issue #671: Original proposal for setjmp/longjmp implementation
  • Issue #675: Python interface abort problem that motivated this work
  • Test Program: test/intest14.F90 - Example usage of error catching
  • Documentation: docs/user_guide.md - NCEPLIBS-bufr user guide

Contact and Collaboration

For questions or contributions related to error catching implementation:


Document Version: 1.0
Date: October 14, 2025
Author: Technical Analysis of PR #673
Status: Planning Document