Additional I/O Routines for Error Catching Implementation
Related to PR #673: Add capability to catch bort errors
This document provides a comprehensive list of additional I/O routines in NCEPLIBS-bufr that should have error catching capabilities added, following the pattern established in PR #673. The routines are organized by complexity level to facilitate phased implementation.
Level 1: Simple I/O Query Routines (Low Complexity)
These routines primarily query state or perform simple operations with minimal error paths:
- ufbcnt (src/openclosebf.F90) - Query message/subset counts for a logical unit
  - Returns current message number and subset number
- ufbqcd (src/cftbvs.F90) - Query code/flag information for a mnemonic
  - Returns descriptor and code/flag table information
- ufbqcp (src/cftbvs.F90) - Query code/flag pair information
  - Returns mnemonic for given code/flag descriptor
- status (src/openclosebf.F90) - Query file connection status
  - Returns logical unit status information
- ufbpos (src/readwritesb.F90) - Position to specific message/subset within file
  - Allows random access to specific locations in BUFR file
Level 2: Message-Level I/O Routines (Low-Medium Complexity)
These routines handle complete BUFR messages but have straightforward error handling:
- readerme (src/readwritemg.F90) - Read BUFR message from memory array
  - Processes message from byte array instead of file
- rdmsgw (src/readwritemg.F90) - Read BUFR message wrapper function
  - Low-level message reading utility
- openmg (src/readwritemg.F90) - Open BUFR message for output without subset initialization
  - Alternative to openmb for specific use cases
- closmg (src/readwritemg.F90) - Close current BUFR message for output
  - Finalizes message and prepares for next
- msgwrt (src/readwritemg.F90) - Write BUFR message to file
  - Low-level message writing utility
- copymg (src/copydata.F90) - Copy complete BUFR message from one file to another
  - Transfers entire message including all subsets
Level 3: Subset-Level I/O Routines (Medium Complexity)
These routines handle individual data subsets within messages:
- writsb (src/readwritesb.F90) - Write data subset to output message
  - Complements readsb (already protected)
- copysb (src/copydata.F90) - Copy data subset from input to output file
  - Transfers single subset between files
- ufbmms (src/memmsgs.F90) - Access specific message/subset from memory
  - Direct access to memory-resident data
- ufbmns (src/memmsgs.F90) - Read next subset from memory
  - Sequential access through memory-resident messages
- readlc (src/readwriteval.F90) - Read long character strings from data subset
  - Handles strings longer than standard BUFR descriptors
- writlc (src/readwriteval.F90) - Write long character strings to data subset
  - Stores extended character data in subsets
Level 4: Value-Reading/Writing Routines (Medium-High Complexity)
These are the core UFB* routines that read/write data values, similar to ufbint, which already has error catching:
- ufbrep (src/readwriteval.F90) - Read/write replicated sequences
  - Handles repeated data structures within subset
- ufbstp (src/readwriteval.F90) - Read/write stacked replicated sequences
  - Processes nested/stacked replication patterns
- ufbseq (src/readwriteval.F90) - Read/write entire sequences
  - Operates on complete sequence descriptors
- ufbovr (src/readwriteval.F90) - Overwrite existing values in subset
  - Updates values without full subset reconstruction
- ufbevn (src/readwriteval.F90) - Read/write event stacks
  - Handles specialized event sequence structures
- ufbinx (src/readwriteval.F90) - Read values from specific message/subset without positioning
  - Random access to data values
- ufbget (src/readwriteval.F90) - Retrieve complete table of values
  - Bulk extraction of data elements
Level 5: Advanced Memory Operations (High Complexity)
These routines manage BUFR messages in internal memory arrays:
- ufbmem (src/memmsgs.F90) - Read entire BUFR file into internal memory
  - Enables random access to all messages in file
- ufbmex (src/memmsgs.F90) - Read and expand BUFR file into memory
  - Similar to ufbmem with additional processing
- readmm (src/memmsgs.F90) - Read message from internal memory arrays
  - Memory-based alternative to readmg
- ufbrms (src/memmsgs.F90) - Read values from specific message/subset in memory
  - Combines memory access with value extraction
- ufbtam (src/memmsgs.F90) - Build table of all messages in memory
  - Creates index/inventory of memory-resident data
Level 6: Bulk Copy/Transfer Operations (High Complexity)
These routines perform complex multi-step operations:
- copybf (src/copydata.F90) - Copy entire BUFR file
  - Complete file-to-file transfer operation
- cpymem (src/copydata.F90) - Copy messages from internal memory to output file
  - Transfers data from memory arrays to file
- ufbcpy (src/copydata.F90) - Copy all data from input to output subset
  - Complete subset-level data transfer
- ufbcup (src/copydata.F90) - Copy and update subset data
  - Combines copy with value modifications
Level 7: Specialized I/O Operations (Highest Complexity)
These routines have special requirements or complex internal state management:
- ufbtab (src/openclosebf.F90) - Build table of data from multiple messages
  - Aggregates data across message boundaries
  - Note: Already has some error handling
- ufbdmp (src/dumpdata.F90) - Dump formatted contents of data subset
  - Human-readable output of subset structure
- ufdump (src/dumpdata.F90) - Dump formatted contents of entire message
  - Complete message structure visualization
- dxdump (src/dxtable.F90) - Dump dictionary tables to output file
  - Outputs internal table definitions
- openbt (src/openbt.F90) - Open DX BUFR table file (user override point)
  - Special case: stub routine meant for user customization
Implementation Priority Recommendations
Phase 1 (Quick Wins): Level 1-2 Routines (Items 1-11)
- Timeline: 1-2 months
- Rationale: Low complexity, high user visibility
- Benefit: Immediate improvement in error handling for common operations
- Testing: Straightforward unit tests for each routine
Phase 2 (Core Protection): Level 3-4 Routines (Items 12-24)
- Timeline: 2-4 months
- Rationale: Most frequently used by applications
- Benefit: Greatest impact on operational reliability
- Pattern: Similar to the already-implemented ufbint in PR #673
- Testing: Extensive testing with real-world data scenarios
Phase 3 (Memory Safety): Level 5 Routines (Items 25-29)
- Timeline: 2-3 months
- Rationale: Critical for applications using memory-mode operations
- Benefit: Prevents memory corruption from cascading errors
- Testing: Focus on memory leak and corruption detection
Phase 4 (Comprehensive Coverage): Level 6-7 Routines (Items 30-38)
- Timeline: 2-3 months
- Rationale: Complete the safety net
- Benefit: Handle edge cases and specialized workflows
- Testing: Integration tests with complex operational scenarios
Key Implementation Considerations
1. Pattern Consistency
- Follow the same catch_bort_*_c() wrapper pattern established in PR #673
- Maintain consistent error message format and return codes
- Use the bort_target_is_unset flag to prevent nested error catching
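The nested-catching guard can be modeled in a few lines of C. This is only a sketch of the idea: in the library the flag check lives in the Fortran routines, and every name below other than bort_target_is_unset and caught_str (demo_bort, catch_call, inner, outer, last_status, setjmp_calls) is hypothetical and exists only for illustration. It shows two protected routines where the outer one calls the inner one, and only the outermost call arms a catch point.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <string.h>

static jmp_buf bort_target;
static int bort_target_is_unset = 1;  /* 1 = no catch point is armed */
static char caught_str[128];
static int last_status = 0;           /* hypothetical: result of the last protected call */
static int setjmp_calls = 0;          /* hypothetical: counts armed catch points */

/* Stand-in for bort(): save the message and jump back to the catch point. */
static void demo_bort(const char *msg)
{
    snprintf(caught_str, sizeof caught_str, "%s", msg);
    longjmp(bort_target, 1);
}

/* Generic wrapper: arm the catch point, then re-enter the routine. */
static int catch_call(void (*fn)(int), int arg)
{
    assert(fn != NULL);
    ++setjmp_calls;
    if (setjmp(bort_target) == 0) {
        fn(arg);
        return 0;       /* routine finished normally */
    }
    return -1;          /* reached through longjmp from demo_bort() */
}

void inner(int value)
{
    if (bort_target_is_unset) {        /* outermost call: arm and re-enter */
        bort_target_is_unset = 0;
        last_status = catch_call(inner, value);
        bort_target_is_unset = 1;
        return;
    }
    if (value < 0) demo_bort("inner: negative value");
}

void outer(int value)
{
    if (bort_target_is_unset) {
        bort_target_is_unset = 0;
        last_status = catch_call(outer, value);
        bort_target_is_unset = 1;
        return;
    }
    inner(value);   /* flag already cleared, so inner() does not re-arm */
}
```

Calling outer(-1) arms exactly one catch point even though the error is raised inside inner(); afterwards last_status is -1 and the flag is back to its unset state, so the next top-level call can arm a fresh catch point.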
2. Backward Compatibility
- Maintain existing function signatures and behavior
- Ensure applications without error catching still work unchanged
- Return codes should match existing conventions (0 = success, -1 = error)
3. Test Coverage
- Each addition should include test cases similar to intest14.F90
- Test both successful operations and intentional error conditions
- Verify error messages are captured correctly
- Test nested call scenarios
4. Documentation Updates
- Update user guide (docs/user_guide.md) with list of protected routines
- Add examples of error catching usage for each routine type
- Document error message formats and return codes
- Update API documentation in source code headers
5. Performance Impact
- Minimal overhead from setjmp/longjmp when no errors occur
- No performance degradation for applications not using error catching
- Consider profiling critical paths in operational systems
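One rough way to check the overhead claim on a given platform is a micro-benchmark that runs the same trivial work with and without a setjmp call per iteration. The sketch below is not part of NCEPLIBS-bufr; all names (work, guarded_work, sum_plain, sum_guarded, seconds_for) are hypothetical, and the timings it produces depend heavily on compiler and optimization level, so they should be read as indicative only.

```c
#include <assert.h>
#include <setjmp.h>
#include <time.h>

static jmp_buf target;

/* Trivial stand-in for the body of a protected routine. */
static long work(long i)
{
    return i & 7;
}

/* Same work, but behind a setjmp catch point as a wrapper would do. */
static long guarded_work(long i)
{
    if (setjmp(target) == 0)
        return work(i);
    return -1;  /* only reachable via longjmp, which this benchmark never triggers */
}

long sum_plain(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++) s += work(i);
    return s;
}

long sum_guarded(long n)
{
    long s = 0;
    for (long i = 0; i < n; i++) s += guarded_work(i);
    return s;
}

/* Run fn(n), store its result in *out, and return elapsed CPU seconds. */
double seconds_for(long (*fn)(long), long n, long *out)
{
    assert(fn != NULL && out != NULL);
    clock_t t0 = clock();
    *out = fn(n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

A caller would time both variants with the same n, confirm the two sums match, and compare the elapsed times. For operational code, profiling the real routines with representative BUFR files would be more meaningful than this synthetic loop.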
6. C Interface Layer
- Each protected Fortran routine needs a corresponding C wrapper function
- C wrapper uses setjmp to establish error return point
- Fortran routine checks bort_target_is_unset flag before proceeding
- Error string captured in moda_borts module arrays
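A minimal sketch of such a wrapper, written entirely in C for illustration, might look like the following. It is not the code from borts.c: demo_bort stands in for bort(), demo_routine stands in for a protected Fortran routine, catch_demo_routine follows the catch_bort_*_c() naming pattern only loosely, and the 256-byte buffer size is an assumption.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAUGHT_STR_MAX 256   /* assumed capacity, not taken from the library */

/* Hypothetical stand-ins for the state the document places in moda_borts. */
static jmp_buf bort_target;
static int bort_target_is_unset = 1;   /* 1 = no catch point is active */
static char caught_str[CAUGHT_STR_MAX];
static int caught_str_len = 0;         /* 0 = no error caught */

/* Stand-in for bort(): record the message, then jump to the catch point. */
static void demo_bort(const char *msg)
{
    if (bort_target_is_unset) {
        /* No catcher is active, so fall back to aborting the program. */
        fprintf(stderr, "fatal: %s\n", msg);
        exit(1);
    }
    snprintf(caught_str, sizeof caught_str, "%s", msg);
    caught_str_len = (int)strlen(caught_str);
    longjmp(bort_target, 1);
}

/* Stand-in for a protected routine: fails on a negative unit number. */
static void demo_routine(int lunit, int *result)
{
    if (lunit < 0) demo_bort("demo_routine: invalid logical unit");
    *result = lunit * 2;
}

/* Wrapper: returns 0 on success, -1 when an error was caught. */
int catch_demo_routine(int lunit, int *result)
{
    assert(result != NULL);
    caught_str_len = 0;
    if (setjmp(bort_target) == 0) {
        bort_target_is_unset = 0;        /* catch point is now armed */
        demo_routine(lunit, result);
        bort_target_is_unset = 1;
        return 0;
    }
    /* Control arrives here through longjmp() in demo_bort(). */
    bort_target_is_unset = 1;
    return -1;
}
```

A caller checks the return value and, on -1, reads caught_str and caught_str_len instead of losing the process. The state is kept in file-scope variables rather than locals because locals modified between setjmp and longjmp have indeterminate values after the jump unless declared volatile.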
Technical Background
Error Catching Mechanism (from PR #673)
The error catching system implemented in PR #673 uses the following approach:
- C Language Layer (borts.c):
  - Uses setjmp/longjmp for non-local jumps
  - Provides catch_bort_*_c() wrapper functions for each protected routine
  - Catches errors from bort() calls via longjmp
- Fortran Interface (borts.F90, bufr_c2f_interface.F90):
  - Each protected routine checks bort_target_is_unset flag
  - If catching is active, calls corresponding C wrapper
  - C wrapper establishes return point and calls back to Fortran
  - Prevents nested error catching with flag check
- Error Storage (modules_arrs.F90):
  - moda_borts module stores error state
  - caught_str array holds error message text
  - caught_str_len indicates error presence (0 = none, >0 = error occurred)
  - bort_target_is_unset flag prevents recursive catching
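The error storage convention (a character array plus a length where 0 means no error) can be illustrated with a small C model. This is a sketch under assumptions: the 80-character capacity, the blank-padded Fortran-style layout, and the helper names clear_caught_error, store_caught_error, and fetch_caught_error are all hypothetical and not taken from the library.

```c
#include <assert.h>
#include <string.h>

#define CAUGHT_STR_MAX 80   /* assumed capacity; the real size is not stated here */

/* Hypothetical C-side model of the moda_borts variables. */
static char caught_str[CAUGHT_STR_MAX];   /* blank-padded, not NUL-terminated */
static int caught_str_len = 0;            /* 0 = none, >0 = error occurred */

/* Reset the error state before a protected call. */
void clear_caught_error(void)
{
    memset(caught_str, ' ', sizeof caught_str);
    caught_str_len = 0;
}

/* Store a message, truncating it if it exceeds the buffer. */
void store_caught_error(const char *msg)
{
    size_t n = strlen(msg);
    if (n > CAUGHT_STR_MAX) n = CAUGHT_STR_MAX;
    memset(caught_str, ' ', sizeof caught_str);
    memcpy(caught_str, msg, n);
    caught_str_len = (int)n;
}

/* Copy the stored message into a NUL-terminated C string; returns its length. */
int fetch_caught_error(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    size_t n = (size_t)caught_str_len;
    if (n >= out_size) n = out_size - 1;   /* leave room for the terminator */
    memcpy(out, caught_str, n);
    out[n] = '\0';
    return (int)n;
}
```

With this layout a caller only needs to test caught_str_len after a protected call; a nonzero value means an error message is waiting in caught_str.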
Currently Protected Routines (PR #673)
The following routines already have error catching implemented:
- openbf - Open BUFR file
- closbf - Close BUFR file
- readmg - Read BUFR message
- readns - Read next subset
- readsb - Read data subset
- ufbint - Read/write data values
Testing Strategy
Unit Tests
- Individual routine error conditions
- Boundary cases and invalid inputs
- Error message content verification
Integration Tests
- Multi-routine workflows with error recovery
- Real-world data processing scenarios
- Performance benchmarking
Regression Tests
- Ensure existing applications continue to work
- Verify backward compatibility
- Test with operational data streams
Platform Coverage
- Linux (GNU, Intel compilers)
- macOS
- Windows (if applicable)
- Various Fortran compiler versions
Success Metrics
- Coverage: Percentage of public API routines with error catching
- Reliability: Reduction in unexpected program terminations
- Usability: Developer feedback on error handling improvements
- Performance: Negligible overhead in production systems
- Adoption: Number of applications utilizing error catching
References
- PR #673: https://github.com/NOAA-EMC/NCEPLIBS-bufr/pull/673
- Issue #671: Original proposal for setjmp/longjmp implementation
- Issue #675: Python interface abort problem that motivated this work
- Test Program: test/intest14.F90 - Example usage of error catching
- Documentation: docs/user_guide.md - NCEPLIBS-bufr user guide
Contact and Collaboration
For questions or contributions related to error catching implementation:
- Submit issues on GitHub: https://github.com/NOAA-EMC/NCEPLIBS-bufr/issues
- Discuss on pull requests related to error catching
- Contact NCEPLIBS-bufr maintainers for coordination
Document Version: 1.0
Date: October 14, 2025
Author: Technical Analysis of PR #673
Status: Planning Document