ERROR_CATCHING_IMPLEMENTATION_PLAN - TerrenceMcGuinness-NOAA/global-workflow GitHub Wiki
Plan Version: 1.0
Date: October 14, 2025
Author: Claude (Anthropic) via GitHub Copilot Interface
For: Terrence McGuinness (TerrenceMcGuinness-NOAA)
Repository: NOAA-EMC/NCEPLIBS-bufr
Base Branch: develop
Related PR: #673
- Executive Summary
- Implementation Strategy
- Phase 1: Simple Query Routines (Weeks 1-3)
- Phase 2: Message-Level Operations (Weeks 4-7)
- Phase 3: Subset Operations (Weeks 8-11)
- Phase 4: Value Operations (Weeks 12-17)
- Automated Testing Framework
- Continuous Integration Strategy
- Quality Assurance Checklist
- Risk Management
This plan details the systematic implementation of error catching capabilities for 24 additional I/O routines in NCEPLIBS-bufr, following the pattern established in PR #673. The implementation is divided into 4 phases covering approximately 17 weeks of development time.
Target Routines: 24 functions across 4 complexity levels
Estimated Timeline: 17 weeks
Key Deliverables:
- 24 new C wrapper functions in
borts.c - 24 new Fortran interface functions in
bufr_c2f_interface.F90 - 24 protected recursive subroutines with error catching logic
- 24+ comprehensive test cases in extended test suite
- Updated documentation and examples
- β All 24 routines protected with error catching
- β Zero breaking changes to existing API
- β Test coverage β₯95% for error paths
- β Performance overhead <1% with catching active
- β All CI/CD tests passing
- β Documentation complete and reviewed
Each routine follows this three-layer architecture:
βββββββββββββββββββββββββββββββββββββββ
β Application Program β
β - Activates catching: catch_borts()β
β - Calls I/O routine β
β - Checks errors: check_for_bort() β
ββββββββββββββββ¬βββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββ
β Fortran Layer (*.F90) β
β - Check bort_target_is_unset β
β - If catching, delegate to C β
β - Else, execute normally β
ββββββββββββββββ¬βββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββ
β C Wrapper Layer (borts.c) β
β - setjmp() establishes return pointβ
β - Call back to Fortran via _f() β
β - If longjmp triggered, return β
ββββββββββββββββ¬βββββββββββββββββββββββ
β
βΌ
βββββββββββββββββββββββββββββββββββββββ
β Fortran Implementation (_f bridge) β
β - Execute actual routine logic β
β - Any bort() calls longjmp back β
βββββββββββββββββββββββββββββββββββββββ
For each protected routine ROUTINE_NAME:
File 1: src/borts.c
void catch_bort_ROUTINE_NAME(params...) {
if (setjmp(context) == 1) return;
// String null termination if needed
ROUTINE_NAME_f(params...);
}File 2: src/bufr_c2f_interface.F90
recursive subroutine ROUTINE_NAME_c(...) bind(C, name='ROUTINE_NAME_f')
! Convert C strings to Fortran
call ROUTINE_NAME(...)
end subroutineFile 3: src/[source_file].F90 (e.g., readwriteval.F90)
recursive subroutine ROUTINE_NAME(...)
use moda_borts
if (bort_target_is_unset) then
bort_target_is_unset = .false.
caught_str_len = 0
call catch_bort_ROUTINE_NAME_c(...)
bort_target_is_unset = .true.
return
endif
! Normal implementation
end subroutineFile 4: src/bufr_interface.h
void catch_bort_ROUTINE_NAME(params...);File 5: test/intest_ROUTINE_NAME.F90
! Test both error and success paths-
Code Analysis (30 min)
- Read routine source and understand error conditions
- Identify all
call bort()statements - Document function signature and parameters
-
C Wrapper Implementation (45 min)
- Add
catch_bort_ROUTINE_NAME()toborts.c - Handle string parameters (null termination)
- Add declaration to
bufr_interface.h
- Add
-
Fortran Bridge Implementation (30 min)
- Add
ROUTINE_NAME_c()tobufr_c2f_interface.F90 - Handle string conversions (C β Fortran)
- Use
bind(c)attribute correctly
- Add
-
Routine Protection (45 min)
- Add
use moda_bortsto routine - Insert catching check at routine entry
- Verify no nested catch attempts
- Add
-
Test Development (2 hours)
- Create test cases for error conditions
- Create test cases for success paths
- Test edge cases and boundary conditions
-
Build & Unit Test (30 min)
- Compile with all variants (4, 8, d)
- Run unit tests
- Fix compilation errors
-
Integration Test (30 min)
- Run full test suite
- Verify no regressions
- Performance check
-
Documentation (30 min)
- Update routine docstring
- Add example to user guide
- Update API reference
Total per routine: ~6 hours
With automation and iteration: ~4 hours average
| # | Routine | Source File | Complexity | Estimated Hours |
|---|---|---|---|---|
| 1 | ufbcnt |
openclosebf.F90 | Low | 3 |
| 2 | ufbqcd |
cftbvs.F90 | Low | 4 |
| 3 | ufbqcp |
cftbvs.F90 | Low | 4 |
| 4 | status |
openclosebf.F90 | Low | 3 |
| 5 | ufbpos |
readwritesb.F90 | Medium | 5 |
Total Phase 1: 19 hours over 3 weeks (6-7 hours/week)
File: src/openclosebf.F90, line ~598
Current Signature:
recursive subroutine ufbcnt(lunit,kmsg,ksub)
integer, intent(in) :: lunit
integer, intent(out) :: kmsg, ksubError Conditions:
- File not open:
'BUFRLIB: UFBCNT - BUFR FILE IS CLOSED' - Invalid unit: Via
status()call
Implementation Steps:
- borts.c addition:
/**
* Catch any bort error inside of subroutine ufbcnt().
*
* @param lunit - Fortran logical unit number for BUFR file
* @param kmsg - Message number
* @param ksub - Subset number
*
* @author [Your Name] @date 2025-10-14
*/
void
catch_bort_ufbcnt(int lunit, int *kmsg, int *ksub)
{
if (setjmp(context) == 1) return;
ufbcnt_f(lunit, kmsg, ksub);
}- bufr_c2f_interface.F90 addition:
!> Wrap ufbcnt() for C interface.
recursive subroutine ufbcnt_c(lunit, kmsg, ksub) bind(C, name='ufbcnt_f')
integer(c_int), value, intent(in) :: lunit
integer(c_int), intent(out) :: kmsg, ksub
call ufbcnt(lunit, kmsg, ksub)
end subroutine ufbcnt_c- openclosebf.F90 modification:
recursive subroutine ufbcnt(lunit,kmsg,ksub)
use moda_borts
! ... existing declarations ...
! Check for I8 integers
if(im8b) then
! ... existing I8 handling ...
endif
! If we're catching bort errors, set a target return location
if (bort_target_is_unset) then
bort_target_is_unset = .false.
caught_str_len = 0
call catch_bort_ufbcnt_c(lunit, kmsg, ksub)
bort_target_is_unset = .true.
return
endif
! ... rest of routine unchanged ...- Test case in intest14.F90 (or new intest15.F90):
! Test ufbcnt with invalid unit
call ufbcnt(999, kmsg, ksub)
call check_for_bort(errstr, errstr_len)
if (errstr_len <= 0 .or. index(errstr(1:errstr_len), 'INPUT UNIT NUMBER') == 0) stop 20File: src/cftbvs.F90, line ~395
Current Signature:
recursive subroutine ufbqcd(lunit,nemo,iqcd)
integer, intent(in) :: lunit
character*(*), intent(in) :: nemo
integer, intent(out) :: iqcdError Conditions:
- File not open
- Invalid mnemonic
- Not a code/flag descriptor
Implementation: Follow same pattern as ufbcnt, with string handling for nemo
Similar detailed implementation for each routine...
File: src/readwritesb.F90, line ~899
Complexity Note: Medium complexity due to subset positioning logic
| # | Routine | Source File | Complexity | Estimated Hours |
|---|---|---|---|---|
| 6 | readerme |
readwritemg.F90 | Medium | 5 |
| 7 | rdmsgw |
readwritemg.F90 | Medium | 4 |
| 8 | openmg |
readwritemg.F90 | Medium | 5 |
| 9 | closmg |
readwritemg.F90 | Medium | 4 |
| 10 | msgwrt |
readwritemg.F90 | Medium | 5 |
| 11 | copymg |
copydata.F90 | High | 6 |
Total Phase 2: 29 hours over 4 weeks (7-8 hours/week)
File: src/readwritemg.F90, line ~223
Current Signature:
recursive subroutine readerme(mesg,lunit,subset,jdate,iret)
integer, intent(in) :: mesg(:), lunit
character*8, intent(out) :: subset
integer, intent(out) :: jdate, iretSpecial Considerations:
- Reads from memory array, not file
- No C wrapper needed for
mesgarray (Fortran only) - Use
c_loc()for array pointer if needed
C Wrapper:
void
catch_bort_readerme(int *mesg, int mesg_len, int lunit, char *subset,
int *jdate, int subset_str_len, int *iret)
{
if (setjmp(context) == 1) return;
readerme_f(mesg, mesg_len, lunit, subset, jdate, subset_str_len, iret);
}Error Conditions to Test:
- Invalid message data
- Corrupted BUFR structure
- Table mismatch
Both routines in readwritemg.F90, standard message open/close operations.
copymg is higher complexity - requires careful testing of table synchronization.
| # | Routine | Source File | Complexity | Estimated Hours |
|---|---|---|---|---|
| 12 | writsb |
readwritesb.F90 | Medium | 5 |
| 13 | copysb |
copydata.F90 | High | 6 |
| 14 | ufbmms |
memmsgs.F90 | Medium | 5 |
| 15 | ufbmns |
memmsgs.F90 | Medium | 5 |
| 16 | readlc |
readwriteval.F90 | Medium | 5 |
| 17 | writlc |
readwriteval.F90 | Medium | 5 |
Total Phase 3: 31 hours over 4 weeks (7-8 hours/week)
- Complement to
readsb(already protected) - Write operations require careful error handling
- Test with message overflow scenarios
- Complex multi-file operations
- Table synchronization critical
- Test with mismatched tables
- Access internal memory arrays
- Pointer safety critical
- Test memory boundary conditions
- String length validation
- Buffer overflow prevention
- Test with maximum string lengths
| # | Routine | Source File | Complexity | Estimated Hours |
|---|---|---|---|---|
| 18 | ufbrep |
readwriteval.F90 | High | 7 |
| 19 | ufbstp |
readwriteval.F90 | High | 7 |
| 20 | ufbseq |
readwriteval.F90 | High | 7 |
| 21 | ufbovr |
readwriteval.F90 | High | 6 |
| 22 | ufbevn |
readwriteval.F90 | High | 7 |
| 23 | ufbinx |
readwriteval.F90 | High | 6 |
| 24 | ufbget |
readwriteval.F90 | Medium | 5 |
Total Phase 4: 45 hours over 6 weeks (7-8 hours/week)
These routines are similar to ufbint (already protected), following the same pattern:
Common Pattern:
recursive subroutine ufb<NAME>(lunin,usr,i1,i2,iret,str)
use moda_borts
! I8 handling...
if (bort_target_is_unset) then
bort_target_is_unset = .false.
caught_str_len = 0
call strsuc(str,cstr,lcstr) ! String cleanup
call catch_bort_ufb<NAME>_c(lunin,usr,i1,i2,iret,cstr,lcstr)
bort_target_is_unset = .true.
return
endif
! Normal implementation...Week 12: ufbrep - Replicated sequences
Week 13: ufbstp - Stacked replication
Week 14: ufbseq - Complete sequences
Week 15: ufbovr - Value overwriting
Week 16: ufbevn - Event stacks
Week 17: ufbinx & ufbget - Random access and bulk retrieval
Create modular test framework for iterative development:
File: test/test_error_catching_lib.F90 (New utility module)
module test_error_catching_lib
implicit none
integer, parameter :: MAX_ERRSTR_LEN = 400
contains
!> Test helper: Verify error was caught
subroutine expect_error(test_name, errstr, errstr_len, expected_substr, stop_code)
character(*), intent(in) :: test_name, expected_substr
character(*), intent(in) :: errstr
integer, intent(in) :: errstr_len, stop_code
if (errstr_len <= 0) then
print *, 'FAIL:', test_name, '- No error caught'
stop stop_code
endif
if (index(errstr(1:errstr_len), expected_substr) == 0) then
print *, 'FAIL:', test_name, '- Wrong error message'
print *, 'Expected substring:', expected_substr
print *, 'Got:', errstr(1:errstr_len)
stop stop_code
endif
print *, 'PASS:', test_name
end subroutine
!> Test helper: Verify no error occurred
subroutine expect_success(test_name, errstr_len, stop_code)
character(*), intent(in) :: test_name
integer, intent(in) :: errstr_len, stop_code
if (errstr_len /= 0) then
print *, 'FAIL:', test_name, '- Unexpected error'
stop stop_code
endif
print *, 'PASS:', test_name
end subroutine
!> Test helper: Clear error state
subroutine clear_error()
character(MAX_ERRSTR_LEN) :: errstr
integer :: errstr_len
call check_for_bort(errstr, errstr_len)
end subroutine
end module test_error_catching_libPhase 1 Tests: test/intest15_phase1.F90
program test_phase1_error_catching
use test_error_catching_lib
implicit none
integer :: catch_borts, isetprm
print *, 'Testing Phase 1: Query Routines Error Catching'
if (isetprm('NFILES', 10) /= 0) stop 1
if (catch_borts('Y') /= 0) stop 2
call test_ufbcnt()
call test_ufbqcd()
call test_ufbqcp()
call test_status()
call test_ufbpos()
print *, 'All Phase 1 tests PASSED'
contains
subroutine test_ufbcnt()
integer :: kmsg, ksub
character(MAX_ERRSTR_LEN) :: errstr
integer :: errstr_len
! Test 1: Invalid unit number
call ufbcnt(999, kmsg, ksub)
call check_for_bort(errstr, errstr_len)
call expect_error('ufbcnt_invalid_unit', errstr, errstr_len, &
'INPUT UNIT NUMBER', 100)
call clear_error()
! Add more test cases...
end subroutine
! Similar for other routines...
end programPhase 2 Tests: test/intest16_phase2.F90
Phase 3 Tests: test/intest17_phase3.F90
Phase 4 Tests: test/intest18_phase4.F90
Script: test/run_incremental_tests.sh
#!/bin/bash
# Run incremental error catching tests
set -e
echo "Building NCEPLIBS-bufr with error catching..."
cd build
make -j4
echo ""
echo "Running Phase 1 tests..."
ctest -R "intest15.*" --output-on-failure
echo ""
echo "Running Phase 2 tests..."
ctest -R "intest16.*" --output-on-failure
echo ""
echo "Running Phase 3 tests..."
ctest -R "intest17.*" --output-on-failure
echo ""
echo "Running Phase 4 tests..."
ctest -R "intest18.*" --output-on-failure
echo ""
echo "Running all original tests for regression check..."
ctest -R "intest[0-9]_" --output-on-failure
echo ""
echo "All tests PASSED!"Script: test/benchmark_error_catching.sh
#!/bin/bash
# Benchmark performance impact of error catching
ITERATIONS=1000
TEST_FILE="testfiles/data/prepbufr.nqc.200911241030"
echo "Benchmarking error catching performance..."
# Baseline: catching disabled
echo "Running baseline (no catching)..."
time for i in $(seq 1 $ITERATIONS); do
./intest_perf_baseline
done
# With catching: inactive
echo "Running with catching activated but no errors..."
time for i in $(seq 1 $ITERATIONS); do
./intest_perf_catching_inactive
done
# With catching: error triggered
echo "Running with catching and errors..."
time for i in $(seq 1 10); do # Fewer iterations
./intest_perf_catching_active
doneFile: .github/workflows/error_catching_dev.yml
name: Error Catching Development CI
on:
push:
branches: [ error_catching_impl ]
pull_request:
branches: [ develop ]
jobs:
build-and-test:
strategy:
matrix:
os: [ubuntu-latest, macos-latest]
compiler: [gnu, intel]
kind: [4, 8, d]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- name: Setup Fortran
uses: fortran-lang/setup-fortran@v1
with:
compiler: ${{ matrix.compiler }}
- name: Configure
run: |
mkdir build && cd build
cmake .. -DTEST_FILE_DIR=../test_data
- name: Build
run: |
cd build
make -j4
- name: Run Unit Tests
run: |
cd build
ctest --output-on-failure -R "intest1[5-8]"
- name: Run Regression Tests
run: |
cd build
ctest --output-on-failure
- name: Performance Check
run: |
cd build/test
bash benchmark_error_catching.shFile: .git/hooks/pre-commit (Developer setup)
#!/bin/bash
# Pre-commit hook for error catching development
echo "Running pre-commit checks..."
# Check for common mistakes
echo "Checking for bort_target_is_unset flag management..."
if git diff --cached | grep -E "bort_target_is_unset\s*=\s*\.false\." | grep -v "\.true\."; then
echo "ERROR: Found bort_target_is_unset = .false. without matching .true."
echo "Ensure proper flag reset in all error catching blocks."
exit 1
fi
# Run quick tests
echo "Running quick unit tests..."
cd build && ctest -R "intest14" --output-on-failure
if [ $? -ne 0 ]; then
echo "ERROR: Unit tests failed. Fix before committing."
exit 1
fi
echo "Pre-commit checks passed!"
exit 0For each routine, verify:
-
Code Implementation
- C wrapper function added to
borts.c - C function declared in
bufr_interface.h - Fortran bridge added to
bufr_c2f_interface.F90 - Original routine modified with catching check
-
use moda_bortsadded to routine -
bort_target_is_unsetflag properly managed - String parameters null-terminated in C
- String conversions correct in Fortran bridge
- C wrapper function added to
-
Testing
- Test case for invalid unit number
- Test case for closed file
- Test case for invalid parameters
- Test case for success path
- Test case for edge conditions
- All tests pass with KIND_4
- All tests pass with KIND_8
- All tests pass with KIND_d
-
Documentation
- Routine docstring updated with error catching note
- Parameter descriptions accurate
- Example usage provided
- Error messages documented
-
Integration
- CMakeLists.txt updated if needed
- No compilation warnings
- No regression in existing tests
- Performance impact measured (<1%)
At the end of each phase:
- All phase routines implemented
- Phase test program complete and passing
- Regression tests all passing
- Performance benchmarks acceptable
- Code review completed
- Documentation merged
- Git branch merged to develop
- Release notes updated
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Breaking API changes | Low | Critical | Strict testing, backward compat checks |
| Performance degradation | Low | High | Continuous benchmarking, optimization |
| Memory leaks | Medium | High | Valgrind testing, careful pointer management |
| Thread safety issues | Medium | Medium | Document limitations, add thread tests |
| Test data corruption | Low | Medium | Version control test files, checksums |
| Incomplete error coverage | Medium | Medium | Systematic error condition analysis |
| Compilation errors | Medium | Low | Multi-compiler CI, incremental testing |
| Merge conflicts | High | Low | Frequent rebasing, small PRs |
If schedule slips:
- Reduce scope to critical routines only
- Extend timeline with stakeholder approval
- Add temporary resources
If critical bug found:
- Immediate rollback capability (feature flags)
- Hot-fix process defined
- Escalation path established
If performance issues:
- Profile and optimize hot paths
- Consider lazy initialization
- Add runtime configuration options
Weekly tracking:
- Routines completed vs. planned
- Test coverage percentage
- Performance overhead measurement
- Code review turnaround time
Phase gates:
- All tests passing
- No P0/P1 bugs outstanding
- Documentation complete
- Stakeholder sign-off
# Clean build
rm -rf build && mkdir build && cd build
cmake .. -DTEST_FILE_DIR=../test_data
make -j4
# Incremental build
cd build && make -j4# Run all error catching tests
ctest -R "intest1[4-8]" --output-on-failure
# Run specific phase
ctest -R "intest15" --output-on-failure
# Run with verbose output
ctest -R "intest14" -V
# Run performance benchmarks
cd build/test && bash benchmark_error_catching.sh# Build with debug symbols
cmake .. -DCMAKE_BUILD_TYPE=Debug
# Run under valgrind
valgrind --leak-check=full ./intest14_4
# Run under gdb
gdb --args ./intest14_4/**
* Catch any bort error inside of subroutine <ROUTINE_NAME>().
*
* @param <param1> - <description>
* @param <param2> - <description>
*
* @author <Your Name> @date <Date>
*/
void
catch_bort_<ROUTINE_NAME>(<params>)
{
/* Set the target location to which to return if a bort error is caught. */
if (setjmp(context) == 1) return;
/* Add trailing null to string parameters if needed */
<string_param>[<string_len>] = '\0';
/* Recursively call the subroutine. */
<ROUTINE_NAME>_f(<params>);
}!> Wrap <ROUTINE_NAME>() for C interface.
!>
!> @param <param1> - <description>
!> @param <param2> - <description>
!>
!> @author <Your Name> @date <Date>
recursive subroutine <ROUTINE_NAME>_c(<params>) bind(C, name='<ROUTINE_NAME>_f')
use iso_c_binding
integer(c_int), value, intent(in) :: <int_params>
integer(c_int), intent(out) :: <out_params>
character(kind=c_char), intent(in) :: <string_params>(*)
character(len=:), allocatable :: f_string
! Convert C strings to Fortran if needed
f_string = c_f_string(<string_param>)
call <ROUTINE_NAME>(<params>)
end subroutine <ROUTINE_NAME>_crecursive subroutine <ROUTINE_NAME>(<params>)
use moda_borts
! ... other use statements ...
implicit none
! ... parameter declarations ...
! Check for I8 integers
if(im8b) then
! ... I8 handling ...
endif
! If we're catching bort errors, set a target return location if one doesn't already exist.
if (bort_target_is_unset) then
bort_target_is_unset = .false.
caught_str_len = 0
call catch_bort_<ROUTINE_NAME>_c(<params>)
bort_target_is_unset = .true.
return
endif
! ... normal routine implementation ...
end subroutine <ROUTINE_NAME>End of Implementation Plan - Phase 1-4 (Routines 1-24)
Next Steps:
- Review and approve plan
- Set up development branch
- Configure CI/CD pipeline
- Begin Phase 1 Week 1 implementation
Questions? Contact: Terrence McGuinness (TerrenceMcGuinness-NOAA)