Stuck Appeals - TISTATechnologies/caseflow GitHub Wiki
Stuck appeals are AMA or legacy appeals that are temporarily stalled or cannot progress without fixing the underlying data. Stuck appeals delay Veterans from receiving their benefits. This page organizes stuck appeals according to what caused the appeal to be stuck (rather than the symptoms) because the cause (for the most part) determines what solution is applied.
Causes and typical solutions:
- Due to user action -- If the action is accidental or a misunderstanding, a UI redesign or warning message may be needed to alert the user. If the user needs to undo the action, an engineer may need to fix it manually.
- Due to manual modification -- If an engineer made incorrect changes or made changes without also updating other data that is typically changed, then more engineer training and/or better documentation is needed.
- Due to a bug -- These are usually quickly identified and quickly fixed. A solution for reducing bugs is outside the scope of this topic.
- Due to external data service -- An unexpected change in external data may result in a stuck appeal. Caseflow should validate such data at the interface boundary and log errors for assessment, possibly presenting error details to the user so they can address the problem with the external data service.
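As an illustration of validating external data at the interface boundary, here is a minimal plain-Ruby sketch. The `ExternalRecordValidator` class and the field names are hypothetical and are not taken from Caseflow; the sketch only shows problems being collected and logged instead of silently accepted.

```ruby
require "logger"
require "stringio"

# Hypothetical validator for a record received from an external data service.
# The field names (:file_number, :participant_id) are assumptions for illustration.
class ExternalRecordValidator
  REQUIRED_FIELDS = [:file_number, :participant_id].freeze

  def initialize(logger)
    @logger = logger
  end

  # Returns a list of human-readable problems; an empty list means the record passed.
  def problems_for(record)
    problems = REQUIRED_FIELDS
      .select { |field| record[field].to_s.strip.empty? }
      .map { |field| "missing #{field}" }
    # Log each problem so it can be assessed later (or shown to the user).
    problems.each { |problem| @logger.error("external data rejected: #{problem}") }
    problems
  end
end

log_output = StringIO.new
validator = ExternalRecordValidator.new(Logger.new(log_output))

good_problems = validator.problems_for({ file_number: "123456789", participant_id: "42" })
bad_problems  = validator.problems_for({ file_number: "", participant_id: nil })
```

Returning the problems as data (rather than raising) would let the caller decide whether to block the action, show the details to the user, or only log them.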
To streamline investigation into stuck appeals, the following sections describe instances for each cause with the intent of (1) reducing investigation ramp-up time and (2) collecting all possible scenarios for stuck appeals to improve how we address them.
See Stuck Appeals Metabase dashboard for appeals that may be stuck or haven't seen recent activity.
Questions to consider:
- How do we know when we have sufficient coverage of all stuck-appeal scenarios?
- What is left after we start with all appeals, remove known good appeals, and remove known bad appeals?
Resource material
- Sep 2020 detection summary
- https://github.com/department-of-veterans-affairs/caseflow/wiki/Resolving-Background-Job-Alerts
- https://vajira.max.gov/browse/CASEFLOW-1850
- https://hackmd.io/UVDSSMJGRxiQuem5eRlcqw
- https://github.com/department-of-veterans-affairs/caseflow/issues/14307
- https://github.com/department-of-veterans-affairs/caseflow/issues/15121
Tip: To find all the "TODO" items, click Edit and use the browser to search for "TODO".
Typical solution: update UI or user training
Cancelling an appeal causes the RootTask to be cancelled.
- The appeal is not considered stuck if action was unintentional.
- The remedy is outlined in Bat Team Quick Ref.
- Other scenarios where Caseflow automatically cancels or closes an appeal or RootTask:
app/workflows/ama_appeal_dispatch.rb:74: dispatch_task.root_task.update!(status: Constants.TASK_STATUSES.completed)
app/workflows/legacy_appeal_dispatch.rb:37: @appeal.root_task.update!(status: Constants.TASK_STATUSES.completed)
If an appeal is in a hearing docket and the hearing is withdrawn, then the appeal will be cancelled. ref
If stuck for too long, then this job will notify Tango for manual intervention.
- This job provides a count of appeals that might not be closed based on open children tasks under a closed RootTask.
- Likely cause: A user acts on a dispatched appeal, e.g., MailTask, TrackVeteranTask (maybe a new POA is assigned?), which could be a valid action. If the action is valid, the job should be updated to ignore certain task types (e.g., PR #16500 and PR #17041).
- Likely cause: A user who is assigned to a task becomes inactive.
- We have a manual process of handling inactive users -- see https://github.com/department-of-veterans-affairs/caseflow/wiki/Resolving-Background-Job-Alerts#tasksassignedtoinactiveusers.
- In addition to the TasksAssignedToInactiveUsersChecker job, we can monitor the tasks in Metabase -- Metabase chart showing Tasks assigned to Inactive Users and also Task counts assigned to/by Inactive users and Tasks assigned to/by Inactive users.
- There are hearing-related tasks assigned to inactive users. Contact Tango to address them.
- TODO: Ensure original tasks are cancelled during bulk reassignment #14600
- TODO: Add feature: Caseflow needs a UI that allows a user to reassign tasks. This relies on users knowing which tasks to reassign (i.e., all tasks assigned to the inactive user).
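Conceptually, a check for tasks assigned to inactive users looks for open tasks whose assignee is a user marked inactive. The following is a minimal plain-Ruby sketch of that idea, not the code of TasksAssignedToInactiveUsersChecker; the `SketchUser` and `SketchTask` structs and the status strings are assumptions standing in for Caseflow's records.

```ruby
# Hypothetical stand-ins for Caseflow's User and Task records.
SketchUser = Struct.new(:css_id, :status)
SketchTask = Struct.new(:id, :type, :status, :assigned_to)

# Statuses assumed to mean "still open".
OPEN_STATUSES = %w[assigned in_progress on_hold].freeze

# Returns open tasks whose assignee is a user marked inactive.
# Tasks assigned to anything other than a SketchUser (e.g., an org) are skipped.
def open_tasks_assigned_to_inactive_users(tasks)
  tasks.select do |task|
    OPEN_STATUSES.include?(task.status) &&
      task.assigned_to.is_a?(SketchUser) &&
      task.assigned_to.status == "inactive"
  end
end

active_user   = SketchUser.new("ACTIVE1", "active")
inactive_user = SketchUser.new("GONE1", "inactive")

tasks = [
  SketchTask.new(1, "AttorneyTask", "assigned", active_user),
  SketchTask.new(2, "AttorneyTask", "on_hold", inactive_user),
  SketchTask.new(3, "JudgeDecisionReviewTask", "completed", inactive_user)
]

reassignment_candidates = open_tasks_assigned_to_inactive_users(tasks)
```

Only task 2 qualifies here: task 1 belongs to an active user, and task 3 is already closed, so neither needs reassignment.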
See Stuck Appeal dashboard, specifically:
Typical solution: engineer training and documentation
- For org-tasks (those assigned to orgs) with a def self.default_assignee, get statistics on tasks that aren't assigned to the default_assignee.
- Tasks that define default_assignee: many *MailTasks, FoiaColocatedTask, MissingHearingTranscriptsColocatedTask, ScheduleHearingColocatedTask, and TranslationColocatedTask
- To check in prod:
TranslationColocatedTask.group(:assigned_to_type, :assigned_to_id).count
=> {["Organization", 214]=>888}
FoiaColocatedTask.group(:assigned_to_type, :assigned_to_id).count
=> {["Organization", 202]=>8932}
MissingHearingTranscriptsColocatedTask.group(:assigned_to_type, :assigned_to_id).count
=> {["Organization", 224]=>1389}
ScheduleHearingColocatedTask.joins(:parent).includes(:parent).where.not(parents_tasks: { type:
"ScheduleHearingColocatedTask" }).group(:assigned_to_type, :assigned_to_id).count.sort
=> [[["Organization", 20], 616], [["Organization", 23], 65]]
ScheduleHearingColocatedTask.where(assigned_to: Organization.find(20)).pluck(:created_at).max
=> Tue, 14 Dec 2021 09:49:58 EST -05:00
ScheduleHearingColocatedTask.where(assigned_to: Organization.find(23)).pluck(:created_at).max
=> Mon, 05 Aug 2019 14:52:00 EDT -04:00
# Looks like the default_assignee was changed in 2019
MailTask.joins(:parent).includes(:parent).where.not(parents_tasks: { type: MailTask.descendants.map(&:name) }).group(:assigned_to_type, :assigned_to_id).count.sort
=> [[["Organization", 18], 31318],
[["Organization", 23], 1],
[["Organization", 24], 1],
[["Organization", 202], 1],
[["Organization", 213], 1],
[["Organization", 225], 1],
[["Organization", 330], 1],
[["Organization", 474], 380]]
Organization.where(id: [18,23,24,202,213,225,330,474]).pluck(:id, :type, :name).sort
=> [[18, "MailTeam", "Mail"],
[23, "Colocated", "VLJ Support Staff"],
[24, "AodTeam", "AOD"],
[202, "PrivacyTeam", "Privacy Team"],
[213, "LitigationSupport", "Litigation Support"],
[225, "HearingAdmin", "Hearing Admin"],
[330, "CaseReview", "Case Review"],
[474, "ClerkOfTheBoard", "Clerk of the Board"]]
Assess if there are any problems and maybe create a Metabase chart to monitor any trends or anomalies. Look for outliers in Time spent on tasks grouped by assignee type.
Investigation
# Ignoring MailTasks for now
tts=%w[ ColocatedTask FoiaColocatedTask MissingHearingTranscriptsColocatedTask ScheduleHearingColocatedTask TranslationColocatedTask]
tts.map{|tt| [tt, tt.constantize.default_assignee.name]}
=> [["ColocatedTask", "VLJ Support Staff"],
["FoiaColocatedTask", "Privacy Team"],
["MissingHearingTranscriptsColocatedTask", "Transcription"],
["ScheduleHearingColocatedTask", "Hearings Management"],
["TranslationColocatedTask", "Translation"]]
tts.map{|tt| [tt, tt.constantize.open.where.not(assigned_to_id: tt.constantize.default_assignee.id).count]}
=> [["ColocatedTask", 5153],
["FoiaColocatedTask", 0],
["MissingHearingTranscriptsColocatedTask", 0],
["ScheduleHearingColocatedTask", 3],
["TranslationColocatedTask", 0]]
# Let's examine one of them
tt=ScheduleHearingColocatedTask
# what organizations do the assignees belong to?
tt.open.where.not(assigned_to_id: tt.default_assignee.id).map(&:assigned_to).map{|u| u.organizations.pluck(:name)}
=> [["Hearings Management", "Transcription", "Hearing Admin"],
["Hearings Management", "Transcription", "Hearing Admin"],
["Hearings Management", "Transcription", "Hearing Admin", "Case Movement Team"]]
# Good. They are all in the "Hearings Management" org (aka ScheduleHearingColocatedTask.default_assignee)
TODO: (Ticket Appeals with unusual task tree structures #15271) - investigate and fix and/or create tickets to fix the root cause. The following subsections describe specific problems.
TODO: Investigate odd mail task trees #13269
We don't expect colocated org tasks to be children of a colocated user's task. The concern is with reporting and with these tasks possibly not closing correctly when complete.
- TODO: Monitor Metabase chart - Appeals without active task
- Example: Slack thread
- TODO: some potentially stuck appeals that are not covered by existing stuck-appeal checkers - ticket https://vajira.max.gov/browse/CASEFLOW-1981
TODO: Clean up open tasks with closed parents #13438
- most are legacy appeals
- cause: location 99
- solution: check with the Board and close the tasks on those appeals that are indeed closed
- MailTasks
- See "Odd Task Trees > MailTasks" section above before applying solution
- cause: Board processes mail for a dispatched appeal
- possible solution: filter these out from alert -- like in PR #16500
- TrackVeteranTask
- cause: maybe a new POA is assigned after appeal dispatch?
- possible solution: filter out? cancel task? Check with Team Victor
- mostly HearingTask with closed RootTask
- 156 are legacy appeals; 18 are AMA
- clean up open hearing tasks with closed parents on legacy appeals #14703
- diagnose open HearingTasks on closed appeals #14748
- cause: ? -- ask Team Tango for help.
- NoShowHearingTask: 255
- LegacyAppeal: 250, AMA Appeal: 4
- follow up with Tango on the Hearing-related tasks; some of these may overlap with Inactive users (see section above)
- new InformalHearingPresentationTask?
- cause: ? -- may be related to a POA change. Check with Team Victor
- many single-occurrence problems for certain task types
- cause: ?
- PrivacyActTask / FoiaColocatedTask / PreRoutingFoiaColocatedTask
- cause: FOIA requests? May be false positives.
- possible solution: If these are valid, update the job to ignore these task types
- VeteranRecordRequest
- cause: (from Peter Karman 6-24-2020 in https://github.com/department-of-veterans-affairs/caseflow/issues/13438#issuecomment-649027986): A VeteranRecordRequest is typically closed by the BusinessLine assigned to it and that happens outside of Queue.
- solution: If these are valid, update the job to ignore this task type
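Several of the possible solutions above amount to "find open tasks whose parent is closed, but skip task types that are legitimately open". A minimal plain-Ruby sketch of that filter follows. The `TreeTask` struct, the status strings, and the contents of `IGNORED_TYPES` are assumptions for illustration; which types should actually be ignored is still an open question in the bullets above.

```ruby
# Hypothetical in-memory task record; Caseflow stores tasks as ActiveRecord models.
TreeTask = Struct.new(:id, :type, :status, :parent_id)

CLOSED_STATUSES = %w[completed cancelled].freeze
# Example ignore list only; the real list would need confirmation from the relevant teams.
IGNORED_TYPES = %w[TrackVeteranTask VeteranRecordRequest].freeze

# Returns open tasks whose parent is closed, skipping ignored task types.
def open_tasks_with_closed_parent(tasks, ignored_types: IGNORED_TYPES)
  by_id = tasks.to_h { |task| [task.id, task] }
  tasks.select do |task|
    parent = by_id[task.parent_id]
    next false if parent.nil? || CLOSED_STATUSES.include?(task.status)

    CLOSED_STATUSES.include?(parent.status) && !ignored_types.include?(task.type)
  end
end

tasks = [
  TreeTask.new(1, "RootTask", "completed", nil),
  TreeTask.new(2, "HearingTask", "on_hold", 1),
  TreeTask.new(3, "TrackVeteranTask", "in_progress", 1),
  TreeTask.new(4, "BvaDispatchTask", "completed", 1)
]

flagged = open_tasks_with_closed_parent(tasks)
flagged_without_ignores = open_tasks_with_closed_parent(tasks, ignored_types: [])
```

Passing the ignore list as a parameter makes it easy to compare alert volume with and without a given task type before changing the job.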
Typical solution: update code
TODO: Check Appeals with unusual task tree structures #15271. Some result from manual modification, while others may be caused by buggy code that wasn't tested under certain scenarios.
- TODO: Create job to check for duplicate tasks, i.e. tasks of the same type (and same parent task?) and possibly assigned to the same user.
- cause: An appeal will be stuck if the assignee closes only one of them (if the same user is assigned to both tasks, the remaining open task will remain in their queue, which they may not access). If the user sees duplicate tasks in their queue, they can cancel all-but-one of them to prevent stuck appeals. A job might be useful for detecting bugs that create duplicate tasks.
- The need for this job is motivated by [Epic] Prevent duplicate TranslationTasks for an appeal #11176 and a recent user-caused duplication Appeal with two Assign Tasks dsva-vacols#145 (which is addressed in the next bullet).
- TODO: (Ticket Add validation checks where we expect only one open task #15220) The Board has confirmed certain task types.
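The proposed duplicate-task job could be sketched as grouping open tasks by appeal, task type, and parent, then reporting groups with more than one member. This is a plain-Ruby sketch under assumptions: the `DupTask` struct, the grouping key, and the status strings are hypothetical, and whether the assignee should be part of the key is the open question noted in the TODO above.

```ruby
# Hypothetical task record for illustration.
DupTask = Struct.new(:id, :appeal_id, :type, :parent_id, :assigned_to, :status)

OPEN_TASK_STATUSES = %w[assigned in_progress on_hold].freeze

# Groups open tasks by [appeal, type, parent] and keeps groups with more than one task.
# Returns a Hash mapping each grouping key to its duplicate tasks.
def duplicate_open_tasks(tasks)
  tasks
    .select { |task| OPEN_TASK_STATUSES.include?(task.status) }
    .group_by { |task| [task.appeal_id, task.type, task.parent_id] }
    .select { |_key, group| group.size > 1 }
end

tasks = [
  DupTask.new(10, 1, "TranslationTask", 5, "Translation", "assigned"),
  DupTask.new(11, 1, "TranslationTask", 5, "Translation", "assigned"),
  DupTask.new(12, 1, "TranslationTask", 5, "Translation", "cancelled"),
  DupTask.new(13, 2, "TranslationTask", 9, "Translation", "assigned")
]

duplicates = duplicate_open_tasks(tasks)
```

Cancelled and completed tasks are excluded before grouping, so an appeal where a duplicate was already cancelled is not reported again.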
TODO: Blocked appeals for unrecognized appellant can be addressed with the new EngineeringTask.
- Add code to create an EngineeringTask for Unrecognized Appellant appeals that are ready for dispatch
- In prod, add an EngineeringTask to ~15 Unrecognized Appellant appeals that have been blocked from dispatch
This job alerts on appeals lacking active tasks. There are several possible causes:
- cause: unrecognized appellant
- pattern: lack of BvaDispatchTask after JudgeDecisionReviewTask is completed
- solution: TODO: create an EngineeringTask for Unrecognized Appellant appeals that are ready for dispatch -- see section above
- cause: task is on_hold instead of assigned or cancelled
- pattern: artifact of Bat Team solving Cancel-IHP-Task
- solution: TODO: identify the root cause and fix it
- TrackVeteranTask, EvidenceSubmissionWindowTask, and TimedHoldTask can be on-hold without active children tasks. Since these reside under RootTask, the RootTask can be on-hold without active children as well.
- solution: TODO: update the job to ignore these task types
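The ignore-list idea in the last bullet can be sketched as follows: flag on-hold tasks that have no open children, unless the task type is one that may legitimately wait on its own. This is a plain-Ruby sketch, not the job's actual query; the `HoldTask` struct and status strings are assumptions, and treating an on-hold child as "open" is a design choice made here so that a RootTask above an on-hold TimedHoldTask is not flagged.

```ruby
# Hypothetical task record for illustration.
HoldTask = Struct.new(:id, :type, :status, :parent_id)

OPEN_STATES = %w[assigned in_progress on_hold].freeze
# Task types named above as able to be on hold without active children.
MAY_HOLD_ALONE = %w[TrackVeteranTask EvidenceSubmissionWindowTask TimedHoldTask].freeze

# Returns on-hold tasks that have no open children and are not of an ignored type.
def on_hold_tasks_without_open_children(tasks, ignored_types: MAY_HOLD_ALONE)
  open_children = tasks.select { |t| OPEN_STATES.include?(t.status) }.group_by(&:parent_id)
  tasks.select do |task|
    task.status == "on_hold" &&
      !ignored_types.include?(task.type) &&
      !open_children.key?(task.id)
  end
end

tasks = [
  # Appeal waiting on a timed hold: should not be flagged.
  HoldTask.new(1, "RootTask", "on_hold", nil),
  HoldTask.new(2, "TimedHoldTask", "on_hold", 1),
  # Appeal whose judge review finished with nothing created afterwards: should be flagged.
  HoldTask.new(3, "RootTask", "on_hold", nil),
  HoldTask.new(4, "JudgeDecisionReviewTask", "completed", 3)
]

flagged = on_hold_tasks_without_open_children(tasks)
flagged_without_ignores = on_hold_tasks_without_open_children(tasks, ignored_types: [])
```

The second RootTask matches the unrecognized-appellant pattern described above (JudgeDecisionReviewTask completed, no BvaDispatchTask), so it remains flagged even with the ignore list in place.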
We haven't seen any alerts from this since Nov 2020. This section is left for future reference in case similar bugs pop up in the future.
From Handle EvidenceSubmissionWindowTasks with last_submitted_at edge cases #15245
On occasion, we receive an alert that there are task timers that should have been completed but have not been processed.
Solved by PR Handle edge cases for EvidenceSubmissionWindowTasks #15598.
StuckAppealsChecker > AppealsWithClosedRootTaskOpenChildrenQuery (bug in the IHP Task creation code)
- cause: bug in the IHP Task creation code
- solution: TODO: Team Victor will handle the data clean-up and investigate IHPT open after root task closed #14451.
Examples
finds "legacy appeals charged to CASEFLOW in VACOLS with no active Caseflow tasks"
Symptom: Sentry alert
- cause: Sometimes a legacy appeal needs to be redistributed but it can't because there are open tasks (excluding TrackVeteranTask and RootTask).
- Solution: Bat Team Quick Ref
- TODO: Recheck the status of appeals for this related ticket Investigate priority legacy cases that have been stuck at ready to distribute #14597
Symptom: Sentry alert
- TypeError: app/establishClaim/util/index
- cause & solution: Legacy Appeal missing a veteran
- due to "a claimant (such as the veteran's spouse or child)'s ID is listed [in VACOLS case record] instead, which causes problems when searching for the veteran".
Symptom: Sentry alert
- VACOLS::Case::InvalidLocationError
- cause: "occurs when Caseflow cannot find the user-selected assignee in VACOLS due to a mix-up in CSS_IDs"
- To monitor this problem, Metabase dashboard #17 was created (as a result of Create job to check for users that cannot be found in VACOLS #15268)
- TODO: Create job to check for users that cannot be found in VACOLS (possibly due to a change made in VACOLS)
- Occurs rarely but will stall appeal processing
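For the VACOLS user check mentioned above, the core comparison could be sketched as a set difference between the CSS_IDs known to Caseflow and those present in VACOLS. This is a plain-Ruby sketch under assumptions: the method name is hypothetical, the inputs are plain arrays of strings rather than database queries, and normalizing case and whitespace is a guess at what a "mix-up in CSS_IDs" might involve.

```ruby
require "set"

# Returns the Caseflow CSS_IDs that have no matching VACOLS entry.
# Comparison ignores case and surrounding whitespace (an assumption, see above).
def css_ids_missing_from_vacols(caseflow_css_ids, vacols_css_ids)
  known = vacols_css_ids.map { |id| id.strip.upcase }.to_set
  caseflow_css_ids.reject { |id| known.include?(id.strip.upcase) }
end

# Example values are made up for illustration.
caseflow_ids = ["BVAEXAMPLE1", "bvaexample2 ", "BVAEXAMPLE3"]
vacols_ids   = ["BVAEXAMPLE1", "BVAEXAMPLE2"]

missing = css_ids_missing_from_vacols(caseflow_ids, vacols_ids)
```

A job built on this comparison would report the missing IDs so they can be corrected before a user selects one of those users as an assignee.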
Examples
This blocks the appeal from progressing.
- Appeal was outcoded successfully but the file didn't get uploaded after 2 days -- Slack thread
- To better triage, Sentry alerts were updated to add more context