Client Record De‐Duplication FAQ - greenriver/hmis-warehouse GitHub Wiki

The warehouse has three client de-duplication mechanisms: deterministic, probabilistic, and automated.

The deterministic mechanism looks at three items:

  1. Name
  2. Valid Social Security Number
  3. Valid Date of Birth

Every time a new client is introduced into the warehouse, a duplicate check is run that looks for exact matches on two of the three items. The new client is compared against each client already in the warehouse and against any other incoming clients. If an exact match is found, the records are bound together, but the source data is not discarded.
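The two-of-three rule described above can be sketched roughly as follows. This is a minimal illustration, not the warehouse's actual code: the `SourceClient` shape, the function names, and the choice to compare names case-insensitively after trimming whitespace are all assumptions made for the example. Missing or invalid SSNs and dates of birth are represented as `None` and never count as a match.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceClient:
    # Hypothetical record shape; not the warehouse's real schema.
    first_name: str
    last_name: str
    ssn: Optional[str]   # None when missing or invalid
    dob: Optional[date]  # None when missing or invalid


def _norm(value: str) -> str:
    # Assumption: names are compared case-insensitively with whitespace trimmed.
    return value.strip().lower()


def matching_items(a: SourceClient, b: SourceClient) -> int:
    """Count how many of the three items (name, SSN, DOB) match exactly."""
    name_match = (
        _norm(a.first_name) != ""
        and _norm(a.last_name) != ""
        and _norm(a.first_name) == _norm(b.first_name)
        and _norm(a.last_name) == _norm(b.last_name)
    )
    ssn_match = a.ssn is not None and a.ssn == b.ssn
    dob_match = a.dob is not None and a.dob == b.dob
    return sum([name_match, ssn_match, dob_match])


def is_deterministic_match(a: SourceClient, b: SourceClient) -> bool:
    """True when at least two of the three items match."""
    return matching_items(a, b) >= 2
```

In this sketch a matched pair would be linked to a shared warehouse client while both source records are kept, which mirrors the statement that source data is not discarded.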

In addition to looking for “obvious” matches, we run statistical analysis on each client looking for probabilistic matches. This is an area of the application where human review of candidate matches can be used to improve match accuracy.

The probabilistic analysis currently takes into account:

  1. Name
  2. Date of Birth
  3. Social Security Number
  4. Race
  5. Gender
  6. Veteran Status
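To give a feel for how several attributes can be combined into a single score, here is a deliberately simple sketch. It is not the warehouse's statistical model: the point values in `WEIGHTS`, the dictionary record shape, and the `match_score` function are hypothetical, and the assumption that a higher score means a more likely match is made only for this example.

```python
from typing import Any, Dict

# Hypothetical point values (summing to 100); chosen only for illustration.
WEIGHTS: Dict[str, int] = {
    "name": 30,
    "dob": 25,
    "ssn": 30,
    "race": 5,
    "gender": 5,
    "veteran_status": 5,
}


def match_score(a: Dict[str, Any], b: Dict[str, Any]) -> int:
    """Return 0..100: total points for attributes present on both records and equal."""
    score = 0
    for attr, points in WEIGHTS.items():
        left, right = a.get(attr), b.get(attr)
        if left is None or right is None:
            continue  # a missing value earns no points in either direction
        if left == right:
            score += points
    return score
```

A real probabilistic matcher would typically also account for how common a value is and for near misses such as transposed digits or spelling variants; this sketch only counts exact agreement.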

Finally, we can use the score from the probabilistic match to automate the acceptance and rejection of matches.
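Score-based automation of this kind usually comes down to two cutoffs. The sketch below assumes a score from 0 to 100 where higher means a more likely match; the `Decision` enum, the `decide` function, and the default thresholds are hypothetical and would need to be tuned against reviewed matches.

```python
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"   # merge automatically
    REJECT = "reject"   # dismiss automatically
    REVIEW = "review"   # leave for a human to decide


def decide(score: int, accept_at: int = 90, reject_below: int = 30) -> Decision:
    """Map a match score to an action using two hypothetical thresholds."""
    if reject_below > accept_at:
        raise ValueError("reject_below must not exceed accept_at")
    if score >= accept_at:
        return Decision.ACCEPT
    if score < reject_below:
        return Decision.REJECT
    return Decision.REVIEW
```

Candidates that fall between the two thresholds stay in the queue for manual review, so tightening or loosening the cutoffs trades reviewer workload against the risk of an incorrect automatic decision.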

The HMIS will take a fairly aggressive approach to preventing duplicate clients from being inserted in the first place. When adding a client into the HMIS, there is a workflow that prompts users multiple times to confirm that existing clients are not a match for the client they are entering. It will still allow the entry, because different people can legitimately have similar information.