Data Privacy and Security Framework - VSAResearchGroup/MathEngine GitHub Wiki

Objective

The Virtual Student Advisor (VSA) handles sensitive and confidential student data from which an onlooker could extract Personally Identifiable Information (PII). Such data can be used to harm the student, so rules and policies must be stringently defined. The following sections describe the aspects, policies and procedures that designers, developers and operational staff of VSA must follow to help maintain the privacy and security of educational records.

3 Key Points

When handling data at rest or in transit, whether within, to, or from VSA, the following three aspects must be treated as the key pillars of the data-security policy:

  1. Sensitivity of Data
    Sensitive data encompasses a wide range of information and may include SSNs, student enrollment numbers, student IDs, dates of birth, religious or similar beliefs, criminal offences, etc.

  2. Accessibility of Data
    The ability to authorize an individual, a group of individuals, or an information system to handle and use VSA data.

  3. Volume of Data
    A large number of students enroll from community colleges into university degree programs. Each student’s record generates a large amount of interrelated confidential and personally identifiable information. It must always be kept in mind that VSA handles vast amounts of such information, which must be treated with due care and diligence.

What can be Vulnerable?

Besides direct declassification, educational records are also susceptible to various other forms of attack that may reveal a student’s identity, financial and social background, racial background, health conditions and so on. These educational records may include:

  • Personally identifiable information.
  • Biographical information.
  • Courses taken and other school participation and activities.
  • Non-school and post-school experience.
  • Assessment information.
  • Student’s financial and financial aid records.
  • Transportation.
  • Health conditions.
  • Special program participation and student support services received.
  • Discipline/concentration information.

Educational records can be combined in various ways to single out a student from a group of people, which may not be possible with segregated and isolated pieces of information. This may leave the student vulnerable to various attacks.

The following sections cover data manipulations that may lead to conclusive results about an individual student or a group of students. These activities fall into two categories:

  1. Quantitative data manipulations
    Quantitative data manipulations include constructing combinations and permutations of pieces of data, or of the data as a whole, in order to generate inferences.
  2. Qualitative data manipulations
    Qualitative data manipulations account for the demographics, purpose and other similar features for which, or according to which, the data has been collected. They usually correspond to data-privacy concerns.

Quantitative Data Manipulations

Data Aggregation

Different parts of a student’s educational records can be correlated by union, summation, concatenation, pivoting and various other constructs within relational databases to yield aggregated data. Individual pieces of data may not reveal additional information by themselves, but they may do so when aggregated. For example, a first name or a date of birth is usually shared by many people, so neither is adequate on its own to single out a record in a relational database. A query that combines first name with date of birth, however, may reveal a student’s complete educational records.
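
The aggregation risk can be illustrated with a minimal Python sketch. The records, field names, and the `unique_matches` helper are all hypothetical and are not part of VSA; the sketch only counts how many records are singled out by a given combination of fields.

```python
from collections import Counter

# Hypothetical records; every name, date, and ID here is invented for illustration.
records = [
    {"student_id": "S001", "first_name": "Maria", "dob": "2001-04-12"},
    {"student_id": "S002", "first_name": "Maria", "dob": "2000-09-30"},
    {"student_id": "S003", "first_name": "John", "dob": "2001-04-12"},
    {"student_id": "S004", "first_name": "John", "dob": "1999-01-05"},
]


def unique_matches(rows, fields):
    """Count the rows that are the only match for their combination of field values."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for n in counts.values() if n == 1)


# Each field alone is shared by several students, but the combination is not.
singled_out_by_name = unique_matches(records, ["first_name"])
singled_out_by_dob = unique_matches(records, ["dob"])
singled_out_by_both = unique_matches(records, ["first_name", "dob"])
```

In this invented data set, no student is singled out by first name alone, while the combination of first name and date of birth singles out every student. A check of this kind could be run before releasing an extract to see which field combinations are risky.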

De-Identification

Pieces of data distributed across various records require at least one primary token to correlate all these records to a common identity. Usually, the primary token relates to every record directly; for example, a student ID would be a primary token for the student’s name as well as his/her registered bank account records. Bank account records, being unique and similar in nature to a student ID, can also be used to query the database and yield exactly the same results as the student ID.

Such sensitive educational records need to be de-identified when the data is at rest or is not being used in the right context, as illustrated in the example above.
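
One common way to de-identify such tokens is to replace them with keyed hashes (pseudonyms). The sketch below is an assumption about how this could be done, not a description of VSA's actual mechanism. The field names `student_id` and `bank_account`, the helper names, and the key are hypothetical, and key storage and rotation are out of scope.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest that stands in for the real identifier."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record: dict, secret_key: bytes,
                token_fields=("student_id", "bank_account")) -> dict:
    """Copy a record, replacing every primary-token field with its pseudonym.

    token_fields is a hypothetical list of fields that can act as primary tokens.
    """
    out = dict(record)
    for field in token_fields:
        if field in out:
            out[field] = pseudonymize(str(out[field]), secret_key)
    return out


# Usage with invented values; a real key would come from a secrets store.
key = b"example-key-for-illustration-only"
original = {"student_id": "S001", "bank_account": "000123456", "program": "Mathematics"}
stored = de_identify(original, key)
```

Because the same key always yields the same pseudonym, records can still be joined on the pseudonymized field, while anyone without the key cannot recompute the pseudonym from a known student ID or bank account number.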

Anonymization

Anonymization is the de-identification of data records specifically with respect to the student’s name, date of birth or any other PII that directly establishes the student’s identity. At the most basic level, anonymization prevents a student from being identified by querying or accessing a data record. For example, without anonymization, a student ID can be queried to reveal the student’s name.

Authentication and Process Control

Authentication identifies the personnel or systems, besides the end-users, that are required to access the educational records. Access and process control limit the depth to which authorized personnel or an authorized information system may access the educational records. For example, the Registrar of the college may view all students’ records in full, but the IT administrator of the college may not have permission to perform automated deep-packet inspection.
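
The Registrar and IT administrator example can be sketched as a role-to-permission table with deny-by-default lookups. The role names, the permission names, and the `MANAGE_SYSTEM` permission are assumptions made for illustration; the source does not define VSA's actual roles.

```python
from enum import Enum, auto


class Permission(Enum):
    VIEW_FULL_RECORD = auto()        # read every field of a student's record
    MANAGE_SYSTEM = auto()           # hypothetical: routine system administration
    DEEP_PACKET_INSPECTION = auto()  # automated inspection of traffic contents


# Hypothetical mapping; anything not listed for a role is denied.
ROLE_PERMISSIONS = {
    "registrar": frozenset({Permission.VIEW_FULL_RECORD}),
    "it_administrator": frozenset({Permission.MANAGE_SYSTEM}),
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True only if the role is known and explicitly holds the permission."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())
```

With this table, `is_allowed("registrar", Permission.VIEW_FULL_RECORD)` is true, while `is_allowed("it_administrator", Permission.DEEP_PACKET_INSPECTION)` and any lookup for an unknown role are false. Denying by default means a newly added role has no access until permissions are granted to it explicitly.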

Training

Secure backup and disaster resilience policies should be prepared for the information system so that it can support incident response and recovery in case of a disaster. Both staff and educators must collaborate in designing and developing the disaster recovery plan, and they should be trained in and made aware of these operational aspects of VSA.

Qualitative Data Manipulations/Privacy Concerns

Purpose Selection

Students’ educational records should be collected only for a stated purpose, in compliance with FERPA. The purpose of data collection must be explicitly stated and clearly explained to the end-user. A common way to do this is to issue a disclaimer before each request for data and a notification before each use of the data.

Data Collection Limitation

Data collection should be governed by the principle of “do as we go”: user data should be collected only as requirements arise. Excess data may lead to information leaks. It may also impose various legal obligations on its handlers with regard to its security and privacy.
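
A minimal sketch of this limitation is an allowlist applied at the point of collection, so that fields nobody has asked for are never stored. The field names and the `minimize` helper are hypothetical; the actual set of required fields would depend on VSA's requirements.

```python
# Hypothetical set of fields that the current requirements call for.
REQUIRED_FIELDS = frozenset({"student_id", "program", "completed_courses"})


def minimize(submitted: dict, required=REQUIRED_FIELDS) -> dict:
    """Keep only the fields that are currently required; drop everything else."""
    return {key: value for key, value in submitted.items() if key in required}


# Usage with invented values: the religion and SSN fields are discarded.
submitted = {
    "student_id": "S001",
    "program": "Mathematics",
    "completed_courses": ["MATH101", "MATH102"],
    "religion": "not needed",
    "ssn": "not needed",
}
collected = minimize(submitted)
```

When a new requirement arises, the field is added to the allowlist at that time, which matches the “do as we go” approach described above.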

Data Quality

The data being collected may not need to be exactly accurate. An estimate may be just as useful as the exact value. For example, educational records can be represented as approximations and percentages and still be useful without being exact.
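
As one possible illustration, exact values can be coarsened before they are stored or shared. The two helpers below are hypothetical: one reports degree progress as a percentage rounded to the nearest ten, and the other keeps only the birth year from a full date of birth.

```python
from datetime import date


def completion_percentage(credits_earned: float, credits_required: float,
                          step: int = 10) -> int:
    """Return degree progress as a percentage rounded to the nearest `step`, capped at 100."""
    if credits_required <= 0:
        raise ValueError("credits_required must be positive")
    exact = 100 * credits_earned / credits_required
    rounded = int(exact / step + 0.5) * step  # round half up to the nearest step
    return min(100, rounded)


def birth_year(dob_iso: str) -> int:
    """Keep only the year from an ISO-formatted date of birth (YYYY-MM-DD)."""
    return date.fromisoformat(dob_iso).year
```

For instance, a student with 45 of 120 credits would be reported as 40 percent complete, which is often enough for advising purposes while revealing less than the exact credit count.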

Openness

Student educational records may be used by parties other than the college information system. Various public and governmental agencies and federal educational bodies may require transparent access to student educational records on a regular basis to audit and ensure compliance with FERPA. VSA must be capable of sharing its data with these bodies on demand.

Individual Participation

Individual Participation, or No Mandating, defines whether the end-user is authorized to search or access all of their records in VSA databases.

Accountability

A minimum service level agreement must be defined explicitly, including the security features provided by VSA. It must include explicit information about the responsibilities and liabilities of VSA towards its various stakeholders.

Assumption

The conceptualization of VSA is based on various assumptions, which must be explicitly enumerated and clearly documented.