Employee Database UseCase Doc - lambda-land/VDBMS GitHub Wiki

Steps taken to build employee v-db:

  1. Install original employees example database in https://github.com/datacharmer/test_db .
  2. Run query in VDBMS/databases/employeeDB/step1_chop_employees.sql (to create view table based on schema evolution example).
  3. Run query in VDBMS/databases/employeeDB/step2_build_vdb.sql (to create variational database for employee use case).
  4. Migrate data from MySQL to PostgreSQL using pgloader:
    • install the the pgloader in pgloader github link
    • edit content in the lisp script located at: /VDBMS/databases/employeeDB/step3_migrate.lisp
    • run the lisp script to migrate: run command pgloader step3_migrate.lisp in terminal
  5. If you couldn't get setp 4 to work follow the steps 4-9 except step 8 for creating enron email vdb, here, and adjust it to the employee database.

The Schemas and Queries encoded in Haskell are located at:

Link: https://github.com/lambda-land/VDBMS/tree/VDBMSCaseStudy/src/VDBMS/UseCases/EmployeeUseCase

Employee Database Schema:

Where the schema comes from:

The schema evolution of employee database is from paper Managing and Querying Transaction-time Databases under Schema Evolution

Queries from prima paper

Where the data set comes from:

The data set in MySql format is from https://github.com/datacharmer/test_db

The syntax of table name:

  • v[x]_[table name]: the table in version x
  • v_[table name]: the table in variational database

Schema:

  • Plain database schema
    • schema version 1
      • v1_engineerpersonnel (empno, name, hiredate, title, deptname)
      • v1_otherpersonnel (empno, name, hiredate, title, deptname)
      • job (title, salary)
    • schema version 2
      • v2_empacct (empno, name, hiredate, title, deptname)
      • job (title, salary)
    • schema version 3
      • v3_empacct (empno, name, hiredate, title, deptno)
      • job (title, salary)
      • dept (deptname, deptno, managerno)
    • schema version 4
      • v4_empacct (empno, hiredate, title, deptno)
      • job (title, salary)
      • dept (deptname, deptno, managerno)
      • v4_empbio (empno, sex, birthdate, name)
    • schema version 5
      • v5_empacct (empno, hiredate, title, deptno, salary)
      • dept (deptname, deptno, managerno)
      • v5_empbio (empno, sex, birthdate, firstname, lastname)
  • variational database
    • v_engineerpersonnel (empno, name, hiredate, title, deptname, presCond)
    • v_engineerpersonnel (empno, name, hiredate, title, deptname, presCond)
    • v_job (title, salary, presCond)
    • v_empacct (empno, name, hiredate, title, deptname, deptno, salary, presCond)
    • v_dept (deptname, deptno, managerno, presCond)
    • v_empbio (empno, sex, birthdate, name,lastname,firstname, presCond)

V-Schema for Employee UseCase

Note that elem == elemTrue (Element has feature expression as True, we omit True here.

  • v_engineerpersonnelv1(empno, name, hiredate, title, deptname)
  • v_otherpersonnelv1 (empno, name, hiredate, title, deptname)
  • v_empacctv2 or v3 or v4 or v5(empno, namev2 or v3, hiredate, title, deptnamev2, deptnov3 or v4 or v5, salaryv5)
  • v_jobv2 or v3 or v4 (title, salary)
  • v_deptv3 or v4 or v5 (deptname, deptno, managerno)
  • v_empbiov4 or v5 (empno, sex, birthdate, namev4, firstnamev5, lastnamev5)

Feature Model

( v1 and not v2 and not v3 and not v4 and not v5) or

(not v1 and v2 and not v3 and not v4 and not v5) or

(not v1 and not v2 and v3 and not v4 and not v5) or

(not v1 and not v2 and not v3 and v4 and not v5) or

(not v1 and not v2 and not v3 and not v4 and v5)

Populate the database

Divided employees into 5 variants based on their hire date. We have 2 options here, but our experiments are based on the option 2.

Option 1: Chop the employees(total: 300024) into 5 parts and do not migrate data of old version to new version

Based on original number of employees.

  • variant1: hire_date < '1988-01-01'
    • total number: 104967
  • variant2: '1988-01-01' <= hire_date < '1991-01-01'
    • total number: 85440
  • variant3: '1991-01-01' <= hire_date < '1994-01-01'
    • total number: 60742
  • variant4: '1994-01-01' <= hire_date < '1997-01-01'
    • total number: 36524
  • variant5: hire_date >= '1997-01-01'
    • total number: 12351

Option 2: Copy data of old version to new version according to original employees.

  • variant1: employees whose hire_date < '1988-01-01'
    • total number: 104967
  • variant2: employees whose hire_date < '1991-01-01'
    • total number: 190407
  • variant3: employees whose hire_date < '1994-01-01'
    • total number: 251149
  • variant4: employees whose hire_date < '1997-01-01'
    • total number: 287673
  • variant5: employees whose hire_date < '2000-01-28'
    • total number: 300024

Strategy for populating the employee_vdb based on option 2:

  1. Chop employees into 5 parts (A,B,C,D,E) based on hire_date and migrate from old version to new one. Based on original number of employees whose to_date is '9999-01-01', we only experiment on the "permanent" employee.

    • variant1: hire_date < '1988-01-01'
      • total number: 83815 (42247 engineer + 41568 other)
    • variant2: hire_date < '1991-01-01'
      • total number: 152216 (83815 + 68401)
    • variant3: hire_date < '1994-01-01'
      • total number: 201006 (152216 + 48790)
    • variant4: hire_date < '1997-01-01'
      • total number: 230297 (201006 + 29291)
    • variant5: `hire_date < '2000-01-28'``
      • total number: 240124 (230297 + 9827)
  2. For v-table v_engineerpersonnel and v_otherpersonnel:

    1. populate employees in A whose title is Enginner into enginnerpersonnel
    2. insert the rest of employees of A into otherpersonnel
    3. The presCond value for these 2 tables shoule be v1
  3. For v-table ``v_empacct`:

    1. Employees in B,C,D,E will be inserted into v_empacct
- empno name hiredate title deptname deptno salary presCond
B xx xx xx xx xx NULL NULL v2
C xx xx xx xx NULL xx NULL v3
D xx NULL xx xx NULL xx NULL v4
E xx NULL xx xx NULL xx xx v5
  1. For v-table v_job, Employee in A,B,C,D,E will be inserted in it.
  2. For v-table v_dept, employes in C, D, E will be inserted in it as follows:
- deptname deptno managerno presCond
C xx xx xx v3
D xx xx xx v4
E xx xx xx v5
  1. For v-table v_empbio, employee in D, E will be inserted in it accordingly:
- empno sex birthdate name firstname lastname presCond
D xx xx xx xx NULL NULL v4
E xx xx xx NULL xx xx v5

NOTE:

The database is originally created in mySql, and converted to SQLite3 compatible dump by use tool mysql2sqlite.

  • The dump file is located in ./VDBMS/databases/employeeDB
    • mySql dump file: emp_vdb.sql
    • sqlite3 dump file:emp_vdb.db
⚠️ **GitHub.com Fallback** ⚠️