Enron Email Database UseCase Doc - lambda-land/VDBMS GitHub Wiki

Enron Email Database UseCase Documents

Use case queries and schemas

Steps take to build VDB for Enron Case Study

  1. Import the enron data into a new database. eg. mysql -u #username# -p #database# < #dump_file#
  2. Run Query VDBMS/databases/enronEmailDB/step1_build_enron_variants.sql (to create view for db variants of different product)
  3. Run Query VDBMS/databases/enronEmailDB/step2_build_enron_vdb.sql (to create v-db)
  4. Get a mysql dump by mysqldump --user=root --no-create-db --no-create-info --password --skip-set-charset --skip-add-locks --skip-disable-keys enron v_alias v_auto_msg v_employee v_filter_msg v_forward_msg v_mailhost v_message v_recipientinfo v_remail_msg > enron_mysql.sql
  5. Adjust the dump for postgres by:
    • removing the set argument commands at the beginning and end of the file by:
      • printing command files to make sure you have the right lines: sed -n '6,11p' enron_mysql.sql
      • then deleting them: sed -n '6,11d' enron_mysql.sql
      • getting the number of lines: wc -l enron_mysql.sql
      • finding out where the commands start: tail -n 8 enron_mysql.sql
      • make sure you have the range right: sed -n '213,220p' enron_mysql.sql
      • and finally deleting them: sed -n '213,220d' enron_mysql.sql
    • adjust the name of the table to postgres by running the following for all tables:
      • find . -name enron_mysql.sql -exec sed -i '' -e 's/INSERT INTO 'v_alias' /INSERT INTO public.v_alias /g' {} +
      • note that in the above command 'v_alias' is actually wrapped in two of `
  6. Create an enron database in postgres and use create queries to create tables, here
  7. Import the postgres dump to postgres by: psql -U postgres enron < enron_mysql.sql
  8. After importing the database connect to postgres and fix the type of is_something attributes in table v_message by:
    • ALTER TABLE v_message ALTER is_forward_msg TYPE boolean USING CASE WHEN is_forward_msg=0 THEN FALSE ELSE TRUE END;
  9. If you like to have a dump of the postgres database do: pg_dump -U postgres enron > enron_postgres.sql

Original Enron Email Database

Data Set

Enron Data is From here

Schema

  • employee(eid, firstname, lastname, email_id, email2, email3, email4, folder, status)

    • size: 149 tuples
  • message(mid, sender, date, message_id, subject, body, folder)

    • size: 252759 tuples
  • recipientinfo(rid, mid, rtype, rvalue, dater)

    • size: 2064442 tuples
  • referenceinfo(rfid, mid, reference)

    • size: 54778

Enron Email Variational Database

1. Schema

  • v_employee(eid, firstname, lastname, email_id, folder, status, verification_key, public_key, presCond)

    • verification_key is responded to signature
    • public_key is responded to encrypt
  • v_message(mid, sender, date, message_id, subject, body, folder, is_system_notification, is_signed, is_encrypted, is_autoresponse, is_forward_msg, presCond)

    • is_signed is responded to signature
    • is_encrypted is responded to encrypt
  • v_recipientinfo(rid, mid, rtype, rvalue, presCond)

  • v_referenceinfo(rfid, mid, reference,presCond)

  • v_auto_msg(eid, subject, body, presCond)

    • whole table will depend on autoresponder
  • v_forward_msg(eid, forwardaddr, presCond)

    • whole table is responded to forwardmessage
  • v_remail_msg(eid, pseudonym, presCond)

    • whole table is responded to remailmessage
  • v_filter_msg(eid, suffix, presCond)

    • whole table is responded to filtermessages
  • v_mailhost(eid, username, mailhost, presCond)

    • whole table is responded to mailhost
  • v_alias(eid, email, nickname, presCond)

    • whole table is responded to addressbook

2. Features in Enron Email V-DB

  • signature
  • addressbook
  • filtermsg
  • autoresponder
  • forwardmsg
  • mailhost
  • encrypt
  • remailmsg

3. Variants Designed For Enron case study

    1. Products for daily use include features:
      • Features:
        • forwardmsg
        • filtermsg
      • Presence Condition(p1):
        • forwardmsg AND filtermsg AND NOT (addressbook OR encrypt OR remailmsg OR autoresponder OR signature OR mailhost)
    1. Products that focus on Privacy
      • Features:
        • signature
        • encrypt
        • remailmsg
      • Presence Condition(p2):
        • signature AND encrypt AND remailmsg AND NOT (addressbook OR filtermsg OR autoresponder OR forwardmsg OR mailhost)
    1. Products that focus on Group Usage
      • features:
        • addressbook
        • autoresponder
        • mailhost
        • signature
        • encrpt
      • Presence Condition(p3):
        • addressbook AND autoresponder AND mailhost AND encrypt AND signature AND (NOT (forwardmsg OR remailmsg OR filtermsg))
    1. Products that enables all features
      • Presence Condition(p4):
        • signature AND addressbook AND filtermsg AND autoresponder AND forwardmsg AND mailhost AND encrypt AND remailmsg
    1. Products that disables all features
      • Presence Condition(p5):
        • (NOT signature) AND (NOT addressbook) AND (NOT filtermsg) AND (NOT autoresponder) AND (NOT forwardmsg) AND (NOT mailhost) AND (NOT encrypt) AND (NOT remailmsg)

Populate the Database

1. Break employee into 5 groups

We have 150 employees in Enron database and 5 product variants for email system, we therefore decide to break 150 employees into 5 groups (A,B,C,D,E), and let those 5 groups people be user of the corresponding 5 different product variants. Assign employee to different group based on employee number (eid) as follows:

  • Product for daily usage: 0 < eid <= 30
  • Product for privacy: 30 < eid <= 60
  • Product for group usage: 60 < eid <= 90
  • Product for all enabled features: 90 < eid <= 120
  • Product for all disabled features: 120 < eid <= 150

2. For v_employee:

v_employee(eid, firstname, lastname, email_id, folder, status, sign, public_key, presCond)

- ---- eid firstname lastname email_id folder status verification_key public_key presCond
Daily xx xx xx xx xx xx NULL NULL p1
Privacy xx xx xx xx xx xx xx xx p2
Group xx xx xx xx xx xx NULL NULL p3
Enable xx xx xx xx xx xx xx xx p4
Disable xx xx xx xx xx xx NULL NULL p5

3. For v_message

v_message(mid, sender, date, message_id, subject, body, folder, is_system_notification, is_signed, is_encrypted, is_from_remailer, is_autoresponse, is_forward_msg, presCond)

mid sender date message_id subject body folde is_system_notification is_signed is_encrypted is_autoresponse is_forward_msg presCond
xx xx xx xx xx xx xx xx NULL NULL NULL xx p1
xx xx xx xx xx xx xx xx xx xx NULL NULL p2
xx xx xx xx xx xx xx xx NULL NULL xx NULL p3
xx xx xx xx xx xx xx xx xx xx xx xx p4
xx xx xx xx xx xx xx xx NULL NULL NULL NULL p5

4. For v_recipientinfo:

v_recipientinfo(rid, mid, rtype, rvalue, presCond)

rid mid rtype rtype rvalue presCond
xx xx xx xx xx p1
xx xx xx xx xx p2
xx xx xx xx xx p3
xx xx xx xx xx p4
xx xx xx xx xx p5

5. For v_referenceinfo:

v_referenceinfo(rfid, mid, reference,presCond) Just like the table v_recipientinfo, we insert presCond for corresponding product data.

6. For other tables:

For that rest of tables, since the whole table depends on a specific features, we will populate the table with user's email data based on which product group they belong to.

References

  1. Fundamental Nonmodularity in Electronic Mail
  2. SPLverifier: Model Checking of Software Product Lines
  3. Feature-Aware Verification
  4. SPL2go-Email JML
  5. SPL2go-EmailSystem_C
  6. SPL2go-EmailSystem_Java