Introduction to DBMS (Database Management System)

What is a DBMS?

A Database Management System (DBMS) is software that enables users to define, create, maintain, and control access to databases. It serves as an intermediary between users and the database, allowing for efficient data management, retrieval, and manipulation.

Types of DBMS

  1. Hierarchical DBMS: Data is organised in a tree-like structure. Each record has a single parent and can have multiple children. Example: IBM Information Management System (IMS).

  2. Network DBMS: Similar to a hierarchical DBMS, but a record can have multiple parents as well as multiple children, so the data forms a graph rather than a tree. Example: Integrated Data Store (IDS).

  3. Relational DBMS (RDBMS): Data is organised in tables (relations). Each table consists of rows and columns, and relationships between tables are established through foreign keys. Examples: MySQL, PostgreSQL, Oracle.

  4. Object-oriented DBMS: Data is represented in the form of objects, as used in object-oriented programming. Example: ObjectDB.

  5. NoSQL DBMS: Designed for large-scale data storage and retrieval. These systems support various data models (document, key-value, column-family, graph). Examples: MongoDB, Cassandra.

Key Components of DBMS

  • DBMS Engine: The core service that handles data storage, retrieval, and management.
  • Database Schema: The structure that defines the organization of data.
  • Query Processor: Translates user queries into a form that the DBMS can process.
  • Data Model: Defines how data is stored and accessed (e.g., relational, hierarchical).
  • Database Access Language: Typically SQL (Structured Query Language) for relational databases.

Functions of DBMS

  1. Data Definition: Defines the structure of data (schemas).
  2. Data Manipulation: Allows users to insert, update, delete, and query data.
  3. Data Security: Manages user access and permissions to ensure data integrity and confidentiality.
  4. Data Integrity: Ensures accuracy and consistency of data through constraints and rules.
  5. Data Backup and Recovery: Provides mechanisms for data recovery in case of failures.
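
Data definition, data manipulation, and data integrity can be seen together in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for any relational DBMS; the Products table, its columns, and its values are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Data definition: declare the structure (schema) of the data.
conn.execute(
    "CREATE TABLE Products ("
    " ProductID INTEGER PRIMARY KEY,"
    " Name TEXT NOT NULL,"
    " Price REAL NOT NULL CHECK (Price >= 0))"
)

# Data manipulation: insert, update, and query rows.
conn.execute("INSERT INTO Products VALUES (1, 'Pen', 1.50)")
conn.execute("UPDATE Products SET Price = 2.00 WHERE ProductID = 1")
price = conn.execute(
    "SELECT Price FROM Products WHERE ProductID = 1"
).fetchone()[0]

# Data integrity: the CHECK constraint rejects a negative price.
try:
    conn.execute("INSERT INTO Products VALUES (2, 'Ink', -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Data security and backup are not shown here: SQLite has no user accounts, so access control would need a server-based DBMS.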

Advantages of DBMS

  • Data Abstraction: Simplifies data management for users.
  • Data Sharing: Allows multiple users to access and manipulate data concurrently.
  • Data Security: Offers robust security features to protect sensitive information.
  • Improved Data Consistency: Reduces redundancy and ensures data accuracy.
  • Easier Data Management: Provides tools for managing large amounts of data efficiently.

Disadvantages of DBMS

  • Complexity: Requires specialized knowledge for setup and management.
  • Cost: Licensing, hardware, and maintenance costs can be high.
  • Performance Overhead: Might introduce latency due to additional processing layers.

Use Cases

  • Enterprise Applications: ERP, CRM systems.
  • Web Applications: Online stores, content management systems.
  • Data Warehousing: Storing and analyzing large datasets for business intelligence.

Super Keys in RDBMS

A super key is a set of one or more attributes (columns) that, when taken collectively, can uniquely identify a record (row) in a relational database table. The concept of super keys is fundamental to understanding database normalization and integrity.

Key Characteristics of Super Keys

  1. Uniqueness: Every super key must ensure that no two rows in a table can have the same values for the attributes that comprise the super key.

  2. Redundancy: A super key can have additional attributes that are not necessary for unique identification. For instance, if an employee table has a combination of (EmployeeID, Name) as a super key, it is still a super key if we add the employee's address, making it (EmployeeID, Name, Address).

  3. Set of Attributes: A super key can consist of a single attribute or a combination of multiple attributes.

Example

Consider a table named Employees with the following columns:

  • EmployeeID
  • FirstName
  • LastName
  • Email

Possible super keys could include:

  • {EmployeeID}: Unique identifier.
  • {Email}: Assuming emails are unique for each employee.
  • {EmployeeID, FirstName}: While not necessary to include FirstName for uniqueness, this combination still qualifies as a super key.
  • {EmployeeID, LastName, Email}: Another valid super key that includes unnecessary attributes.
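
The uniqueness test behind these examples can be sketched in Python. The helper name is_super_key and the sample rows (including the example.com addresses) are assumptions made for illustration. Note that a check against sample data can only disprove a super key; whether a set of attributes really is a key depends on the rules of the business, not on the rows that happen to be stored.

```python
def is_super_key(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False  # two rows collide on these attributes
        seen.add(value)
    return True

# Hypothetical sample data for the Employees table.
employees = [
    {"EmployeeID": 1, "FirstName": "John", "LastName": "Doe",
     "Email": "john@example.com"},
    {"EmployeeID": 2, "FirstName": "Jane", "LastName": "Smith",
     "Email": "jane@example.com"},
    {"EmployeeID": 3, "FirstName": "John", "LastName": "Smith",
     "Email": "jsmith@example.com"},
]

checks = {
    "EmployeeID": is_super_key(employees, ["EmployeeID"]),
    "Email": is_super_key(employees, ["Email"]),
    "EmployeeID+FirstName": is_super_key(employees, ["EmployeeID", "FirstName"]),
    "FirstName": is_super_key(employees, ["FirstName"]),
}
```

In this sample, FirstName alone fails the test because two employees are named John, while every set that contains EmployeeID passes.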

Distinction from Other Keys

  • Candidate Key: A minimal super key. A candidate key has no unnecessary attributes; removing any attribute from it would mean it no longer uniquely identifies records. For the Employees table, EmployeeID and Email could be considered candidate keys.

  • Primary Key: A special type of candidate key that is chosen to uniquely identify records in a table. Only one primary key can be designated per table, and it cannot contain NULL values.

  • Alternate Key: Any candidate key that is not selected as the primary key. For example, if EmployeeID is the primary key, then Email would be an alternate key.

Importance of Super Keys

  • Data Integrity: Ensuring unique identification of records is crucial for maintaining data integrity within relational databases.

  • Normalization: Understanding super keys aids in the normalization process, which minimizes redundancy and dependency within database tables.

  • Database Design: Identifying super keys is a critical step in designing relational databases, as it helps define relationships between tables.

Candidate Keys in RDBMS

A candidate key is a minimal super key: a set of attributes that uniquely identifies each record in a table and contains no unnecessary attributes. If any attribute is removed from a candidate key, the remaining attributes no longer guarantee uniqueness.

Key Characteristics of Candidate Keys

  1. Uniqueness: Each candidate key must uniquely identify each row in the table.

  2. Minimality: A candidate key cannot have any extra attributes beyond those necessary for uniqueness. Removing any attribute from a candidate key will result in it no longer uniquely identifying the records.

  3. Multiple Candidates: A table can have multiple candidate keys. For example, both an employee ID and an email address could serve as unique identifiers for employee records.

Example

Consider a table named Students with the following columns:

  • StudentID
  • Email
  • FirstName
  • LastName

In this example, both StudentID and Email can serve as candidate keys because:

  • {StudentID}: Uniquely identifies each student.
  • {Email}: Assuming each student has a unique email address.
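
Minimality can be illustrated by searching for the smallest attribute sets that are unique in some sample rows. This is only a sketch: the function names and the sample data are hypothetical, and a search over stored rows can suggest candidate keys but cannot prove them, since a key is a rule about all rows the table may ever hold.

```python
from itertools import combinations

def unique_on(rows, attrs):
    """Return True if the rows have no repeated value combination for attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)

def candidate_keys(rows, columns):
    """Return the minimal attribute sets that are unique in rows."""
    keys = []
    # Try small sets first so that every key found is minimal.
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # A superset of a key already found is a super key, not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if unique_on(rows, attrs):
                keys.append(attrs)
    return keys

# Hypothetical sample data; two students share the name John Doe.
students = [
    {"StudentID": 1, "Email": "a@example.com", "FirstName": "John", "LastName": "Doe"},
    {"StudentID": 2, "Email": "b@example.com", "FirstName": "Jane", "LastName": "Smith"},
    {"StudentID": 3, "Email": "c@example.com", "FirstName": "John", "LastName": "Smith"},
    {"StudentID": 4, "Email": "d@example.com", "FirstName": "John", "LastName": "Doe"},
]

found = candidate_keys(students, ["StudentID", "Email", "FirstName", "LastName"])
```

With this data the search returns only ("StudentID",) and ("Email",). Sets such as {StudentID, FirstName} are skipped because they contain a smaller key.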

Distinction from Other Keys

  • Super Key: Any set of attributes that uniquely identifies a row. Every candidate key is a super key, but a super key may also contain extra attributes. For example, {StudentID, FirstName} is a super key but not a candidate key because FirstName is unnecessary.

  • Primary Key: One of the candidate keys is selected to be the primary key, which is used to uniquely identify records in the table. A primary key must not contain NULL values.

  • Alternate Key: Any candidate key that is not selected as the primary key. For instance, if StudentID is chosen as the primary key, then Email becomes an alternate key.
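
In SQL, an alternate key is usually declared with a UNIQUE constraint next to the primary key. The following sketch uses Python's sqlite3 module; the schema follows the Students example above, while the inserted values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " StudentID INTEGER PRIMARY KEY,"  # candidate key chosen as primary key
    " Email TEXT NOT NULL UNIQUE,"     # remaining candidate key (alternate key)
    " FirstName TEXT,"
    " LastName TEXT)"
)

conn.execute(
    "INSERT INTO Students VALUES (1, 'john@example.com', 'John', 'Doe')"
)

# A different StudentID with the same Email violates the alternate key.
try:
    conn.execute(
        "INSERT INTO Students VALUES (2, 'john@example.com', 'Jane', 'Smith')"
    )
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```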

Importance of Candidate Keys

  1. Data Integrity: Ensuring that records can be uniquely identified prevents duplication and maintains data integrity.

  2. Database Design: Identifying candidate keys is crucial in the database design process, aiding in normalization and establishing relationships between tables.

  3. Query Optimization: Candidate keys are usually enforced with unique indexes, which also speed up lookups on those columns.

Primary Key in RDBMS

A primary key is a specific type of candidate key in a relational database that is designated to uniquely identify each record in a table. It serves as the main reference point for records and plays a crucial role in maintaining data integrity and establishing relationships between tables.

Key Characteristics of Primary Keys

  1. Uniqueness: Each value of a primary key must be unique across all records in the table, ensuring that no two rows can have the same primary key value.

  2. Non-nullable: A primary key must not contain NULL values. This ensures that every record can be uniquely identified.

  3. Immutable: While not a strict requirement, it is generally advisable that the values of a primary key should not change over time. This stability helps maintain the integrity of relationships with other tables.

  4. Single Primary Key per Table: Each table can have only one primary key, although that primary key may consist of a single attribute or a combination of multiple attributes (composite key).

Example

Consider a table named Employees with the following columns:

  • EmployeeID (Primary Key)
  • FirstName
  • LastName
  • Email

In this example, EmployeeID serves as the primary key because it uniquely identifies each employee.
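
The uniqueness and non-null rules can be tried out with Python's sqlite3 module. This is a sketch under some assumptions: the EmployeeID values are hypothetical text codes, and NOT NULL is written out explicitly because SQLite, unlike most relational systems, does not imply it for every PRIMARY KEY column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    " EmployeeID TEXT PRIMARY KEY NOT NULL,"
    " FirstName TEXT,"
    " LastName TEXT,"
    " Email TEXT)"
)

def try_insert(row):
    """Insert one row; return True on success, False if a constraint fails."""
    try:
        conn.execute("INSERT INTO Employees VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("E1", "John", "Doe", "john@example.com")),      # new key
    try_insert(("E1", "Jane", "Smith", "jane@example.com")),    # duplicate key
    try_insert((None, "Alice", "Jones", "alice@example.com")),  # NULL key
]
```

Only the first insert succeeds; the duplicate and the NULL key are both rejected by the database rather than by application code.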

Composite Primary Key

A primary key can also consist of multiple attributes. This is known as a composite primary key. For example, consider a CourseEnrollments table:

  • StudentID
  • CourseID

In this case, the combination of {StudentID, CourseID} serves as the composite primary key, uniquely identifying each enrollment record.

Distinction from Other Keys

  • Candidate Key: A primary key is selected from the set of candidate keys. Candidate keys are potential unique identifiers, while the primary key is the one chosen for actual use.

  • Foreign Key: A foreign key is an attribute (or set of attributes) in one table that refers to the primary key in another table, establishing a relationship between the two tables.

Importance of Primary Keys

  1. Data Integrity: Primary keys enforce entity integrity by ensuring that each record is unique and identifiable.

  2. Relationship Establishment: Primary keys are essential for defining relationships between tables in a relational database, allowing for effective joins and data retrieval.

  3. Indexing: Most database systems automatically create an index for primary keys, improving query performance and data retrieval speed.
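
The automatic index can be observed in SQLite, used here only as one example of a database system. The sketch assumes a text-valued EmployeeID, because SQLite stores an INTEGER PRIMARY KEY as the row identifier itself and creates no separate index for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    " EmployeeID TEXT PRIMARY KEY NOT NULL,"
    " Email TEXT)"
)

# PRAGMA index_list reports the indexes on a table; the second field of
# each result row is the index name.
index_names = [
    row[1] for row in conn.execute("PRAGMA index_list('Employees')")
]
```

No CREATE INDEX statement was issued, yet index_names contains one entry whose name starts with sqlite_autoindex_, the index SQLite built to enforce the primary key.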

Composite Key in RDBMS

A composite key is a key that consists of two or more attributes (columns) in a table. Together, these attributes uniquely identify each record in that table. Composite keys are used when no single attribute is sufficient to ensure uniqueness, and a composite key is often chosen as the table's primary key.

Key Characteristics of Composite Keys

  1. Uniqueness: The combination of the attributes in a composite key must be unique for every record in the table.

  2. Multiple Attributes: Unlike a simple primary key, which consists of a single attribute, a composite key involves two or more attributes. Each attribute in the composite key can potentially contain duplicate values, but their combination must remain unique.

  3. Non-nullable: When a composite key is used as the primary key, none of its component columns may contain NULL values, ensuring that every record can be uniquely identified.

Example

Consider a table named CourseEnrollments that tracks which students are enrolled in which courses. The table might include the following columns:

  • StudentID
  • CourseID
  • EnrollmentDate

In this case, neither StudentID nor CourseID alone is sufficient to uniquely identify a record, as a student can enroll in multiple courses and a course can have multiple students. Therefore, the combination of {StudentID, CourseID} serves as the composite key, uniquely identifying each enrollment record.
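
A minimal sketch of this table in Python's sqlite3 module shows that each column may repeat while the pair may not. The schema follows the example above; the inserted rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CourseEnrollments ("
    " StudentID INTEGER NOT NULL,"
    " CourseID INTEGER NOT NULL,"
    " EnrollmentDate TEXT,"
    " PRIMARY KEY (StudentID, CourseID))"  # composite primary key
)

insert = "INSERT INTO CourseEnrollments VALUES (?, ?, ?)"
conn.execute(insert, (1, 101, "2024-01-15"))
conn.execute(insert, (1, 102, "2024-01-16"))  # same student, other course
conn.execute(insert, (2, 101, "2024-01-15"))  # same course, other student

# Repeating the pair (1, 101) violates the composite key.
try:
    conn.execute(insert, (1, 101, "2024-02-01"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute(
    "SELECT COUNT(*) FROM CourseEnrollments"
).fetchone()[0]
```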

Usage of Composite Keys

Composite keys are commonly used in scenarios such as:

  1. Many-to-Many Relationships: In relational databases, many-to-many relationships often require junction (or associative) tables, where composite keys are essential to uniquely identify records that connect two entities.

  2. Complex Data Structures: When data relationships are more complex, such as when multiple attributes are required to define a unique record, composite keys become invaluable.

Distinction from Other Keys

  • Primary Key: A primary key may be simple (one attribute) or composite (several attributes). When a composite key is chosen as the primary key, it must satisfy the same uniqueness and non-nullability requirements as any primary key.

  • Foreign Key: A foreign key can reference a composite key, meaning a foreign key in another table can consist of the same attributes used in the composite key to maintain referential integrity.

Importance of Composite Keys

  1. Data Integrity: They help enforce data integrity by ensuring that unique combinations of attributes exist, preventing duplicate entries.

  2. Normalization: Composite keys facilitate the normalization process by allowing a more flexible design of relational databases.

  3. Efficient Querying: The index that backs a composite key can speed up queries that filter on all of its columns, or on its leading column alone.

Foreign Keys in RDBMS

A foreign key is an attribute (or a set of attributes) in one table that establishes a link between the data in two tables. It references the primary key (or another unique key) of another table, enabling relationships between tables and enforcing referential integrity.

Key Characteristics of Foreign Keys

  1. Reference to Primary Key: A foreign key in one table (the child) points to the primary key, or another unique key, in another table (the parent), establishing a relationship between the two.

  2. Non-uniqueness: Unlike primary keys, foreign keys can contain duplicate values. Multiple records in the child table can refer to the same record in the parent table.

  3. NULL Values: Foreign keys can accept NULL values unless specified otherwise. A NULL foreign key means that the record does not currently reference a record in the parent table.

  4. Enforcement of Referential Integrity: Foreign keys help maintain data integrity by ensuring that the value in the foreign key column must either match a value in the referenced primary key or be NULL.

Example

Consider two tables: Students and CourseEnrollments.

Students Table:

| StudentID (PK) | FirstName | LastName |
| --- | --- | --- |
| 1 | John | Doe |
| 2 | Jane | Smith |

CourseEnrollments Table:

| EnrollmentID (PK) | StudentID (FK) | CourseID |
| --- | --- | --- |
| 1 | 1 | 101 |
| 2 | 1 | 102 |
| 3 | 2 | 101 |

In this example:

  • StudentID in the CourseEnrollments table is a foreign key that references the StudentID primary key in the Students table.
  • This relationship indicates which students are enrolled in which courses.
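
The two tables above can be recreated with Python's sqlite3 module to see referential integrity at work. This is a sketch with one SQLite-specific assumption: foreign key enforcement is switched off by default there, so it is enabled with a PRAGMA first. The rows with EnrollmentID 4 and 5 are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute(
    "CREATE TABLE Students ("
    " StudentID INTEGER PRIMARY KEY,"
    " FirstName TEXT,"
    " LastName TEXT)"
)
conn.execute(
    "CREATE TABLE CourseEnrollments ("
    " EnrollmentID INTEGER PRIMARY KEY,"
    " StudentID INTEGER REFERENCES Students(StudentID),"  # foreign key
    " CourseID INTEGER)"
)

conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [(1, "John", "Doe"), (2, "Jane", "Smith")],
)
conn.executemany(
    "INSERT INTO CourseEnrollments VALUES (?, ?, ?)",
    [(1, 1, 101), (2, 1, 102), (3, 2, 101)],
)

# Student 99 does not exist, so this enrollment is rejected.
try:
    conn.execute("INSERT INTO CourseEnrollments VALUES (4, 99, 101)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# A NULL foreign key is accepted: the row references no student yet.
conn.execute("INSERT INTO CourseEnrollments VALUES (5, NULL, 103)")
enrollment_count = conn.execute(
    "SELECT COUNT(*) FROM CourseEnrollments"
).fetchone()[0]
```

Student 1 appears twice in CourseEnrollments, which shows that foreign key values, unlike primary key values, may repeat.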

Importance of Foreign Keys

  1. Data Integrity: Foreign keys help maintain data integrity by ensuring that relationships between tables are valid. For instance, you cannot have a course enrollment for a non-existent student.

  2. Establishing Relationships: Foreign keys are essential for establishing relationships between tables, facilitating JOIN operations for retrieving related data.

  3. Cascading Actions: Foreign keys can define actions such as cascading updates or deletes. For example, if a student is deleted from the Students table, related entries in the CourseEnrollments table can also be deleted automatically.

  4. Normalization: Foreign keys support the normalization process by allowing the design of relational databases with minimal redundancy.
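
The cascading delete described in point 3 can be sketched with Python's sqlite3 module. The ON DELETE CASCADE clause is standard SQL; the PRAGMA is specific to SQLite, which does not enforce foreign keys unless asked to. The sample rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for cascades in SQLite

conn.execute(
    "CREATE TABLE Students ("
    " StudentID INTEGER PRIMARY KEY,"
    " FirstName TEXT)"
)
conn.execute(
    "CREATE TABLE CourseEnrollments ("
    " EnrollmentID INTEGER PRIMARY KEY,"
    " StudentID INTEGER NOT NULL"
    "  REFERENCES Students(StudentID) ON DELETE CASCADE,"
    " CourseID INTEGER)"
)

conn.executemany(
    "INSERT INTO Students VALUES (?, ?)", [(1, "John"), (2, "Jane")]
)
conn.executemany(
    "INSERT INTO CourseEnrollments VALUES (?, ?, ?)",
    [(1, 1, 101), (2, 1, 102), (3, 2, 101)],
)

# Deleting student 1 also removes that student's two enrollments.
conn.execute("DELETE FROM Students WHERE StudentID = 1")
remaining = conn.execute(
    "SELECT EnrollmentID, StudentID, CourseID FROM CourseEnrollments"
).fetchall()
```

Without the ON DELETE CASCADE clause, the same DELETE would be rejected for as long as enrollments still referenced the student.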

Distinction from Other Keys

  • Primary Key: A primary key uniquely identifies records in its own table, whereas a foreign key references a primary key in another table.

  • Composite Key: A composite key can be made up of foreign key columns. In a junction table keyed on {StudentID, CourseID}, for example, each column is also a foreign key to another table.

Short Notes on Keys in RDBMS

1. Super Key

  • Definition: A set of one or more attributes that can uniquely identify each record in a table.
  • Characteristics:
    • Can have redundant attributes.
    • Ensures uniqueness of records.
  • Example: In a Students table, {StudentID}, {Email}, and {StudentID, FirstName} are all super keys.

2. Candidate Key

  • Definition: A minimal super key with no unnecessary attributes.
  • Characteristics:
    • Must be unique and non-nullable.
    • Multiple candidate keys can exist in a table.
  • Example: For the Students table, both {StudentID} and {Email} can be candidate keys.

3. Primary Key

  • Definition: A special candidate key selected to uniquely identify records in a table.
  • Characteristics:
    • Must be unique and non-nullable.
    • Only one primary key per table.
  • Example: In the Students table, {StudentID} is the primary key.

4. Composite Key

  • Definition: A key (often the primary key) that consists of two or more attributes.
  • Characteristics:
    • Ensures uniqueness through a combination of attributes.
    • Can be used when a single attribute is insufficient for uniqueness.
  • Example: In a CourseEnrollments table, {StudentID, CourseID} serves as a composite key.

5. Foreign Key

  • Definition: An attribute (or set of attributes) in one table that references the primary key of another table.
  • Characteristics:
    • Can contain duplicate values and NULLs.
    • Enforces referential integrity between related tables.
  • Example: In the CourseEnrollments table, {StudentID} is a foreign key referencing {StudentID} in the Students table.

Example Scenario: University Database

Table 1: Students

| StudentID (PK) | Email | FirstName | LastName |
| --- | --- | --- | --- |
| 1 | [email protected] | John | Doe |
| 2 | [email protected] | Jane | Smith |
| 3 | [email protected] | Alice | Jones |

Keys in the Students Table:

  • Primary Key: StudentID (uniquely identifies each student)
  • Candidate Keys: {StudentID}, {Email} (both can uniquely identify students)
  • Super Keys: {StudentID}, {Email}, {StudentID, FirstName}, {StudentID, LastName} (all can uniquely identify records)

Table 2: Courses

| CourseID (PK) | CourseName |
| --- | --- |
| 101 | Mathematics |
| 102 | History |
| 103 | Biology |

Keys in the Courses Table:

  • Primary Key: CourseID
  • Candidate Keys: {CourseID}
  • Super Keys: {CourseID}, {CourseID, CourseName}

Table 3: CourseEnrollments

| EnrollmentID (PK) | StudentID (FK) | CourseID (FK) | EnrollmentDate |
| --- | --- | --- | --- |
| 1 | 1 | 101 | 2024-01-15 |
| 2 | 1 | 102 | 2024-01-16 |
| 3 | 2 | 101 | 2024-01-15 |
| 4 | 3 | 103 | 2024-01-17 |

Keys in the CourseEnrollments Table:

  • Primary Key: EnrollmentID
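  • Composite Key: {StudentID, CourseID} (Because EnrollmentID is the primary key here, this combination is a composite alternate key; a unique constraint on it prevents the same student from being enrolled in the same course twice.)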
  • Foreign Keys: StudentID (references StudentID in Students table), CourseID (references CourseID in Courses table)

Summary of Key Relationships

  • Primary Key: StudentID in Students table uniquely identifies each student. EnrollmentID in CourseEnrollments uniquely identifies each enrollment record.
  • Candidate Keys: Both StudentID and Email can serve as unique identifiers for students.
  • Composite Key: {StudentID, CourseID} uniquely identifies each enrollment, so a student can enroll in many courses but only once in each.
  • Foreign Keys: StudentID and CourseID in CourseEnrollments create a relationship between students and courses, enforcing referential integrity.
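
The whole scenario can be put together as one schema. The sketch below uses Python's sqlite3 module and the rows from the three tables above; the email addresses are hypothetical placeholders, and the UNIQUE constraint on (StudentID, CourseID) is one possible way to enforce the composite key alongside the EnrollmentID primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,
    Email     TEXT NOT NULL UNIQUE,          -- alternate key
    FirstName TEXT,
    LastName  TEXT
);
CREATE TABLE Courses (
    CourseID   INTEGER PRIMARY KEY,
    CourseName TEXT NOT NULL
);
CREATE TABLE CourseEnrollments (
    EnrollmentID   INTEGER PRIMARY KEY,
    StudentID      INTEGER NOT NULL REFERENCES Students(StudentID),
    CourseID       INTEGER NOT NULL REFERENCES Courses(CourseID),
    EnrollmentDate TEXT,
    UNIQUE (StudentID, CourseID)             -- composite key
);
""")

conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?)", [
    (1, "john@example.com", "John", "Doe"),      # placeholder emails
    (2, "jane@example.com", "Jane", "Smith"),
    (3, "alice@example.com", "Alice", "Jones"),
])
conn.executemany("INSERT INTO Courses VALUES (?, ?)", [
    (101, "Mathematics"), (102, "History"), (103, "Biology"),
])
conn.executemany("INSERT INTO CourseEnrollments VALUES (?, ?, ?, ?)", [
    (1, 1, 101, "2024-01-15"),
    (2, 1, 102, "2024-01-16"),
    (3, 2, 101, "2024-01-15"),
    (4, 3, 103, "2024-01-17"),
])

# The foreign keys are the columns used to join the three tables.
enrolled = conn.execute(
    "SELECT s.FirstName, c.CourseName"
    " FROM CourseEnrollments AS e"
    " JOIN Students AS s ON s.StudentID = e.StudentID"
    " JOIN Courses AS c ON c.CourseID = e.CourseID"
    " ORDER BY e.EnrollmentID"
).fetchall()

# Enrolling student 1 in course 101 again violates the composite key.
try:
    conn.execute(
        "INSERT INTO CourseEnrollments VALUES (5, 1, 101, '2024-02-01')"
    )
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True
```

The join returns one (FirstName, CourseName) pair per enrollment, starting with John in Mathematics and ending with Alice in Biology.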