Introduction to Database - sammanthp007/Linux-Kernel-Development GitHub Wiki

Data

Raw facts, or facts that have not yet been processed to reveal their meaning to the end user.

Information

The result of processing raw data to reveal its meaning. Information consists of transformed data and facilitates decision making.

Knowledge

The body of information and facts about a specific subject. Knowledge implies familiarity, awareness, and understanding of information as it applies to an environment. A key characteristic is that new knowledge can be derived from old knowledge.

Data management

A process that focuses on data collection, storage, and retrieval. Common data management functions include addition, deletion, modification, and listing.
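
The four common functions named above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the contact table and its rows are hypothetical, and an in-memory database stands in for a real one.

```python
import sqlite3

# In-memory database; the table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)")

# Addition: store new facts.
conn.execute("INSERT INTO contact (name, phone) VALUES ('Ada', '555-0100')")
conn.execute("INSERT INTO contact (name, phone) VALUES ('Bob', '555-0101')")

# Modification: change a stored fact.
conn.execute("UPDATE contact SET phone = '555-0199' WHERE name = 'Ada'")

# Deletion: remove a stored fact.
conn.execute("DELETE FROM contact WHERE name = 'Bob'")

# Listing: retrieve what is currently stored.
listing = conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()
print(listing)
```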

Database

A shared, integrated computer structure that houses a collection of related data. A database contains two types of data: end-user data (raw facts) and metadata.

Metadata

Data about data; that is, data about data characteristics and relationships. See also data dictionary.
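
The difference between end-user data and metadata can be seen in a small sketch. It assumes Python's built-in sqlite3 module; the customer table is hypothetical, and SQLite's PRAGMA table_info is used here as one way a DBMS exposes its stored metadata.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, balance REAL)"
)
conn.execute("INSERT INTO customer (name, balance) VALUES ('Ada', 120.5)")

# End-user data: the raw facts stored in the table.
end_user_data = conn.execute("SELECT id, name, balance FROM customer").fetchall()

# Metadata: data about the data (column name, declared type, NOT NULL flag).
# PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
metadata = [
    (row[1], row[2], bool(row[3]))
    for row in conn.execute("PRAGMA table_info(customer)")
]

print(end_user_data)
print(metadata)
```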

Database management system (DBMS)

The collection of programs that manages the database structure and controls access to the data stored in the database.

Data inconsistency

A condition in which different versions of the same data yield different (inconsistent) results.

Query

A question or task asked by an end user of a database in the form of SQL code. A specific request for data manipulation issued by the end user or the application to the DBMS.

Ad hoc query

A “spur-of-the-moment” question.

Query result set

The collection of data rows returned by a query.
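
As a rough illustration of the three terms above, the sketch below issues an ad hoc query and collects its result set. It assumes Python's built-in sqlite3 module; the product table and its rows are made up for the example.

```python
import sqlite3

# In-memory database with a hypothetical product table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (code TEXT PRIMARY KEY, descr TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("P1", "Hammer", 12.0), ("P2", "Saw", 18.5), ("P3", "Drill", 79.0)],
)

# An ad hoc query: "Which products cost less than 20?"
query = "SELECT code, descr FROM product WHERE price < ? ORDER BY code"

# The query result set: the collection of rows the DBMS returns.
result_set = conn.execute(query, (20,)).fetchall()
print(result_set)
```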

Roles and Advantages of DBMS

  • Improved data sharing: A DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment.
  • Improved data security
  • Better data integration: Wider access to well-managed data promotes an integrated view of the organization’s operations and a clearer view of the big picture.
  • Minimized data inconsistency
  • Improved data access: End users can pose ad hoc queries and get a query result set back quickly.

Data quality

A comprehensive approach to ensuring the accuracy, validity, and timeliness of data.

Types of Database

1. Based on Users

  • Single-user: A database that supports only one user at a time.
    1. Desktop database: A single-user database that runs on a personal computer.
  • Multiuser: A database that supports multiple concurrent users.
    1. Workgroup database: A multiuser database that usually supports fewer than 50 users or is used for a specific department in an organization.
    2. Enterprise database: The overall company data representation, which provides support for present and expected future needs.

2. Based on location

  • Centralized: A database located at a single site.
  • Distributed: A logically related database that is stored in two or more physically independent sites.
  • Cloud: A database that is created and maintained using cloud services, such as Microsoft Azure or Amazon AWS.

3. Based on type of Data stored

  • General-purpose: A database that contains a wide variety of data used in multiple disciplines.
  • Discipline-specific: A database that contains data focused on specific subject areas.

4. Based on how they will be used and on the time sensitivity of the information gathered from them (Most Popular)

  • Operational: A database designed primarily to support a company’s day-to-day operations. Also called an online transaction processing (OLTP) database, transactional database, or production database. For example, transactions such as product or service sales, payments, and supply purchases reflect critical day-to-day operations.
  • Analytical: A database focused primarily on storing historical data and business metrics used for tactical or strategic decision making. Such analysis typically requires extensive “data massaging” (data manipulation) to produce information on which to base pricing decisions, sales forecasts, market strategies, and so on. Analytical databases allow the end user to perform advanced analysis of business data using sophisticated tools. An analytical database comprises two main components:
    1. Data warehouse: A specialized database that stores historical and aggregated data in a format optimized for decision support.
    2. Online analytical processing (OLAP): A set of tools that provide advanced data analysis for retrieving, processing, and modeling data from the data warehouse.
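
The contrast between the two kinds of workload can be sketched in a few lines. This is a simplification under stated assumptions: it uses Python's built-in sqlite3 module, a hypothetical sale table, and a single database for both workloads, whereas a real analytical database would normally be a separate data warehouse. The record_sale helper is invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical sale table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_date TEXT, product TEXT, amount REAL)")

def record_sale(sale_date, product, amount):
    """Operational (OLTP-style) work: record one sale in one short transaction."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO sale VALUES (?, ?, ?)", (sale_date, product, amount)
        )

record_sale("2024-01-15", "Hammer", 12.0)
record_sale("2024-01-20", "Saw", 18.5)
record_sale("2024-02-03", "Hammer", 12.0)

# Analytical-style work: summarize historical data to support decisions.
monthly_totals = conn.execute(
    "SELECT substr(sale_date, 1, 7) AS month, SUM(amount) "
    "FROM sale GROUP BY month ORDER BY month"
).fetchall()
print(monthly_totals)
```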

Business intelligence

A set of tools and processes used to capture, collect, integrate, store, and analyze data to support business decision making.

5. Based on the degree to which the data is structured

  • Unstructured: Data that exists in its original, raw state; that is, in the format in which it was collected.
  • Structured: Data that has been formatted to facilitate storage, use, and information generation.
  • Semistructured: Data that has already been processed to some extent.

Unstructured and semistructured data storage and management needs are being addressed through a new generation of databases known as XML databases. An XML database supports the storage and management of semistructured XML data.
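
One way to picture the three degrees of structure is to write the same facts in each form. The sketch below uses Python's standard xml.etree.ElementTree module; the invoice text, tag names, and field names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Unstructured: a raw note exactly as it was collected.
raw_note = "Invoice 1001 - Ada bought 2 hammers for 24.00"

# Semistructured: the same facts, partly organized with XML tags.
xml_doc = (
    "<invoice number='1001'>"
    "<customer>Ada</customer><total>24.00</total>"
    "</invoice>"
)

# Structured: typed fields, ready for storage and computation.
root = ET.fromstring(xml_doc)
structured = {
    "invoice_number": int(root.get("number")),
    "customer": root.findtext("customer"),
    "total": float(root.findtext("total")),
}
print(structured)
```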

Importance of Database

  1. Remove redundancy
  2. Select queries
  3. Update data
  4. Calculation

Shortcomings of File System Data Processing

  • Lengthy development times
  • Difficulty of getting quick answers
  • Complex system administration: Even a simple file system with a few files requires creating and maintaining several file management programs.
  • Lack of security and limited data sharing: Sharing data among multiple geographically dispersed users introduces many security risks. Security and data-sharing features are difficult to program into data management and reporting programs, so they are often omitted from a file system environment.
  • Extensive programming
  • Structural and data dependence. Key terms:
    1. Structural dependence: A data characteristic in which a change in the database schema affects data access, thus requiring changes in all access programs.
    2. Structural independence: A data characteristic in which changes in the database schema do not affect data access.
    3. Data dependence: A data condition in which data representation and manipulation are dependent on the physical data storage characteristics.
    4. Data independence: A condition in which data access is unaffected by changes in the physical data storage characteristics.
    5. Logical data format: The way a person views data within the context of a problem domain.
    6. Physical data format: The way a computer “sees” (stores) data.
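  • Data redundancy: Storing the same data in more than one place, which sets the stage for:
    1. Poor data security
    2. Data-entry errors
    3. Data inconsistency
    4. Data integrity problems
  • Data anomalies: Arise when not all required changes to redundant data are made successfully. Types:
    1. Update anomalies
    2. Deletion anomalies
    3. Insertion anomalies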
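
The link between redundancy, update anomalies, and data inconsistency can be sketched with plain Python data structures. The customer and agent records below are hypothetical; the point is only that a fact stored in several places can end up with several different values.

```python
# Redundant design: the agent's phone number is repeated in every customer record.
customers = [
    {"customer": "Ada", "agent": "Lee", "agent_phone": "555-0100"},
    {"customer": "Bob", "agent": "Lee", "agent_phone": "555-0100"},
]

# Update anomaly: the phone number changes, but only one copy gets updated.
customers[0]["agent_phone"] = "555-0199"

# Data inconsistency: the same agent now has two different phone numbers.
phones_for_lee = {c["agent_phone"] for c in customers if c["agent"] == "Lee"}
inconsistent = len(phones_for_lee) > 1

# Storing each agent once means a single update reaches every customer.
agents = {"Lee": {"phone": "555-0100"}}
customers_by_agent = [
    {"customer": "Ada", "agent": "Lee"},
    {"customer": "Bob", "agent": "Lee"},
]
agents["Lee"]["phone"] = "555-0199"
phones_after_fix = {agents[c["agent"]]["phone"] for c in customers_by_agent}

print(inconsistent, phones_after_fix)
```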

Database

Five components of a database system:

  1. Hardware
  2. Software
  3. People
    a. System administrators
    b. Database administrators
    c. Database designers
    d. System analysts and programmers: They design and create the data-entry screens, reports, and procedures through which end users access and manipulate the database’s data.
    e. End users
  4. Procedures
  5. Data

Functions of DBMS:

  1. Data dictionary management: The DBMS stores definitions of the data elements and their relationships (metadata) in a data dictionary, the DBMS component that stores metadata.
  2. Data storage management: The DBMS creates and manages the complex structures required for data storage, thus relieving you from the difficult task of defining and programming the physical data characteristics.
    a. Performance tuning: Activities that make a database perform more efficiently in terms of storage and access speed.
  3. Data transformation and presentation: The DBMS translates between the logical data format and the physical data format, so users do not have to distinguish between them.
  4. Security management
  5. Multiuser access control
  6. Backup and recovery management
  7. Data integrity management
  8. Database access languages and application programming interfaces: The DBMS provides data access through a query language.
  9. Database communication interfaces: A current-generation DBMS accepts end-user requests via multiple, different network environments.
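
Functions 2 and 3 are a large part of what gives programs the structural independence defined earlier. The sketch below contrasts a file-style program that reads fields by position with a query that names the column it wants. It assumes Python's built-in csv and sqlite3 modules; the customer data and the balance_by_position helper are hypothetical.

```python
import csv
import io
import sqlite3

# File-system style: the program depends on the position of each field.
old_file = "1001,Ada,120.50\n"
new_file = "1001,Ada,Boston,120.50\n"  # a city field was added to the file

def balance_by_position(text):
    """Structurally dependent: assumes the balance is always the third field."""
    row = next(csv.reader(io.StringIO(text)))
    return row[2]

file_before = balance_by_position(old_file)  # reads the balance
file_after = balance_by_position(new_file)   # now reads the city instead

# DBMS style: the query names the column, so adding a column does not break it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, balance REAL)")
conn.execute("INSERT INTO customer VALUES (1001, 'Ada', 120.5)")
query = "SELECT balance FROM customer WHERE id = 1001"

db_before = conn.execute(query).fetchone()
conn.execute("ALTER TABLE customer ADD COLUMN city TEXT")  # schema change
db_after = conn.execute(query).fetchone()

print(file_before, file_after, db_before, db_after)
```

The file-based reader silently returns the wrong field after the change, while the query returns the same balance before and after the column is added.
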

Why a Spreadsheet Is Not a Database

While a spreadsheet allows for the manipulation of data in a tabular format, it does not support even the most basic database functionality such as support for self-documentation through metadata, enforcement of data types or domains to ensure consistency of data within a column, defined relationships among tables, or constraints to ensure consistency of data across related tables. Most users lack the necessary training to recognize the limitations of spreadsheets for these types of tasks.
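
The kind of enforcement a spreadsheet lacks can be sketched with Python's built-in sqlite3 module. The agent and customer tables, the rule that a balance cannot be negative, and the try_insert helper are all assumptions made for this example. It shows a domain constraint within a column and a relationship constraint across two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is turned on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE agent (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE customer ("
    " id INTEGER PRIMARY KEY,"
    " balance REAL CHECK (balance >= 0),"              # domain rule within a column
    " agent_id INTEGER NOT NULL REFERENCES agent(id))"  # relationship between tables
)
conn.execute("INSERT INTO agent VALUES (1, 'Lee')")

def try_insert(balance, agent_id):
    """Attempt one insert and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (balance, agent_id) VALUES (?, ?)",
                (balance, agent_id),
            )
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(50.0, 1),   # valid row
    try_insert(-5.0, 1),   # violates the CHECK constraint
    try_insert(10.0, 99),  # refers to an agent that does not exist
]
print(results)
```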