SQL Indexing, DataPages - Yash-777/MyWorld GitHub Wiki

To check the MySQL version, you can use any of the following methods:

SELECT VERSION(); -- 8.0.33
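Other equivalent checks (a minimal sketch; the reported values depend on your server build):

SELECT @@version, @@version_comment; -- server version plus build comment
SHOW VARIABLES LIKE 'version%';      -- version and related build variables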
🛠️ Table Schema and Collation Comparison

This page compares two table definitions using different character sets (utf8mb3 vs utf8mb4) and explores their structure and metadata using MySQL diagnostic commands.

  • 📐 Table Definitions and Character Set Differences
  • 📋 Table Structure: DESCRIBE 'your_table'
  • 📊 Table Metadata: SHOW TABLE STATUS

📐 Table Definitions and Character Set Differences

-- Table using utf8mb4
DROP TABLE IF EXISTS Employees;

CREATE TABLE Employees (
    Id INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(50),
    Email VARCHAR(50),
    Department VARCHAR(50) -- active BOOLEAN,
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

-- Table using utf8 (mapped to utf8mb3 internally)
DROP TABLE IF EXISTS Employe;

CREATE TABLE Employe (
    Id INT AUTO_INCREMENT,
    Name VARCHAR(50),
    Email VARCHAR(50),
    Department VARCHAR(50),
	PRIMARY KEY (`Id`)
	-- ,KEY `Employees_Id_INDEX` (`Id`) -- Duplicate index => ALTER TABLE Employees ADD CONSTRAINT Employees_Id_INDEX (Id);
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;

📋 Table Structure: DESCRIBE Employees;

DESCRIBE Employees;

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| Id          | int          | NO   | PRI | NULL    | auto_increment |
| Name        | varchar(50)  | YES  |     | NULL    |                |
| Email       | varchar(50)  | YES  |     | NULL    |                |
| Department  | varchar(50)  | YES  |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

📊 Table Metadata: SHOW TABLE STATUS LIKE 'Employees';

SHOW TABLE STATUS LIKE 'Employees';

+-----------+--------+---------+-------------+----------------------+--------------------+
| Name      | Engine | Rows    | Avg_row_len | Data_length (bytes)  | Collation          |
+-----------+--------+---------+-------------+----------------------+--------------------+
| employees | InnoDB | 0       | 0           | 16384                | utf8mb3_general_ci |
+-----------+--------+---------+-------------+----------------------+--------------------+

📊 Combined Table Metadata from INFORMATION_SCHEMA.TABLES - the equivalent of SHOW TABLE STATUS for multiple tables at once in a single result.

SELECT 
    T.TABLE_NAME, T.ENGINE, T.ROW_FORMAT, T.TABLE_ROWS, T.AVG_ROW_LENGTH, T.DATA_LENGTH,
    T.INDEX_LENGTH, T.DATA_FREE, T.AUTO_INCREMENT, T.CREATE_TIME,  
    T.TABLE_COLLATION, CCSA.CHARACTER_SET_NAME AS TABLE_CHARSET, T.TABLE_COMMENT
FROM INFORMATION_SCHEMA.TABLES T
LEFT JOIN INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY CCSA
	   ON T.TABLE_COLLATION = CCSA.COLLATION_NAME
WHERE T.TABLE_SCHEMA = DATABASE()
  AND T.TABLE_NAME IN ('Employees', 'Employe');

-- result
+-----------+--------+---------+-------------+----------------------+--------------------+----------------+
| Name      | Engine | Rows    | Avg_row_len | Data_length (bytes)  | Collation          | TABLE_CHARSET  |
+-----------+--------+---------+-------------+----------------------+--------------------+----------------+
| employe   | InnoDB | 0       | 0           | 16384                | utf8mb3_general_ci | utf8mb3        |
| employees | InnoDB | 0       | 0           | 16384                | utf8mb4_0900_ai_ci | utf8mb4        |
+-----------+--------+---------+-------------+----------------------+--------------------+----------------+

-- employees after inserting 200000 records.

| TABLE_NAME | ENGINE | ROW_FORMAT  | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | TABLE_COLLATION      | TABLE_CHARSET  |
| -----------| ------ | ----------- | ---------- | -------------- | ----------- | ------------ | --------- | -------------- | ------------------- | -------------------- | -------------- |
| employe    | InnoDB | Dynamic     | 0          | 0              | 16384       | 0            | 0         | 1              | 2025-08-07 13:59:45 | utf8mb3_general_ci   | utf8mb3        |
| employees  | InnoDB | Dynamic     | 199384     | 76             | 15220736    | 0            | 16777216  | 200001         | 2025-08-07 22:51:51 | utf8mb4_0900_ai_ci   | utf8mb4        |
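Note that TABLE_ROWS from INFORMATION_SCHEMA (199384 above) is only an InnoDB estimate, while the bulk-insert procedure later on this page loads exactly 200,000 rows. A minimal sketch of refreshing the statistics and getting the exact count:

ANALYZE TABLE Employees;        -- refresh InnoDB index/row statistics
SELECT COUNT(*) FROM Employees; -- exact row count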
🗂️ MySQL Indexing Summary : Primary Key, Unique Key, and GEN_CLUST_INDEX in MySQL (InnoDB)

InnoDB requires a clustered index for every table, using either a PRIMARY KEY, a UNIQUE NOT NULL key, or an auto-generated GEN_CLUST_INDEX. Defining a PRIMARY KEY or UNIQUE NOT NULL key is recommended for better control and performance.

  • 🔑 Primary Key (Clustered Index): A PRIMARY KEY uniquely identifies each row in a table and cannot contain NULL values. Default name: PRIMARY
    CREATE TABLE employees (
      id INT NOT NULL AUTO_INCREMENT,
      full_name VARCHAR(50),
      PRIMARY KEY (id) -- → Clustered index on `id`: the Primary Key becomes the clustered index.
    ) ENGINE=InnoDB;
  • 🧭 Unique not-null index: A UNIQUE KEY ensures all values in the specified column(s) are unique but allows NULL values (unlike PRIMARY KEY).

    Allows NULL unless specified as NOT NULL

    CREATE TABLE users (
      user_id INT NOT NULL,
      email VARCHAR(100) NOT NULL, -- no PRIMARY KEY, but a UNIQUE index on a NOT NULL column exists
      full_name VARCHAR(50),
      UNIQUE KEY (email) -- → InnoDB uses the first UNIQUE NOT NULL index as the clustered index.
    ) ENGINE=InnoDB;
  • 🕵️‍♂️ Generated Clustered Index: No PRIMARY KEY or UNIQUE NOT NULL index → means InnoDB will internally create GEN_CLUST_INDEX. This is a 6-byte internal row ID and not visible through SHOW INDEX.
    CREATE TABLE logs (
      event_id INT NOT NULL,
      event_message VARCHAR(255)
    ) ENGINE=InnoDB;

    ⚙️ Is It Possible to See GEN_CLUST_INDEX? ⚠️ You can't see GEN_CLUST_INDEX via SHOW INDEX or SHOW CREATE TABLE, but its presence is guaranteed when no explicit clustered-index candidate is defined. One way to peek at it through the InnoDB data dictionary is sketched after this list.

    ⚠️ Best Practice: Always define a PRIMARY KEY or UNIQUE NOT NULL to avoid relying on internal indexes.
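Although GEN_CLUST_INDEX never appears in SHOW INDEX or SHOW CREATE TABLE, on MySQL 8.0 the InnoDB data dictionary views can reveal it. A minimal sketch (assumes MySQL 8.0 and sufficient privileges such as PROCESS; InnoDB reports table names as 'schema/table'):

SELECT t.NAME AS table_name, i.NAME AS index_name
FROM INFORMATION_SCHEMA.INNODB_INDEXES i
JOIN INFORMATION_SCHEMA.INNODB_TABLES t ON i.TABLE_ID = t.TABLE_ID
WHERE i.NAME = 'GEN_CLUST_INDEX'
  AND t.NAME LIKE CONCAT(DATABASE(), '/%');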



📐 Check Whether a Table Has a Clustered Index (or Falls Back to GEN_CLUST_INDEX)

/*
Check if a Table Has a Clustered Index, If not InnoDB uses a hidden clustered index (GEN_CLUST_INDEX).
 * If you see PRIMARY KEY (column_name), the table has a clustered index on that column (e.g., Id).
 * If no PRIMARY KEY exists, look for a UNIQUE index on non-nullable columns.
 * If neither is present, the table uses a hidden clustered index (not visible in the output).
⚙️ Is It Possible to See GEN_CLUST_INDEX?
Unfortunately, you can't see GEN_CLUST_INDEX via SHOW INDEX or SHOW CREATE TABLE.

⚠️ You cannot view GEN_CLUST_INDEX directly via SQL, but its presence is guaranteed when no explicit index is defined.
*/
DROP TABLE IF EXISTS NoPrimaryIndex;

CREATE TABLE NoPrimaryIndex ( id INT NOT NULL, Name VARCHAR(50) ) ENGINE=InnoDB; -- no PRIMARY KEY or UNIQUE NOT NULL index, so InnoDB uses GEN_CLUST_INDEX
SHOW INDEX FROM NoPrimaryIndex;

SHOW CREATE TABLE NoPrimaryIndex;
/*
CREATE TABLE `NoPrimaryIndex` (
   `id` int NOT NULL,
   `Name` varchar(50) DEFAULT NULL
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
*/

INSERT INTO NoPrimaryIndex (id, Name) VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Charlie');

-- Check inserted row count
SELECT COUNT(*) FROM NoPrimaryIndex;
select * from NoPrimaryIndex;
select * from NoPrimaryIndex order by 1 desc;

EXPLAIN ANALYZE SELECT * FROM NoPrimaryIndex WHERE id = 2;
EXPLAIN ANALYZE SELECT * FROM NoPrimaryIndex WHERE Name = 'ABC 99990';
/*
-> Filter: (noprimaryindex.id = 2)  (cost=0.55 rows=1) (actual time=0.0612..0.0826 rows=1 loops=1)
     -> Table scan on NoPrimaryIndex  (cost=0.55 rows=3) (actual time=0.0506..0.0765 rows=3 loops=1)
  
After Inserting total of 100000 (from 4 to 100000)
-> Filter: (noprimaryindex.id = 255)  (cost=10091 rows=10018) (actual time=0.345..109 rows=1 loops=1)
     -> Table scan on NoPrimaryIndex  (cost=10091 rows=100183) (actual time=0.0542..100 rows=100000 loops=1)
     
-> Filter: (noprimaryindex.`Name` = 'ABC 99990')  (cost=10091 rows=10018) (actual time=143..143 rows=1 loops=1)
     -> Table scan on NoPrimaryIndex  (cost=10091 rows=100183) (actual time=0.0466..117 rows=100000 loops=1)
*/
CREATE INDEX NoPrimaryIndex_name_idx ON NoPrimaryIndex(name);

EXPLAIN ANALYZE SELECT * FROM NoPrimaryIndex WHERE Name = 'ABC 990';
/*
-> Index lookup on NoPrimaryIndex using NoPrimaryIndex_name_idx (Name='ABC 99990')  (cost=0.35 rows=1) (actual time=0.0368..0.0407 rows=1 loops=1)

-> Index lookup on NoPrimaryIndex using NoPrimaryIndex_name_idx (Name='ABC 990')  (cost=0.35 rows=1) (actual time=0.0487..0.0546 rows=1 loops=1)
...
-> Index lookup on NoPrimaryIndex using NoPrimaryIndex_name_idx (Name='ABC 990')  (cost=0.35 rows=1) (actual time=0.0292..0.0337 rows=1 loops=1)
 */

-- Add a Primary Key to a single column. ⚠️ If the column is nullable, you must modify it to NOT NULL first (see the sketch below).
ALTER TABLE NoPrimaryIndex ADD PRIMARY KEY (id);
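For the hypothetical case where id had been created as nullable, a minimal two-step sketch (the MODIFY fails if existing rows contain NULL, and the ADD PRIMARY KEY fails on duplicate values):

ALTER TABLE NoPrimaryIndex MODIFY id INT NOT NULL; -- make the column NOT NULL first
ALTER TABLE NoPrimaryIndex ADD PRIMARY KEY (id);   -- then promote it to the clustered primary key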

EXPLAIN ANALYZE SELECT * FROM NoPrimaryIndex WHERE Name = 'ABC 990';
/*
-> Covering index lookup on NoPrimaryIndex using NoPrimaryIndex_name_idx (Name='ABC 990')  (cost=1.1 rows=1) (actual time=0.0305..0.0348 rows=1 loops=1)
 
-> Covering index lookup on NoPrimaryIndex using NoPrimaryIndex_name_idx (Name='ABC 990')  (cost=1.1 rows=1) (actual time=0.0178..0.0213 rows=1 loops=1)
*/
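Why the plan changed from "Index lookup" to "Covering index lookup": every InnoDB secondary index entry also stores the primary key, so after ADD PRIMARY KEY (id) the entries of NoPrimaryIndex_name_idx are effectively (Name, id). Since the table has only those two columns, SELECT * can be answered from the index alone. A quick way to re-check the plan without executing the query (a sketch; FORMAT=TREE needs MySQL 8.0.16+):

EXPLAIN FORMAT=TREE SELECT * FROM NoPrimaryIndex WHERE Name = 'ABC 990';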


-- Inserting total of 100000 (from 4 to 100000)
DROP PROCEDURE IF EXISTS InsertNoPrimaryIndex;
DELIMITER $$

CREATE PROCEDURE InsertNoPrimaryIndex()
BEGIN
    DECLARE counter INT DEFAULT 4;
    DECLARE batchSize INT DEFAULT 10000; -- Insert 10,000 rows per batch
    DECLARE totalRows INT DEFAULT 100000;

    WHILE counter <= totalRows DO
        -- Inner loop to insert a batch
        BEGIN
            DECLARE i INT DEFAULT 0;
            WHILE i < batchSize AND counter <= totalRows DO
                INSERT INTO NoPrimaryIndex (id, Name)
                VALUES (
					counter,
                    CONCAT('ABC ', counter)
                );
                SET i = i + 1;
                SET counter = counter + 1;
            END WHILE;
        END;

        -- Optional progress output
        SELECT CONCAT(counter - 1, ' rows inserted') AS progress;

        -- Sleep for a short time to avoid timeout (adjust as needed)
        DO SLEEP(0.5);  -- 0.5 seconds
    END WHILE;
END$$

DELIMITER ;

-- Call the procedure
CALL InsertNoPrimaryIndex();

-- Optional: Drop the procedure after running
DROP PROCEDURE InsertNoPrimaryIndex;

🛠️ MySQL SQL script that estimates the allocated size of each field in a table and computes the total estimated row size, using the Employees table:
CREATE TABLE Employees (
    Id INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(50),
    Email VARCHAR(50),
    Department VARCHAR(50) -- active BOOLEAN,
);
🛠️ Bulk Data Insertion in MySQL Using Batching and Delay (Insert Up to 200,000 Rows)

A stored procedure is used to:

  • Insert in smaller batches (e.g. 10,000 rows at a time).
  • Add a short delay (SLEEP) between each batch.
  • Ensure the connection doesn't time out (avoiding Error Code: 2013: Lost connection to MySQL server during query).
DROP TABLE IF EXISTS Employees;

-- Create the Employees table
CREATE TABLE Employees (
    Id INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(50),
    Email VARCHAR(50),
    Department VARCHAR(50)
);

DROP PROCEDURE IF EXISTS InsertEmployees;
DELIMITER $$

CREATE PROCEDURE InsertEmployees()
BEGIN
    DECLARE counter INT DEFAULT 1;
    DECLARE batchSize INT DEFAULT 10000; -- Insert 10,000 rows per batch
    DECLARE totalRows INT DEFAULT 200000;

    WHILE counter <= totalRows DO
        -- Inner loop to insert a batch
        BEGIN
            DECLARE i INT DEFAULT 0;
            WHILE i < batchSize AND counter <= totalRows DO
                INSERT INTO Employees (Name, Email, Department)
                VALUES (
                    CONCAT('ABC ', counter),
                    CONCAT('abc', counter, '@pragimtech.com'),
                    CONCAT('Dept ', counter)
                );
                SET i = i + 1;
                SET counter = counter + 1;
            END WHILE;
        END;

        -- Optional progress output
        SELECT CONCAT(counter - 1, ' rows inserted') AS progress;

        -- Sleep for a short time to avoid timeout (adjust as needed)
        DO SLEEP(0.5);  -- 0.5 seconds
    END WHILE;
END$$

DELIMITER ;

-- Call the procedure
CALL InsertEmployees();

-- Optional: Drop the procedure after running
DROP PROCEDURE InsertEmployees;

-- Check inserted row count
SELECT COUNT(*) FROM Employees;

To calculate the allocated size of each field (column) and the total size of a table row (and of the overall table), first review the table structure and status:

📋 Table Structure: DESCRIBE Employees;

DESCRIBE Employees;

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| Id          | int          | NO   | PRI | NULL    | auto_increment |
| Name        | varchar(50)  | YES  | MUL | NULL    |                |
| Email       | varchar(50)  | YES  |     | NULL    |                |
| Department  | varchar(50)  | YES  | MUL | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

📊 Table Metadata: SHOW TABLE STATUS LIKE 'Employees';

SHOW TABLE STATUS LIKE 'Employees';

+-----------+--------+---------+-------------+----------------------+--------------------+
| Name      | Engine | Rows    | Avg_row_len | Data_length (bytes)  | Collation          |
+-----------+--------+---------+-------------+----------------------+--------------------+
| employees | InnoDB | 199,384 | 76          | 15,220,736           | utf8mb4_0900_ai_ci |
+-----------+--------+---------+-------------+----------------------+--------------------+

📐 Estimated Field-wise Size Calculation (Using utf8mb4)

Assume:

  • int = 4 bytes
  • varchar(N) with utf8mb3 = N × 3 + 1 bytes / utf8mb4 = N × 4 + 1 bytes (length prefix)
🛠️ 📋 Estimated Column Size Table (Based on utf8 → utf8mb3, 3 bytes/char)

🛑 Warning Recap from MySQL:

  • ✅ 'utf8' is currently an alias for UTF8MB3
  • 🟨 It will become an alias for UTF8MB4 in future releases
  • ⚠️ Collation utf8mb3_general_ci is deprecated

Using charset utf8 produces the following warnings:

 warning(s): 3719 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release.
Please consider using UTF8MB4 in order to be unambiguous. 
3778 'utf8mb3_general_ci' is a collation of the deprecated character set UTF8MB3. 
Please consider using UTF8MB4 with an appropriate collation instead.

CREATE TABLE Employe (
    Id INT AUTO_INCREMENT,
    Name VARCHAR(50),
    Email VARCHAR(50),
    Department VARCHAR(50),
	PRIMARY KEY (`Id`)
	-- ,KEY `Employees_Id_INDEX` (`Id`) -- Duplicate index => ALTER TABLE Employees ADD CONSTRAINT Employees_Id_INDEX (Id);
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci;
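To silence these warnings going forward, the table can be converted to utf8mb4 (a sketch; the conversion rebuilds the table and changes per-character byte limits, and the outputs below assume the table stays on utf8/utf8mb3):

ALTER TABLE Employe CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;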

📋 Table Structure: DESCRIBE Employees;

DESCRIBE Employees;

+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| Id          | int          | NO   | PRI | NULL    | auto_increment |
| Name        | varchar(50)  | YES  |     | NULL    |                |
| Email       | varchar(50)  | YES  |     | NULL    |                |
| Department  | varchar(50)  | YES  |     | NULL    |                |
+-------------+--------------+------+-----+---------+----------------+

📊 Table Metadata: SHOW TABLE STATUS LIKE 'Employees';

SHOW TABLE STATUS LIKE 'Employees';

+-----------+--------+---------+-------------+----------------------+--------------------+
| Name      | Engine | Rows    | Avg_row_len | Data_length (bytes)  | Collation          |
+-----------+--------+---------+-------------+----------------------+--------------------+
| employees | InnoDB | 0       | 0           | 16384                | utf8mb3_general_ci |
+-----------+--------+---------+-------------+----------------------+--------------------+

Running the field-wise size calculation script (shown in full below) for the Employe table produces:

-- Set current schema and table
SET @schema_name = DATABASE(); -- use current database
SET @table_name = 'Employe';   -- your table name

+------------------+-------------+-----------+---------------------+----------------------------+
| ORDINAL_POSITION | COLUMN_NAME | DATA_TYPE | CHARACTER_SET_NAME  | estimated_column_size_bytes |
+------------------+-------------+-----------+---------------------+----------------------------+
| 1                | Id          | int       | NULL                | 4                          |
| 2                | Name        | varchar   | utf8mb3             | 151                        |
| 3                | Email       | varchar   | utf8mb3             | 151                        |
| 4                | Department  | varchar   | utf8mb3             | 151                        |
+------------------+-------------+-----------+---------------------+----------------------------+
🛠️ 📄 SQL Script Field-wise Size Calculation - Charset utf8mb3 and utf8mb4
  1. Set current schema and table
SET @schema_name = DATABASE(); -- use current database
SET @table_name = 'Employe';   -- your table name
  2. Get the character set of the table into a variable
SELECT 
    CCSA.CHARACTER_SET_NAME 
INTO @table_charset
FROM INFORMATION_SCHEMA.TABLES T
JOIN INFORMATION_SCHEMA.COLLATION_CHARACTER_SET_APPLICABILITY CCSA
    ON T.TABLE_COLLATION = CCSA.COLLATION_NAME
WHERE T.TABLE_SCHEMA = @schema_name
  AND T.TABLE_NAME = @table_name;

-- 3. Column size estimation
SELECT 
    ORDINAL_POSITION,
    COLUMN_NAME,
    DATA_TYPE,
    CHARACTER_SET_NAME,

    CASE 
        -- Fixed-size types
        WHEN DATA_TYPE IN ('tinyint') THEN 1
        WHEN DATA_TYPE IN ('smallint') THEN 2
        WHEN DATA_TYPE IN ('mediumint') THEN 3
        WHEN DATA_TYPE IN ('int', 'integer') THEN 4
        WHEN DATA_TYPE IN ('bigint') THEN 8
        WHEN DATA_TYPE IN ('float') THEN 4
        WHEN DATA_TYPE IN ('double', 'real') THEN 8
        WHEN DATA_TYPE IN ('date') THEN 3
        WHEN DATA_TYPE IN ('time') THEN 3
        WHEN DATA_TYPE IN ('year') THEN 1
        WHEN DATA_TYPE IN ('datetime') THEN 8
        WHEN DATA_TYPE IN ('timestamp') THEN 4
        WHEN DATA_TYPE IN ('bit') THEN CEIL(NUMERIC_PRECISION / 8) -- bit length is reported in NUMERIC_PRECISION, not CHARACTER_MAXIMUM_LENGTH

        -- Decimal types
        WHEN DATA_TYPE IN ('decimal', 'numeric') THEN 
            CEIL((NUMERIC_PRECISION + 2) / 2)

        -- VARCHAR and CHAR with charset fallback
        WHEN DATA_TYPE IN ('varchar', 'char') THEN 
            CHARACTER_MAXIMUM_LENGTH *
            CASE 
                WHEN CHARACTER_SET_NAME IS NOT NULL THEN
                    CASE CHARACTER_SET_NAME
                        WHEN 'utf8mb3' THEN 3
                        WHEN 'utf8mb4' THEN 4
                        ELSE 1
                    END
                ELSE
                    CASE 
                        WHEN @table_charset IN ('utf8', 'utf8mb3') THEN 3
                        WHEN @table_charset = 'utf8mb4' THEN 4
                        ELSE 1
                    END
            END +
            CASE WHEN DATA_TYPE = 'varchar' THEN 1 ELSE 0 END

        -- TEXT and BLOB types
        WHEN DATA_TYPE IN ('tinytext', 'tinyblob') THEN 255
        WHEN DATA_TYPE IN ('text', 'blob') THEN 65535
        WHEN DATA_TYPE IN ('mediumtext', 'mediumblob') THEN 16777215
        WHEN DATA_TYPE IN ('longtext', 'longblob') THEN 4294967295

        ELSE 0
    END AS estimated_column_size_bytes

FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @schema_name
  AND TABLE_NAME = @table_name
ORDER BY ORDINAL_POSITION;

Output when the same script is run with @table_name = 'Employees' (utf8mb4):

+------------------+-------------+-----------+---------------------+----------------------------+
| ORDINAL_POSITION | COLUMN_NAME | DATA_TYPE | CHARACTER_SET_NAME  | estimated_column_size_bytes |
+------------------+-------------+-----------+---------------------+----------------------------+
| 1                | Id          | int       | NULL                | 4                          |
| 2                | Name        | varchar   | utf8mb4             | 201                        |
| 3                | Email       | varchar   | utf8mb4             | 201                        |
| 4                | Department  | varchar   | utf8mb4             | 201                        |
+------------------+-------------+-----------+---------------------+----------------------------+

✅ Get Total Estimated Row Size - To get the sum of all column sizes:

SELECT 
    SUM(
        CASE 
            WHEN DATA_TYPE IN ('tinyint') THEN 1
            WHEN DATA_TYPE IN ('smallint') THEN 2
            WHEN DATA_TYPE IN ('mediumint') THEN 3
            WHEN DATA_TYPE IN ('int', 'integer') THEN 4
            WHEN DATA_TYPE IN ('bigint') THEN 8
            WHEN DATA_TYPE IN ('float') THEN 4
            WHEN DATA_TYPE IN ('double', 'real') THEN 8
            WHEN DATA_TYPE IN ('date') THEN 3
            WHEN DATA_TYPE IN ('time') THEN 3
            WHEN DATA_TYPE IN ('year') THEN 1
            WHEN DATA_TYPE IN ('datetime') THEN 8
            WHEN DATA_TYPE IN ('timestamp') THEN 4
            WHEN DATA_TYPE IN ('bit') THEN CEIL(NUMERIC_PRECISION / 8)
            WHEN DATA_TYPE IN ('decimal', 'numeric') THEN CEIL((NUMERIC_PRECISION + 2) / 2)
            WHEN DATA_TYPE IN ('varchar') THEN CHARACTER_MAXIMUM_LENGTH * 
                CASE CHARACTER_SET_NAME 
                    WHEN 'utf8' THEN 3
                    WHEN 'utf8mb3' THEN 3
                    WHEN 'utf8mb4' THEN 4
                    ELSE 1
                END + 1
            WHEN DATA_TYPE IN ('char') THEN CHARACTER_MAXIMUM_LENGTH * 
                CASE CHARACTER_SET_NAME 
                    WHEN 'utf8' THEN 3
                    WHEN 'utf8mb3' THEN 3
                    WHEN 'utf8mb4' THEN 4
                    ELSE 1
                END
            WHEN DATA_TYPE IN ('tinytext') THEN 255
            WHEN DATA_TYPE IN ('text') THEN 65535
            WHEN DATA_TYPE IN ('mediumtext') THEN 16777215
            WHEN DATA_TYPE IN ('longtext') THEN 4294967295
            WHEN DATA_TYPE IN ('tinyblob') THEN 255
            WHEN DATA_TYPE IN ('blob') THEN 65535
            WHEN DATA_TYPE IN ('mediumblob') THEN 16777215
            WHEN DATA_TYPE IN ('longblob') THEN 4294967295
            ELSE 0
        END
    ) AS total_estimated_row_size_bytes
FROM 
    INFORMATION_SCHEMA.COLUMNS
WHERE 
    TABLE_SCHEMA = @schema_name
    AND TABLE_NAME = @table_name;

output:

total_estimated_row_size_bytes
------------------------------
607
+------------+---------------+------------------------------+----------------+
| Column     | Type          | Size Calculation             | Approx. Size   |
+------------+---------------+------------------------------+----------------+
| Id         | int           | Fixed size                   | 4 bytes        |
| Name       | varchar(50)   | 50 × 4 + 1 = 201             | 201 bytes      |
| Email      | varchar(50)   | 50 × 4 + 1 = 201             | 201 bytes      |
| Department | varchar(50)   | 50 × 4 + 1 = 201             | 201 bytes      |
+------------+---------------+------------------------------+----------------+
|            |               |              TOTAL           | 607 bytes max  |
+------------+---------------+------------------------------+----------------+

🧮 Quick Comparison:

| Metric                  | Value                      |
| ----------------------- | -------------------------- |
| Max Theoretical Row     | 607 bytes                  |
| Avg Row Length (actual) | 76 bytes                   |
| Total Rows              | 199,384                    |
| Data Length (total)     | 15,220,736 bytes (~15 MB)  |

🔎 The large difference between the 607-byte theoretical maximum and the 76-byte average row length implies that (a quick check follows the list below):

  • Most string fields are not filled up to their max (50 characters)
  • Data is sparse or strings are typically shorter
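One way to confirm this against the generated data is to measure the actual average character lengths of the string columns (a minimal sketch; values such as 'ABC 12345' are far shorter than 50 characters):

SELECT 
    AVG(CHAR_LENGTH(Name))       AS avg_name_len,
    AVG(CHAR_LENGTH(Email))      AS avg_email_len,
    AVG(CHAR_LENGTH(Department)) AS avg_dept_len
FROM Employees;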

📌 Short Description: The query retrieves column metadata (name, type, and size details) for the table named Employees from the INFORMATION_SCHEMA.COLUMNS view. It shows maximum lengths for string types and precision/scale for numeric types.

SELECT 
    ORDINAL_POSITION AS position,
    COLUMN_NAME AS column_name,
    DATA_TYPE AS data_type,
    CHARACTER_MAXIMUM_LENGTH AS char_max_length,
    NUMERIC_PRECISION AS numeric_precision,
    NUMERIC_SCALE AS numeric_scale,
    IS_NULLABLE AS is_nullable,
    COLUMN_KEY AS key_type,
    COLUMN_DEFAULT AS default_value,
    EXTRA AS extra_attributes
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()  -- ensures current DB context
  AND TABLE_NAME = 'Employees'
ORDER BY ORDINAL_POSITION;

Output, formatted as a plain-text table:

+----------+-------------+------------+-------------------+-------------------+----------------+-------------+-----------+---------------+--------------------+
| Position | Column Name | Data Type  | Char Max Length   | Numeric Precision | Numeric Scale  | Is Nullable | Key Type  | Default Value | Extra Attributes   |
+----------+-------------+------------+-------------------+-------------------+----------------+-------------+-----------+---------------+--------------------+
|    1     | Id          | int        | NULL              | 10                | 0              | NO          | PRI       | NULL          | auto_increment     |
|    2     | Name        | varchar    | 50                | NULL              | NULL           | YES         | MUL       | NULL          |                    |
|    3     | Email       | varchar    | 50                | NULL              | NULL           | YES         |           | NULL          |                    |
|    4     | Department  | varchar    | 50                | NULL              | NULL           | YES         | MUL       | NULL          |                    |
+----------+-------------+------------+-------------------+-------------------+----------------+-------------+-----------+---------------+--------------------+

❓ Why is key_type (COLUMN_KEY) empty for the Email column?

The COLUMN_KEY field in INFORMATION_SCHEMA.COLUMNS shows whether a column is part of a key/index:

| Value   | Meaning                               |
| ------- | ------------------------------------- |
| PRI     | Primary key                           |
| UNI     | Unique key                            |
| MUL     | Non-unique index (duplicates allowed) |
| (empty) | No index/key on this column           |

The key_type is empty, which means: There is no index (primary, unique, or normal) defined on the Email column.

🛠️ If You Expect It to Be Indexed:

  • You might have forgotten to add an index (a quick check is sketched below).
  • Or you intended Email to be unique, but forgot to enforce it.
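To see whether any index already covers the Email column, query INFORMATION_SCHEMA.STATISTICS (an empty result means the column is not indexed):

SELECT INDEX_NAME, NON_UNIQUE, SEQ_IN_INDEX
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'Employees'
  AND COLUMN_NAME = 'Email';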

How is data stored in a SQL database?

Data is stored in a series of data pages organized in a tree-like structure of root and leaf nodes: a B-Tree, which is the clustered index structure.

In MySQL, there is no concept of clustered vs nonclustered indexes like in SQL Server. However, MySQL supports:

  • Primary key = clustered index (in InnoDB)
  • Secondary index = nonclustered-like

So, to create an index on the Name column in MySQL (equivalent to a nonclustered index in SQL Server), you simply use:

CREATE INDEX IDX_Employees_Name ON Employees(Name);

📌 Additional Notes:

  • In MySQL's InnoDB engine, the primary key is always the clustered index.
  • Any secondary index (like above) stores a copy of the indexed column plus a reference to the primary key, which is used to look up the full row — similar to a nonclustered index with row locator in SQL Server.
  • You can also use UNIQUE INDEX if you want to enforce uniqueness:
CREATE UNIQUE INDEX IDX_Employees_Email ON Employees(Email);
To prepare for the examples below, the Employees table was dropped, recreated, and repopulated with 200,000 rows using the same bulk-insert procedure shown in the 🛠️ Bulk Data Insertion section above.

Example: Employees table with 200000 records

select * from Employees;
select count(*) from Employees; -- 200000

Describe Employees;
SHOW INDEXES FROM Employees;

ANALYZE TABLE Employees;
EXPLAIN SELECT * from Employees;
Employees table with a total of 200,000 rows:

EXPLAIN ANALYZE SELECT * from Employees where name = "ABC 7";

Before Index - Table Scan:

-> Filter: (employees.`Name` = 'ABC 7')  (cost=20172 rows=19940) (actual time=0.111..150 rows=1 loops=1)
     -> Table scan on Employees  (cost=20172 rows=199396) (actual time=0.103..109 rows=200000 loops=1)

After Index - Index lookup:

CREATE INDEX IDX_Employees_Name ON Employees(Name); -- Secondary index = nonclustered-like

-> Index lookup on Employees using IDX_Employees_Name (Name='ABC 7')  (cost=0.35 rows=1) (actual time=0.0453..0.0487 rows=1 loops=1)

On Duplicate Index - Index lookup:

CREATE INDEX IDX_Employees_Name_2 ON Employees(Name);

MySQL Warning Code 1831 indicates the creation of a duplicate index.

⚠️ warning(s): 1831 Duplicate index 'IDX_Employees_Name_2' defined on the table 'employees'. This is deprecated and will be disallowed in a future release.
 Records: 0  Duplicates: 0  Warnings: 1

-> Index lookup on Employees using IDX_Employees_Name (Name='ABC 7')  (cost=0.35 rows=1) (actual time=0.0457..0.049 rows=1 loops=1)
 

In MySQL, when multiple indexes exist on the same column(s) in the same order (even if they have different names), they are considered redundant. However:

✅ Index Lookup Behavior:

MySQL will typically use the first-created index (based on internal order in the INFORMATION_SCHEMA.STATISTICS table) that satisfies the query optimizer's requirements.

So, the first B-tree structure (i.e., the first index created) is likely to be used for lookups, and the second will rarely (if ever) be used unless the first is dropped or becomes unavailable.
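You can check which of the duplicate indexes the optimizer actually picks, or force the other one, using index hints (a sketch; it assumes IDX_Employees_Name_2 has not been dropped yet):

EXPLAIN ANALYZE SELECT * FROM Employees FORCE INDEX (IDX_Employees_Name_2) WHERE Name = 'ABC 7';
EXPLAIN ANALYZE SELECT * FROM Employees IGNORE INDEX (IDX_Employees_Name)  WHERE Name = 'ABC 7';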

Identify and delete the duplicate index.

DROP INDEX IDX_Employees_Name_2 ON Employees;

MySQL allows redundant indexes

Why does MySQL allow CREATE INDEX on a column that's already the PRIMARY KEY?

(e.g., CREATE INDEX IDX_Employees_ID ON Employees(id); when id is already the PRIMARY key)

🔍 Detailed Explanation

In InnoDB (the default storage engine):

  • The PRIMARY KEY is a clustered index, and it already indexes the column (Id in your case).
  • Creating a secondary index on the same column (Id) is allowed, but it creates a separate index structure — which:
    • Takes extra disk space
    • Requires extra maintenance on inserts/updates/deletes
    • Provides no performance benefit in this case

📌 Example

-- Suppose you already have this:
PRIMARY KEY (Id)

-- And then you do:
CREATE INDEX IDX_Employees_ID ON Employees(Id);

Now, MySQL has:

  • A clustered index (PRIMARY)
  • A redundant secondary index (IDX_Employees_ID) that indexes the same column

MySQL Index Creation Script (on Employees table)

MySQL allows redundant indexes
CREATE INDEX IDX_Employees_ID ON Employees(id);
CREATE UNIQUE INDEX IDX_Employees_ID_UNIQUE ON Employees(id);
Before creating indexes, always check existing ones:
SHOW INDEX FROM Employees;
To study MySQL index creation on combinations of columns, we'll simulate different composite indexes. Below is a script that creates multiple indexes with varying column order for comparison:
CREATE UNIQUE INDEX IDX_Employees_Email ON Employees(Email); -- Secondary index = nonclustered-like
CREATE INDEX IDX_Employees_Name ON Employees(Name); -- Secondary index = nonclustered-like
-- 1. Index on (id, name, email)
CREATE INDEX idx_id_name_email_1 ON Employees(id, name, email);
CREATE INDEX idx_id_name_email_2 ON Employees(id, name, email); -- Identical index, redundant
-- 2. Index on (id, email)
CREATE INDEX idx_id_email ON Employees(id, email);
-- 3. Index on (name, email, id)
CREATE INDEX idx_name_email_id ON Employees(name, email, id);
-- 4. Index on (name, id, email)
CREATE INDEX idx_name_id_email ON Employees(name, id, email);
-- 5. Another variation of (name, email, id) - identical to idx_name_email_id
CREATE INDEX idx_name_email_id_2 ON Employees(name, email, id); -- Redundant
Indexes recap - list every index defined on the Employees table:
SELECT 
    TABLE_SCHEMA AS table_schema,
    TABLE_NAME AS table_name,
    INDEX_NAME AS `index`,
    GROUP_CONCAT(COLUMN_NAME ORDER BY SEQ_IN_INDEX) AS index_columns,
    NON_UNIQUE AS NON_UNIQUE
FROM 
    INFORMATION_SCHEMA.STATISTICS
WHERE 
    TABLE_SCHEMA = DATABASE() -- Applies to all tables in the current DB
    AND TABLE_NAME = 'Employees'
    -- AND INDEX_NAME != 'PRIMARY' -- AND SEQ_IN_INDEX = 1  (AND COLUMN_NAME != 'id')
GROUP BY 
    TABLE_SCHEMA, TABLE_NAME, INDEX_NAME, NON_UNIQUE
ORDER BY 
    TABLE_NAME, INDEX_NAME;
Result - useful for understanding the leftmost prefix rule (see the EXPLAIN sketch after this table):
| table_schema | table_name | index                    | index_columns  | NON_UNIQUE |
|--------------|------------|--------------------------|----------------|------------|
| mydb         | Employees  | PRIMARY                  | Id             | 0          |
| mydb         | Employees  | IDX_Employees_ID_UNIQUE  | Id             | 0          |
| mydb         | Employees  | IDX_Employees_ID         | Id             | 1          |
| mydb         | Employees  | IDX_Employees_Name       | Name           | 1          |
| mydb         | Employees  | IDX_Employees_Email      | Email          | 1          |
| mydb         | Employees  | idx_id_email             | Id,Email       | 1          |
| mydb         | Employees  | idx_id_name_email_1      | Id,Name,Email  | 1          |
| mydb         | Employees  | idx_id_name_email_2      | Id,Name,Email  | 1          |
| mydb         | Employees  | idx_name_email_id        | Name,Email,Id  | 1          |
| mydb         | Employees  | idx_name_email_id_2      | Name,Email,Id  | 1          |
| mydb         | Employees  | idx_name_id_email        | Name,Id,Email  | 1          |
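A minimal sketch of the leftmost prefix rule using the composite index idx_name_email_id (Name, Email, Id) created above. Because several overlapping indexes exist on this table, the optimizer's exact choice may differ, but the principle holds: a composite index can only be used when its leading column(s) are constrained.

-- Can use idx_name_email_id: the leading column Name is filtered
EXPLAIN SELECT * FROM Employees WHERE Name = 'ABC 7';
EXPLAIN SELECT * FROM Employees WHERE Name = 'ABC 7' AND Email = 'abc7@pragimtech.com';

-- Cannot use idx_name_email_id: the leading column Name is skipped
-- (here the optimizer would fall back to IDX_Employees_Email, or a scan if that index were absent)
EXPLAIN SELECT * FROM Employees WHERE Email = 'abc7@pragimtech.com';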
DROP INDEX IDX_Employees_ID_UNIQUE ON Employees;
DROP INDEX IDX_Employees_ID ON Employees;
Identify redundant or duplicate indexes to optimize query performance and reduce write overhead on the table:
-- Set the table names variable (comma-separated list, e.g., 'Employees,OtherTable')
-- SET @table_names = ''; -- Modify this to include desired table names or leave empty for all tables
SET @table_names = 'Employees';

-- Drop temporary tables if they exist
DROP TEMPORARY TABLE IF EXISTS selected_tables;
DROP TEMPORARY TABLE IF EXISTS selected_tables_copy;
DROP TEMPORARY TABLE IF EXISTS id_indexes;
DROP TEMPORARY TABLE IF EXISTS id_indexes_copy;
DROP TEMPORARY TABLE IF EXISTS general_redundant_indexes;
DROP TEMPORARY TABLE IF EXISTS id_redundant_indexes;
DROP TEMPORARY TABLE IF EXISTS redundant_indexes;

-- Create temporary table to store selected table names
CREATE TEMPORARY TABLE selected_tables (
    table_name VARCHAR(64)
);

-- Create a copy of selected_tables to avoid 'Can't reopen table' error
CREATE TEMPORARY TABLE selected_tables_copy (
    table_name VARCHAR(64)
);

-- Create temporary table to store single-column 'id' indexes
CREATE TEMPORARY TABLE id_indexes (
    table_schema    VARCHAR(64),
    table_name      VARCHAR(64),
    index_name      VARCHAR(64)
);

-- Create a copy of id_indexes to avoid 'Can't reopen table' error
CREATE TEMPORARY TABLE id_indexes_copy (
    table_schema    VARCHAR(64),
    table_name      VARCHAR(64),
    index_name      VARCHAR(64)
);

-- Create temporary table for general redundant indexes (non-PRIMARY, excluding single-column 'id' indexes)
CREATE TEMPORARY TABLE general_redundant_indexes (
    table_schema     VARCHAR(64),
    table_name       VARCHAR(64),
    redundant_index  VARCHAR(64),
    kept_index       VARCHAR(64),
    index_columns    TEXT,
    drop_statement   TEXT
);

-- Create temporary table for id-specific redundant indexes (against PRIMARY)
CREATE TEMPORARY TABLE id_redundant_indexes (
    table_schema     VARCHAR(64),
    table_name       VARCHAR(64),
    redundant_index  VARCHAR(64),
    kept_index       VARCHAR(64),
    index_columns    TEXT,
    drop_statement   TEXT
);

-- Create final storage table for redundant index info
CREATE TEMPORARY TABLE redundant_indexes (
    table_schema     VARCHAR(64),
    table_name       VARCHAR(64),
    redundant_index  VARCHAR(64),
    kept_index       VARCHAR(64),
    index_columns    TEXT,
    drop_statement   TEXT
);

-- Step 1: Parse table names from @table_names variable
INSERT INTO selected_tables (table_name)
SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(@table_names, ',', numbers.n), ',', -1)) AS table_name
FROM (
    SELECT a.N + b.N * 10 + 1 AS n
    FROM 
        (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a,
        (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
    ORDER BY n
) numbers
WHERE 
    @table_names IS NOT NULL 
    AND @table_names != ''
    AND n <= 1 + (LENGTH(@table_names) - LENGTH(REPLACE(@table_names, ',', '')))
UNION
SELECT TABLE_NAME 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_SCHEMA = DATABASE() 
    AND (@table_names IS NULL OR @table_names = '');

-- Step 2: Copy selected_tables to selected_tables_copy
INSERT INTO selected_tables_copy (table_name)
SELECT table_name FROM selected_tables;

-- Step 3: Identify single-column 'id' indexes
INSERT INTO id_indexes (table_schema, table_name, index_name)
SELECT 
    s1.TABLE_SCHEMA, s1.TABLE_NAME, s1.INDEX_NAME
FROM 
    INFORMATION_SCHEMA.STATISTICS s1
WHERE 
    s1.TABLE_SCHEMA = DATABASE()
    AND s1.INDEX_NAME != 'PRIMARY'
    AND s1.COLUMN_NAME = 'id'
    AND s1.SEQ_IN_INDEX = 1
    AND s1.TABLE_NAME IN (SELECT table_name FROM selected_tables)
    AND NOT EXISTS (
        -- Correlate on the outer alias s1; unqualified names here would resolve to s2 itself
        SELECT 1
        FROM INFORMATION_SCHEMA.STATISTICS s2
        WHERE s2.TABLE_SCHEMA = s1.TABLE_SCHEMA
          AND s2.TABLE_NAME = s1.TABLE_NAME
          AND s2.INDEX_NAME = s1.INDEX_NAME
          AND s2.SEQ_IN_INDEX > 1
    );

-- Step 4: Copy id_indexes to id_indexes_copy
INSERT INTO id_indexes_copy (table_schema, table_name, index_name)
SELECT * FROM id_indexes;

-- Step 5: Insert general redundant indexes (non-PRIMARY, excluding single-column 'id' indexes)
INSERT INTO general_redundant_indexes (table_schema, table_name, redundant_index, kept_index, index_columns, drop_statement)
SELECT 
    s1.TABLE_SCHEMA,
    s1.TABLE_NAME,
    s1.INDEX_NAME AS redundant_index,
    s2.INDEX_NAME AS kept_index,
    GROUP_CONCAT(s1.COLUMN_NAME ORDER BY s1.SEQ_IN_INDEX) AS index_columns,
    CONCAT('ALTER TABLE `', s1.TABLE_NAME, '` DROP INDEX `', s1.INDEX_NAME, '`;') AS drop_statement
FROM 
    INFORMATION_SCHEMA.STATISTICS s1
JOIN 
    INFORMATION_SCHEMA.STATISTICS s2
    ON s1.TABLE_SCHEMA = s2.TABLE_SCHEMA
    AND s1.TABLE_NAME = s2.TABLE_NAME
    AND s1.SEQ_IN_INDEX = s2.SEQ_IN_INDEX
    AND s1.COLUMN_NAME = s2.COLUMN_NAME
    AND s1.INDEX_NAME != s2.INDEX_NAME
WHERE 
    s1.TABLE_SCHEMA = DATABASE()
    AND s1.INDEX_NAME != 'PRIMARY'
    AND s2.INDEX_NAME != 'PRIMARY'
    AND s1.TABLE_NAME IN (SELECT table_name FROM selected_tables)
    AND s2.TABLE_NAME IN (SELECT table_name FROM selected_tables_copy)
    AND s1.INDEX_NAME NOT IN (
        SELECT index_name 
        FROM id_indexes 
        WHERE table_schema = s1.TABLE_SCHEMA 
          AND table_name = s1.TABLE_NAME
    )
    AND s2.INDEX_NAME NOT IN (
        SELECT index_name 
        FROM id_indexes_copy 
        WHERE table_schema = s2.TABLE_SCHEMA 
          AND table_name = s2.TABLE_NAME
    )
GROUP BY 
    s1.TABLE_SCHEMA, s1.TABLE_NAME, s1.INDEX_NAME, s2.INDEX_NAME
HAVING 
    COUNT(*) = (
        SELECT COUNT(*) 
        FROM INFORMATION_SCHEMA.STATISTICS s3
        WHERE s3.TABLE_SCHEMA = s1.TABLE_SCHEMA
          AND s3.TABLE_NAME = s1.TABLE_NAME
          AND s3.INDEX_NAME = s1.INDEX_NAME
    )
    AND COUNT(*) = (
        SELECT COUNT(*) 
        FROM INFORMATION_SCHEMA.STATISTICS s4
        WHERE s4.TABLE_SCHEMA = s2.TABLE_SCHEMA
          AND s4.TABLE_NAME = s2.TABLE_NAME
          AND s4.INDEX_NAME = s2.INDEX_NAME
    )
    AND redundant_index > kept_index;

-- Step 6: Insert id-specific redundant indexes (single-column 'id' against PRIMARY)
INSERT INTO id_redundant_indexes (table_schema, table_name, redundant_index, kept_index, index_columns, drop_statement)
SELECT 
    s.TABLE_SCHEMA,
    s.TABLE_NAME,
    s.INDEX_NAME AS redundant_index,
    'PRIMARY' AS kept_index,
    'id' AS index_columns,
    CONCAT('ALTER TABLE `', s.TABLE_NAME, '` DROP INDEX `', s.INDEX_NAME, '`;') AS drop_statement
FROM 
    INFORMATION_SCHEMA.STATISTICS s
WHERE 
    s.TABLE_SCHEMA = DATABASE()
    AND s.INDEX_NAME != 'PRIMARY'
    AND s.COLUMN_NAME = 'id'
    AND s.SEQ_IN_INDEX = 1
    AND s.TABLE_NAME IN (SELECT table_name FROM selected_tables)
    AND NOT EXISTS (
        SELECT 1
        FROM INFORMATION_SCHEMA.STATISTICS s2
        WHERE s2.TABLE_SCHEMA = s.TABLE_SCHEMA
          AND s2.TABLE_NAME = s.TABLE_NAME
          AND s2.INDEX_NAME = s.INDEX_NAME
          AND s2.SEQ_IN_INDEX > 1
    )
    AND EXISTS (
        SELECT 1
        FROM INFORMATION_SCHEMA.STATISTICS s3
        WHERE s3.TABLE_SCHEMA = s.TABLE_SCHEMA
          AND s3.TABLE_NAME = s.TABLE_NAME
          AND s3.INDEX_NAME = 'PRIMARY'
          AND s3.COLUMN_NAME = 'id'
          AND s3.SEQ_IN_INDEX = 1
    );

-- Step 7: Combine results into final redundant_indexes table
INSERT INTO redundant_indexes
	SELECT * FROM general_redundant_indexes where index_columns != 'id'
	UNION
	SELECT * FROM id_redundant_indexes;

-- Clean up temporary tables
DROP TEMPORARY TABLE IF EXISTS selected_tables;
DROP TEMPORARY TABLE IF EXISTS selected_tables_copy;
DROP TEMPORARY TABLE IF EXISTS id_indexes;
DROP TEMPORARY TABLE IF EXISTS id_indexes_copy;
DROP TEMPORARY TABLE IF EXISTS general_redundant_indexes;
DROP TEMPORARY TABLE IF EXISTS id_redundant_indexes;

-- Step 8: Select final results
SELECT * FROM redundant_indexes ORDER BY table_name, redundant_index;

🔥 Why Does MySQL Allow Duplicate Indexes?

Because:

  • MySQL doesn't enforce uniqueness of index definitions; UNIQUE only constrains column values, not the set of indexes
  • It's up to the developer to manage redundancy

For an exact duplicate key definition MySQL only raises warning 1831 (as seen above) and still creates the index; detecting and removing redundant indexes is left to the developer.

🧹 What Should You Do?

You can safely drop the redundant index:

DROP INDEX IDX_Employees_ID ON Employees;

And rely on the existing PRIMARY KEY (Id) for any lookups, joins, or sorts on Id.



Here's a detailed explanation of SQL indexing in relation to the examples above, covering performance, behavior, and how SQL engines optimize queries.

References: SQL Server index structure (sqlshack.com), MySQL Visual Explain Plan.

SQL indexing is a crucial technique for optimizing database performance, especially for large tables and frequently executed queries. It's akin to creating an index in a book, allowing the database system to quickly locate specific data without scanning the entire table.

Imagine a giant library without any kind of catalog or organization. If you wanted to find a specific book, you'd have to walk through every aisle, pull out every book, and check its title. This is like a database without an index.

Now, imagine that same library with a detailed index in the front. You look up the book title, and the index tells you exactly which shelf and row it's on. This is what an index does for your database.


SQL Server : What is SQL Indexing? Youtube: Key/RID lookup, Unique Index, Clustered/Non-Clustered Index

RID Lookup / Nonclustered Index Structures

Nonclustered indexes have the same B-tree structure as clustered indexes, except for the following significant differences:
  • The data rows of the underlying table are not sorted and stored in order based on their nonclustered keys.
  • The leaf layer of a nonclustered index is made up of index pages instead of data pages.

Nonclustered indexes can be defined on a table or view with a clustered index or a heap. Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.

The row locators in nonclustered index rows are either a pointer to a row or a clustered index key for a row, as described below:
  • If the table is a heap (no clustered index), the row locator is a pointer built from the file ID, page number, and row number on the page, known as a Row ID (RID).
  • If the table has a clustered index (or the index is on an indexed view), the row locator is the clustered index key. If the clustered index is not unique, SQL Server adds a uniqueifier (a 4-byte hidden value) to ensure uniqueness in nonclustered indexes.

SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

Key Lookup / Clustered Index Structures

In SQL Server, indexes are organized as B-trees.
  • Each page in an index B-tree is called an index node.
  • The top node is the root node.
  • The bottom level is the leaf nodes.
  • Any levels between root and leaf are intermediate levels.
  • In a clustered index, the leaf nodes contain the data pages of the table.
  • The root and intermediate nodes contain index pages with index rows.

Each index row contains a key and a pointer to:
  • Either an intermediate-level page,
  • Or a data row in the leaf level.

Pages in each level of the index are linked in a doubly-linked list.

An SQL index is a special lookup table that the database search engine can use to speed up data retrieval. It's like an alphabetical list of terms, with pointers to where they can be found.

  • Read Performance Improvement: When you ask MySQL to find something, it first checks if there's an index that can help. If there is, it uses the index to quickly pinpoint the location of the data, instead of scanning through every single record. This is how it dramatically improves "read" (SELECT) performance.
    • How MySQL Converts SQL to Native and Fetches Records: When you write a SQL query like SELECT * FROM feature WHERE name = "Card View";, MySQL goes through several steps:
      • Parsing: MySQL first reads and understands your SQL query. It checks for syntax errors.
      • Optimization: This is where the magic happens! MySQL's "optimizer" (a smart piece of software) looks at your query and the available indexes. It tries to figure out the fastest way to get the data. It considers:
        • Which indexes exist?
        • How selective is the WHERE clause (how many rows will it likely match)?
        • What's the cost of using an index versus a full table scan?
        • Based on these factors, the optimizer generates an "execution plan." This plan is like a detailed instruction manual for fetching the data (you can inspect it yourself with EXPLAIN, as sketched after this list).
      • Execution: MySQL then follows the execution plan.
        • Without an Index (Table Scan): If there's no suitable index, MySQL will perform a "full table scan." This means it literally goes through every single row in the feature table, one by one, checking if name equals "Card View". This is slow for large tables.
        • With an Index (Index Lookup): If there's an index on the name column, MySQL will:
          • Go to the index.
          • Quickly find "Card View" in the index (indexes are typically stored in a highly organized way, like a B-Tree, which allows for very fast lookups).
          • The index entry for "Card View" contains "pointers" (or direct references) to the actual rows in the feature table where "Card View" exists.
          • MySQL then directly jumps to those specific rows to retrieve the data. This is significantly faster.
  • Write Performance (Lack of Performance): While indexes are great for reading, they come with a cost for writing (INSERT, UPDATE, DELETE).
    • Whenever you add, change, or remove a record, MySQL not only has to update the actual data in the table but also needs to update all the indexes that involve those columns.
    • This extra work means that INSERT, UPDATE, and DELETE operations can be slower on tables with many indexes. It's a trade-off: faster reads for slightly slower writes.
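A minimal sketch of inspecting the optimizer's work for the feature query used above (EXPLAIN ANALYZE requires MySQL 8.0.18+):

-- Show the chosen plan and cost estimates without executing the query
EXPLAIN FORMAT=JSON SELECT * FROM feature WHERE name = 'Card View';

-- Execute the plan and report actual row counts and timings
EXPLAIN ANALYZE SELECT * FROM feature WHERE name = 'Card View';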

Here are the possible steps involved in SQL indexing, from planning to maintenance:

I. Planning and Design (Before Creation)

  1. Identify Candidate Columns:

    • Frequently queried columns: Columns used in WHERE, JOIN, ORDER BY, and GROUP BY clauses are prime candidates.
    • Columns with high selectivity: Columns with a wide range of distinct values (e.g., product_id, customer_name) are better candidates than columns with few distinct values (e.g., gender).
    • Foreign key columns: Indexing foreign key columns significantly improves join performance.
    • Primary keys: A clustered index is often automatically created on primary key columns, as they uniquely identify rows and are frequently used for lookups.
    • Columns with low NULL values: Indexes generally work better on columns that don't contain many NULL values.
  2. Understand Query Patterns:

    • What kind of queries are being run? Are they mostly SELECT statements? Are there many INSERT, UPDATE, or DELETE operations? Indexes improve SELECT performance but can slow down write operations due to the overhead of maintaining the index structure.
    • What are the most critical queries in terms of performance? Focus on optimizing these first.
  3. Choose the Right Index Type:

    • Clustered Index:
      • Determines the physical order of data in the table.
      • A table can have only one clustered index.
      • Best for range queries or when data needs to be retrieved in a specific sorted order.
      • Often created automatically on the primary key.
    • Non-Clustered Index:
      • Does not affect the physical order of data.
      • Creates a separate data structure with pointers to the actual data rows.
      • A table can have multiple non-clustered indexes.
      • Good for improving performance on queries that filter or sort based on columns not included in the clustered index.
    • Unique Index:
      • Ensures that all values in the indexed column(s) are unique.
      • Can be clustered or non-clustered.
      • Used for data integrity as well as performance.
    • Composite/Multi-column Index:
      • Created on two or more columns.
      • Useful for queries that filter or sort on multiple columns together. The order of columns in the composite index matters.
    • Filtered Index (SQL Server specific):
      • Indexes a subset of rows in a table, based on a WHERE clause.
      • Reduces index size and maintenance overhead for specific queries.
    • Other specialized index types: (e.g., Full-text indexes for searching text, spatial indexes for geographic data, XML indexes for XML data).
  4. Consider the Trade-offs:

    • Storage Space: Indexes require additional disk space.
    • Write Performance: INSERT, UPDATE, and DELETE operations can be slower as the index needs to be updated.
    • Maintenance Overhead: Indexes need to be maintained (reorganized/rebuilt) over time to reduce fragmentation.

II. Creation of Indexes

  1. Syntax: Use the CREATE INDEX statement. The basic syntax is:
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX index_name
ON table_name (column1 [ASC|DESC], column2 [ASC|DESC], ...);
  • index_name: A descriptive name for your index.
  • table_name: The table on which the index is to be created.
  • column1, column2, ...: The columns to include in the index. You can specify ascending (ASC) or descending (DESC) order.
  2. Execution: Run the CREATE INDEX command in your database management system (DBMS). This process can take time for large tables, especially if it's a clustered index (as it involves physically reordering the data).

III. Monitoring and Maintenance (After Creation)

  1. Monitor Index Usage:
  • Use database-specific tools or Dynamic Management Views (DMVs) (e.g., sys.dm_db_index_usage_stats in SQL Server) to track how frequently indexes are being used by queries.
  • Identify unused or underutilized indexes.
  2. Analyze Query Execution Plans:
  • Use EXPLAIN (PostgreSQL, MySQL) or SHOW PLAN (SQL Server) to see how the database optimizer is using (or not using) your indexes. This helps in understanding if your indexes are effective.
  3. Identify and Address Fragmentation:
  • Over time, as data is inserted, updated, and deleted, indexes can become fragmented. Fragmentation can degrade query performance.
  • Monitor fragmentation levels (e.g., sys.dm_db_index_physical_stats in SQL Server).
  4. Perform Index Maintenance:
  • Reorganize Index: A lighter operation that defragments the index without rebuilding it from scratch. It reorders the leaf-level pages of the index. (e.g., ALTER INDEX ... REORGANIZE in SQL Server).
  • Rebuild Index: A more intensive operation that drops and re-creates the index. This removes fragmentation completely and can improve performance more significantly. (e.g., ALTER INDEX ... REBUILD in SQL Server). Automate these tasks using scheduled jobs or maintenance plans. (Rough MySQL/InnoDB equivalents are sketched after this list.)
  5. Remove Unnecessary Indexes:
  • If an index is not being used or is providing little to no performance benefit, it's best to remove it to reduce storage overhead and improve write performance.
  • Use the DROP INDEX statement:
DROP INDEX index_name ON table_name; -- SQL Server
DROP INDEX index_name; -- PostgreSQL, Oracle
ALTER TABLE table_name DROP INDEX index_name; -- MySQL
  6. Review and Re-evaluate:
  • Regularly review your indexing strategy as your data and query patterns evolve.
  • Adjust indexes as needed based on performance analysis and business requirements.
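The ALTER INDEX ... REORGANIZE / REBUILD commands above are SQL Server syntax. Rough MySQL/InnoDB counterparts, sketched against the Employees table used on this page:

ANALYZE TABLE Employees;             -- refresh index statistics only (cheap)
OPTIMIZE TABLE Employees;            -- InnoDB maps this to a full table/index rebuild plus analyze
ALTER TABLE Employees ENGINE=InnoDB; -- equivalent explicit rebuild of the table and its indexes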

By following these steps, you can effectively implement and manage SQL indexes to significantly improve the performance of your database applications.


How it's fetched in MySQL (Huge Records vs. 10 Records):

  • Huge Records (e.g., 100,000,000 records):
    • Without Index: Finding a specific record would be like searching for a needle in a massive haystack. MySQL would have to check millions of rows, taking a very long time (seconds, minutes, or even longer).
    • With Index: The index acts like a detailed map. MySQL uses the map to go directly to the handful of locations where the "needle" might be. The difference in speed is enormous.
  • 10 Records:
    • Without Index: MySQL would quickly scan all 10 records. It's so few that the overhead of using an index might actually be slower than just scanning them directly.
    • With Index: MySQL would still use the index if it exists and is deemed beneficial by the optimizer. However, the performance difference compared to a full scan would be negligible, or sometimes even slightly worse due to the overhead of going through the index structure. This highlights an important point: Indexes are most beneficial for large tables. For very small tables, they might not offer much, or could even slightly decrease performance due to their overhead.

Does Indexing Create a New Table?

No, an index does not create a completely separate, visible "new table" that you can query directly like SELECT * FROM idx_name_feature;.

Instead, an index is a separate data structure that MySQL manages internally alongside your main table. Think of it as:

  • Main Table (feature): Contains all your actual data rows.
  • Index (idx_name_feature): Contains a sorted list of the Name values, each pointing to the specific row(s) in the feature table where that Name appears.

How to View Indexes:

You can't "select" from an index directly, but you can see which indexes exist on a table:

-- To see indexes on your 'table_name' table
SHOW INDEX FROM table_name;

-- To see the full table definition including indexes
SHOW CREATE TABLE table_name;
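
Index metadata is also exposed through the INFORMATION_SCHEMA.STATISTICS view, which you can filter and join like any other table. A small sketch (replace 'table_name' with your table):

-- One row per indexed column, per index, on the given table.
SELECT TABLE_NAME, INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, NON_UNIQUE, INDEX_TYPE
FROM   INFORMATION_SCHEMA.STATISTICS
WHERE  TABLE_SCHEMA = DATABASE()
  AND  TABLE_NAME = 'table_name'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;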

Best Way to Check Indexing Performance (Using EXPLAIN ANALYZE)

The EXPLAIN ANALYZE command (available since MySQL 8.0.18; plain EXPLAIN shows only the optimizer's estimates, while ANALYZE also reports actual execution times) is your best friend for understanding how MySQL executes a query and whether it is using an index.

Larger Tables (e.g., the feature table below, 1,182 records)

Let's re-examine your examples:

Original Query (No Index on Name):

explain analyze select * from feature where name = "Cart View";
/*
    EXPLAIN
    -> Filter: (feature.Name = 'Cart View')  (cost=121 rows=118) (actual time=0.523..1.93 rows=5 loops=1)
     -> Table scan on feature  (cost=121 rows=1182) (actual time=0.135..1.35 rows=1182 loops=1)
*/

Explanation:

  • Table scan on feature: This is the key indicator! It means MySQL is reading every single row of the feature table (all 1182 rows, as rows=1182 in the actual metrics shows).
  • cost=121: This is an estimated cost (a unit for comparison, lower is better).
  • rows=1182: MySQL scanned all rows.
  • actual time=0.135..1.35: The first row came back after about 0.135 milliseconds and the full scan finished after about 1.35 milliseconds (the two numbers are time to first row and time to last row).
  • Filter: (feature.Name = 'Cart View'): After scanning, it then filters those 1182 rows to find the 5 that match "Cart View".

Creating the Index:

CREATE INDEX idx_name_feature ON feature (Name);

This command tells MySQL to build an index named idx_name_feature on the Name column of the feature table. MySQL will now create that separate, sorted data structure.

Query After Creating Index:

explain analyze select * from feature where name = "Cart View"; -- Index
/*
    EXPLAIN
    -> Index lookup on feature using idx_name_feature (Name='Cart View')  (cost=1.75 rows=5) (actual time=0.095..0.0985 rows=5 loops=1)
*/

Explanation:

  • Index lookup on feature using idx_name_feature: This is exactly what you want to see! It means MySQL used your newly created index.
  • cost=1.75: The estimated cost is drastically lower (1.75 vs. 121), indicating a much more efficient operation.
  • rows=5: MySQL only had to "look up" 5 rows in the index, which directly pointed to the 5 matching data rows.
  • actual time=0.095..0.0985: The actual time is significantly reduced (around 0.095 milliseconds, compared to over 1 millisecond for the table scan). This is a clear win!

Primary Key Example:

explain analyze select * from feature where id = 3; -- Primary key
/*
    EXPLAIN
    -> Rows fetched before execution  (cost=0..0 rows=1) (actual time=500e-6..500e-6 rows=1 loops=1)
*/

Explanation:

  • Rows fetched before execution: This indicates an extremely fast lookup. A primary-key comparison against a constant can match at most one row, so the optimizer resolves it during query optimization (a "const" lookup) rather than at execution time. Primary keys also correspond to the clustered index in MySQL's InnoDB engine: the table data is physically stored in primary-key order, making lookups by Id incredibly fast.
  • actual time=500e-6: This is 0.0005 milliseconds, practically instantaneous!

Indexing small tables: Indexes are generally not beneficial for tables with a small number of records.

For tables with very few records (like your company table with only 2 rows), MySQL's optimizer intelligently opts for a full table scan, as the overhead of using an index often outweighs the benefits.

Let's look at your company table example where the index wasn't used despite being created:

explain analyze select * from company where name = "code"; -- Records count 2

Before:
-> Filter: (company.Name = 'code')  (cost=0.45 rows=1) (actual time=0.688..0.688 rows=0 loops=1)
     -> Table scan on company  (cost=0.45 rows=2) (actual time=0.105..0.136 rows=2 loops=1)
After:
-> Filter: (company.Name = 'code')  (cost=0.45 rows=1) (actual time=0.0587..0.0587 rows=0 loops=1)
     -> Table scan on company  (cost=0.45 rows=2) (actual time=0.0465..0.0533 rows=2 loops=1)

The reason the index isn't used here, even "After" creating it, is because the company table only has 2 records!

MySQL's optimizer is smart. For a table with just 2 records, it's often faster for MySQL to simply scan those 2 records directly from the table (a "table scan") than to go through the overhead of:

  • Looking up the index.
  • Traversing the index structure.
  • Following the pointer(s) from the index to the actual data rows.
  • Following the pointer(s) from the index to the actual data rows.

The cost of using the index, even if minimal, can outweigh the benefit for such a tiny table. The identical cost=0.45 and rows=2 in your EXPLAIN ANALYZE output "Before" and "After" show that MySQL decided a full table scan was the most efficient approach for a table of that size. The actual time is slightly faster "After", likely due to caching or other minor factors, but it is still fundamentally doing a table scan.

When to consider indexing: While there's no fixed number, indexes typically become useful when tables have hundreds to thousands of records or more, especially if queries frequently filter or join on specific columns.
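
One rough way to see this threshold for yourself is to grow the tiny company table and re-run the plan. The sketch below is hypothetical: it assumes MySQL 8.0 (for recursive CTEs) and that company has an auto-increment Id plus a Name column; the exact plan output will differ on your system.

-- Allow a deeper recursive CTE so we can generate ~10,000 placeholder rows.
SET SESSION cte_max_recursion_depth = 10000;

INSERT INTO company (Name)
WITH RECURSIVE seq (n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < 10000
)
SELECT CONCAT('company_', n) FROM seq;

-- Index the column we filter on, then refresh optimizer statistics.
CREATE INDEX idx_company_name ON company (Name);
ANALYZE TABLE company;

-- With ~10,000 rows the plan should switch from "Table scan on company"
-- to "Index lookup on company using idx_company_name".
EXPLAIN ANALYZE SELECT * FROM company WHERE Name = 'company_42';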

MySQL Indexing Commands Explained

-- Replace 'company' with your actual table name if different.
-- Replace 'idx_code' with your desired index name.
-- Replace 'code' with the actual column name you want to index.

-- 1. Display existing indexes on the 'company' table.
SHOW INDEXES FROM company;

-- 2. Display the full table definition, which includes index definitions.
SHOW CREATE TABLE company;

-- 3. Create an index on the 'code' column of the 'company' table.
--    (Run this only if you want to create the index)
CREATE INDEX idx_code ON company(code);

-- 4. Drop the previously created index (or an existing one).
--    (Run this only if you want to drop the index)
DROP INDEX idx_code ON company;

-- 5. Show the execution plan for a full table scan.
EXPLAIN SELECT * FROM company;

-- 6. Update statistics for the 'company' table.
--    (Always a good idea after significant data changes or index creation/deletion)
ANALYZE TABLE company;

-- 7. Show the actual execution plan and performance metrics for a query.
EXPLAIN ANALYZE SELECT * FROM company WHERE name = "code";

-- 8. Show the actual execution plan, forcing the use of a specific index.
--    (Useful for testing if an index improves performance or to see overhead)
EXPLAIN ANALYZE SELECT * FROM company FORCE INDEX (idx_code) WHERE name = 'code';

Here's a quick guide to understanding your indexing operations:

  • SHOW INDEXES FROM company;
    • What it does: Displays all indexes currently defined on the company table.
    • Usefulness: Essential to verify whether an index exists, whether it is unique, which columns it covers, and its index type (e.g., BTREE).
  • CREATE INDEX idx_code ON company(code);
    • What it does: Creates a secondary (non-clustered) index named idx_code on the code column of the company table.
    • Usefulness: Improves read performance for queries filtering or sorting by the code column.
  • DROP INDEX idx_code ON company;
    • What it does: Deletes the index named idx_code from the company table.
    • Usefulness: Used to remove unnecessary or poorly performing indexes, reducing write overhead and storage space.
  • EXPLAIN SELECT * from company;
    • What it does: Shows the execution plan that MySQL's optimizer intends to use for the SELECT * from company query.
    • Output you'd see: Likely Type: ALL (meaning a full table scan), as SELECT * without a WHERE clause usually doesn't benefit from indexes.
    • Usefulness: Helps predict query performance before execution.
  • ANALYZE TABLE company;
    • What it does: Collects and updates statistics about the company table (e.g., number of rows, distinct values in columns).
    • Usefulness: Crucial for the MySQL optimizer! Up-to-date statistics help the optimizer make better decisions about which index (if any) to use, or if a table scan is more efficient. This is often "missed" but very important for index effectiveness. Run this after significant data changes.
  • EXPLAIN ANALYZE SELECT * from company where name = "code";
    • What it does: Executes the SELECT query and then shows the actual execution plan along with real-time performance metrics (like actual time and rows processed).
    • Output you'd typically see for a small table (like 2 records): Likely Table scan because the optimizer determines it's faster to just read all 2 records than use an index.
    • Usefulness: Provides detailed, real-world insight into how a query performed and if indexes were truly utilized.
  • EXPLAIN ANALYZE SELECT * FROM company FORCE INDEX (idx_code) WHERE name = 'code';
    • What it does: Executes the SELECT query, but forces MySQL to use the idx_code index for the company table, even if the optimizer thinks a table scan would be better.
    • Output you'd see: For a 2-record table, the actual time will be similar to, or even slightly higher than, a plain table scan. Note also that idx_code covers the code column while this query filters on name, so the index cannot actually satisfy the predicate; FORCE INDEX merely tells the optimizer to avoid a table scan if the index is usable. Either way, it demonstrates why the optimizer usually avoids indexes on tiny tables.
    • Usefulness: A powerful diagnostic tool to test the performance of a specific index. You use this when you suspect MySQL's optimizer is making a wrong decision, or to confirm the overhead of an index on a small table.

Key Takeaway: Indexes are a performance optimization for large datasets. For very small tables, the optimizer will often correctly determine that a full table scan is more efficient.
