Index - gusenov/kb GitHub Wiki

  • Quora / What is indexed data structure?

    • index means position/location of a data
  • IGI Global / What is Indexing Structure

    • A data structure that speed the search operations on a database. Creating and Maintaining an index structure has an associated cost of extra writes and storage space. This cost is amortized at the search phase, where querying using an index is faster than searching every row in a database.
  • How does indexing work in a database?

    • Database Indexing allows us to cut down the number of rows/records that need to be examined when a select query with a where clause is executed.
    • An index is a data structure that stores the values for a certain specific column of a table and helps us avoid a full table scan.
    • Data structures for indexing
      • B-trees Time-efficient (logarithmic time) for lookups, deletions, and insertions. Data that is stored inside of a B-tree can be sorted.
      • Hash Tables Extremely efficient for looking up values. However, hash tables are not sorted data structures.
      • R-tree Used with spatial databases. It helps with queries like "find all the coffee shops within 2 miles of my location".
      • Bitmap Index Works for columns that have many instances of those values, i.e., columns with low selectivity. For example, a column with boolean values.
    • Cost of having a database index
      • Index takes up additional space, so the larger the table, the bigger the index.
      • Every time you perform an add, delete, or update operation, the same operation will need to be performed on the index as well.
  • Chartio

  • The Data School by Chartio by Matt David

    • SQL Optimization: Improve Queries and Create Models
      • Query Optimizations
      • Column and Table Optimizations
        • Indexing
          • Indexing makes columns faster to query by creating pointers to where data is stored within a database.
          • An index is a structure that holds the field the index is sorting and a pointer from each record to their corresponding record in the original table where the data is actually stored.
          • Types of Indexing
            • Clustered indexes are the unique index per table that uses the primary key to organize the data that is within the table. The clustered index ensures that the primary key is stored in increasing order, which is also the order the table holds in memory.
            • Non-clustered indexes are sorted references for a specific field, from the main table, that hold pointers back to the original entries of the table.
      • Modeling Data
      • Databases
  • 4 Index Data Structures A Data Engineer Must Know

    • index structure is formed, independent of the data structure, which accelerates the search for certain fields
    • Based on these pointers, the database management system can then find the data using a search algorithm.
    • Index Structure Types
      • Clustered, non-clustered
      • Internal, external
      • Dynamic, static
      • Sparse, dense
      • Single (Sorted Field, Hash Table, B-Tree, Binary Tree, B*-Tree), multidimensional (R-Tree, Bitmap Index, Grid File, K-D-Tree, Range Tree)
  • GURU99 / Indexing in DBMS: What is, Types of Indexes with EXAMPLES

    • An Index is a small table having only two columns. The first column comprises a copy of the primary or candidate key of a table. Its second column contains a set of pointers for holding the address of the disk block where that specific key value stored.
    • An index –
      • Takes a search key as input
      • Efficiently returns a collection of matching records.
    • Types of Indexing in DBMS
      • Primary Index is an ordered file which is fixed length size with two fields. The first field is the same a primary key and second, filed is pointed to that specific data block.
        • Dense Index records contain search key value and points to the real record on the disk.
        • Sparse Index record that appears for only some of the values in the file.
      • Clustering records themselves are stored in the Index and not pointers.
      • Secondary Index (non-clustering index) can be generated by a field which has a unique value for each record, and it should be a candidate key.
        • Index record is a record point to a bucket that contains pointers to all the records with their specific search-key value.
      • Multilevel Index is created when a primary index does not fit in memory
      • B-Tree Index is a multilevel format of tree based indexing in DBMS technique which has balanced binary search trees. All leaf nodes of the B tree signify actual data pointers.
  • Javatpoint

    • DBMS Tutorial
      • Indexing and B+ Tree
        • Indexing in DBMS
          • Index structure:
            • Search key contains a copy of the primary key or candidate key of the table. The values of the primary key are stored in sorted order so that the corresponding data can be accessed easily.
            • Data Reference contains a set of pointers holding the address of the disk block where the value of the particular key can be found.
          • Indexing methods
            • Ordered indices
            • Primary Index created on the basis of the primary key of the table
              • Dense index contains an index record for every search key value in the data file.
              • Sparse index instead of pointing to each record in the main table, the index points to the records in the main table in a gap.
            • Clustering Index records which have similar characteristics are grouped, and indexes are created for these group.
            • Secondary Index to reduce the size of mapping, another level of indexing is introduced. In this method, the huge range for the columns is selected initially so that the mapping size of the first level becomes small. Then each range is further divided into smaller ranges. The mapping of the first level is stored in the primary memory, so that address fetch is faster. The mapping of the second level and actual data are stored in the secondary memory (hard disk).
  • OptimizDBA / Understanding The Techniques Of Database Indexing

    • Types of Indexing
      • Clustered Indexing to reduce the expense of search because numerous records connected to the same topic are stored in one location
      • Non-Clustered Indexing (Secondary Indexing) indicates the location of the data, i.e., it provides us with a list of virtual pointers to the location in which the data is.
      • Multilevel Indexing separates an entire block into smaller blocks so that the Index can be saved in one block. These blocks can then be separated into inner blocks, which can be subsequently mapped towards those containing data. It is then saved in the main memory, with less overhead.
  • What Are the Types of Indexes in a Relational Database? by Radu Gheorghiu

    • A database index is an additional data structure created on top of the data in a table.
    • Types of Indexes
      • From the point of view of the characteristics of the index attribute:
        • Primary Index
        • Clustered Index defines the order in which data is physically stored on DATA PAGES and implicitly in the table.
        • Secondary Index
      • From the point of view of the number of index references to a data file:
        • Dense Index
        • Sparse Index
      • Specialized indexes for highly specific scenarios:
        • Bitmap Index is ideal when you have to query and filter a table on a column with a small number of distinct values compared to the total number of rows in the table.
        • Reverse Index is optimized to search for data in descending order.
        • Hash Index
        • Filtered Index
        • Function-based Index
        • Spatial Index
    • Data Structure
      • Balanced Tree
      • Hash
    • PostgreSQL Indexes
      • B-tree index is used to search for equality and range comparisons in columns that can be sorted.
      • hash index used when simple equality comparisons are needed
      • GiST used in finding the “nearest neighbor” in geometric data types.
      • SP-GiST is based on different data structures such as quadtrees, k-d trees, and radix trees and is used in similar scenarios as GiST.
      • GIN is also called an “inverted index.” This type of index is used in scenarios in which the data is formed by an array. The inverted index contains a separate entry for each component value of the array.
      • BRIN stands for “Block Range Index.” It’s used to store summaries of values in consecutive physical data pages within a table. They are best suited when the values in their rows are correlated with the physical order of the data pages.
    • Oracle Indexes
      • B-tree is great for primary keys and columns with a very large number of distinct values compared to the total number of rows.
      • Bitmap you want to use them when the number of distinct values in a column is very small compared to the total number of rows.
      • function-based index is a type of index in which the value stored in the search tree is defined by a function.
    • SQL Server Indexes
      • clustered index physically reorganizes rows in the data pages so that they are sorted (ascending or descending).
      • nonclustered index is the equivalent of the standard b-tree index in other database engines. It is generally great for searching through data with a very large number of distinct values.
      • Filtered indexes are created on specific subsets of data. They can be used for optimizing searches for data with given criteria when the data is skewed.
  • Coding Ninjas – Learn coding online at India’s best coding institute

  • Tutorials Point

  • Become a software engineer at a product-based company

Wikipedia

Stack Exchange

Indexes

Tree-based indexes

B-tree

  • Evolution of tree data structures for indexing: more exciting than it sounds by Dmitry Dolgov

    • Whenever we speak about indexes, especially in PostgreSQL context, there is a lot to talk about: B-tree, Hash, GiST, SP-GiST, GIN, BRIN, RUM.
  • Postgres Indexes Under the Hood by Russell Cohen

    • B-Trees are a generalization of binary search trees that allow nodes to have more than 2 children. The number of children per node is up to the implementer of the data structure. Databases typically have branching factors in the low thousands.
  • An Introduction to Database Indexing by Kyle Annen

    • B-Tree based indexes are created by default on any primary key, foreign key, and uniquely constrained fields.
    • A B-Tree is a tree data structure that self-balances and has a log(n) search complexity. This means that the search time through the tree is drastically reduced compared to a linear complexity search.
  • DZone / How Database B-Tree Indexing Works

    • B-tree is a data structure that provides sorted data and allows searches, sequential access, attachments, and removals in sorted order.
    • The B-tree is highly capable of storage systems that write large blocks of data.
    • The concept of Binary tree can be extended into a more generalized form, which is known as B-tree. Instead of having a single entry for a single node, B-tree uses an array of entries for a single node and having reference to child node for each of these entries.
    • B+tree is another data structure that used to store data, which looks almost the same as the B-tree. The only difference of B+tree is that it stores data on the leaf nodes. This means that all non-leaf node values are duplicated in leaf nodes again.
    • B-tree used for indexing and B+tree used to store the actual records. B+tree provides sequential search capabilities in addition to the binary search, which gives the database more control to search non-index values in a database.

Self-balancing BST (binary search trees)

Red-black trees

AVL trees

Star-tree

Skip list

  • Twitter / Reducing search indexing latency to one second by Nico Tonozzi and Dumitru Daniliuc
    • indexing latency, the amount of time it takes for new information to become available in the search index
    • The core of almost all search systems is a data structure called an inverted index. An inverted index is designed to quickly answer questions like "Which documents have the word cat in them?".
    • unrolled linked list is a linked list with multiple values per link
    • Skip lists support O(log n) lookups and insertions into a sorted set or map, and are relatively easy to adapt to support concurrency. A skip list has multiple levels, and each level stores elements in a linked list. The lowest level contains all of the elements, the next highest level contains some fraction of those elements, and so on, until we reach the highest level which contains just a few elements. If an element is stored at level N, then it is also stored at all levels 1, 2, …, N - 1. Each of the elements in a higher level contains a link to the equivalent element in a lower level. To find a given element, you start out at the highest level, and read along that linked list until you find an element that is greater than the one you are looking for. At that point, you descend one level and start examining elements in the more densely populated linked list.

Sorted array

PGM-index

  • The Piecewise Geometric Model index (PGM-index) is a data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes while providing the same worst-case query time guarantees.

ALEX (Adaptive Learned indEX)

DBMS

NoSQL

SQL

SQLite

  • SQLiteTutorial.net
    • Indexes
      • An index is an additional data structure that helps improve the performance of a query. Imagine an index in the database like an index of a book. By looking at the index, you can quickly identify page numbers based on the keywords.
      • You can consider a table as a list of pairs: (rowid, row).
      • Unlike a table, an index has an opposite relationship: (row, rowid).
      • Balanced tree for organizing indexes: B-tree keeps the amount of data at both sides of the tree balanced so that the number of levels that must be traversed to locate a row is always in the same approximate number.

PostgreSQL

MySQL

  • MySQL

  • How database indexing actually works internally?

    • The main point of having a index is to cut down the number of records/rows in a table which needs to be examined by the database, to speed up the searching queries.
    • database table does not reorder itself when we put index on a column to optimize the query performance
    • The major advantage of B-tree is that the data in it is sortable. Along with it, B-Tree data structure is time efficient and operations such as searching, insertion, deletion can be done in logarithmic time.
    • hash table is only good for looking up key value pairs
    • R-tree data structure and these are used to help with spatial problems such as "find all bus stand within 5KM of me"
    • Bit map index works well on column that contain Boolean values
    • Indexing makes "reads faster but writes slower". Every write, update, insert operation to the table, will have to be done to the index also.
    • MySQL put indexes in memory.
    • To confirm, index is actually being considered or not, add "EXPLAIN" before the SQL query and check the "type" and "rows", which database is looking at to fetch the result.
  • An in-depth look at Database Indexing

    • Performance is extremely important in many consumer products like e-commerce, payment systems, gaming, transportation apps, and so on.
    • An index maps search keys to corresponding data on disk by using different in-memory & on-disk data structures.
    • The most important part is to understand what to index & how the indexing is going to boost the query response time. A proper index can be created only when you know exactly what your query & data access patterns look like. Mostly an index is created on the columns specified in the WHERE clause of a query as the database retrieves & filters data from the tables based on those columns. If you don’t create an index, the database scans all the rows, filters out the matching rows & returns the result. With millions of records, this scan operation may take many seconds & this high response time makes APIs & applications slower & unusable.
  • Medium

    • Indexes in MySQL by Ilkin Guluzada
      • Indexes are a type of table that keep a primary key or index field and a pointer to each record into the actual table.
      • The users cannot see the indexes, they are just used to speed up queries and will be used by the Database Search Engine to locate records very fast.

Oracle

Microsoft SQL Server

IBM Db2

Books

Courses

Papers

  • The next 50 years in database indexing or: the case for automatically generated index structures by Jens Dittrich, Joris Nix, Christian Schön
    • Idea of "inventing an index" is a misleading concept. It is the analogue of "inventing a physical query plan".
    • Almost all index structures are assembled along three principal dimensions:
      • (1) structural building blocks, e.g., a B-tree is assembled from two different structural node types (inner and leaf nodes),
      • (2) a couple of invariants, e.g., for a B-tree all paths have the same length,
      • and (3) decisions on the internal layout of nodes (row or column layout, etc.).