Concepts Discussion
Java Architecture and Garbage Collectors:
The JDK contains the JRE, and the JRE contains the JVM.
https://www.geeksforgeeks.org/jvm-works-jvm-architecture/
Types:
- Reference counting
- Mark and sweep
- Mark, sweep and compact
- Mark and copy
- Generational GC
Algorithm:
Serial GC: single threaded; suited to small heap sizes.
Parallel GC: uses multiple threads; the default in Java 8; also known as the Throughput collector.
Concurrent Mark Sweep (CMS): most of the marking and sweeping runs concurrently with the application, so the application pauses only briefly (during the initial-mark and remark phases).
**G1 GC:** for large heap sizes; it divides the heap into many small regions and marks/sweeps them region by region; the default from Java 9.
ZGC: pause times stay under roughly 10 ms regardless of heap size; handles heaps from a few MB up to 16 TB; available from Java 11 onwards.
https://www.baeldung.com/java-classnotfoundexception-and-noclassdeffounderror
https://medium.com/@kasunpdh/garbage-collection-how-its-done-d48135c7fe77
https://www.geeksforgeeks.org/types-of-jvm-garbage-collectors-in-java-with-implementation-details/
How can you request garbage collection (both calls are only hints; the JVM may ignore them):
System.gc();
Runtime.getRuntime().gc();
Searching:
Linear Search: time complexity O(n), space complexity O(1)
public static int linearSearch(int arr[], int elementToSearch) {
for (int index = 0; index < arr.length; index++) {
if (arr[index] == elementToSearch)
return index;
}
return -1;
}
Linear Search can be used for searching in a small and unsorted set of data which is guaranteed not to increase in size by much.
It is a very basic search algorithm but due to its linear increase in time complexity, it does not find application in many production systems.
Binary search (the array must already be sorted):
time complexity O(log n), space complexity O(1)
public static int binarySearch(int arr[], int elementToSearch) {
int firstIndex = 0;
int lastIndex = arr.length - 1;
// termination condition (element isn't present)
while(firstIndex <= lastIndex) {
int middleIndex = (firstIndex + lastIndex) / 2;
// if the middle element is our goal element, return its index
if (arr[middleIndex] == elementToSearch) {
return middleIndex;
}
// if the middle element is smaller
// point our index to the middle+1, taking the first half out of consideration
else if (arr[middleIndex] < elementToSearch)
firstIndex = middleIndex + 1;
// if the middle element is bigger
// point our index to the middle-1, taking the second half out of consideration
else if (arr[middleIndex] > elementToSearch)
lastIndex = middleIndex - 1;
}
return -1;
}
It is the most commonly used search algorithm in libraries. Binary search trees, which keep their data sorted, are also the basis of many other data structures.
Binary Search is also implemented in Java APIs in the Arrays.binarySearch method.
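For reference, a minimal usage sketch of Arrays.binarySearch (the array and values here are illustrative; the array must already be sorted):

```java
import java.util.Arrays;

public class BinarySearchDemo {
    public static void main(String[] args) {
        int[] numbers = {11, 22, 35, 40, 57};   // must be sorted before calling binarySearch
        System.out.println(Arrays.binarySearch(numbers, 40));  // 3  (index of the element)
        // For a missing element the method returns (-(insertion point) - 1).
        System.out.println(Arrays.binarySearch(numbers, 30));  // -3
    }
}
```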
sorting:
Selection sort:
class SelectionSort
{
void sort(int arr[])
{
int n = arr.length;
// One by one move boundary of unsorted subarray
for (int i = 0; i < n-1; i++)
{
// Find the minimum element in unsorted array
int min_idx = i;
for (int j = i+1; j < n; j++)
if (arr[j] < arr[min_idx])
min_idx = j;
// Swap the found minimum element with the first
// element
int temp = arr[min_idx];
arr[min_idx] = arr[i];
arr[i] = temp;
}
}
// Prints the array
void printArray(int arr[])
{
int n = arr.length;
for (int i=0; i<n; ++i)
System.out.print(arr[i]+" ");
System.out.println();
}
// Driver code to test above
public static void main(String args[])
{
SelectionSort ob = new SelectionSort();
int arr[] = {64,25,12,22,11};
ob.sort(arr);
System.out.println("Sorted array");
ob.printArray(arr);
}
}
time complexity: O(n²), space complexity: O(1)
In-place sorting (only O(1) extra space is used): Selection sort, Insertion sort.
Stable sorting: the relative order of duplicate keys is the same before and after sorting, e.g. Insertion sort, Merge sort.
Unstable sorting: e.g. Quick sort and Heap sort.
Bubble Sort:
class BubbleSort
{
void bubbleSort(int arr[])
{
int n = arr.length;
for (int i = 0; i < n-1; i++)
for (int j = 0; j < n-i-1; j++)
if (arr[j] > arr[j+1])
{
// swap arr[j+1] and arr[j]
int temp = arr[j];
arr[j] = arr[j+1];
arr[j+1] = temp;
}
}
/* Prints the array */
void printArray(int arr[])
{
int n = arr.length;
for (int i=0; i<n; ++i)
System.out.print(arr[i] + " ");
System.out.println();
}
// Driver method to test above
public static void main(String args[])
{
BubbleSort ob = new BubbleSort();
int arr[] = {64, 34, 25, 12, 22, 11, 90};
ob.bubbleSort(arr);
System.out.println("Sorted array");
ob.printArray(arr);
}
}
time complexity: O(n²), space complexity: O(1)
Insertion sort:
void sort(int arr[])
{
int n = arr.length;
for (int i = 1; i < n; ++i) {
int key = arr[i];
int j = i - 1;
/* Move elements of arr[0..i-1], that are
greater than key, to one position ahead
of their current position */
while (j >= 0 && arr[j] > key) {
arr[j + 1] = arr[j];
j = j - 1;
}
arr[j + 1] = key;
}
}
time complexity: O(n²), space complexity: O(1)
Quick Sort:
https://www.youtube.com/watch?v=7h1s2SojIRw
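Since only a video link is given above, here is a minimal quicksort sketch using the Lomuto partition scheme (class and method names are my own). Time complexity is O(n log n) on average and O(n²) in the worst case; it sorts in place and is unstable.

```java
class QuickSort {
    // Sorts arr[low..high] in place, using the last element as the pivot (Lomuto partition).
    static void quickSort(int[] arr, int low, int high) {
        if (low < high) {
            int p = partition(arr, low, high);
            quickSort(arr, low, p - 1);   // elements smaller than the pivot
            quickSort(arr, p + 1, high);  // elements greater than or equal to the pivot
        }
    }

    // Places the pivot at its final position and returns that position.
    static int partition(int[] arr, int low, int high) {
        int pivot = arr[high];
        int i = low - 1;                  // boundary of the "smaller than pivot" region
        for (int j = low; j < high; j++) {
            if (arr[j] < pivot) {
                i++;
                int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
            }
        }
        int tmp = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = tmp;
        return i + 1;
    }

    public static void main(String[] args) {
        int[] arr = {10, 7, 8, 9, 1, 5};
        quickSort(arr, 0, arr.length - 1);
        System.out.println(java.util.Arrays.toString(arr));   // [1, 5, 7, 8, 9, 10]
    }
}
```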
Merge sort:
time complexity: O(n log n), space complexity: O(n)
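A minimal top-down merge sort sketch matching the complexities above (names are my own); the temporary array in merge() accounts for the O(n) space.

```java
class MergeSort {
    // Recursively sorts arr[left..right] by splitting it in half and merging the sorted halves.
    static void mergeSort(int[] arr, int left, int right) {
        if (left >= right) return;
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }

    // Merges the two sorted runs arr[left..mid] and arr[mid+1..right].
    static void merge(int[] arr, int left, int mid, int right) {
        int[] tmp = new int[right - left + 1];   // the O(n) auxiliary space
        int i = left, j = mid + 1, k = 0;
        while (i <= mid && j <= right)
            tmp[k++] = (arr[i] <= arr[j]) ? arr[i++] : arr[j++];   // <= keeps the sort stable
        while (i <= mid) tmp[k++] = arr[i++];
        while (j <= right) tmp[k++] = arr[j++];
        System.arraycopy(tmp, 0, arr, left, tmp.length);
    }

    public static void main(String[] args) {
        int[] arr = {38, 27, 43, 3, 9, 82, 10};
        mergeSort(arr, 0, arr.length - 1);
        System.out.println(java.util.Arrays.toString(arr));   // [3, 9, 10, 27, 38, 43, 82]
    }
}
```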
Stack implementation using an array:
class Stack {
static final int MAX = 1000;
int top;
int a[] = new int[MAX]; // Maximum size of Stack
boolean isEmpty()
{
return (top < 0);
}
Stack()
{
top = -1;
}
boolean push(int x)
{
if (top >= (MAX - 1)) {
System.out.println("Stack Overflow");
return false;
}
else {
a[++top] = x;
System.out.println(x + " pushed into stack");
return true;
}
}
int pop()
{
if (top < 0) {
System.out.println("Stack Underflow");
return 0;
}
else {
int x = a[top--];
return x;
}
}
int peek()
{
if (top < 0) {
System.out.println("Stack Underflow");
return 0;
}
else {
int x = a[top];
return x;
}
}
void print(){
for(int i = top;i>-1;i--){
System.out.print(" "+ a[i]);
}
}
}
Queue implementation using an array:
class Queue {
private static int front, rear, capacity;
private static int queue[];
Queue(int c)
{
front = rear = 0;
capacity = c;
queue = new int[capacity];
}
// function to insert an element
// at the rear of the queue
static void queueEnqueue(int data)
{
// check queue is full or not
if (capacity == rear) {
System.out.printf("\nQueue is full\n");
return;
}
// insert element at the rear
else {
queue[rear] = data;
rear++;
}
return;
}
// function to delete an element
// from the front of the queue
static void queueDequeue()
{
// if queue is empty
if (front == rear) {
System.out.printf("\nQueue is empty\n");
return;
}
// shift all the remaining elements
// one position to the left (towards the front)
else {
for (int i = 0; i < rear - 1; i++) {
queue[i] = queue[i + 1];
}
// store 0 at rear indicating there's no element
if (rear < capacity)
queue[rear] = 0;
// decrement rear
rear--;
}
return;
}
// print queue elements
static void queueDisplay()
{
int i;
if (front == rear) {
System.out.printf("\nQueue is Empty\n");
return;
}
// traverse front to rear and print elements
for (i = front; i < rear; i++) {
System.out.printf(" %d <-- ", queue[i]);
}
return;
}
// print front of queue
static void queueFront()
{
if (front == rear) {
System.out.printf("\nQueue is Empty\n");
return;
}
System.out.printf("\nFront Element is: %d", queue[front]);
return;
}
}
The Deque interface represents a double-ended queue, a collection that can insert and delete elements at both ends. Two classes, ArrayDeque and LinkedList, implement the Deque interface, and either can be used to get deque behaviour.
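A small usage sketch of Deque with ArrayDeque (the values are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DequeDemo {
    public static void main(String[] args) {
        Deque<Integer> deque = new ArrayDeque<>();
        deque.addFirst(10);                      // insert at the front
        deque.addLast(20);                       // insert at the rear
        deque.addFirst(5);
        System.out.println(deque);               // [5, 10, 20]
        System.out.println(deque.pollFirst());   // 5  (delete from the front)
        System.out.println(deque.pollLast());    // 20 (delete from the rear)
        // ArrayDeque also works as a stack: push/pop operate on the head.
        deque.push(99);
        System.out.println(deque.pop());         // 99
    }
}
```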
Trees:
Breadth-first traversal (also called level order): uses a queue https://www.geeksforgeeks.org/level-order-tree-traversal/
void printLevelOrder()
{
Queue<Node> queue = new LinkedList<Node>();
queue.add(root);
while (!queue.isEmpty()) {
Node tempNode = queue.poll();
System.out.print(tempNode.data + " ");
/*Enqueue left child */
if (tempNode.left != null) {
queue.add(tempNode.left);
}
/*Enqueue right child */
if (tempNode.right != null) {
queue.add(tempNode.right);
}
}
}
void printInorder(Node node)
{
if (node == null)
return;
/* first recur on left child */
printInorder(node.left);
/* then print the data of node */
System.out.print(node.key + " ");
/* now recur on right child */
printInorder(node.right);
}
void printPreorder(Node node)
{
if (node == null)
return;
/* first print data of node */
System.out.print(node.key + " ");
/* then recur on left subtree */
printPreorder(node.left);
/* now recur on right subtree */
printPreorder(node.right);
}
void printPostorder(Node node)
{
if (node == null)
return;
// first recur on left subtree
printPostorder(node.left);
// then recur on right subtree
printPostorder(node.right);
// now deal with the node
System.out.print(node.key + " ");
}
BFS with Graph
https://www.geeksforgeeks.org/breadth-first-search-or-bfs-for-a-graph/
class Graph
{
private int V; // No. of vertices
private LinkedList<Integer> adj[]; //Adjacency Lists
// Constructor
Graph(int v)
{
V = v;
adj = new LinkedList[v];
for (int i=0; i<v; ++i)
adj[i] = new LinkedList();
}
// Function to add an edge into the graph
void addEdge(int v,int w)
{
adj[v].add(w);
}
// prints BFS traversal from a given source s
void BFS(int s)
{
// Mark all the vertices as not visited(By default
// set as false)
boolean visited[] = new boolean[V];
// Create a queue for BFS
LinkedList<Integer> queue = new LinkedList<Integer>();
// Mark the current node as visited and enqueue it
visited[s]=true;
queue.add(s);
while (queue.size() != 0)
{
// Dequeue a vertex from queue and print it
s = queue.poll();
System.out.print(s+" ");
// Get all adjacent vertices of the dequeued vertex s
// If a adjacent has not been visited, then mark it
// visited and enqueue it
Iterator<Integer> i = adj[s].listIterator();
while (i.hasNext())
{
int n = i.next();
if (!visited[n])
{
visited[n] = true;
queue.add(n);
}
}
}
    }
}
Height of tree
int maxDepth(Node node)
{
    if (node == null)
        return 0;
    else {
        /* compute the depth of each subtree */
        int lDepth = maxDepth(node.left);
        int rDepth = maxDepth(node.right);
        /* use the larger one */
        if (lDepth > rDepth)
            return (lDepth + 1);
        else
            return (rDepth + 1);
    }
}
You can also find the height with level-order traversal by counting the levels (for example, add a null marker to the queue at the end of each level and increment the count whenever a null is dequeued); a Java sketch follows the link below.
https://makeinjava.com/find-height-binary-tree-bfs-level-order-traversal-example/
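A sketch of that level-order approach, written against the same Node class and java.util.Queue/LinkedList imports as the tree examples above; instead of pushing a null per level it counts the nodes on each level, which gives the same result:

```java
int heightByLevelOrder(Node root) {
    if (root == null) return 0;
    Queue<Node> queue = new LinkedList<>();
    queue.add(root);
    int height = 0;
    while (!queue.isEmpty()) {
        int levelSize = queue.size();        // number of nodes on the current level
        height++;                            // one more level discovered
        for (int i = 0; i < levelSize; i++) {
            Node node = queue.poll();
            if (node.left != null) queue.add(node.left);
            if (node.right != null) queue.add(node.right);
        }
    }
    return height;
}
```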
REST API maturity model:
https://restfulapi.net/richardson-maturity-model/
Spring Basics:
https://www.interviewbit.com/spring-interview-questions/
The Spring IoC container is at the core of the Spring Framework. The container will create the objects, wire them together, configure them, and manage their complete life cycle from creation till destruction. The Spring container uses dependency injection (DI) to manage the components that make up an application.
Spring provides the following two types of containers:
- BeanFactory container
- ApplicationContext container
https://howtodoinjava.com/spring-core/different-spring-ioc-containers/
Types of ApplicationContext containers:
- FileSystemXmlApplicationContext
- ClassPathXmlApplicationContext
- WebXmlApplicationContext
- ConfigurableApplicationContext
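A minimal sketch of bootstrapping one of these containers; the beans.xml file and the greetingService bean id are hypothetical:

```java
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class AppContextDemo {
    public static void main(String[] args) {
        // Loads bean definitions from beans.xml on the classpath, creates and wires the beans.
        try (ClassPathXmlApplicationContext context =
                     new ClassPathXmlApplicationContext("beans.xml")) {
            Object service = context.getBean("greetingService");   // hypothetical bean id
            System.out.println("Loaded bean: " + service);
        }   // closing the context destroys the singletons (end of their life cycle)
    }
}
```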
Spring Life Cycle:
https://reflectoring.io/spring-bean-lifecycle/
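A small sketch of the most common lifecycle callbacks, assuming a Spring context with the javax.annotation (or jakarta.annotation in newer versions) API on the classpath; the class name is my own:

```java
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import org.springframework.stereotype.Component;

@Component
public class ConnectionManager {

    @PostConstruct
    public void init() {
        // Runs after dependency injection is complete, before the bean is put into service.
        System.out.println("Opening connections...");
    }

    @PreDestroy
    public void cleanup() {
        // Runs while the container shuts down, just before the bean is destroyed.
        System.out.println("Closing connections...");
    }
}
```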
Types of Spring bean scopes: singleton (the default), prototype, request, session, application, and websocket (the last four are web-aware scopes); a declaration example follows below.
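A quick sketch of declaring a scope (prototype here); ScopeConfig and ReportGenerator are hypothetical names:

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Scope;

@Configuration
public class ScopeConfig {

    public static class ReportGenerator { }   // hypothetical bean type, only for the demo

    // prototype: a new instance is returned for every getBean() call;
    // the default scope (singleton) would return the same instance every time.
    @Bean
    @Scope("prototype")
    public ReportGenerator reportGenerator() {
        return new ReportGenerator();
    }

    public static void main(String[] args) {
        try (AnnotationConfigApplicationContext ctx =
                     new AnnotationConfigApplicationContext(ScopeConfig.class)) {
            System.out.println(ctx.getBean(ReportGenerator.class) == ctx.getBean(ReportGenerator.class)); // false
        }
    }
}
```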
Kafka: https://www.youtube.com/watch?v=dq-ZACSt_gA https://www.youtube.com/watch?v=udnX21__SuU
https://www.youtube.com/watch?v=kDx8hZhvCQ0
https://www.youtube.com/watch?v=Ai4n_NcKLZQ&list=RDCMUC8OU1Tc1kxiI37uXBAbTX7A&index=6
https://www.oreilly.com/library/view/kafka-the-definitive/9781491936153/ch04.html
Kafka hierarchy: cluster → broker → topic → partition → offset
Within a consumer group, a partition is read by at most one consumer, but a single consumer can read from multiple partitions.
cleanup.policy can be one of two values: delete or compact.
- delete: messages are removed once the retention period (retention.ms) expires, whether or not they have been consumed. To re-read already-consumed messages you either reset the consumer group's offsets or use a new consumer group. Sending a partition key is not mandatory.
- compact: a key can have multiple values until the retention period is over; after that only the latest value per key is kept in the topic and the older values are deleted. Sending a partition key is mandatory for a compacted topic.
The concurrency attribute is the number of consumers created per application instance, so with 3 instances of the application the consumer group ends up with 3 × concurrency consumers.
auto.offset.reset can have 2 values: earliest or latest. It decides where a consumer starts reading when it has no committed offset (for example a brand-new consumer group, or after the committed offset has expired): latest starts from newly arriving messages, earliest starts from the oldest messages still on the topic.
The producer always sends data to the leader partition, and the data is then replicated to the replica partitions. acks can be set to "1", "all", etc.: with acks=1 the producer waits only for the leader partition's acknowledgement; with acks=all it waits for acknowledgement from at least the minimum number of in-sync replicas (min.insync.replicas).
batch-size: 16384, buffer-memory: 33554432
batch-size is the maximum size (in bytes) of a batch of records sent to a single partition; buffer-memory is the total memory (in bytes) the producer can use to buffer records waiting to be sent.
enable-auto-commit: when set to true, Kafka commits consumer offsets automatically, by default every 5 seconds (auto.commit.interval.ms).
linger-ms: how long the producer waits for additional records to accumulate so they can be sent together as a batch (an upper bound on the extra delay added for batching).
key-serializer and value-serializer: StringSerializer, the Avro serializer, etc., depending on how you are sending the message to the topic.
If you need a compact, schema-based binary format, go for the Avro serializer.
interceptor-classes: these interceptors gather metrics so producers and consumers can be monitored in Confluent Control Center.
yml configuration:
kafka:
  producer:
    topicname: topicname
    bootstrap-servers: localhost:9092
    acks: all
    batch-size: 16384
    buffer-memory: 33554432
    key-serializer: org.apache.kafka.common.serialization.StringSerializer
    value-serializer: org.apache.kafka.common.serialization.StringSerializer
    linger-ms: 1
    retries: 0
    interceptor-classes: io.confluent.monitoring.clients.interceptor.MonitoringProducerInterceptor
    ssl-enabled: false
    ssl-security-protocol: SSL
    ssl-truststore-location: src/main/resources/truststore.jks
    ssl-truststore-password: ****
  consumer:
    bootstrap-servers: localhost:9092
    topic: topicname
    auto-commit-interval-ms: 500
    auto-offset-reset: earliest
    enable-auto-commit: true
    session-timeout-ms: 10000
    max-poll-records: 200
    max-poll-interval: 3000000
    group-id: groupid
    concurrency: 1
    ssl-enabled: false
    ssl-security-protocol: SSL
    ssl-truststore-location: src/main/resources/truststore.jks
    ssl-truststore-password: *****
producer config:
@Configuration
@Getter
@Setter
@ConfigurationProperties(prefix = "kafka.producer")
@Slf4j
public class KafkaProducerConfig {
private List<String> bootstrapServers;
private String acks;
private String retries;
private String batchSize;
private String lingerMs;
private String bufferMemory;
private String keySerializer;
private String valueSerializer;
private String interceptorClasses;
private boolean sslEnabled;
private String sslSecurityProtocol;
private String sslTruststoreLocation;
private String sslTruststorePassword;
private String topicName;
@Bean(name = "producer")
public KafkaProducer<String, String> kafkaProducer() {
return new KafkaProducer<>(getDefaultProperties());
}
private Properties getDefaultProperties() {
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, String.join(",", bootstrapServers));
props.put(ProducerConfig.ACKS_CONFIG, acks);
props.put(ProducerConfig.RETRIES_CONFIG, retries);
props.put(ProducerConfig.BATCH_SIZE_CONFIG, batchSize);
props.put(ProducerConfig.LINGER_MS_CONFIG, lingerMs);
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, bufferMemory);
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, keySerializer);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, valueSerializer);
props.put(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG, interceptorClasses);
setupSsl(props);
return props;
}
private void setupSsl(final Properties props) {
if (sslEnabled) {
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, sslSecurityProtocol);
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, sslTruststoreLocation);
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, sslTruststorePassword);
props.put("ssl.endpoint.identification.algorithm", StringUtils.EMPTY);
props.put("confluent.monitoring.interceptor.security.protocol", sslSecurityProtocol);
props.put("confluent.monitoring.interceptor.ssl.truststore.location", sslTruststoreLocation);
props.put("confluent.monitoring.interceptor.ssl.truststore.password", sslTruststorePassword);
}
    }
}
consumer config:
@Configuration
@ConfigurationProperties(prefix = "kafka.consumer")
@Getter
@Setter
@EnableKafka
@Slf4j
public class KafkaConsumerConfig {
private String bootstrapServers;
private String autoOffsetReset;
private int sessionTimeoutMs;
private Boolean enableAutoCommit;
private int autoCommitIntervalMs;
private int maxPollRecords;
private int maxPollInterval;
private String groupId;
private int concurrency;
private String topic;
private boolean sslEnabled;
private String sslSecurityProtocol;
private String sslTruststoreLocation;
private String sslTruststorePassword;
@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
ConcurrentKafkaListenerContainerFactory<String, String> factory =
new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory());
factory.getContainerProperties().setPollTimeout(maxPollInterval);
factory.setConcurrency(concurrency);
return factory;
}
@Bean
public ConsumerFactory<String, String> consumerFactory() {
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetReset);
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, enableAutoCommit);
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, autoCommitIntervalMs);
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, sessionTimeoutMs);
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, maxPollRecords);
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, maxPollInterval);
props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
setupSsl(props);
log.info("TopicName {}, concurrency {}, groupID {}", topic, concurrency, groupId);
return new DefaultKafkaConsumerFactory<>(props);
}
private void setupSsl(final Map<String, Object> props) {
if (sslEnabled) {
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, sslSecurityProtocol);
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, sslTruststoreLocation);
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, sslTruststorePassword);
props.put("ssl.endpoint.identification.algorithm", StringUtils.EMPTY);
props.put("confluent.monitoring.interceptor.security.protocol", sslSecurityProtocol);
props.put("confluent.monitoring.interceptor.ssl.truststore.location", sslTruststoreLocation);
props.put("confluent.monitoring.interceptor.ssl.truststore.password", sslTruststorePassword);
}
    }
}
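Putting the two configurations above to use could look roughly like this; the service class, topic name, key and value are placeholders, the producer bean comes from KafkaProducerConfig and the listener relies on the kafkaListenerContainerFactory bean from KafkaConsumerConfig:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class MessageService {

    @Autowired
    @Qualifier("producer")                        // the KafkaProducer bean defined above
    private KafkaProducer<String, String> producer;

    // Publish a record; the callback logs the partition/offset or the failure.
    public void publish(String key, String value) {
        producer.send(new ProducerRecord<>("topicname", key, value), (metadata, exception) -> {
            if (exception != null) {
                System.err.println("Publish failed: " + exception.getMessage());
            } else {
                System.out.println("Published to partition " + metadata.partition()
                        + " at offset " + metadata.offset());
            }
        });
    }

    // Consume using the listener container factory defined above; topic and group
    // come from the kafka.consumer section of the yml configuration.
    @KafkaListener(topics = "${kafka.consumer.topic}",
                   groupId = "${kafka.consumer.group-id}",
                   containerFactory = "kafkaListenerContainerFactory")
    public void listen(String message) {
        System.out.println("Received: " + message);
    }
}
```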
NOSQL:
CAP theorem: a distributed data store can guarantee at most two of Consistency, Availability, and Partition tolerance at the same time; since network partitions must be tolerated, systems are usually classified as CP or AP.
Types of NoSQL databases:
- Document databases: Couchbase (AP) and MongoDB (CP)
- Key-value databases: DynamoDB
- Graph databases: Neo4j
- Column-oriented databases: Cassandra (AP), HBase (CP)
Graph DB: Use cases include fraud detection, social networks, and knowledge graphs.
https://pramodshehan.medium.com/cassandra-architecture-fundamentals-cc617f12b957
- No masters and slaves (peer to peer), ring-type architecture
- Automatic data distribution across all nodes
- Replication of data across nodes
- Data is kept in memory and written to disk in a lazy fashion
- Hash values of the keys are used to distribute data among nodes
- Uses the gossip protocol for communication between the nodes
When a Cassandra client sends a read or write request to a node in the cluster, that node acts as the coordinator. The coordinator can change from request to request; it is selected by the Cassandra driver based on the policy you have set, most commonly DCAwareRoundRobinPolicy or TokenAwarePolicy.
Cassandra organizes data into partitions, a common concept in distributed data systems: all data is split into chunks called partitions. Partitioning is very important for performance and scalability. In Cassandra, partitioning is performed by hashing a column called the partition key.
Clustering columns order data within a partition. Each primary key column after the partition key is considered a clustering key.
The partition key is responsible for distributing data among nodes; data can be looked up efficiently only by its partition key.
Cassandra Write Operation
1. Write to the commit log.
2. Write to the memtable.
3. Send an acknowledgement to the client.
4. Once the configured limit is reached, flush the memtable and store the data on disk in an SSTable.
5. Data in the commit log is purged after the flush.
What happens if a Cassandra machine crashes before the data is flushed to an SSTable? Recent writes live only in the memtable, so if the machine crashed and was later restarted, the memtable contents would be lost. To avoid losing data, Cassandra uses the commit log: whenever the memtable changes, the change is also written to the commit log, and on restart the commit log is replayed to rebuild the memtable.
Snitch: the job of a snitch is to determine which data centers and racks should be written to and read from (relative host proximity). Types: Simple Snitch, Property File Snitch, Dynamic Snitch, Rack Inferring Snitch.
Replication strategy:
Simple Strategy: for a single data center and one rack.
Network Topology Strategy: used when you have multiple data centers; you can define how many replicas go in each data center.
Hinted Handoff: when a node becomes down or unresponsive, hinted handoff allows Cassandra to continue accepting writes without any problem. While the node is down, writes that belong to that node's key range are stored as hints for a period of time.
A hint consists of: the target id (the downed node), a hint id (a time UUID), a message id (the Cassandra version), and the data itself stored as a blob. Hints are flushed to disk every 10 seconds. When gossip learns that the down node is up and running again, the remaining hints are written to that node and the hint file is deleted. The max_hint_window_in_ms setting in cassandra.yaml limits how long hints are collected for a down node.
Tombstones (soft delete)
Cassandra does not delete data from disk immediately, because doing so would take a lot of time; that is why it uses tombstones. A tombstone is a special marker value that keeps deletes fast and prevents deleted data from being returned during reads. A tombstone is generated by: a DELETE statement, TTL (time-to-live) expiry, an INSERT or UPDATE with null values, or an update of a collection column.
tombstone_warn_threshold: Cassandra logs a warning if a query scans more tombstones than this value. tombstone_failure_threshold: Cassandra aborts the query if the scanned tombstone count exceeds this value.
There is a setting called garbage collection grace seconds (gc_grace_seconds): the amount of time the server waits before a tombstone becomes eligible for garbage collection; the default is 10 days (864000 seconds). Tombstones are dropped during compaction once gc_grace_seconds has passed, but they are not removed until a compaction actually runs, even if gc_grace_seconds has already elapsed.
Cassandra consistency: this comes from the CAP theorem. Consistency can be configured per session or per individual read and write operation, using the consistency levels listed below (a driver example follows the two lists).
SSTables are immutable. Mutations (adding, updating, or deleting data) are always written to the memtable as new records, and the memtable is periodically flushed to new SSTables. For an update, the old and the new value may therefore live in different SSTables (or the same one), and Cassandra uses timestamps to work out which value is the most recent. This costs extra disk space, and a read may have to consult several SSTables to produce a result, which makes reads slower. That is why Cassandra has an operation called compaction: it reads all the existing SSTables and merges the rows, keeping the most recent information, into one new SSTable that replaces the old ones. Because SSTables are immutable, updating an existing row simply adds another row in the same or a different SSTable; during compaction, tombstones (once past gc_grace_seconds) are permanently removed as well.
Bloom Filter: a bloom filter is a probabilistic data structure used to test whether an element is a member of a set; false positives are possible but false negatives are not, and lookups are extremely fast. Cassandra uses a bloom filter to check whether a requested partition key might be present in an SSTable without reading the SSTable's data, which avoids expensive I/O operations. There is a corresponding in-memory bloom filter for each SSTable.
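As a standalone illustration, Guava ships a bloom filter that behaves the same way (com.google.guava dependency assumed; the keys are made up):

```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import java.nio.charset.StandardCharsets;

public class BloomFilterDemo {
    public static void main(String[] args) {
        // Sized for ~1000 insertions with a 1% false-positive probability.
        BloomFilter<String> keys =
                BloomFilter.create(Funnels.stringFunnel(StandardCharsets.UTF_8), 1000, 0.01);
        keys.put("partition-key-42");

        System.out.println(keys.mightContain("partition-key-42"));  // true (no false negatives)
        System.out.println(keys.mightContain("partition-key-99"));  // almost certainly false;
        // a rare "true" here would be a false positive, which callers must tolerate
    }
}
```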
Write consistency level
- Any- a write must succeed on any available node.(Highest availability)
- One- a write must succeed on any node responsible for that row.(either primary or replica)
- Two- a write must succeed on two nodes.
- Quorum- a write must succeed on a quorum of replica nodes.
- quorum nodes = (replication factor/2) + 1
- Local Quorum- a write must succeed on a quorum of replica nodes in the same data center as the coordinator nodes.
- Each Quorum- a write must succeed on a quorum of replica nodes in all data centers.
- All- a write must succeed on all replica nodes.(Lowest availability)
Read consistency level
- One- reads from the closest node holding the data.
- Quorum- return a result from quorum of servers with the most recent timestamp for the data.
- Local Quorum- return a result from a quorum of servers with the most recent timestamp for the data in the same data center as the coordinator node.
- Each Quorum- return a result from a quorum of servers with the most recent timestamp in all data centers.
- All- return a result from all replica nodes.
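With the DataStax Java driver (4.x assumed here), the consistency level can be set per statement roughly like this; the keyspace and table are hypothetical, and a node is assumed to be reachable on the driver's default of localhost:9042:

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.DefaultConsistencyLevel;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.SimpleStatement;

public class ConsistencyDemo {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder().build()) {
            // LOCAL_QUORUM read: a quorum of replicas in the local data center must respond.
            SimpleStatement stmt = SimpleStatement
                    .newInstance("SELECT * FROM my_keyspace.my_table WHERE id = ?", 1)
                    .setConsistencyLevel(DefaultConsistencyLevel.LOCAL_QUORUM);
            ResultSet rs = session.execute(stmt);
            rs.forEach(row -> System.out.println(row.getFormattedContents()));
        }
    }
}
```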
OUT OF MEMORY: take a heap dump and analyse it for memory leaks; tools such as Dynatrace can also help.
How ELK works:
Apache Lucene provides the full-text matching; the ELK stack (Elasticsearch, Logstash, Kibana) is built on top of it. Fluentd is installed on the servers and configured to keep polling the log files (it maintains an offset so it knows how far it has read), and the data is sent to ELK. Each project gets its own index, which is used in Kibana to view data, set up alerts, etc. Logs are retained for 7 days.
sql queries:
https://www.edureka.co/blog/interview-questions/sql-query-interview-questions