Garbage Collector introduction - vinhtbkit/bkit-kb GitHub Wiki

Java Garbage Collector basics

Java system architecture

Overview

[Figure: java-overview]

Hotspot JVM

[Figure: hotspot-jvm]

GC introduction

What is Garbage Collector (GC) in Java?

Garbage collection (GC) is the automatic reclamation of memory that is no longer in use, so that it can be reused.

Why do we need it

  • Frees developers from having to manually release memory.
  • Allocates objects on the managed heap efficiently.
  • Reclaims objects that are no longer being used, clears their memory, and keeps the memory available for future allocations.
  • Provides memory safety by making sure that an object cannot use for itself the memory allocated for another object.

Do other languages have GC?

  • .NET: Yes.
  • C: No. Developers have to allocate and deallocate memory manually. Forgetting to do so results in a memory leak.
  • Rust: No. It uses an ownership mechanism instead.

Problem with GC

  • Affects application performance during execution
  • Can cause "stop the world" events

Stack and heap

[Figure: heapnstack]

Stack

  • It grows and shrinks as new methods are called and returned, respectively.
  • Variables inside the stack exist only as long as the method that created them is running.
  • It's automatically allocated and deallocated when the method finishes execution.
  • If this memory is full, Java throws java.lang.StackOverflowError.
  • Access to this memory is fast when compared to heap memory.
  • This memory is threadsafe, as each thread operates in its own stack.
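
The stack behaviour described above can be observed with a small experiment. This is only a sketch: the class and method names (`StackDepthDemo`, `recurse`, `measureDepth`) are made up for illustration, and the depth reached depends on the JVM and its stack size setting.

```java
// Hypothetical demo class; the names are illustrative and not part of the original notes.
class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new frame (holding its local state) onto the current thread's stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth that was reached.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are popped automatically while the error propagates up to this handler.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Frames before StackOverflowError: " + measureDepth());
    }
}
```

No explicit cleanup is needed after the error: the frames disappear as the calls unwind, which is the "automatically deallocated" property listed above.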

Heap

  • It's accessed via complex memory management techniques that include the
    • Young Generation,
    • Old or Tenured Generation,
    • and Permanent Generation (replaced by Metaspace since Java 8)
  • If heap space is full, Java throws java.lang.OutOfMemoryError.
  • Access to this memory is comparatively slower than stack memory
  • This memory, in contrast to the stack, isn't automatically deallocated. The Garbage Collector has to free unused objects to keep memory usage efficient.
  • Unlike the stack, the heap isn't threadsafe. Access to shared objects needs to be guarded by properly synchronizing the code.
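
As a rough illustration of the last point, the sketch below shares one heap object between several threads and guards it with `synchronized`. The `SharedCounterDemo` and `Counter` names are hypothetical and exist only for this example.

```java
// Hypothetical demo class; names are illustrative only.
class SharedCounterDemo {
    // A Counter instance lives on the heap and can be reached from several threads.
    static class Counter {
        private int value = 0;

        // synchronized makes the read-modify-write on the shared field atomic.
        synchronized void increment() { value++; }

        synchronized int get() { return value; }
    }

    static int runThreads(int threads, int incrementsPerThread) {
        Counter shared = new Counter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // 'i' is a local variable on this worker's own stack, so it needs no locking.
                for (int i = 0; i < incrementsPerThread; i++) {
                    shared.increment();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 10_000)); // prints 40000
    }
}
```

Without the `synchronized` keyword on `increment()`, updates from different threads could be lost and the final count could come out lower.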

Allocation

  • Stack:
    • Function's local primitive variables.
    • Object references in functions.
    • Function calls
  • Heap
    • Static variables
    • Object
    • Classes footprints
    • And more...

However, this is not always the case. It depends on the JVM implementation.

Challenge #1:

In the following program, which values are stored on the heap and which are stored on the stack?

// Application.java

class Application {
  private int counter = 0;

  public static void main(String[] args) {
    Application app = new Application();
    app.increase(3);
  }

  public void increase(int i) {
    int total = counter + i;
    this.counter = total;
  }
}

Challenge #2

Where is a primitive array stored? E.g:

int[] a = new int[100];

How GC works

Marking

  • An object becomes eligible for GC when it can no longer be reached through any reference, for example when:
    • The reference variable goes out of scope
    • The reference variable is set to null
  • The collector marks referenced objects to decide which objects are still alive.

Normal deletion

  • Remove unreferenced objects to provide free space.

Delete with compacting

  • Compact memory so that later allocations are easier.
  • During compaction, objects are moved, so references to them have to be updated to the new memory addresses. This requires a "stop the world" event.
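
The three phases above can be sketched as a toy simulation. This is not how HotSpot is implemented: the "heap" is just a list, and `MarkSweepSketch`, `Node`, `mark` and `sweepAndCompact` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of mark / sweep / compact; all names are illustrative.
class MarkSweepSketch {
    // A toy "object" that can hold references to other toy objects.
    static class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();

        Node(String name) { this.name = name; }
    }

    // Mark phase: everything reachable from the roots is considered alive.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> alive = new HashSet<>();
        List<Node> pending = new ArrayList<>(roots);
        while (!pending.isEmpty()) {
            Node n = pending.remove(pending.size() - 1);
            if (alive.add(n)) {          // first visit: follow its outgoing references
                pending.addAll(n.refs);
            }
        }
        return alive;
    }

    // Sweep + compact: keep only marked objects, packed together in a new list.
    static List<Node> sweepAndCompact(List<Node> heap, Set<Node> alive) {
        List<Node> compacted = new ArrayList<>();
        for (Node n : heap) {
            if (alive.contains(n)) {
                compacted.add(n);
            }
        }
        return compacted;
    }

    static List<String> collect() {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b); // a -> b
        c.refs.add(d); // c -> d
        d.refs.add(c); // d -> c: a cycle, but nothing reachable from a root points into it
        List<Node> heap = Arrays.asList(a, b, c, d);
        List<Node> roots = Arrays.asList(a);

        List<String> survivors = new ArrayList<>();
        for (Node n : sweepAndCompact(heap, mark(roots))) {
            survivors.add(n.name);
        }
        return survivors;
    }

    public static void main(String[] args) {
        System.out.println(collect()); // prints [a, b]
    }
}
```

Note that `c` and `d` still reference each other, yet both are collected: what matters is reachability from a root, not whether some reference to the object exists.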

Challenge #3

import java.util.ArrayList;
import java.util.HashMap;

public class GCDemo {
    public static ArrayList<Object> l = new ArrayList<>();
    public void doIt() {
        HashMap<String, Object> m = new HashMap<>();     
        Object o1 = new Object(); // line n1
        Object o2 = new Object();
        m.put("o1", o1);
        o1 = o2; // line n2
        o1 = null; // line n3
        l.add(m);
        m = null; // line n4
        System.gc();// line n5
    }
}

GCDemo demo = new GCDemo();
demo.doIt();
demo = null;  // line n6

When does the object created at line n1 become eligible for garbage collection?

Generational Garbage Collection

  • Having to mark and compact all the objects in a JVM is inefficient.
  • More objects lead to longer GC time
  • Analysis of applications has shown that most objects are short lived.

[Figure: jvm-generational]

  • Young gen: for newly allocated objects
  • Tenured / old gen: for long lived objects
  • Metaspace: for class metadata
  1. New objects are allocated in Eden.
  2. When Eden is full, allocation fails and a minor GC is triggered.
  3. Minor GC: mark live objects in Eden and move them to S0.
  4. Repeat steps 1 and 2.
  5. Minor GC: mark live objects in Eden and S0, and move them to S1. Eden and S0 should now be empty.
  6. Repeat steps 1 and 2.
  7. Minor GC: mark live objects in Eden and S1, and move them to S0. Eden and S1 should now be empty.
  8. Any object whose age reaches MaxTenuringThreshold (default=15) during a minor GC is "promoted" to the Tenured (old gen) space.
  9. When Tenured is full, a major GC is executed.
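
The steps above can be sketched as a simplified simulation. It only models object ages and which space an object sits in; `GenerationalSketch`, `Obj` and the tiny `TENURING_THRESHOLD` of 2 are assumptions made for the example, not HotSpot internals.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of Eden, two survivor spaces and the old generation; names are illustrative.
class GenerationalSketch {
    // Deliberately tiny so promotion shows up quickly; the steps above cite a default of 15.
    static final int TENURING_THRESHOLD = 2;

    static class Obj {
        final String name;
        boolean reachable = true;
        int age = 0;

        Obj(String name) { this.name = name; }
    }

    final List<Obj> eden = new ArrayList<>();
    List<Obj> survivorFrom = new ArrayList<>(); // the survivor space currently in use
    final List<Obj> old = new ArrayList<>();

    Obj allocate(String name) {
        Obj o = new Obj(name);
        eden.add(o);
        return o;
    }

    // Minor GC: copy live objects from Eden and the used survivor space into the
    // empty survivor space, ageing them; objects reaching the threshold are promoted.
    void minorGc() {
        List<Obj> survivorTo = new ArrayList<>();
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(survivorFrom);
        for (Obj o : candidates) {
            if (!o.reachable) {
                continue; // dead objects are simply not copied
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);
            } else {
                survivorTo.add(o);
            }
        }
        eden.clear();              // Eden and the previous survivor space are now empty
        survivorFrom = survivorTo; // the two survivor spaces swap roles
    }

    static String run() {
        GenerationalSketch heap = new GenerationalSketch();
        heap.allocate("cache");
        Obj shortLived = heap.allocate("temp");
        shortLived.reachable = false;
        heap.minorGc(); // "temp" is dropped, "cache" moves to a survivor space (age 1)
        heap.minorGc(); // "cache" reaches age 2 and is promoted to the old generation
        return "eden=" + heap.eden.size()
                + " survivor=" + heap.survivorFrom.size()
                + " old=" + heap.old.get(0).name;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints eden=0 survivor=0 old=cache
    }
}
```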

Some other factors may cause Major GC:

  • The developer calls System.gc() or Runtime.getRuntime().gc()
  • During minor GC, if the JVM is not able to reclaim enough memory from the eden or survivor spaces, then a major GC may be triggered.
  • If we set a MaxMetaspaceSize option for the JVM and there is not enough space to load new classes, then the JVM triggers a major GC.
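
To see the first trigger in practice, one possible approach is to read the collection counters exposed through the standard `java.lang.management` API before and after calling `System.gc()`. The class name `GcRequestDemo` is hypothetical. Keep in mind that `System.gc()` is only a request, so the counters may or may not change, and the collector names printed depend on which GC the JVM is using.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class; only standard JDK APIs are used.
class GcRequestDemo {
    // Sums the collection counts reported by all collectors of the running JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the count is undefined
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
        long before = totalCollections();
        System.gc(); // a request only; the JVM decides whether and when to collect
        long after = totalCollections();
        System.out.println("Collections before=" + before + ", after=" + after);
    }
}
```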

Types of GC

GC Performance factors

  • Total heap size
  • Young to old ratio
  • Hardware

Performance goals

  • Pause time
  • Throughput

Serial GC

[Figure: serial-gc]

  • Uses a single thread to perform GC
  • No inter-thread communication overhead
  • Good when we have limited hardware and no pause time requirements, or for small applications with small data sets.

Parallel GC

[Figure: parallel-gc]

  • Uses multiple threads to perform GC
  • Better suited to multiprocessor / multithreaded hardware
  • Good when we need the best throughput and have no pause time requirements (batch applications, backend services...)

G1 GC

[Figure: g1-gc]

  • Works mostly concurrently with the application
  • Uses a lot of hardware resources
  • Best for applications that need quick response times (web applications, time-sensitive services...)

GC configurations

GC parameters

| Option | Description | Example |
| --- | --- | --- |
| -Xms | Initial heap size | -Xms1G, -Xms1m |
| -Xmx | Maximum heap size | -Xmx1G, -Xmx1m |
| -XX:+UseXXXGC | Use the specified GC | -XX:+UseSerialGC |
| -XX:+HeapDumpOnOutOfMemoryError | Create a heap dump on OOM error | |
| -XX:NewRatio | Old:young generation size ratio | -XX:NewRatio=2 (old = 2 × young) |
| -XX:SurvivorRatio | Eden:survivor space size ratio | -XX:SurvivorRatio=8 (eden = 8 × each survivor space) |
| -XX:NewSize | Initial size of the young generation area | -XX:NewSize=10m |
| -XX:MaxNewSize | Max size of the young generation area | -XX:MaxNewSize=10m |
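
One way to check that such options took effect is to ask the running JVM. The sketch below uses only standard JDK calls (`Runtime.maxMemory()`, `Runtime.totalMemory()` and `RuntimeMXBean.getInputArguments()`); the class name `HeapSettingsDemo` and the helper `toMiB` are made up for the example.

```java
import java.lang.management.ManagementFactory;

// Hypothetical demo class that prints the heap settings the JVM is actually using.
class HeapSettingsDemo {
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Upper bound the heap may grow to, which is normally governed by -Xmx.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        // JVM options such as -Xms / -Xmx / -XX:... show up here; program arguments do not.
        System.out.println("JVM options: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("Current heap (MiB): " + toMiB(Runtime.getRuntime().totalMemory()));
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

For example, after compiling, it could be started with `java -Xms256m -Xmx256m -XX:+UseSerialGC HeapSettingsDemo`. The reported maximum is usually close to, but not necessarily exactly, the -Xmx value.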

HotSpot JVM defaults:

| Option | Default Value |
| --- | --- |
| -XX:NewRatio | 2 |
| -XX:NewSize | 1310 MB |
| -XX:MaxNewSize | unlimited |
| -XX:SurvivorRatio | 8 |
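
To make the two ratios concrete, here is a small worked calculation. It assumes a hypothetical 1200 MiB heap and ignores the alignment and ergonomics adjustments a real JVM applies, so the numbers are approximations rather than what HotSpot would report. `GenerationSizingSketch` and its methods are illustrative names.

```java
// Hypothetical helper that turns NewRatio / SurvivorRatio into approximate sizes.
class GenerationSizingSketch {
    // NewRatio is old:young, so the young generation is 1 part out of (NewRatio + 1).
    static long youngMiB(long heapMiB, int newRatio) {
        return heapMiB / (newRatio + 1);
    }

    // SurvivorRatio is eden:survivor and there are two survivor spaces,
    // so each survivor space is 1 part out of (SurvivorRatio + 2).
    static long survivorMiB(long youngMiB, int survivorRatio) {
        return youngMiB / (survivorRatio + 2);
    }

    public static void main(String[] args) {
        long heap = 1200;                      // assumed heap size in MiB
        long young = youngMiB(heap, 2);        // 1200 / 3 = 400
        long old = heap - young;               // 800, twice the young generation
        long survivor = survivorMiB(young, 8); // 400 / 10 = 40
        long eden = young - 2 * survivor;      // 320, eight times one survivor space
        System.out.println("young=" + young + " old=" + old
                + " eden=" + eden + " survivor=" + survivor + " (x2)");
        // prints young=400 old=800 eden=320 survivor=40 (x2)
    }
}
```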

General memory allocation guidelines:

  • Try granting as much memory as possible to the virtual machine.
    • The default size is often too small.
    • Leave some memory for other programs.
    • Double-check for pause issues afterwards.
  • Set -Xms and -Xmx to the same value.
    • This increases predictability by removing the most important sizing decision from the virtual machine.
    • However, the virtual machine is then unable to compensate if you make a poor choice.
  • Increase the memory as you increase the number of processors, because allocation can be parallelized.

General tuning strategy:

  • First decide the maximum amount of memory that can be given to the virtual machine. Then test performance with different young generation sizes to find the best setting.
  • Keep the old generation large enough to hold all application data and provide some additional space (10% ~ 20%). This helps to:
    • Avoid OOM errors
    • Avoid major GCs
Tools

  • VisualVM + Visual GC plugin
  • Java Flight Recorder

Coding best practices

Use primitives when possible

  • Don't
Integer sum = 0;
  • Do
int sum = 0;
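
The difference is easier to see inside a loop. In the sketch below (class and method names are hypothetical), the boxed version may allocate a new `Integer` on many iterations, while the primitive version allocates nothing. Both return the same result.

```java
// Hypothetical demo class comparing a boxed and a primitive accumulator.
class BoxingDemo {
    static int sumBoxed(int n) {
        Integer sum = 0; // boxed: each += unboxes, adds, and boxes the result again
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    static int sumPrimitive(int n) {
        int sum = 0; // primitive: stays on the stack, creates no garbage
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));     // prints 500500
        System.out.println(sumPrimitive(1000)); // prints 500500
    }
}
```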

Avoid creating unnecessary objects

  • Don't
int s = square(new Rectangle(10, 20));
  • Do
int s = square(10, 20);
  • The worst combination: select (*) + findAll() + eager fetch, which loads many objects that may never be used

Use predefined instances or cached instances

  • Don't
public List<String> getItems() {
  if (someCondition) {
    return new ArrayList<>(); // allocates a new empty list on every call
  }
}
  • Do
return Collections.emptyList();

  • Don't
Integer x = new Integer(i); // always allocates a new object; deprecated since Java 9
  • Do
Integer x = Integer.valueOf(i); // Values from -128 to 127 are cached.
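
Both recommendations can be checked with a short experiment. The names `CachedInstancesDemo`, `smallIntegersAreShared` and `emptyListRejectsWrites` are invented for this sketch. The `==` comparison is used on purpose to test object identity, which is not how `Integer` values should normally be compared.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical demo class for cached / predefined instances.
class CachedInstancesDemo {
    // Integer.valueOf returns the same cached object for values in -128..127.
    static boolean smallIntegersAreShared() {
        Integer a = Integer.valueOf(127);
        Integer b = Integer.valueOf(127);
        return a == b; // identity comparison on purpose
    }

    // Collections.emptyList() is immutable, which is why it is safe to hand out
    // to any caller without allocating a fresh list each time.
    static boolean emptyListRejectsWrites() {
        List<String> items = Collections.emptyList();
        try {
            items.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(smallIntegersAreShared()); // prints true
        System.out.println(emptyListRejectsWrites()); // prints true
    }
}
```

The trade-off is that callers must not try to modify the returned empty list.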

Stream and Optional

  • Use the primitive variants of Stream and Optional (IntStream, OptionalInt, ...) whenever possible
  • Consider whether a plain for-each loop would do instead of a Stream
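
As an illustration, the sketch below computes the same sum three ways, using standard `java.util.stream` classes. The class and method names are hypothetical. The boxed stream works on `Integer` objects, while `IntStream` and the plain loop work on `int` values directly.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical demo class comparing boxed streams, primitive streams and a loop.
class PrimitiveStreamDemo {
    // Stream<Integer>: every element is an Integer object.
    static int sumBoxed() {
        return Stream.iterate(1, i -> i + 1).limit(100).reduce(0, Integer::sum);
    }

    // IntStream: elements are plain int values, so no boxing is involved.
    static int sumPrimitive() {
        return IntStream.rangeClosed(1, 100).sum();
    }

    // OptionalInt is the primitive counterpart of Optional<Integer>.
    static OptionalInt maxEven() {
        return IntStream.rangeClosed(1, 100).filter(i -> i % 2 == 0).max();
    }

    // A plain loop creates no stream objects at all.
    static int sumLoop() {
        int sum = 0;
        for (int i = 1; i <= 100; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed());             // prints 5050
        System.out.println(sumPrimitive());         // prints 5050
        System.out.println(sumLoop());              // prints 5050
        System.out.println(maxEven().getAsInt());   // prints 100
    }
}
```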

Use third party libs:

Summary

  • Stack & heap
  • Mark, Sweep. Stop the world event
  • Generational GC: Young gen, tenured
  • Minor / Major GC
  • GC types/ config

Common questions

  1. Can we (developers) trigger GC manually?
  2. Can we avoid garbage collection?

More readings

References

  1. https://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
  2. https://www.freecodecamp.org/news/garbage-collection-in-java-what-is-gc-and-how-it-works-in-the-jvm/