java spring batch - ghdrako/doc_snipets GitHub Wiki

Java EE supports batch processing through the JSR 352 specification. The Spring Batch framework follows the same model, and Spring Batch 3.x and 4.x also implement the specification.

Functionalities:

  • Transaction management
  • Chunk based processing
  • Declarative I/O
  • Start/Stop/Restart
  • Retry/Skip
  • Web-based administration interface

Batch processing collects large volumes of data first, processes them in a defined way, and then produces the results as a batch. A batch process is usually composed of tasks called jobs, and in Spring Batch the job is the main unit of work. Each job describes a processing flow made up of steps. Each step is composed of a reader, a processor, and a writer. Jobs can be run on a schedule or triggered by an event.

The step is the processing unit of a job and one of the key parts of the Spring Batch infrastructure. A job contains one or more steps, depending on the logic we define. A step can be defined using either the chunk model or the tasklet model.

In the chunk model, a step is built from three components, which run in the following order:

  • Item Reader: reads data from a database, a message queue, or another source.
  • Item Processor: applies business logic to the data that comes from the item reader.
  • Item Writer: takes the processed data and writes it to a database or message queue.

The data is divided into chunks and processed chunk by chunk. A chunk is a fixed-size group of items, and the chunk size parameter sets how many items each chunk holds. “Chunks provide a simple solution to deal with paginated reads or situations where we don’t want to keep a significant amount of data in memory”.
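The chunk-oriented flow described above can be sketched in plain Java. This is only an illustration of the idea, not the Spring Batch API: the class `ChunkSketch`, the method `run`, and the use of `Iterator` and `Function` as stand-ins for the item reader and item processor are all assumptions made for this example. In Spring Batch itself these roles are played by the `ItemReader`, `ItemProcessor`, and `ItemWriter` interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing.
// All names here are hypothetical; this is not the Spring Batch API.
class ChunkSketch {

    // Reads up to chunkSize items, processes them, then "writes" them as one chunk.
    // The returned list stands in for the item writer's target (one entry per write).
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int chunkSize) {
        List<List<O>> written = new ArrayList<>();
        while (reader.hasNext()) {
            // 1. Read: pull at most chunkSize items, so memory use stays bounded.
            List<I> items = new ArrayList<>();
            while (items.size() < chunkSize && reader.hasNext()) {
                items.add(reader.next());
            }
            // 2. Process: apply the business logic to each item of the chunk.
            List<O> outputs = new ArrayList<>();
            for (I item : items) {
                outputs.add(processor.apply(item));
            }
            // 3. Write: hand the whole chunk to the writer in one call.
            written.add(outputs);
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> names = List.of("ann", "bob", "cid", "dee", "eve");
        // prints [[ANN, BOB], [CID, DEE], [EVE]]
        System.out.println(run(names.iterator(), String::toUpperCase, 2));
    }
}
```

With five items and a chunk size of 2, the writer is called three times, and the last chunk is smaller than the others.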

In the tasklet model, by contrast, each tasklet is performed as a single task within a step. A job built with the tasklet model can still have reading, processing, and writing steps, which it executes one after the other. In contrast to the chunk model, each tasklet processes all of its data in a single run. The risk is that very large data sets can exhaust resources, so for large data volumes the chunk model is the better choice. Tasklet steps are typically used for operations such as deleting a resource or executing a query.
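The tasklet idea can be sketched in plain Java as a single unit of work that reports whether it is done. This is only an illustration, not the Spring Batch API: `TaskletSketch`, `SimpleTasklet`, `Status`, and `runStep` are hypothetical names. They are loosely modeled on the Spring Batch `Tasklet` interface, whose `execute` method is called repeatedly until it returns `RepeatStatus.FINISHED`.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of the tasklet idea.
// All names here are hypothetical; this is not the Spring Batch API.
class TaskletSketch {

    enum Status { CONTINUABLE, FINISHED }

    // A tasklet is one unit of work that reports whether it must be called again.
    interface SimpleTasklet {
        Status execute();
    }

    // Runs the tasklet until it reports FINISHED and returns the number of calls made.
    static int runStep(SimpleTasklet tasklet) {
        int calls = 1;
        while (tasklet.execute() == Status.CONTINUABLE) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        // An in-memory list stands in for a resource to be deleted.
        List<String> staging = new ArrayList<>(List.of("a.tmp", "b.tmp"));
        // Cleanup task: all of the work happens in a single call.
        int calls = runStep(() -> {
            staging.clear();
            return Status.FINISHED;
        });
        System.out.println(calls + " call(s), remaining = " + staging);
    }
}
```

Because the cleanup task does all of its work in one call, it suits small, self-contained operations. Nothing in this sketch bounds how much data a single call touches, which is the resource risk noted above.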

Another important component of the Spring Batch framework is the JobRepository. It stores job and step details in a database, which can be an in-memory database handled by the framework. The repository is also updated periodically with job and step execution data during item processing, and the recorded values are used to calculate execution metrics and statistics. In this way, the framework itself manages job and step execution.
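The bookkeeping role of the repository can be sketched in plain Java with an in-memory map. This is only an illustration under stated assumptions, not the Spring Batch `JobRepository` API: `RepositorySketch`, `StepRecord`, and the methods `start`, `update`, `finish`, and `find` are hypothetical names, and the recorded fields (status, read count, write count) are a simplified stand-in for the execution details the framework keeps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of a repository that tracks step executions.
// All names here are hypothetical; this is not the Spring Batch API.
class RepositorySketch {

    enum Status { STARTED, COMPLETED, FAILED }

    // Execution details kept for one step.
    static final class StepRecord {
        Status status = Status.STARTED;
        int readCount;
        int writeCount;
    }

    // In-memory store, keyed by step name.
    private final Map<String, StepRecord> records = new LinkedHashMap<>();

    // Registers a new execution when a step starts.
    StepRecord start(String stepName) {
        StepRecord record = new StepRecord();
        records.put(stepName, record);
        return record;
    }

    // Called periodically during item processing, for example after each chunk.
    void update(String stepName, int read, int written) {
        StepRecord record = records.get(stepName);
        record.readCount += read;
        record.writeCount += written;
    }

    // Records the final status of the step.
    void finish(String stepName, Status status) {
        records.get(stepName).status = status;
    }

    StepRecord find(String stepName) {
        return records.get(stepName);
    }

    public static void main(String[] args) {
        RepositorySketch repository = new RepositorySketch();
        repository.start("importStep");
        repository.update("importStep", 2, 2); // first chunk
        repository.update("importStep", 1, 1); // second, smaller chunk
        repository.finish("importStep", Status.COMPLETED);
        StepRecord record = repository.find("importStep");
        System.out.println(record.status + " read=" + record.readCount
                + " written=" + record.writeCount);
    }
}
```

Updating the counts after each chunk, rather than only at the end of the step, keeps the stored metrics current while the step is still running.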