PDP 43A (Large Events Simplified) - derekm/pravega GitHub Wiki
Status: UNDER REVIEW.
NOTE: this is an alternative to PDP-43. PDP-43 is not discarded or otherwise abandoned - this page (PDP-43A) presents an alternate approach to getting this done.
Discussion and comments in Issue 6052.
Table of Contents:
- Motivation
- Requirements
- Other Considered Approaches
- Design
- Segment Store Changes
- Client and Wire Protocol Changes
Motivation
See PDP-43.
Requirements
See PDP-43.
Changes:
- Requirement 2 is relaxed (ingestion speed may not match that of classical appends, as long as it stays within a reasonable range).
- Requirement 3 is dropped.
Other Considered Approaches
See PDP-43.
Design
See PDP-43. The entire design stays the same, except for the changes below.
Changes:
- Transient Segments are created via a new wire command
CreateTransientSegment
. TheCreateTransientSegment
will name the parent segment. The Segment Store will generate a unique name and enter it into the metadata. This name is not returned to the Client. - Transient Segments will not allow any Extended Attributes.
- Unmerged Transient Segments will be cleaned up (eventually) after a container recovery.
- If a connection is dropped, the Append Processor will lose its reference to the Transient Segment.
- Transient Segments are cleaned up periodically (as PDP-43 explains), most likely triggered by
MetadataCleanupService
running in each container. - There is no "Merge With Header" operation like PDP-43 asks for. The Transient Segment will be merged as-is using the regular merge command. It is up to the Client/Append Processor to add appropriate Event envelopes for the large event.
- The section Writer Side is dropped.
- The section Reader Side is dropped.
- The section Event Serialization is dropped.
Segment Store Changes
- NameUtils.java
- Create API to properly name transient segments. To make this a lot easier, we should borrow from the scheme we use to name transactions, and we should make sure those names cannot possibly clash.
- New Segment Type:
Transient
- This must be added as a "Role" to
SegmentType
. - A Segment may not be a
TableSegment
andTransient
at the same time.
- This must be added as a "Role" to
MetadataStore.java
- If
SegmentType.isTransient()
, then verify the name of the segment matches the transient pattern (as defined in NameUtils.java) and DO NOT insert it into the Metadata Table. Instead, create aStreamSegmentMapOperation
(withPinned==true
) and queue it up in the DurableLog. This ensures the segment is pinned to memory (will never be evicted) and we won't ever bother the Metadata Table with it.
- If
SegmentMetadataUpdateTransaction.preProcessAttributes
- Disallow extended Attributes (
Attributes.isCoreAttribute(...) == false
) for transient segments.
- Disallow extended Attributes (
- AppendProcessor
- When receiving a
CreateTransientSegment
, generate a transient segment name using NameUtils, and create that segment on the store, making sure theSegmentType
is set totransient
(see above). - Add a
close()
method on AppendProcessor which is invoked when the connection closes and will in turn delete any open Transient segments. - TODO: we need to figure out how to handle the merging of this segment. Merge is currently handled by PravegaRequestProcessor, which has no connection to AppendProcessor. Since the name of the transient segment is not sent back to the client, the client may not know which segment to seal.
- When receiving a
- MetadataCleanupService
- Every time it runs (even in force-runs due to memory pressure), collect all Transient Segments which haven't been used in a while and are not merged. Delete those segments.
- The Metadata Cleanup service already has a way to track segments to evict. This may or may not be useful as-is; for Transient Segments we must ensure that a certain amount of time has elapsed since they were last used. We currently track if they are currently used by anyone and execute every 60 seconds - this may not be appropriate. To handle this, we may need another field on
SegmentMetadata
to keep track of last access time (in clock time). This time should be set to Long.MIN_VALUE after recoveries (to ensure speedy deletions of transient segments that haven't been merged).
Client and Wire Protocol Changes
Wire Protocol
- A new wire command for
CreateTransientSegment
needs to be added. Additionally there should be a corresponding replyTransientSegmentCreated
- Either the created would contrin the segment ID so that the subsequent SetupAppend could write data to it. Or:
- Alternatively the Client may generate a unique name for the segment and send that name along. This would solve the PravegaRequestProcessor.merge problem discussed above.
Client
- The Writer needs to detect if an incoming Event is greater than the allowed limit of 8MB. If so, it must follow the Large Event protocol
- CreateTransientSegment
- SetupAppend
- Append to transient segment
- Call flush on the target segment (to preserve order)
- Merge transient segment into the target segment
- The writer need to ensure order. When a large event is detected, it must, under lock:
- create a new transient segment
- write the large event to the transient segment
- Flush target segment
- flush the transient segment
- merge the transient segment
- Wait for ack. (If a SegmentIsSealed error is instead returned 1-6 are repeated).
- Reader
- The reader will need to allocate enough memory to load the entire large Event.