Taps - garyrussell/spring-xd GitHub Wiki
A Tap allows you to "listen" to data while it is processed in an existing stream and to process that data in a separate stream. The original stream is unaffected by the tap and isn’t aware of its presence, similar to a phone wiretap. (Wire Tap is included in the standard catalog of EAI patterns and is implemented in the Spring Integration EAI framework used by Spring XD.)
Simply put, a Tap is a stream that uses a point in another stream as a source.
The following XD shell commands create a stream foo1 and a tap named foo1tap:
xd:> stream create --name foo1 --definition "time | log" --deploy
xd:> stream create --name foo1tap --definition "tap:stream:foo1 > log" --deploy
Since a tap is a type of stream, use the stream create command to create the tap. The tap source is specified using the named channel syntax and always begins with tap:. In this case, we are tapping the stream named foo1, specified by :stream:foo1.
Note: stream: is required in this case because it is possible to tap other XD targets, such as jobs. This tap consumes data at the source of the target stream.
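Because the tap is an ordinary stream in its own right, it can be listed and removed independently of the stream it listens to. A minimal sketch, assuming the foo1 and foo1tap streams created above and the shell's stream list and stream destroy commands:

```
xd:> stream list
xd:> stream destroy --name foo1tap
xd:> stream list
```

After the tap is destroyed, the second listing should still show foo1, since the original stream does not depend on the tap.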
A tap can consume data from any point along the target stream’s processing pipeline. XD provides a few ways to tap a stream after a given processor has been applied:
If the module name is unique in the target stream, use tap:stream:<stream_name>.<module_name>
If you have a stream called mystream, defined as
http | filter --expression=payload.startsWith('A') | transform --expression=payload.toLowerCase() | file
Create a tap after the filter is applied using
tap:stream:mystream.filter > ....
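Put together as shell commands, this could look like the sketch below. The tap name mystreamtap and the log sink are illustrative choices, not part of the original example:

```
xd:> stream create --name mystream --definition "http | filter --expression=payload.startsWith('A') | transform --expression=payload.toLowerCase() | file" --deploy
xd:> stream create --name mystreamtap --definition "tap:stream:mystream.filter > log" --deploy
```

With this setup, the tap should log only payloads that passed the filter, before the transform lowercases them.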
You may also use a label to create an alias for a module and reference that label in the tap.
If you have a stream called mystream, defined as
http | transform --expression=payload.toLowerCase() | flibble: transform --expression=payload.substring(3) | file
Create a tap after the second transformer is applied using
tap:stream:mystream.flibble > ....
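As complete shell commands, the labeled variant might be created as follows. The tap name flibbletap and the log sink are hypothetical and chosen only for illustration:

```
xd:> stream create --name mystream --definition "http | transform --expression=payload.toLowerCase() | flibble: transform --expression=payload.substring(3) | file" --deploy
xd:> stream create --name flibbletap --definition "tap:stream:mystream.flibble > log" --deploy
```

The label is needed here because the module name transform appears twice and so cannot identify a single tap point by itself.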
A primary use case for a Tap is to perform realtime analytics while data is being ingested via its primary stream. For example, consider a stream that consumes Twitter search results and writes them to HDFS. A tap can be created before the data is written to HDFS, and the data piped from the tap to counters that track the number of times specific hashtags are mentioned in the tweets.
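A rough sketch of that scenario follows. The stream names tweets and tagcount, the query, and the credential placeholders are illustrative. The twittersearch, hdfs, and field-value-counter modules and their options are written from memory of the Spring XD module catalog, so treat them as assumptions and check them against your XD version:

```
xd:> stream create --name tweets --definition "twittersearch --consumerKey=<your_key> --consumerSecret=<your_secret> --query=spring | hdfs" --deploy
xd:> stream create --name tagcount --definition "tap:stream:tweets > field-value-counter --fieldName=entities.hashtags.text --name=hashtags" --deploy
```

Because the tap reads from the source of tweets, it sees each tweet before the hdfs sink writes it.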
Creating a tap on a named channel, a stream whose source is a named channel, or a label is not yet supported. This is planned for a future release.
You’ll find specific examples of creating taps on existing streams in the Analytics section.
Note: When a stream contains multiple modules with the same module name, a label must be specified on the module to be tapped. For example, if you want to tap the second transform:
http | transform --expression=payload.toLowerCase() | tapMe: transform --expression=payload.substring(3) | file
The tap source is then tap:stream:<stream_name>.tapMe