Workflow - nsip/nias GitHub Wiki

# Workflow
The NIAS workflow uses a chain of microservices: content is received through a REST call and then passed through a series of Kafka topics. The chain of calls is summarised below.
## Summary

- HTTP POST xml to `/:topic/:stream/bulk` >> Kafka topic: `sifxml.bulkingest`, message: `TOPIC topic.stream` + xml-fragment1, xml-fragment2, xml-fragment3 ...
- HTTP POST xml to `/:topic/:stream` >> Kafka topic: `sifxml.ingest`, message: `TOPIC topic.stream` + xml
- Kafka topic: `sifxml.bulkingest`, message: `TOPIC topic.stream` + xml-fragment1, xml-fragment2, xml-fragment3 ... >> Kafka topic: `sifxml.validated`, messages: `TOPIC topic.stream 1:n:doc-id` + xml-node1, `TOPIC topic.stream 2:n:doc-id` + xml-node2, `TOPIC topic.stream 3:n:doc-id` + xml-node3 ...
- Kafka topic: `sifxml.ingest`, message: `TOPIC topic.stream` + xml >> Kafka topic: `sifxml.validated`, messages: `TOPIC topic.stream 1:n:doc-id` + xml-node1, `TOPIC topic.stream 2:n:doc-id` + xml-node2, `TOPIC topic.stream 3:n:doc-id` + xml-node3 ...
- Kafka topic: `sifxml.validated`, message m >> Kafka topic: `sifxml.processed`, message m
- Kafka topic: `sifxml.processed`, message: `TOPIC topic.stream m:n:doc-id` + xml-node >>
  - Kafka topic: `topic.stream.none`, message: xml-node
  - Kafka topic: `topic.stream.low`, message: redacted_extreme(xml-node)
  - Kafka topic: `topic.stream.medium`, message: redacted_high(redacted_extreme(xml-node))
  - Kafka topic: `topic.stream.high`, message: redacted_medium(redacted_high(redacted_extreme(xml-node)))
  - Kafka topic: `topic.stream.extreme`, message: redacted_low(redacted_medium(redacted_high(redacted_extreme(xml-node))))
- Kafka topic: `sifxml.processed`, message: `TOPIC topic.stream m:n:doc-id` + xml-node >> Kafka topic: `sms.indexer`, message: tuple representing graph information of xml-node
- Kafka topic: `sms.indexer`, message: tuple representing graph information of xml-node >> Redis: sets representing the contribution of xml-node to the graph of nodes
- Kafka topic: `sifxml.processed`, message: `TOPIC topic.stream m:n:doc-id` + xml-node >> Moneta key: xml-node RefId, value: xml-node
- HTTP POST staffcsv to `/naplan/csv_staff` >>
  - Kafka topic: `naplan.csv_staff`, message: staffcsv >>
  - Kafka topic: `sifxml.ingest`, message: `TOPIC naplan.sifxmlout_staff` + csv2sif(staffcsv) >>
  - Kafka topic: `naplan.sifxmlout_staff.none`, message: csv2sif(staffcsv) >>
  - Moneta key: `naplan.csv_staff::LocalId(staffcsv)`, value: staffcsv
- HTTP POST studentcsv to `/naplan/csv` >>
  - Kafka topic: `naplan.csv`, message: studentcsv >>
  - Kafka topic: `sifxml.ingest`, message: `TOPIC naplan.sifxmlout` + csv2sif(studentcsv) >>
  - Kafka topic: `sifxml.processed`, message: `TOPIC naplan.sifxmlout` + csv2sif(studentcsv) + Platform Student Identifier >>
  - Kafka topic: `naplan.sifxmlout.none`, message: csv2sif(studentcsv) + Platform Student Identifier >>
  - Moneta key: `naplan.csv::LocalId(studentcsv)`, value: studentcsv
  - Kafka topic: `naplan.filereport`, message: report on contents of file
- HTTP POST staffxml to `/naplan/sifxml_staff` >>
  - Kafka topic: `naplan.sifxml_staff.none`, message: staffxml
  - Kafka topic: `naplan.csvstaff_out`, message: sif2csv(staffxml)
  - Moneta key: RefId(staffxml), value: staffxml
- HTTP POST studentxml to `/naplan/sifxml` >>
  - Kafka topic: `sifxml.processed`, message: studentxml + Platform Student Identifier
  - Kafka topic: `naplan.sifxml.none`, message: studentxml + Platform Student Identifier
  - Kafka topic: `naplan.csv_out`, message: sif2csv(studentxml)
  - Moneta key: RefId(studentxml), value: studentxml
  - Kafka topic: `naplan.filereport`, message: report on contents of file
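Each message in the chain above is an envelope: a first line `TOPIC: topic.stream m:n:doc-id` followed by the payload. A minimal sketch of building and parsing that envelope (the helper names are illustrative, not part of the NIAS codebase):

```ruby
# Build a NIAS-style message: "TOPIC: topic.stream m:n:doc-id" header + payload.
def build_message(topic_stream, m, n, doc_id, payload)
  "TOPIC: #{topic_stream} #{m}:#{n}:#{doc_id}\n#{payload}"
end

# Parse the envelope back into its parts.
def parse_message(msg)
  header, payload = msg.split("\n", 2)
  match = header.match(/\ATOPIC: (\S+) (\d+):(\d+):(\S+)\z/)
  raise "not a NIAS envelope: #{header}" unless match
  { topic_stream: match[1], m: match[2].to_i, n: match[3].to_i,
    doc_id: match[4], payload: payload }
end
```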
## REST calls

- `ssf/ssf_server.rb`
  - Input: HTTP POST xml to `/:topic/:stream`
  - Format: xml: XML, assumed to be a SIF collection
  - Output: Kafka stream `sifxml.ingest`
    - Content: xml with header `TOPIC: topic.stream`
  - Constraints: XML less than 1 MB
- `ssf/ssf_server.rb`
  - Input: HTTP POST json to `/:topic/:stream`
  - Format: json: JSON
  - Output 1: Kafka stream `topic.stream`
    - Content: json
  - Output 2: Kafka stream `json.storage`
    - Content: json with header `TOPIC: topic.stream`
  - Constraints: JSON less than 1 MB
- `ssf/ssf_server.rb`
  - Input: HTTP POST csv to `/:topic/:stream`
  - Format: CSV
  - Output 1: Kafka stream `csv.errors`
    - Content: csv errors
    - Constraint: CSV is malformed
  - Output 2: Kafka stream `csv.errors`
    - Content: csv errors
    - Constraint: CSV violates the CSV schema, for specific topics with defined schemas (`naplan.csv_staff`, `naplan.csv`)
  - Output 3: Kafka stream `topic.stream`
    - Content: csv parsed into JSON
    - Constraint: CSV is valid
  - Output 4: Kafka stream `json.storage`
    - Content: csv parsed into JSON with header `TOPIC: topic.stream`
    - Constraint: CSV is valid
  - Constraints: CSV less than 1 MB
- `ssf/ssf_server.rb`
  - Input: HTTP POST xml to `/:topic/:stream/bulk`
  - Format: XML, assumed to be a SIF collection
  - Output: Kafka stream `sifxml.bulkingest`
    - Content: xml with header `TOPIC: topic.stream`, split into blocks of 950 KB, each but the last terminating in `===snip nn===`
  - Constraints: XML less than 500 MB
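The bulk-ingest chunking above can be sketched as follows. The 950 KB block size and the `===snip nn===` marker come from the description; the method names are illustrative, and in this sketch the marker is appended after slicing, so marked blocks slightly exceed the block size:

```ruby
BLOCK_SIZE = 950 * 1024  # block size used for bulk ingest, per the wiki

# Split a large XML payload into blocks; every block except the last
# gets a "===snip nn===" continuation marker appended.
def chunk_payload(xml, block_size = BLOCK_SIZE)
  blocks = []
  offset = 0
  n = 0
  while offset < xml.bytesize
    n += 1
    piece = xml.byteslice(offset, block_size)
    offset += piece.bytesize
    piece += "===snip #{n}===" if offset < xml.bytesize
    blocks << piece
  end
  blocks
end

# Reverse the chunking: concatenate blocks and strip the snip markers.
def reassemble(blocks)
  blocks.join.gsub(/===snip \d+===/, "")
end
```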
- `ssf/ssf_server.rb`
  - Input: HTTP GET from `/:topic/:stream`
  - Output: stream of messages from Kafka stream `topic.stream`
    - Content: JSON objects with keys `data`, `key`, `consumer_offset`, `hwm` (= highwater mark), `restart_from` (= next offset)
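An illustrative shape for one streamed message follows; the key names come from the list above, while the values are invented for the example:

```ruby
require "json"

# One message as streamed back by the GET endpoint (values are made up).
message = {
  "data"            => "<StudentPersonal/>",  # the Kafka message payload
  "key"             => "naplan.sifxml",       # the Kafka message key
  "consumer_offset" => 41,                    # offset of this message
  "hwm"             => 100,                   # highwater mark of the stream
  "restart_from"    => 42,                    # next offset to resume from
}
puts JSON.generate(message)
```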
- `ssf/ssf_server.rb`
  - Input: HTTP GET from `/csverrors`
  - Output: stream of messages from Kafka streams `csv.errors`, `sifxml.errors`, `naplan.srm_errors`
    - Content: header `m:n:i type` followed by the error message, where:
      - type: the type of error reported
      - m: ordinal number of the message within the error type
      - n: total number of error messages within the error type
      - i: record id, if the errors are reported per record rather than for an entire file
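A small sketch of parsing that `m:n:i type` error header (the method name is illustrative; the record id is treated as optional, since errors may be reported for a whole file):

```ruby
# Parse an error header of the form "m:n:i type"; i may be empty when the
# error applies to a whole file rather than one record.
def parse_error_header(header)
  match = header.match(/\A(\d+):(\d+):(\S*) (\w+)\z/)
  raise "not an error header: #{header}" unless match
  { m: match[1].to_i, n: match[2].to_i,
    record_id: match[3].empty? ? nil : match[3], type: match[4] }
end
```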
## SSF microservices

- `cons-prod-sif-bulk-ingest-validate.rb`
  - Input: Kafka stream `sifxml.bulkingest`
  - Format: sequence of XML fragments xml-fragment1, xml-fragment2, xml-fragment3 ..., all but the last terminating in `===snip nn===`; the whole message has header `TOPIC: topic.stream`
  - Output 1: Kafka stream `sifxml.errors`
    - Content: all well-formedness or validation errors from parsing xml, the concatenation of xml-fragment1, xml-fragment2, xml-fragment3 ... (with `===snip nn===` stripped out)
    - Constraint: xml is malformed or invalid
  - Output 2: Kafka stream `sifxml.validated`
    - Content: xml-node1, xml-node2, xml-node3 ...: one message for each child node of xml, each message with header `TOPIC: topic.stream m:n:doc-id` (the m-th record out of n, with a hash used to identify the document the messages came from)
    - Constraint: xml is valid
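The fan-out performed by the validators can be sketched like this: split a SIF collection into one message per child node, each carrying the `m:n:doc-id` header described above. The MD5-based doc-id and the method name are illustrative, not the actual NIAS hashing:

```ruby
require "rexml/document"
require "digest"

# Split a SIF collection into per-record messages, numbered m:n, with a hash
# identifying the source document.
def split_collection(topic_stream, xml)
  doc = REXML::Document.new(xml)
  children = doc.root.elements.to_a
  doc_id = Digest::MD5.hexdigest(xml)[0, 8]  # illustrative doc-id hash
  children.each_with_index.map do |node, i|
    "TOPIC: #{topic_stream} #{i + 1}:#{children.size}:#{doc_id}\n#{node}"
  end
end
```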
- `cons-prod-sif-ingest-validate.rb`
  - Input: Kafka stream `sifxml.ingest`
  - Format: SIF/XML collection xml
  - Output 1: Kafka stream `sifxml.errors`
    - Content: all well-formedness or validation errors from parsing xml, with the original CSV line added if the SIF document originated in CSV
    - Constraint: xml is malformed or invalid
  - Output 2: Kafka stream `sifxml.validated`
    - Content: xml-node1, xml-node2, xml-node3 ...: one message for each child node of xml, each message with header `TOPIC: topic.stream m:n:doc-id` (the m-th record out of n, with a hash used to identify the payload the messages came from)
    - Constraint: xml is valid
- `cons-prod-sif-process.rb`
  - Input: Kafka stream `sifxml.validated`
  - Format: SIF/XML collection xml
  - Output: Kafka stream `sifxml.processed`
    - Content: xml, subject to any needed processing
- `cons-prod-privacyfilter.rb`
  - Input: Kafka stream `sifxml.processed`
  - Format: SIF/XML object xml-node with header `TOPIC: topic.stream m:n:doc-id`
  - Output 1: Kafka stream `topic.stream.none`
    - Content: xml-node, with no content redacted
  - Output 2: Kafka stream `topic.stream.low`
    - Content: xml-node with all content matching xpaths in `privacyfilters/extreme.xpath` redacted
  - Output 3: Kafka stream `topic.stream.medium`
    - Content: xml-node with all content matching xpaths in `privacyfilters/extreme.xpath` and `privacyfilters/high.xpath` redacted
  - Output 4: Kafka stream `topic.stream.high`
    - Content: xml-node with all content matching xpaths in `privacyfilters/extreme.xpath`, `privacyfilters/high.xpath`, and `privacyfilters/medium.xpath` redacted
  - Output 5: Kafka stream `topic.stream.extreme`
    - Content: xml-node with all content matching xpaths in `privacyfilters/extreme.xpath`, `privacyfilters/high.xpath`, `privacyfilters/medium.xpath`, and `privacyfilters/low.xpath` redacted
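The cumulative redaction above can be sketched as follows. The level ordering and filter-file names are taken from the outputs listed; the xpath used in the example and the `ZZREDACTED` token are illustrative (the real xpath lists live in `privacyfilters/*.xpath`):

```ruby
require "rexml/document"
require "rexml/xpath"

ORDER        = %w[low medium high extreme]  # privacy level of the output stream
FILTER_CHAIN = %w[extreme high medium low]  # filter files applied, in order

# Names of the filter files applied for a given output stream suffix:
# .low redacts extreme.xpath only; .extreme redacts all four.
def filters_for(level)
  return [] if level == "none"
  FILTER_CHAIN.take(ORDER.index(level) + 1).map { |f| "privacyfilters/#{f}.xpath" }
end

# Redact the text of every node matching any of the given xpaths.
def redact(xml, xpaths)
  doc = REXML::Document.new(xml)
  xpaths.each do |xp|
    REXML::XPath.each(doc, xp) { |node| node.text = "ZZREDACTED" }
  end
  doc.to_s
end
```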
## SMS microservices

- `cons-prod-sif-parser.rb`
  - Input: Kafka stream `sifxml.processed`
  - Format: SIF/XML object xml-node with header `TOPIC: topic.stream m:n:doc-id`
  - Output: Kafka stream `sms.indexer`
    - Content: JSON tuple representing graphable information about xml-node:
      - `type` => xml-node name
      - `id` => xml-node RefId
      - `links` => other GUIDs in xml-node
      - `label` => human-readable label of xml-node
      - `otherids` => any alternate identifiers in xml-node
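A sketch of building that graph tuple for an xml-node. The key names follow the list above; the extraction here (a naive regex scan for GUIDs, and a label built from the node name and RefId) is illustrative only:

```ruby
require "rexml/document"
require "json"

GUID = /\b\h{8}-\h{4}-\h{4}-\h{4}-\h{12}\b/i  # SIF RefIds are GUIDs

# Build the JSON tuple of graphable information for one SIF/XML object.
def graph_tuple(xml_node)
  root  = REXML::Document.new(xml_node).root
  refid = root.attributes["RefId"]
  links = xml_node.scan(GUID).uniq - [refid]   # GUIDs other than the node's own
  { "type"     => root.name,
    "id"       => refid,
    "links"    => links,
    "label"    => "#{root.name} #{refid}",     # placeholder human-readable label
    "otherids" => {} }.to_json
end
```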
- `cons-sms-indexer.rb`
  - Input: Kafka stream `sms.indexer`
  - Format: JSON tuple representing graphable information about xml-node
  - Output: Redis database
    - Content: sets representing the contribution of xml-node to the graph of nodes
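The indexing fold can be sketched like this: each graph tuple contributes its node type and its links to per-node sets. Redis is emulated here with an in-memory Hash of Sets, and the key prefixes (`oid:`, `links:`) are illustrative, not the real NIAS key schema:

```ruby
require "set"
require "json"

# Fold one graph tuple into the store: record the node's type and add
# links in both directions so the graph can be walked from either end.
def index_tuple(store, tuple_json)
  t = JSON.parse(tuple_json)
  (store["oid:#{t['id']}"] ||= Set.new) << t["type"]
  t["links"].each do |link|
    (store["links:#{t['id']}"] ||= Set.new) << link
    (store["links:#{link}"]    ||= Set.new) << t["id"]  # record the back-link too
  end
  store
end
```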
- `cons-sms-storage.rb`
  - Input: Kafka stream `sifxml.processed`
  - Format: SIF/XML object xml-node with header `TOPIC: topic.stream m:n:doc-id`
  - Output: Moneta key/value database
    - Content: Key: xml-node RefId; Value: xml-node
- `cons-sms-json-storage.rb`
  - Input: Kafka stream `json.storage`
  - Format: JSON object json with header `TOPIC: topic.stream m:n:doc-id`
  - Output: Moneta key/value database
    - Content: Key: `topic.stream::id` (where id is extracted from the record); Value: json
## NAPLAN microservices

- `cons-prod-csv2sif-staffpersonal-naplanreg-parser.rb`
  - Input: Kafka stream `naplan.csv_staff`
  - Format: CSV following the NAPLAN specification for staff records
  - Output 1: Kafka stream `sifxml.ingest`
    - Content: `TOPIC: naplan.sifxmlout_staff` + `CSV line {linenumber}` + XML following the NAPLAN specification for staff records
    - Constraint: CSV record has the NAPLAN staff header LocalStaffId
    - Note: this payload is validated like all other SIF payloads; if valid, it ends up on `naplan.sifxmlout_staff.none`
  - Output 2: Kafka stream `csv.errors`
    - Content: alert that the wrong record type was uploaded
    - Constraint: CSV record has the NAPLAN student header LocalId
- `cons-prod-csv2sif-studentpersonal-naplanreg-parser.rb`
  - Input: Kafka stream `naplan.csv`
  - Format: CSV following the NAPLAN specification for student records
  - Output 1: Kafka stream `sifxml.ingest`
    - Content: `TOPIC: naplan.sifxmlout` + `CSV line {linenumber}` + XML following the NAPLAN specification for student records
    - Constraint: JSON of the CSV validates against `sms/services/naplan.student.json`
    - Note: this payload is validated like all other SIF payloads; if valid, it ends up on `naplan.sifxmlout.none`
  - Output 2: Kafka stream `csv.errors`
    - Content: alert that the wrong record type was uploaded
    - Constraint: CSV record has the NAPLAN staff header LocalStaffId
- `cons-prod-sif2csv-staffpersonal-naplanreg-parser.rb`
  - Input: Kafka stream `naplan.sifxml_staff.none`
  - Format: XML following the NAPLAN specification for staff records
  - Output: Kafka stream `naplan.csvstaff_out`
    - Content: CSV following the NAPLAN specification for staff records
- `cons-prod-sif2csv-studentpersonal-naplanreg-parser.rb`
  - Input: Kafka stream `naplan.sifxml.none`
  - Format: XML following the NAPLAN specification for student records
  - Output: Kafka stream `naplan.csv_out`
    - Content: CSV following the NAPLAN specification for student records
- `cons-prod-studentpersonal-naplanreg-unique-ids-storage.rb`
  - Input: Kafka stream `sifxml.processed`
  - Format: XML following the NAPLAN specification for student records
  - Output 1: Redis database
    - Content: Key: `school+localid::(ASL School Id + Local Id)`; Value: XML RefId
    - Constraint: key is unique in Redis
  - Output 2: Redis database
    - Content: Key: `school+name+dob::(ASL School Id + Given Name + Last Name + Date Of Birth)`; Value: XML RefId
    - Constraint: key is unique in Redis
  - Output 3: Kafka stream `naplan.srm_errors`
    - Content: error report on the duplicate key
    - Constraint: key `school+localid::(ASL School Id + Local Id)` or key `school+name+dob::(ASL School Id + Given Name + Last Name + Date Of Birth)` is not unique in Redis, and xml has not been converted from a CSV file
  - Output 4: Kafka stream `naplan.srm_errors`
    - Content: error report on the duplicate key
    - Constraint: key `school+localid::(ASL School Id + Local Id)` or key `school+name+dob::(ASL School Id + Given Name + Last Name + Date Of Birth)` is not unique in Redis, and xml has been converted from a CSV file and contains a comment with the CSV line number
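The duplicate-key rule above can be sketched as follows: a key is stored only when unseen, and a repeated key bound to a different RefId yields an error report instead. Redis is emulated with a Hash, and the report shape is illustrative:

```ruby
# Store key => refid if the key is new (or already bound to the same RefId);
# otherwise return an error report describing the duplicate.
def store_unique(db, key, refid)
  if db.key?(key) && db[key] != refid
    { "error" => "duplicate key", "key" => key, "existing" => db[key] }
  else
    db[key] = refid
    { "stored" => key }
  end
end
```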
- `cons-prod-naplan-studentpersonal-process-sif.rb`
  - Input: Kafka stream `sifxml.validated`
  - Format: XML following the NAPLAN specification for student records
  - Output: Kafka stream `sifxml.processed`
    - Content: XML following the NAPLAN specification for student records + Platform Student Identifier (if not already supplied)
  - Constraint: XML source topic is `naplan.sifxml` or `naplan.sifxmlout`
- `cons-prod-sif2csv-SRM-validate.rb`
  - Input: Kafka stream `sifxml.processed`
  - Format: XML following the NAPLAN specification for student records
  - Output: Kafka stream `naplan.srm_errors`
    - Content: any instances where the XML violates the Student Registration Management system's validation rules
  - Constraint: XML source topic is `naplan.sifxml` or `naplan.sifxml_staff`
- `cons-prod-file-report.rb`
  - Input: Kafka stream `sifxml.processed`
  - Format: XML following the NAPLAN specification for student records
  - Output: Kafka stream `naplan.filereport`
    - Content: report of the number of schools, students, and students per year level, for each new doc-id seen (representing a new payload loaded into NIAS)
  - Constraint: XML source topic is `naplan.sifxml` or `naplan.sifxml_staff`