[Internal] Sscheck for Flink - demiourgoi/sscheck GitHub Wiki
Project name
Probably sscheck-flink is a good idea; the name is not very important anyway
Some tasks
- split the code into two projects: sscheck-core for the Spark-independent code (e.g. Formula), and sscheck-spark for the Spark-dependent code. Keep a single repo
- define distributed specs2 matchers for Flink: Flink versions of should existsRecord, should forallRecord, and others like should be empty, and set operations (check the problems pointed out for Spark by Holden Karau, and check if something similar happens for Flink too)
- define generators for Flink
- define batch properties for Flink
- define streaming properties for Flink, with a flink version of DStreamTLProperty
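The semantics expected from the record matchers in the list above can be sketched without any Flink dependency, by modelling a distributed data set as a sequence of partitions. Everything below is a hypothetical stand-in (the names Partitioned and MatcherSketchExample are made up for illustration, and no Flink or specs2 API is used); a real Flink matcher would presumably evaluate the predicate per partition and combine the partial results in a similar way.

```scala
// Hypothetical model of a distributed data set: one inner Seq per partition.
// This only illustrates the intended matcher semantics, it is not Flink code.
final case class Partitioned[A](partitions: Seq[Seq[A]]) {
  // True if at least one record in some partition satisfies the predicate.
  def existsRecord(p: A => Boolean): Boolean =
    partitions.map(_.exists(p)).foldLeft(false)(_ || _)

  // True if every record in every partition satisfies the predicate.
  def forallRecord(p: A => Boolean): Boolean =
    partitions.map(_.forall(p)).foldLeft(true)(_ && _)

  // True if no partition holds any record.
  def isEmpty: Boolean = partitions.forall(_.isEmpty)
}

object MatcherSketchExample {
  // Three partitions, one of them empty.
  val data: Partitioned[Int] =
    Partitioned(Seq(Seq(1, 2), Seq.empty[Int], Seq(3, 4)))
}
```

Note that an empty partition makes forallRecord vacuously true for that partition, which is the kind of corner case worth checking against real Flink behaviour.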
Also, other pending tasks from first versions of sscheck:
- improve the DSL for generators: for example, BatchGen only has a method for always, and in general the generators follow the ScalaCheck style of just using functions, instead of a finer DSL style like in Specs2
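The split into sscheck-core and sscheck-spark within a single repo, mentioned in the first task, could look roughly like the following sbt multi-project build. This is a minimal sketch: the directory names and the root aggregate project are assumptions, and dependency and version settings are left out.

```scala
// build.sbt: hypothetical multi-project layout, single repository.

// Spark-independent code (e.g. Formula).
lazy val core = (project in file("sscheck-core"))
  .settings(name := "sscheck-core")

// Spark-dependent code, built on top of the core project.
lazy val spark = (project in file("sscheck-spark"))
  .dependsOn(core)
  .settings(name := "sscheck-spark")

// Root project that only aggregates the subprojects,
// so that commands such as `test` run on all of them.
lazy val root = (project in file("."))
  .aggregate(core, spark)
```

A Flink module could later be added in the same way, as another subproject depending on core.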
Goals
Sscheck for Spark fell short in several aspects; let's try to avoid that with the Flink version
- Parallel test case execution: Spark sscheck falls short here in part due to the constraint of 1 streaming context per JVM of SPARK-2243. This leads to test suites that take 1 hour to execute, especially given the hardware available for our Jenkins, which leads to long batch intervals. Maybe Flink doesn't have this problem; if it does, try using several processes to cope with it. See if sbt can help with this, although a pure specs2 / code solution would be better
- Distributed generators: spark-testing-base already has distributed generators, using a capability of Spark, so we could do this for Spark sscheck already. We should do this for flink-check from the start.
- Distributed test suite for our Jenkins: we should set up a test environment for flink-check on a YARN cluster from the start. This will also speed up releasing. I suspect this will be a prerequisite for the parallel test execution to be effective, due to the available hardware for our Jenkins. Try with the Cubox as master and 2 ODROID-C2 as slaves, and check if the Cubox 32-bit architecture might be a problem here, combined with the ODROID-C2 64-bit architecture
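As one option for the "several processes" idea in the parallel execution goal, sbt can run each test class in its own forked JVM, which would sidestep a one-context-per-JVM constraint. The fragment below is a sketch assuming sbt 1.x slash syntax; the limit of 4 concurrent JVMs is an arbitrary value, and this is only the sbt route, not the pure specs2 / code solution that would be preferred.

```scala
// build.sbt fragment (sbt 1.x); a sketch, not the project's actual build.
import Tests._

// Run tests in forked JVMs instead of inside the sbt JVM.
Test / fork := true

// Put every test class in its own group, each executed in a separate forked JVM.
Test / testGrouping := (Test / definedTests).value.map { test =>
  new Group(test.name, Seq(test), SubProcess(ForkOptions()))
}

// Allow up to 4 forked test groups to run at the same time (arbitrary limit).
Global / concurrentRestrictions := Seq(Tags.limit(Tags.ForkedTestGroup, 4))
```

Whether this actually shortens the suite depends on the hardware: with few cores, several JVMs running streaming tests at once may simply compete for resources.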
Note
- work on Scala 2.11 from the start
- personally: migrate to IntelliJ