zoom 20201201 - wfau/gaia-dmp GitHub Wiki
Zoom meeting 1st December
20201201 16:00 UTC
- Previous meeting zoom-20201127
In progress
- Wiki page to plan notebooks
- in progress wiki
- Deploy a larger cluster to work with the full size data set
- in progress 239
- external access http://zeppelin.aglais.uk:8080/
- User space ssh rsync access
- in progress 195
- in progress 226
- external access ssh://[email protected]
New issues
- Configuration for Ansible deployment
- new 240
- Experiment with scaling the Ansible deployment
- new 241
- User accounts in Drupal
- in progress 242
- Integration with IRIS IAM
- in progress 243
- Resource booking in Drupal
- in progress 244
- Automated testing for Kubernetes deployment
- in progress 245
- Investigate IRIS echo S3 service for user data
- new 246
New questions
User data space
- Simple implementation reserves 10G per user.
- Simple implementation for now - works for a small number of users.
- Longer term - How do we recover unused space?
- Longer term - How do we handle dormant accounts?
- Longer term - Staging mechanism to push older data to an archive and recover unused space?
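The fixed reservation above grows linearly with the number of accounts, which is why it only suits a small user base. A minimal sketch of that arithmetic follows; the 10G figure is from the notes, while the function names, user counts and share size are hypothetical and only there for illustration.

```python
RESERVED_GB_PER_USER = 10  # from the notes: 10G reserved per user


def reserved_space_gb(user_count: int, per_user_gb: int = RESERVED_GB_PER_USER) -> int:
    """Total space held back when every account gets a fixed allocation."""
    if user_count < 0:
        raise ValueError("user_count must be non-negative")
    return user_count * per_user_gb


def max_users(share_capacity_gb: int, per_user_gb: int = RESERVED_GB_PER_USER) -> int:
    """Number of fixed allocations that fit in a share of the given size."""
    return share_capacity_gb // per_user_gb


# Hypothetical numbers: 25 accounts, and a 1024G share.
print(reserved_space_gb(25))
print(max_users(1024))
```

Dormant accounts keep their full reservation under this scheme, so the longer term questions about recovering unused space apply as soon as the share approaches the `max_users` limit.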
Spark version
- Current live system is spark-2.7.
- Zeppelin Hadoop-Yarn deploy is spark-2.7.
- Kubernetes deploy is spark-3.x.
- Nigel's Random Forest example uses spark-2.7?
- Does it need spark-2.7?
- AXS distribution is based on spark-2.7.
- Does it need spark-2.7 or can we create a spark-3.x version?
Do we stick with spark-2.7 or try to upgrade to spark-3.x?
- Are there issues with Zeppelin Hadoop Yarn deployment?
- Are there issues with getting AXS to work with spark-3.x?
- The Kubernetes deployment probably won't work with spark-2.x.
Questions about AXS
- Can we figure out how to apply AXS changes to a standard Spark distribution?
- What benefits does AXS give us?
- Can we create an example that demonstrates this?
AXS issues
- Differences between a standard distribution and the AXS distribution.
- issue 221
- Apply differences to add AXS to our deployment.
- issue 222
- Tests that demonstrate that AXS is installed and working
- issue 223
- Benchmark to compare performance of AXS augmented deployment
- issue 224
Actions
- Create a script that shows conversion from csv to parquet for Gaia, writing the results to the Ceph shares (stv)
- Test out multiple concurrent users running jobs via Zeppelin
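As a starting point for the csv to parquet action, the conversion could be sketched in PySpark roughly as follows. This is not the script from the action. It assumes the Ceph share is mounted as a plain filesystem path, that the csv files have a header row, and that schema inference is acceptable; the function names, application name and example paths are all hypothetical.

```python
from pathlib import PurePosixPath


def parquet_target(csv_path: str, output_root: str) -> str:
    """Map an input csv file name to a parquet directory under output_root.

    Assumes output_root is a plain filesystem path (hypothetical layout).
    """
    name = PurePosixPath(csv_path).name
    # Strip compression and csv suffixes: "GaiaSource_000.csv.gz" -> "GaiaSource_000"
    for suffix in (".gz", ".csv"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    return str(PurePosixPath(output_root) / f"{name}.parquet")


def convert_csv_to_parquet(csv_path: str, output_root: str) -> str:
    """Read one csv file with Spark and write it back out as parquet."""
    # Imported here so the path helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("gaia-csv-to-parquet").getOrCreate()
    # header and inferSchema are assumptions about the input files.
    frame = spark.read.csv(csv_path, header=True, inferSchema=True)
    target = parquet_target(csv_path, output_root)
    frame.write.mode("overwrite").parquet(target)
    return target
```

In a Zeppelin notebook the call would look like `convert_csv_to_parquet("/data/gaia/csv/GaiaSource_000.csv.gz", "/shares/gaia/parquet")`, with both paths replaced by the real locations. For the full size data set an explicit schema would likely be preferable to `inferSchema`, which needs an extra pass over the input.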