Modularising your Configuration Files - pathfinder-analytics-uk/dab_project GitHub Wiki

Project Code

resources/demo_job.job.yml

It is better to modularise your bundle configuration: define each workflow in its own YAML file stored in the `resources` folder, rather than keeping everything in the top-level `databricks.yml`.

When you move a job definition into `resources`, you will need to update the relative paths to any referenced resources (such as notebooks), because paths are resolved relative to the YAML file that declares them. In the example below, the notebook path becomes `../notebooks/notebook_1.ipynb`.

```yaml
resources:
  jobs:
    demo_job:
      name: demo_job
      tasks:
        - task_key: notebook_1_task
          notebook_task:
            notebook_path: ../notebooks/notebook_1.ipynb
            source: WORKSPACE
          job_cluster_key: job_cluster
      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            spark_conf:
              spark.master: local[*, 4]
              spark.databricks.cluster.profile: singleNode
            azure_attributes:
              first_on_demand: 1
              availability: SPOT_WITH_FALLBACK_AZURE
              spot_bid_max_price: -1
            node_type_id: Standard_DS3_v2
            driver_node_type_id: Standard_DS3_v2
            custom_tags:
              ResourceClass: SingleNode
            spark_env_vars:
              PYSPARK_PYTHON: /databricks/python3/bin/python3
            enable_elastic_disk: true
            data_security_mode: SINGLE_USER
            runtime_engine: STANDARD
            num_workers: 0
      queue:
        enabled: true
```
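For the bundle to pick up YAML files in the `resources` folder, the top-level `databricks.yml` must include them via an `include` mapping. A minimal sketch (the bundle name and target here are assumptions based on this project's naming, not taken from the source):

```yaml
# databricks.yml at the bundle root — illustrative sketch
bundle:
  name: dab_project  # assumed bundle name

# Pull in every resource definition stored under resources/
include:
  - resources/*.yml

targets:
  test:  # assumed target; deploy with `databricks bundle deploy -t test`
    mode: development
```

With this in place, `resources/demo_job.job.yml` is merged into the bundle at deploy time without being referenced by name.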

Commands

Deploy the bundle to the default target, or select a named target with the `-t` flag:

```bash
databricks bundle deploy
databricks bundle deploy -t test
```