Ingesting data from an authenticated REST API using Job Secrets - vmware/versatile-data-kit GitHub Wiki

Overview

In this tutorial you are going to learn how to use Secrets in a data job.

Scenario

You like to read news daily and are a huge Taylor Swift fan. Let's combine these passions into a single data job, which searches for Taylor Swift news and stores the articles in a database.

As the source of the data, you are going to use the free, key-protected API of newsapi.org.

Who is this article for?

Users who want to learn how to use Secrets in a data job. Before starting this tutorial, you should be familiar with the basic concepts explained in Hello World Data Job and Ingesting data from REST API into Database.

Estimated Time Commitment

If you have all the prerequisites in place, the completion of this tutorial should take 10 to 15 minutes.

Prerequisites

Since Job Secrets are stored securely, you'll need a pre-configured installation of the VDK Control Service and HashiCorp Vault:

  1. A VDK Control Service installation (see Install VDK Control Service with custom SDK) and a local VDK SDK installation configured to use it
  2. A configured VDK Control Service/HashiCorp Vault integration (see Configuring Hashicorp Vault Instance for storing Secrets)

Storing Secrets

In the first part of the tutorial, we are going to obtain the API key and store it as a Job Secret.

NOTE: in this tutorial you can use a pre-existing job, or create a new one by following the commands below:

Create Data Job

Create a data job by executing the following command:

vdk create -n taylor-swift-news -t my-team

This will create a taylor-swift-news directory with some sample data job files inside. Delete the files so that only the empty directory remains.

Obtain an API Key

Go to newsapi.org and click the "Get API Key" button. Fill in the form and copy the API Key.

Store the API key in a Job Secret

You can use the "vdk secrets" command to store and retrieve secrets from the command line. If you are using the VDK CLI on a private, secure console, you can set a secret directly with the following command:

vdk secrets -n taylor-swift-news -t my-team --set "api_key" "<your API Key goes here>"

Alternatively, you can use the "--set-prompt" option. You will be prompted to enter the secret, so it won't be kept in your console's history.

vdk secrets -n taylor-swift-news -t my-team --set-prompt "api_key"
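The name passed to "vdk secrets" ("api_key" here) is the name the data job later uses to read the value. As a minimal sketch, a step could fail fast with a clear message when the secret has not been stored yet. The helper require_secret and the class FakeSecretsInput are hypothetical names introduced only for this illustration, and the sketch assumes that job_input.get_secret returns an empty value for a secret that is not set.

```python
def require_secret(job_input, name: str) -> str:
    """Return the named job secret, or raise if it is missing or empty.

    Hypothetical helper: it only wraps job_input.get_secret, the call
    used in the data job step of this tutorial.
    """
    value = job_input.get_secret(name)
    if not value:
        raise ValueError(
            f"Job secret '{name}' is not set; store it with 'vdk secrets' first."
        )
    return value


class FakeSecretsInput:
    """Hypothetical stand-in for the job input, for trying the helper offline."""

    def __init__(self, secrets: dict):
        self._secrets = dict(secrets)

    def get_secret(self, name, default_value=None):
        return self._secrets.get(name, default_value)


if __name__ == "__main__":
    job_input = FakeSecretsInput({"api_key": "dummy-key"})
    # Print only the length, never the secret itself
    print(len(require_secret(job_input, "api_key")))
```

Inside a real step, the same helper would be called with the job_input argument that VDK passes to run().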

Using secrets in a data job

Now, let's create a data job step which uses the API key to retrieve the news you are interested in.

Edit The Data Job

Create a new Python file named 10_get_data.py in the data job directory. You should have the following file structure:

taylor-swift-news/
├── 10_get_data.py

Now that you've created the Python file you need, let's fill in the code. This Python data job step does the following:

  1. Get the API key from the job secrets
  2. Prepare and execute the request to newsapi.org
  3. Send the received data to the database

10_get_data.py
import requests
from datetime import date, timedelta
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # Get the API Key from the Job Secrets
    api_key = job_input.get_secret('api_key')
    # Get yesterday's date
    yesterday_date = date.today() - timedelta(days=1)

    # Get the data
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": "Taylor Swift",
        "from": yesterday_date.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,
    }
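    # A timeout keeps the job from hanging if the API does not respond
    response = requests.get(url, params=params, timeout=30)
    # Fail the step on HTTP errors (for example, an invalid API key)
    response.raise_for_status()
    data = response.json()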

    # Send the data to the DB
    payload = {'articles': data['articles']}
    job_input.send_object_for_ingestion(
        payload=payload,
        destination_table="taylor_swift_news"
    )
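The step above sends all articles as one nested payload. Depending on the ingestion target, one flat row per article may be easier to store. The following is a minimal sketch of that variation, not part of the original job. It assumes each article carries a nested "source" object plus "title", "url" and "publishedAt" fields, which is how newsapi.org responses are commonly shaped. The names flatten_article, send_articles and FakeIngestInput are hypothetical.

```python
def flatten_article(article: dict) -> dict:
    """Turn one nested article into a flat row (field names are assumptions)."""
    source = article.get("source") or {}
    return {
        "source_name": source.get("name"),
        "title": article.get("title"),
        "url": article.get("url"),
        "published_at": article.get("publishedAt"),
    }


def send_articles(job_input, articles: list,
                  destination_table: str = "taylor_swift_news") -> int:
    """Send one flat payload per article and return the number of rows sent."""
    sent = 0
    for article in articles:
        job_input.send_object_for_ingestion(
            payload=flatten_article(article),
            destination_table=destination_table,
        )
        sent += 1
    return sent


class FakeIngestInput:
    """Hypothetical stand-in for the job input, for trying the helper offline."""

    def __init__(self):
        self.sent = []

    def send_object_for_ingestion(self, payload, destination_table):
        self.sent.append((destination_table, payload))


if __name__ == "__main__":
    fake = FakeIngestInput()
    sample = [{"source": {"name": "Example News"}, "title": "Sample headline"}]
    print(send_articles(fake, sample), fake.sent)
```

In the real step, send_articles(job_input, data['articles']) would take the place of the single send_object_for_ingestion call.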

Conclusion

Congratulations! You've completed this tutorial and learned how to set and use secrets in a data job.
