Create a Dataset, upload data to it, and use it in a Workflow
This tutorial walks through preparing data by creating a dataset, and then creating a workflow in Texera to analyze the data stored in that dataset.
More specifically, we are going to create a dataset named Sales Dataset, which contains a file with sales data for different types of merchandise across several countries. The workflow will calculate the average units sold per item type across the European countries in CountrySalesData.csv (make sure the downloaded file has the .csv file extension). The sales data was downloaded from eforexcel.com and has 100 rows.
We will first create a dataset and upload the sales data to it. Then we will create a workflow in the Texera Web UI to
- read the data from the file;
- filter the relevant data based on keywords;
- perform an aggregation.
1. Upload data by creating a Dataset
- Go to the Dataset tab and click the dataset creation icon to start creating the dataset.
- Name the dataset Sales Dataset, then drag and drop CountrySalesData.csv to the file uploading area.
- Click Create. The dataset we just created is shown, along with a preview of CountrySalesData.csv.
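Before uploading, it can help to confirm locally that the file really has the .csv extension and the expected number of rows. The following is a minimal sketch using only the Python standard library; the function name check_sales_csv and its expected_rows parameter are hypothetical helpers for illustration and are not part of Texera. It assumes the first line of the file is a header and that the file is UTF-8 encoded.

```python
import csv
from pathlib import Path


def check_sales_csv(path, expected_rows=None):
    """Return the number of data rows in a CSV file after validating its extension.

    Hypothetical helper for illustration; not part of Texera.
    """
    path = Path(path)
    # The tutorial expects a file with the .csv extension.
    if path.suffix.lower() != ".csv":
        raise ValueError(f"expected a .csv file, got {path.name!r}")
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # assumed: the first line holds column names
        if header is None:
            raise ValueError("file is empty")
        # Count data rows, skipping completely blank lines.
        row_count = sum(1 for row in reader if row)
    if expected_rows is not None and row_count != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {row_count}")
    return row_count
```

For the file used in this tutorial, calling check_sales_csv("CountrySalesData.csv", expected_rows=100) should return 100 if the download matches the description above, and raise a ValueError otherwise.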
2. Read data in Workflow
- On the left panel, go to the environment tab and click Add Dataset to add the Sales Dataset to the current workflow. CountrySalesData.csv will then be available to preview and load into the workflow.
- Drag and drop a CSV File Scan operator. On the right panel, input the file name CountrySalesData.csv and select the path from the drop-down menu.
- Run the workflow. You should be able to see the loaded sales data.
3. Add operators to analyze data
- Drag and drop a Filter operator to keep only the sales data in Europe.
- Drag and drop an Aggregate operator to get the average units sold, grouped by Item Type.
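To cross-check what the workflow produces, the same filter and aggregation can be sketched in plain Python. This is only an illustration of the logic, not how Texera implements its operators. The column names Region and Units Sold are assumptions about the CSV header (the tutorial itself only names Item Type), the function average_units_by_item_type is a hypothetical helper, and the sample rows are made up rather than taken from CountrySalesData.csv.

```python
import csv
import io
from collections import defaultdict


def average_units_by_item_type(rows, region="Europe"):
    """Filter rows to one region, then average units sold per item type."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        # Filter step: keep only rows for the requested region
        # ("Region" is an assumed column name).
        if row["Region"] != region:
            continue
        # Aggregate step: accumulate sum and count per item type
        # ("Units Sold" is an assumed column name).
        item_type = row["Item Type"]
        totals[item_type] += float(row["Units Sold"])
        counts[item_type] += 1
    return {item: totals[item] / counts[item] for item in totals}


# Made-up sample rows for illustration only.
SAMPLE = """Region,Country,Item Type,Units Sold
Europe,France,Snacks,10
Europe,Spain,Snacks,30
Europe,Italy,Clothes,7
Asia,Japan,Snacks,100
"""

result = average_units_by_item_type(csv.DictReader(io.StringIO(SAMPLE)))
print(result)  # {'Snacks': 20.0, 'Clothes': 7.0}
```

To run this against the real file, open CountrySalesData.csv with newline="" and pass csv.DictReader(handle) in place of the sample reader, adjusting the column names if the header differs.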