Joins - sufiyan63/SAP-Hana-Cloud GitHub Wiki

Joins


  • Referential Join

    • In Join node when you connect one table to other. Click on the link between the tables β‡’ general properties β‡’ Select Referential Join
    • A Referential Join in SAP HANA is a type of join used in calculation views when there is a logical relationship between two tables, typically where one table contains a foreign key referencing a primary key in another table. This type of join is optimized for performance because it assumes that the joined data is always consistent, meaning every foreign key in the referencing table corresponds to a valid primary key in the referenced table.
    • Default Behavior: If no fields from the right table (referenced table) are requested in the query, the join can be completely skipped, further improving performance.
    • Output Handling: If fields from the right table are required, the join behaves like an inner join.
    • Cardinality must be specified. If it is not, the Referential Join cannot be optimized.
    • A referential join can be implemented in a standard join node as well as a star join node.
    • Consider two tables:
    • Orders (OrderID, CustomerID, OrderDate)
    • Customers (CustomerID, CustomerName, Country)
    • The CustomerID in the Orders table is a foreign key referencing the primary key CustomerID in the Customers table.
    • SELECT O.OrderID, C.CustomerName, C.Country
      FROM Orders O
      REFERENTIAL JOIN Customers C
      ON O.CustomerID = C.CustomerID;
    • If the query requests only fields from the Orders table:
    • SELECT O.OrderID
      FROM Orders O
      REFERENTIAL JOIN Customers C
      ON O.CustomerID = C.CustomerID;
    • The referential join is skipped entirely, and only data from Orders is returned because fields from Customers are not needed.
    • Conditions for Referential Join Optimization

      • A join defined as a Referential Join between two tables or sources, A and B, is pruned (not executed) when all three following conditions are met:
      • No field is requested from B
      • Integrity is placed on A
      • The cardinality on the B side is ..1
      • When the cardinality on the B side is not ..1, the join will always be executed, even if no column from B is requested.
    • Integrity Constraint

      • This setting defines in which direction the referential integrity is guaranteed.
      • Left: Every entry in left table has at least one match in right table. N..1
      • Right: Every entry in right table has at least one match in left table. 1..N
      • Both: Every entry in both tables has at least one match in the other table. 1..1
  • Inner Join

    • It returns rows when there is at least one match on both sides of the join.
  • Left Outer Join

    • A Join returns all rows from the left table, even if there are no matches in the right table.
    • Try to use N..1 and 1..1 joint cardinality β‡’ Optimal Performance
    • Try to reduce number of joint fields
    • Avoid joining on calculated columns
    • Avoid type conversions at runtime
  • Right Outer Join

    • Try to use 1..N and 1..1 joint cardinality β‡’ Optimal Performance
    • Try to reduce number of joint fields
    • Avoid joining on calculated columns
    • Avoid type conversions at runtime
  • Spatial Join

    • Create joins to query data from data sources that have spatial data.
    • Point – Represents a specific location on Earth (latitude and longitude).
    • Example: POINT =>Bangalore location.
    • You can insert POLYGON – https://arthur-e.github.io/Wicket/sandbox-gmaps3.html
    • Example:
    • Table A contains customer locations as POINT data (latitude, longitude).
    • Table B contains city boundaries as POLYGON data.
    • A spatial join helps you find which customer belongs to which city by comparing the POINT to the POLYGON.
    • You use the regular join node in a calculation view to create spatial joins by joining two database tables on columns of spatial data types.
    • In any regular join node β‡’ select the join β‡’ You can see spatial properties below the general properties
    • In the Spatial Join section, define the spatial join properties.
    • Select predicate β‡’ equals
    • Select predicate β‡’ Relate

      • If you select as the predicate value, in the Intersection Matrix text field enter the required values.
      • Example:
      • Table A β‡’ Contains customer locations (as POINT).
      • Table B β‡’ Contains city boundaries (as POLYGON).
      • If you want to check whether a customer’s location is exactly inside the city boundary or touching the boundary, you can define a specific Intersection Matrix.
      • SELECT a.customer_id, b.city_name
        FROM customer_locations a
        JOIN city_boundaries b
        ON ST_Relate(a.location, b.boundary, 'T*F***FF*');
      • 'T*F***FF*' means:
      • T β‡’ Touching
      • F β‡’ False (no overlap)
      • star β‡’ Ignore
    • Select predicate β‡’ Within Distance

      • if you select as the predicate value, in the Distance field, select a value. You can provide the distance as a fixed value or use an input parameter to provide the distance value at runtime.
      • This predicate checks if two objects are within a certain distance of each other.
      • Table A β‡’ Contains customer locations (as POINT).
      • Table B β‡’ Contains store locations (as POINT).
      • If you want to find which customers are within 5 kilometers of a store:
      • SELECT a.customer_id, b.store_name
        FROM customer_locations a
        JOIN store_locations b
        ON ST_Within_Distance(a.location, b.location, 5000); β€” 5000 meters = 5 km
      • This will return all customers whose location is within 5 km of a store location
    • Select Execute Join if β‡’ Predicate Evaluates to true
    • Choose OK.
  • Full Outer Join

    • A Full Outer Join combines the behaviours of the Left and Right Outer Joins.
    • Rows from both tables that match on joined columns
    • Rows from the left table with no match in the right table
    • Rows from the right table with no match in the left table
    • A Full Outer Join is supported by calculation views only, in the standard Join and Star Join nodes.
    • However, in a Star Join node, a full outer join can only be defined on one DIMENSION calculation view. This view must appear last in the Star Join node.
  • Text Join

    • a Text Join behaves like a Left Outer Join, with cardinality 1:1 but, in addition, you specify a language column,
    • During join execution, the language of the end user querying the calculation view is used to retrieve descriptions from the text table (here, MAKT) in the corresponding language, based on the language column.
    • Then in semantics β‡’ Label Column Field
  • Temporal Join

    • Frequently, master data stores historical values. For example, in the employee table there might be two records for one employee. One record represents the employees job position in the past, and the second record represents the employees job position today. Each record contains dates to represent the validity of the record. So how does a calculation view know which record to request?
    • It is possible to add a ? condition to a join in order to find matching records from two tables based on a date. The records are only matched if a date column of one table is within a time interval defined by two columns of the other table.
    • In this example, the status of the customers can change over time, and this information is captured in a dedicated table (Customer Status). If you need to analyse the sales orders and include the status of each customer when they issued the order, you create an Inner Join on the ContactID column and add a temporal condition as follows:
    • Temporal column β‡’ Date (Sales Orders)
    • From Column β‡’ DateFrom (Customer Status)
    • To Column β‡’ DateTo (Customer Status)
    • Temporal Condition β‡’ Include Both β‡’ determining whether the boundaries are inclusive or exclusive.
    • Only supported in the Star Join of calculation views of the type cube with star join. The join type must be defined as Inner.
    • Only columns already mapped to the output of the Star Join node can be defined as Temporal Columns in the temporal properties of the join.
    • Temporal conditions can be defined on columns of the following data types:

      • timestamp
      • date
      • integer
  • Join Cardinality

    • The cardinality of a join defines how the data from two tables joined together are related, in terms of matching rows.
    • For example, if you join the Sales Order table (left table) with the Customer table (right table), you can define an n:1 cardinality. This cardinality means that several sales orders can be related to the same customer, but the opposite is not possible (you cannot have a sales orders that relates to several customers).
    • We recommend that you specify the cardinality only when you are sure of the content of the tables. If not, just leave the cardinality blank.
    • That is, 1..1, 1..n, n..1, and n..m.
    • Always set join cardinalities so the optimizer can decide if joins can be omitted at runtime based on the query that is being sent to the calculation view.
    • If tables are used as the data sources, and not calculation views, then you can use the Propose Cardinality option.
  • Non-Equi Join

    • The join condition is not represented by an = (equal) operator. but instead is based on other comparison operators such as Greater than.
    • Operator β‡’ The condition of Join is other than Equal eg. Less than, Greater than, Less than equal to, Not Equal
    • Defining a Non-Equi Join condition is possible for the following types of joins: Inner, Left Outer, Right Outer, Full Outer
    • ProductsToBeDelivered.id Equal SUBTASKS.id
    • ProductsToBeDelivered.dueOn Less Than SUBTASKS.plannedDate
  • Dynamic Join

    • In some scenarios, you want to allow data analysis at different levels of granularity with the same calculation view.
    • In this case, assuming that you model your calculation view with a Regular Join on Country and Region, you will get correct results if you analyze the data by country, but the results will be inconsistent if you analyze the data by region.
    • With this Join, only the join columns requested in the query are brought into context and play a part in the join execution. As a consequence, the same calculation view can be used for both purposes, that is, to analyze data by country or by region.
    • A Join can be defined only with multi-column joins.
    • With this Join, if none of the joined columns are requested by the client query, you get a query runtime error.
    • If we consider the behavior of the join from an aggregation perspective:

      • In a Regular (static) Join, the aggregation is executed after the join.
      • In a Dynamic Join, when a joined column is not requested by the client query, an aggregation is triggered to remove this column. Then, the join is executed based only on the requested columns.
  • Adding a Filter to a Join Node

    • When you define a filter (a filter expression) in a join node, you have a possibility to optimize the runtime execution of the calculation view. With filter mapping, when a filter is defined on a column from one source, you can ask the SQL optimizer to also apply this filter to a column of another other source.
    • From SAP HANA Cloud QRC 4/2023, it is also possible to define filter mapping in non-equi joins.
    • The improved performance results from the early execution of filters on both sources, before the join is executed.
    • A Direction must also be specified, so that the optimizer knows from which side (left or right) an existing filter must be mapped to the other side. The default direction is Left and Right, that is, bi-directional
    • Left β‡’ Right : If Projection_1.product = 'Apple', only records with Projection_2.product_name = 'Apple' will be considered in the join.
    • Right β‡’ Left: If Projection_2.product_name = 'Orange', only records with Projection_1.product = 'Orange' will be considered in the join.
    • Left and Right : If Projection_1.product = 'Apple' and Projection_2.product_name = 'Apple', the join will work. If one side doesn’t match, the row gets excluded.
  • Joining Multiple Data Source in a Join Node

    • Support in cube with star join
    • In the Join definition select the canvas
    • Select the central table
    • Select the Multi Join Order

      • Inside Out β‡’ The join that is near to the central table is executed first
      • Outside In β‡’ The join that is far from the central table is executed first
      • In the figure, Multi-Join Scenarios, the multi-join order property only applies in scenarios 1 and 2, and affects joins J1 and J2. The precedence between joins J1 and J3 (in scenario 2) or J1 and J2 (in scenario 3) is not controlled by the multi-join order setting.
⚠️ **GitHub.com Fallback** ⚠️