BI, PowerBI - GregLinthicum/From-Logistic-Regression-to-Long-short-term-memory-RNN GitHub Wiki
Claims of substantial disparities in functionality between Power BI and Excel are greatly exaggerated.
Automation and Flexibility:
- Excel allows highly flexible report and dashboard creation. However, automating tasks can be challenging.
- Power BI offers less flexibility than Excel but provides better automation options. Users can schedule data refreshes, automate report generation, and create dynamic dashboards.
Collaboration and Sharing:
- Excel files can be shared, but collaboration is often cumbersome due to version-control issues.
- Power BI allows seamless collaboration. Reports and dashboards can be shared online, and users can collaborate in real time.
Learning Curve:
- Excel is well established and widely known; many users are familiar with its basic features.
- Power BI has a steeper learning curve, especially for advanced features. However, it's worth investing time to harness its capabilities.
The main component contributing to the steeper learning curve when working with Power BI is the Data Analysis Expressions (DAX) language.
DAX requires some adjustment for a PySpark developer. For example:
Example 1. EARLIER and CALCULATE with FILTER
DAX Code:
Sales This Year =
CALCULATE(
    SUM(Sales[Amount]),
    FILTER(Sales, YEAR(Sales[Date]) = YEAR(TODAY()))
)
Excel Code:
Sales This Year = SUMIFS(Sales[Amount], Sales[Date], ">=" & DATE(YEAR(TODAY()), 1, 1), Sales[Date], "<=" & TODAY())
- EARLIER function: EARLIER takes a column reference and is only meaningful inside a nested row context (typically in a calculated column), where it retrieves the column's value from the outer row; an expression such as EARLIER(Year(TODAY())) is not valid DAX. A typical legitimate use is a running total, e.g. FILTER(Sales, Sales[Date] <= EARLIER(Sales[Date])). For a "this year" measure, comparing YEAR(Sales[Date]) = YEAR(TODAY()) directly is sufficient, because TODAY() does not depend on row context. PySpark has no equivalent concept; a window function or self-join plays the same role.
- CALCULATE with FILTER: This syntax can feel counter-intuitive to PySpark users coming from a dataframe-manipulation background. CALCULATE modifies the filter context, and FILTER returns a virtual table over which the sum of sales is evaluated. In PySpark, you would achieve the same result with a filter transformation followed by an aggregation.
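The filter-then-aggregate pattern that CALCULATE with FILTER fuses into one expression can be sketched in plain Python (standing in for the analogous PySpark `filter` + aggregation; the toy `sales` table is an invented example):

```python
from datetime import date

# Toy stand-in for the Sales table: one dict per row.
sales = [
    {"date": date(2023, 3, 1), "amount": 100.0},
    {"date": date(2024, 2, 15), "amount": 250.0},
    {"date": date(2024, 7, 4), "amount": 50.0},
]

def sales_this_year(rows, today=None):
    """Filter to the current year, then aggregate -- the two explicit
    steps that CALCULATE(SUM(...), FILTER(...)) performs in one call."""
    today = today or date.today()
    return sum(r["amount"] for r in rows if r["date"].year == today.year)

print(sales_this_year(sales, today=date(2024, 12, 31)))  # 300.0
```

In PySpark the same shape is `df.filter(F.year("date") == this_year).agg(F.sum("amount"))`; the point is that the filter and the aggregation are separate, explicit steps rather than one nested function call.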
Example 2. VAR function and SELECTEDVALUE function
DAX Code:
Selected Product =
VAR CurrentProduct = SELECTEDVALUE(Products[Name])
RETURN IF(ISBLANK(CurrentProduct), "All Products", CurrentProduct)
Excel Code:
Selected Product = IF(ISBLANK(A1), "All Products", A1)
- VAR keyword: DAX allows variable declaration using VAR ... RETURN. This code stores the selected product name from the Products table's Name column in a variable named CurrentProduct and reuses it in the RETURN expression. In PySpark, an intermediate dataframe, a local variable, or a variable inside a User Defined Function (UDF) plays a similar role.
- SELECTEDVALUE function: SELECTEDVALUE returns the value of a column when exactly one value remains in the current filter context (for example, a single slicer selection) and BLANK() otherwise. This behavior is tied to slicer selections within the visualization layer; PySpark users would need additional logic to model such selections in their Spark code.
Example 3. SAMEPERIODLASTYEAR function and RELATED function
DAX Code:
Previous Year Sales =
CALCULATE(
    SUM(Sales[Amount]),
    SAMEPERIODLASTYEAR(Sales[Date])
)
Excel Code:
Previous Year Sales = SUMIFS(Sales[Amount], Sales[Date], DATE(YEAR([@Date])-1, MONTH([@Date]), DAY([@Date])))
- SAMEPERIODLASTYEAR function: DAX offers time-intelligence functions like SAMEPERIODLASTYEAR, which returns a table of dates shifted back one year from the dates in the current filter context; passed to CALCULATE as a filter, it re-evaluates the measure over the corresponding period of the previous year. PySpark users might achieve a similar result by manipulating the date column but would not have a direct function for this specific scenario.
- RELATED function: DAX also navigates relationships between tables. RELATED pulls a value from the "one" side of an active relationship into the current row context, for example RELATED(Products[Category]) in a calculated column on Sales. PySpark users typically express relationships as explicit dataframe joins and would not encounter this model-driven, context-specific behavior.