A. How can we optimize inventory levels to minimize costs while ensuring product availability?

Overview

This part of the project aims to enhance inventory management and operational efficiency for Shopee sellers by accurately forecasting product demand. In the fast-moving and competitive e-commerce industry, understanding future demand patterns is critical for optimizing stock levels, minimizing stockouts and overstock, and ultimately improving customer satisfaction and profitability.

In prior analysis, subgroup A explored how factors like discounts, pricing, and product ratings influence customer purchasing behavior. This exploration provided valuable insights into how promotions and perceived product value impact demand, revealing patterns that could guide pricing and discount strategies. Building on these findings, this project focuses on analyzing seasonality trends and leveraging them to accurately predict demand, enabling more precise and cost-effective inventory management.

To achieve these goals, the project will focus on developing robust demand forecasting models that consider seasonality, trends, and external factors affecting demand across various product categories. By leveraging historical sales data, customer purchasing patterns, and promotional influences, the project seeks to deliver actionable insights that help the business make data-driven inventory and supply chain decisions.

The technical implementation details, including code for each step of the analysis, can be found in the Inventory Optimisation Notebook in the GitHub repository.

Part 1. Exploring distribution of daily sales data

1. Correlation Matrix of Aggregated Daily Features

The heatmap shows the correlation coefficients between various daily features related to sales, discounts, and campaigns:

Strong Correlation with Discounts and Campaigns: Variables like avg_additional_discount, products_discounted_percentage, campaign_percentage, and avg_discount_percentage show high positive correlations with each other (values around 0.93 to 1.00). This indicates that higher discounts and campaign presence tend to co-occur, likely as part of sales strategies.
Daily Sales Correlation: daily_sales shows moderate positive correlations with discount-related features, particularly avg_additional_discount (0.73) and campaign_percentage (0.73), suggesting that higher discounts or campaigns are associated with increased daily sales. However, avg_checkout_price has a low to negative correlation with other discount and sales features, hinting that higher prices may not align with high discount campaigns.

This matrix suggests that campaigns and discount-driven strategies significantly influence daily sales, while higher checkout prices might play a less significant or even inversely related role.

2. Daily Sales and Growth by Day of the Week

This line graph depicts total daily sales and their growth rates across different days of the week:

Peak on Fridays: There’s a substantial increase in sales on Fridays, with a growth rate of 58.59%, indicating a possible preference for end-of-week shopping or promotional events.
Weekend and Weekday Trends: Sales remain relatively high on Sundays, suggesting that both Fridays and Sundays are optimal for customer engagement. Midweek days (Monday to Wednesday) show lower sales, with minor fluctuations.
Saturday Dip: Sales dip slightly on Saturday, possibly due to shopping patterns shifting towards Sundays.

Retailers might benefit from targeting campaigns or higher discounts towards the end of the week to leverage these peak sales periods.

3. Daily Sales and Growth by Week of the Month

This graph illustrates sales patterns across the weeks of each month:

End-of-Month Spike: The highest growth in sales is seen during the fourth week of each month (38.14% growth). This spike could align with payday effects or end-of-month promotional campaigns, as many consumers may have more disposable income at this time.
Mid-Month Decline: Weeks 2 and 3 show a decline in sales, potentially indicating lower discretionary spending mid-month.

The observed spike in the fourth week suggests that focusing campaigns around this period could align well with consumer spending patterns, particularly if it coincides with common paydays.

4. Daily Sales and Growth by Month

This plot shows sales trends and growth rates over the months:

Seasonal Trends: Sales peak in March (12.99% growth) and again in August (16.32%), possibly due to seasonality or significant promotional events. Retailers could consider timing campaigns around these periods.
June and July Fluctuations: June sees modest growth, followed by a sharp decline in July (-23.19%). The high growth in June may be linked to mid-year events, while the July dip could reflect seasonal trends.
Consistent Sales in Late Months: October and December show modest growth rates, while November sees a slight decline (-11.0%), suggesting stable yet moderate holiday season sales.

This monthly analysis suggests potential for increased promotional activities in March and August, with considerations for maintaining engagement during late-year months to counter the November dip.

newplot newplot1 newplot2 newplot3

Part 2. Choosing a sub category to explore

This code identifies the most common sub-categories in merged_df and extracts the top three sub-categories for further analysis.

Count Sub-Categories:
- merged_df['sub_category'].value_counts() calculates the frequency of each unique value in the sub_category column, returning a sorted series, top_categories, where each sub-category is listed along with its count.
- Output Example: The printed output shows Others as the most common sub-category, followed by other specific sub-categories like Android Tablets and Occupational Attire.
Selecting Top Three Sub-Categories:
- top_three = top_categories.index[1:4] selects the top three most common sub-categories, excluding the first one (Others) which serves as a catch-all category.
- top_three is an index object containing the names of the three most common sub-categories: ['Android Tablets', 'Occupational Attire', 'Business & Casual Shirts'].

This approach is useful for identifying high-frequency sub-categories that can be targeted for specialized analysis or reporting.

Part 3. Exploring seasonal decomposition of chosen sub category

Observed

In both the "Occupational Attire" and the "Entire Dataset" decompositions, the Observed component shows fluctuations over time, reflecting the actual daily sales values. For the "Occupational Attire," sales show frequent smaller peaks, with occasional sharp increases, indicating that this category might experience consistent but moderate demand spikes. In the "Entire Dataset," the peaks are much larger, suggesting higher sales volumes across all categories, with occasional significant spikes. This aligns with the expectation that a broader dataset captures more variability and volume in sales.

Trend

The Trend component for "Occupational Attire" shows a gradual increase followed by a more stable or slightly declining pattern over time, suggesting a rise in interest or demand earlier in the year, which stabilizes or slightly decreases later. For the "Entire Dataset," the trend exhibits notable jumps and drops, likely reflecting shifts in demand across multiple categories, possibly due to macro factors, seasonal sales events, or campaign periods.

The contrast between the two trends indicates that while "Occupational Attire" may have a stable, specialized market, the overall demand in the entire dataset is more affected by external factors, which lead to larger, more abrupt changes.

Seasonal

The Seasonal component in both decompositions shows clear periodicity, repeating regularly over a weekly cycle, as expected. This pattern highlights that both "Occupational Attire" and the broader dataset experience weekly demand cycles. This weekly seasonality might correlate with consumer behavior patterns, where demand could be higher during weekends or specific days due to factors like leisure shopping, paydays, or special events.

Residual

The Residual component captures the irregularities and noise in the data. For "Occupational Attire," the residuals are relatively small but show sporadic spikes, indicating occasional deviations from expected values, likely driven by unexpected increases in demand. In the "Entire Dataset," the residuals are more pronounced, with noticeable spikes, especially around May and September, likely corresponding to outlier events or large sales campaigns affecting various categories simultaneously.

Summary

Observed Patterns: "Occupational Attire" shows consistent demand with occasional spikes, while the entire dataset has larger and more frequent peaks.
Trend: Occupational Attire has a stable trend, while the entire dataset shows marked shifts, potentially reflecting broader external influences.
Seasonality: Both decompositions show a weekly seasonality, aligning with predictable consumer behavior patterns.
Residuals: Both have residual spikes, but they are more significant in the entire dataset, hinting at larger external impacts on the overall sales data.

This decomposition helps in understanding the distinct demand behavior for "Occupational Attire" versus the entire dataset, providing insights for tailoring inventory and marketing strategies accordingly.

newplot4 newplot5

ADF Test Results Summary

The Augmented Dickey-Fuller (ADF) test results for Daily Sales in Occupational Attire and Daily Sales in the Entire Dataset both indicate that the time series data is stationary.

ADF Test for Daily Sales in Occupational Attire

ADF Statistic: -12.11
p-value: 0.00
Critical Values:
- 1%: -3.449
- 5%: -2.870
- 10%: -2.571
Conclusion: Since the ADF statistic (-12.11) is lower than all critical values and the p-value (0.00) is below the 0.05 threshold, we conclude that the series is stationary. This suggests that daily sales in the Occupational Attire sub-category do not have a unit root, making it suitable for time series modeling without further differencing.

ADF Test for Daily Sales in the Entire Dataset

ADF Statistic: -4.29
p-value: 0.00
Critical Values:
- 1%: -3.449
- 5%: -2.870
- 10%: -2.571
Conclusion: Similarly, the entire dataset’s ADF statistic (-4.29) is lower than all critical values, and the p-value (0.00) is also below 0.05, indicating that the series is stationary. This means the overall sales data does not exhibit a unit root, supporting its stability for time series analysis.

Summary

Both datasets are stationary, implying they are appropriate for time series forecasting without needing additional transformations like differencing. The stationarity allows for reliable analysis of seasonal trends and patterns within each dataset.

Part 4. General demand forecasting using Prophet

This chart compares Prophet's forecasted daily sales for August with the actual observed sales. Key performance metrics are displayed: Root Mean Squared Error (RMSE) of 66.14 and an R-squared value of 0.78. These metrics provide insight into the accuracy and reliability of the model's predictions.

Key Observations

Accuracy and Fit:
- The R-squared value of 0.78 indicates a reasonably strong fit, with the model explaining 78% of the variance in the actual data. This suggests that the model is effective but has room for improvement, particularly in capturing certain sales peaks.
- An RMSE of 66.14 points to the average error between the forecasted and actual values. Lowering this error could help improve the model's precision, especially during high-variance periods.
Prediction vs. Actual Peaks:
- Prophet captures the general sales trend but underestimates some peaks, particularly around August 8-10 and August 13-14. This could be due to unexpected demand spikes that the model was unable to anticipate. Possible factors include unmodeled external events or promotional activities.
- The confidence interval (shaded region) provides a margin of error for the forecast, and actual data points often fall within this range, except during peak discrepancies. The widening of the interval during these peak periods suggests increased uncertainty.
Stability and Consistency:
- From August 18 onward, the model shows a relatively stable prediction pattern, closely matching the actual sales, except for minor deviations.
- This consistency highlights that Prophet performs well during regular sales periods but struggles to capture sudden changes, especially when demand fluctuates significantly.
Areas for Improvement:
- Incorporate additional features: Including external variables like promotional events or seasonal sales dates might improve Prophet's ability to anticipate peaks more accurately.
- Adjusting seasonal components: Fine-tuning Prophet's seasonal parameters could help the model align better with specific demand patterns that are periodic but less consistent.

Summary

The Prophet model provides a reasonable forecast for daily sales in August, with an R-squared of 0.78 showing good fit and an RMSE of 66.14 indicating a moderate prediction error. While the model captures the general trend well, it could be further refined to better handle demand spikes. Incorporating additional external variables or fine-tuning seasonal parameters may enhance its predictive capability.

newplot6

Part 5. General demand forecast seasonal decomposition

Analysis of Seasonal Decomposition Comparison: Actual vs Prophet Predicted Components

This chart compares the seasonal decomposition of the actual and Prophet-predicted components (Trend, Seasonal, and Residual) for the daily sales data. The performance metrics — RMSE and R-squared — are provided for each component, indicating the accuracy of the Prophet model in capturing each component's behavior.

Key Observations

Trend Component:
- RMSE: 39.65, R²: 0.60
- The Prophet model captures the general trend of the data, but there are noticeable deviations, particularly when the actual trend flattens out. The R² of 0.60 suggests a moderate fit, with the model explaining 60% of the trend variation in the actual data.
- Improvement Potential: The model could be further improved in its trend forecasting, particularly during periods of rapid change or plateaus in demand.
Seasonal Component:
- RMSE: 39.07, R²: 0.83
- The seasonal component shows a strong fit, with an R² of 0.83, indicating that the Prophet model captures the recurring seasonal patterns fairly well. The close alignment between the actual and predicted lines demonstrates the model’s strength in identifying periodic behavior.
- Accuracy: The low RMSE of 39.07 highlights that the model closely matches the seasonal variation, although there are minor discrepancies, especially during peak seasonal fluctuations.
Residual Component:
- RMSE: 46.63, R²: 0.63
- The residual component has an R² of 0.63, indicating that Prophet's predictions explain 63% of the residual variation in the actual data. While the residuals do not perfectly align, the model captures some of the underlying noise or irregularities.
- Challenges: The higher RMSE (46.63) here suggests that Prophet has more difficulty modeling the unpredictable fluctuations in daily sales, as residuals capture any variance not explained by the trend or seasonality.

Summary

The Prophet model performs well on the seasonal component, capturing most of the cyclical behavior, as indicated by a high R² of 0.83. However, the trend and residual components show moderate alignment, with some discrepancies. This analysis suggests that while Prophet is effective in modeling regular, predictable patterns, it may require further adjustments or additional external factors to better capture trend shifts and unpredictable sales fluctuations.

Part 6. Sub category demand forecast using Prophet

newplot8

Prophet August Forecast vs. Actual for Occupational Attire

Plot Overview: The chart compares the Prophet model's forecasted daily sales for August (in red dashed lines) with actual sales data (in blue solid lines) for the "Occupational Attire" category. The shaded area around the Prophet forecast line represents the model's confidence interval.
RMSE & R-squared:
- The RMSE (Root Mean Square Error) is low at 1.64, indicating a relatively small average prediction error for daily sales.
- The R-squared value is 0.34, suggesting a moderate level of alignment between the forecasted and actual values. This value reflects some variability in actual sales that the Prophet model couldn’t fully capture.
Analysis of Model Performance:
- The actual sales display variability with several peaks and dips throughout the month, which the Prophet forecast line follows to some extent but does not capture all fluctuations accurately.
- There is a visible deviation between the forecasted and actual values during certain peaks and troughs, indicating that the model could benefit from tuning or additional variables to capture such sharp changes in demand.
Insights:
- The lower R-squared score suggests that while the model's general trend may be accurate, the specifics of day-to-day sales variations might require additional predictors (e.g., specific seasonal events, promotions) to improve accuracy.
- The shaded confidence interval around the forecast line reflects the model’s uncertainty in predictions. For many dates, actual values fall within this interval, indicating a reasonable level of confidence, albeit with room for improvement in precision.

Seasonal Decomposition Components

This breakdown from Prophet shows how different seasonal patterns affect the "Occupational Attire" sales.

Trend Component:
- The trend displays a consistent downward slope, indicating a gradual decline in daily sales throughout August. This might suggest reduced interest in this category or seasonal changes affecting sales.
Weekly Component:
- Sales are lowest on Mondays and start increasing mid-week, peaking towards Friday and Saturday. This pattern suggests that customers are more inclined to purchase closer to the weekend, possibly due to payday schedules or personal shopping habits.
Yearly Component:
- There are noticeable peaks around major shopping events, showing seasonality that could align with broader shopping behaviors across the year.
Daily Component:
- The daily pattern shows low activity during early hours, with an increase closer to midnight, which may indicate late-night shopping behavior.
Extra Regressors Component:
- This component represents external factors that may have been added to the model to account for spikes or unusual patterns in data. Peaks in the extra regressors component could reflect one-time events or promotions.

These plots suggest that sales are affected by both daily and weekly cycles, with external events also contributing to occasional spikes.

Part 7. Sub category demand forecast seasonal decomposition

Seasonal Decomposition Comparison for Occupational Attire in August

This plot displays a decomposition of the daily sales for "Occupational Attire" in August, comparing the Prophet model’s predicted components (dashed red lines) with the actual components (solid blue lines). The decomposition is divided into three parts: trend, seasonality, and residuals.

Trend Component

RMSE: 0.79, R²: 0.33
- The trend component shows a gradual increase and then a slight decrease towards the end of August.
- The Prophet model’s predicted trend closely follows the actual trend initially but diverges slightly as the month progresses, with the actual trend showing more variability than the forecast.
- The relatively low R² value (0.33) indicates that the Prophet model struggled to capture the nuanced trend shifts in the actual data.

Seasonal Component

RMSE: 0.89, R²: 0.29
- The seasonal component displays consistent oscillations, indicating a repeating daily or weekly pattern in sales.
- The Prophet model’s seasonal predictions align with the overall shape of the actual seasonal component but with some misalignment in amplitude and timing.
- The low R² value (0.29) suggests that while the model captured the general seasonal pattern, it failed to fully match the actual amplitude and frequency.

Residuals Component

RMSE: 1.17, R²: 0.39
- The residuals show random fluctuations around zero, representing the noise or unexplained variance in the model.
- The Prophet model’s residuals show a similar pattern but don’t fully capture the peaks and troughs of the actual residuals.
- The R² value of 0.39 for the residual component implies that the model could not accurately capture a significant portion of the unexplained variance.

Overall Analysis

The Prophet model’s forecast aligns moderately well with the actual data, particularly in terms of general patterns, but lacks precision in capturing the full extent of the actual trend, seasonal variations, and residual fluctuations.
The relatively low R² values across all components indicate that the model has room for improvement in accurately modeling the complexity of sales in the Occupational Attire category, particularly in seasonal and residual variations.
Recommendation: Incorporating additional features (e.g., external events or promotions) or further tuning the Prophet model might improve its predictive performance, especially for capturing seasonal and residual variations in the data.

Part 8. Forecasting costs using demand predictions

Forecasted Cost for Occupational Attire in August

This plot represents the forecasted cost associated with the "Occupational Attire" category over August, focusing on the average cost fluctuations throughout the month.

Analysis

Average Cost (Dashed Line): The dashed blue line illustrates the predicted average cost of maintaining inventory levels or meeting demand on a daily basis. This cost fluctuates throughout August, with higher costs observed around the middle of the month.
Shaded Area (Uncertainty): The shaded region around the dashed line represents the forecasted uncertainty in the cost predictions. The range is broader during periods with higher average costs, suggesting greater variability or uncertainty in maintaining the inventory or demand at those times.
Cost Peaks and Valleys: Notably, there are cost peaks observed around August 10 and August 25. These spikes could be indicative of anticipated high demand periods, leading to increased cost forecasts, possibly due to restocking or increased holding costs.

Overall Insights

The cost forecast for "Occupational Attire" shows notable fluctuations throughout August, with distinct peaks likely due to periodic high demand or restocking needs.
The presence of a wider uncertainty band during cost peaks suggests variability in forecasted costs, which could warrant closer management during these periods to optimize expenses.
Recommendation: Monitoring inventory closely and adjusting restock or holding strategies during anticipated demand surges (e.g., around mid-August) could help in better controlling costs and reducing unnecessary expenditures.

newplot10

Part 9. Forecasting safety stock using demand predictions

Analysis of EOQ, Prophet Safety Stock, and Actual Demand for Occupational Attire in August

Graph Overview:

EOQ Theoretical Optimal Inventory Level: Represented by the purple dotted line at around 80 units, this line indicates the optimal order quantity to minimize costs associated with ordering and holding inventory.
Prophet Safety Stock: Shown by the blue dashed line, it fluctuates based on forecasted demand to provide a buffer against demand variability.
Prophet Demand Prediction: The red dashed line indicates the predicted daily demand levels according to Prophet.
Actual Demand: Displayed in the orange solid line, this line shows the real demand for each day in August.

Key Observations:

EOQ Level:
- The EOQ level of ~80 units is significantly higher than both the forecasted and actual daily demand. This high level suggests that for the Occupational Attire category, EOQ might not align well with the actual demand trends, as it could result in excess stock if followed rigidly.
Prophet Safety Stock:
- The safety stock level, which is adaptive to forecasted demand, ranges between 5 to 15 units across the month. This safety stock fluctuates in response to variations in forecasted demand, providing a buffer that adjusts to predicted demand increases or decreases.
- Compared to EOQ, Prophet’s safety stock approach appears more tailored, as it remains within a range that better aligns with actual demand fluctuations.
Prophet Demand Prediction vs. Actual Demand:
- The red dashed line (forecasted demand) closely follows the pattern of actual demand (orange line), though some discrepancies are evident, particularly around peak and low demand days.
- Despite minor inaccuracies, Prophet’s forecast generally captures the overall trend and variation of demand throughout August, showing reasonable alignment with actual demand.
Demand Peaks and Safety Stock:
- Prophet’s safety stock adapts well around demand peaks (e.g., Aug 10-12), helping to provide additional inventory buffer during anticipated demand surges.
- On days with lower demand (e.g., mid-August), the safety stock decreases, avoiding unnecessary excess stockholding costs.

Insights:

EOQ vs. Prophet Safety Stock:
- For products with highly variable daily demand like Occupational Attire, Prophet’s dynamic safety stock may offer a more effective and cost-efficient inventory strategy compared to the static EOQ approach.
- EOQ might lead to overstocking in this context, as its level is set without considering day-to-day demand fluctuations.
Forecasting Reliability:
- Prophet’s forecast shows a moderate level of accuracy for this category, making it useful for setting adaptive safety stock levels. However, improvements could be made to reduce the forecast error further, particularly for peak demand days.

Conclusion:

In summary, while the EOQ provides a static optimal order level, Prophet’s adaptive safety stock offers a more responsive approach that aligns closely with demand fluctuations. For categories like Occupational Attire, where demand varies significantly, relying on Prophet’s forecasted safety stock may help achieve a balance between meeting demand and minimizing holding costs.

newplot11

Part 10. Exploring cumulative costs vs revenue

Analysis of Cumulative Revenue and Costs for Occupational Attire in August

This graph provides an overview of the cumulative revenue and cumulative costs associated with maintaining inventory for the Occupational Attire category in August. The chart includes the following elements:

Actual Cumulative Revenue (green line): This line represents the revenue generated from actual sales over the month. It remains slightly above the Forecasted Cumulative Revenue (blue line) in the latter half of August, suggesting that actual demand was slightly higher than the forecast predicted.
Forecasted Cumulative Revenue (blue line): This line reflects the expected revenue based on the Prophet model’s demand forecast. It closely follows the trend of the actual cumulative revenue but slightly underestimates demand, particularly towards the end of the month. This underestimation suggests that initial forecasts did not fully capture the level of demand experienced in August.
Safety Stock Cumulative Cost (orange dotted line): This cost represents the cumulative expense of holding safety stock based on the Prophet model’s upper-bound forecast. The safety stock cost steadily rises throughout August, remaining below the cumulative revenue lines, which indicates that maintaining a safety stock incurs significant but manageable costs that are justified by the revenue generated.
Median Stock Cost (brown dotted line): The cumulative cost of maintaining a constant median stock level is lower than the safety stock cost. This suggests that using a median stock level approach could be a more economical alternative to safety stock, potentially providing sufficient inventory coverage without the additional expense of dynamic safety stock adjustments.

Key Insights

Revenue vs. Forecast: Actual cumulative revenue slightly surpasses forecasted cumulative revenue, hinting that the demand predictions slightly underestimated the actual sales, especially towards the end of August.
Cost Efficiency of Safety Stock: The safety stock model, while incurring higher costs than the median stock approach, maintains a balance between meeting demand and managing costs, as the cumulative safety stock cost is below the revenue lines.
Median Stock Cost Advantage: The median stock cost remains consistently lower than the safety stock cost, suggesting it may be a more cost-effective inventory strategy if demand variability is manageable.

Conclusion

For the Occupational Attire category in August, the Optimal Dynamic Inventory Strategy provides the most effective approach, balancing demand coverage with cost efficiency. This strategy adjusts inventory levels in real-time based on forecasted demand and lead time, ensuring that stock remains sufficient without excessive holding costs. Unlike a fixed median stock approach, which could leave the business vulnerable during demand spikes, the Optimal Dynamic Strategy is responsive to variability, making it particularly well-suited for fluctuating demand patterns. This analysis supports the Optimal Dynamic Strategy as the best choice for maintaining service levels while managing costs, especially in environments with high or unpredictable demand variability.

newplot12

Part 11. Exploring optimal inventory management strategies

Analysis and Insights on Inventory Strategies for Occupational Attire in August

The three graphs illustrate the inventory levels, reorder points, and safety stock levels for three different inventory strategies (Optimal Dynamic, EOQ, and Current) applied to the Occupational Attire category in August. Below is an analysis of each strategy and key insights derived from the visualizations.

1. Optimal Dynamic Inventory Strategy

Inventory Level (Orange Dashed Line): The inventory adjusts dynamically based on forecasted demand, with smaller, frequent reorders throughout the month.
Reorder Points (Orange Stars): Reorders are placed in response to projected demand, with reorder amounts varying over time, demonstrating a responsive approach.
Safety Stock Level (Red Dotted Line): Inventory levels consistently stay above the safety stock level, ensuring adequate stock without excessive overstocking.
Insight: The Optimal Dynamic Strategy provides a flexible and cost-effective approach, allowing inventory levels to align closely with demand while minimizing excess. This adaptability is particularly beneficial for managing variable demand, supporting this strategy as the most responsive and efficient among the three.

2. EOQ (Economic Order Quantity) Strategy

Inventory Level (Magenta Dashed Line): The EOQ strategy results in larger, less frequent reorders, leading to higher stock levels initially and a gradual reduction over time.
Reorder Points (Magenta Stars): Fixed reorder quantities (+77 units) are placed periodically, leading to substantial inventory increases followed by a steady depletion.
Safety Stock Level (Red Dotted Line): Inventory levels exceed the safety stock level by a significant margin, indicating potential overstock and increased holding costs.
Insight: While the EOQ strategy ensures demand coverage, it results in higher holding costs due to the larger reorder quantities. This strategy may be less efficient for categories with variable demand, as it maintains a buffer well above safety stock requirements, potentially increasing costs and tying up resources.

3. Current Inventory Strategy

Inventory Level (Brown Dotted Line): Inventory levels are maintained through periodic large reorders, resulting in sharp increases at each reorder point.
Reorder Points (Brown Stars): Fixed reorder amounts (+68 units) are placed periodically, leading to significant jumps in stock that gradually decrease.
Safety Stock Level (Red Dotted Line): The current strategy maintains inventory well above the safety stock level, similar to EOQ, but with slightly less excess.
Insight: The Current Inventory Strategy provides a stable but inflexible approach to inventory management. It ensures sufficient stock but may result in overstock during periods of lower demand, making it less cost-effective compared to the Optimal Dynamic Strategy.

Conclusion

The Optimal Dynamic Inventory Strategy emerges as the most effective option for managing inventory in the Occupational Attire category for August. It adjusts to forecasted demand, maintains inventory close to actual requirements, and minimizes excess stock, thereby reducing holding costs. By contrast, the EOQ and Current Inventory strategies lead to higher inventory levels and potential overstock, making them less suitable for handling variable demand patterns. The analysis supports the Optimal Dynamic Strategy as the best choice for managing costs while ensuring demand coverage.

newplot13 newplot14 newplot15

Part 12. Exploring profit maximising strategy

Analysis of Cumulative Cost and Revenue Comparison for Inventory Management Strategies

The graph titled "Cumulative Cost and Revenue Comparison for Inventory Management Strategies" illustrates the cumulative costs associated with three inventory strategies—Optimal Dynamic, EOQ, and Current—compared with the Actual Cumulative Revenue over August. Below is an analysis of each strategy’s performance and key insights.

Analysis

Optimal Order Cumulative Cost (Orange Dashed Line):
- The Optimal Dynamic Strategy achieves the lowest cumulative cost by the end of August. This approach effectively aligns inventory levels with demand, minimizing unnecessary costs.
- The final profit for this strategy is $648.41, significantly higher than that of the other strategies, making it the most profitable and efficient option.
EOQ Order Cumulative Cost (Purple Dashed Line):
- The EOQ Strategy incurs a higher cumulative cost due to its large, infrequent reorders, resulting in excess stock and increased holding costs.
- This strategy ends with a negative profit of -$327.70, indicating that the high cumulative costs outweigh the revenue generated, demonstrating its inefficiency for this demand pattern.
Current Order Cumulative Cost (Brown Dashed Line):
- The Current Strategy has a cumulative cost slightly lower than EOQ but still relatively high, as it maintains stock levels above actual demand requirements.
- The final profit for this strategy is - $100.28, better than EOQ but still negative, showing that this approach also fails to efficiently balance costs with revenue.
Actual Cumulative Revenue (Green Line):
- The Actual Cumulative Revenue (green line) represents revenue generated from actual sales over August, showing a steady increase.
- The revenue line remains above the cumulative cost for the Optimal Dynamic strategy, resulting in a positive profit, while the EOQ and Current strategies' cumulative costs surpass the revenue line, leading to negative profits.

Insights

Optimal Dynamic Strategy: This strategy results in the highest profit by effectively managing costs and adjusting inventory levels to match demand without significant overstock. The Optimal Dynamic Strategy is the best approach in this scenario, as it balances demand coverage with minimal excess costs.
EOQ and Current Strategies: Both the EOQ and Current strategies result in negative profits due to higher cumulative costs that exceed the revenue generated. These strategies are less effective for handling variable demand patterns, as they lead to overstock and elevated holding costs.

Conclusion

The Optimal Dynamic Inventory Strategy is the most profitable and cost-effective choice, providing a balanced approach to inventory management. By keeping costs below cumulative revenue, this strategy maximizes profit while maintaining sufficient inventory to meet demand. In contrast, the EOQ and Current strategies prove inefficient in this context, highlighting the value of adaptive inventory strategies for managing variable demand.

newplot16

Part 13. Predicting future demand

Analysis of Prophet Forecast and Inventory Management for January 2020

The following analysis summarizes the results of forecasting demand and managing inventory for January 2020. The analysis is based on three visualizations that provide insights into daily sales forecasts, optimal inventory levels with reorder points, and cumulative cost versus revenue.

1. Prophet Demand Forecast and Safety Stock

Prophet Forecast (Red Dashed Line): The forecasted daily sales for January 2020 show consistent demand fluctuations, with daily sales ranging between 1 to 3 units, as indicated by the red dashed line.
Confidence Interval (Red Shaded Area): The shaded region around the forecast represents the confidence interval, capturing the range of uncertainty in the forecast. It fluctuates moderately, suggesting some variability in demand.
Prophet Safety Stock (Blue Dashed Line): The safety stock level, calculated based on forecasted demand peaks, hovers around 10-12 units. This level provides a buffer to meet potential demand surges.

Insight: The safety stock level remains sufficiently above the forecasted demand, ensuring adequate coverage for potential fluctuations without risking stockouts. This setup provides a reliable demand forecast to guide inventory planning.

2. Optimal Inventory Levels and Reorder Points

Optimal Inventory Level (Orange Dashed Line): The inventory level fluctuates based on reorder points, adjusting dynamically to meet forecasted demand. It shows periodic replenishments throughout January, as the level dips and then spikes upon restocking.
Reorder Points (Orange Stars): The stars on the plot mark reorder points, with labels indicating the quantity added to inventory. Reorders vary from +9 to +13 units, demonstrating responsiveness to anticipated demand.
Safety Stock Level (Red Dotted Line): The safety stock level serves as a minimum threshold, keeping the inventory level above forecasted demand requirements.

Insight: The Optimal Inventory Strategy efficiently manages stock by reordering only when inventory nears the safety stock level, minimizing excess stock while ensuring availability. This approach aligns closely with demand, reducing holding costs and preventing overstock.

3. Cumulative Cost and Revenue Comparison

Optimal Order Cumulative Cost (Orange Dashed Line): The cumulative cost of inventory under the optimal strategy steadily increases over January, reflecting the costs of maintaining and replenishing stock.
Actual Cumulative Revenue (Green Line): The actual cumulative revenue grows more quickly than the cumulative cost, resulting in a positive final profit.
Optimal Profit Annotation: By the end of January, the optimal strategy yields a profit of $604.82, as shown by the annotation. This profit indicates effective cost control and alignment of inventory levels with demand.

Insight: The optimal strategy successfully balances costs and revenue, demonstrating profitability through careful inventory management. By keeping cumulative costs below cumulative revenue, this approach maximizes profit while meeting demand requirements.

Conclusion

The Optimal Dynamic Inventory Strategy, guided by the Prophet forecast and safety stock considerations, proves to be effective in January 2020. It balances demand coverage and cost efficiency by:

Maintaining a Reliable Safety Stock Level: Sufficient to meet demand fluctuations.
Optimizing Reorder Points: Reordering only as needed, based on forecasted demand and safety stock thresholds.
Maximizing Profit: Achieving a final profit of $604.82 through cost-effective inventory management.

This approach demonstrates the value of integrating demand forecasting and inventory optimization for improved profitability and resource management.

newplot17 newplot18 newplot19

Part 14. Final Conclusion

This analysis showcases a holistic approach to inventory optimization that integrates advanced demand forecasting, dynamic inventory management, and seasonality trends. By leveraging these methods, we effectively answered the initial question: How can we optimize inventory levels to minimize costs while ensuring product availability?

Key Results and Insights

Accurate Demand Forecasting with Seasonality Consideration:
- The custom forecast_next_n_months function, built on Prophet, provided demand forecasts that account for seasonal fluctuations and major promotional events. This enabled us to anticipate high-demand periods and adjust stock levels accordingly, reducing the risk of stockouts during peak times.
- By incorporating seasonality trends directly into the forecasting model, we captured the cyclical nature of demand. This foresight allowed for better-aligned inventory planning, ensuring readiness for both typical demand and surges driven by seasonal sales and holidays.
Optimal Dynamic Inventory Strategy:
- The Optimal Dynamic Inventory Strategy proved to be the most efficient approach, responding adaptively to forecasted demand with minimal excess. This approach significantly reduced holding costs while maintaining inventory levels that met demand.
- The strategy's profitability surpassed other methods, as shown by the cumulative profit analysis, by aligning inventory costs to demand with high precision.
Incorporating the Influence of Discounts, Price, and Ratings:
- Previous subgroup analyses highlighted how discounts, product pricing, and ratings influence purchasing behavior. By factoring these insights into the demand forecast, this approach ensured that inventory levels were responsive not only to seasonality but also to promotional activities and perceived product value.
- For example, during discounted sales events or high-rating product launches, inventory was adjusted to meet anticipated demand, thereby avoiding lost sales opportunities and reducing the likelihood of overstocking on lower-demand items.
Efficient Reordering and Safety Stock Management:
- The strategy used a dynamic reorder system informed by forecasted demand, adjusting order quantities based on real-time trends. This minimized costs while ensuring availability, especially during expected demand surges tied to promotions, discounts, and high-rated product periods.
- Safety stock levels were carefully calibrated to provide a buffer against unanticipated demand increases, enhancing customer satisfaction and reliability.

Solution to the Problem

This approach effectively optimized inventory by balancing cost-efficiency and demand responsiveness, informed by seasonality and key purchase drivers:

Minimized Holding and Ordering Costs: Through dynamic reordering aligned with demand patterns, the strategy reduced unnecessary holding costs associated with static inventory methods.
Ensured Product Availability: By forecasting demand based on seasonality and promotional impact, the approach minimized stockouts, meeting demand consistently, especially during high-traffic sales periods.
Maximized Profitability: By adjusting stock in response to forecasted demand and sales drivers, the strategy achieved the highest profit margin, underscoring the importance of data-driven, adaptable inventory management.

Conclusion

This project demonstrates the power of an integrated, data-informed approach to inventory management. By combining seasonality-based demand forecasting with insights into the influence of discounts, pricing, and ratings, the strategy aligns inventory closely with demand patterns. This method not only enhances profitability but also ensures operational efficiency and responsiveness. The approach provides a scalable solution for managing inventory across diverse demand cycles and market conditions, illustrating the benefits of adaptive, demand-responsive inventory practices.

B. What pricing strategies can we implement to maximize revenue?

Introduction to Shopee's Current Pricing Conditions

Shopee operates in a highly competitive e-commerce landscape characterized by diverse product categories and a customer-to-customer (C2C) marketplace model. This structure allows individual sellers to compete on the same platform, often offering similar products at varying price points. The platform's low entry barriers for sellers and wide consumer reach create a dynamic pricing environment where competition is fierce and consumer expectations for affordable pricing are high.

In this context, Shopee faces the challenge of balancing seller autonomy in pricing with maintaining an optimal price perception for buyers. Pricing must account for the wide variation in product demand, seller strategies, and consumer price sensitivity across different categories. Additionally, external competition from other e-commerce giants like Lazada and Amazon intensifies the need for pricing strategies that are not only competitive but also revenue-maximizing for both sellers and the platform itself.

In this section, we explore a comprehensive approach to implementing a pricing strategy that can achieve these goals effectively.

The technical implementation details, including code for each step of the analysis, can be found in the Price Elasticity and Dynamic Pricing Notebook in the GitHub repository.

Limitation Note

Due to the limited availability of data, this project will focus exclusively on potential pricing strategies within the Shopee platform. External market factors and pricing strategies from other platforms will not be covered in this analysis.

Key Objective

To identify pricing strategies within the Shopee platform that can maximize revenue by leveraging insights into seller competition, product categories, and customer behavior.

This objective focuses on maximizing Shopee's Revenue via Gross Merchandise Value (GMV), a key metric in e-commerce. GMV represents the total value of goods sold through the platform over a given period, calculated as:

$$ \text{GMV} = \text{Total Items Sold} \times \text{Average Selling Price (ASP)} $$

Maximizing GMV involves a strategic balance between increasing the number of transactions and optimizing pricing to enhance the overall value of sales. A well-designed pricing strategy that considers customer price sensitivity, product competition, and elasticity will help Shopee achieve higher GMV, driving platform growth and seller success.

Pricing Strategy Assumptions

Discount-Driven Pricing Approach:
- Since Shopee cannot directly control the product prices (as they are set by the sellers), the primary lever for influencing pricing and demand is through applying platform-wide or targeted discounts.
- Discounts can be applied dynamically based on insights from customer segments (e.g., loyalty tiers), product performance, and competitive analysis.
Segmented Discount Elasticity:
- The sensitivity of demand to discounts (Discount Elasticity of Demand, DED) varies across customer segments (e.g., budget shoppers vs. premium buyers) and product categories (e.g., electronics vs. fashion).
- DED will be estimated and updated regularly for each segment to capture the changing responsiveness of customers to discounts. This allows Shopee to tailor discount strategies to maximize demand.
Group Products by Sensitivity:
- Given the large and granular product catalog (10k+ products), it is impractical to set individual discount levels for each product. Instead, products will be grouped into clusters based on characteristics like category, price range, and historical sales performance.
- Each cluster will have a representative DED, and discount strategies will be applied at the cluster level rather than the individual product level.
Revenue Optimization Through Discount Forecasting:
- Instead of adjusting the product price directly, Shopee will forecast a time series of optimal discount levels using a combination of demand forecasting models and optimization techniques.
- The forecasted discount levels aim to maximize the GMV while considering constraints like campaign budgets, discount limits, and seller agreements.
Campaign Days and Promotion Strategy:
- During major sales events (e.g., 11.11, Black Friday), customers are generally more responsive to discounts. The model will incorporate campaign day indicators to increase the forecasted discount levels for these periods to boost transaction volume and GMV.
- Shopee may also offer tiered discounts (e.g., higher discounts for premium loyalty tiers) to incentivize purchases from high-value customers.
Data-Driven Feedback Loop:
- The model will include a feedback mechanism that continuously monitors the impact of discounts on demand and updates the DED estimates accordingly. This allows Shopee to refine its discount strategies and respond quickly to market changes.

Part 1: Exploratory Data Analysis

Plotting Price Variability Within Each Main Category

Skewed Price Distributions:
- Most categories have a highly right-skewed distribution, where the majority of products are clustered in the lower price ranges, with a few products at significantly higher prices.
- This is typical in e-commerce, where affordable products make up the bulk of each category, and only a few premium products are available at higher prices.
Low Price Variability in Some Categories:
- Categories like "Women's Clothes," "Men's Shoes," "Women's Shoes," and "Fashion Accessories" show very low price ranges, indicating that most products in these categories are relatively low-cost.
- Categories with low price variability might indicate that customers are highly price-sensitive, and sellers are pricing competitively within a narrow range.
High Price Variability in Certain Categories:
- Categories such as "Cameras & Drones," "Travel & Luggage," "Health & Beauty," and "Computer & Accessories" have broader price ranges, showing a mix of low-cost and high-cost items.
- The wide price range in these categories might suggest a mix of entry-level and premium products, catering to different customer segments with varying price sensitivities.

Price Elasticity Demand Concept as Key Determiner for Dynamic Pricing

What is PED?

Price Elasticity of Demand (PED) measures how sensitive customer demand is to changes in price. It provides a quantitative understanding of how price adjustments (e.g., discounts) affect the quantity sold.

Formula for PED:

Where:

% Change in Quantity Demanded:

% Change in Price:

Interpreting PED Values

The value of PED determines whether demand is elastic, inelastic, or unit elastic:

Elastic Demand (PED < -1):
- Demand is highly sensitive to price changes.
- Example: If PED = -2, a 1% decrease in price leads to a 2% increase in demand.
- Good for Dynamic Pricing: Discounts can significantly boost demand and revenue.
Inelastic Demand (-1 ≤ PED < 0):
- Demand is less responsive to price changes.
- Example: If PED = -0.5, a 1% decrease in price leads to only a 0.5% increase in demand.
- Not Ideal for Discounts: Lowering prices may not significantly increase demand.
Unit Elastic Demand (PED = -1):
- Demand changes proportionally to price.
- Example: A 1% decrease in price leads to a 1% increase in demand.
- Moderate: Discounts may balance demand growth and revenue impact.
Zero Elasticity (PED = 0):
- Demand does not change regardless of price (e.g., necessities).

What is Considered a Good PED Value?

For Discounts to Be Effective: A good PED value is typically more elastic ((PED < -1)), as it indicates that discounts will lead to significant demand increases.
For Revenue Maximization Without Excessive Discounts: Products with (PED) slightly less than (-1) are ideal because they provide predictable demand growth without excessive price reductions.

How to Assess PED

To effectively monitor and update PED calculations:

Frequency of Assessment:
- Monthly or Weekly assessments are ideal for most products, depending on:
  - Product Volatility: Seasonal or fast-moving products may require weekly updates.
  - Campaigns: Evaluate PED immediately after major sales events (e.g., mega sales).
- Products with stable demand can have PED assessed quarterly.
Actionable Insights:
- Combine PED with demand forecasts to predict the impact of proposed discounts.
- Monitor PED trends over time to detect changes in customer behavior and adapt pricing strategies accordingly.

Challenges in PED Assessment

Data Noise: Discounts applied across campaigns or multiple categories may obscure true PED.
Sufficient Data Volume: PED calculations require significant historical data to be reliable, particularly for less popular products.
Dynamic PED: Elasticity may change over time, especially with market trends, competitive pressures, or new product launches.

PED Calculation for Each Category

Due to the current limitations in available data, we calculate the Price Elasticity of Demand (PED) at the mid_category and main_category levels. This approach assumes that products within these categories exhibit similar price sensitivity, allowing us to aggregate and generalize demand behavior for analysis.

However, in an ideal scenario with more granular data, PED would be calculated at the individual product level. This would enable us to precisely identify which products have elastic or inelastic demand, providing a more tailored and accurate understanding of pricing dynamics.

For now, this category-based approach serves as a practical approximation, while acknowledging that product-level PED is the preferred method in practice.

We will use this Price Elasticity of Demand (PED) as a key metric to identify and prioritize products suitable for dynamic pricing or "discount adjustment." Given the vast number of products on Shopee, it's crucial to focus on those that are likely to benefit most from price adjustments to optimize revenue.

PED is hoped to enable us to classify products into two main categories:

Inelastic Demand Products: Items like automotive, which are less affected by discounts, as their demand remains relatively stable regardless of price changes. For such products, applying dynamic pricing might not yield significant results.
Elastic Demand Products: Items like seasonal or high-demand products (e.g., Labubu collectibles), where demand can fluctuate significantly with price changes. These are more suitable for dynamic pricing as strategic discounts can drive substantial demand and revenue growth.

Key Observations

Categories with Elastic Demand (PED < -1):
- These categories are highly responsive to price changes, making them ideal for discount-driven strategies.
- Examples:
  - Watches (-1.482): Discounts are effective in driving demand, indicating high sensitivity to price changes.
  - Games, Books & Hobbies (-1.028): Similar behavior; demand can significantly increase with appropriate discounts.
Categories with Inelastic Demand (PED > -1 and close to 0):
- Demand for these categories is relatively unaffected by price changes.
- Examples:
  - Automotive (-0.133): Discounts have minimal impact on sales; focus should be on other promotional methods like bundling or value-added services.
  - Travel & Luggage (-0.137): Similarly, discounts may not yield significant results.
  - Tickets & Vouchers (-0.017): This extremely inelastic demand suggests minimal price sensitivity.
Categories with Positive PED:
- Positive PED values indicate potential data issues or demand behaving counterintuitively (e.g., increasing price leads to increased demand, often due to luxury or status items).
- Examples:
  - Fashion Accessories (-73.81): Likely a data anomaly or miscalculation; needs further investigation.
  - Health & Beauty (0.931): Suggests price increases may increase demand, possibly reflecting perceived value or premium positioning.
  - Men's Bags & Wallets (1.606): Positive PED here suggests discounts may not align with customer expectations.
Outliers or Data Anomalies:
- Fashion Accessories (-73.81): Such an extreme negative value suggests an error in the PED calculation or an unusual sales event.
- Home Appliances (-51.69): Similarly, an extreme PED value may not reflect actual customer behavior.

Strategic Recommendations

Elastic Categories (Focus on Discounts):

Watches, Games, Books & Hobbies, Baby & Toys, Home & Living:
- Apply targeted discounts during campaigns to boost sales.
- Use PED to test various discount levels and optimize for maximum revenue.

Inelastic Categories (Focus on Alternatives):

Automotive, Travel & Luggage, Tickets & Vouchers:
- Deprioritize heavy discounts; focus instead on bundling, cross-selling, or loyalty rewards.
- Explore campaigns emphasizing product quality, durability, or other value drivers.

Categories with Positive PED (Investigate Further):

Fashion Accessories, Men's Bags & Wallets, Health & Beauty:
- Revalidate calculations for these categories.
- Positive PED may indicate luxury or premium positioning; discounts may dilute brand perception.

Address Anomalies:

Extreme PED values, such as Fashion Accessories (-73.81) and Home Appliances (-51.69), might not be accurate due to the current methodology, which calculates PED at the mid_category level rather than at the product level. These anomalies could result from aggregated data masking more granular demand behaviors.
For now, we will acknowledge these potential inaccuracies but proceed with the analysis based on reasonable PED values for other categories.
Future improvements should focus on refining PED calculations at more granular levels (e.g., individual products or subcategories) to provide a clearer picture of elasticity.

Possible Factors Affecting Price Elasticity

Product Type and Necessity:
- Essential items, such as groceries or basic clothing, are inelastic because they are non-negotiable purchases, regardless of price changes.
- Luxury or discretionary items, like branded bags or electronics, are more elastic since consumers can delay or avoid purchasing them if prices increase.
Availability of Substitutes:
- Products with readily available alternatives (e.g., generic brands or similar products) have elastic demand, as customers can easily switch to cheaper options.
- Products with few or no substitutes, such as niche or patented items, tend to have inelastic demand.
Brand Loyalty:
- Consumers loyal to a specific brand are less likely to switch, even if prices increase, making demand inelastic.
- Conversely, in markets with weak brand loyalty, consumers are more likely to respond to price changes, leading to greater elasticity.
Customer Segments:
- Lower-income customers are more price-sensitive and exhibit elastic demand because price increases significantly impact their purchasing decisions.
- New customers are also more sensitive to discounts as they are still exploring options and are not yet committed to a brand or product.
Seasonal Demand:
- During peak periods (e.g., holidays, festivals), demand for seasonal products like decorations or gifts becomes inelastic because consumers prioritize availability over price.
- In off-peak times, the same products may see more elastic demand as consumers are more sensitive to price changes.
Perceived Value:
- High-quality or premium products often have inelastic demand because customers perceive them as worth the price, even if it increases.
- If customers associate higher prices with better quality, demand may even rise with price (positive elasticity).
Discounts and Promotions:
- Discounts reduce the effective price, increasing elasticity as customers are drawn to the perceived value.
- Limited-time offers create urgency, further boosting demand responsiveness to price changes.
Competitive Factors:
- In highly competitive markets, where consumers can easily compare prices, demand becomes more elastic.
- In markets with little competition, such as monopolies, demand tends to be inelastic as customers have fewer alternatives.
Inventory and Supply Constraints:
- Scarcity (e.g., limited stock or exclusive items) reduces elasticity because customers are willing to pay more to secure the product.
- When supply is abundant, demand becomes more elastic as customers feel less urgency to purchase.
Consumer Behavior and Trends:

Social proof, such as recommendations from influencers or trending products, can make demand more elastic as customers respond quickly to price changes.
Impulse purchases, often for low-cost or trending items, are more sensitive to discounts and promotions.

Geographic Factors:

In regions with lower income levels, demand is more elastic because customers are more affected by price changes.
Shipping costs or availability of local alternatives can also influence elasticity, especially for international or imported goods.

Economic Conditions:

During economic downturns, consumers become more price-sensitive, increasing elasticity for non-essential goods.
In periods of economic growth, higher disposable income makes demand for luxury or premium products less elastic, as customers are willing to pay more.

Dynamic Pricing Model

Dynamic pricing is a strategic approach to adjusting prices or discounts based on customer behavior, product characteristics, and market conditions. In Shopee's context, dynamic pricing is implemented by dynamically assigning discounts to products rather than directly changing product prices, as sellers retain control over base prices. This system enables Shopee to influence demand and optimize revenue during flash sales or campaigns while respecting operational constraints.

1. Rationale for Dynamic Pricing via Discounts

a. Shopee’s Operational Constraint

Shopee cannot directly control product prices, as they are set by the sellers.
Instead, Shopee applies platform-driven discounts during flash sales, campaigns, and promotional events to incentivize purchases.

b. Flexibility with Discounts

Discounts can be dynamically adjusted to align with demand patterns, customer preferences, and available budget constraints.
The approach allows Shopee to optimize sales without altering seller-defined pricing structures.

c. Customer-Centric Marketing

Shopee’s ability to segment customers based on purchase behavior enables tailored discount strategies.
For example:
- Offer higher discounts to price-sensitive customer segments.
- Target loyal customers with exclusive deals on preferred product categories.

2. Predefined Customers and Products

a. Customers

Customers are predefined based on segmentation models derived from clustering algorithms applied to historical purchase data.
Segmentation attributes include:
- Discount Sensitivity: How likely a customer is to purchase given a discount.
- Product Preferences: Preferred categories or specific products based on past behavior.
- Spending Capacity: Average basket size or purchasing power.

b. Products

Products for flash sales are selected using criteria such as:
- High Price Elasticity of Demand (PED): Products with elastic demand are more responsive to discounts.
- Stock Levels: High inventory products are prioritized to reduce overstock.
- Seasonality and Trends: Items trending or in season are preferred for flash sales.
- Deal Negotiations: Sellers offering better terms for discounts may have their products prioritized.

c. Dynamic Inputs

The predefined customer and product datasets can be updated by Shopee’s marketing team, allowing flexibility to reflect new customer segments or product deals.

3. Reinforcement Learning (RL) for Dynamic Pricing

a. Why RL?

Sequential Decision-Making: RL excels at optimizing decisions over time, such as allocating discounts across multiple products while considering evolving constraints (e.g., budget and stock levels).
Adaptability: RL learns from customer responses and adjusts strategies dynamically, outperforming static rule-based methods.
Trade-Off Management: RL inherently balances competing objectives, such as maximizing revenue while staying within budget.

b. RL Framework

State Space:
- Represents the current environment, including:
  - Product prices and stock levels.
  - Remaining discount budget.
  - Customer preferences and discount sensitivity.
Action Space:
- Defines the discount levels (e.g., 0%, 10%, 20%, 30%, 40%) for each product in the flash sale.
Reward Function:
- Balances revenue generation and budget utilization:
- Includes penalties for overspending the budget or failing to utilize stock efficiently.
Environment Dynamics:
- Simulates purchases using customer probability models based on:
  - Base likelihood of purchase (intrinsic interest).
  - Sensitivity to discounts.
  - Boosts for preferred products/categories.

c. RL Implementation

Model Selection:
- Use Proximal Policy Optimization (PPO) for its robustness and ability to handle high-dimensional action spaces (e.g., discounts for multiple products).
Training Process:
- Predefined customer and product datasets serve as the input to simulate flash sale scenarios.
- RL is trained over multiple episodes, optimizing discount strategies by exploring and exploiting actions that maximize cumulative rewards.
Evaluation:
- Compare RL strategies to rule-based benchmarks (e.g., uniform discounts across all products).
- Metrics include:
  - Total revenue.
  - Budget utilization.
  - Stock clearance efficiency.

Initialize and train the Proximal Policy Optimization (PPO) model

Testing the Trained PPO Model and Summarizing Performance Metrics

Baseline Simulation and RL Model Evaluation with Revenue and Budget Comparison

Analysis of RL vs. Baseline Performance

The results demonstrate the effectiveness of the Reinforcement Learning (RL) model in optimizing discount strategies compared to a rule-based baseline approach. Here's a detailed breakdown:

1. Summary of Results

Metric	Baseline	RL	Improvement
Total Revenue ($)	2002.5	4355.0	+2645.0
Remaining Budget ($)	777.5	30.0	-780.0

RL Total Revenue: $4355.0, significantly higher than the baseline revenue of $1710.0.
RL Remaining Budget: $30.0, indicating effective use of the budget without overspending.
Improvement in Revenue: $2645.0, a 154.7% increase compared to the baseline.

2. Key Observations

a. Revenue Generation

RL Model:
- The RL agent maximized revenue by dynamically allocating discounts to products based on customer preferences, stock levels, and sensitivity to discounts.
- Generated $4355.0, indicating an effective strategy in attracting purchases.
Baseline:
- The baseline applied uniform discounts without considering customer behavior, resulting in lower sales ($1710.0). b. Budget Utilization
RL Model:
- The agent utilized almost the entire budget ($970 out of $1000) efficiently while adhering to the constraint of not overspending.
- This shows the agent prioritized high-revenue opportunities.
Baseline:
- Spent less of the budget ($190), which contributed to lower revenue due to under-utilization of discount opportunities. c. Strategic Behavior
RL Model:
- Focused discounts on products with higher customer preference and sensitivity to discounts.
- Balanced stock clearance and revenue generation, achieving higher sales while avoiding unnecessary discounts.
Baseline:
- Lacked strategic allocation of discounts, leading to missed opportunities for higher sales.

3. Next Steps: Marketing Team’s Role

a. Updating Customer Segments

Shopee’s marketing team can use clustering models to dynamically refine customer segments based on:
- Changes in purchase behavior (e.g., during holiday seasons).
- Demographic shifts or emerging trends.

b. Identifying Suitable Products

Marketing teams can prioritize products for flash sales using:
- PED metrics to target price-sensitive items.
- Stock and inventory reports to clear overstock items.
- Seller negotiations to secure deals for highly elastic products.

c. Tailored Flash Sale Strategies

Marketing teams can define flash sale objectives, such as:
- Maximizing revenue.
- Promoting new or trending products.
- Achieving stock clearance goals.
The RL model dynamically aligns discount strategies with these objectives.

Pricing Recommendation for Sellers

Though Shopee cannot adjust product prices directly, it can leverage insights from demand patterns, price elasticity of demand (PED), and market competition to recommend optimal pricing strategies to sellers. These insights are designed to help sellers adjust their prices to maximize sales, competitiveness, and revenue.

1. Finding Demand and Competition Indices

1.1 Demand Index

The demand_index classifies products into Low, Medium, or High demand categories based on their total quantity sold relative to others in their category. This helps sellers understand the popularity of their products in the market.

Classification:

Low Demand: Products with total sales below the 33rd percentile of total quantity sold within their main_category and mid_category.
Medium Demand: Products with total sales between the 33rd and 66th percentiles of total quantity sold in their category.
High Demand: Products with total sales above the 66th percentile.

Purpose:

Helps sellers identify underperforming products (Low Demand) and take corrective actions such as offering discounts, bundling, or increasing promotions.
Guides sellers on which high-performing products (High Demand) could sustain price increases or benefit from enhanced visibility in campaigns.

1.2 Competition Index

The competition_index evaluates the level of competition each product faces based on price and stock availability within its category. It categorizes products as Low, Medium, or High Competition.

Classification:

Low Competition: Products priced significantly higher or lower than the category average. These products are less comparable to others, facing minimal direct competition.
Medium Competition: Products priced moderately close to the category average, facing moderate competition from similar products.
High Competition: Products priced closest to the category average. These products face intense competition as they are directly comparable to many others in the category.

Purpose:

Provides sellers with insights into competitive positioning.
Suggests strategic price adjustments to reduce competition or capitalize on unique positioning.

2. Pricing Recommendations

Using the demand_index, competition_index, and PED, Shopee can provide sellers with actionable recommendations:

2.1 For Low-Demand Products

Low Competition:
- Suggest small price reductions to attract price-sensitive customers without devaluing the product.
- Highlight the product’s unique features to justify its price.
High Competition:
- Recommend aggressive discounts or bundling with complementary products to increase visibility and sales.
- Use flash sales or limited-time offers to create urgency.

2.2 For Medium-Demand Products

Low Competition:
- Advise maintaining current pricing if the product has unique value or low elasticity.
- Encourage sellers to invest in marketing campaigns to build awareness.
High Competition:
- Suggest moderate price reductions to match competitive products in the category.
- Offer sellers insights on how to differentiate their products through value-added features or enhanced listings.

2.3 For High-Demand Products

Low Competition:
- Recommend increasing prices slightly if PED indicates inelastic demand, capturing higher margins without reducing sales significantly.
- Suggest sellers focus on maintaining exclusivity or scarcity to sustain demand.
High Competition:
- Propose price matching with competitors while emphasizing product quality or delivery advantages.
- Highlight high-demand products in campaigns to maximize their reach and revenue potential.

3. Insights for Sellers

3.1 Demand Insights

Products with Low Demand require corrective actions to improve sales, such as targeted discounts or promotions.
High-Demand Products should focus on maximizing profitability while maintaining market share.

3.2 Competitive Insights

Products with High Competition need pricing strategies that differentiate them from the crowd, such as offering bundles, loyalty rewards, or free shipping.
Low Competition Products should leverage their unique positioning, maintaining or increasing prices to capitalize on perceived value.

4. How Sellers Can Use These Recommendations

Understand Market Position: Sellers can evaluate how their products perform in terms of demand and competition compared to the market.
Adjust Pricing Dynamically: Use Shopee’s recommendations to refine prices based on market conditions and customer behavior.
Maximize Campaign Participation: Sellers can identify which products to prioritize in flash sales or promotions to optimize visibility and profitability.

Demand Index, PED, and Price-Based Competition Analysis for E-Commerce Products

	product_id	price	total_quantity	main_category	mid_category	sub_category	item_rating	favorite	PED	demand_index	price_difference	competition_score	competition_index
1574	1612	3.114120	2	Automotive	Automotive	Automobiles	5.0	38.0	-0.136640	Low	41.033344	0.548470	Low
3259	3346	8.137196	2	Automotive	Automotive	Automobiles	No ratings yet	0.0	0.000000	Low	36.010269	0.603744	Medium
4944	5106	62.282401	4	Automotive	Automotive	Automobiles	5.0	52.0	0.714286	Medium	18.134937	0.800444	High
5618	5817	15.414894	10	Automotive	Automotive	Automobiles	No ratings yet	0.0	1.740741	High	28.732570	0.683828	High
6829	7095	21.487428	3	Automotive	Automotive	Automobiles	No ratings yet	0.0	-0.136640	Medium	22.660036	0.750650	High

Sample Price Adjustment Framework for Sellers

This framework provides sample suggestions for sellers to adjust prices based on Demand Index, Competition Index, and Price Elasticity of Demand (PED). These suggestions aim to help sellers optimize revenue and competitiveness while aligning with market dynamics.

Sample Price Adjustment Logic

1. Factors Influencing Price Adjustment

The sample price adjustment is determined by combining adjustments from three key factors:

Demand Index Adjustment:
- Reflects the product’s market demand.
  - Low Demand: Decrease price by 3% to encourage purchases.
  - Medium Demand: No change, indicating balanced demand.
  - High Demand: Increase price by 3% to maximize revenue from strong demand.
Competition Index Adjustment:
- Reflects the level of competition in the product’s category.
  - High Competition: Decrease price by 3% to remain competitive.
  - Medium Competition: No change, indicating moderate competitiveness.
  - Low Competition: Increase price by 3% to capitalize on unique positioning.
Price Elasticity of Demand (PED) Adjustment:
- Reflects customer price sensitivity.
  - High PED (Elastic Demand, PED > 1): Adjust price by ±2% (decrease for increased sales, increase for demand stabilization).
  - Unitary PED (PED = 1): No change, indicating proportional demand and price response.
  - Low PED (Inelastic Demand, PED < 1): Adjust price by ±2% to optimize revenue without significant demand impact.

2. Calculating Total Price Adjustment

The total price adjustment is the sum of the three individual adjustments:

$$ \text{Total Adjustment Factor} = \text{Demand Adjustment} + \text{Competition Adjustment} + \text{PED Adjustment} $$

3. Final Price Calculation

The sample new price is calculated as:

$$ \text{New Price} = \text{Original Price} \times (1 + \text{Total Adjustment Factor}) $$

Sample Price Adjustment Scenarios

Scenario 1: Low Demand, High Competition, High PED

Original Price: $100
Demand Adjustment: -3%
Competition Adjustment: -3%
PED Adjustment: -2%
Total Adjustment Factor: (-3% + (-3%) + (-2%) = -8%)
Sample New Price: ( 100 \times (1 - 0.08) = 92 )

Scenario 2: High Demand, Low Competition, Low PED

Original Price: $200
Demand Adjustment: +3%
Competition Adjustment: +3%
PED Adjustment: +2%
Total Adjustment Factor: (+3% + (+3%) + (+2%) = +8%)
Sample New Price: ( 200 \times (1 + 0.08) = 216 )

Scenario 3: Medium Demand, Medium Competition, Unitary PED

Original Price: $50
Demand Adjustment: 0%
Competition Adjustment: 0%
PED Adjustment: 0%
Total Adjustment Factor: ( 0% + 0% + 0% = 0%)
Sample New Price: ( 50 \times (1 + 0.00) = 50 )

How Sellers Can Use This Sample Framework

Understand Product Positioning:
- Use the Demand Index to identify whether a product needs more visibility or a price adjustment to boost sales.
- Evaluate the Competition Index to understand market competitiveness.
Leverage PED Insights:
- Adjust prices based on price sensitivity to either increase revenue or stabilize demand.
Adjust Prices Dynamically:
- Use these sample adjustments as guidelines to inform pricing decisions. Sellers can tweak the percentages or apply them selectively based on specific goals (e.g., stock clearance, profit maximization).

Conclusion

This sample pricing adjustment framework serves as a recommendation to guide sellers in optimizing their prices. The insights are tailored to help sellers make informed decisions, balancing demand, competition, and price elasticity effectively. Shopee encourages sellers to consider these adjustments as a starting point and adapt them to their unique product and market circumstances.

C. How can we improve the efficiency of our supply chain?

In our business problem, we emphasised on the severity of return rates on the overall performance of an ecommerce business. The stage of the supply chain that is responsible for managing the returns are the ‘Returns & Reverse Logistics’ stage. Thus, it would be the primary focus in our analysis for answering this question in making the supply chain more efficient. In our analytical findings, we aim to utilise our dataset to discover insights on what might be the elephant(s) in the room causing the return rates to increase. Having a deeper understanding of the causes of high return rates would help ecommerce businesses come up with better solutions that effectively target this critical issue.

The technical implementation details, including code for each step of the analysis, can be found in the Supply Chain Optimisation Notebook in the GitHub repository.

Correlation Matrix and Initial Findings

We begin by plotting a correlation matrix of the features in our deliveries dataset to observe any potential correlations between the features.

download (2)

Based on the positive correlations, the correlation coefficient between seller_response_time and return_request is 0.26, which is similar to that between delivery_time and return_request (0.28).

Thus, our initial finding is that the faster the delivery times and the quicker the seller responds, the lower the chance of return requests.

Another important point is that from the heatmap, notice that item rating and return requests have a moderate negative correlation coefficient (-0.43). We believe that this relationship is quite apparent since if the customer gives a lower item rating, it directly shows their dissatisfaction with the item itself. However, this moderate negative correlation suggests that while item rating contributes to higher return rates, it’s likely not the only factor. We see that delivery_time and seller_response_time have a weak positive correlation with return request (0.27 and 0.29 respectively) as well, which suggest that they could be other less direct factors associated with the higher return rates. Understanding the different factors that play a part in increasing return rates is something that we are keen on delving deeper into.

Further Exploration

Next, we plotted a graph that visualises the order volume, return volume and return rate for each category of products. We wanted to explore what are the impacts of each category on the phenomenon of increasing return rates.

download (3)

From the visualization, we see that the Men’s Clothes category stands out with both one of the highest order volumes and a relatively high return rate compared to other categories. This trend could be suggesting that customers might be frequently returning these items due to reasons such as change of mind or sizing issues, which are common challenges in the apparel category.

Hence, we want to examine more closely into this category specifically and explore ways to reduce the high return rate and to better manage the return volume, thereby improving customer satisfaction and reducing the logistical costs associated with returns.

Recall that based on the initial findings, we identified that faster delivery times and quicker seller responses will reduce return rates. So, our next step is to investigate more on:

Delivery Time vs Return Rate
Seller response Time vs Return Rate

(i) Delivery Time VS Return Rate

To investigate the relationship between delivery time and return rate for Men’s Clothes, we first looked at the general trend for return rates based on the delivery status (on-time vs late).

download (4)

This trend suggests that longer delivery times likely contributed to customer dissatisfaction, which potentially led to increased return requests as seen in the much higher bar on the right. Customers waiting longer for their orders may experience frustration, lose interest, or seek alternatives, influencing their likelihood to return the product. Therefore, items marked as "late" experience a substantial increase in return rates compared to on-time deliveries.

Furthermore, this aligns with the weak correlation coefficient (0.37) between delivery time and return request as seen in the heatmap. This suggests that delivery timing itself, separate from item quality, also influences return behavior to some extent.

This strengthens the case that not all returns are due to product quality issues. Instead, delayed delivery times also appear to be a driver of returns, highlighting the need to address delivery logistics for improved customer satisfaction.

Then, we wanted to find out if late delivery times had something to do with the countries the clothes were delivered from. We plotted the return rate against the average delivery time for each country (top graph) to get a general time for how long each country takes. Then, we segmented the delivery times of each country (bottom graph) and visualised the return rate for each of these segments:

0-4 days: Represents very prompt deliveries, likely to meet or exceed customer expectations, as most customers prefer faster fulfillment times.
4-9 days: Covers standard delivery times commonly expected in eCommerce, balancing reasonable shipping costs with customer patience for international orders.
9-14 days: Represents slower deliveries that may start impacting customer satisfaction, potentially correlating with a moderate increase in return requests due to dissatisfaction.
14+ days: Captures significantly delayed deliveries that are more likely to frustrate customers, leading to a higher probability of returns, especially when expectations are unmet.

download (5)

From the above graphs, we observe that higher delivery times correlate with higher return rates: Countries with longer delivery times, such as China (around 14 days) and Vietnam (around 10 days), exhibit higher return rates.

Key observation: One of the main bottlenecks of the supply chain involve both long international delivery times and late deliveries that contribute significantly to higher return rates. When customers experience delays in receiving their orders, they may become dissatisfied or lose interest in the product, leading to an increase in return requests.

Following this key observation, this brings us to our first key insight. That is, if we can improve delivery timeliness by reducing delivery times, particularly for high-return countries, we could mitigate customer dissatisfaction and reduce return rates. This is especially critical for products shipped from countries like China and Vietnam, where delivery times are generally much slower.

(ii) Seller Response Time VS Return Rate

Next, to investigate more on the relationship between seller response time and return rate, we plotted the return rate against seller response times.

Here are some insights based on the observed trend:

Positive Correlation: There appears to be a strong positive correlation between seller response time and return rate. As the response time increases, the return rate tends to rise. This suggests that delays in seller responses might contribute to a higher likelihood of returns.
Threshold Effect: The return rate starts increasing significantly around the 10-20 hour mark (about 14 hours or later) and continues to rise steeply. This could indicate that customers are more likely to return products if they don’t receive timely communication within this period.
Return Rate Plateau: After about 50 hours, the return rate stabilizes around 80-90%. This suggests that once response times reach a certain threshold, the return rate levels off, potentially indicating that customers lose confidence entirely after extended delays, leading to a high but stable return rate.

From these insights, our next key observation is that: Another main bottleneck in reducing the efficiency of the supply chain are delays in response times (i.e later than 14 hours). This leads to frustration or diminished interest of customers, which is contributing to higher return rates.

Following this key observation, our next key insight is that improving seller response times to 14 hours or earlier could reduce return rates by addressing customer concerns and managing expectations more effectively. This is likely to ensure that customers are better engaged and thus more satisfied in doing business with the sellers.

Potential Solution

Since improving international delivery times is challenging, using returned items as local stock becomes an effective strategy to circumvent the delay associated with international shipping. But even if we can use the return volume to optimize the order fulfillment process; the delivery time in particular, we will still need to tackle the root cause of the return volume – the high return rate. Hence, it is also important to find another solution to deal with the delays in response times.

9.2: Analytical Findings (Subgroup B) - 7-teens/7-teens-DSA3101-2410-Project GitHub Wiki

A. How can we optimize inventory levels to minimize costs while ensuring product availability?

Overview

Part 1. Exploring distribution of daily sales data

1. Correlation Matrix of Aggregated Daily Features

2. Daily Sales and Growth by Day of the Week

3. Daily Sales and Growth by Week of the Month

4. Daily Sales and Growth by Month

Part 2. Choosing a sub category to explore

Part 3. Exploring seasonal decomposition of chosen sub category

Observed

Trend

Seasonal

Residual

Summary

ADF Test Results Summary

ADF Test for Daily Sales in Occupational Attire

ADF Test for Daily Sales in the Entire Dataset

Summary

Part 4. General demand forecasting using Prophet

Key Observations

Summary

Part 5. General demand forecast seasonal decomposition

Analysis of Seasonal Decomposition Comparison: Actual vs Prophet Predicted Components

Key Observations

Summary

Part 6. Sub category demand forecast using Prophet

Prophet August Forecast vs. Actual for Occupational Attire

Seasonal Decomposition Components

Part 7. Sub category demand forecast seasonal decomposition

Seasonal Decomposition Comparison for Occupational Attire in August

Trend Component

Seasonal Component

Residuals Component

Overall Analysis

Part 8. Forecasting costs using demand predictions

Forecasted Cost for Occupational Attire in August

Analysis

Overall Insights

Part 9. Forecasting safety stock using demand predictions

Analysis of EOQ, Prophet Safety Stock, and Actual Demand for Occupational Attire in August

Graph Overview:

Key Observations:

Insights:

Conclusion:

Part 10. Exploring cumulative costs vs revenue

Analysis of Cumulative Revenue and Costs for Occupational Attire in August

Key Insights

Conclusion

Part 11. Exploring optimal inventory management strategies

Analysis and Insights on Inventory Strategies for Occupational Attire in August

1. Optimal Dynamic Inventory Strategy

2. EOQ (Economic Order Quantity) Strategy

3. Current Inventory Strategy

Conclusion

Part 12. Exploring profit maximising strategy

Analysis of Cumulative Cost and Revenue Comparison for Inventory Management Strategies

Analysis

Insights

Conclusion

Part 13. Predicting future demand

Analysis of Prophet Forecast and Inventory Management for January 2020

1. Prophet Demand Forecast and Safety Stock

2. Optimal Inventory Levels and Reorder Points

3. Cumulative Cost and Revenue Comparison

Conclusion

Part 14. Final Conclusion

Key Results and Insights

Solution to the Problem

Conclusion

B. What pricing strategies can we implement to maximize revenue?

Introduction to Shopee's Current Pricing Conditions

Limitation Note

Key Objective

Pricing Strategy Assumptions

Part 1: Exploratory Data Analysis

Plotting Price Variability Within Each Main Category

Price Elasticity Demand Concept as Key Determiner for Dynamic Pricing

What is PED?

Interpreting PED Values

⚠️ GitHub.com Fallback ⚠️