Data Sources and Processing - supertypeai/sectors-kb GitHub Wiki

This section focuses on how we obtain and process the data before storing it inside the database.

List of Columns for idx_financials_annual and idx_financials_quarterly

Metadata:

  • symbol: IDX company symbol ending with .JK (e.g. ADRO.JK)
  • date: date of the financial report

Income statement metrics:

  • total_revenue: For banks, obtained by adding the Net Interest Income and Non-Interest Income
  • gross_income: Total Revenue - Cost of Goods Sold; not applicable for banks or insurance companies, since they don't have Cost of Goods Sold
  • operating_income: Gross Income - Operating Expense (excluding Cost of Goods Sold); for banks, obtained by subtracting the Interest Expense, Non-Interest Expense, and Loan Loss Provision from the Total Revenue.
  • pretax_income: Operating Income + Non-Operating Income
  • income_taxes
  • net_income: Earnings Before Tax - Income Tax - Minority Interest
  • ebit: Earnings Before Tax (Pretax Income) + Non-Operating Interest Expense; Not applicable for banks
  • ebitda: EBIT + Depreciation and Amortization; not applicable for banks or insurance companies
  • diluted_shares_outstanding
  • interest_expense_non_operating: Not applicable for banks

Balance sheet metrics:

  • cash_and_short_term_investments: For companies other than banks and insurance companies, equivalent of Cash and Equivalents
  • cash_only: For insurance companies, equivalent of Cash and Equivalents
  • total_cash_and_due_from_banks: For banks, equivalent of Cash and Equivalents
  • total_assets
  • total_non_current_assets: Also known as fixed assets, not applicable for banks or insurance companies
  • total_liabilities
  • total_current_liabilities: Not applicable for banks or insurance companies
  • total_debt
  • total_equity
  • stockholders_equity: Also known as book value equity

Cash flow metrics:

  • net_operating_cash_flow
  • free_cash_flow

Data Sources for idx_financials_annual and idx_financials_quarterly

We have 2 data sources: WSJ and the Yahoo Finance (YF) API. We derive wsj_format from WSJ's formatting because WSJ applies a strict, consistent format to companies in the same industry; e.g. Banking and Insurance have their own formats because different metrics apply to them. The formatting from the YF API, on the other hand, varies even among companies in the same industry.

We prioritize the YF API as the source because its data are more convenient to retrieve. For Banking and Insurance, however, we prioritize WSJ because the YF API lacks certain metrics and standardization for these industries. The data source is assigned in 3 steps:

  1. Filter companies by their industries, then for Banks and Insurance companies, set their data source to WSJ if the company is available in WSJ.
  2. Set the other companies' data source to YF if the company is available in YF. Note: this may still include Banking or Insurance industries if the company is not available in WSJ.
  3. For companies that are not available in YF, set their data source to WSJ if the company is available in WSJ. Note: this may still result in missing data for companies if data does not exist in both sources.

Given these 3 steps, the ideal outcome is Banking and Insurance company data from WSJ and data for other industries from YF. For that reason, a check runs periodically to ensure that Banking and Insurance companies are sourced from WSJ. See below for the wsj_format classification details.
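The 3 source-selection steps and the periodic check can be sketched as two small functions. This is a minimal sketch, not the pipeline's actual code: choose_source, needs_review, the boolean availability flags, and the industry labels "Banking"/"Insurance" are all hypothetical.

```python
from typing import Optional

# Hypothetical industry labels; use whatever labels the companies table stores.
WSJ_PREFERRED = {"Banking", "Insurance"}


def choose_source(industry: str, in_wsj: bool, in_yf: bool) -> Optional[str]:
    """Return "WSJ", "YF", or None when neither source has the company."""
    # Step 1: banks and insurers use WSJ whenever WSJ covers them.
    if industry in WSJ_PREFERRED and in_wsj:
        return "WSJ"
    # Step 2: everyone else uses YF if available; this still catches
    # banks/insurers that are missing from WSJ.
    if in_yf:
        return "YF"
    # Step 3: companies missing from YF fall back to WSJ.
    if in_wsj:
        return "WSJ"
    return None


def needs_review(industry: str, source: Optional[str]) -> bool:
    """Periodic check: flag banks/insurers that are not sourced from WSJ."""
    return industry in WSJ_PREFERRED and source != "WSJ"
```

For example, choose_source("Banking", in_wsj=False, in_yf=True) returns "YF", which is exactly the case needs_review flags.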

WSJ

Data from WSJ are provided by FactSet, which also provides data for TradingView and MarketWatch.

Example URL:

Note: quarter can be replaced with annual

Formats

There are 4 different formats of financial data for IDX companies:

  1. General 1 (mostly non-insurance/non-banking companies): https://www.wsj.com/market-data/quotes/ID/XIDX/ADRO/financials/quarter/income-statement
  2. General 2 (mostly companies related to real estate or finance): https://www.wsj.com/market-data/quotes/ID/XIDX/APLN/financials/quarter/income-statement
  3. Insurance (mostly insurance companies): https://www.wsj.com/market-data/quotes/ID/XIDX/AHAP/financials/quarter/income-statement
  4. Banking (mostly banking companies): https://www.wsj.com/market-data/quotes/ID/XIDX/BBCA/financials/quarter/income-statement

Note that WSJ may use different category classifications from IDX.
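The example URLs above follow a single pattern, and the note on WSJ URLs says quarter can be replaced with annual. A small helper can build them. wsj_income_statement_url is a hypothetical name, and it only covers the income-statement page seen in the examples; paths for the other statements are not assumed here.

```python
BASE_URL = "https://www.wsj.com/market-data/quotes/ID/XIDX"


def wsj_income_statement_url(symbol: str, period: str = "quarter") -> str:
    """Build the WSJ income-statement URL for an IDX symbol (hypothetical helper)."""
    if period not in ("quarter", "annual"):
        raise ValueError("period must be 'quarter' or 'annual'")
    # The example URLs use the bare ticker (ADRO), not the .JK form.
    ticker = symbol[:-3] if symbol.endswith(".JK") else symbol
    return f"{BASE_URL}/{ticker}/financials/{period}/income-statement"
```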

Data Processing

General 1 Format

  • Interest Expense is regarded as Non-Operating Interest Expense
  • Cash & Short Term Investments would be shown as Cash and Equivalents in Frontend
  • Total Debt = Long Term Debt + Short Term Debt & Current Portion of Long Term Debt

If both Long Term Debt and Short Term Debt & Current Portion of Long Term Debt are "-", then we will leave Total Debt as null.

  • Operating Income = Gross Income - SG&A Expense - Other Operating Expense
  • EBIT = Pretax Income + Non-Operating Interest Expense
  • EBITDA = EBIT + Depreciation and Amortization
  • Fixed Assets = Total Assets - Total Current Assets

Total Debt, Operating Income, EBIT and Fixed Assets are not available by default. EBITDA is available, but the value does not conform to our defined formula.
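The General 1 derivations can be sketched as below. This is a minimal sketch under stated assumptions: the snake_case row keys are hypothetical names for the WSJ rows, WSJ's "-" is read as null, Interest Expense is treated as the non-operating interest expense per the rule above, and Fixed Assets is stored as total_non_current_assets. The text only says Total Debt stays null when both debt rows are "-"; counting a single "-" as 0 is an assumption.

```python
from typing import Optional, Union

Raw = Union[float, int, str, None]


def parse(value: Raw) -> Optional[float]:
    """WSJ shows missing figures as "-"; treat those (and None) as null."""
    if value is None or value == "-":
        return None
    return float(value)


def add(*terms: Optional[float]) -> Optional[float]:
    # Propagate nulls instead of silently treating them as 0.
    return None if any(t is None for t in terms) else sum(terms)


def sub(base: Optional[float], *terms: Optional[float]) -> Optional[float]:
    return None if base is None or any(t is None for t in terms) else base - sum(terms)


def derive_general_1(row: dict) -> dict:
    g = {key: parse(value) for key, value in row.items()}
    lt, st = g.get("long_term_debt"), g.get("short_term_debt_and_current_ltd")
    # Both "-" -> Total Debt stays null; a single "-" counts as 0 (assumption).
    total_debt = None if lt is None and st is None else (lt or 0.0) + (st or 0.0)
    ebit = add(g.get("pretax_income"), g.get("interest_expense"))
    return {
        "total_debt": total_debt,
        "operating_income": sub(g.get("gross_income"), g.get("sga_expense"),
                                g.get("other_operating_expense")),
        "ebit": ebit,
        "ebitda": add(ebit, g.get("depreciation_amortization")),
        "total_non_current_assets": sub(g.get("total_assets"), g.get("total_current_assets")),
    }
```

Note that EBITDA is recomputed from EBIT here rather than taken from WSJ, since the WSJ value does not match the defined formula.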

General 2 Format

  • Total Interest Expense is regarded as Non-Operating Interest Expense
  • Cash & Short Term Investments would be shown as Cash and Equivalents in Frontend
  • Gross Income = Operating Income + Selling, General & Admin. Expenses + Other Operating Expense
  • COGS = Total Revenue - Gross Income (not needed in Frontend)
  • EBIT = Pretax Income + Non-Operating Interest Expense
  • EBITDA = EBIT + Depreciation and Amortization
  • Total Non-Current Liabilities = Long Term Debt + Provision for Risks & Charges + Deferred Taxes Credit + Other Liabilities
  • Total Current Liabilities = Total Liabilities - Total Non-Current Liabilities

Gross Income, COGS, EBIT, EBITDA, Total Non-Current Liabilities and Total Current Liabilities are not available by default.
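A matching sketch for the General 2 derivations, assuming the WSJ values have already been parsed (None where WSJ shows "-"). The dictionary keys and the function name are hypothetical; "Total Interest Expense" is used as the non-operating interest expense, as stated above.

```python
from typing import Optional


def add(*terms: Optional[float]) -> Optional[float]:
    # Any null input makes the derived value null.
    return None if any(t is None for t in terms) else sum(terms)


def sub(base: Optional[float], *terms: Optional[float]) -> Optional[float]:
    return None if base is None or any(t is None for t in terms) else base - sum(terms)


def derive_general_2(r: dict) -> dict:
    """r holds already-parsed WSJ values; keys are hypothetical names."""
    gross_income = add(r.get("operating_income"), r.get("sga_expense"),
                       r.get("other_operating_expense"))
    # Total Interest Expense is treated as the non-operating interest expense.
    ebit = add(r.get("pretax_income"), r.get("total_interest_expense"))
    non_current_liabilities = add(
        r.get("long_term_debt"),
        r.get("provision_for_risks_and_charges"),
        r.get("deferred_taxes_credit"),
        r.get("other_liabilities"),
    )
    return {
        "gross_income": gross_income,
        "cogs": sub(r.get("total_revenue"), gross_income),  # not needed in Frontend
        "ebit": ebit,
        "ebitda": add(ebit, r.get("depreciation_amortization")),
        "total_non_current_liabilities": non_current_liabilities,
        "total_current_liabilities": sub(r.get("total_liabilities"), non_current_liabilities),
    }
```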

Banking Format

  • Total Cash & Due from Banks would be shown as Cash and Equivalents in Frontend
  • Total Revenue = Net Interest Income + Non-Interest Income

Insurance Format

  • Cash Only would be shown as Cash and Equivalents in Frontend
  • Operating Income Before Interest Expense is regarded as Operating Income
  • Interest Expense, Net of Interest Capitalized is regarded as Non-Operating Interest Expense
  • EBIT = Pretax Income + Non-Operating Interest Expense
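The Banking and Insurance rules are mostly label mappings plus one formula each. A minimal sketch, assuming the WSJ row labels appear verbatim as quoted in these rules (including "Net Interest Income", "Non-Interest Income" and "Pretax Income" from the formulas); the dictionary and function names are hypothetical.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[float]]

# WSJ row label -> our column name (labels assumed to match the rules above)
BANKING_LABELS = {
    "Total Cash & Due from Banks": "total_cash_and_due_from_banks",
    "Net Interest Income": "net_interest_income",
    "Non-Interest Income": "non_interest_income",
}
INSURANCE_LABELS = {
    "Cash Only": "cash_only",
    "Operating Income Before Interest Expense": "operating_income",
    "Interest Expense, Net of Interest Capitalized": "interest_expense_non_operating",
    "Pretax Income": "pretax_income",
}


def _add(a: Optional[float], b: Optional[float]) -> Optional[float]:
    return None if a is None or b is None else a + b


def _rename(raw: Row, labels: Dict[str, str]) -> Row:
    # Missing labels become None rather than raising KeyError.
    return {column: raw.get(label) for label, column in labels.items()}


def normalize_banking(raw: Row) -> Row:
    out = _rename(raw, BANKING_LABELS)
    out["total_revenue"] = _add(out["net_interest_income"], out["non_interest_income"])
    return out


def normalize_insurance(raw: Row) -> Row:
    out = _rename(raw, INSURANCE_LABELS)
    out["ebit"] = _add(out["pretax_income"], out["interest_expense_non_operating"])
    return out
```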

Yahoo Finance API

Data are obtained through yfinance Python package (https://pypi.org/project/yfinance/).
Example usage:

import yfinance as yf
ticker = yf.Ticker("ADRO.JK")

# to obtain annual data (each returns a DataFrame)
ticker.income_stmt
ticker.balance_sheet
ticker.cashflow

# to obtain quarterly data (each returns a DataFrame)
ticker.quarterly_income_stmt
ticker.quarterly_balance_sheet
ticker.quarterly_cashflow

Data Processing

For banks, the following metrics are set to null:

  • gross_income
  • ebitda
  • cash_and_short_term_investments
  • total_non_current_assets
  • total_current_liabilities
  • ebit
  • interest_expense_non_operating

Note: total_cash_and_due_from_banks is not available on YF API

For insurance companies, the following metrics are set to null:

  • gross_income
  • ebitda
  • cash_and_short_term_investments
  • total_non_current_assets
  • total_current_liabilities

Note: cash_only is not available on YF API
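Nulling the non-applicable metrics for YF-sourced banks and insurers can be sketched as a lookup of the two column lists above. The function name and the industry labels "Banking"/"Insurance" are hypothetical.

```python
# Columns nulled for YF-sourced banks and insurers, per the lists above.
BANK_NULLS = {
    "gross_income", "ebitda", "cash_and_short_term_investments",
    "total_non_current_assets", "total_current_liabilities",
    "ebit", "interest_expense_non_operating",
}
INSURANCE_NULLS = {
    "gross_income", "ebitda", "cash_and_short_term_investments",
    "total_non_current_assets", "total_current_liabilities",
}


def apply_industry_nulls(record: dict, industry: str) -> dict:
    """Return a copy of record with non-applicable metrics set to None."""
    # Hypothetical industry labels; other industries are left untouched.
    nulls = {"Banking": BANK_NULLS, "Insurance": INSURANCE_NULLS}.get(industry, set())
    out = dict(record)
    for column in nulls:
        out[column] = None
    return out
```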

Data Sources for idx_company_profile

IDX

Company profile data are obtained from the official IDX website (e.g. https://www.idx.co.id/en/listed-companies/company-profiles/BBCA). Selenium is used to scrape the following information:

"company_name",
"symbol",
"address",
"email",
"phone",
"fax",
"NPWP",
"website",
"listing_date",
"listing_board",
"sub_sector_id",
"industry",
"sub_industry",
"register",
"shareholders",
"directors",
"commissioners",
"audit_committees",
"delisting_date"

The following columns are cleaned so the format and values are more standardized.

"shareholders",
"directors",
"commissioners",
"audit_committees"

Other columns:

  • nologo: Manual input
  • wsj_format: WSJ
  • current_source: YF API and WSJ
  • yf_currency: YF API
  • morningstar_code: MS website
  • ipo_price: IDX E-Ipo

Check https://github.com/supertypeai/sectors-kb/wiki/ETL-Pipeline-Process for more information about wsj_format and current_source

Data Sources for idx_daily_data

Yahoo Finance API

Example usage:

import yfinance as yf
ticker = yf.Ticker("ADRO.JK")

# returns the current market cap (None if unavailable)
ticker.info.get("marketCap", None)

# returns a DataFrame containing the daily close price and volume starting from last_date.
# auto_adjust is set to False because we don't want the price to be adjusted by dividend.
ticker.history(start=last_date, auto_adjust=False)[["Close", "Volume"]]

If the current market cap is not available from YF API, we will try to obtain it by scraping the YF website, e.g. https://finance.yahoo.com/quote/BBCA.JK/key-statistics?p=BBCA.JK. If the current market cap is still not available, we will manually derive it using the most recently available market cap and close price (assuming the share number is constant). Occasionally we might also have a null market cap for the past date. In that case, we will manually derive it using the current market cap and close price (if available) or resorting to the most recently available market cap and close price.
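The final fallback (deriving a missing market cap from a reference market cap and close price, assuming a constant number of shares) can be sketched as follows. The scraping step is not shown. fill_market_caps is a hypothetical name, and passing closes and market caps as date-aligned lists, oldest first with the current day last, is an assumption.

```python
from typing import List, Optional


def fill_market_caps(closes: List[Optional[float]],
                     market_caps: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing market caps, assuming the number of shares is constant.

    Both lists are aligned by date in ascending order; the last entry is the
    current day (assumed layout).
    """
    # Reference point: the latest day with both a close and a market cap.
    # If the current day has both, it is the reference; otherwise this is the
    # most recently available pair.
    reference = next(
        ((c, m) for c, m in zip(reversed(closes), reversed(market_caps))
         if c and m is not None),
        None,
    )
    if reference is None:
        return list(market_caps)
    ref_close, ref_cap = reference
    # implied shares = ref_cap / ref_close, so market cap scales with the close.
    return [
        m if m is not None or not c else ref_cap * c / ref_close
        for c, m in zip(closes, market_caps)
    ]
```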

Data Sources for idx_key_stats

Yahoo Finance API

Example usage:

import yfinance as yf
ticker = yf.Ticker("ADRO.JK")

# return a dictionary
ticker.info

# return a dataframe containing % share held by insiders and institutions
ticker.major_holders

The following columns are obtained from ticker.info: forward_eps, recommendation_mean, and employee_num. Data from ticker.major_holders are processed into a dictionary and stored in holders_breakdown column.

Scripted Input

Point Summaries

point_summaries is updated by a script and is calculated from the ticker's metrics across several sections.

Intrinsic Value through Discounted Cash Flow Analysis

Inflation data is obtained from https://www.inflationtool.com/indonesian-rupiah?amount=100&year1=2019&year2={year2}&frequency=yearly
Financial data is obtained from:

  • idx_company_report_mv: historical_valuation, historical_financials, market_cap
  • idx_financials_annual: basic_eps, share_issued
  • idx_sector_reports_calc: historical_valuation

Data Processing:

  • Calculate the Average Cyclically Adjusted Earning (future value): future_value_i = net_income_list[i] × (1 + avg_inflation_rate)**(total_elements − i − 1). In simpler terms, an inflation factor is applied to each element of net_income_list according to its position: the lower the index i, the more times the factor is applied, and the last element is left unchanged. The result is a list of future values, one for each element of net_income_list.
  • Calculate sub_sector_roe = sub_sector_pb_ttm / sub_sector_pe_ttm, sub_sector_npm = sub_sector_ps_ttm / sub_sector_pe_ttm, ticker_roe = ticker_pb_ttm / ticker_pe_ttm, and ticker_npm = ticker_ps_ttm / ticker_pe_ttm
  • Calculate ticker_der: for the banking sub-sector (19), der = ticker_total_liabilities / ticker_total_equity; otherwise, der = ticker_total_debt / ticker_total_equity
  • Calculate ticker_profit_margin_stability:
    1. Operating Profit Margin (OPM) = Net Income / Total Revenue
    2. Calculate the average and the standard deviation of OPM
    3. Profit Margin Stability = Average OPM / Standard Deviation of OPM
  • Calculate ticker_earning_predictability: the correlation between years and net_income
  • Calculate discount_rate:
    1. Beta Calculation: Define a list of beta values [0.33, 0.67, 1, 1.5, 2]. Then create a dictionary (data) containing financial metrics and their corresponding thresholds. Thresholds are specified for Return on Equity (ROE), Debt-to-Equity Ratio (DER), Net Profit Margin (NPM), Profit Margin Stability, Earning Predictability, and Market Capitalization. Loop through the metrics in the dictionary:
      a. If the metric is 'roe', 'npm', 'earning_predictability', or 'market_cap': if the metric value is greater than the first threshold, assign the lowest beta (0.33). Otherwise, iterate through the remaining thresholds and interpolate the beta based on the metric's position between the thresholds.
      b. If the metric is 'der' or any other metric not listed in (a): if the metric value is less than the first threshold, assign the lowest beta (0.33). Otherwise, iterate through the remaining thresholds and interpolate the beta based on the metric's position between the thresholds.
    2. Calculate Median Beta: the median of the betas assigned to the metrics in step 1
    3. Calculate the discount rate: Discount Rate = 0.07 + (0.05 × Median Beta)
  • Calculate cae_per_share = avg_cae / share_issued
  • Calculate avg_eps: the average of cae_per_share and diluted_eps
  • Calculate intrinsic_value:
    1. Project a value for each of the next 100 years (i = 0, 1, …, 99). For the first 10 years (i < 10): value_i = avg_eps × (1 + growth_rate_10y)**i. After the first 10 years (i ≥ 10): value_i = values[9] × (1 + growth_rate_after_10y)**(i − 10)
    2. Convert each value to present value: present_value_i = value_i / (1 + discount_rate)**(i + 1)
    3. The intrinsic value is the sum of all present_value_i
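The core DCF steps can be sketched as plain functions. This is a minimal sketch, not the script itself: all function names are hypothetical, the threshold-based beta interpolation is omitted because the thresholds are not given here (discount_rate takes the per-metric betas as input), the sample standard deviation is an assumption, and the growth rates are simply parameters.

```python
import math
import statistics
from typing import List, Optional


def cae_future_values(net_income_list: List[float], avg_inflation_rate: float) -> List[float]:
    # Earlier elements (lower i) are compounded more times; the last is unchanged.
    n = len(net_income_list)
    return [v * (1 + avg_inflation_rate) ** (n - i - 1) for i, v in enumerate(net_income_list)]


def profit_margin_stability(net_incomes: List[float], revenues: List[float]) -> Optional[float]:
    margins = [ni / rev for ni, rev in zip(net_incomes, revenues)]
    sd = statistics.stdev(margins)  # sample std dev (assumption)
    return None if sd == 0 else statistics.mean(margins) / sd


def earning_predictability(years: List[int], net_incomes: List[float]) -> float:
    # Pearson correlation between years and net income.
    mx, my = statistics.mean(years), statistics.mean(net_incomes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, net_incomes))
    sxx = sum((x - mx) ** 2 for x in years)
    syy = sum((y - my) ** 2 for y in net_incomes)
    return sxy / math.sqrt(sxx * syy)


def discount_rate(metric_betas: List[float]) -> float:
    # metric_betas: one beta per metric from the threshold step (not shown).
    return 0.07 + 0.05 * statistics.median(metric_betas)


def intrinsic_value(avg_eps: float, growth_rate_10y: float,
                    growth_rate_after_10y: float, rate: float, years: int = 100) -> float:
    values: List[float] = []
    for i in range(years):
        if i < 10:
            values.append(avg_eps * (1 + growth_rate_10y) ** i)
        else:
            values.append(values[9] * (1 + growth_rate_after_10y) ** (i - 10))
    # Discount year i by (1 + rate) ** (i + 1) and sum the present values.
    return sum(v / (1 + rate) ** (i + 1) for i, v in enumerate(values))
```

As written in the steps above, year i = 10 repeats values[9] (the exponent is 0), and growth resumes from i = 11.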

TradingView

Stock ratings are gathered from TradingView. Two types of rating are collected: the technical rating and the analyst rating.

Technical Rating

The technical rating is broken down into three periods: daily, weekly, and monthly. Each technical rating is collected at a frequency matching its period. The data are structured as follows:

{
  "summary": {
    "buy": 7,
    "sell": 9,
    "neutral": 10,
    "updated_on": "2024-07-21 08:15:02"
  },
  "oscillator": {
    "buy": 1,
    "data": [
      {
        "name": "Relative Strength Index (14)",
        "value": 51,
        "action": "Neutral"
      },
      ...
    ],
    "sell": 1,
    "neutral": 9,
    "updated_on": "2024-07-21 08:15:02"
  },
  "moving_average": {
    "buy": 6,
    "data": [
      {
        "name": "Exponential Moving Average (10)",
        "value": 2316,
        "action": "Sell"
      },
      ...
    ],
    "sell": 8,
    "neutral": 1,
    "updated_on": "2024-07-21 08:15:02"
  }
}

Analyst Rating

Analyst rating is collected once every three months. The structure of the data is displayed as below:

{
  "buy": 2,
  "hold": 3,
  "sell": 0,
  "n_analyst": 10,
  "strong_buy": 4,
  "updated_on": "2024-07-10 00:35:21",
  "strong_sell": 1
}

Data Sources for idx_dividend

SahamIDX

Data Processing

Dividend records are retrieved for a specified date range. Besides the dividend value, we also store the dividend yield in idx_dividend. The dividend yield is calculated by dividing the dividend value for each date by the mean closing price of the stock for the corresponding year. The yield is computed in Python and saved in the table only for past years; the dividend yield for the current year is computed dynamically at the view level.
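The yield calculation can be sketched as below: group closes by year, take the mean, and divide each dividend by its year's mean close, leaving the current year as None since it is computed at the view level. dividend_yields and the (date, value) tuple inputs are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Tuple


def dividend_yields(
    dividends: List[Tuple[date, float]],
    closes: List[Tuple[date, float]],
    current_year: int,
) -> List[Tuple[date, float, Optional[float]]]:
    """Yield = dividend / mean close of the dividend's year.

    Yields for current_year are left as None, since those are computed
    at the view level instead of being stored.
    """
    by_year: Dict[int, List[float]] = defaultdict(list)
    for day, close in closes:
        by_year[day.year].append(close)
    mean_close = {year: mean(values) for year, values in by_year.items()}

    rows = []
    for day, amount in dividends:
        avg = mean_close.get(day.year)
        stored = day.year < current_year and avg
        rows.append((day, amount, amount / avg if stored else None))
    return rows
```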

Data Sources for idx_stock_split

SahamIDX: Stock Split & Reverse Stock Split

Data Processing

  1. Retrieve (reverse) stock split records from the database for future dates
  2. Retrieve (reverse) stock split records from SahamIDX for future dates
  3. Compare data from step 1 and 2, and perform necessary update and delete, so the records in the database match the ones in the SahamIDX

Note: a record may be deleted from the database when its stock split date is modified on the SahamIDX website.
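Step 3 amounts to a diff between two keyed collections. A minimal sketch, assuming records are keyed by (symbol, split date) with the split ratio as the value; diff_splits and that layout are hypothetical. A changed date on SahamIDX shows up as one upsert (new key) plus one delete (old key).

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (symbol, split date as ISO string); assumed layout


def diff_splits(
    db: Dict[Key, float], remote: Dict[Key, float]
) -> Tuple[Dict[Key, float], List[Key]]:
    """Compare future (reverse) stock splits in the DB with SahamIDX.

    Returns (records to upsert, keys to delete) so the DB matches SahamIDX.
    """
    # New records, or existing ones whose ratio changed on SahamIDX.
    to_upsert = {key: ratio for key, ratio in remote.items() if db.get(key) != ratio}
    # Records no longer on SahamIDX, e.g. because the split date was modified.
    to_delete = sorted(key for key in db if key not in remote)
    return to_upsert, to_delete
```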

Data Sources for idx_institution_transactions

Morningstar through RapidAPI (MS Finance and Morning Star)

API endpoint:

Example request:

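import requests

# Placeholders: supply real values. ms_code is the Morningstar performanceId of the symbol.
ms_code = "<performanceId>"
rapid_api_key = "<RapidAPI key>"
ownership = "Buyers"  # either "Buyers" or "Sellers"

url = "https://morning-star.p.rapidapi.com/stock/v2/get-ownership"

querystring = {
    "performanceId": ms_code,
    "ownership": ownership,
    "asset": "institution",
}

headers = {
    "X-RapidAPI-Key": rapid_api_key,
    "X-RapidAPI-Host": "morning-star.p.rapidapi.com",
}

# timeout and raise_for_status() make failed calls surface as errors
response = requests.get(url, headers=headers, params=querystring, timeout=30)
response.raise_for_status()

print(response.json())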

Data Processing

The API will return the top 20 institutional buyers and sellers for each symbol in the previous month, and we store the institution name and changeAmount. We calculate the net_transaction by summing over the changeAmount of the top buyers and sellers.
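The net_transaction step can be sketched as follows, assuming the buyer and seller rows have already been extracted from the two API calls. The "name" key for the institution name is hypothetical (the response field is not documented here), and the sketch assumes sellers' changeAmount values are negative so the sum nets out.

```python
from typing import Dict, List


def summarize_institutions(buyers: List[Dict], sellers: List[Dict]) -> Dict:
    """Keep institution name and changeAmount; net_transaction is their sum."""
    rows = [
        # "name" is a hypothetical key for the institution name.
        {"name": row["name"], "changeAmount": row["changeAmount"]}
        for row in buyers + sellers
    ]
    # Assumes sellers' changeAmount values are negative.
    return {"rows": rows, "net_transaction": sum(r["changeAmount"] for r in rows)}
```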

Data Sources for idx_manual_input

Income statement data are manually extracted from PDF reports. The USD-IDR conversion rate is taken from the last working day of the financial year from here.

Data Processing

Data will be processed from an Excel file with a predetermined format to the following structure:

sankey_component:

{
  "links": [
    {
      "value": 6512275000000,
      "source": "Loan",
      "target": "Interest Income"
    },
    // ... (more links)
  ],
  "nodes": [
    {
      "id": "Loan",
      "nodeColor": "hsl(195, 53%, 79%)"
    },
    // ... (more nodes)
  ]
}

income_stmt_metrics:

{
  "provision": 147840000000,
  "net_income": 4052678000000,
  "income_taxes": 975392000000,
  // ... (more income statement metrics)
  "int_income_breakdown": [
    {
      "amount": 6512275000000,
      "category": "Loan"
    },
    // ... (more breakdown items)
  ],
  "operating_expense_breakdown": [
    {
      "amount": 11749000000,
      "category": "Fees and commissions"
    },
    // ... (more breakdown items)
  ]
}
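In the two examples, the "Loan" link value (6512275000000) equals the "Loan" entry of int_income_breakdown, which suggests the sankey links mirror the breakdown. Under that assumption, the interest-income part of sankey_component can be sketched as below. The function name and the single shared node color are hypothetical; the real structure assigns a color per node.

```python
from typing import Dict, List


def interest_income_sankey(breakdown: List[Dict],
                           node_color: str = "hsl(195, 53%, 79%)") -> Dict:
    """Turn int_income_breakdown items into sankey links and nodes (assumed mapping)."""
    target = "Interest Income"
    links = [
        {"value": item["amount"], "source": item["category"], "target": target}
        for item in breakdown
    ]
    # One node per distinct category plus the shared target node.
    node_ids = list(dict.fromkeys([item["category"] for item in breakdown] + [target]))
    nodes = [{"id": node_id, "nodeColor": node_color} for node_id in node_ids]
    return {"links": links, "nodes": nodes}
```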

Data Sources for idx_esg_score

ESG score data are obtained from the official IDX website (e.g. https://idx.co.id/secondary/get/esg/detail/BBCA?language=en-us). Selenium is used to scrape the following information:

'symbol'
'last_esg_update_date'
'esg_score'
'controversy_risk'
'environment_risk_score'
'social_risk_score'
'governance_risk_score'

Data Sources for idx_historical_mcap

Because historical market cap data are not available from the YF API, we currently retrieve them from IDX Digital Statistic. Going forward, we will populate the idx_historical_mcap table based on the data stored in idx_daily_data.

Data Sources for idx_company_forecast

Data (company_forecast) are obtained from the Forecast section of the TradingView website (e.g. https://www.tradingview.com/symbols/IDX-TPIA/forecast/).

Backup: Yahoo Finance official website (e.g. https://finance.yahoo.com/quote/BBCA/analysis?p=BBCA).

Scrape the following information: revenue forecast amount and EPS forecast amount.

Data Processing

  • Process numeric values: convert the T, B, M, and K suffixes to numbers (e.g. multiply by 1e12 for T), then save the value as an integer
  • Change datatypes: eps_estimate (float32), revenue_estimate (float64)
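The suffix conversion can be sketched as below. parse_abbreviated is a hypothetical name; it covers the T/B/M/K suffixes listed above and returns an integer, which suits revenue-style amounts. For EPS values that need decimals, using float instead of int would be the natural variant (assumption).

```python
from decimal import Decimal

MULTIPLIERS = {"T": 10**12, "B": 10**9, "M": 10**6, "K": 10**3}


def parse_abbreviated(text: str) -> int:
    """Convert strings such as "1.23T" or "850.5B" into integers."""
    s = text.strip().replace(",", "")
    multiplier = MULTIPLIERS.get(s[-1].upper(), 1)
    if multiplier != 1:
        s = s[:-1]
    # Decimal avoids binary floating-point rounding before int() truncates.
    return int(Decimal(s) * multiplier)
```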

Final Columns

  • symbol: IDX company symbol ending with .JK
  • year: the year the estimate applies to
  • revenue_estimate: estimated revenue based on yahoo finance analysis tab
  • eps_estimate: estimated earnings per share based on yahoo finance analysis tab

Data Sources for idx_ipo_perf

Price performances (7d, 30d, 90d, and 365d) are calculated from YF API price data, measured from the first trading date. The first trading date is considered valid if it is at most 1 day after the listing date.
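A minimal sketch of the validity check and one performance window. Treating the windows as calendar days, using the last close on or before the target date, and rejecting first trading dates before the listing date are all assumptions; the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def is_valid_first_trading_date(listing_date: date, first_trade: date) -> bool:
    # Valid if trading started on the listing date or at most 1 day after it.
    return timedelta(0) <= first_trade - listing_date <= timedelta(days=1)


def performance_after(closes: Dict[date, float], first_trade: date, days: int) -> Optional[float]:
    """Price change `days` calendar days after the first trading date (assumption).

    Uses the last close on or before the target date; None if history is too short.
    """
    target = first_trade + timedelta(days=days)
    if target > max(closes):
        return None
    last_day = max(d for d in closes if first_trade <= d <= target)
    return closes[last_day] / closes[first_trade] - 1
```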

Data Sources for sgx_short_sell

All of the data (except the symbol column) is sourced from the SGX website using this URL template: f'https://api2.sgx.com/sites/default/files/reports/short-sell/{today.year}/{month}/website_DailyShortSell{date}1815.txt'.

  • symbol: From the sgx_companies table. The SGX short sell data are left-joined with sgx_companies on the name column; because company names are not identical across the two sources, the join uses name similarity computed with the fuzzywuzzy package.
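The similarity-based join can be sketched per row. To keep the example dependency-free it swaps fuzzywuzzy for the standard library's difflib.SequenceMatcher, which is a different scorer; the 0.8 threshold, the name normalization, and the symbol-to-name dictionary shape are hypothetical choices. Returning None mirrors a left join that leaves symbol empty.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional


def _normalize(name: str) -> str:
    # Lower-case, drop dots and collapse whitespace before comparing.
    return " ".join(name.lower().replace(".", " ").split())


def best_symbol(name: str, companies: Dict[str, str], threshold: float = 0.8) -> Optional[str]:
    """Return the symbol whose company name is most similar to `name`.

    companies maps symbol -> company name (e.g. rows of sgx_companies).
    Returns None when nothing clears the threshold.
    """
    best, best_score = None, 0.0
    target = _normalize(name)
    for symbol, company_name in companies.items():
        score = SequenceMatcher(None, target, _normalize(company_name)).ratio()
        if score > best_score:
            best, best_score = symbol, score
    return best if best_score >= threshold else None
```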

Data Sources for index_daily_data

Daily index price data since January 2019. All index prices come from Yahoo Finance.

  • All prices are in IDR, except:
    • STI (Straits Times Index, SGX): SGD
    • KLSE (Bursa Malaysia): MYR

Guidebook for idx_manual_input

This document explains the terms and logic behind the data points used to compute the final figures in the database. It is structured to show how each metric is derived and which rules apply, to ensure consistency and accuracy in data entry.

Each section specifies the field details, formatting rules, and any specific conditions that must be met.

Steps to follow:

  1. Source official financial statements, annual reports, and other relevant documents.
  2. Extract the required data points as per the guidelines.
  3. Leave a field empty ("NULL") if it is not reported. If the reported value is 0, enter "0". Fields to note (where a fallback is given, compute it instead of leaving the field NULL):
    • ebitda (when NULL, compute from ebit + depreciation_amortization)
    • capital_expenditure (when NULL, compute from current_pp&e - previous_year_pp&e + current_depreciation_expenses)
    • earning_asset (when NULL, sum from Class: Earning Asset)
    • realized_capital_goods_investment
    • high_quality_liquid_asset
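The fallbacks above can be sketched as a function that only fills fields that are NULL (None) and never overwrites a reported 0. The keys ppe and depreciation_expenses, and the shape of non_loan_asset items ({"class": ..., "amount": ...}), are hypothetical.

```python
from typing import Dict


def fill_fallbacks(current: Dict, previous: Dict) -> Dict:
    """Apply the documented fallbacks only where a field is NULL (None).

    A reported 0 is kept as 0; it is not treated as missing.
    """
    out = dict(current)

    def missing(*keys: str) -> bool:
        return any(out.get(k) is None for k in keys)

    if out.get("ebitda") is None and not missing("ebit", "depreciation_amortization"):
        out["ebitda"] = out["ebit"] + out["depreciation_amortization"]

    if (out.get("capital_expenditure") is None
            and not missing("ppe", "depreciation_expenses")
            and previous.get("ppe") is not None):
        # current PP&E - previous year's PP&E + current depreciation expense
        out["capital_expenditure"] = out["ppe"] - previous["ppe"] + out["depreciation_expenses"]

    if out.get("earning_asset") is None and out.get("non_loan_asset") is not None:
        # Sum the items classed as earning assets (hypothetical item shape).
        out["earning_asset"] = sum(
            item["amount"] for item in out["non_loan_asset"] if item["class"] == "Earning Asset"
        )
    return out
```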

Banks

Industry-focused Metrics

In the context of banks, the financial metrics differ from those of non-banking companies, notably in the treatment of interest income and expenses. Because banks generate their income from interest, EBIT and EBITDA do not apply.

A few metrics that are only applicable to banks are:

  • LitA: Loan Income to Assets ratio

interest_income (Class: Loan & Deposit) / total_asset

  • B-ER: Bank Efficiency Ratio (also known as Cost to Income ratio)

operating_expense / total_revenue

  • Operating Inc.

pretax_income / total_revenue

  • NIM: Net Interest Margin ratio

net_interest_income / earning_asset

  • CASA: Current Account and Savings Account ratio

(current_account + savings_account) / total_deposit

  • NPL ratio: Non-Performing Loan ratio

loan_at_risk (Non-performing loan) / gross_loan

  • LAR: Loan At Risk ratio

loan_at_risk / gross_loan

  • CAR: Capital Adequacy Ratio

(core_capital_tier1 + supplementary_capital_tier2) / total_risk_weighted_asset

  • NPL: Non-Performing Loan ratio

non_performing_loan / gross_loan

  • LDR: Loan to Deposit ratio

gross_loan / total_deposit
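The ratio formulas above can be collected into one function that returns None instead of raising when an input is NULL or a denominator is zero. The function names are hypothetical, and loan_deposit_interest_income is a hypothetical key for the interest income of the Loans & Deposits class used by LitA.

```python
from typing import Dict, Optional


def ratio(numerator: Optional[float], denominator: Optional[float]) -> Optional[float]:
    # NULL inputs or a zero denominator give NULL rather than raising.
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def total(*values: Optional[float]) -> Optional[float]:
    return None if any(v is None for v in values) else sum(values)


def bank_ratios(m: Dict[str, Optional[float]]) -> Dict[str, Optional[float]]:
    g = m.get
    return {
        "lita": ratio(g("loan_deposit_interest_income"), g("total_asset")),
        "b_er": ratio(g("operating_expense"), g("total_revenue")),
        "operating_inc": ratio(g("pretax_income"), g("total_revenue")),
        "nim": ratio(g("net_interest_income"), g("earning_asset")),
        "casa": ratio(total(g("current_account"), g("savings_account")), g("total_deposit")),
        "npl": ratio(g("non_performing_loan"), g("gross_loan")),
        "lar": ratio(g("loan_at_risk"), g("gross_loan")),
        "car": ratio(total(g("core_capital_tier1"), g("supplementary_capital_tier2")),
                     g("total_risk_weighted_asset")),
        "ldr": ratio(g("gross_loan"), g("total_deposit")),
    }
```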

income_stmt_metrics

  • interest_income: As declared in financial statement (Revenue breakdown total must tally with interest_income)

    • Class: Used for categorization in the database so the breakdown can be comparable across all banks.
      • Loans & Deposits
      • Financial Receivable: Export Bills, Acceptances, etc.
      • Securities: Government Bonds, Certificates, etc.
      • Marketable Securities: Securities sold under agreements to repurchase, repo
      • Interbank Placement: Placement with Bank Indonesia and other banks
      • Current Accounts: Current accounts with Bank Indonesia and other banks
      • Sharia Income: Murabahah margin income, Mudharabah and Musyarakah profit sharing income, Qardh ujrah income, Ijarah leasing income, etc.
      • Other Income: Fees and commissions
    • Category: As declared in financial statement, e.g., "Loans", "Placement with Bank Indonesia and other banks", "Government Bonds", etc.
  • interest_expense: As declared in financial statement

  • net_interest_income = interest_income - interest_expense

  • premium_income: As declared in financial statement, only applicable for insurance companies

  • premium_expense: As declared in financial statement, only applicable for insurance companies

  • net_premium_income = premium_income - premium_expense

  • non_interest_income: As declared in financial statement

  • total_revenue = net_interest_income + net_premium_income + non_interest_income

  • operating_expense: only negative figures (expenses); any positive figure should instead be added to "non_operating_income_or_loss", where one-off items are recorded.

    • Class: Used for categorization in the database so the breakdown can be comparable.
      • Sales & Marketing: Commission, Advertising, Promotion, etc.
      • General & Admin: Rent, Utilities, Depreciation, Amortization, etc.
      • R&D: Exploration costs, Research and Development expenses, etc.
      • Salaries & Benefits: Salaries, Wages, Bonuses, etc.
      • Other expenses: Other expenses not categorized above.
    • Category: As declared in financial statement, e.g., "Personnel", "Commission", "Depreciation", "Premium of government guarantee", etc.
  • provision = As declared in financial statement; reversal of impairment losses + (allowance for impairment losses)

  • net_operating_income = total_revenue - operating_expense - provision

  • non_operating_income_or_loss: As declared in financial statement + any one-off items from operating_expense

  • pretax_income = net_operating_income + non_operating_income_or_loss

  • income_taxes: As declared in financial statement (+zakat if applicable)

  • minorities: as declared in financial statement (also known as non-controlling interests)

  • net_income = net_operating_income + non_operating_income_or_loss - income_taxes - minorities

  • diluted_shares_outstanding: keyword search "weighted average", take the number under earnings per share note in financial statement

Note: Enter "0" if the reported figure is zero; leave the field blank ("NULL") only if the figure is not available.
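The computed income_stmt_metrics can be chained as below. This is a sketch under stated assumptions: expenses, provision, taxes, and minorities are positive magnitudes; non_operating_income_or_loss is signed (a loss is negative) and added, consistent with the pretax_income definition in the column list at the top (Operating Income + Non-Operating Income); and premium fields are absent for non-insurers and count as 0. The function name is hypothetical.

```python
from typing import Dict


def derive_bank_income(m: Dict[str, float]) -> Dict[str, float]:
    """Derive the computed income_stmt_metrics from the declared figures.

    Assumptions: expenses, provision, taxes and minorities are positive
    magnitudes; non_operating_income_or_loss is signed (a loss is negative);
    premium fields are absent (None) for non-insurers and count as 0.
    """
    net_interest_income = m["interest_income"] - m["interest_expense"]
    net_premium_income = (m.get("premium_income") or 0.0) - (m.get("premium_expense") or 0.0)
    total_revenue = net_interest_income + net_premium_income + m["non_interest_income"]
    net_operating_income = total_revenue - m["operating_expense"] - m["provision"]
    pretax_income = net_operating_income + m["non_operating_income_or_loss"]
    net_income = pretax_income - m["income_taxes"] - (m.get("minorities") or 0.0)
    return {
        "net_interest_income": net_interest_income,
        "net_premium_income": net_premium_income,
        "total_revenue": total_revenue,
        "net_operating_income": net_operating_income,
        "pretax_income": pretax_income,
        "net_income": net_income,
    }
```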

Interest Income for Syariah banks is categorized as follows:

  • Murabahah margin income: Cost-plus financing (loan with a financing cost, instead of an interest)
  • Mudharabah: Profit-sharing investment product
  • Musyarakah: Partnership, Joint Venture
  • Qardh ujrah income: Qardh (benevolent loan) with a fee
  • Ijarah leasing income: Income from leasing assets
  • Istishna: Manufacturing contract financing

balance_sheet_metrics

  • gross_loan: As declared in financial statement

  • allowance_for_loans: As declared in financial statement; if negative, it is a reversal

  • net_loan = gross_loan - allowance_for_loans

  • non_loan_asset

    • Class: Earning Asset: Assets that generate income.
      • Cash and CA = Cash + Current accounts

      • Interbank Placement = Placement with Bank Indonesia and other banks

      • Securities = Government Bonds + Certificates, etc.

      • Marketable Securities = Securities sold under agreements to repurchase

      • Other Receivables = Export Bills + Interest Receivable + Finance Receivable + Acceptance Receivable, etc.

      • Investment in Shares/Associates/Joint Ventures
    • Class: Non-Earning Asset: Assets that do not generate income.
      • Derivative Receivables: Used for hedging purposes.
      • Fixed assets and right-of-use assets = Fixed assets + Right-of-use assets

      • Intangible assets
      • Deferred tax assets
      • Foreclosed assets
      • Other assets
  • total_asset = net_loan + non_loan_asset

  • earning_asset = Cash and CA + Interbank Placement + Securities + Marketable Securities + Other Receivables

  • current_account: Current accounts with Bank Indonesia and other banks

  • savings_account: Savings accounts with Bank Indonesia and other banks

  • time_deposit: Time deposits with Bank Indonesia and other banks

  • total_deposit = current_account + savings_account + time_deposit (Used for calculation for CASA ratio)

  • other_interest_bearing_liabilities: Interest-bearing liabilities that are not deposits, such as:

    • Deposits from other banks (because it is not included in CASA ratio calculation)
    • Securities sold under agreements to repurchase, Debt securities
    • Subordinated Loans
    • Borrowings
    • Interest Payable
    • Acceptance Payables
    • Liabilities to unit-link policyholders
    • Temporary Syirkah Funds (because it is not included in CASA ratio calculation)
  • non_interest_bearing_liabilities: Liabilities that do not bear interest, such as:

    • Employee Benefits
    • Liabilities due immediately
    • Tax Payables
  • total_liabilities = total_deposit + other_interest_bearing_liabilities + non_interest_bearing_liabilities

  • total_equity = total_asset - total_liabilities

  • core_capital_tier1: As declared in financial statement

  • supplementary_capital_tier2: As declared in financial statement

  • credit_rwa: Risk-weighted assets for credit risk, as declared in financial statement

  • market_rwa: Risk-weighted assets for market risk, as declared in financial statement

  • operational_rwa: Risk-weighted assets for operational risk, as declared in financial statement

  • total_risk_weighted_asset = credit_rwa + market_rwa + operational_rwa

Note: market_rwa is sometimes "0".
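The balance sheet identities above can be computed and cross-checked against the statement totals. A minimal sketch: the function names are hypothetical, non_loan_asset is assumed to be already summed to a single number, and the 1.0 tolerance for rounding is arbitrary. A negative allowance_for_loans (a reversal) increases net_loan automatically.

```python
from typing import Dict


def derive_bank_balance_sheet(m: Dict[str, float]) -> Dict[str, float]:
    net_loan = m["gross_loan"] - m["allowance_for_loans"]
    total_asset = net_loan + m["non_loan_asset"]  # non_loan_asset: pre-summed total
    total_deposit = m["current_account"] + m["savings_account"] + m["time_deposit"]
    total_liabilities = (total_deposit + m["other_interest_bearing_liabilities"]
                         + m["non_interest_bearing_liabilities"])
    return {
        "net_loan": net_loan,
        "total_asset": total_asset,
        "total_deposit": total_deposit,
        "total_liabilities": total_liabilities,
        "total_equity": total_asset - total_liabilities,
        # market_rwa may legitimately be 0
        "total_risk_weighted_asset": m["credit_rwa"] + m["market_rwa"] + m["operational_rwa"],
    }


def tallies(derived: float, reported: float, tolerance: float = 1.0) -> bool:
    """Check a derived total against the statement figure (rounding slack)."""
    return abs(derived - reported) <= tolerance
```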

cash_flow_metrics

  • cash_inflow
  • cash_outflow
  • net_cash_flow
  • free_cash_flow
  • financing_cash_flow: As declared in financial statement
  • investing_cash_flow: As declared in financial statement
  • operating_cash_flow: As declared in financial statement
  • end_cash_position = operating_cash_flow + investing_cash_flow + financing_cash_flow

  • high_quality_liquid_asset: As declared in annual report. If not available, leave blank "NULL".
  • realized_capital_goods_investment: As declared in annual report. Equivalent to capital expenditure (CapEx) in the banking industry. If not available, leave blank ("NULL"); if there is no capital goods investment, enter "0". Keywords to search for: "capital goods", "capital investment", "realized capital"

employee_breakdown

This contains the total number of employees, broken down into:

  • permanent_employee: permanent employees
  • contract_employee: contract employees
  • others_employee: other employees (e.g., outsourced, temporary, etc.)
  • total_employee = permanent_employee + contract_employee + others_employee

In order to compute NIPE (Net Income Per Employee), the following formula is used:

net_income / permanent_employee

industry_breakdown

This contains data structured to provide insights into the financial health and risk of the bank, including:

  • special mention loans, non-performing loans and restructured loans for Loan-At-Risk metrics
  • non-loan assets categorized into earning and non-earning assets
  • loan distribution and exposure across various economic sectors

In order to compute Loan-At-Risk (LAR) ratio, the following formula is used:

loan_at_risk / gross_loan

loan_at_risk = Special Mention Loan + Non-performing Loan + Restructured Loan

  • Special Mention Loan: figure declared under special mention category by OJK collectibility (before allowance for impairment losses) in financial statement.
    • Normally declared in a table with collectibility categories: current, special mention, sub-standard, doubtful, and loss under Loan Receivable section or Additional Information section.
  • Non-performing Loan (NPL): sum of sub-standard, doubtful and loss category by OJK collectibility (before allowance for impairment losses) in financial statement.
    • The same table where Special Mention Loan is declared.
  • Restructured Loan (current): figure declared under current category by OJK collectibility (before allowance for impairment losses) in financial statement.
    • A different table from where Special Mention Loan and NPL are declared. But it shares the same format of collectibility categories. It is under Restructured Loan section or Additional Information section.
    • Sometimes declared in a sentence instead of a tabular form. Sentence structured as "total restructured loan in year X is IDRXXX"
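Given the collectibility table and the restructured-loan (current) figure, NPL and LAR follow directly from the definitions above. The function name and the category keys (lower-case OJK labels) are hypothetical; amounts are before allowance for impairment losses.

```python
from typing import Dict


def loan_quality(collectibility: Dict[str, float], restructured_current: float,
                 gross_loan: float) -> Dict[str, float]:
    """collectibility: OJK categories before allowance for impairment losses."""
    npl = collectibility["sub-standard"] + collectibility["doubtful"] + collectibility["loss"]
    loan_at_risk = collectibility["special mention"] + npl + restructured_current
    return {
        "non_performing_loan": npl,
        "loan_at_risk": loan_at_risk,
        "npl_ratio": npl / gross_loan,
        "lar_ratio": loan_at_risk / gross_loan,
    }
```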

Earning Asset: Assets that generate income.

earning_asset = Cash and CA + Interbank Placement + Securities + Marketable Securities + Other Receivables + Investment in Shares + Interest Receivable

Note: Interest receivable is considered an earning asset because it is interest that has been earned but not yet received. It arises from the same income-generating activities, so it is included in earning assets.

  • Cash and CA
    • Cash
    • Current accounts with Bank Indonesia and other banks
  • Interbank Placement
    • Placement with Bank Indonesia and other banks
  • Securities
    • Government Bonds
    • Certificates
  • Marketable Securities
    • Securities sold under agreements to repurchase
    • Marketable Securities
  • Other Receivables
    • Export Bills
    • Interest Receivable
    • Finance Receivable
    • Acceptance Receivable
  • Ijarah Assets
  • Prepaid Expenses/Accrued Income
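The earning_asset formula can be sketched as a sum over named components. The dictionary keys below are hypothetical names mirroring the terms of the formula; a line item absent from a given financial statement is treated as zero (an assumption), and unknown keys raise an error so that a typo cannot silently drop a line. The sketch follows the formula literally and does not check whether Interest Receivable was already counted inside Other Receivables.

```python
# Hypothetical keys mirroring the terms of the earning_asset formula.
EARNING_ASSET_COMPONENTS = (
    "cash_and_ca",
    "interbank_placement",
    "securities",
    "marketable_securities",
    "other_receivables",
    "investment_in_shares",
    "interest_receivable",
)


def earning_asset(components: dict[str, float]) -> float:
    """Sum the earning_asset terms; a line absent from the statement counts as zero."""
    unknown = set(components) - set(EARNING_ASSET_COMPONENTS)
    if unknown:
        # Guard against typos silently dropping a line item.
        raise KeyError(f"unknown components: {sorted(unknown)}")
    return sum(components.get(name, 0.0) for name in EARNING_ASSET_COMPONENTS)


print(earning_asset({"cash_and_ca": 100.0, "interbank_placement": 50.0, "securities": 25.0}))  # 175.0
```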

Non-Earning Asset: Assets that do not generate income directly, either because they are used for hedging purposes (e.g., derivative receivables) or because they are not directly related to the bank's core operations.

  • Derivative Receivables: Used for hedging purposes.
  • Fixed assets and right-of-use assets: As declared in financial statement
  • Intangible assets: Assets that do not have physical substance, such as patents, trademarks and goodwill.
  • Deferred tax assets: Tax assets that are expected to be realized in the future.
  • Foreclosed assets: Assets that have been foreclosed by the bank due to non-payment of loans, e.g., a property taken back by the lender because the borrower defaulted on their mortgage payments.
  • Other assets: Other assets that do not fall into the above categories, such as prepaid expenses and accrued income.

loan_by_economic_sectors: Loan distribution across economic sectors, stored with the sector name as the key, structured as follows:

  • Manufacturing: Industry
  • Business services
  • Trading, restaurants and hotels: Accommodation, food and beverage, and retail trade
  • Agriculture and agricultural facilities: includes forestry and fishery
  • Construction
  • Transportation and warehousing: Transportation, warehousing, and communications
  • Social/public services: includes education, health, and social services
  • Mining
  • Electricity, gas, and water: includes waste management
  • Property: includes real estate and leasing activities
  • Others
  • Financial Intermediaries: Finance and insurance activities

Note: the sum of loan_by_economic_sectors must tally with gross_loan.
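The tally rule can be checked mechanically. This is a sketch under two assumptions: the sector keys are stored exactly as the names listed above, and a rounding allowance of one reporting unit (`abs_tol=1.0`, a hypothetical default) is acceptable.

```python
import math

# Sector keys as listed above; the exact key strings stored in the database are an assumption.
ECONOMIC_SECTORS = frozenset({
    "Manufacturing",
    "Business services",
    "Trading, restaurants and hotels",
    "Agriculture and agricultural facilities",
    "Construction",
    "Transportation and warehousing",
    "Social/public services",
    "Mining",
    "Electricity, gas, and water",
    "Property",
    "Others",
    "Financial Intermediaries",
})


def sectors_tally(loan_by_economic_sectors: dict[str, float], gross_loan: float, abs_tol: float = 1.0) -> bool:
    """Return True if the sector loans sum to gross_loan within abs_tol (rounding allowance)."""
    unknown = set(loan_by_economic_sectors) - ECONOMIC_SECTORS
    if unknown:
        raise KeyError(f"unknown sector keys: {sorted(unknown)}")
    total = sum(loan_by_economic_sectors.values())
    return math.isclose(total, gross_loan, rel_tol=0.0, abs_tol=abs_tol)


print(sectors_tally({"Manufacturing": 600.0, "Mining": 400.0}, gross_loan=1000.0))  # True
```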

Non-Banks

income_stmt_metrics

  • total_revenue: As declared in the financial statement (the revenue breakdown must sum to total_revenue)
    • Class: Used for categorization in the database so that the breakdown is comparable across companies in the same industry.
      • Product/Service Sales: Revenue from sales of products or services
      • Product Manufacturing: Revenue from manufacturing products
      • Product Rental: Revenue from renting out products
      • Logistics & Deliveries: Revenue from logistics and delivery services
      • Service and Maintenance: Revenue from service and maintenance activities
      • Rental/Lease Income: Revenue from rental or lease of assets
      • Other Income: Other income not categorized above
      • Adjustments and elimination: Adjustments made to eliminate intercompany transactions
    • Category: As declared in financial statement
      • Coal Trading
      • Property Rental
      • MIDI
  • cost_of_revenue: As declared in financial statement
  • gross_income = total_revenue - cost_of_revenue

  • operating_expense: only negative figures; any positive figures or one-off items should be recorded under "net non operating income/(expenses)"
    • Class: Used for categorization in the database so that the breakdown is comparable across companies in the same industry.
      • Sales & Marketing: Commission, Advertising, Promotion, etc.
      • General & Admin: Rent, Utilities, Depreciation, Amortization, etc.
      • R&D: Exploration costs, Research and Development expenses, etc.
      • Salaries & Benefits: Salaries, Wages, Bonuses, etc.
      • Other expenses: Other expenses not categorized above.
    • Category: As declared in financial statement
      • Personnel
      • General & Admin
      • Exploration
      • Depreciation
  • operating_income = gross_income - operating_expense

  • non_operating_income_or_loss = figure declared in financial statement + any one-off items, e.g., "Gain on sale of assets", "Foreign exchange gain/loss", etc.

  • pretax_income = operating_income + non_operating_income_or_loss
  • income_taxes: As declared in financial statement
  • minorities: As declared in financial statement (also known as non-controlling interests)
  • net_income = pretax_income - income_taxes - minorities

  • interest_expense_non_operating = interest/finance income - interest/finance expense/cost

  • ebit = operating_income + interest_expense_non_operating

  • ebitda: As declared in annual report/financial statement; if not available, compute from:

    ebitda = ebit + depreciation_amortization

  • diluted_shares_outstanding: As declared in the financial statement
    • Search for the keyword "weighted average" and take the figure from the Earnings Per Share section of the financial statement.
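The income statement derivations above chain together, so a small sketch helps show the order of computation. It implements the formulas in this section as written (including ebit = operating_income + interest_expense_non_operating). The dataclass and function names are hypothetical, and it assumes cost_of_revenue, operating_expense, and the interest/finance expense are stored as positive magnitudes, which this guide does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IncomeStatementInputs:
    total_revenue: float
    cost_of_revenue: float               # assumed stored as a positive magnitude
    operating_expense: float             # assumed stored as a positive magnitude
    non_operating_income_or_loss: float  # declared figure plus any one-off items
    income_taxes: float
    minorities: float
    interest_finance_income: float
    interest_finance_expense: float      # assumed stored as a positive magnitude
    depreciation_amortization: float
    revenue_breakdown: dict[str, float] = field(default_factory=dict)
    declared_ebitda: Optional[float] = None


def derive_income_statement(x: IncomeStatementInputs, abs_tol: float = 1.0) -> dict[str, float]:
    # The revenue breakdown (if provided) must tally with total_revenue.
    if x.revenue_breakdown and abs(sum(x.revenue_breakdown.values()) - x.total_revenue) > abs_tol:
        raise ValueError("revenue breakdown does not tally with total_revenue")
    gross_income = x.total_revenue - x.cost_of_revenue
    operating_income = gross_income - x.operating_expense
    pretax_income = operating_income + x.non_operating_income_or_loss
    net_income = pretax_income - x.income_taxes - x.minorities
    interest_expense_non_operating = x.interest_finance_income - x.interest_finance_expense
    ebit = operating_income + interest_expense_non_operating
    # Prefer the declared EBITDA; fall back to EBIT + D&A only when it is missing.
    ebitda = x.declared_ebitda if x.declared_ebitda is not None else ebit + x.depreciation_amortization
    return {
        "gross_income": gross_income,
        "operating_income": operating_income,
        "pretax_income": pretax_income,
        "net_income": net_income,
        "interest_expense_non_operating": interest_expense_non_operating,
        "ebit": ebit,
        "ebitda": ebitda,
    }


stmt = IncomeStatementInputs(
    total_revenue=1000.0, cost_of_revenue=600.0, operating_expense=150.0,
    non_operating_income_or_loss=20.0, income_taxes=60.0, minorities=10.0,
    interest_finance_income=5.0, interest_finance_expense=25.0, depreciation_amortization=40.0,
    revenue_breakdown={"Coal Trading": 700.0, "Property Rental": 300.0},
)
print(derive_income_statement(stmt)["net_income"])  # 200.0
```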

balance_sheet_metrics

  • total_current_asset: As declared in financial statement
  • total_non_current_asset: As declared in financial statement
  • total_asset = total_current_asset + total_non_current_asset

  • total_current_liabilities: As declared in financial statement
  • total_non_current_liabilities: As declared in financial statement
  • total_liabilities = total_current_liabilities + total_non_current_liabilities

  • total_equity = total_asset - total_liabilities

  • working_capital = total_current_asset - total_current_liabilities
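The balance sheet rules above derive four figures from the four declared subtotals. A minimal sketch, with a hypothetical function name:

```python
def derive_balance_sheet(
    total_current_asset: float,
    total_non_current_asset: float,
    total_current_liabilities: float,
    total_non_current_liabilities: float,
) -> dict[str, float]:
    """Derive totals, equity, and working capital from the declared subtotals."""
    total_asset = total_current_asset + total_non_current_asset
    total_liabilities = total_current_liabilities + total_non_current_liabilities
    return {
        "total_asset": total_asset,
        "total_liabilities": total_liabilities,
        "total_equity": total_asset - total_liabilities,
        "working_capital": total_current_asset - total_current_liabilities,
    }


print(derive_balance_sheet(500.0, 1500.0, 300.0, 700.0))
# {'total_asset': 2000.0, 'total_liabilities': 1000.0, 'total_equity': 1000.0, 'working_capital': 200.0}
```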

cash_flow_metrics

  • free_cash_flow
  • financing_cash_flow: As declared in financial statement
  • investing_cash_flow: As declared in financial statement
  • operating_cash_flow: As declared in financial statement
  • end_cash_position = operating_cash_flow + investing_cash_flow + financing_cash_flow

  • capital_expenditure: As declared in the financial statement; if not available, leave it blank ("NULL").
    • If capital_expenditure is "NULL", derive it as: current_pp&e - previous_year_pp&e + current_depreciation_expenses
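The capital expenditure fallback can be sketched as follows, with `None` standing in for "NULL". The function and parameter names are hypothetical. Note that this guide does not fix a sign convention: a declared capex figure may be shown as a negative cash outflow, while the PP&E-based fallback yields a positive figure for net additions, so the two may need to be aligned before storage.

```python
from typing import Optional


def capital_expenditure(
    declared: Optional[float],
    current_ppe: float,
    previous_year_ppe: float,
    current_depreciation_expenses: float,
) -> float:
    """Return declared capex if available, otherwise the PP&E-based estimate."""
    if declared is not None:
        return declared
    # Fallback: change in PP&E over the year plus depreciation added back.
    return current_ppe - previous_year_ppe + current_depreciation_expenses


print(capital_expenditure(None, current_ppe=1200.0, previous_year_ppe=1000.0, current_depreciation_expenses=150.0))  # 350.0
```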

employee_breakdown

Identical to banking companies.

  • permanent_employee: permanent employees
  • contract_employee: contract employees
  • others_employee: other employees (e.g., outsourced, temporary, etc.)
  • total_employee = permanent_employee + contract_employee + others_employee

To compute NIPE (Net Income Per Employee), the following formula is used:

net_income / permanent_employee
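A short sketch of the employee figures; the function names are hypothetical. It highlights that NIPE divides by permanent employees only, not by total_employee.

```python
def total_employee(permanent_employee: int, contract_employee: int, others_employee: int) -> int:
    return permanent_employee + contract_employee + others_employee


def nipe(net_income: float, permanent_employee: int) -> float:
    """Net Income Per Employee, using permanent employees only as the denominator."""
    if permanent_employee <= 0:
        raise ValueError("permanent_employee must be positive")
    return net_income / permanent_employee


print(total_employee(100, 30, 20))  # 150
print(nipe(5_000.0, 100))  # 50.0
```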

industry_breakdown

Guidebook for idx_company_customer

Breakdown of customers: understand it as WHO the business makes money from. Preferably a list of companies (with or without a stock ticker); otherwise, a breakdown by segment or geographical location.

  1. client_name
  • Name Formatting:
      • Pte Ltd Pte. Ltd. Private Limited
      • Sdn Bhd Sdn. Bhd. Sendirian Berhad
      • PT PLN (Persero)
  • If the client is a department, keep the name under 25 characters.
  • If the client is listed, match the company name in the database.
  2. client_ticker (XXXX.XX)

If client information is not available, use operating segment information:

  • nature of business: palm oil, wood, rental, service, etc.
  • customer type: Local/Export, etc.
  • geographical location: Java, Sumatra, Kalimantan, etc.
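The client rules above can be checked per row before insertion. This is a sketch under several assumptions: a client with a ticker is treated as listed, `db_company_names` is a hypothetical set of company names already in the database, and the ticker pattern reads "XXXX.XX" literally as four uppercase letters or digits, a dot, and two uppercase letters (e.g., "BBRI.JK"). The name-formatting conventions for suffixes such as "Pte. Ltd." are not enforced here.

```python
import re
from typing import Optional

# "XXXX.XX": four characters, a dot, two characters, e.g. "BBRI.JK" (pattern is an assumption).
TICKER_PATTERN = re.compile(r"[A-Z0-9]{4}\.[A-Z]{2}")


def validate_client(
    client_name: str,
    client_ticker: Optional[str] = None,
    is_department: bool = False,
    db_company_names: Optional[set[str]] = None,
) -> list[str]:
    """Return a list of rule violations for one idx_company_customer row (empty if OK)."""
    problems = []
    if not client_name.strip():
        problems.append("client_name is empty")
    if is_department and len(client_name) >= 25:
        problems.append("department names must be under 25 characters")
    if client_ticker is not None:
        if not TICKER_PATTERN.fullmatch(client_ticker):
            problems.append(f"client_ticker {client_ticker!r} does not match XXXX.XX")
        # Treat a ticker as "listed" and require the name to match the database.
        if db_company_names is not None and client_name not in db_company_names:
            problems.append("listed client name does not match a company name in the db")
    return problems


print(validate_client("PT PLN (Persero)"))  # []
```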