Data Sources and Processing - supertypeai/sectors-kb GitHub Wiki
This section focuses on how we obtain and process the data before storing it inside the database.
List of Columns for idx_financials_annual and idx_financials_quarterly
Metadata:
- symbol: IDX company symbol ending with JK
- date: date of the financial report
Income statement metrics:
- total_revenue: For banks, obtained by adding the Net Interest Income and Non-Interest Income
- gross_income: Total Revenue - Costs of Goods Sold; Not applicable for banks or insurance companies, since they don't have Costs of Goods Sold
- operating_income: Gross Income - Operating Expense (excluding Costs of Goods Sold); For banks, obtained by subtracting the Interest Expense, Non-Interest Expense, and Loan Loss Provision from the Total Revenue.
- pretax_income: Operating Income + Non-Operating Income
- income_taxes
- net_income: Earnings Before Tax - Income Tax - Minority Interest
- ebit: Earnings Before Tax (Pretax Income) + Non-Operating Interest Expense; Not applicable for banks
- ebitda: EBIT + Depreciation and Amortization; Not applicable for banks or insurance companies
- diluted_shares_outstanding
- interest_expense_non_operating: Not applicable for banks
Balance sheet metrics:
- cash_and_short_term_investments: For companies other than banks and insurance companies, equivalent to Cash and Equivalents
- cash_only: For insurance companies, equivalent to Cash and Equivalents
- total_cash_and_due_from_banks: For banks, equivalent to Cash and Equivalents
- total_assets
- total_non_current_assets: Also known as fixed assets, not applicable for banks or insurance companies
- total_liabilities
- total_current_liabilities: Not applicable for banks or insurance companies
- total_debt
- total_equity
- stockholders_equity: Also known as book value equity
Cash flow metrics:
- net_operating_cash_flow
- free_cash_flow
Data Sources for idx_financials_annual and idx_financials_quarterly
We have 2 data sources: WSJ and the Yahoo Finance (YF) API. The wsj_format value is determined by the format WSJ uses, because WSJ applies a strict, consistent format to companies in the same industry; e.g. Banking and Insurance have different formats because they report different metrics. By contrast, the format from the YF API is inconsistent even for companies in the same industry.
We prioritize the YF API as the source because data retrieval is more convenient. For Banking and Insurance, however, we prioritize WSJ because the YF API lacks certain metrics and standardization. The data source is assigned in 3 steps:
- Filter companies by their industries, then for Banks and Insurance companies, set their data source to WSJ if the company is available in WSJ.
- Set the other companies' data source to YF if the company is available in YF. Note: this may still include Banking or Insurance industries if the company is not available in WSJ.
- For companies that are not available in YF, set their data source to WSJ if the company is available in WSJ. Note: this may still result in missing data for companies if data does not exist in both sources.
Ideally, the 3 steps above leave Banking and Insurance company data sourced from WSJ and all other industries sourced from YF. For that reason, a periodic check ensures that Banking and Insurance data are sourced from WSJ. See below for the wsj_format classification details.
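The 3 steps above can be sketched as a single function. This is a minimal illustration, not the pipeline's actual code: the function name, the industry labels, and the boolean availability flags are all assumptions.

```python
from typing import Optional

# Assumed industry labels; the real values in the database may differ.
WSJ_PRIORITY_INDUSTRIES = {"Banks", "Insurance"}


def choose_source(industry: str, in_wsj: bool, in_yf: bool) -> Optional[str]:
    """Return "WSJ", "YF", or None following the 3-step priority."""
    # Step 1: banks and insurance companies use WSJ when it is available.
    if industry in WSJ_PRIORITY_INDUSTRIES and in_wsj:
        return "WSJ"
    # Step 2: everything else (including banks/insurers missing from WSJ) uses YF.
    if in_yf:
        return "YF"
    # Step 3: companies missing from YF fall back to WSJ.
    if in_wsj:
        return "WSJ"
    # Neither source has the company, so its data will be missing.
    return None
```

A bank that is missing from WSJ but present in YF ends up with "YF", which is the case the periodic check is meant to catch.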
WSJ
Data from WSJ are provided by FactSet, which also provides data for TradingView and MarketWatch.
Example URL:
- https://www.wsj.com/market-data/quotes/ID/XIDX/ADRO/financials/quarter/income-statement
- https://www.wsj.com/market-data/quotes/ID/XIDX/ADRO/financials/quarter/balance-sheet
- https://www.wsj.com/market-data/quotes/ID/XIDX/ADRO/financials/quarter/cash-flow
Note: quarter can be replaced with annual
Formats
There are 4 different formats of financial data for IDX companies:
- General 1 (mostly non-insurance/ non-banking companies): https://www.wsj.com/market-data/quotes/ID/XIDX/ADRO/financials/quarter/income-statement
- General 2 (mostly companies related to real estate or finance): https://www.wsj.com/market-data/quotes/ID/XIDX/APLN/financials/quarter/income-statement
- Insurance (mostly insurance companies): https://www.wsj.com/market-data/quotes/ID/XIDX/AHAP/financials/quarter/income-statement
- Banking (mostly banking companies): https://www.wsj.com/market-data/quotes/ID/XIDX/BBCA/financials/quarter/income-statement
Note that WSJ may employ different category classifications compared to IDX
Data Processing
General 1 Format
- Interest Expense is regarded as Non-Operating Interest Expense
- Cash & Short Term Investments would be shown as Cash and Equivalents in Frontend
- Total Debt = Long Term Debt + Short Term Debt & Current Portion of Long Term Debt
If both Long Term Debt and Short Term Debt & Current Portion of Long Term Debt are "-", then we will leave Total Debt as null.
- Operating Income = Gross Income - SG&A Expense - Other Operating Expense
- EBIT = Pretax Income + Non-Operating Interest Expense
- EBITDA = EBIT + Depreciation and Amortization
- Fixed Assets = Total Assets - Total Current Assets
Total Debt, Operating Income, EBIT and Fixed Assets are not available by default. EBITDA is available, but the value does not conform to our defined formula.
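The General 1 formulas above can be sketched as follows. The function and dictionary key names are hypothetical, a WSJ "-" cell is assumed to have been parsed to None already, and treating a single missing debt component as 0 is an assumption that goes beyond what the rules above state.

```python
from typing import Optional


def derive_general_1(raw: dict) -> dict:
    """Derive the General 1 metrics that are not taken directly from WSJ."""
    long_term = raw.get("long_term_debt")
    short_term = raw.get("short_term_debt")  # incl. current portion of long term debt

    # Total Debt stays null only when both components are missing.
    total_debt: Optional[float] = None
    if long_term is not None or short_term is not None:
        total_debt = (long_term or 0) + (short_term or 0)

    operating_income = (
        raw["gross_income"] - raw["sga_expense"] - raw["other_operating_expense"]
    )
    # Interest Expense is regarded as Non-Operating Interest Expense.
    ebit = raw["pretax_income"] + raw["interest_expense"]
    ebitda = ebit + raw["depreciation_and_amortization"]
    fixed_assets = raw["total_assets"] - raw["total_current_assets"]

    return {
        "total_debt": total_debt,
        "operating_income": operating_income,
        "ebit": ebit,
        "ebitda": ebitda,
        "total_non_current_assets": fixed_assets,
    }
```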
General 2 Format
- Total Interest Expense is regarded as Non-Operating Interest Expense
- Cash & Short Term Investments would be shown as Cash and Equivalents in Frontend
- Gross Income = Operating Income + Selling, General & Admin. Expenses + Other Operating Expense
- COGS = Total Revenue - Gross Income (not needed in Frontend)
- EBIT = Pretax Income + Non-Operating Interest Expense
- EBITDA = EBIT + Depreciation and Amortization
- Total Non-Current Liabilities = Long Term Debt + Provision for Risks & Charges + Deferred Taxes Credit + Other Liabilities
- Total Current Liabilities = Total Liabilities - Total Non-Current Liabilities
Gross Income, COGS, EBIT, EBITDA, Total Non-Current Liabilities and Total Current Liabilities are not available by default.
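The General 2 formulas above can be sketched in the same way. The function and key names are hypothetical and all inputs are assumed to be numeric.

```python
def derive_general_2(raw: dict) -> dict:
    """Derive the General 2 metrics that WSJ does not provide by default."""
    gross_income = (
        raw["operating_income"] + raw["sga_expense"] + raw["other_operating_expense"]
    )
    cogs = raw["total_revenue"] - gross_income
    # Total Interest Expense is regarded as Non-Operating Interest Expense.
    ebit = raw["pretax_income"] + raw["total_interest_expense"]
    ebitda = ebit + raw["depreciation_and_amortization"]
    total_non_current_liabilities = (
        raw["long_term_debt"]
        + raw["provision_for_risks_and_charges"]
        + raw["deferred_taxes_credit"]
        + raw["other_liabilities"]
    )
    total_current_liabilities = raw["total_liabilities"] - total_non_current_liabilities

    return {
        "gross_income": gross_income,
        "cogs": cogs,
        "ebit": ebit,
        "ebitda": ebitda,
        "total_non_current_liabilities": total_non_current_liabilities,
        "total_current_liabilities": total_current_liabilities,
    }
```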
Banking Format
- Total Cash & Due from Banks would be shown as Cash and Equivalents in Frontend
- Total Revenue = Net Interest Income + Non-Interest Income
Insurance Format
- Cash Only would be shown as Cash and Equivalents in Frontend
- Operating Income Before Interest Expense is regarded as Operating Income
- Interest Expense, Net of Interest Capitalized is regarded as Non-Operating Interest Expense
- EBIT = Pretax Income + Non-Operating Interest Expense
Yahoo Finance API
Data are obtained through yfinance Python package (https://pypi.org/project/yfinance/).
Example usage:
import yfinance as yf
ticker = yf.Ticker("ADRO.JK")
# to obtain annual data (return a dataframe)
ticker.income_stmt
ticker.balance_sheet
ticker.cashflow
# to obtain quarterly data (return a dataframe)
ticker.quarterly_income_stmt
ticker.quarterly_balance_sheet
ticker.quarterly_cashflow
Data Processing
For banks, the following metrics are set to null:
- gross_income
- ebitda
- cash_and_short_term_investments
- total_non_current_assets
- total_current_liabilities
- ebit
- interest_expense_non_operating
Note: total_cash_and_due_from_banks is not available on YF API
For insurance companies, the following metrics are set to null:
- gross_income
- ebitda
- cash_and_short_term_investments
- total_non_current_assets
- total_current_liabilities
Note: cash_only is not available on YF API
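The null-setting rules above could be applied with a small helper like the one below. This is a sketch: the function name and the "bank" / "insurance" labels are assumptions, while the metric names are the column names listed above.

```python
INSURANCE_NULL_METRICS = (
    "gross_income",
    "ebitda",
    "cash_and_short_term_investments",
    "total_non_current_assets",
    "total_current_liabilities",
)
# Banks null the same metrics plus EBIT and non-operating interest expense.
BANK_NULL_METRICS = INSURANCE_NULL_METRICS + ("ebit", "interest_expense_non_operating")


def null_inapplicable_metrics(record: dict, company_type: str) -> dict:
    """Return a copy of `record` with the inapplicable metrics set to None."""
    if company_type == "bank":
        names = BANK_NULL_METRICS
    elif company_type == "insurance":
        names = INSURANCE_NULL_METRICS
    else:
        names = ()
    cleaned = dict(record)
    for name in names:
        cleaned[name] = None
    return cleaned
```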
Data Sources for idx_company_profile
IDX
Company profile data are obtained from the official IDX website (e.g. https://www.idx.co.id/en/listed-companies/company-profiles/BBCA), using Selenium to scrape the following information:
"company_name",
"symbol",
"address",
"email",
"phone",
"fax",
"NPWP",
"website",
"listing_date",
"listing_board",
"sub_sector_id",
"industry",
"sub_industry",
"register",
"shareholders",
"directors",
"commissioners",
"audit_committees",
"delisting_date"
The following columns are cleaned so the format and values are more standardized.
"shareholders",
"directors",
"commissioners",
"audit_committees"
Other columns:
- nologo: Manual input
- wsj_format: WSJ
- current_source: YF API and WSJ
- yf_currency: YF API
- morningstar_code: MS website
- ipo_price: IDX E-Ipo
Check https://github.com/supertypeai/sectors-kb/wiki/ETL-Pipeline-Process for more information about wsj_format and current_source
Data Sources for idx_daily_data
Yahoo Finance API
Example usage:
import yfinance as yf
ticker = yf.Ticker("ADRO.JK")
# return the current market cap 
ticker.info.get("marketCap", None)
# return a dataframe containing the daily close price and volume starting from last_date. auto_adjust is set to false because we don't want the price to be adjusted by dividend.
ticker.history(start=last_date, auto_adjust=False)[["Close", "Volume"]]
If the current market cap is not available from the YF API, we try to obtain it by scraping the YF website, e.g. https://finance.yahoo.com/quote/BBCA.JK/key-statistics?p=BBCA.JK. If it is still not available, we derive it from the most recently available market cap and close price (assuming the number of shares is constant). Occasionally the market cap for a past date is also null. In that case, we derive it from the current market cap and close price (if available), or otherwise from the most recently available market cap and close price.
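Under the constant-share-count assumption, the derivation reduces to scaling a reference market cap by the ratio of close prices. The helper below is a sketch of that arithmetic only; its name and signature are hypothetical.

```python
from typing import Optional


def derive_market_cap(
    ref_market_cap: Optional[float],
    ref_close: Optional[float],
    close: Optional[float],
) -> Optional[float]:
    """Estimate a market cap from a reference (market cap, close) pair.

    Implied shares = ref_market_cap / ref_close, assumed constant, so
    market cap = implied shares * close.
    """
    if not ref_market_cap or not ref_close or close is None:
        return None  # not enough data to derive anything
    return ref_market_cap * close / ref_close
```

For example, a reference market cap of 1,000,000 at a close of 100 implies 10,000 shares, so a close of 110 gives an estimated market cap of 1,100,000.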
Data Sources for idx_key_stats
Yahoo Finance API
Example usage:
import yfinance as yf
ticker = yf.Ticker("ADRO.JK")
# return a dictionary
ticker.info
# return a dataframe containing % share held by insiders and institutions
ticker.major_holders
The following columns are obtained from ticker.info: forward_eps, recommendation_mean, and employee_num. Data from ticker.major_holders are processed into a dictionary and stored in holders_breakdown column.
Scripted Input
Point Summaries
point_summaries is updated using a script and is calculated from the ticker's metrics across several sections.
Intrinsic Value through Discounted Cash Flow Analysis
Inflation data is obtained from https://www.inflationtool.com/indonesian-rupiah?amount=100&year1=2019&year2={year2}&frequency=yearly
Financial data is obtained from:
- idx_company_report_mv: historical_valuation, historical_financials, market_cap
- idx_financials_annual: basic_eps, share_issued
- idx_sector_reports_calc: historical_valuation
Data Processing:
- Calculate Average Cyclically Adjusted Earning (future value): price×(1+avg_inflation_rate)**(total_elements−i−1). In simpler terms, an inflation factor is applied to each element in the list based on its position. The earlier an element appears in the list (lower index i), the more times the inflation factor is applied; the last element is left unchanged. The result is a list of future values for each element in the net_income_list.
- Calculate the sub_sector_roe : (sub_sector_pb_ttm/sub_sector_pe_ttm), sub_sector_npm : (sub_sector_ps_ttm/sub_sector_pe_ttm), ticker_roe : (ticker_pb_ttm/ticker_pe_ttm), ticker_npm : (ticker_ps_ttm/ticker_pe_ttm)
- Calculate ticker_der: for the banking sub sector (19), the DER formula is ticker_total_liabilities/ticker_total_equity. Otherwise, the DER formula is ticker_total_debt/ticker_total_equity
- Calculate ticker_profit_margin_stability:
- Operating Profit Margin (OPM): Net Income/ Total Revenue
- Calculate Average OPM & Standard Deviation OPM
- Profit Margin Stability: Average Operating Profit Margin/Standard Deviation
- Calculate ticker_earning_predictability: calculate the correlation between years and net_income
- Calculate discount_rate:
- Beta Calculation: Define a list of beta values [0.33, 0.67, 1, 1.5, 2]. Then, create a dictionary (data) containing financial metrics and their corresponding thresholds. The thresholds are specified for metrics such as Return on Equity (ROE), Debt-to-Equity Ratio (DER), Net Profit Margin (NPM), Profit Margin Stability, Earning Predictability, and Market Capitalization. Loop through the metrics in the dictionary:
  a. If the metric is 'roe', 'npm', 'earning_predictability', or 'market_cap': if the metric value is greater than the first threshold, assign the lowest beta (0.33). Otherwise, iterate through the remaining thresholds and interpolate the beta based on the metric's position between the thresholds.
  b. If the metric is 'der' or not in the specified metrics: if the metric value is less than the first threshold, assign the lowest beta (0.33). Otherwise, iterate through the remaining thresholds and interpolate the beta based on the metric's position between the thresholds.
- Calculate Median Beta:
- Calculate the discount rate using the formula: Discount Rate=0.07+(0.05×Median Beta)
- Calculate cae_per_share : avg_cae/share_issued
- Calculate avg_eps : average between cae_per_share & diluted_eps
- Calculate intrinsic_value :
- Calculate the value for each of the next 100 years. For the first 10 years: value = avg_eps×(1+growth_rate_10y)**i. After the first 10 years: value = values[9]×(1+growth_rate_after_10y)**(i−10)
- Convert to present value: present_value = value / (1+discount_rate)**(i+1)
- Calculate the intrinsic value by adding all of the present_value
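The inflation adjustment, discount rate, and intrinsic value steps above can be sketched as below. The function names are hypothetical, the index i is assumed to start at 0, and the per-metric beta interpolation is not reproduced because its thresholds are not given here; the betas are simply passed in.

```python
from statistics import median


def cyclically_adjust(values, avg_inflation_rate):
    """Apply value * (1 + rate) ** (n - i - 1) to each element."""
    n = len(values)
    return [v * (1 + avg_inflation_rate) ** (n - i - 1) for i, v in enumerate(values)]


def discount_rate(betas):
    """Discount Rate = 0.07 + (0.05 * Median Beta)."""
    return 0.07 + 0.05 * median(betas)


def intrinsic_value(avg_eps, growth_rate_10y, growth_rate_after_10y, rate, years=100):
    """Sum the present values of the projected per-share earnings."""
    values = []
    total = 0.0
    for i in range(years):
        if i < 10:
            value = avg_eps * (1 + growth_rate_10y) ** i
        else:
            # Later years grow from the 10th projected value, values[9].
            value = values[9] * (1 + growth_rate_after_10y) ** (i - 10)
        values.append(value)
        total += value / (1 + rate) ** (i + 1)
    return total
```

With zero growth, the result collapses to a plain 100-year annuity of avg_eps, which is a convenient sanity check for the discounting step.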
TradingView
Stock rating data are gathered from TradingView. Two types of rating are collected: the technical rating and the analyst rating.
Technical Rating
The technical rating is broken down into three periods: daily, weekly, and monthly. Each technical rating is collected at a frequency matching its period. The data are structured as follows:
{
  "summary": {
    "buy": 7,
    "sell": 9,
    "neutral": 10,
    "updated_on": "2024-07-21 08:15:02"
  },
  "oscillator": {
    "buy": 1,
    "data": [
      {
        "name": "Relative Strength Index (14)",
        "value": 51,
        "action": "Neutral"
      },
      ...
    ],
    "sell": 1,
    "neutral": 9,
    "updated_on": "2024-07-21 08:15:02"
  },
  "moving_average": {
    "buy": 6,
    "data": [
      {
        "name": "Exponential Moving Average (10)",
        "value": 2316,
        "action": "Sell"
      },
      ...
    ],
    "sell": 8,
    "neutral": 1,
    "updated_on": "2024-07-21 08:15:02"
  }
}
Analyst Rating
Analyst rating is collected once every three months. The structure of the data is displayed as below:
{
  "buy": 2,
  "hold": 3,
  "sell": 0,
  "n_analyst": 10,
  "strong_buy": 4,
  "updated_on": "2024-07-10 00:35:21",
  "strong_sell": 1
}
Data Sources for idx_dividend
SahamIDX
Data Processing
Dividend records are retrieved for a specified date range. Besides the dividend value, we also store the yield in idx_dividend. The dividend yield is calculated by dividing the dividend value for each date with the mean closing price of the stock for the corresponding year. The dividend yield is computed in Python and saved in the table only for the preceding year. Meanwhile, the dividend yield for the current year is dynamically computed at the view level.
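The yield calculation described above can be sketched as follows. The function name and the input shapes (lists of (date string, value) pairs) are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def dividend_yields(dividends, closes):
    """Divide each dividend by the mean close price of its calendar year.

    dividends: list of ("YYYY-MM-DD", dividend_value)
    closes:    list of ("YYYY-MM-DD", close_price)
    """
    closes_by_year = defaultdict(list)
    for day, close in closes:
        closes_by_year[day[:4]].append(close)

    result = []
    for day, value in dividends:
        prices = closes_by_year.get(day[:4])
        # No price data for that year means no yield can be computed.
        yield_value = value / mean(prices) if prices else None
        result.append((day, yield_value))
    return result
```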
Data Sources for idx_stock_split
SahamIDX: Stock Split & Reverse Stock Split
Data Processing
- Retrieve (reverse) stock split records from the database for future dates
- Retrieve (reverse) stock split records from SahamIDX for future dates
- Compare the data from steps 1 and 2, and perform the necessary updates and deletions so that the records in the database match those on SahamIDX
Note: deletion of record from the database may occur when there is a modification on the stock split date on SahamIDX website
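The comparison in step 3 amounts to a diff between two sets of records. Below is a minimal sketch, assuming each record is keyed by (symbol, date) and carries a split ratio; the key shape, function name, and example symbols are hypothetical.

```python
def plan_split_sync(db_records: dict, source_records: dict):
    """Return (to_upsert, to_delete) so the database matches the source.

    Both arguments map (symbol, date) -> split_ratio for future-dated splits.
    """
    # New or changed records must be written to the database.
    to_upsert = {
        key: ratio
        for key, ratio in source_records.items()
        if db_records.get(key) != ratio
    }
    # Records no longer present at the source (e.g. the date changed) are removed.
    to_delete = sorted(key for key in db_records if key not in source_records)
    return to_upsert, to_delete
```

A changed split date shows up as one deletion (the old date) plus one upsert (the new date), which matches the note above.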
Data Sources for idx_institution_transactions
Morningstar through RapidAPI (MS Finance and Morning Star)
API endpoint:
- https://ms-finance.p.rapidapi.com/stock/v2/get-ownership
- https://morning-star.p.rapidapi.com/stock/v2/get-ownership
Example request:
import requests
url = "https://morning-star.p.rapidapi.com/stock/v2/get-ownership"
querystring = {
    "performanceId": ms_code,
    "ownership": ownership,  # either "Buyers" or "Sellers"
    "asset": "institution",
}
headers = {
    "X-RapidAPI-Key": rapid_api_key,
    "X-RapidAPI-Host": "morning-star.p.rapidapi.com",
}
response = requests.get(url, headers=headers, params=querystring)
print(response.json())
Data Processing
The API will return the top 20 institutional buyers and sellers for each symbol in the previous month, and we store the institution name and changeAmount. We calculate the net_transaction by summing over the changeAmount of the top buyers and sellers.
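The net_transaction calculation can be sketched as a plain sum. This assumes the two API responses have already been parsed into lists of dictionaries and that sellers carry a negative changeAmount; the "name" key and the function name are hypothetical.

```python
def net_transaction(buyers: list, sellers: list) -> float:
    """Sum changeAmount over the top institutional buyers and sellers."""
    return sum(row["changeAmount"] for row in buyers + sellers)


# Hypothetical parsed rows for one symbol.
example_buyers = [{"name": "Institution A", "changeAmount": 1000}]
example_sellers = [{"name": "Institution B", "changeAmount": -400}]
example_net = net_transaction(example_buyers, example_sellers)
```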
Data Sources for idx_manual_input
Income statement data manually extracted from PDF report. USD-IDR conversion rate used from the last working day of the financial year from here
Data Processing
Data will be processed from an Excel file with a predetermined format to the following structure:
sankey_component:
{
  "links": [
    {
      "value": 6512275000000,
      "source": "Loan",
      "target": "Interest Income"
    },
    // ... (more links)
  ],
  "nodes": [
    {
      "id": "Loan",
      "nodeColor": "hsl(195, 53%, 79%)"
    },
    // ... (more nodes)
  ]
}
income_stmt_metrics:
{
  "provision": 147840000000,
  "net_income": 4052678000000,
  "income_taxes": 975392000000,
  ... (more income statement metrics)
  "int_income_breakdown": [
    {
      "amount": 6512275000000,
      "category": "Loan"
    },
    // ... (more breakdown items)
  ],
  "operating_expense_breakdown": [
    {
      "amount": 11749000000,
      "category": "Fees and commissions"
    },
    // ... (more breakdown items)
  ]
}
Data Sources for idx_esg_score
ESG score data are obtained from the official IDX website (e.g. https://idx.co.id/secondary/get/esg/detail/BBCA?language=en-us), using Selenium to scrape the following information:
'symbol'
'last_esg_update_date'
'esg_score'
'controversy_risk'
'environment_risk_score'
'social_risk_score'
'governance_risk_score'
Data Sources for idx_historical_mcap
Because historical market cap data are not available from the YF API, we currently retrieve the data from IDX Digital Statistic. Going forward, we will populate the idx_historical_mcap table based on the data stored in idx_daily_data.
Data Sources for idx_company_forecast
Company forecast data are obtained from the Forecast section of the TradingView website (e.g. https://www.tradingview.com/symbols/IDX-TPIA/forecast/).
Backup: Yahoo Finance official website (e.g. https://finance.yahoo.com/quote/BBCA/analysis?p=BBCA).
The following information is scraped: revenue forecast amount, EPS forecast amount
Data Processing
- Process numeric value: replace T, B, M, K to number (e.g multiply by 1e12 for T), then save the value as an integer
- Change datatypes: eps_estimate ( float32 ), revenue_estimate ( float64)
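The suffix conversion in the first step can be sketched as below. The function name is hypothetical, and stripping thousands separators is an assumption about the scraped text.

```python
from typing import Optional

SUFFIX_MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_amount(text: str) -> Optional[int]:
    """Convert strings such as "1.5T" or "250M" to an integer."""
    text = text.strip().replace(",", "")
    if not text:
        return None
    multiplier = SUFFIX_MULTIPLIERS.get(text[-1].upper())
    if multiplier is not None:
        # round() guards against float artifacts before the int conversion.
        return int(round(float(text[:-1]) * multiplier))
    return int(round(float(text)))
```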
Final Columns
- symbol: IDX company symbol ending with JK
- year: the estimate for which year
- revenue_estimate: estimated revenue based on yahoo finance analysis tab
- eps_estimate: estimated earnings per share based on yahoo finance analysis tab
Data Sources for idx_ipo_perf
Price performances (7d, 30d, 90d, and 365d) are calculated using price data from the YF API, measured from the first trading date. The first trading date is considered valid if it is at most 1 day after the listing date.
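A possible shape for this calculation is sketched below. Several details are assumptions, not confirmed by the text above: the base price is the first trading day's close, each horizon uses the last close on or before the target date, and a horizon is left as None until the price history reaches it.

```python
from datetime import date, timedelta


def ipo_performance(listing_date, prices, horizons=(7, 30, 90, 365)):
    """prices: list of (date, close) sorted by date, oldest first."""
    if not prices:
        return None
    first_date, first_close = prices[0]
    # The first trading date must be at most 1 day after the listing date.
    if first_date - listing_date > timedelta(days=1):
        return None

    last_date = prices[-1][0]
    result = {}
    for days in horizons:
        target = first_date + timedelta(days=days)
        if last_date < target:
            result[f"{days}d"] = None  # horizon not reached yet
            continue
        close = [c for d, c in prices if d <= target][-1]
        result[f"{days}d"] = close / first_close - 1
    return result
```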
Data Sources for sgx_short_sell
All of the data (except the symbol column) are sourced from the SGX website using this URL: f'https://api2.sgx.com/sites/default/files/reports/short-sell/{today.year}/{month}/website_DailyShortSell{date}1815.txt'.
- symbol: From the sgx_companies table. The SGX short sell data are left-joined with the sgx_companies data on the name column. Because the company names are not identical in the two sources, the fuzzywuzzy package is used to match names by similarity
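The name matching can be illustrated with the standard library. Note that this sketch swaps in difflib.SequenceMatcher as a stand-in for fuzzywuzzy so that it runs without extra dependencies; the function name, the 0.8 threshold, and the example company names and symbols are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def match_symbol(name: str, companies: dict, min_ratio: float = 0.8) -> Optional[str]:
    """Return the symbol whose company name is most similar to `name`.

    companies maps company name -> symbol. None is returned when no
    candidate reaches `min_ratio`.
    """
    best_symbol, best_ratio = None, 0.0
    for company_name, symbol in companies.items():
        ratio = SequenceMatcher(None, name.lower(), company_name.lower()).ratio()
        if ratio > best_ratio:
            best_symbol, best_ratio = symbol, ratio
    return best_symbol if best_ratio >= min_ratio else None
```

Keeping a minimum similarity threshold leaves unmatched rows with a null symbol, which behaves like a left join, instead of forcing a wrong match.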
Data Sources for index_daily_data
Daily index price data since January 2019. All the index prices come from Yahoo Finance.
- All prices here are in IDR except for:
- STI (Straits Times Index, SGX), which uses SGD
- KLSE (Bursa Malaysia), which uses MYR
 
Guidebook for idx_manual_input
This document explains the terms and logic behind the data points used to compute the final figures in the database. It is structured to provide clarity on how each metric is derived and the rules applied to ensure consistency and accuracy in data entry.
Each section specifies the field details, formatting rules, and any specific conditions that must be met.
Steps to follow:
- Sourcing official financial statements, annual reports, and other relevant documents.
- IDX website: https://idx.co.id/en/listed-companies/financial-statements-and-annual-report/
- Company websites: Look for the "Investor Relations" page and retrieve documents from the "Annual Reports" section.
- Extracting the required data points as per the guidelines.
- Leave empty if not reported ("NULL"). If it is "0", then it should be reported as "0".
- ebitda (when NULL, compute from ebit + depreciation_amortization)
- capital_expenditure (when NULL, compute from current_pp&e - previous_year_pp&e + current_depreciation_expenses)
- earning_asset (when NULL, sum from Class: Earning Asset)
- realized_capital_goods_investment
- high_quality_liquid_asset
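The two computed fallbacks in the list above can be sketched as below. The function names are hypothetical. The point of the sketch is that only None (NULL) triggers the fallback, so a reported "0" is kept as 0.

```python
def fill_ebitda(ebitda, ebit, depreciation_amortization):
    """Use the reported EBITDA, or ebit + depreciation_amortization when NULL."""
    if ebitda is not None:
        return ebitda
    return ebit + depreciation_amortization


def fill_capital_expenditure(capex, current_ppe, previous_year_ppe, current_depreciation):
    """Use the reported CapEx, or derive it from the change in PP&E when NULL."""
    if capex is not None:
        return capex
    return current_ppe - previous_year_ppe + current_depreciation
```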
Banks
Industry-focused Metrics
In the context of banks, the financial metrics differ from those of non-banking companies. Key differences include the treatment of interest income and expenses. Due to the nature of banking business where income is generated from interest, there is no EBIT and EBITDA.
A few metrics that are only applicable to banks are:
- LitA: Loan Income to Assets ratio
interest_income (Class: Loan & Deposit) / total_asset
- B-ER: Bank Efficiency Ratio (also known as Cost to Income ratio)
operating_expense / total_revenue
- Operating Inc.
pretax_income / total_revenue
- NIM: Net Interest Margin ratio
net_interest_income / earning_asset
- CASA: Current Account and Savings Account ratio
(current_account + savings_account) / total_deposit
- NPL ratio: Non-Performing Loan ratio
non_performing_loan / gross_loan
- LAR: Loan At Risk ratio
loan_at_risk / gross_loan
- CAR: Capital Adequacy Ratio
(core_capital_tier1 + supplementary_capital_tier2) / total_risk_weighted_asset
- NPL: Non-Performing Loan ratio
non_performing_loan / gross_loan
- LDR: Loan to Deposit ratio
gross_loan / total_deposit
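Several of the ratios above can be computed together from one dictionary of inputs. This is a sketch: the function name and the lowercase ratio keys are hypothetical, the input keys follow the field names used in this guidebook, and returning None for a zero or missing denominator is an assumption.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when it cannot be computed."""
    if numerator is None or not denominator:
        return None
    return numerator / denominator


def bank_ratios(m: dict) -> dict:
    """Compute the bank-specific ratios from the guidebook's field names."""
    capital = m["core_capital_tier1"] + m["supplementary_capital_tier2"]
    return {
        "b_er": safe_div(m["operating_expense"], m["total_revenue"]),
        "nim": safe_div(m["net_interest_income"], m["earning_asset"]),
        "casa": safe_div(m["current_account"] + m["savings_account"], m["total_deposit"]),
        "npl": safe_div(m["non_performing_loan"], m["gross_loan"]),
        "lar": safe_div(m["loan_at_risk"], m["gross_loan"]),
        "car": safe_div(capital, m["total_risk_weighted_asset"]),
        "ldr": safe_div(m["gross_loan"], m["total_deposit"]),
    }
```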
income_stmt_metrics
- interest_income: As declared in financial statement (Revenue breakdown total must tally with interest_income)
  - Class: Used for categorization in the database so the breakdown can be comparable across all banks.
    - Loans & Deposits
    - Financial Receivable: Export Bills, Acceptances, etc.
    - Securities: Government Bonds, Certificates, etc.
    - Marketable Securities: Securities sold under agreements to repurchase, repo
    - Interbank Placement: Placement with Bank Indonesia and other banks
    - Current Accounts: Current accounts with Bank Indonesia and other banks
    - Sharia Income: Murabahah margin income, Mudharabah and Musyarakah profit sharing income, Qardh ujrah income, Ijarah leasing income, etc.
    - Other Income: Fees and commissions
  - Category: As declared in financial statement, e.g., "Loans", "Placement with Bank Indonesia and other banks", "Government Bonds", etc.
- interest_expense: As declared in financial statement
- net_interest_income = interest_income - interest_expense
- premium_income: As declared in financial statement, only applicable for insurance companies
- premium_expense: As declared in financial statement, only applicable for insurance companies
- net_premium_income = premium_income - premium_expense
- non_interest_income: As declared in financial statement
- total_revenue = net_interest_income + net_premium_income + non_interest_income
- operating_expense: only negative figures; positive figures should be added to "non_operating_income_or_loss", where one-off items are recorded.
  - Class: Used for categorization in the database so the breakdown can be comparable.
    - Sales & Marketing: Commission, Advertising, Promotion, etc.
    - General & Admin: Rent, Utilities, Depreciation, Amortization, etc.
    - R&D: Exploration costs, Research and Development expenses, etc.
    - Salaries & Benefits: Salaries, Wages, Bonuses, etc.
    - Other expenses: Other expenses not categorized above.
  - Category: As declared in financial statement, e.g., "Personnel", "Commission", "Depreciation", "Premium of government guarantee", etc.
- provision: As declared in financial statement; reversal of allowance for impairment losses + (allowance for impairment losses)
- net_operating_income = total_revenue - operating_expense - provision
- non_operating_income_or_loss: As declared in financial statement + any one-off items from operating_expense
- pretax_income = net_operating_income + non_operating_income_or_loss
- income_taxes: As declared in financial statement (+zakat if applicable)
- minorities: As declared in financial statement (also known as non-controlling interests)
- net_income = net_operating_income + non_operating_income_or_loss - income_taxes - minorities
- diluted_shares_outstanding: keyword search "weighted average"; take the number under the earnings per share note in the financial statement
Note: Enter "0" if the reported figure is zero. Leave blank ("NULL") only if the figure is not available.
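The chain of derived income statement fields can be sketched end to end. The sign conventions here are assumptions made for the example: expenses, provision, taxes, and minorities are passed as positive magnitudes, and non_operating_income_or_loss is a signed figure (negative for a loss) that is added, following the net_income formula. The function name is hypothetical.

```python
def bank_income_statement(d: dict) -> dict:
    """Derive the computed fields from the declared ones."""
    net_interest_income = d["interest_income"] - d["interest_expense"]
    # Premium fields only apply to insurance companies; default to 0 otherwise.
    net_premium_income = d.get("premium_income", 0) - d.get("premium_expense", 0)
    total_revenue = net_interest_income + net_premium_income + d["non_interest_income"]
    net_operating_income = total_revenue - d["operating_expense"] - d["provision"]
    pretax_income = net_operating_income + d["non_operating_income_or_loss"]
    net_income = pretax_income - d["income_taxes"] - d["minorities"]
    return {
        "net_interest_income": net_interest_income,
        "net_premium_income": net_premium_income,
        "total_revenue": total_revenue,
        "net_operating_income": net_operating_income,
        "pretax_income": pretax_income,
        "net_income": net_income,
    }
```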
Interest Income for Syariah banks is categorized as follows:
- Murabahah margin income: Cost-plus financing (loan with a financing cost, instead of an interest)
- Mudharabah: Profit-sharing investment product
- Musyarakah: Partnership, Joint Venture
- Qardh ujrah income: Qardh (benevolent loan) with a fee
- Ijarah leasing income: Income from leasing assets
- Istishna: Manufacturing contract financing
balance_sheet_metrics
- gross_loan: As declared in financial statement
- allowance_for_loans: As declared in financial statement; if negative, it is a reversal
- net_loan = gross_loan - allowance_for_loans
- non_loan_asset
  - Class: Earning Asset: Assets that generate income.
    - Cash and CA = Cash + Current accounts
    - Interbank Placement = Placement with Bank Indonesia and other banks
    - Securities = Government Bonds + Certificates, etc.
    - Marketable Securities = Securities sold under agreements to repurchase
    - Other Receivables = Export Bills + Interest Receivable + Finance Receivable + Acceptance Receivable, etc.
    - Investment in Shares/Associates/Joint Ventures
  - Class: Non-Earning Asset: Assets that do not generate income.
    - Derivative Receivables: Used for hedging purposes.
    - Fixed assets and right-of-use assets = Fixed assets + Right-of-use assets
    - Intangible assets
    - Deferred tax assets
    - Foreclosed assets
    - Other assets
- total_asset = net_loan + non_loan_asset
- earning_asset = Cash and CA + Interbank Placement + Securities + Marketable Securities + Other Receivables
- current_account: Current accounts with Bank Indonesia and other banks
- savings_account: Savings accounts with Bank Indonesia and other banks
- time_deposit: Time deposits with Bank Indonesia and other banks
- total_deposit = current_account + savings_account + time_deposit (Used for calculation for CASA ratio)
- other_interest_bearing_liabilities: Interest-bearing liabilities that are not deposits, such as:
  - Deposits from other banks (because it is not included in CASA ratio calculation)
  - Securities sold under agreements to repurchase, Debt securities
  - Subordinated Loans
  - Borrowings
  - Interest Payable
  - Acceptance Payables
  - Liabilities to unit-link policyholders
  - Temporary Syirkah Funds (because it is not included in CASA ratio calculation)
- non_interest_bearing_liabilities: Liabilities that do not bear interest, such as:
  - Employee Benefits
  - Liabilities due immediately
  - Tax Payables
- total_liabilities = total_deposit + other_interest_bearing_liabilities + non_interest_bearing_liabilities
- total_equity = total_asset - total_liabilities
- core_capital_tier1: As declared in financial statement
- supplementary_capital_tier2: As declared in financial statement
- credit_rwa: Risk-weighted assets for credit risk, as declared in financial statement
- market_rwa: Risk-weighted assets for market risk, as declared in financial statement
- operational_rwa: Risk-weighted assets for operational risk, as declared in financial statement
- total_risk_weighted_asset = credit_rwa + market_rwa + operational_rwa
Note: market_rwa is sometimes "0".
cash_flow_metrics
- cash_inflow
- cash_outflow
- net_cash_flow
- free_cash_flow
- financing_cash_flow: As declared in financial statement
- investing_cash_flow: As declared in financial statement
- operating_cash_flow: As declared in financial statement
- 
end_cash_position = operating_cash_flow + investing_cash_flow + financing_cash_flow 
- high_quality_liquid_asset: As declared in annual report. If not available, leave blank "NULL".
- realized_capital_goods_investment: As declared in annual report. Equivalent to Capital expenditure (CapEx) in the banking industry. If not available, leave blank "NULL". Otherwise, it is "0" if there is no capital goods investment. Keywords to search for: "capital goods", "capital investment", "realized capital"
employee_breakdown
This contains total number of employees, broken down into:
- permanent_employee: permanent employees
- contract_employee: contract employees
- others_employee: other employees (e.g., outsourced, temporary, etc.)
- total_employee = permanent_employee + contract_employee + others_employee
In order to compute NIPE (Net Income Per Employee), the following formula is used:
net_income / permanent_employee
industry_breakdown
This contains data structured to provide insights into the financial health and risk of the bank, including:
- special mention loans, non-performing loans and restructured loans for Loan-At-Risk metrics
- non-loan assets categorized into earning and non-earning assets
- loan distribution and exposure across various economic sectors
In order to compute Loan-At-Risk (LAR) ratio, the following formula is used:
loan_at_risk / gross_loan
loan_at_risk = Special Mention Loan + Non-performing Loan + Restructured Loan
- Special Mention Loan: figure declared under special mention category by OJK collectibility (before allowance for impairment losses) in financial statement.
  - Normally declared in a table with collectibility categories: current, special mention, sub-standard, doubtful, and loss under Loan Receivable section or Additional Information section.
- Non-performing Loan (NPL): sum of sub-standard, doubtful and loss category by OJK collectibility (before allowance for impairment losses) in financial statement.
  - The same table where Special Mention Loan is declared.
- Restructured Loan (current): figure declared under current category by OJK collectibility (before allowance for impairment losses) in financial statement.
  - A different table from where Special Mention Loan and NPL are declared, but it shares the same format of collectibility categories. It is under Restructured Loan section or Additional Information section.
  - Sometimes declared in a sentence instead of a tabular form. Sentence structured as "total restructured loan in year X is IDRXXX"
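The Loan-At-Risk calculation can be sketched from the collectibility figures described above. The function names and parameter names are hypothetical; all figures are assumed to be before allowance for impairment losses.

```python
def loan_at_risk(special_mention, sub_standard, doubtful, loss, restructured_current):
    """loan_at_risk = Special Mention Loan + Non-performing Loan + Restructured Loan."""
    # NPL is the sum of the sub-standard, doubtful, and loss categories.
    non_performing_loan = sub_standard + doubtful + loss
    return special_mention + non_performing_loan + restructured_current


def lar_ratio(loan_at_risk_amount, gross_loan):
    """LAR ratio = loan_at_risk / gross_loan, or None without a valid gross_loan."""
    if not gross_loan:
        return None
    return loan_at_risk_amount / gross_loan
```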
 
Earning Asset: Assets that generate income.
earning_asset = Cash and CA + Interbank Placement + Securities + Marketable Securities + Other Receivables + Investment in Shares + Interest Receivable
Note: Interest receivable is considered an earning asset because it is interest that has been earned but not yet received. The asset is generated from the same framework, so it is included in earning_asset.
- Cash and CA
  - Cash
  - Current accounts with Bank Indonesia and other banks
- Interbank Placement
  - Placement with Bank Indonesia and other banks
- Securities
  - Government Bonds
  - Certificates
- Marketable Securities
  - Securities sold under agreements to repurchase
  - Marketable Securities
- Other Receivables
  - Export Bills
  - Interest Receivable
  - Finance Receivable
  - Acceptance Receivable
- Ijarah Assets
- Prepaid Expenses/Accrued Income
Non-Earning Asset: Assets that do not generate income directly, but are used for hedging purposes or are not directly related to the bank's core operations, e.g., derivative receivables.
- Derivative Receivables: Used for hedging purposes.
- Fixed assets and right-of-use assets: As declared in financial statement
- Intangible assets: Assets that do not have physical substance, such as patents, trademarks and goodwill.
- Deferred tax assets: Tax assets that are expected to be realized in the future.
- Foreclosed assets: Assets that have been foreclosed by the bank due to non-payment of loans, e.g., a property that has been taken back by the lender because the borrower defaulted on their mortgage payments.
- Other assets: Other assets that do not fall into the above categories, such as prepaid expenses and accrued income.
loan_by_economic_sectors: Loan distribution across various economic sectors (Key), structured as follows:
- Manufacturing: Industry
- Business services
- Trading, restaurants and hotels: Accommodation, food and beverage, and retail trade
- Agriculture and agricultural facilities: includes forestry and fishery
- Construction
- Transportation and warehousing: Transportation, warehousing, and communications
- Social/public services: includes education, health, and social services
- Mining
- Electricity, gas, and water: includes waste management
- Property: includes real estate and leasing activities
- Others
- Financial Intermediaries: Finance and insurance activities
Note: sum of loan_by_economic_sectors must tally with gross_loan
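The tally rule in the note above could be checked automatically. The sketch below assumes loan_by_economic_sectors is available as a plain dictionary of sector name to amount. The function name, the tolerance, and the sample amounts are hypothetical.

```python
import math


def check_sector_tally(loan_by_economic_sectors, gross_loan, rel_tol=1e-6):
    """Return (is_tallied, difference) for the sector breakdown.

    difference is the sector total minus gross_loan, so a positive value
    means the breakdown overstates gross_loan.
    """
    sector_total = math.fsum(loan_by_economic_sectors.values())
    is_tallied = math.isclose(sector_total, gross_loan, rel_tol=rel_tol)
    return is_tallied, sector_total - gross_loan


# Hypothetical breakdown, for illustration only.
sectors = {
    "Manufacturing": 300.0,
    "Construction": 200.0,
    "Mining": 150.0,
    "Others": 350.0,
}
tallied, difference = check_sector_tally(sectors, gross_loan=1000.0)
```

A small relative tolerance is used instead of strict equality so that rounding in the source statement does not trigger false mismatches. The right tolerance is a judgment call.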
Non-Banks
income_stmt_metrics
- total_revenue: As declared in financial statement (Revenue breakdown total must tally with total_revenue)
- Class: Used for categorization in the database so the breakdown can be comparable across companies of the same industry.
- Product/Service Sales: Revenue from sales of products or services
- Product Manufacturing: Revenue from manufacturing products
- Product Rental: Revenue from renting out products
- Logistics & Deliveries: Revenue from logistics and delivery services
- Service and Maintenance: Revenue from service and maintenance activities
- Rental/Lease Income: Revenue from rental or lease of assets
- Other Income: Other income not categorized above
- Adjustments and elimination: Adjustments made to eliminate intercompany transactions
 
- Category: As declared in financial statement
- Coal Trading
- Property Rental
- MIDI
 
 
- cost_of_revenue: As declared in financial statement
- gross_income = total_revenue - cost_of_revenue
- operating_expense: only negative figures; any positive figure or any one-off items should be recorded into "net non operating income/(expenses)"
- Class: Used for categorization in the database so the breakdown can be comparable across companies of the same industry.
- Sales & Marketing: Commission, Advertising, Promotion, etc.
- General & Admin: Rent, Utilities, Depreciation, Amortization, etc.
- R&D: Exploration costs, Research and Development expenses, etc.
- Salaries & Benefits: Salaries, Wages, Bonuses, etc.
- Other expenses: Other expenses not categorized above.
 
- Category: As declared in financial statement
- Personnel
- General & Admin
- Exploration
- Depreciation
 
 
- operating_income = gross_income - operating_expense
- non_operating_income_or_loss = figure declared in financial statement + any one-off items, e.g., "Gain on sale of assets", "Foreign exchange gain/loss", etc.
- pretax_income = operating_income + non_operating_income_or_loss
- income_taxes: As declared in financial statement
- minorities: As declared in financial statement (also known as non-controlling interests)
- net_income = pretax_income - income_taxes - minorities
- interest_expense_non_operating = interest/finance income - interest/finance expense/cost
- ebit = operating_income + interest_expense_non_operating
- ebitda: As declared in annual report/financial statement; if not available, compute from:
ebitda = ebit + depreciation_amortization 
- diluted_shares_outstanding: As declared in financial statement
- Search for the keyword "weighted average" and take the figure from the Earnings Per Share section of the financial statement.
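The derived income statement fields above chain together as in the following sketch. It is an illustration only: the function name, the argument names finance_income, finance_expense and declared_ebitda, and the sample figures are hypothetical. It also assumes that costs, expenses, taxes and minorities are passed as positive magnitudes. If they are stored as negative figures, the signs need to be flipped before use.

```python
def derive_income_statement(total_revenue, cost_of_revenue, operating_expense,
                            non_operating_income_or_loss, income_taxes,
                            minorities, finance_income, finance_expense,
                            depreciation_amortization, declared_ebitda=None):
    """Compute the derived income statement metrics from declared figures.

    Assumption: expense-type inputs are positive magnitudes.
    """
    gross_income = total_revenue - cost_of_revenue
    operating_income = gross_income - operating_expense
    pretax_income = operating_income + non_operating_income_or_loss
    net_income = pretax_income - income_taxes - minorities
    # Follows the guide's definition: finance income minus finance expense.
    interest_expense_non_operating = finance_income - finance_expense
    ebit = operating_income + interest_expense_non_operating
    # Prefer the declared EBITDA; fall back to the computed one.
    if declared_ebitda is not None:
        ebitda = declared_ebitda
    else:
        ebitda = ebit + depreciation_amortization
    return {
        "gross_income": gross_income,
        "operating_income": operating_income,
        "pretax_income": pretax_income,
        "net_income": net_income,
        "interest_expense_non_operating": interest_expense_non_operating,
        "ebit": ebit,
        "ebitda": ebitda,
    }


# Hypothetical figures, for illustration only.
income = derive_income_statement(
    total_revenue=1000.0, cost_of_revenue=600.0, operating_expense=150.0,
    non_operating_income_or_loss=20.0, income_taxes=50.0, minorities=10.0,
    finance_income=5.0, finance_expense=25.0, depreciation_amortization=40.0,
)
```

Keeping every intermediate value in the returned dictionary makes it easy to compare each one against the figure declared in the financial statement.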
 
balance_sheet_metrics
- total_current_asset: As declared in financial statement
- total_non_current_asset: As declared in financial statement
- total_asset = total_current_asset + total_non_current_asset
- total_current_liabilities: As declared in financial statement
- total_non_current_liabilities: As declared in financial statement
- total_liabilities = total_current_liabilities + total_non_current_liabilities
- total_equity = total_asset - total_liabilities
- working capital = total_current_asset - total_current_liabilities
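The balance sheet derivations above can be sketched as a single helper. The function name and the sample figures are hypothetical. The formulas are the ones listed in this section.

```python
def derive_balance_sheet(total_current_asset, total_non_current_asset,
                         total_current_liabilities,
                         total_non_current_liabilities):
    """Compute the derived balance sheet metrics from declared figures."""
    total_asset = total_current_asset + total_non_current_asset
    total_liabilities = (total_current_liabilities
                         + total_non_current_liabilities)
    total_equity = total_asset - total_liabilities
    working_capital = total_current_asset - total_current_liabilities
    return {
        "total_asset": total_asset,
        "total_liabilities": total_liabilities,
        "total_equity": total_equity,
        "working_capital": working_capital,
    }


# Hypothetical figures, for illustration only.
balance = derive_balance_sheet(
    total_current_asset=500.0, total_non_current_asset=1500.0,
    total_current_liabilities=300.0, total_non_current_liabilities=700.0,
)
```

Comparing the computed total_equity with the equity figure declared in the statement is a quick way to catch data entry errors.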
cash_flow_metrics
- free_cash_flow
- financing_cash_flow: As declared in financial statement
- investing_cash_flow: As declared in financial statement
- operating_cash_flow: As declared in financial statement
- end_cash_position = operating_cash_flow + investing_cash_flow + financing_cash_flow
- capital_expenditure: As declared in financial statement; if not available, leave blank "NULL".
- If capital_expenditure is "NULL", estimate it as: capital_expenditure = current_pp&e - previous_year_pp&e + current_depreciation_expenses
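The capital expenditure fallback can be sketched as follows. The function and argument names are hypothetical, and Python's None stands in for the "NULL" placeholder used in the data.

```python
def resolve_capital_expenditure(declared_capex, current_ppe,
                                previous_year_ppe,
                                current_depreciation_expenses):
    """Return the declared capex, or estimate it when it is missing (None).

    The estimate is the change in PP&E plus depreciation for the current
    period, as described in this guide.
    """
    if declared_capex is not None:
        return declared_capex
    return current_ppe - previous_year_ppe + current_depreciation_expenses


# Hypothetical figures, for illustration only.
estimated_capex = resolve_capital_expenditure(
    declared_capex=None, current_ppe=1200.0, previous_year_ppe=1000.0,
    current_depreciation_expenses=150.0,
)
declared_capex = resolve_capital_expenditure(
    declared_capex=400.0, current_ppe=1200.0, previous_year_ppe=1000.0,
    current_depreciation_expenses=150.0,
)
```

With the sample inputs the estimate is 350.0, while a declared figure (400.0 in the second call) is always returned unchanged.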
 
employee_breakdown
Identical to banking companies.
- permanent_employee: permanent employees
- contract_employee: contract employees
- others_employee: other employees (e.g., outsourced, temporary, etc.)
- total_employee = permanent_employee + contract_employee + others_employee
To compute NIPE (Net Income Per Employee), the following formula is used:
NIPE = net_income / permanent_employee
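The employee totals and NIPE can be computed together, as in this sketch. The function name and the sample figures are hypothetical. Returning None when there are no permanent employees is an assumption made here to avoid a division by zero, not a rule stated in this guide.

```python
def employee_metrics(net_income, permanent_employee, contract_employee,
                     others_employee):
    """Return total_employee and NIPE (net income per permanent employee)."""
    total_employee = permanent_employee + contract_employee + others_employee
    # Assumption: NIPE is undefined (None) when there are no permanent staff.
    nipe = net_income / permanent_employee if permanent_employee else None
    return {"total_employee": total_employee, "nipe": nipe}


# Hypothetical figures, for illustration only.
staff = employee_metrics(
    net_income=210.0, permanent_employee=80,
    contract_employee=15, others_employee=5,
)
```

Note that the denominator is permanent_employee, not total_employee, which follows the formula given above.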
industry_breakdown
Guidebook for idx_company_customer
Breakdown of customers: understand it as WHO the business makes money from. Preferably a list of companies (with or without stock ticker). Otherwise, use the segment or geographical location.
- client_name
- Name Formatting:
- Pte Ltd, Pte. Ltd., Private Limited: write as Pte Ltd
- Sdn Bhd, Sdn. Bhd., Sendirian Berhad: write as Sdn Bhd
- PT PLN (Persero)
- if department, less than 25 characters
- if listed, match company name in the db
- client_ticker (XXXX.XX)
- If client information is not available, use operating segment information
- nature of business: palm oil, wood, rental, service, etc.
- customer type: Local/Export, etc.
- geographical location: Java, Sumatra, Kalimantan, etc.
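The client_name suffix rules in this guidebook (Pte Ltd and Sdn Bhd variants) could be applied with a small normalizer like the one below. This is a minimal sketch: the function name, the regular expressions, and the sample company names are hypothetical, and it only covers the two suffix families listed above.

```python
import re

# Each rule maps the known spelling variants of a suffix to its standard form.
_SUFFIX_RULES = [
    (re.compile(r"\b(?:Pte\.?\s*Ltd\.?|Private\s+Limited)(?!\w)", re.IGNORECASE),
     "Pte Ltd"),
    (re.compile(r"\b(?:Sdn\.?\s*Bhd\.?|Sendirian\s+Berhad)(?!\w)", re.IGNORECASE),
     "Sdn Bhd"),
]


def normalize_client_name(name):
    """Collapse whitespace and standardize Pte Ltd / Sdn Bhd suffixes."""
    cleaned = " ".join(name.split())
    for pattern, standard_form in _SUFFIX_RULES:
        cleaned = pattern.sub(standard_form, cleaned)
    return cleaned


# Hypothetical client names, for illustration only.
examples = [
    normalize_client_name("Acme Trading Pte. Ltd."),
    normalize_client_name("Acme  Trading Private Limited"),
    normalize_client_name("Maju Jaya Sendirian Berhad"),
    normalize_client_name("PT PLN (Persero)"),
]
```

Names without either suffix, such as PT PLN (Persero), pass through unchanged apart from whitespace cleanup. Matching listed companies against the database would still need a separate lookup.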