Github Repo and Actions
This page lists all GitHub repos and actions related to Sectors.
Update the descriptions for subsectors in the `idx_subsector_metadata` table. The descriptions are generated by combining the basic description of each subsector (stored in `sub_sector_description.json`) and the top companies (based on market cap) in each subsector (retrieved from a Supabase function: `get_top_mcap_by_subsector`).
Name: Update subsector description monthly
- Description: Update the subsector descriptions in the `idx_subsector_metadata` table
- Schedule: cron: '0 0 1 * *' (7:00 AM GMT+7 on the first day of every month)
Fetches the current USD to IDR exchange rate from the ExchangeRate-API, then saves the exchange rate along with the current datetime into a JSON file (conversion_rate.json).
Name: Update Conversion Rate Daily
- Description: Update the content of `conversion_rate.json`
- Schedule: cron: '0 18 * * *' (1:00 AM GMT+7 every day)
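A minimal sketch of what this updater might do, assuming the keyless ExchangeRate-API endpoint and a simple `conversion_rate.json` layout (both the URL and the key names are assumptions):

```python
import json
from datetime import datetime

import requests

# Assumed endpoint; the action may use a keyed ExchangeRate-API URL instead.
URL = "https://open.er-api.com/v6/latest/USD"

resp = requests.get(URL, timeout=30)
resp.raise_for_status()
idr_rate = resp.json()["rates"]["IDR"]

# Save the rate together with the retrieval time, as described above.
with open("conversion_rate.json", "w") as f:
    json.dump({"rate": idr_rate, "datetime": datetime.now().isoformat()}, f)
```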
Fetch the company description from https://markets.ft.com/data/equities/tearsheet/profile?s=XXXX:JKT (where XXXX is the company ticker, e.g. BBCA) and save it to a JSON file (`companies_desc.json`).
Name: update new companies every 1 week
- Description: Update the company descriptions of new companies in `companies_desc.json`
- Schedule: cron: '0 0 * * 0' (7:00 AM GMT+7 every Sunday)
Name: update all companies every 6 months
- Description: Update the company descriptions of all companies in `companies_desc.json`
- Schedule: cron: '0 0 1 */6 *' (7:00 AM GMT+7 on the 1st day of every 6th month)
Fetch the stock split data from here and the reverse split from here.
Name: Check and Update Stock Split Weekly
- Description: Check data on stock splits and upsert it to the database if new data exists
- Schedule: cron: '0 0 * * *' (7:00 AM GMT+7 daily)
Fetch the stock dividend data from sahamidx.com.
Fetch the upcoming stock dividend data from [investing.com](https://www.investing.com/dividends-calendar) through its https://www.investing.com/dividends-calendar/Service/getCalendarFilteredData endpoint.
Name: Check and Update Dividend Weekly
- Description: Check data on dividends for the last 7 days and upsert it to the database if new data exists
- Schedule: cron: '0 18 * * 5' (1:00 AM UTC+7 every Saturday)
Name: Scrape Upcoming Dividend Data
- Description: Collect dividend data for the upcoming 2 weeks and upsert it to the database, rewriting existing data in case revisions occurred
- Schedule: cron: '0 18 * * 0,3' (1:00 AM UTC+7 every Wednesday and Sunday)
Retrieve necessary data for the `idx_company_profile` table using Selenium.
Source of data: IDX (e.g. https://www.idx.co.id/en/listed-companies/company-profiles/BBCA)
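A hedged sketch of the Selenium retrieval described above; the profile URL pattern comes from the source line, while the wait strategy and parsing are placeholders for the repo's actual logic:

```python
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def fetch_profile_page(ticker: str) -> str:
    """Load one IDX company profile page and return its rendered HTML."""
    options = Options()
    options.add_argument("--headless=new")  # CI runners have no display
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(f"https://www.idx.co.id/en/listed-companies/company-profiles/{ticker}")
        time.sleep(5)  # crude wait for JS-rendered content; real code would use explicit waits
        return driver.page_source
    finally:
        driver.quit()

html = fetch_profile_page("BBCA")  # extracting the shareholders fields is repo-specific
```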
Name: Scrape Shareholders Data
- Description: Scrape shareholders data from source.
- Schedule: '0 0 1 * *' # At 00:00 on day-of-month 1.
Name: Handling Null Shareholders Data
- Description: Fill in null data that has not yet been gathered by the monthly scraper.
- Schedule: '0 8 * * 1' # At 08:00 on Monday.
Update data from the yfinance Python library for the following tables:
- idx_daily_data
- idx_key_stats
- idx_financials_quarterly
- idx_financials_annual
- idx_ipo_perf
Name: Update idx_financials_annual with YFDataUpdater
- Description: Update annual financial data from the YF API. To be used in the `sectors_etl_pipeline` workflow.
- Schedule: workflow_dispatch
Name: Update idx_financials_quarterly with YFDataUpdater
- Description: Update quarterly financial data from the YF API. To be used in the `sectors_etl_pipeline` workflow.
- Schedule: workflow_dispatch
Name: Update idx_key_stats with YFDataUpdater
- Description: Update idx_key_stats data (`forward_eps`, `recommendation_mean`, `employee_num`) from the YF API.
- Schedule: cron: '0 13 10 * *' (runs on the 10th day of every month at 20:00 GMT+7)
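The three idx_key_stats fields map naturally onto yfinance's `info` dictionary (Yahoo-side keys `forwardEps`, `recommendationMean`, and `fullTimeEmployees`); a minimal sketch, assuming this is roughly what YFDataUpdater does for this table:

```python
import yfinance as yf

def fetch_key_stats(symbol: str) -> dict:
    """Pull the idx_key_stats fields for one IDX ticker, e.g. 'BBCA.JK'."""
    info = yf.Ticker(symbol).info
    return {
        "forward_eps": info.get("forwardEps"),
        "recommendation_mean": info.get("recommendationMean"),
        "employee_num": info.get("fullTimeEmployees"),
    }

print(fetch_key_stats("BBCA.JK"))
```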
Name: Update idx_ipo_perf
- Description: Update null idx_ipo_perf data from the YF API. Calculations also adjust for stock splits using data from the `idx_stock_split_cumulative` view in the database.
- Schedule: cron: '0 13 10 * *' (runs on the 10th day of every month at 20:00 GMT+7)
Update data from the WSJ website for the following tables:
- idx_financials_quarterly
- idx_financials_annual
- idx_company_profile (update wsj_format)
Name: Sectors WSJ Data Updater (Annually)
- Description: Run the source and format checker, then update annual financial data from WSJ. To be used in the `sectors_etl_pipeline` workflow.
- Schedule: workflow_dispatch
Name: Sectors WSJ Data Updater (Quarterly)
- Description: Run the source and format checker, then update quarterly financial data from WSJ. To be used in the `sectors_etl_pipeline` workflow.
- Schedule: workflow_dispatch
Wrap the YF Data Updater and WSJ Data Updater actions into one pipeline.
Name: Workflow of Annually Data Update
- Description: Update the annual financial data in `idx_financials_annual` by running the following actions:
  - Sectors WSJ Data Updater (Annually) in 'sectors_wsj_data_updater'
  - Sectors YF Data Updater (Annually) in 'sectors_yf_data_updater'
- Schedule: cron: '0 0 7,21 1-3 *' (every 7th and 21st day of each month for months 1-3) and '0 0 7 4-12 *' (every 7th day of each month for months 4-12)
Name: Workflow of Quarterly Data Update
- Description: Update the quarterly financial data in `idx_financials_quarterly` by running the following actions:
  - Sectors WSJ Data Updater (Quarterly) in 'sectors_wsj_data_updater'
  - Sectors YF Data Updater (Quarterly) in 'sectors_yf_data_updater'
- Schedule: cron: '0 0 7,21 * *' (every 7th and 21st day of each month)
Name: Calculate Intrinsic Value - Discounted Cash Flow
- Description: Calculate the intrinsic value of a stock quarterly
- Schedule: set to workflow_dispatch (can be run manually), but integrated into 'sectors_etl_pipeline' to run quarterly
Name: Sectors Forecast Growth Rate -- on hold
- Description: Update all data in `idx_company_forecast` based on the Analysis tab in Yahoo Finance (https://finance.yahoo.com/quote/AALI.JK/analysis?p=AALI.JK)
- Schedule: set to workflow_dispatch (can be run manually), but integrated into 'sectors_etl_pipeline' to run quarterly
Name: Update Trading View Forecast
- Description: Update all data in `idx_company_forecast` based on the forecast tab on TradingView (https://www.tradingview.com/symbols/IDX-TPIA/forecast/)
- Schedule: run on the first day of every quarter
Retrieve new upcoming IPO data from https://e-ipo.co.id/en/ipo/closed and output it as a JSON file named upcoming_ipo.json.
Update all data in the `idx_esg_score` table using Selenium.
This GitHub Action has to be run manually once every 1-2 months. Steps to run it:
- Go to: https://www.sustainalytics.com/esg-rating/
- Choose one company
- Fill out the form to open the controversy rating section
- Open cookies in the browser console and copy the `ratingsvm` cookie value
- Open the GitHub Action environment variables and edit the COOKIES variable to be the copied value
- Run `main.py`
Retrieve the logo of each newly listed company, process the image (cropping and removing the background), store the image in WebP format in Google Cloud Storage, and update the no-logo status in Supabase (specifically the 'nologo' column in the idx_company_profile table).
Retrieve worldwide market cap data from the WFE and output it as a JSON file named stock_exchanges_by_market_cap.json.
Name: Run Main.py
- Command: python main.py
- Description: Scrape market cap data from the source, store it in a temporary .txt file, and process the data to update the .json result file
- Schedule: '0 0 3 * *' # Runs on the 3rd day of each month
Name: Run Main.py Manually
- Command: python main.py no-scrape
- Description: Skip the scraping process; read the data from the temporary .txt file and process it to update the .json result file
- Schedule: executed manually only if the scheduled action fails
Obtain institutional transactions (buy and sell) data from RapidAPI and store the data in `idx_institution_transactions`.
Name: Update idx_institution_transactions with InstitutionTransactionsUpdater
- Description: Update new monthly data in `idx_institution_transactions`.
- Schedule: cron: '0 13 10 * *' # (10th day of every month at 20:00 GMT+7, still tentative)
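A sketch of what the RapidAPI call might look like. The host and path below are placeholders (the wiki does not name the actual API); only the two authentication headers are standard RapidAPI conventions:

```python
import os

import requests

# Placeholder host and path; substitute the real RapidAPI product used by the updater.
url = "https://example-idx-data.p.rapidapi.com/institution-transactions"
headers = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],
    "X-RapidAPI-Host": "example-idx-data.p.rapidapi.com",
}

resp = requests.get(url, headers=headers, params={"month": "2024-07"}, timeout=30)
resp.raise_for_status()
rows = resp.json()  # upserted into idx_institution_transactions by the updater
```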
Generate a description for each sub-sector index. Three indexes are generated: P/E Value, Health & Resilience Index, and Growth Index. The LLM generator currently uses the GPT-3.5 Turbo model. The results are stored directly in the 'idx_subsector_metadata' table in Supabase, in the pe_index_description, health_index_description, and growth_index_description columns (for the P/E value, health & resilience index, and sector growth descriptions respectively).
Name: Update sub-sector index description data
- Description: Update the pe_index_description, health_index_description, and growth_index_description columns in idx_subsector_metadata once a week
- Schedule: cron: '0 1 * * 1' (runs every Monday at 8 AM, Western Indonesia Time)
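A hedged sketch of the generation step, using the OpenAI Python client with gpt-3.5-turbo as stated above; the prompt wording, the helper name, and the mapping to the Supabase columns are assumptions:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_index(sub_sector: str, index_name: str, value: float) -> str:
    """Ask the model for a short description of one sub-sector index value."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write a one-paragraph description of the {index_name} "
                       f"({value}) for the {sub_sector} sub-sector.",
        }],
    )
    return response.choices[0].message.content

pe_text = describe_index("banks", "P/E Value", 12.4)  # -> pe_index_description column
```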
Update the price and volume data of stocks flagged as anomalous by the get_anomaly function in Supabase. The updater uses the Yahoo Finance API, and a record of the changes is saved in the update_daily_data.log file.
Name: Update Anomaly Stock Price
- Description: Update the price and volume of stocks with anomalous changes.
- Schedule: '0 2 * * 1' # Run every Monday at 9 AM (Western Indonesia Time)
Name: Null Price and Volume Handler
- Description: Update prices and volumes that have null values.
- Schedule: '0 14 * * 1-5' # Run every weekday at 9 PM (Western Indonesia Time)
Name: Incomplete Stock and Null Market Cap Handler
- Description: Insert data into the database if the number of stocks in the daily data is lower than the number of active stocks in idx_company_profile, and calculate the market cap for stocks with a null market cap (by taking the previous day's number of outstanding shares and multiplying it by the latest close price)
- Schedule: '0 3 * * 1-6' # Run every Monday-Saturday at 10:00 AM (Western Indonesia Time)
Scrape news articles from sources. Included in `/scripts` is a way to post news to an endpoint server that stores the labeled news articles. Adding a source follows the same template class, with an override of the `extract_news` method, as sketched below.
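In outline, adding a source means subclassing the shared template and overriding `extract_news`. A minimal sketch; apart from `extract_news`, the class and method names here are hypothetical:

```python
class NewsScraper:
    """Hypothetical template class shared by all sources."""

    def extract_news(self) -> list[dict]:
        raise NotImplementedError  # each source overrides this

    def run(self) -> list[dict]:
        articles = self.extract_news()
        # shared labeling and posting logic lives here in the real scripts
        return articles

class IdnFinancialsScraper(NewsScraper):
    def extract_news(self) -> list[dict]:
        # source-specific fetching and parsing goes here
        return [{"title": "...", "body": "...", "source": "idnfinancials"}]
```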
Name: Scrape News and Commit Results
- Description: Scrape news articles from sources with a proxy.
- Schedule: '0 3 3 * *' # Runs at 10 AM UTC+7 (WIB) on every 3rd day of the month
Name: Scrape News and Submit to Database
- Description: Scrape news from `idnfinancials`, `petromindo`, `indonesiancoalandnickel`, and `gapki`, then commit it to the database through the endpoint
- Schedule: '0 3 * * *' # Runs every day at 10 AM UTC+7 (WIB)
Endpoint server to store news articles. News articles posted should include source, subsector, and news timestamp, and optionally title, body, tags, and tickers. Deployed on https://sectors-news-endpoint.fly.dev/
| Endpoint | Methods | Description |
|---|---|---|
| `/articles` | `GET`, `POST`, `DELETE`, `PATCH` | `GET`: returns all news articles. `POST`: posts one news article. `DELETE`: deletes the articles in `id_list`. `PATCH`: updates an article |
| `/articles?subsector=NAME` | `GET` | Returns all news for the NAME subsector |
| `/articles?id=ID` | `GET` | Returns the news article with id = ID |
| `/articles/list` | `POST` | Posts a list of news articles |
| `/logs` | `GET` | Returns the log list |
| `/pdf` | `POST` | Posts an IDX-format PDF for insider trading; returns an article inferenced from the PDF |
| `/pdf/post` | `POST` | Posts the result of the inferenced article after review |
| `/insider-trading` | `GET`, `POST`, `DELETE`, `PATCH` | `GET`: returns all insider trading filings. `POST`: posts a non-IDX-format insider trading news item. `DELETE`: deletes the filings in `id_list`. `PATCH`: updates an insider trading filing |
| `/url-article` | `POST` | Posts a URL and timestamp; returns the result inferenced with an LLM |
| `/evaluate-article` | `POST` | Posts an article and returns the article's score |
Article:
- id: integer, primary key, auto-generated
- title: string, can be auto-generated
- body: string, can be auto-generated
- source: string
- timestamp: datetime, format YYYY-MM-DD HH:MM:SS
- sector: string, auto-generated
- sub_sector: string
- tags: list of string
- tickers: list of string
Filing:
- id: integer, primary key, auto-generated
- title: string, auto-generated
- body: string, auto-generated
- source: string
- timestamp: datetime, format YYYY-MM-DD HH:MM:SS
- sector: string, auto-generated
- sub_sector: string
- tags: list of string
- tickers: list of string
- transaction_type: string, auto-generated
- holder_type: string
- holding_before: integer
- holding_after: integer
- amount_transaction: integer, auto-generated
- holder_name: string
- price: integer
- transaction_value: integer, auto-generated on Supabase
- price_transaction: JSON object (PriceTransaction)
PriceTransaction:
- price: list of float
- amount_transacted: list of float
Article POST
{
"title": "", # Optional
"body": "", # Optional
"source": "",
"timestamp": "YYYY-MM-DD HH:MM:SS",
"sub_sector": "logistics-deliveries" # Example subsector
}
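For illustration, posting one article to the deployed endpoint with Python `requests` might look like this (any authentication the server requires is omitted; the payload mirrors the Article POST body above):

```python
import requests

payload = {
    "source": "https://example.com/some-news-item",  # placeholder URL
    "timestamp": "2024-07-04 20:57:00",
    "sub_sector": "logistics-deliveries",
}

resp = requests.post("https://sectors-news-endpoint.fly.dev/articles", json=payload)
print(resp.status_code, resp.json())
```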
Article PATCH
{
"id": 1,
"title": "",
"body": "",
"source": "",
"timestamp": "YYYY-MM-DD HH:MM:SS",
"sub_sector": "logistics-deliveries" # Example subsector
}
DELETE on /articles or /insider-trading
{
"id_list": []
}
PDF POST
PDF: form-data
- file (file): PDF file
- source (text): URL
- sub_sector (text): sub_sector (optional, can be posted after `pdf/post`)
- holder_type (text): "insider" or "institution" (optional, can be posted after `pdf/post`)
PDF POST Return
{
"amount_transaction": 7034700,
"body": "On July 4, 2024, PT Paraga Artamida, an insider, executed a buy transaction involving 7,034,700 shares of PT Bumi Serpong Damai Tbk, increasing its holding from 8,503,286,664 to 8,510,303,364 shares. This transaction represents a change in ownership, with PT Paraga Artamida expanding its stake in PT Bumi Serpong Damai Tbk. The purpose of the transaction is not explicitly stated, but it may indicate confidence in the company's prospects or a strategic move to increase influence. This transaction may have implications for the company's ownership structure and potentially impact its future decisions and operations.",
"holder_name": "PT Paraga Artamida",
"holder_type": "",
"holding_after": 8510303364,
"holding_before": 8503268664,
"sector": "",
"source": "https://www.idx.co.id/StaticData/NewsAndAnnouncement/ANNOUNCEMENTSTOCK/From_EREP/202407/a90ab57ce2_35e109818f.pdf",
"sub_sector": "",
"tags": [
"Buyback",
"IDX",
"Executive Changes",
"Market Sentiment",
"Bullish",
"insider-trading"
],
"tickers": [
"BSDE.JK"
],
"timestamp": "2024-07-04 20:57:00",
"title": "PT Paraga Artamida Buy Transaction of PT Bumi Serpong Damai Tbk.",
"transaction_type": "buy",
"price": 100, # auto-generated from "price_transaction"
"transaction_value": 1000, # auto-generated from "price_transaction"
"price_transaction" : {
"price" : [100, 150, 200],
"amount_transacted" : [600, 700, 800],
}
}
Insider trading POST
{
"document_number": "nomor_surat",
"company_name": "nama_perusahaan",
"holder_name": "nama_pemegang_saham",
"source": "link_pdf",
"ticker": "ticker", // optional
"category": "category",
"control_status": "status_kontrol_saham",
"holding_before": 0, // make sure no , or .
"holding_after": 100, // make sure no , or .
"sub_sector": "banking",
"purpose": "tujuan_transaksi",
"date_time": "2024-06-12 12:31:00",
"holder_type": "insider" // can be "insider" or "institution",
"price_transaction" : {
"price" : [100, 150, 200],
"amount_transacted" : [600, 700, 800],
}
}
Insider trading PATCH
{
"amount_transaction": 7034700,
"body": "body",
"holder_name": "nama_pemegang_saham",
"created_at": "2024-07-05T11:28:54.979669+00:00",
"holder_type": "institution", // institution or insider
"holding_after": 8510303364,
"holding_before": 8503268664,
"id": 12,
"sector": "properties-real-estate",
"source": "url",
"sub_sector": "properties-real-estate",
"tags": [
"insider-trading"
],
"tickers": [
"BSDE"
],
"timestamp": "2024-07-04T20:57:00",
"title": "title",
"transaction_type": "buy",
"price": 3, # auto-generated from "price_transaction"
"transaction_value" : 3000, # auto-generated from "price_transaction"
"price_transaction" : {
"price" : [100, 150, 200],
"amount_transacted" : [600, 700, 800],
}
}
URL Article POST
{
"source": "",
"timestamp": "YYYY-MM-DD HH:MM:SS"
}
Article insert response if duplicate
{
"id_duplicate": 204,
"message": "Insert failed! Duplicate source",
"status": "restricted",
"status_code": 400
}
Scrape analyst ratings and technical ratings from TradingView. The scraping uses multiple processes, with all the output data stored inside `/data` as .json files. All the files are combined and pre-processed before being upserted into the database. The main files of the technical rating and analyst rating scrapers are separate. Since the technical rating needs to be scraped over different periods, there is an optional argument for running the technical rating scraper.
python technical_main.py <ARGS>
# ARGS = [daily, weekly, monthly]. DEFAULT = daily
# EXAMPLE:
python technical_main.py monthly
python technical_main.py # Will be executed as "python technical_main.py daily"
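For illustration, the optional period argument could be handled with plain `sys.argv` parsing, as in this sketch (the actual implementation in `technical_main.py` may differ):

```python
import sys

VALID_PERIODS = ("daily", "weekly", "monthly")

# Default to "daily" when no argument is supplied, matching the usage above.
period = sys.argv[1] if len(sys.argv) > 1 else "daily"
if period not in VALID_PERIODS:
    sys.exit(f"Unknown period {period!r}; expected one of {VALID_PERIODS}")

print(f"Scraping technical ratings on the {period} period...")
```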
On the other hand, the analyst rating scraper takes no additional argument.
python analyst_main.py
Name: Scrape Analyst Rating
- Description: Scrape analyst rating from source.
- Schedule: '0 13 2 * *' Runs at 13:00 on the second day of each month
Name: Scrape Technical Rating Daily
- Description: Scrape technical rating from source on the daily period
- Schedule: '0 10 * * *' Runs at 10 am every day
Name: Scrape Technical Rating Weekly (Disabled)
- Description: Scrape technical rating from source on the weekly period
- Schedule: '0 11 * * 1' Runs at 11 am on Monday
Name: Scrape Technical Rating Monthly (Disabled)
- Description: Scrape technical rating from source on the monthly period
- Schedule: '0 12 2 * *' Runs at 12 pm on the second day of each month
Scrape SGX and KLSE closing price, volume, market cap, and financial data. All the data is stored in the Supabase database in the sgx_companies and klse_companies tables.
Name: Update sgx_klse monthly financial data from yahoo finance
- Description: Update SGX and KLSE financial data from Yahoo Finance
- Schedule: '0 0 1 * *' # Run on the 1st day of every month
Name: Update sgx_klse weekly financial data from yahoo finance
- Description: Update SGX and KLSE financial data from Yahoo Finance
- Schedule: '0 1 * * 1' # Run every Monday at 8 am (Western Indonesia Time)
Name: Update sgx_companies and klse_companies monthly data
- Description: Update SGX and KLSE sector and subsector data from the official pages; scrape from TradingView for the data that is not available on the official pages.
- Schedule: '0 0 1 */6 *' # Run on the first day of every 6th month at 8 am (Western Indonesia Time)
Collect the list of companies in IDX indices and the FTSE Global Index. Also scrape all the indexes listed in the Sectors app from Yahoo Finance and store them in Supabase in the index_daily_data table.
Name: Update index daily price
- Description: Update FTSE and several IDX indexes every day
- Schedule: '0 11 * * 1-5' # Run every weekday at 6 pm (Western Indonesia Time)
Process company revenue breakdowns from annual reports. The script is run manually by supplying an .xlsx file, which is stored in Supabase in the `idx_company_customer` table.
Process company annual reports into metrics and sankey components (in JSON format). The script is run manually by supplying an .xlsx file, which is stored in Supabase in the `idx_manual_input` table. The script groups and sums breakdown entries of the `Other*` form into a single `Others` entry in the sankey components. Data processing checks the difference threshold in the breakdown summary and ensures expense breakdown entries have negative values in the input data (no positive expenses). To run the script, use the USD-IDR conversion rate of the last working day of the financial year.
Process SGX daily short sell transactions. The transaction data comes from the SGX website and is scraped using the Python requests package. The script also processes the data to get the symbol for each company name (the company names in the original data are not standardized).
Name: Update daily sgx short sell transaction
- Description: Update SGX short sell transactions daily
- Schedule: '0 14 * * 1-5' # Run every weekday at 9 pm (Western Indonesia Time)
Get Sectors closed IPO data to update the IPO price of the latest closed-IPO companies. This repository gets all the symbols of companies with a null IPO price in the database, then retrieves the data from the first page of the e-IPO closed IPO page and updates the null data in the database. This action updates the `idx_company_profile` and `idx_ipo_details` tables.
Name: Update ipo price for latest closed ipo company
- Schedule: "0 0 * */3 *" # run every 3 days
This repository calls the API to send watchlist notifications to Standard/Pro users that enable their notifications.
Name: Run Watchlist Notification
- Schedule: '0 5 * * 1' # Every Monday at 12 pm WIB (5 am UTC)
This repository calls the API to send the Sectors newsletter to the newsletter subscribers.
Name: Run Sectors Newsletter
- Schedule: '0 2 * * 1-5' # Every Monday-Friday at 9 am WIB (2 am UTC)
This repository fetches the price for each stock in these 4 conditions: All Time High, All Time Low, 52 Week High, and 52 Week Low.
Name: All Time Price Data Fetching
- Schedule: '0 3 * * 1-5' # Run every weekday at 10am (Western Indonesia Time)
This program consists of a sequential process, from getting financial data from IDX's API to processing and extracting the information from the Excel files.
There are two ways to run the main program:
- Automatic start
# Command:
python main.py {BATCH}
# Example:
python main.py 1
python main.py all
Description: Automatically scrape and extract based on the current time. Interval: (3*30 + 7) days for annual data, (30 + 7) days for quarterly data.
If it runs in:
- Month 3: get annual and Q4 data of last year
- Month 5: get Q1 of the current year
- Month 8: get Q2 of the current year
- Month 11: get Q3 of the current year
BATCH = [1, 2, 3, 4, all]
- 1 => first quarter of the tickers in the database
- 2 => second quarter of the tickers in the database
- 3 => third quarter of the tickers in the database
- 4 => last quarter of the tickers in the database
- all => scrape for all tickers in the database
- Manual start
# Command:
python main.py {BATCH} {YEAR} {PERIOD}
# Example:
python main.py 1 2024 tw1 # Scrape and extract Q1 of 2024 for batch 1
Description: the program requires two additional parameters: {YEAR} and {PERIOD}
- YEAR => year to be scraped
- PERIOD => see PERIOD_LIST in idx_utils.py
- BATCH = [1, 2, 3, 4, all] (see the batch-selection sketch after this list)
- 1 => first quarter of the tickers
- 2 => second quarter of the tickers
- 3 => third quarter of the tickers
- 4 => last quarter of the tickers
- all => scrape for all tickers in the database
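Illustratively, the BATCH argument could partition the ticker list into contiguous quarters, as in this sketch (an assumption about how the batches are derived; the real helper may differ):

```python
def select_batch(tickers: list[str], batch: str) -> list[str]:
    """Return the requested quarter of the ticker list, or all of it."""
    if batch == "all":
        return tickers
    i = int(batch)  # 1..4
    size = -(-len(tickers) // 4)  # ceiling division so no ticker is dropped
    return tickers[(i - 1) * size : i * size]

select_batch(["AALI", "BBCA", "BBRI", "BMRI", "TLKM"], "2")  # -> ['BBRI', 'BMRI']
```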
Name: Batch 1 Process
- Description: Processing and extracting financial data for the first quarter of tickers in the database.
- Schedule: '0 0 7,14,21,28 3,5,8,11 *' # “At 00:00 on day-of-month 7,14,21,28 in March, May, August, and November.”
Name: Batch 2 Process
- Description: Processing and extracting financial data for the second quarter of tickers in the database.
- Schedule: '0 0 8,15,22,29 3,5,8,11 *' # “At 00:00 on day-of-month 8,15,22,29 in March, May, August, and November.”
Name: Batch 3 Process
- Description: Processing and extracting financial data for the third quarter of tickers in the database.
- Schedule: '0 0 9,16,23,30 3,5,8,11 *' # “At 00:00 on day-of-month 9,16,23,30 in March, May, August, and November.”
Name: Batch 4 Process
- Description: Processing and extracting financial data for the last quarter of tickers in the database.
- Schedule: '0 0 10,17,24,31 3,5,8,11 *' # “At 00:00 on day-of-month 10,17,24,31 in March, May, August, and November.”
Name: Jisdor Updater
- Description: Getting USD to IDR rate on certain dates
- Schedule:
- '0 0 31 3 *' # "At 31 March"
- '0 0 30 6 *' # "At 30 June"
- '0 0 30 9 *' # "At 30 September"
- '0 0 31 12 *' # "At 31 December"