Data Schema for Input Parameters and Generated Data Set - IBM/AMLSim GitHub Wiki
To generate a data set with AMLSim, first prepare the input parameter files that the simulator reads. This page specifies the format of those input files, followed by the format of the generated output data set.
Input (Parameter) Files
Account List (accounts.csv)
- count: Number of accounts
- min_balance: Minimum initial balance
- max_balance: Maximum initial balance
- start_day: The day when the account is opened
- end_day: The day when the account is closed
- country: Alpha-2 country code
- business_type: Business type
- suspicious: Suspicious account or not (currently unused)
- model: Account behavior model ID (see also AbstractTransactionModel.java)
  - 0: Single
  - 1: Fan-out
  - 2: Fan-in
  - 3: Mutual
  - 4: Forward
  - 5: Periodical
- bank_id: Bank ID (string type) to which these accounts belong (optional; the default value is an empty string, or it can be defined in conf.json)
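As a rough illustration of the account list format, the following sketch checks rows of an accounts.csv parameter file before running the simulator. The sample values, the column order, and the rule that min_balance must not exceed max_balance are assumptions made for this example. The helper `check_account_row` is hypothetical and not part of AMLSim.

```python
import csv
import io

# Behavior model IDs as listed in the account list specification.
MODEL_NAMES = {
    0: "Single", 1: "Fan-out", 2: "Fan-in",
    3: "Mutual", 4: "Forward", 5: "Periodical",
}

def check_account_row(row):
    """Return a list of problems found in one accounts.csv parameter row.

    Hypothetical helper: the checks are assumptions, not AMLSim rules.
    """
    problems = []
    if float(row["min_balance"]) > float(row["max_balance"]):
        problems.append("min_balance exceeds max_balance")
    if int(row["model"]) not in MODEL_NAMES:
        problems.append("unknown model ID")
    return problems

# Illustrative sample content only; the second row is deliberately invalid.
SAMPLE = """count,min_balance,max_balance,start_day,end_day,country,business_type,suspicious,model,bank_id
100,1000,5000,0,365,US,I,false,1,bank_a
50,9000,2000,0,365,US,I,false,7,bank_b
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
results = [check_account_row(r) for r in rows]
for row, problems in zip(rows, results):
    print(row["bank_id"], problems or "ok")
```

A check like this can catch typos in the parameter file early, but the simulator itself remains the authority on what it accepts.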
Degree Distribution List (degree.csv)
This CSV file has three columns with header names: Count, In-degree and Out-degree.
Each CSV row indicates how many account vertices with certain in(out)-degrees should be generated.
Here is an example of degree.csv.
Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
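The example is consistent with reading each row as a degree value in the first column, followed by the number of vertices with that in-degree and the number of vertices with that out-degree. The rows above then describe 2 + 1 + 2 = 5 vertices and 0×2 + 1×1 + 2×2 = 5 edges. From this parameter file, the transaction graph generator generates a directed graph with five vertices (accounts) and five edges. Two of the five vertices have no outgoing edges and two of the five vertices have no incoming edges (these may be the same two vertices).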
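The vertex and edge counts for the example can be checked with a short sketch. It assumes that the first column holds a degree value and that the other two columns hold the number of vertices with that in-degree and out-degree, which is the reading that matches the five vertices and five edges stated above. The function `summarize_degrees` is a hypothetical helper, not part of AMLSim.

```python
import csv
import io

DEGREE_CSV = """Count,In-degree,Out-degree
0,2,2
1,1,1
2,2,2
"""

def summarize_degrees(text):
    """Summarize a degree.csv file under the assumed column interpretation."""
    summary = {"vertices": 0, "edges": 0, "no_incoming": 0, "no_outgoing": 0}
    out_vertices = 0
    out_edges = 0
    for row in csv.DictReader(io.StringIO(text)):
        degree = int(row["Count"])      # assumed: the degree value
        n_in = int(row["In-degree"])    # assumed: vertices with this in-degree
        n_out = int(row["Out-degree"])  # assumed: vertices with this out-degree
        summary["vertices"] += n_in
        summary["edges"] += degree * n_in
        out_vertices += n_out
        out_edges += degree * n_out
        if degree == 0:
            summary["no_incoming"] = n_in
            summary["no_outgoing"] = n_out
    # In a directed graph, the in-degree and out-degree totals must agree.
    if out_vertices != summary["vertices"] or out_edges != summary["edges"]:
        raise ValueError("in-degree and out-degree columns are inconsistent")
    return summary

summary = summarize_degrees(DEGREE_CSV)
print(summary)
```

The consistency check reflects a general property of directed graphs: every edge contributes one to some vertex's out-degree and one to some vertex's in-degree, so both totals must be equal.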
AML Typology List (alertPatterns.csv)
- count: Number of typologies (transaction sets)
- type: Name of the transaction type (fan_in, fan_out, cycle, ...) as the AML typology
- schedule_id: Transaction scheduling ID of the typology
  - 0: All member accounts send money in order with the same interval (number of days)
  - 1: All member accounts send money in order with random intervals
  - 2: All member accounts send money randomly
- min_accounts: Minimum number of involved accounts
- max_accounts: Maximum number of involved accounts
- min_amount: Minimum initial transaction amount
- max_amount: Maximum initial transaction amount
- min_period: Minimum overall transaction period (number of days)
- max_period: Maximum overall transaction period (number of days)
- bank_id: Bank ID to which member accounts belong (optional: if empty, there is no restriction on the bank ID)
- is_sar: Whether the alert is a SAR (True) or a false alert (False)
Transaction Type List (transactionType.csv)
This CSV file has two columns with header names: Type (transaction type name) and Frequency (relative frequency).
We currently support the following types: WIRE, CREDIT, DEPOSIT, ACH and TRANSFER.
However, since we don't have real data, we only generate transactions of the WIRE type.
To avoid confusing AMLSim users, we simply use "TRANSFER" as the default transaction type.
"TRANSFER" is just a symbolic name indicating that someone sends money to someone else.
Here is an example of transactionType.csv.
Type, Frequency
WIRE,5
CREDIT,10
DEPOSIT,10
In this case, a WIRE transaction appears with a probability of 20% (5 / (5+10+10) = 0.2).
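The same normalization applies to every type: each relative frequency is divided by the sum of all frequencies. The sketch below illustrates that arithmetic and a weighted draw using Python's standard library. It is not AMLSim's own sampling code, and `type_probabilities` is a hypothetical helper.

```python
import random

def type_probabilities(frequencies):
    """Convert relative frequencies into probabilities that sum to 1."""
    total = sum(frequencies.values())
    return {name: freq / total for name, freq in frequencies.items()}

frequencies = {"WIRE": 5, "CREDIT": 10, "DEPOSIT": 10}
probs = type_probabilities(frequencies)
print(probs)

# Draw a few transaction types using the relative frequencies as weights.
rng = random.Random(0)  # fixed seed so repeated runs give the same draws
names = list(frequencies)
weights = [frequencies[name] for name in names]
draws = rng.choices(names, weights=weights, k=10)
print(draws)
```

Because only the ratios matter, frequencies of 1, 2 and 2 would give the same probabilities as 5, 10 and 10.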
Output Files
The result data is written as several CSV files under the output directory.
- Account list (accounts.csv)
- Transaction list (transactions.csv)
- Alert account list (alert_accounts.csv)
- Alert transaction list (alert_transactions.csv)
Output data schema definition
The data schema (columns and data types) can be defined by editing the data schema definition file (schema.json) under the parameter file directory (paramFiles).
Accounts (accounts.csv)
CSV Schema (Column Names)
Note: (optional) columns will be added if the input account list file has the same column names.
- acct_id: Account ID (int)
- dsply_nm: Customer ID (string)
- type: Account type (string)
- acct_stat: Account status (string)
- acct_rptng_crncy: Default currency (string)
- prior_sar_count: Whether this account is involved in SAR transactions (boolean)
- branch_id: Bank branch ID (int)
- open_dt: Date when this account is opened
- close_dt: Date when this account is closed
- initial_deposit: Initial balance (float)
- tx_behavior_id: Transaction behavior model code (int); see also Normal transaction models and AML typology models
- bank_id: Bank ID to which this account belongs (string)
- first_name: (optional) first name of the customer (string)
- last_name: (optional) last name of the customer (string)
- street_addr: (optional) detailed address including street name (string)
- city: (optional) city name (string)
- state: (optional) state name (string)
- country: (optional) Alpha-2 country code (string)
- zip: (optional) zip code (string)
- gender: (optional) gender (string)
- birth_date: (optional) birth date (string)
- ssn: (optional) social security number (string)
- lon: (optional) longitude of the address (float)
- lat: (optional) latitude of the address (float)
Example
The latter half of the columns is omitted for brevity because those columns are optional.

Transactions (transactions.csv)
CSV Schema (Column Names)
- tran_id: Transaction ID (int)
- orig_acct: Originator account ID (int)
- bene_acct: Beneficiary account ID (int)
- tx_type: Transaction type (string)
- base_amt: Transaction amount (float)
- tran_timestamp: Simulation step when the transaction is done (int)
- is_sar: Whether this transaction is SAR (boolean)
- alert_id: ID of the alert in which this transaction is involved (int; the value is -1 if this transaction is not involved in any alert)
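To show how the alert_id convention can be used downstream, the sketch below sums transaction amounts per alert and skips rows whose alert_id is -1. The sample rows, the column order, and the textual form of the boolean values are assumptions made for illustration. The actual output depends on schema.json, and `alert_totals` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; not real AMLSim output.
SAMPLE = """tran_id,orig_acct,bene_acct,tx_type,base_amt,tran_timestamp,is_sar,alert_id
1,10,11,TRANSFER,100.0,0,False,-1
2,11,12,TRANSFER,250.5,1,True,3
3,12,10,TRANSFER,249.5,2,True,3
"""

def alert_totals(text):
    """Sum base_amt per alert_id, ignoring transactions outside any alert."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text)):
        alert_id = int(row["alert_id"])
        if alert_id == -1:  # -1 marks a transaction not involved in any alert
            continue
        totals[alert_id] += float(row["base_amt"])
    return dict(totals)

totals = alert_totals(SAMPLE)
print(totals)
```

For a real output file, the same function can be given the file's contents, for example the result of reading transactions.csv from the output directory.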
Example

Alert members (alert_accounts.csv)
- alert_id: Alert ID (int)
- alert_type: Alert type (string)
- acct_id: Account ID (int)
- acct_name: Account name (string)
- is_sar: SAR flag (boolean)
- model_id: AML typology model ID (int)
- start: Simulation step when the account is activated (int)
- end: Simulation step when the account is deactivated (int)
- schedule_id: Schedule ID of the AML typology (int)
- bank_id: Bank ID to which this account belongs (string)
Example

Alert transactions (alert_transactions.csv)
- alert_id: Alert ID (int)
- alert_type: Alert type (string)
- is_sar: Whether this alert is SAR (boolean)
- tran_id: Transaction ID (int)
- orig_acct: Originator account ID (int)
- bene_acct: Beneficiary account ID (int)
- tx_type: Transaction type (string)
- base_amt: Transaction amount (float)
- tran_timestamp: Date when the transaction is done (int)
Example
