IBM DataStage8.5 tips - joyocaowei/joyocaowei.github.io GitHub Wiki

Summary

IBM Infosphere Datastage is a data warehousing tool by IBM.

InfoSphere DataStage V8.5 的新特性

Commands for controlling InfoSphere DataStage jobs

dsjob

  • dsjob -jobinfo [useid] project job|job_id

  • dsjob -logdetail [useid] project job|job_id entry

  • dsjob -report [useid] project job|jobid [report_type]

report_type可以是BASIC, DETAIL, XML(一般使用DETAIL)

job status
    1. echo "Running" ;;
    1. echo "Finished" ;;
    1. echo "Finished (see log)" ;;
    1. echo "Failed" ;;
    1. echo "Validated" ;;
    1. echo "Validated (see log)" ;;
    1. echo "Validation failed" ;;
    1. echo "Has been reset" ;;
    1. echo "Crashed" ;;
    1. echo "Stopped" ;;
    1. echo "Not Compiled" ;;
    1. echo "Not Running" ;;
  • *) echo "Unknown status" ;;

Filter stage where clause

Supported Boolean expressions and operators

The following list summarizes the Boolean expressions that are supported. In the list, BOOLEAN denotes any Boolean expression.

  • true
  • false
  • six comparison operators: =, <>, <, >, <=, >=
  • is null
  • is not null
  • like 'abc' (the second operand must be a regular expression)
  • between (for example, A between B and C is equivalent to B <= A and A> = C)
  • not BOOLEAN
  • BOOLEAN is true
  • BOOLEAN is false
  • BOOLEAN is not true
  • BOOLEAN is not false

Any of these can be combined using AND or OR.

CHANGE CAPTURE Stage in Datastage

参考文章:http://www.databaseetl.com/change-capture-stage-datastage/

Datastage Transformer

  • 控件transformer

  • @ITERATION

    Transformer Iteration is only included in Datastage 8.5 and higher versions.
    一个关于@ITERATION的例子: Unleashing the Power in DataStage 8.5 - #1: Looping Transformer

  • @INROWNUM

    Right('00000' : @INROWNUM, 5)
    Output is like: 00001,00002,00003, ... , 00015, ...

  • Dcount

    This functions returns count of delimited fields in the string.
    Example: If the given string is ‘john,ron,harry,rowling’ then this function will give count 4 for delimiter ‘,’ as shown below.
    Dcount(link_name,’,’) = 4

  • Isvalid function

    IsValid("decimal[10,4]",input.col)
    input.col是一个varchar类型的值,如果它不是一个decimal,那么上面的函数返回0,否则函数返回1.
    若input.col为"ad c", 那么IsValid("decimal[10,4]","ad c")为0;
    若input.col为"1253.123", 那么IsValid("decimal[10,4]",input.col)为1
    然后使用StringToDecimal (input.col), 得到值为1253.1230

  • Legacy null processing

  • Datastage Substring Operator

reference: http://www.databaseetl.com/datastage-string-functions-in-transformer/

COLUMN and ROW Generator Stage in Datastage

COLUMN and ROW Generator Stage in Datastage

  • Use column generator if we want to create new column with static value in each row
  • Use modify stage if have to modify structure data or deriving it from old column input
  • Use transformer if the logic is difficult
  • Use native function from db to remove complexities in datastage if possible.

Write Routines In Datastage

There are 3 kind of routines is there in Datastage.

  • server routines will write in BASIC Language
  • parlell routines will write in C/C++ Language
  • mainframe routines which will used in mainframe jobs

Setting a parameter to projdef :

Step 1: Go to datastage adminstrator and set a user defined variable say "File_Path" and its value. Click on ok and exit.

Step2 : Open your job and go to job parameters.there is a button in bottom right corner to add the environment variables. Click on that buton. Go to user-defined variables section and select the env variables you just defined "File_Path". Now the new env variable will appear in your parameter list

Step3: Double click on the place where we define the value for a variable. On double clicking, you will get a small window. Select "PROJDEF" from the drop down menu. Overriding/Changing values of $PROJDEF: there are two ways to assign/override the $PROJDEF

a) using Job Sequence(overriding) : In the job sequence,we use "Job activity" to call a job.When you select a job in the job activity, the list of parameters that are defined in the parallel job appears in the properties of job activity. There, you can change the path to your desired path. In this case, when the parallel job runs from the master sequence,$PROJDEF value will be overridden by the value being passed from the Job sequece.

b)Using Datastage Administrator(change) : In the datastage administrator,you can change the value of Env variable to your desired path.When the job runs,it will run for the changed value.

Lookup Stage

A lookup stage will get the reference data set to memory before it stats matching the data. But while considering the data for lookup it will only select the first matching record from the reference dataset and drops all the other based on the Keys. Hence is the warning message "Ignoring duplicate entry at table record "XXXX"; no further warnings will be issued for this table.

For example, DESCRIBE: $orchadmin describe [options] descriptorfile.ds
DUMP: $orchadmin dump [options] descriptorfile.ds

Fix Warnings

  1. In DB stage: CAST(FieldA AS VARCHAR2(3))

Fix errors

  1. Incorrect dboption list; it failed on expected "," or "}", got: "#" line 1

    http://www-01.ibm.com/support/docview.wss?uid=swg21386075
    http://www-01.ibm.com/support/docview.wss?uid=swg1JR35048

  2. other

⚠️ **GitHub.com Fallback** ⚠️