
__**Split the project dsx into individual jobs**__

__**Information Analyzer**__

http://publib.boulder.ibm.com/infocenter/iisinfsv/v8r0/index.jsp?topic=/com.ibm.swg.im.iis.productization.iisinfsv.overview.doc/topics/cisoiacloser.html

The UNLOCK command is not available by default in DataStage projects. Log into the Administrator client, select your project, and click the Command button.

To create a VOC entry for the UNLOCK command, execute the following two commands:

>SET.FILE UV VOC UV.VOC

>COPY FROM UV.VOC TO VOC UNLOCK
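
To verify that the entry was created, the VOC record can be displayed with the standard CT command (which copies a record to the terminal):

    >CT VOC UNLOCK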

To unlock by either inode number or user number, use the commands:

UNLOCK INODE inodenumber ALL

UNLOCK USER usernumber ALL
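
For example, a typical unlock session might look like this (the install path, project name and inode number below are illustrative and will differ per install). LIST.READU EVERY lists the current locks together with their Inode and Userno columns, which give you the numbers to pass to UNLOCK:

    $ cd /opt/IBM/InformationServer/Server/DSEngine
    $ . ./dsenv
    $ bin/uvsh
    >LOGTO myproject
    >LIST.READU EVERY
    >UNLOCK INODE 123456 ALL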

Below is a sample APT_CONFIG_FILE; note the "conductor" node pool on node0, which marks it as the conductor node.

    {
        node "node0"
        {
            fastname "server1"
            pools "conductor"
            resource disk "/datastage/Ascential/DataStage/Datasets/node0" {pools ""}
            resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node0" {pools ""}
        }
        node "node1"
        {
            fastname "server2"
            pools ""
            resource disk "/datastage/Ascential/DataStage/Datasets/node1" {pools ""}
            resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node1" {pools ""}
        }
        node "node2"
        {
            fastname "server2"
            pools ""
            resource disk "/datastage/Ascential/DataStage/Datasets/node2" {pools ""}
            resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node2" {pools ""}
        }
    }
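
At runtime the parallel framework reads whichever configuration file the APT_CONFIG_FILE environment variable points to, so the same compiled job can be switched between configurations without recompiling. A minimal sketch (the file path below is a placeholder for wherever you store your configurations):

    # Point parallel jobs at the three-node configuration shown above
    export APT_CONFIG_FILE=/datastage/configs/three_node.apt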

For every job that starts there will be one (1) conductor process (started on the conductor node), one (1) section leader for each node in the configuration file, and roughly one (1) player process for each stage in your job on each node (operator combination means the player count may vary). So if you have a job that uses a two (2) node configuration file and has three (3) stages, then your job will have:

 * 1 conductor
 * 2 section leaders (2 nodes * 1 section leader per node)
 * 6 player processes (3 stages * 2 nodes)

Your dump score may show that your job will run 9 processes on 2 nodes.
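
To see the score for yourself, the standard APT_DUMP_SCORE framework variable can be switched on for a run. A minimal sketch, assuming the variable has been added as a job parameter so it can be passed on the command line (the project and job names are placeholders):

    # Ask the framework to write the score report into the job log
    dsjob -run -mode NORMAL -param '$APT_DUMP_SCORE=True' MyProject MyJob

The score entry in the Director log then lists each operator, the nodes it runs on, and the resulting process count.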

This kind of information is very helpful when determining the impact that a particular job or process will have on the underlying operating system and system resources.

Conductor Node: the main process that:

 * 1) Starts up jobs
 * 2) Determines resource assignments
 * 3) Creates the Section Leaders (which in turn create and manage the player processes that perform the actual job execution)
 * 4) Acts as the single coordinator for status and error messages
 * 5) Manages orderly shutdown when processing completes or in the event of a fatal error

Jobs developed with DataStage EE and QualityStage are independent of the actual hardware and degree of parallelism used to run the job. The parallel Configuration File provides a mapping at runtime between the job and the actual runtime infrastructure and resources by defining logical processing nodes.

To facilitate scalability across the boundaries of a single server, and to maintain platform independence, the parallel framework uses a multi-process architecture.

The parallel framework's runtime is process-based, which enables scalability beyond server boundaries while avoiding platform-dependent threading calls. The actual runtime deployment for a given job design is a hierarchical relationship of operating system processes, running on one or more physical servers:


 * **Conductor Node (one per job):** the main process used to start up jobs, determine resource assignments, and create Section Leader processes on one or more processing nodes. Acts as a single coordinator for status and error messages, and manages orderly shutdown when processing completes or in the event of a fatal error. The conductor node is run from the primary server.
 * **Section Leaders (one per logical processing node)**: used to create and manage player processes which perform the actual job execution. The Section Leaders also manage communication between the individual player processes and the master Conductor Node.
 * **Players**: one or more logical groups of processes used to execute the data flow logic. All players are created as groups on the same server as their managing Section Leader process.

When the job is initiated the primary process (called the “conductor”) reads the job design, which is a generated Orchestrate shell (osh) script. The conductor also reads the parallel execution configuration file specified by the current setting of the APT_CONFIG_FILE environment variable.

Once the execution nodes are known (from the configuration file), the conductor causes a coordinating process called a “section leader” to be started on each: by forking a child process if the node is on the same machine as the conductor, or by remote shell execution if the node is on a different machine (things are a little more dynamic in a grid configuration, but essentially this is what happens). Each section leader process is passed the score and executes it on its own node, and is visible as a process running osh. Section leaders’ stdout and stderr are redirected to the conductor, which is solely responsible for logging entries from the job.

The score contains a number of Orchestrate operators. Each of these runs in a separate process, called a “player” (the metaphor clearly is one of an orchestra). Player processes’ stdout and stderr are redirected to their parent section leader. Player processes also run the osh executable.
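
Since the conductor, section leaders and players all run the osh executable, a rough way to observe this hierarchy is to list the processes on each server while a job is running (the exact output depends on your platform and install user):

    # On the conductor node you should see the conductor plus the local
    # section leader(s) and players; on other nodes, section leaders and players
    ps -ef | grep osh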

Communication between the conductor, section leaders and player processes in a parallel job is effected via TCP.

__**FastTrack from IBM to accelerate your development**__

IBM introduced a tool called FastTrack: a source-to-target mapping tool that plugs straight into the Information Server and runs inside a browser. It is available in the 8.1 version of the Information Server. As the name suggests, IBM is using it to help in the analysis and design stage of a data integration project to do the source-to-target mapping and the definition of the transform rules. Since it is an Information Server product it runs against the Metadata Server and can share metadata with the other products. I have talked about it previously in "New Product: IBM FastTrack for Source To Target Mapping" and "FastTrack Excel out of your DataStage project", but now I have had the chance to see it in action on a Data Warehouse project. We have been using the tool for a few weeks now and we are impressed. It has been easier to learn than other Information Server products and it manages to fit most of what you need inside frames on a single browser screen. It has very few bugs, and it has been in the hands of someone who doesn't know a lot about DataStage and they have been able to complete mappings and generate DataStage jobs. I hope to get some screenshots up in the weeks to come, but here are some observations on how we have saved time with FastTrack:
 * 1) FastTrack provides faster access to metadata. In an Excel/Word mapping environment you need to copy and paste your metadata from a tool that can show it into your mapping document. FastTrack can see metadata imported through any of the Information Server import methods, such as DataStage plugin imports and the import bridges from Meta Integration Technologies Inc (MITI): ERwin, InfoSphere Data Architect, Cognos, Business Objects, Informatica etc. It can also see imports via the database connectors from any other Information Server product, such as the table definitions imported and profiled by Information Analyzer. You can import an entire database in a few minutes and drag and drop it onto your various mappings.
 * 2) FastTrack lets you map columns from XML and Cobol Copybook hierarchies to flat file relational database targets without any metadata massaging. In Excel you would spend days cutting and chopping XML and Cobol complex flat file definitions. With FastTrack you can access a definition imported through the DataStage Cobol or XML importers and map away.
 * 3) FastTrack lets you do source to target mapping straight out of your modelling tool. You can import your model straight into the Metadata Server via a bridge and start mapping it. No mucking around with database DDLs and no need to get access to create database schemas. This can be handy in the early days of a project.
 * 4) FastTrack has some great auto mapping functions. There is a discover function where you drag and drop the source or target table onto one side of the mapping and then use the discover function to find candidate matches for the other side – then choose “Best Match” to take the first of the candidates. If you choose multiple columns you can Discover and Best Match all the columns in your table. It searches for matching column names against the candidate tables.
 * 5) FastTrack can auto match on the Business Glossary terms attached to columns. It is one of the few products in the Information Server that makes productive use of the Business Glossary to speed things up. Of course you need to create your Glossary, fill it with terms and map those terms to physical metadata first! FastTrack lets you add Glossary terms to physical columns as you map.
 * 6) FastTrack lets you balance the mapping between business analysts and ETL developers. Both can use the tool – it’s an Excel style interface – but business analysts may be faster at mapping early in the project as they gather requirements and rules, with ETL developers taking over later in the build. This can help avoid bottlenecks on your team if anyone can do mapping and can push the results straight into DataStage.
 * 7) FastTrack creates DataStage jobs. These jobs have the source and target connectors already loaded and configured and stages such as lookup, join and transformer already built. It even lets you add Transformer derivations such as macros, job parameters and functions from a function list.
 * 8) FastTrack handles DataStage naming conventions. FastTrack holds a set of rules for naming DataStage stages and links that you can configure to match your naming convention. Normal DataStage development means dragging and dropping stages and links onto a canvas and renaming every one. FastTrack does the naming for you.
 * 9) FastTrack lets you add the links for joins and lookups. I don’t know if you’ve tried to map joins and lookups in Excel but it’s not pretty – you have room to map the extra columns but there is no easy way to show the key fields that join the two sources together. Generally you make a note of it under the mapping. In FastTrack you choose the join/lookup table, choose the key fields that do the join and bring in the extra columns for mapping to the output table and it generates the required stages in DataStage.
 * 10) FastTrack shows progress of mapping tasks. Once you have created a mapping for all interfaces FastTrack will produce a report showing how much of each mapping has been finished saving you the time of tracking progress manually.
__**What FastTrack can do better**__

 * 1) Better bulk export and import functions – preferably XML and Excel. Excel for when we produce documentation. XML for when we want to back it up or move it between repositories. (Or export, run a search and replace to globally rename a transform value and import it again.)
 * 2) Global search and replace on transformation values, similar to the search and replace in the DataStage Transformer, for globally renaming things like function names and job parameter values.
 * 3) More DataStage stages – it currently lets you configure settings for Lookup, Join, Connectors and Transformers. Would like to see Surrogate Key, Change Data Capture and Slowly Changing Dimension support – though it’s debatable whether those are business analyst functions for FastTrack or developer functions for DataStage. It would be cool to define Type 1 Type 2 and key fields for dimension table mapping.
 * 4) Let you run Discover and Best Match on Business Glossary terms so you can find terms that suit the column name you are mapping.
 * 5) Discover transformation rules as well as mappings … oh hang on, that’s in the next release!
 * 6) Reverse engineer DataStage Server jobs so you can generate DataStage Enterprise jobs from a Server job mapping.
 * 7) More flexible licensing. You buy licenses in packs of 10 – and that’s too many for a lot of customers!

Do you have any suggestions for making FastTrack better?

A sample Korn shell wrapper script that resets and runs a DataStage job via dsjob:

    $ cat Model_script.sh
    #!/usr/bin/ksh
    #----------------------------------------------------------------#
    # MODULE NAME   : Call Log Details                                #
    # DESCRIPTION   : Run the sequencer job XXXXX                     #
    # VERSION       : 1.00                                            #
    # LANGUAGE      : KORNSHELL                                       #
    # INSTALLATION  : COMPANY Name                                    #
    # PROJECT       : BI                                              #
    # AUTHOR        : Sreedhar Molgara                                #
    # DATE WRITTEN  : 10/26/2009                                      #
    # DETAIL                                                          #
    # DESCRIPTION   : Reports....                                     #
    # RESTRICTIONS  : None                                            #
    # MODIFICATION HISTORY                                            #
    # Date        Chg ID  Developer   Release  Description of Change  #
    # 10/12/2009          Sreedhar M  1.1      Draft Version          #
    #----------------------------------------------------------------#

    # Set DataStage job parameters
    DSProj="Dev"
    DSJobName="Jobname"
    Log_Dir='/workarea/DSJobs/Development/Logs'
    Script_Name='PETLXXXX'

    # Set DataStage environment
    . /opt/IBM/InformationServer/Server/DSEngine/dsenv

    # Reset the job if this is a stand-alone job.
    # Comment this step out if the job you are running is a Sequence job.
    dsjob -run -mode RESET ${DSProj} ${DSJobName} 2>> ${Log_Dir}/${Script_Name}_err.txt
    sleep 10

    # Run the job
    dsjob -run -mode NORMAL -jobstatus \
        -param PARM1.Region=_p \
        -param PARM1.USER=DATAST \
        -param PARM1.DBS_SRC1=dbRODS \
        -warn 99999 ${DSProj} ${DSJobName} 2>> ${Log_Dir}/${Script_Name}_err.txt
    apirc=$?

    # An exit status of 1 or 2 is normal: status 1 means success,
    # status 2 means success with warnings. Anything else should abort.
    if [ $apirc -eq 2 -o $apirc -eq 1 ]
    then
        finalrc=0
        echo "INFO: DataStage job $DSJobName Successfully Executed"
    else
        finalrc=1
        echo "ERROR: DataStage job $DSJobName failed: See Log"
    fi
    exit $finalrc
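
After the wrapper finishes, the same dsjob client can be used to inspect the result; -jobinfo and -logsum are standard dsjob options (substitute your own project and job names):

    # Show the job's last run status, then a summary of its log entries
    dsjob -jobinfo ${DSProj} ${DSJobName}
    dsjob -logsum ${DSProj} ${DSJobName}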

Nagesh, here you go...

    #!/bin/sh
    # Loop forever
    while :
    do
        # Count the records in the file being polled (the file name was
        # elided in the original; a placeholder variable is used here)
        rc=`awk 'END {print NR}' ${WATCH_FILE}`
        if [ $rc -ne 0 ]
        then
            dsjob <>        # run the job (dsjob arguments elided in the original)
        else
            sleep 3600      # Sleep one hour
        fi
    done