Bulk loading is a process that can load large amounts of data into a database in a relatively short period of time. Database indexes are typically optimized for inserting rows one at a time.

With bulk loading operations, data is not inserted one row at a time; data is instead inserted through a variety of more efficient methods based on the structure of the specific database.

The Bulk load utility can be run from the Actions tab on the Pro2 web interface. Click Actions and then select Run Bulk Loads. It can also be run from the RunBulkLoader shortcut in the bprepl\Scripts folder.
Note: The Bulk Load Utility always runs the standard bulk copy procedures in bprepl\repl_mproc. It does not run the direct SQL bulk load procedures in bprepl\SQL_mproc. The SQL_mproc procedures must be run from a Pro2 Editor session.

Pro2 bulk load processor

The bulk load processor is a collection of special replication programs that perform replication for an entire table, that is done for an initial sync of the source table to the target table. A bulk load procedure executes a query such as FOR EACH table-name: BUFFER-COPY table-name TO Target-Name. This process can be run while replication is turned on. If the procedure is run a second time individual tables sync. An exception to this is if target operation attempts to process deleted data. If deleted source database rows exist in the target, then they must be removed manually. To ensure that this does not happen, truncate the data from the target table before performing a bulk load.

Bulk load procedures

After table mapping, the bulk-load procedures can be generated from the Manage Replication window in the Pro2 web interface.

To generate bulk load procedures:
  1. Select Manage Replication and then click Generate code.
  2. Select Bulk-Copy Processor. The bulk load procedures are generated in the directory specified by the MASS_LOAD_DIRECTORY property. The properties are set to bprepl/repl_mproc by default.

The template file specified by the MASS_LOAD_TEMPLATE property is used in the bulk load procedure generation. By default, this procedure is tmpl_mreplproc_restart_auto-push.p. If bulk load procedures that are generated using this template are stopped and then restarted, it picks up where it off . In addition, if a source record is locked and cannot be copied, the procedures add the record to the replqueue to be written by the replication processor and the bulk load procedure moves on to the next record.

The Bulk_Max_Cache property in the tmpl_mreplproc_restart-auto-push template stores the last processed row IDs in the cache. In general, if the loading fails or stops during the bulk load process and then restarted, the loading starts from the beginning. In such cases, the Bulk_Max_Cache property allows you to start the bulk load process from the last processed row IDs stored in the cache, instead of loading from the beginning. This property defaults to 25, and you can modify its value using the Properties tab in the Pro2 web interface.

During the bulk load process, the Oracle_Bulk_Transaction_Count property (Oracle only) commits multiple records at a time per transaction. This property defaults to 1000. You can add this property and then modify its value using the Properties tab in Pro2 web interface. When setting this property for a LAN configuration, ensure that the MASS_LOAD_TEMPLATE property uses the tmpl_mreplproc_oracle.p procedure.

For WAN, ensure that the APPSRV_MASS_LOAD_TEMPLATE property uses the tmpl_ASmproc_oracle.p procedure, and the APPSRV_FETCH_COUNT_MASS_BULK property has the same value as the Oracle_Bulk_Transaction_Count property.

You can use the INCLUDE_LOB property to choose whether to include LOBs in or exclude them from replication and bulk loading. The property is set to YES by default, which transfers LOB data to the target database. However, to exclude LOBs from replication and bulk loading, set this property to NO. This is applicable to both LAN and WAN environments.

All bulk load templates have ElapsedTime property which indicates the total time taken for the bulk load process to complete. The elapsed time information is provided in the log files generated by the bulk load procedures. You can find this information on each log entry per 100K rows and at the end of the loading process.

The bulk load procedures write to individual log files in the folder specified by the LOG_DIRECTORY property (bprepl\repl_log folder). The bulk load procedures can be run individually from the Procedure Editor or in a group by using the mr source-database.p procedure which runs each one in turn. Multiple bulk loads can also be launched at one time from the Bulk Load utility.

Multi-table bulk load criteria specification

The Bulk-Load utility automatically skips over tables that were flagged to not generate queue records (genqrec = FALSE) or to not process queue records (procqrec = FALSE). All other criteria are user-selected.
  • For mapped bulk load runners—Increase the run flag if more than one instance of a bulk load Utility is run concurrently. Each instance must be given a separate flag (typically letters A through Z) to distinguish temporary files that are created and used by each instance. Each instance of the Bulk-Load Utility can have up to 5 bulk load threads. The number of bulk load utility instances and associated threads is limited by the Progress Pro2 license.
  • For mapped replication threads—Tables that are mapped to the replication thread entered are selected. Zero (0) specifies all threads.
    • Start at table—Indicates what table to start from. Format: dbname.tablename.

      For example, sports.item starts at the item table of the sports database. It skips over tables with names alphabetically before item. A period (.) indicates to start from the beginning.

    • Multi-thread bulk loads—The number bulk load processes or threads to run at one time. The maximum is five bulk load threads per bulk load launcher.
      Note: Bulk load threads are distinct from replQueue threads.
    • Re-load table if already bulk loaded once—Pro2 tracks the tables that have been bulk- loaded and stores information that in replControl tables. It uses this data to restart a load if it is stopped. It also skips loading a table that is already loaded. If you set this flag to YES it deletes any COMPLETED control records, so the table re-loads from the beginning.
    • Table include list—A comma separated list of tables to include in the bulk load. Load patterns can be specified by using * as a wild card. For example: ca*,da*,e*,z*. This performs a bulk-copy of all tables starting with ca, da, e, and z.
    • Table exception list—A comma separated list of tables to exclude or skip. Load patterns can be specified using * as a wild card. For example, ca*,da*,e*,z*. This example performs a bulk-copy of all tables except those starting with ca, da, e, and z.

ASCII Bulk load and export

For WAN implementations, the ASCII bulk export/import is designed to sync replication tables from source to target. For large tables (typically over 50 million rows), the ASCII Bulk-export/load utility can be used to decrease the time to perform the bulk load. The process uses an ASCII dump export process on the source server. This process creates the dump and SQL load procedures, and then dumps the data to ASCII flat files. After this is finished, the dumped ASCII data files and the generated SQL load procedures are transferred to the target database server. The SQL load procedures are then run from the SQL query editor.

Run bulk loads

To run bulk loads from the bprepl\Scripts folder, run the runblkld.bat file from the Scripts folder and use the following syntax to run bulk loads:

Table 1. Syntax

"%PROEXE%" -ininame Pro2.ini -basekey "INI" -b -pf %REPLPF% -p %CODEDIR%\RunBulkLoads.p -param "ResetBulkload=no;ResetInProcess=no" >> %CODEDIR%\repl_mproclog\runbulkloads.log

Table 2. Parameters
Parameter Description
TableInc Comma-separated list of table names to be loaded. If not included on the parameter line, the program defaults to * (all tables).
TableExc Comma-separated list of table names to be excluded from the load. If not included on the parameter line, the program defaults to xxxxxxx (no tables to be excluded).
ResetBulkload Used to determine if any tables previously marked as complete should be reloaded. Valid entries are YES or NO.
ResetInProcess Used to determine if the load should restart any tables with a status of In process. YES means restart from the beginning, and NO means to attempt to restart where the load previously stopped.
DB This parameter should point to a mapped source database. It can be left blank, or you can remove it by using the -params line to load all mapped databases. Invalid entries cause the load to abort.
RunFlag Determines which run flag to be used. This parameter defaults to RunFlag=A.
ForThread If entered, the process loads only tables mapped to the ForThread value. This parameter defaults to ForThread=0. You can load tables for any thread number.
MProcThreads This parameter is used to determine how many threads runFlag process will use for a load. The default for this parameter defaults MProcThreads=9. The maximum value is 9. Values greater than 9 will be reset to 9.
MaxRecordCount This parameter is primarily used for testing. If a value is entered, the load for each table is stopped after the MaxRecordCount is reached. Default is MaxRecordCount=0.