Bulk-loading is a process where one can load large amounts of data into a database in a relatively short period of time. Database indexes are typically optimized for inserting rows one at a time.

Bulk-loading is used when you need to import or export large amounts of data relatively quickly. With bulk loading operations, you don't just insert data one row at a time; data is instead inserted through a variety of more efficient methods based on the structure of the specific database.

The Bulk-load utility can be run from the Actions tab on the Pro2 user interface. Click Actions and then select Run Bulk Loads. It can also be run from the RunBulkLoader shortcut in the bprepl\Scripts folder.
Note: The Bulk-Load Utility always runs the standard bulk copy procedures in bprepl\repl_mproc. It does not run the Direct SQL bulk load procedures in bprepl\SQL_mproc. The SQL_mproc procedures must be run from a Pro2 Editor session.

Pro2 bulk-load processor

The bulk load processor is a collection of special replication programs that perform replication for an entire table which is done for an initial sync of the source table to the target table. A bulk-load procedure basically executes a query such as “FOR EACH table-name: BUFFER-COPY table-name TO Target-Name.” This process can be run while replication is turned on. Run it again to sync an individual table. the only exception is of deletes on the target; if source deleted rows exist in the target, these need to be removed manually. To always be certain this does not happen, truncate the data from the target table before performing a bulk-load).

Bulk-load procedures

After table mapping, the bulk-Load procedures can be generated from the Manage Replication window in the Pro2 user interface. Select Manage Replication and then click Generate code. Select Bulk-Copy Processor. The bulk-load procedures are generated in the directory specified by the MASS_LOAD_DIRECTORY property (bprepl/repl_mproc by default).

The template file specified by the MASS_LOAD_TEMPLATE property is used in the bulk-load procedure generation. By default, this procedure is tmpl_mreplproc_restart_auto-push.p. Bulk-load procedures generated using this template will pick up where they left off if they are stopped and restarted. In addition, if a source record is locked and cannot be copied, the procedures add the record to the replqueue to be written by the replication processor and the bulk-load procedure moves on to the next record.

The Bulk_Max_Cache property in the tmpl_mreplproc_restart-auto-push template stores the last processed row-ids in the cache. In general, if the loading fails or stops during the bulk-load process and then restarted, the loading starts from the beginning. In such cases, the Bulk_Max_Cache property allows you to start the bulk-load process from the last processed row-ids stored in the cache instead of loading from the beginning. This property defaults to 25 and you can modify its value using the Properties tab in the Pro2 user interface.

During the bulk-load process, the Oracle_Bulk_Transaction_Count property (Oracle only) commits multiple records at a time per transaction. This property defaults to 1000. You can add this property and then modify its value using the Properties tab in Pro2 user interface. When setting this property for LAN configuration, ensure that the MASS_LOAD_TEMPLATE property must use the tmpl_mreplproc_oracle.p procedure.

For WAN, ensure that the APPSRV_MASS_LOAD_TEMPLATE property uses the tmpl_ASmproc_oracle.p procedure, and the APPSRV_FETCH_COUNT_MASS_BULK property has the same value as the Oracle_Bulk_Transaction_Count property.

You can use the INCLUDE_LOB property to choose whether to include LOBs in or exclude them from replication and bulkload. The property is set to YES by default, which transfers LOB data to the target database. However, to exclude LOBs from replication and bulkload, set this property to NO. This is applicable to both LAN and WAN environments.

All bulk_load templates have ElapsedTime property which indicates the total time taken for the bulk-load process to complete. The elapsed time information is provided in the log files generated by the bulk-load procedures. You can find this information on each log entry per 100K rows and as well as at the end of the loading process.

The bulk-load procedures write to individual log files in the folder specified by the LOG_DIRECTORY property (bprepl\repl_log folder). The Bulk-Load procedures can be run individually from the Procedure Editor or in a group by the mr<source-database>.p procedure which runs each one in turn. Multiple bulk-loads can also be launched at once from the Bulk-Load utility.

Multi-table bulk-load criteria specification

The Bulk-Load utility automatically skips over tables that have been flagged to not generate queue records (genqrec = FALSE) or to not process queue records (procqrec = FALSE). All other criteria are user-selected.
  • For mapped bulk-load runners—Increase Run Flag If more than one instance of a Bulk-Load Utility is run concurrently, each instance must be given a separate flag (typically letters A through Z) to distinguish temp files created and used by each instance. Each instance of the Bulk-Load Utility can have up to five bulk-load threads. The number of bulk-load utility instances and associated threads is limited by the Progress license on the Pro2 box. For example, two Bulk-Load runners set for five threads each equals a total of twelve Progress license sessions running.
  • For mapped repl thread—Tables mapped to the replication thread entered are selected; zero (0) specifies all threads.
    • Start at table—Indicates what table to start from. Format: {dbname}.{tablename} Example “sports.item” starts at the “item” table of the “sports” database, skipping over tables with names alphabetically before “item”. A period (.) indicates to start from the beginning.
    • Multi-thread bulk-loads—The number bulk-load processes or “threads” to run at once. The maximum is five bulk-load threads per bulk-load launcher.
      Note: Bulk-load threads are distinct from replQueue threads.
    • Re-load table if already bulk-loaded once—Pro2 track of the tables that have been bulk- loaded stores and stores information in the replControl table. It uses this data to restart a load if it is stopped. It also skips loading a table that is already completely loaded. If you set this flag to yes it deletes any “COMPLETED” control records, so the table re-loads from the beginning.
    • Table include list—A comma separated list of tables to include. Patterns can be specified using “*” as a wild card. Example: “ca*,da*,e*,z*”. This performs a bulk-copy of all tables starting with ca, da, e, and z.
    • Table exception list—A comma separated list of tables to exclude or skip. Patterns can be specified using “*” as a wild card. Example: “ca*,da*,e*,z*”. This performs a bulk-copy of all tables except those starting with ca, da, e, and z.

ASCII Bulk-load and export

For WAN implementations, the ASCII bulk-export/import was designed to sync replication tables from Source to Target. For very large tables (typically over fifty million rows), the ASCII Bulk-Export/Load utility can be used to decrease the time of the bulk load. The process uses an ASCII dump export process on the Source server. This process creates the dump procedures and the SQL load procedures and then dumps the data to ASCII flat files. Once this is finished the dumped ASCII data files and the generated SQL load procedures are transferred to the Target database server. The SQL load procedures are then run from the SQL query-editor.

Run bulk loads

To run bulk loads from the bprepl\Scripts folder, run the runblkld.bat file from the Scripts folder and use the following syntax to run bulk loads:

Table 1. Syntax

"%PROEXE%" -ininame Pro2.ini -basekey "INI" -b -pf %REPLPF% -p %CODEDIR%\RunBulkLoads.p -param "ResetBulkload=no;ResetInProcess=no" >> %CODEDIR%\repl_mproclog\runbulkloads.log

Table 2. Parameters
Parameter Description
TableInc Comma delimited list of table names to be loaded. If not included on the param line, the program defaults to "*" (all tables)
TableExc Comma delimited list of table names to be EXCLUDED from the load. If not included on the param line, the program defaults to "xxxxxxx" (no tables to be excluded)
ResetBulkload Used to determine if the load should reload any tables previously marked as complete. Valid entries are "Yes" or "No"
ResetInProcess Used to determine if the load should restart from scratch any tables with a status of "in process". 'Yes' means restart from scratch, 'No' means attempt to restart where previously stopped
DB Point this to a mapped source db. Leave blank or remove for the -params line to load all mapped databases. Invalid entries cause the load to abort.
RunFlag Determines the "run flag" to be used. Defaults to RunFlag="A" then not send in via command line parameters.
ForThread If entered the process only loads tables mapped to the ForThread value. Default is ForThread=0. Load tables for any thread number.
MProcThreads Used to determine how many "threads" this runFlag process will use for this load. Default is MProcThreads=9. Maximum Value = 9, Values greater than 9 will be reset to 9.
MaxRecordCount Primarily used for testing. If a value is entered, the load for each table stop once the MaxRecordCount is reached. Default is MProcThreads=0. (Do not stop for max record count).