Bulk load with Pro2
- Last Updated: June 8, 2020
- 5 minute read
Bulk loading is a process that can load large amounts of data into a database in a relatively short period of time. Database indexes are typically optimized for inserting rows one at a time.
With bulk loading operations, data is not inserted one row at a time; data is instead inserted through a variety of more efficient methods based on the structure of the specific database.
RunBulkLoader shortcut in the bprepl\Scripts
folder. bprepl\repl_mproc. It does not run the direct SQL bulk load
procedures in bprepl\SQL_mproc. The SQL_mproc
procedures must be run from a Pro2 Editor session. Pro2 bulk load processor
The bulk load processor is a collection of special replication programs that
perform replication for an entire table, that is done for an initial sync of the
source table to the target table. A bulk load procedure executes a query such as
FOR EACH table-name: BUFFER-COPY
table-name TO Target-Name. This
process can be run while replication is turned on. If the procedure is run a second
time individual tables sync. An exception to this is if target operation attempts to
process deleted data. If deleted source database rows exist in the target, then they
must be removed manually. To ensure that this does not happen, truncate the data
from the target table before performing a bulk load.
Bulk load procedures
After table mapping, the bulk-load procedures can be generated from the Manage Replication window in the Pro2 web interface.
- Select Manage Replication and then click Generate code.
- Select Bulk-Copy Processor. The bulk load procedures
are generated in the directory specified by the
MASS_LOAD_DIRECTORYproperty. The properties are set tobprepl/repl_mprocby default.
The template file specified by the MASS_LOAD_TEMPLATE property is used in the bulk load procedure
generation. By default, this procedure is
tmpl_mreplproc_restart_auto-push.p. If bulk load procedures
that are generated using this template are stopped and then restarted, it picks up
where it off . In addition, if a source record is locked and cannot be copied, the
procedures add the record to the replqueue to be written by the
replication processor and the bulk load procedure moves on to the next record.
The Bulk_Max_Cache property in the
tmpl_mreplproc_restart-auto-push template stores the last processed row
IDs in the cache. In general, if the loading fails or stops during the bulk load
process and then restarted, the loading starts from the beginning. In such cases,
the Bulk_Max_Cache property allows you to start the
bulk load process from the last processed row IDs stored in the cache, instead of
loading from the beginning. This property defaults to 25, and you can modify its
value using the Properties tab in the Pro2 web interface.
During the bulk load process, the Oracle_Bulk_Transaction_Count property (Oracle only) commits multiple
records at a time per transaction. This property defaults to 1000. You can add this
property and then modify its value using the Properties tab
in Pro2 web interface. When setting this property for a LAN configuration, ensure
that the MASS_LOAD_TEMPLATE property uses the
tmpl_mreplproc_oracle.p procedure.
For WAN, ensure that the APPSRV_MASS_LOAD_TEMPLATE property uses the
tmpl_ASmproc_oracle.p procedure, and the APPSRV_FETCH_COUNT_MASS_BULK property has the same
value as the Oracle_Bulk_Transaction_Count
property.
You can use the INCLUDE_LOB property to
choose whether to include LOBs in or exclude them from replication and bulk loading.
The property is set to YES by default, which transfers LOB data to
the target database. However, to exclude LOBs from replication and bulk loading, set
this property to NO. This is applicable to both LAN and WAN
environments.
All bulk load templates have ElapsedTime property which
indicates the total time taken for the bulk load process to complete. The elapsed
time information is provided in the log files generated by the bulk load procedures.
You can find this information on each log entry per 100K rows and at the end of the
loading process.
The bulk load procedures write to individual log files in the folder specified
by the LOG_DIRECTORY property
(bprepl\repl_log folder). The bulk load procedures can be run
individually from the Procedure Editor or in a group by
using the mr source-database.p procedure which
runs each one in turn. Multiple bulk loads can also be launched at one time from the
Bulk Load utility.
Multi-table bulk load criteria specification
genqrec = FALSE) or to not process queue
records (procqrec = FALSE). All other criteria are user-selected.
- For mapped bulk load runners—Increase the run flag if more than one instance of a bulk load Utility is run concurrently. Each instance must be given a separate flag (typically letters A through Z) to distinguish temporary files that are created and used by each instance. Each instance of the Bulk-Load Utility can have up to 5 bulk load threads. The number of bulk load utility instances and associated threads is limited by the Progress Pro2 license.
- For mapped replication threads—Tables
that are mapped to the replication thread entered are selected. Zero (0)
specifies all threads.
- Start at table—Indicates what
table to start from. Format:
dbname.tablename.For example,
sports.itemstarts at theitemtable of thesportsdatabase. It skips over tables with names alphabetically beforeitem. A period (.) indicates to start from the beginning. - Multi-thread bulk loads—The
number bulk load processes or threads to run at one time. The
maximum is five bulk load threads per bulk load launcher. Note: Bulk load threads are distinct from
replQueuethreads. - Re-load table if already bulk loaded
once—Pro2 tracks the tables that have been bulk-
loaded and stores information that in
replControltables. It uses this data to restart a load if it is stopped. It also skips loading a table that is already loaded. If you set this flag toYESit deletes anyCOMPLETEDcontrol records, so the table re-loads from the beginning. - Table include list—A comma
separated list of tables to include in the bulk load. Load patterns
can be specified by using
*as a wild card. For example:ca*,da*,e*,z*. This performs a bulk-copy of all tables starting withca,da,e, andz. - Table exception list—A comma
separated list of tables to exclude or skip. Load patterns can be
specified using
*as a wild card. For example,ca*,da*,e*,z*. This example performs a bulk-copy of all tables except those starting withca,da,e, andz.
- Start at table—Indicates what
table to start from. Format:
ASCII Bulk load and export
For WAN implementations, the ASCII bulk export/import is designed to sync replication tables from source to target. For large tables (typically over 50 million rows), the ASCII Bulk-export/load utility can be used to decrease the time to perform the bulk load. The process uses an ASCII dump export process on the source server. This process creates the dump and SQL load procedures, and then dumps the data to ASCII flat files. After this is finished, the dumped ASCII data files and the generated SQL load procedures are transferred to the target database server. The SQL load procedures are then run from the SQL query editor.
Run bulk loads
To run bulk loads from the bprepl\Scripts folder, run the
runblkld.bat file from the
Scripts folder and use the following syntax to run bulk
loads:
|
| Parameter | Description |
|---|---|
TableInc |
Comma-separated list of table names to be loaded. If not included on the parameter line, the program defaults to * (all tables). |
TableExc |
Comma-separated list of table names to be excluded from the load. If not included on the parameter line, the program defaults to xxxxxxx (no tables to be excluded). |
ResetBulkload |
Used to determine if any tables previously marked as complete should
be reloaded. Valid entries are YES or
NO. |
ResetInProcess |
Used to determine if the load should restart any tables with a status
of In process. YES means restart
from the beginning, and NO means to attempt to
restart where the load previously stopped. |
DB |
This parameter should point to a mapped source database. It can be
left blank, or you can remove it by using the -params
line to load all mapped databases. Invalid entries cause
the load to abort. |
RunFlag |
Determines which run flag to be used. This parameter defaults to
RunFlag=A. |
ForThread |
If entered, the process loads only tables mapped to the ForThread value. This parameter
defaults to ForThread=0. You can
load tables for any thread number. |
MProcThreads |
This parameter is used to determine how many threads runFlag process will use for a load.
The default for this parameter defaults MProcThreads=9. The maximum value is 9. Values greater
than 9 will be reset to 9. |
MaxRecordCount |
This parameter is primarily used for testing. If a value is entered,
the load for each table is stopped after the MaxRecordCount is reached. Default is
MaxRecordCount=0. |