Bulk loading is a process that can load large amounts of data into a target database in a relatively short period of time.

With bulk loading operations in Pro2, depending on the template used, data is replicated either one row at a time or multiple rows are loaded for a single transaction, resulting in a more efficient method to seed a database. Generally, bulk loading is done during the initial seeding of a database after Pro2 has been configured and after your target database is built.

The Bulk load utility can be run from the Actions tab on the Pro2 web interface. It can also be run from command line utility bprepl\Scripts\runbulkloads.bat by providing input parameter details. Both methods achieve the same result. The method that you choose depends on your preference and business needs.

Example of the command line syntax:

bprepl\RunBulkLoads.p 
-param "DB=sports;TableInc=customer,
item;
TableExc=xxxxxxxxx;
RunFlag=A;
ForThread=0;
MaxRecordCount=0;
ResetBulkload=no;
ResetInProcess=no;
MProcThreads=9"
Note: The Bulk Load Utility always runs the standard bulk copy procedures in bprepl\repl_mproc folder in the Pro2 root directory.

Pro2 bulk load processor

The bulk load processor is a collection of programs that perform a bulk load for an entire table for an initial sync of the source table to the target table. A bulk load procedure executes a query such as FOR EACH table-name: BUFFER-COPY table-name TO Target-Name. This process can be run while replication is turned on. If deleted source database rows exist in the target, then they must be removed manually. To ensure that this does not happen, truncate the data from the target table before performing a bulk load.

Generate bulk load procedures

Before you generate bulk load procedures, ensure that you have configured your source and target databases, and mapped all tables that are to be loaded.

The bulk-load procedures can be generated from the Manage Replication window in the Pro2 web interface.

To generate bulk load procedures:
  1. Select the Manage Replication tab, choose your database, and then click Generate code.
  2. Select Bulk-Copy Processor. The bulk load procedures are generated in the directory specified by the MASS_LOAD_DIRECTORY property. The properties are set to bprepl/repl_mproc by default.

The template file specified by the MASS_LOAD_TEMPLATE property is used in the bulk load procedure generation. There are several templates for creating bulk load procedures. They are located in the bprepl/repl_tmpl directory and start with tmpl_mreplproc. Each of the templates have different functionality for different target types based on your needs. For example tmpl_mreplproc_mssSendSQL.p is for Microsoft SQL Server, while tmpl_mreplproc_restart-auto-push.p is used for any target type. If bulk load procedures that are generated using this template are stopped and then restarted, they continue from where they left off. In addition, if a source record is locked and cannot be copied, the procedures add the record to the replqueue to be written by the replication processor and the bulk load procedure moves on to the next record.

The Bulk_Max_Cache property tells the template how many processed row IDs to store in the cache. In general, if the loading fails or stops during the bulk load process and then restarted, the loading starts from the beginning. In such cases, the Bulk_Max_Cache property allows you to start the bulk load process from the last processed row IDs stored in the cache, instead of loading from the beginning. This property defaults to 25, and you can modify its value using the Properties tab in the Pro2 web interface.

For Oracle WAN implementations, ensure that the APPSRV_MASS_LOAD_TEMPLATE property uses the tmpl_ASmproc_oracle.p procedure, and the APPSRV_FETCH_COUNT_MASS_BULK property has the same value as the Oracle_Bulk_Transaction_Count property.

You can use the INCLUDE_LOB property to choose whether to include or exclude LOB data from replication and bulk loading. The property is set to NO by default, which excludes LOB data from replication and bulk loading. If the property is set to YES, then the LOB data is sent to the target database. This is applicable to both LAN and WAN environments.

The elapsed time information is provided in the log files generated by the bulk load procedures. You can find this information on each log entry per 100K rows and at the end of the loading process. Time elapsed is counted in seconds and milliseconds.

The bulk load procedures write to individual log files in the folder specified by the MASS_LOAD_LOG_DIRECTORY property (bprepl\repl_mproclog folder). The bulk load procedures can be initiated from the Procedure Editor or in a group by using the mr source-database.p procedure which runs each one in turn. Multiple bulk loads can also be launched at one time from the Bulk Load utility or the web user interface.
Note: For the Oracle target type, when re-running the bulk load operation, it is recommended to delete the target records or truncate the target table before proceeding with the bulk load of that table again. Otherwise, the bulk load operation fails and generates an error due to the violation of the unique constraint.
Table 1. Bulk load properties
Name Default value Description
APPSRV_MASS_LOAD_DIRECTORY bprepl/AppSrv/as_mproc This property designates the folder that Pro2 WAN implementations of Bulk Load procedures are generated.
APPSRV_MASS_LOAD_TEMPLATE tmpl_ASmproc.p Sets the name of the template procedure that is used during the bulk-load procedure for a WAN implementation. This property value can be set to:

Default: tmpl_ASmproc.p

Microsoft SQL Server: tmpl_ASmproc_mssSendSQL.p

Oracle: tmpl_ASmproc_oracle.p

Pro2 to Pro2 replication: tmpl_ASmproc_pro2pro.p

MASS_LOAD_TEMPLATE tmpl_mreplproc_restart-auto-push.p

Sets the name of the template procedure that is used during the bulk-load generation.

MASS_LOAD_TEMPLATE_PRO2ORA tmpl_mreplproc_oracle.p This template is used in both LAN and WAN implementations for Oracle Target database. It uses the send-sql statement option to widen the Oracle transaction. The send-sql statement option does not support LOB fields.
MASS_LOAD_TEMPLATE_PRO2PRO tmpl_mreplproc_pro2pro.p This template is used in a LAN implementation with an OpenEdge Target database. It is faster than the default template, which includes the restart logic.
MASS_LOAD_TEMPLATE_PRO2SQL tmpl_mreplproc_mssSendSQL.p This template is used in both LAN and WAN implementations for Microsoft SQL server. It uses the send-sql to perform faster but it does not support LOB fields.
MASS_LOAD_LOG_DIRECTORY bprepl/repl_mproclog

Designates the Bulk Load log files storage folder location.

MASS_LOAD_DIRECTORY bprepl/repl_mproc

This property designates the folder that Pro2 Bulk Load procedures are generated.

Run bulk load from the UI

After you have configured your source and target databases, mapped your tables, generated your bulk load procedure, and set your properties, you are ready to run a bulk load.

To run a bulk load:
  1. Click the Actions tab and select Bulk Load > Run Bulk Loads.
  2. Choose your Source Database.
  3. (Optional) If you are rerunning bulk loads, reset your bulk load results.
  4. Specify the file tag in the Enter file tag for multiple bulk-load runners field if more than one instance of a bulk load is running concurrently.

    Each instance must be given a separate flag (typically letters A through Z) to distinguish temporary files that are created and used by each instance. Each instance of the Bulk-Load Utility can have up to 5 bulk load threads. The number of bulk load utility instances and associated threads is limited by the Progress Pro2 license.

  5. Set the For tables mapped to Repl Thread

    Tables that are mapped to the replication thread entered are selected. Zero (0) specifies all threads.

  6. Set the Multi-thread bulk loads

    The number bulk load processes or threads to run at one time. The default value is 9.

  7. Indicate the Max record count. This property is used per table. Enter 0 to set a limitless maximum record count.
  8. Set the Re-load table if already bulk loaded once

    Pro2 tracks the tables that have been bulk loaded and stores information in the Repl_Control tables. It uses this data to restart a load if it is stopped. It also skips loading a table that is already loaded. If you set this flag to YES it deletes any COMPLETED control records, so the table re-loads from the beginning.

  9. Set the Reset the “In Process” loads to start over if already bulk loaded once.

    Pro2 tracks the bulk load status and stores that information in Repl_Control tables. It uses this data to restart In Process tables if a load was stopped.

  10. Set the Tables include list

    A comma separated list of tables to include in the bulk load. Load patterns can be specified by using * as a wild card. For example: ca*,da*,e*,z*. This performs a bulk-copy of all tables starting with ca, da, e, and z.

  11. Set the Tables exception list

    A comma separated list of tables to exclude or skip. Load patterns can be specified using * as a wild card. For example, ca*,da*,e*,z*. This example performs a bulk-copy of all tables except those starting with ca, da, e, and z.

  12. Click Run.
  13. In the Confirmation dialog box, review the details of the bulk load process and then click Confirm.

    The Bulk Load Running Status pane displays the status of the ongoing bulk load process.

Include DB Analysis

You can also choose to bulk load only a few selected tables based on the number of records they contain. For example, you can bulk load only those tables that have over 100,000 records, or alternatively, bulk load only the tables that have fewer than 100,000 records. It can be accomplished as follows:
  1. To analyze a database or table, run the DBANALYS or TABANALYS commands in PROUTIL.
  2. To create the filters in the Run Bulk Loads utility window, use the <dbname>.tab output file. For example, for sports as the source database, run the following DBANALYS or TABANALYS commands for generating the analysis output file:
    proutil sports -C dbanalys -csoutput sports.tab
    proutil sports -C tabanalys -csoutput sports.txt
    Note: The valid extensions for the output file are .tab and .txt.
  3. In the Run Bulk Loads window, click Include DB Analysis.

        The Include DB Analysis dialog box opens.

  4. In the Include DB Analysis dialog box:
    1. In the File Name field, specify the <dbname>.tab or <dbname>.txt output file that contains the database and the table analysis.
    2. In the Insert Into List field, select either of the following:
      1. Include - To include the tables that pass the filter. These tables are displayed in the Table include list field of the Run Bulk Loads utility.
      2. Exception - To exclude the tables during the bulk load execution. These tables are displayed in the Table exception list.
    3. In the Table Record >= and AND <= fields, enter the range of number of records for which you need to filter the tables.
    4. Click OK.

Reset/Clear Bulk-Copy Results

Click Reset/Clear Bulk-Copy Results to:
  • Clear the details captured in the Repl Control table from the previous bulk load.
  • Reset the information displayed on all the bulk load status cards in the Bulk Load Running Status pane.

Bulk Load Running Status

The Bulk Load Running Status pane displays the real-time status of the ongoing bulk load. Granular data, such as the count of tables successfully processed, left to process, or failed during the bulk load process, is displayed along with the status of the bulk load process on the bulk load status cards. The information on these cards is automatically updated every five seconds.

The following table describes the user interface elements of the bulk load status card:

Element Description
Source Database Displays the name of the source database.
File Tag Displays the unique identifier that you specify at the start of the bulk load process. Typically, the File Tag is an upper case letter (A to Z).
Bulk load status Displays the present status of the bulk load card, such as:
  • Success—All tables are successfully loaded into the target database.
  • Failed—One or more tables failed to be loaded into the target database.
  • Running—One or more tables are being loaded into the target database.
Table-count summary
  • Finished—Indicated with a green circle, displaying the number of tables successfully loaded into the target database.
  • Failed—Indicated with a red circle, displaying the number of tables that failed to get loaded into the target database.
  • Left to Process—Indicated with a yellow circle, displaying the number of tables remaining to be loaded into the target database.
Status bar Indicated in green, displaying the percentage of tables successfully processed by the bulk load process. It also displays the number of successfully processed tables relative to the total number of tables.
See Thread level info Displays the list of thread IDs running for the selected bulk load process if the bulk load process is in progress, it also displays the respective source tables being processed.
Delete status card Deletes the bulk load status card.
Note: After previously bulk loading a set of tables, when you bulk load the same tables under a different file tag name by setting the Re-load table if already bulk loaded once to Yes, the previous bulk load status card is also affected. The status of the same tables is displayed identically on both cards. It is a limitation by design. To overcome this, Progress recommends deleting the previous bulk load status card using the Delete status card button.

Run bulk loads from the command line

To run bulk loads from the bprepl\Scripts folder, run the runblkld.bat file from the Scripts folder and use the following syntax to run bulk loads:

Table 2. Syntax
"%PROEXE%" -ininame Pro2.ini -basekey "INI" -b -pf %REPLPF% -p %CODEDIR%\RunBulkLoads.p -param "ResetBulkload=no;ResetInProcess=no" >> %CODEDIR%\repl_mproclog\runbulkloads.log
Table 3. Parameters
Parameter Description
TableInc Comma-separated list of table names to be loaded. If not included on the parameter line, the program defaults to * (all tables).
TableExc Comma-separated list of table names to be excluded from the load. If not included on the parameter line, the program defaults to xxxxxxx (no tables to be excluded).
ResetBulkload Used to determine if any tables previously marked as complete should be reloaded. Valid entries are YES or NO.
ResetInProcess Used to determine if the load should restart any tables with a status of In process. YES means restart from the beginning, and NO means to attempt to restart where the load previously stopped.
DB This parameter should point to a mapped source database. It can be left blank, or you can remove it by using the -params line to load all mapped databases. Invalid entries cause the load to abort.
RunFlag Determines which run flag to be used. This parameter defaults to RunFlag=A.
ForThread If entered, the process loads only tables mapped to the ForThread value. This parameter defaults to ForThread=0. You can load tables for any thread number.
MProcThreads This parameter is used to determine how many threads runFlag process will use for a load. The default for this parameter defaults MProcThreads=9. The maximum value is 9. Values greater than 9 will be reset to 9.
MaxRecordCount This parameter is primarily used for testing. If a value is entered, the load for each table is stopped after the MaxRecordCount is reached. Default is MaxRecordCount=0.

ASCII Bulk load and export

For WAN implementations, the ASCII bulk export/import is designed to sync replication tables from source to target. For large tables (typically over 50 million rows), the ASCII Bulk-export/load utility can be used to decrease the time to perform the bulk load. The process uses an ASCII dump export process on the source server. This process creates the dump and SQL load procedures, and then dumps the data to ASCII flat files. After this is finished, the dumped ASCII data files and the generated SQL load procedures are transferred to the target database server. The SQL load procedures are then run from the SQL query editor.

Bulk load reports

Generating a bulk load report can be useful in a variety of situations and can be essential for troubleshooting. A bulk load report can be generated in two ways: from the Pro2 user interface under Action > Bulk Load > Bulk Load Report, or by using the GET bulk load report API. For more information about how to use the API, see GET bulk load report API.

The bulk load report generates report of all mapped tables associated with your bulk load history. The bulk load report can be exported in PDF, Excel, and CSV formats. When the export job is complete, you can download the exported report in CSV format from the Pending Jobs watchbox.



Troubleshoot bulk load

On occasion, after the bulk load operation is complete, the replqueue is full of records with the value Bulk-Copy-Err under the User column. Records with the Bulk-Copy-Err user indicate that the record was not able to be copied to the target for some reason. When this happens, Pro2 assigns the record to what the Bulk-Copy-Err queue for processing after the error that blocked it is resolved. Check the Pro2 directory\bprepl\replmproclog for log files ending with _Err.log. Sort by size because the records with the errors are the only ones above 1KB. Resolve the errors indicated by the logs, and the queue should empty out normally as long as the replication remains running.