Oracle GoldenGate 11g Implementer’s Guide

Design, install, and configure high-performance data replication solutions using Oracle GoldenGate

  • The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
  • Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
  • Migrate your data replication solution from Oracle Streams to GoldenGate
  • Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
  • Written in a simple illustrative manner, providing step-by-step guidance with discussion points
  • Goes way beyond the manual, appealing to Solution Architects, System Administrators and Database Administrators       

Oracle states that GoldenGate can achieve near real-time data replication. However, out of the box, GoldenGate may not meet your performance requirements. Here we focus on the main areas that lend themselves to tuning, especially parallel processing and load balancing, enabling high data throughput and very low latency.

Let’s start by taking a look at some of the considerations before we start tuning Oracle GoldenGate.

Before tuning GoldenGate

There are a number of considerations we need to be aware of before we start the tuning process. For one, we must consider the underlying system and its ability to perform. Let’s start by looking at the source of data that GoldenGate needs for replication to work: the online redo logs.

Online redo

Before we start tuning GoldenGate, we must look at both the source and target databases and their ability to read/write data. Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs. Redo logs play an important role in GoldenGate: they are constantly being written to by the database and concurrently being read by the Extract process. Furthermore, adding supplemental logging to a database can increase their size by a factor of 4!

Firstly, ensure that only the necessary amount of supplemental logging is enabled on the database. In the case of GoldenGate, the logging of the Primary Key is all that is required.
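In GGSCI, supplemental logging of a table’s key columns is enabled with the ADD TRANDATA command. A minimal sketch, assuming the ggs_admin user and the SRC.ORDERS table used later in this article (the source host name is illustrative):

    GGSCI (dbserver1) 1> DBLOGIN USERID ggs_admin, PASSWORD ggs_admin
    GGSCI (dbserver1) 2> ADD TRANDATA SRC.ORDERS

ADD TRANDATA enables table-level logging of just the key columns GoldenGate requires, rather than broader supplemental logging of every column.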

Next, take a look at the database wait events, in particular the ones that relate to redo. For example, if you are seeing “Log File Sync” waits, this is an indicator that your disk writes are too slow, your application is committing too frequently, or both. RAID5 is another common problem for redo log writes. Ideally, these files should be placed on their own mirrored storage such as RAID1+0 (mirrored striped sets) or Flash disks. Many argue this to be a misconception with modern high-speed disk arrays, but some production systems are still known to suffer from redo I/O contention on RAID5.

An adequate number (and size) of redo groups must be configured to prevent “checkpoint not complete” or “cannot allocate new log” warnings appearing in the database instance alert log. These occur when Oracle attempts to reuse an online redo log file whose changes are still required for crash recovery, because the checkpoint that would flush the associated dirty blocks from the DB buffer cache to disk has not yet completed. The database must wait until that checkpoint completes before the online redo log file can be reused, effectively stalling the database and any redo generation.

Large objects (LOBs)

Know your data. LOBs can be a problem in data replication by virtue of their size and the effort required to extract, transmit, and deliver the data from source to target. Tables containing LOB datatypes should be isolated from regular data and given a dedicated Extract, Data Pump, and Replicat process group to enhance throughput. Also ensure that the target table has a primary key to avoid Full Table Scans (FTS), an Oracle GoldenGate best practice. LOB INSERT operations may first insert an empty (null) LOB into a row and then update it with the data, because a LOB (depending on its size) can spread its data across multiple Logical Change Records, resulting in multiple DML operations at the target database.
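As a rough sketch of such isolation (the SRC.DOCUMENTS table and the ELOB01 group name are illustrative and not part of the example later in this article), a dedicated Extract with its own trail might be created as follows; a matching Data Pump and Replicat would complete the group:

    GGSCI (dbserver1) 1> ADD EXTRACT ELOB01, TRANLOG, BEGIN NOW
    GGSCI (dbserver1) 2> ADD EXTTRAIL ./dirdat/lb, EXTRACT ELOB01
    GGSCI (dbserver1) 3> edit params ELOB01

    -- Dedicated Extract for the LOB-heavy table only
    EXTRACT ELOB01
    USERID ggs_admin, PASSWORD ggs_admin
    EXTTRAIL ./dirdat/lb
    TABLE SRC.DOCUMENTS;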

Baselining

Before we can start tuning, we must record our baseline. This will provide a reference point to tune from. We can later look back at our baseline and calculate the percentage improvement made from deploying new configurations.

An ideal baseline finds the “breaking point” of your application’s requirements. For example, the following questions must be answered:

  1. What is the maximum acceptable end-to-end latency?
  2. What are the maximum application transactions per second we must accommodate?

To answer these questions we must start with a single-threaded data replication configuration having just one Extract, one Data Pump, and one Replicat process. This provides a worst-case scenario on which to build improvements.
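A minimal sketch of that single-threaded configuration, created in GGSCI (all group, trail, and host names here are illustrative):

    -- Source: one Extract and one Data Pump
    GGSCI (dbserver1) 1> ADD EXTRACT EBASE01, TRANLOG, BEGIN NOW
    GGSCI (dbserver1) 2> ADD EXTTRAIL ./dirdat/ea, EXTRACT EBASE01
    GGSCI (dbserver1) 3> ADD EXTRACT PBASE01, EXTTRAILSOURCE ./dirdat/ea
    GGSCI (dbserver1) 4> ADD RMTTRAIL ./dirdat/eb, EXTRACT PBASE01

    -- Target: one Replicat reading the remote trail
    GGSCI (dbserver2) 1> ADD REPLICAT RBASE01, EXTTRAIL ./dirdat/eb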

Ideally, our data source should be the application itself, inserting, deleting, and updating “real data” in the source database. However, simulated data with the ability to provide throughput profiles will allow us to gauge performance accurately. Application vendors can normally provide SQL injector utilities that simulate user activity on the system.
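Throughput and latency for the baseline can be recorded from GGSCI itself. For example, using the Replicat group from the example later in this article, the LAG and STATS commands report current latency and operation rates:

    GGSCI (dbserver2) 1> LAG REPLICAT ROLAP01
    GGSCI (dbserver2) 2> STATS REPLICAT ROLAP01, TOTALSONLY *.*, REPORTRATE SEC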

Balancing the load across parallel process groups

The GoldenGate documentation states “The most basic thing you can do to improve GoldenGate’s performance is to divide a large number of tables among parallel processes and trails. For example, you can divide the load by schema”. This statement is true as the bottleneck is largely due to the serial nature of the Replicat process, having to “replay” transactions in commit order. Although this can be a constraining factor due to transaction dependency, increasing the number of Replicat processes increases performance significantly. However, it is highly recommended to group tables with referential constraints together per Replicat.
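As a minimal sketch of a schema-based split (the HR and FIN schema names are purely illustrative), each Replicat parameter file would simply map a different schema:

    -- Replicat group 1: HR schema only
    MAP HR.*, TARGET HR.*;

    -- Replicat group 2: FIN schema only
    MAP FIN.*, TARGET FIN.*;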

The number of parallel processes is typically greater on the target system than on the source. The number and ratio of processes will vary across applications and environments. Each configuration should be thoroughly tested to determine the optimal balance, but be careful not to over-allocate, as each parallel process can consume up to 55 MB of memory. Increasing the number of processes to an arbitrary value will not necessarily improve performance; in fact, it may make things worse while wasting CPU and memory resources.

The following data flow diagram shows a load-balancing configuration comprising two Extract, three Data Pump, and five Replicat processes:

[Figure: Oracle GoldenGate 11g load-balancing data flow]

Considerations for using parallel process groups

To maintain data integrity, ensure that tables with referential constraints to one another are included in the same parallel process group. It is also worth considering disabling referential constraints on the target database schema to allow child records to be populated before their parents, thus increasing throughput. GoldenGate will always commit transactions in the same order as the source, so data integrity is maintained.

Oracle best practice states that no more than three Replicat processes should read the same remote trail file. To avoid contention on trail files, pair each Replicat with its own trail files and Extract process. Also, remember that it is easier to tune an Extract process than a Replicat process, so concentrate on your source before moving your focus to the target.
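A minimal sketch of such pairing, with illustrative group and trail names: a dedicated Data Pump writes its own remote trail, and exactly one Replicat reads it:

    -- Source: a Data Pump reading the local trail ./dirdat/aa
    GGSCI (dbserver1) 1> ADD EXTRACT EPMP02, EXTTRAILSOURCE ./dirdat/aa
    GGSCI (dbserver1) 2> ADD RMTTRAIL ./dirdat/tc, EXTRACT EPMP02

    -- Target: a Replicat dedicated to that remote trail
    GGSCI (dbserver2) 1> ADD REPLICAT ROLAP03, EXTTRAIL ./dirdat/tc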

Splitting large tables into row ranges across process groups

What if you have some large tables with a high data change rate within a source schema and you cannot logically separate them from the remaining tables due to referential constraints? GoldenGate provides a solution to this problem by “splitting” the data within the same schema via the @RANGE function. The @RANGE function can be used in the Data Pump and Replicat configuration to “split” the transaction data across a number of parallel processes.
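In a Data Pump parameter file, for example, the split is expressed at the TABLE level rather than in a MAP statement. A minimal sketch of the first of three pumps (the other two would use @RANGE (2,3) and @RANGE (3,3)):

    TABLE SRC.ORDERS, FILTER (@RANGE (1,3));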

The Replicat process is typically the source of performance bottlenecks because, in its normal mode of operation, it is a single-threaded process that applies operations one at a time using regular DML. Therefore, to leverage parallel operation and enhance throughput, the more Replicats the better (dependent on the number of CPUs and memory available on the target system).

The RANGE function

The @RANGE function computes a hash value from the columns specified as its input; if no columns are specified, it uses the table’s primary key. GoldenGate adjusts the total number of ranges to optimize the even distribution of data across the number of ranges specified. This concept can be compared to Hash Partitioning in Oracle tables as a means of dividing data.

With any division of data during replication, integrity is paramount and will have an effect on performance. Therefore, tables having a relationship with other tables in the source schema must be included in the configuration. If all your source schema tables are related, you must include all of them!
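Where parent and child tables must stay together, the optional column argument of @RANGE lets both tables hash on the same key. A sketch assuming ORDER_ID is the column that joins the two tables used in the example below:

    -- Both tables hash on ORDER_ID, so an order and its items fall into the same range
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (1,2,ORDER_ID));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (1,2,ORDER_ID));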

Adding Replicats with @RANGE function

The @RANGE function accepts two numeric arguments, separated by a comma:

  1. Range: The range number assigned to the process group, where the first is 1, the second is 2, and so on, up to the total number of ranges.
  2. Total number of ranges: The total number of process groups across which the data is divided using the @RANGE function.

The following example includes three related tables in the source schema and walks through the complete configuration from start to finish.

For this example, we have an existing Replicat process on the target machine (dbserver2) named ROLAP01 that includes the following three tables:

  • ORDERS
  • ORDER_ITEMS
  • PRODUCTS

We are going to divide the rows of the tables across two Replicat groups. The source database schema name is SRC and the target schema is TGT. The following steps add a new Replicat named ROLAP02 with the relevant configuration and adjust the ROLAP01 parameters to suit.

Note that before making any changes, stop the existing Replicat process and determine its Relative Byte Address (RBA) and trail file log sequence number. This is important information that we will use to tell the new Replicat process the point from which to start.

  1. First check if the existing Replicat process is running:

    GGSCI (dbserver2) 1> info all

    Program Status Group Lag Time Since Chkpt
    MANAGER RUNNING
    REPLICAT RUNNING ROLAP01 00:00:00 00:00:02

    
    
  2. Stop the existing Replicat process:
    GGSCI (dbserver2) 2> stop REPLICAT ROLAP01
    Sending STOP request to REPLICAT ROLAP01...
    Request processed.
  3. Add the new Replicat process, using the existing trail file.
    GGSCI (dbserver2) 3> add REPLICAT ROLAP02, exttrail ./dirdat/tb
    REPLICAT added.
  4. Now add the configuration by creating a new parameter file for ROLAP02.
    GGSCI (dbserver2) 4> edit params ROLAP02


    -- Example Replicat parameter file to apply changes
    -- to target tables

    REPLICAT ROLAP02
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap02.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000

    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (1,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (1,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (1,2));

  5. Now edit the configuration of the existing Replicat process, adding the @RANGE function to the FILTER clause of each MAP statement. Note the inclusion of the GROUPTRANSOPS parameter, which enhances performance by increasing the number of operations allowed in a Replicat transaction.
    GGSCI (dbserver2) 5> edit params ROLAP01


    -- Example Replicat parameter file to apply changes
    -- to target tables

    REPLICAT ROLAP01
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap01.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (2,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (2,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (2,2));

  6. Check that both the Replicat processes exist.

    GGSCI (dbserver2) 6> info all
    Program Status Group Lag Time Since Chkpt
    MANAGER RUNNING
    REPLICAT STOPPED ROLAP01 00:00:00 00:10:35
    REPLICAT STOPPED ROLAP02 00:00:00 00:12:25

    
    
  7. Before starting both Replicat processes, obtain the log Sequence Number (SEQNO) and Relative Byte Address (RBA) from the original trail file.

    GGSCI (dbserver2) 7> info REPLICAT ROLAP01, detail
    REPLICAT ROLAP01 Last Started 2010-04-01 15:35 Status STOPPED
    Checkpoint Lag 00:00:00 (updated 00:12:43 ago)
    Log Read Checkpoint File ./dirdat/tb000279 <- SEQNO
    2010-04-08 12:27:00.001016 RBA 43750979 <- RBA
    Extract Source Begin End
    ./dirdat/tb000279 2010-04-01 12:47 2010-04-08 12:27
    ./dirdat/tb000257 2010-04-01 04:30 2010-04-01 12:47
    ./dirdat/tb000255 2010-03-30 13:50 2010-04-01 04:30
    ./dirdat/tb000206 2010-03-30 13:50 First Record
    ./dirdat/tb000206 2010-03-30 04:30 2010-03-30 13:50
    ./dirdat/tb000184 2010-03-30 04:30 First Record
    ./dirdat/tb000184 2010-03-30 00:00 2010-03-30 04:30
    ./dirdat/tb000000 *Initialized* 2010-03-30 00:00
    ./dirdat/tb000000 *Initialized* First Record

    
    
  8. Adjust the new Replicat process ROLAP02 to adopt these values, so that the process knows where to start from on startup.
    GGSCI (dbserver2) 8> alter replicat ROLAP02, extseqno 279
    REPLICAT altered.

    GGSCI (dbserver2) 9> alter replicat ROLAP02, extrba 43750979
    REPLICAT altered.

    Failure to complete this step will result in either duplicate data or an ORA-00001 error against the target schema, because GoldenGate will attempt to replicate the data from the beginning of the initial trail file (./dirdat/tb000000) if it exists; otherwise, the process will abend.

  9. Start both Replicat processes. Note the use of the wildcard (*).
    GGSCI (dbserver2) 10> start replicat ROLAP*

    Sending START request to MANAGER …
    REPLICAT ROLAP01 starting

    Sending START request to MANAGER …
    REPLICAT ROLAP02 starting

  10. Check if both Replicat processes are running.

    GGSCI (dbserver2) 11> info all
    Program Status Group Lag Time Since Chkpt
    MANAGER RUNNING
    REPLICAT RUNNING ROLAP01 00:00:00 00:00:22
    REPLICAT RUNNING ROLAP02 00:00:00 00:00:14

    
    
  11. Check the detail of the new Replicat processes.

    GGSCI (dbserver2) 12> info REPLICAT ROLAP02, detail
    REPLICAT ROLAP02 Last Started 2010-04-08 14:18 Status RUNNING
    Checkpoint Lag 00:00:00 (updated 00:00:06 ago)
    Log Read Checkpoint File ./dirdat/tb000279
    First Record RBA 43750979
    Extract Source Begin End
    ./dirdat/tb000279 * Initialized * First Record
    ./dirdat/tb000279 * Initialized * First Record
    ./dirdat/tb000279 * Initialized * 2010-04-08 12:26
    ./dirdat/tb000279 * Initialized * First Record

    
    
  12. Generate a report for the new Replicat process ROLAP02.
    GGSCI (dbserver2) 13> send REPLICAT ROLAP02, report

    Sending REPORT request to REPLICAT ROLAP02 …
    Request processed.

  13. Now view the report to confirm that the new Replicat process has started from the specified start point (RBA 43750979 and SEQNO 279). The following is an extract from the report:
    GGSCI (dbserver2) 14> view report ROLAP02
    2010-04-08 14:20:18 GGS INFO 379 Positioning with begin
    time: Apr 08, 2010 14:18:19 PM, starting record time: Apr 08, 2010
    14:17:25 PM at extseqno 279, extrba 43750979.

 
