Advanced Replication Design and
Setup Tips
Is Replication Appropriate?
Advanced Replication is not appropriate for all high availability
scenarios. Replication is generally most appropriate for the following
situations;
1) The number of replicated tables is modest and manageable (< 1000,
not 10,000 or more).
There is no specific limit imposed by Oracle concerning the number of
tables which can be replicated.
The main issue will concern whether the number of replicated tables
will be manageable by the DBA in a timely manner.
2) The DBA has full control over the design of the tables.
Every replicated table must have a primary key, and additional columns
and triggers need added for conflict resolution. Schemas such as Oracle
Financials or SAP do not lend themselves to replication because they
consist of 10,000 - 20,000 tables, which would represent too many tables
for a DBA to manage in a replicated environment. The DBA also does not
have full control over the schemas implemented by these applications,
and would not be able to safely modify the tables without affecting
the reliability or supportability of the application. For these types
of applications Clustering or a Standby database configuration (or both)
should be considered.
3) Database availability requires a zero-failover timeframe.
The Tnsnames.ora Net8 configuration file can be configured for Transparent
Application Failover. TAF allows Net8 to automatically direct database
access requests to alternate servers. Standby databases require a manual
switchover process to be performed by the DBA in order to recover from
a failover, with an additional delay required for the standby database
to apply the remaining archivelog files. Users can be manually directed
to alternate servers for load balancing purposes one server does not
get overloaded.
4) Dropping of a table at one site must not propagate to other sites.
DDL commands do not get replicated unless this feature is specifically
enabled. The scripts created by this application do not replicate DDL
commands in order to improve reliability due to this type of error.
Standby and Clustering solutions always transfer DDL commands between
sites. However standby databases can be configured with a delay interval
in an attempt to give the DBA the opportunity to prevent the DDL from
reaching the standby site.
5) The number of transactions per second is manageable (does not exceed
400 transactions per second). This is a general guideline, which has
been mentioned by Oracle, and is very dependent upon the hardware being
used for the database servers. Replication generates a significant amount
of redo because all of the replicated data is written multiple times.
The data gets written into a deferred transaction queue at the source
site (causing redo), gets sent out across the network where it then
gets queued again (generating more redo) until it is finally applied
to the destination database.
6) The data need replicated between servers running differing Oracle
versions or differing Operating Systems.
Standby and Clustered databases always require that the servers be running
the exact same version of Oracle and the exact same operating system
platform and version. Replication however may take place between differing
Oracle versions running on different platforms.
7) The database servers need to be physically located far away from
each other.
Standby databases can be remotely located from each other, but clustered
databases can only be as far away from each other as a short haul fiber
optic cable connection (probably not more than about 20 miles).
Replicated databases can also be remotely located as long as the communications
link is reliable and fast enough to support the transaction rate which
needs supported.
Setup Tips:
1) The use of the Oracle Replication Manager application is not recommended
(even by Oracle) for the initial setup of Advanced Replication. Some
versions of the Replication Manager application contain a known bug
whereby the application logs into the database using the SYSTEM account
instead of the SYS account during the setup process. Use of the SYSTEM
account does not provide enough priviledges to complete the required
tasks, thus causing the setup process to fail. Oracle Worldwide Support
recommends that DBAs should build their own scripts to set up replication
(the reason for the existence of this application!). Oracle recommends
that DBAs should only use the Java-based Replication Manager application
to manage replication once it has been setup. Some previous versions
(8.1.6 for instance) of the Replication Manager also do not permit for
the saving of configuration SQL scripts.
2) When setting up replication on an Oracle 9i database sqlplus must
be used, because svrmgrl is no longer available. With previous versions
of Oracle it is generally advisable to use svrmgrl when setting up replication
in order to avoid a small number of issues with sqlplus when performing
DBA tasks such as shutting down or starting up databases.
3) Forward and reverse DNS must be setup for each server hostname in
order for the database servers to be registered into the OEM repository.
The servers must be registered into the OEM repository to allow the
Replication Manager to manage the servers.
Performance:
1) If 2 master sites are located physically close to each other, then
dedicating a network interface for the exclusive use of Replication
is recommended. Even if master sites are not physically close together
may benefit from the separation of replication traffic on a different
interface for non-replication related access to the database.
2) Most UNIX systems will not route TCP/IP traffic to a 2nd network
interface unless it is assigned an IP address in another network or
subnet. For instance if the primary network interface used an IP address
of 10.1.0.10 with a subnet of 255.255.255.0, then the 2nd network interface
could potentually use an IP address of 10.2.0.10 with a subnet of 255.255.255.0.
However if the 2nd network interface were assigned an IP address of
10.1.0.11, then it would not appear to function because all of the network
traffic would get sent thru the 1st network interface.
The network administrator should be consulted prior to assigning IP
addresses or subnetes in order to insure that the site's security policy
is followed and to insure that access lists in switches, routers and
firewalls are updated properly.
Batch Operations:
1) Oracle recommends that batch transactions which insert or update
more than a few hundred records should be performed with "Procedural
Replication". Procedural Replication is Oracle's term for running
a DBA-created PL/SQL stored procedure at each master site without allowing
the changes to propagate via replication to the other sites. Within
the PL/SQL procedure the DBA should first execute the DBMS_REPUTIL.REPLICATION_OFF
procedure before performing DML operations on replicated tables. The
DBMS_REPUTIL.REPLICATION_OFF procedure disables the replication process
during the current session. After the DML operations have been completed,
the DBA should execute the DBMS_REPUTIL.REPLICATION_ON procedure to
re-enable replication within the session.
This process needs to be repeated at each of the remaining master sites
in order to insure that the data is in sync.
2) In addition to executing the DBMS_REPUTIL.REPLICATION_OFF procedure
the DBA should also disable the replication-related trigger on the date_modified
column of the table. Triggers do not execute within the same user session,
the replication process will take place based upon the modified data
and this will then cause data to be replicated to the other sites during
the batch operation.
Conflict Resolution:
1) Conflict resolution is an often-overlooked but important part of
any Advanced Replication setup. If data updates occur to the same record
simultaneously at more than one site, a conflict will occur. Oracle
will automatically handle the resolution of these types of conflicts
if conflict resolution methods are designed into the replication process.
The scripts generated by Repgen
automatically implement 3 conflict resolution methods on every replicated
table because Oracle recommends a minimum of 2 conflict resolution methods.
The 1st conflict resolution method is the LATEST TIMESTAMP method. This
method retains the most recently modified data as being the most up-to-date
information.
The 2nd conflict resolution method is SITE PRIORITY. If the LATEST TIMESTAMP
method fails to resolve a conflict, then the 2nd method is applied.
This method assigns a priority to each master site and causes data at
the site with the highest priority to be retained across all of the
sites.
There are some situations which will not be resolved by either of the
1st two conflict resolution methods. One of these situations involves
mis-convergence of the data between servers. If the number of records
within the tables at each of the servers is identical, but the actual
data contains differing values between the servers, the replication
transaction will fail with the "ORA-1403 Data Not Found" error.
According to the database the reason that the data was not found is
due to the same values not being found when comparing the before images
of data at each site. Oracle compares the data in each of the table
columns before the modification of the data. If any column at any one
of the sites contains data in the "before modification" image
of the data which is different from the site which is pushing the data,
then the database considers this to be a conflict requiring resolution.
Ideally this situation should never happen, but under real-world operating
conditions it does happen. One of the most common reasons this problem
occurs is due to the modification of data outside of the replication
process. In general it is not a good idea to modify data outside of
the replication process, but in fact it is required if you are performing
procedural replication due to batch updating requirements. The OVERWRITE
conflict resolution method will resolve this type of conflict by overriding
the conflict situation and forcing the propagation of the most recent
modification of the table to all of the sites. If the DBA is aware of
differing data within the tables (as a result of running the DBMS_RECTIFIER_DIFF
package), then the DBA can perform a harmless modification to data at
the site containing the "correct" data, thus pushing the "correct"
data to the rest of the sites and forcing convergence of the data. An
good example of a harmless modification is updating the date_modified
column of the table with the SYSDATE value. If the DBA does nothing,
the data will converge automatically during the normal course of updating
data within the tables. But ideally a person who is knowledgeable in
the design of the schema and overall dataflow should make this decision.
2) An important conflict resolution technique is conflict avoidance.
If multiple sites generate records containing the same primary key value,
a conflict will result. If no conflict resolution is implemented, then
the transactions will end up in the deferror queue. If the DBA tries
to re-apply the transactions manually, they will still fail due to the
duplicate primary keys which will fail while trying to insert data into
the table. The scripts generated by this graphical application are set
up to implement differing ranges of primary key values at each master
site. These scripts are designed around a primary key value consisting
of an incrementing integer value. Other methods can be implemented,
but would require modifying the reptriggers.sql code. It is recommended
that a large block of numbers be set aside for primary keys at each
master site. In most cases 1G blocks of primary key values will generally
be sufficient to prevent the DBA from having to re-allocate numbers
within a reasonable period of time. (The database isn't going to run
out of number anytime soon either!). The use of this straightforward
approach toward assigning primary key values can also pay benefits when
troubleshooting problems. By looking at the 1st digit in the primary
key value, the DBA can instantly determine the site which produced the
data.