DBPedias

All your database knowledge are belong to you

MySQL

Replication - How it Works

A full set of directions on how replication works can be found in the MySQL documentation. If you will be using replication for high availability purposes such as fail over, it is strongly recommended that you read and understand this documentation as well as searching the bug database for known issues with your MySQL version and replication to insure none are applicable to your situation.

This section provides the overview of the processes and files supporting replication necessary to understand the Quick Start guide and the scripts provided to support the use of replication in a production environment.

Files Used by Replication

Replication uses a number of files to keep track of the various moving parts.

Files on the Master
* my.cnf - Contains the configuration of each master (Click here to learn more about Quick Start).
  • Binary log - This file contains every statement that makes a change to the database in a binary format. The client mysqlbinlog will read this file into a text format for use by a human user.
  • Binary log index - This file contains the name of the current binary log file


Files on the Slave
* my.cnf - Contains information on configuring the MySQL server as a slave.
  • master.info - Contains information about the master used to connect to the master server.
  • relay-log.info - Contains information about the name of the current relay log and the last statement.
  • relay-log - Similar to the binary log on the master, this file contains all the statements executed by the slave. The difference is that the slave SQL process automatically purges the log when the last statement is read.
  • error-log - Is a file of problems encountered by the SQL thread. If the slave dies, this is the first place to begin looking for the reason.

Open each of the files and become familiar with their format and contents. Each will be very useful to you when (not if) disaster strikes.

The Binary Log on the Master

For replication to work, the master must first have binary logging enabled. While it is stored in binary format for performance reasons, think of it as a text file containing each statement executed by the database server.

Consider the following scenario: Start with a new install of mysql and turn on binary logging before issuing any other statement. Fill the database with schemas, tables and other objects. Copy the binary log to a new machine with a new mysql install and replay the binary log using the mysql client and mysqlbinlog client. Both the original database and the one constructed from the log should be identical. This is the premise on which replication works.

The binary log also has a number of uses in disaster recovery and trouble shooting, but is not enabled by default for a fresh mysql install. The reason for this is that under heavy load, the server will be slowed by disk writes and the logs may take up a great deal of space. To address this problem, write the binary log to a disk array different from that of the data directory to allow for the concurrent writes to the database and the binary log.

Threads Used by Replication

The slave must first be given a copy of the data, the name of the binary log and the line number when the copy was made (replication coordinates). Knowing this, the slave connects to the server with one thread (slave I/O) and requests the content of the master's binary log. The master creates a thread (binlog dump) to send the contents of the log to the slave. If more than one slave connects to the master, the master creates a thread for each slave connection. The slave I/O thread then writes each statement in to the slave's relay logs. A second thread on the slave (SQL) reads the relay log sequentially.

The two thread implementation means that the slave can easily keep up with the binary log writes on the master regardless of how long they take. The I/O thread will write a statement nearly instantaneously while the slave may take some time executing it due to its complexity.

The slave uses the replication coordinates to tell the master the point in time it needs to start sending information from. The slave updates this information in the master.info file as the relay-log is filled.

RPOVR 55.gif