DBPedias

All your database knowledge are belong to you

MySQL

Replication - Performance Setup Detail

Slaves to Handle Reads

Slaves are often used to handle large volumes of reads for an OLAP system (data warehouse). The usage pattern for a typical system is some sort of nightly/weekly/monthly batch load (ETL) followed by a group of users performing very complex and expensive selects on the data. With this general pattern in mind it is easy to see that a Master - Slave - Slave setup is ideal.

For a given number of users, a slave is created to service their needs. Each user need not connect to the same slave each time, but some intelligent ratio based on usage, need and budget can be arrived at to come up with a number of machines from which the users pull data. The key to the success of such an arrangement is that the ETL process does not cause the slave to lag so far behind that they are unavailable when users arrive in the morning to begin work. A work around here can see the ETL process affect half of the slaves the first day and half the next, or some other such rolling schedule.

While a master machine in this scenario may want to make use of more expensive SCSI drives for their concurrency properties, the slaves can use the more cost effective (and often faster) SATA drives. Similarly, if the master is just handling writes and the ETL is straight forward, cheaper CPU's and less memory may make sense. More powerful CPU's to handle complex calculations within the query and maximizing memory to maximize caching will make the slave reads as fast a possible.

It is very important to understand that this type of setup (master slave slave) is not appropriate for systems where the master is handling many writes in real time (OLTP). This is because each slave must still do the same number of writes as the master. Spreading the load among many slaves will ease overall load but not in the most cost effective manner. For OLTP systems with load issues at the database level, consider MySQL Cluster (the NDB storage engine) or rewriting your schema to leverage replication if you are not ready for clustering at this time (see section below).

Rewriting the Schema for Leverage

Consider a complex system with all of its tables in a single schema. This is quite common because there are, unfortunately, developers long before there are DBA's. Further, load issues on the database are often not addressed until the need is near critical.

With some planning it is generally easy to identify subsystems that can be moved to their own schema. There will also be a group of tables (reference tables) used by much of the system for informational purposes. Consider the ERD below:

RPPSD 67.gif

The diagram above represents three systems in the same database. On the left is a scheduling and report card system made up of five tables.

  • credit_type
  • class
  • faculty
  • student_class
  • grade
  • attendance

At the bottom there are four tables for student discipline records.

  • discipline_referral
  • administrator
  • infraction
  • administrative_action

At the right are three tables for storing the standardized test and benchmarks of students.

  • test
  • test_type
  • test_score

Careful inspection of the diagram shows that the other tables are common to all of the subsystems. These four groups of tables should each be placed in their own database. Note that foreign key constraints can still be enforced over databases in InnoDB, so nothing is given up in data integrity. The names of the four databases are sched, dsp, test and ref respectively.

Using the replicate-ignore settings on each slave, there would be three slaves in a Master - Slave - Slave configuration that would each contain the databases:

  1. ref, sched
  2. ref, dsp
  3. ref, test

If one system is busier than the others, it can be given more than one slave. In this way slave one will only see writes from the scheduling system to the sched database and ignore all activity on the discipline and test systems. This frees the slave to handle more complex reporting queries while the master handles the operation queries such as reference data look up.

Note: It is strongly recommended that you understand replicate-do and replicate-ignore, found in the MySQL documentation.