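Official Tutorial: Hands-On with YashanDB Primary/Standby Deployment, Synchronization Latency, and Automatic Failover

The previous article, "In-Depth | Balancing Performance and Reliability: An Analysis of YashanDB Primary/Standby High-Availability Technology", examined the architecture and key techniques behind YashanDB high availability. This article turns to practice and walks through YashanDB's primary/standby high-availability features hands-on.

Overview

YashanDB supports automatic failover in different deployment topologies. In a one-primary, one-standby environment, failover is driven by an external arbiter, OM. In a one-primary, multi-standby configuration, failover is driven by the Raft protocol. When the primary fails and the configured timeout expires, a standby takes over the primary role and resumes serving the workload, so the service interruption stays within seconds.

This article deploys one primary and one standby, then measures standby synchronization latency and exercises both automatic failover mechanisms. The steps are simple to follow, and the latest Personal Edition can be downloaded from the download center on the YashanDB official website.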

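Pre-installation preparation

1 Prerequisites

  • Obtain the YashanDB installation package
  • Prepare three servers (or four, if available, so that OM runs on a dedicated server)
  • Enable the SSH service
  • Create the yashan user and user group
  • Create the HOME and DATA directories
  • Check that the ports YashanDB needs are not in use
  • Prepare the test tool: benchmarksql-5.0
  • Synchronize the clocks so that the test results are correct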
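The port check in the list above can be scripted. The following is a minimal sketch, assuming the listening sockets are listed with `ss -ltn` (from iproute2) and that the ports of interest are 1688 and 1689, the listen and replication ports used later in this article. The function name `port_in_use` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds (exit 0)
# if any listening socket is bound to the given port.
port_in_use() {
    # Column 4 of `ss -ltn` is "Local Address:Port"; compare the part after
    # the last ":" with the requested port. NR > 1 skips the header line.
    awk -v p="$1" '
        NR > 1 { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END { exit found ? 0 : 1 }'
}
```

For example, `ss -ltn | port_in_use 1688 && echo "port 1688 is already in use"` could be run on each server before installation.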

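2 Test environment

Server configuration: (table not preserved in this copy)

Environment information: (table not preserved in this copy)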

3 Create the user

# useradd -d /home/yashan -m yashan
# passwd yashan

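4 Create the installation directories

The HOME and DATA directories are both placed under /data/yashan, and the yashan user needs full permissions on them. Run the following commands to create the directories and grant access:

# cd /data/yashan
# mkdir yashan_data
# mkdir yashan_home
# chmod -R 770 /data/yashan/yashan_data
# chmod -R 770 /data/yashan/yashan_home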

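5 Download and extract the installation package

Download the latest Personal Edition installation package from the YashanDB official website and extract it.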

Install one primary and one standby

  1. Generate the installation configuration files: hosts.toml and yashandb.toml

[yashan@ob1 install]$ yasboot package se gen --cluster yashandb -u yashan -p yashan --ip 192.168.7.10,192.168.7.11 --port 22 --install-path /data1/yashan/yasdb_home --data-path /data1/yashan/yasdb_data --begin-port 1688 --node 2
hostid    | group | node_type | node_name | listen_addr       | replication_addr  | data_path
-------------------------------------------------------------------------------------------------------------
host0001  | dbg1  | db        | 1-1       | 192.168.7.10:1688 | 192.168.7.10:1689 | /data1/yashan/yasdb_data
----------+-------+-----------+-----------+-------------------+-------------------+--------------------------
host0002  | dbg1  | db        | 1-2       | 192.168.7.11:1688 | 192.168.7.11:1689 | /data1/yashan/yasdb_data
----------+-------+-----------+-----------+-------------------+-------------------+--------------------------
Generate config success

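  2. Adjust the configuration file: tune the installation parameters in yashandb.toml as needed. All database-creation parameters can be set at the group level, and all configuration parameters can be set at the node level. To keep this test stable, the redo files, data files, and archive files should be placed on separate disks, so the file creation paths need to be adjusted:

[group.config]
REDO_FILE_NUM = 10
REDO_FILE_SIZE = "10G"
REDO_FILE_PATH = '/data2/yashan/redo'

[group.node.config]
ARCHIVE_LOCAL_DEST = '/home/yashan/archive'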

  3. Run the installation: install the YashanDB binaries on the other servers and start the operations service processes yasom and yasagent

[yashan@ob1 install]$ yasboot package install -t hosts.toml -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz
checking install package...
install version: yashandb 23.1.1.100
host0001 100% [====================================================================] 3s
host0002 100% [====================================================================] 3s
update host to yasom...

  4. Deploy the cluster

[yashan@ob1 install]$ yasboot cluster deploy -t yashandb.toml
type  | uuid             | name               | hostid | index    | status  | return_code | progress | cost
------------------------------------------------------------------------------------------------------------
task  | e3205df3e98645ed | DeployYasdbCluster | -      | yashandb | SUCCESS | 0           | 100      | 174
------+------------------+--------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

  5. Set the sys user password: set it to yashandb_123

[yashan@ob1 install]$ yasboot cluster password set --new-password yashandb_123 --cluster yashandb
type  | uuid             | name             | hostid | index    | status  | return_code | progress | cost
----------------------------------------------------------------------------------------------------------
task  | 4e11fb328e1695ac | YasdbPasswordSet | -      | yashandb | SUCCESS | 0           | 100      | 3
------+------------------+------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

  6. Post-installation checks

Check the status of the whole cluster:

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | 69010 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Check the primary/standby connection status:

SQL> select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;
DEST_ID CONNECTION        PEER_ADDR                                                        STATUS            DATABASE_MODE
------- ----------------- ---------------------------------------------------------------- ----------------- -----------------
      1 CONNECTED         192.168.7.11:1689                                                NORMAL            OPEN
1 row fetched.

Check primary/standby synchronization: run a few simple workload tests.

  7. Tune the configuration parameters

Generate recommended parameters based on the server load:

SQL> EXEC DBMS_PARAM.OPTIMIZE(NULL, NULL, 90, 90);
PL/SQL Succeed.

View the parameter recommendation report:

SQL> SELECT DBMS_PARAM.SHOW_RECOMMEND() FROM DUAL;
DBMS_PARAM.SHOW_RECO
----------------------------------------------------------------
********** Recommended Settings For HEAP Table ***********
+--------------------------------+-------------+-------------+---------+
| name                           | current     | recommend   | restart |
+--------------------------------+-------------+-------------+---------+
| DATA_BUFFER_SIZE               | 64M         | 272785M     | True    |
| VM_BUFFER_SIZE                 | 32M         | 34823M      | True    |
| WORK_AREA_STACK_SIZE           | 1024K       | 2M          | True    |
| WORK_AREA_POOL_SIZE            | 16M         | 128M        | True    |
| WORK_AREA_HEAP_SIZE            | 512K        | 512K        | True    |
| SHARE_POOL_SIZE                | 256M        | 34823M      | True    |
| LARGE_POOL_SIZE                | 128M        | 2048M       | True    |
| MAX_PARALLEL_WORKERS           | 32          | 372         | True    |
| SCOL_DATA_BUFFER_SIZE          | 128M        | 128M        | True    |
| SCOL_DATA_PRELOADERS           | 2           | 2           | True    |
| COLUMNAR_WORK_AREA_HEAP_SIZE   | 64M         | 32M         | True    |
| COLUMNAR_VM_BUFFER_SIZE        | 2G          | 128M        | True    |
| COLUMNAR_BULK_SIZE             | 1024        | 1024        | True    |
| COMPRESSION                    | LZ4         | LZ4         | True    |
| PQ_POOL_SIZE                   | 128M        | 128M        | True    |
| MAX_SESSIONS                   | 1024        | 1024        | True    |
| MAX_WORKERS                    | 0           | 0           | True    |
| TAB_QUEUE_WINDOW_SIZE          | 4           | 4           | True    |
| BLOOM_FILTER_FACTOR            | .3          | .3          | True    |
| DEGREE_OF_PARALLEL             | 1           | 1           | True    |
| MMS_DATA_LOADERS               | 4           | 8           | True    |
| CHECKPOINT_INTERVAL            | 100000      | 256M        | False   |
| CHECKPOINT_TIMEOUT             | 300         | 60          | False   |
| REDOFILE_IO_MODE               | DSYNC       | DSYNC       | True    |
| DATAFILE_IO_MODE               | DEFAULT     | DEFAULT     | True    |
| COMMIT_LOGGING                 | IMMEDIATE   | IMMEDIATE   | False   |
| RECOVERY_PARALLELISM           | 16          | 64          | True    |
| REDO_BUFFER_SIZE               | 64M         | 64M         | True    |
+--------------------------------+-------------+-------------+---------+
| total memory                   | 346760M                             |
+--------------------------------+-------------+-------------+---------+
Note: You can execute 'DBMS_PARAM.APPLY_RECOMMEND()' to apply the recommend parameters.
After applying the parameters, you need to restart the database.
1 row fetched.

Write the parameters to the configuration file:

SQL> EXEC DBMS_PARAM.APPLY_RECOMMEND();
PL/SQL Succeed.

Configuration parameters are instance-level, so this operation must be run on every node.

  8. Enable automatic failover: set FailoverThreshold to 5, then turn automatic failover on

[yashan@ob1 install]$ yasboot election config set -k FailoverThreshold -v 5 --cluster yashandb
group 1 execute Succeed

[yashan@ob1 install]$ yasboot election enable on -c yashandb
group 1 execute Succeed

[yashan@ob1 install]$ yasboot election config show --cluster yashandb
group 1
Protection Mode: MAXIMUM PROTECTION
Members:
[1-1:1] - Primary database
[1-2:2] - Physical standby database
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Apply Rate: 2.73 MByte/s
Properties:
FailoverThreshold = 5
FailoverAutoReinstate = false
ZeroDataLossMode = true
Automatic Failover: Enabled in Zero Data Loss Mode
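Before starting the tests, it can be useful to confirm from a script that automatic failover is on and that the standby is not lagging. The sketch below only parses the text printed by `yasboot election config show`. It assumes the "Key: value" layout shown above, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers that read `yasboot election config show` output on stdin.
# Both assume the "Key: value" layout shown in this article.

# Succeeds (exit 0) only if the "Automatic Failover" line reports "Enabled ...".
failover_enabled() {
    awk -F': *' '
        $1 ~ /Automatic Failover$/ { state = tolower($2) }
        END { exit (state ~ /^enabled/) ? 0 : 1 }'
}

# Prints the number from the "Apply Lag: N seconds" line.
apply_lag_seconds() {
    awk -F': *' '$1 ~ /Apply Lag$/ { split($2, v, " "); print v[1] }'
}
```

For example, `yasboot election config show --cluster yashandb | failover_enabled` returns success for the output above and failure once the line reads `Automatic Failover: DISABLED`.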

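Test standby synchronization latency: 8 ms

1 Test plan

  • On the primary, create a table, create table ha_test (time_col timestamp), and insert one row into it.
  • Get the local timestamp, update the row with it, and commit. Repeat this continuously.
  • On the standby, query the table and subtract the timestamp stored in the row from the time at which the query ran. The difference is the primary-to-standby synchronization delay. (The table holds a single row, so the time spent executing the update and the select can be ignored.)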
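The subtraction described in the plan above can be sketched as follows. This is an illustration rather than the script used for the published numbers: the function names are hypothetical, timestamps are assumed to be in the `YYYY-MM-DD HH:MM:SS.mmm` format produced by `date +'%Y-%m-%d %H:%M:%S.%3N'`, and both timestamps are assumed to fall on the same day. The result is only meaningful if the clocks of the two servers are synchronized, as the prerequisites require.

```shell
#!/bin/bash
# Hypothetical helpers for "YYYY-MM-DD HH:MM:SS.mmm" timestamps.

# Convert the time-of-day part to milliseconds since midnight.
ts_to_ms() {
    printf '%s\n' "$1" | awk '{
        split($2, t, /[:.]/)
        ms = (t[1] * 3600 + t[2] * 60 + t[3]) * 1000 + t[4]
        print ms
    }'
}

# Delay in ms: $1 is the value read from time_col on the standby,
# $2 is the local time at which the standby query ran.
sync_delay_ms() {
    echo $(( $(ts_to_ms "$2") - $(ts_to_ms "$1") ))
}
```

With illustrative values, `sync_delay_ms "2024-03-19 15:45:38.464" "2024-03-19 15:45:38.472"` prints `8`. Averaging this value over all samples gives the mean standby delay.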

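2 Test steps

  1. Prepare the TPC-C stress test first (the YashanDB official website describes in detail how to run it).
  2. Configure TPC-C with 300 warehouses and 128 concurrent connections. This configuration can reach a stress level of millions of tpmC, and the test is run under that load.
  3. Run the test scripts on the primary and on the standby respectively (100 measurements in total).
  4. Compute the primary-to-standby time difference from the data the scripts collect.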

Test scripts:

#!/bin/bash
# Update workload against the primary
# Update 100 times
for ((i=1; i<=100; i++))
do
    # Get the current time in a format the database accepts
    current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')
    echo "Current time is: $current_time"
    # Update the row in ha_test
    yasql ha_test/123@192.168.7.10:1688 -c "UPDATE ha_test SET time_col='$current_time';"
    sleep 0.1
done

#!/bin/bash
# Query workload against the standby
while true
do
    # Get the current time
    current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')
    echo "Current time is: $current_time"
    # Query the time column of ha_test
    yasql ha_test/123@192.168.7.11:1688 -c "select time_col from ha_test;"
done

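3 Test results

  • Redo flush rate during the test (from V$REDOSTAT): 235 MB/s
  • Average standby query delay: 8 ms

Five of the 100 measurements were shown as examples (figure not preserved in this copy).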

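Test arbiter-based automatic failover, RTO < 8 s

RTO is computed as the time between the interruption of the workload on the old primary and the first successful operation on the new primary.

1 Test steps

  1. Keep the stress scenario running (the TPC-C stress test) for about 10 minutes.
  2. Measure when the workload on the primary is interrupted and when the new primary first executes the workload successfully.
  3. Run the detection script against the primary and against the standby.
  4. Kill the primary process to interrupt the workload on the primary.

Test script: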

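#!/bin/bash
# Loop forever
while true
do
    # Get the current time in a format the database accepts
    current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')
    # Print the current time
    echo "Current time is: $current_time"
    # Execute a write (for the standby copy of this script, use 192.168.7.11:1688)
    yasql ha_test/123@192.168.7.10:1688 -c "UPDATE ha_test SET time_col='$current_time';"
done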

2 Test results

Cluster status before the test:

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | 69010 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Cluster status after killing the primary (the standby has become the primary):

[yashan@ob1 sync_test]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | off   | -               | -               | -             | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | primary       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
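The role change visible in the two status listings can also be checked from a script instead of by eye. The following sketch counts the nodes that are open, normal, and hold a given role. It assumes one table row per line and the column order shown in this article, and the function name `count_role` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `yasboot cluster status --detail` output on stdin
# and prints how many nodes are open, normal, and hold the given role.
# Assumes instance_status, database_status and database_role are
# fields 5, 6 and 7 when a row is split on "|".
count_role() {
    awk -F'|' -v role="$1" '
        function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
        NF >= 9 && trim($5) == "open" && trim($6) == "normal" && trim($7) == role { n++ }
        END { print n + 0 }'
}
```

For example, `yasboot cluster status --cluster yashandb --detail | count_role primary` should print `1` both before the kill and once the failover has completed, while `count_role standby` drops from `1` to `0` in the one-primary, one-standby setup.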

Timestamp at which the workload on the old primary was interrupted:

Current time is: 2024-03-19 15:45:38.464
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:38.464';
1 row affected.
Current time is: 2024-03-19 15:45:38.476
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:38.476';
YAS-00406 connection is closed

Time at which the new primary first executed the workload successfully:

Current time is: 2024-03-19 15:45:46.204
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:46.204';
YAS-06010 the database is not in readwrite mode
Current time is: 2024-03-19 15:45:46.211
SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:46.211';
1 row affected.

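3 Test summary

  • Heartbeat interval: 1 s
  • Detection timeout: 5 s
  • Redo flush rate at the time: 237 MB/s
  • Service interruption time: 7.735 s (from 15:45:38.476 to 15:45:46.211)
  • Failover time: less than 3 s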
Deploy one primary and two standbys: add a standby online

  1. Restore the environment and disable arbiter-based automatic failover, which applies only to the one-primary, one-standby configuration

[yashan@ob1 yasdb_home]$ yasboot election enable off -c yashandb
group 1 execute Succeed

[yashan@ob1 yasdb_home]$ yasboot election config show --cluster yashandb
group 1
Protection Mode: MAXIMUM PROTECTION
Members:
[1-2:2] - Primary database
[1-1:1] - Physical standby database
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Apply Rate: 391.00 MByte/s
Properties:
FailoverThreshold = 5
FailoverAutoReinstate = false
ZeroDataLossMode = true
Automatic Failover: DISABLED

[yashan@ob1 yasdb_home]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | 14818 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

  2. Generate the configuration files: hosts_add.toml and yashandb_add.toml

[yashan@ob1 install]$ yasboot config node gen -c yashandb -u yashan -p yashan --ip 192.168.7.12 --port 22 --data-path /data1/yashan/yasdb_data --install-path /data1/yashan/yasdb_home -g 1 --node 1
hostid    | group | node_type | node_name | listen_addr       | replication_addr  | data_path
-------------------------------------------------------------------------------------------------------------
host0003  | dbg1  | db        | 1-3       | 192.168.7.12:1688 | 192.168.7.12:1689 | /data1/yashan/yasdb_data
----------+-------+-----------+-----------+-------------------+-------------------+--------------------------
Generate config success

  3. Run the installation: install the YashanDB binaries on the server of the new node and start the service process yasagent

[yashan@ob1 install]$ yasboot host add -c yashandb -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz -t hosts_add.toml
type  | uuid             | name    | hostid | index    | status  | return_code | progress | cost
-------------------------------------------------------------------------------------------------
task  | 63112e698b5689a0 | HostAdd | -      | yashandb | SUCCESS | 0           | 100      | 8
------+------------------+---------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

  4. Add the standby: a SUCCESS status here does not mean that the expansion has finished, because background tasks are still synchronizing data and completing other work

[yashan@ob1 install]$ yasboot node add -c yashandb -t yashandb_add.toml
type  | uuid             | name    | hostid | index    | status  | return_code | progress | cost
-------------------------------------------------------------------------------------------------
task  | 4618495ddc9c012c | NodeAdd | -      | yashandb | SUCCESS | 0           | 100      | 10
------+------------------+---------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

  5. Wait for the expansion tasks to finish

[yashan@ob1 install]$ yasboot task list -c yashandb --search type=NodeAdd
uuid              | name                        | type    | index        | hostid   | status  | ret_code | progress | created_at          | cost
-------------------------------------------------------------------------------------------------------------------------------------------------
ecff3c2c4b452ce1  | AddDBAlterHA                | NodeAdd | yashandb     | -        | SUCCESS | 0        | 100      | 2024-03-19 16:04:36 | 1
------------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
8d8146ab5fff3423  | BuildDatabaseToMultiAddress | NodeAdd | yashandb.1-1 | host0001 | SUCCESS | 0        | 100      | 2024-03-19 16:04:36 | 760
------------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
4618495ddc9c012c  | NodeAdd                     | NodeAdd | yashandb     | -        | SUCCESS | 0        | 100      | 2024-03-19 16:04:36 | 10
------------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
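Because the background tasks can take several minutes (the build task above took 760), waiting for them can be scripted by polling the task list. The sketch below only inspects the printed table. It assumes one table row per line with the status in the sixth column, as shown above, and the function name `all_tasks_succeeded` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads `yasboot task list` output on stdin and succeeds
# (exit 0) only if at least one task row was seen and every row reports SUCCESS.
# Assumes status is field 6 when a row is split on "|".
all_tasks_succeeded() {
    awk -F'|' '
        NF >= 10 && $1 !~ /uuid/ {
            rows++
            status = $6
            gsub(/[ \t]/, "", status)
            if (status != "SUCCESS") bad++
        }
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }'
}
```

A polling loop could then look like `until yasboot task list -c yashandb --search type=NodeAdd | all_tasks_succeeded; do sleep 5; done`. This loop does not distinguish a failed task from a running one, so a real script would also need a timeout or a check for failure states.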

  6. Post-installation checks

Check the cluster status:

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | 14818 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003  | db        | 1-3:3  | 14944 | open            | normal          | standby       | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Check the primary/standby connection status:

SQL> select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;
DEST_ID CONNECTION        PEER_ADDR                                                        STATUS            DATABASE_MODE
------- ----------------- ---------------------------------------------------------------- ----------------- -----------------
      1 CONNECTED         192.168.7.11:1689                                                NORMAL            OPEN
      2 CONNECTED         192.168.7.12:1689                                                NORMAL            OPEN
2 rows fetched.

  7. Enable Raft automatic failover

[yashan@ob1 install]$ yasboot cluster config set -c yashandb -k HA_ELECTION_ENABLED -v true
type  | uuid             | name                 | hostid | index    | status  | return_code | progress | cost
--------------------------------------------------------------------------------------------------------------
task  | cc2a1364200f86e8 | YasdbConfigSetParent | -      | yashandb | SUCCESS | 0           | 100      | 1
------+------------------+----------------------+--------+----------+---------+-------------+----------+------
task completed, status: SUCCESS

Raft automatic failover, RTO < 8 s

1 Test steps

The test steps are the same as for arbiter-based failover and are not repeated here.

2 Test results

Cluster status before the test:

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | 14818 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003  | db        | 1-3:3  | 14944 | open            | normal          | standby       | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Cluster status after killing the primary (a standby has become the primary):

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail
hostid    | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
-------------------------------------------------------------------------------------------------------------------------------------------------
host0001  | db        | 1-1:1  | off   | -               | -               | -             | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002  | db        | 1-2:2  | 86135 | open            | normal          | primary       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003  | db        | 1-3:3  | 14944 | open            | normal          | standby       | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Timestamp at which the workload on the old primary was interrupted:

Current time is: 2024-03-19 16:31:45.309
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.309';
1 row affected.
Current time is: 2024-03-19 16:31:45.322
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.322';
YAS-00406 connection is closed

Time at which the new primary first executed the workload successfully:

Current time is: 2024-03-19 16:31:53.250
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.250';
YAS-06010 the database is not in readwrite mode
Current time is: 2024-03-19 16:31:53.257
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.257';
1 row affected.

3 Test summary

  • Heartbeat interval: 1 s
  • Detection timeout: 5 s
  • Redo flush rate at the time: 237 MB/s
  • Service interruption time: 7.935 s
  • Failover time: less than 3 s

官方教程来啦!上手体验YashanDB主备部署、同步延迟和自动切换能力

在上一篇深度干货 | 如何兼顾性能与可靠性?一文解析YashanDB主备高可用技术中,我们深入探讨了 YashanDB 高可用的架构设计原理和关键技术,本文将聚焦于实践操作,快速体验 YashanDB 的主备高可用能力。

概要

YashanDB 提供了不同部署形态下故障自动切换的能力:一主一备环境中,可以基于外部仲裁 OM 实现主备自动切换;一主多备配置中,可以基于 Raft 协议实现主备自动切换。当主机异常时,触发超时时间后,备机可以快速完成角色切换,继续执行业务,业务中断时间在秒级水平。

本文将进行一主一备安装部署、体验 YashanDB 的备机同步延迟和两种自动切换能力。整体操作简单易上手,大家可前往 YashanDB 官网下载中心下载最新的个人版进行体验。

安装前准备

1 前提条件

  • 获取 YashanDB 的安装包
  • 准备三台服务器(有条件的可以准备四台服务器,OM 部署到单独的服务器)
  • 开启 SSH 服务
  • 创建 yashan 用户及用户组
  • 创建 HOME 目录和 DATA 目录
  • 检查 YashanDB 所需端口是否被占用
  • 准备测试工具:benchmarksql-5.0
  • 时钟同步,确保测试结果的正确性

2 测试环境 服务器配置情况:

环境信息:

3 创建用户

# useradd -d /home/yashan -m yashan# passwd yashan

4 创建安装目录 HOME 目录和 DATA 目录均规划在 /data/yashan 下,yashan 用户需要对该目录拥有全部权限,可执行如下命令授权:# cd /

# mkdir yashan_data# mkdir yashan_home# chmod -R 770 /data/yashan/yashan_data# chmod -R 770 /data/yashan/yashan_hom

5 下载安装包并解压 从 YashanDB 的官网()下载最新的个人版安装包并解压。

安装一主一备

  1. 生成安装配置文件:hosts.toml 和 yashandb.toml

[yashan@ob1 install]$ yasboot package se gen --cluster yashandb -u yashan -p yashan --ip 192.168.7.10,192.168.7.11 --port 22 --install-path /data1/yashan/yasdb_home --data-path /data1/yashan/yasdb_data --begin-port 1688 --node 2

hostid | group | node_type | node_name | listen_addr | replication_addr | data_path

-------------------------------------------------------------------------------------------------------------

host0001 | dbg1 | db | 1-1 | 192.168.7.10:1688 | 192.168.7.10:1689 | /data1/yashan/yasdb_data

----------+-------+-----------+-----------+-------------------+-------------------+--------------------------

host0002 | dbg1 | db | 1-2 | 192.168.7.11:1688 | 192.168.7.11:1689 | /data1/yashan/yasdb_data

----------+-------+-----------+-----------+-------------------+-------------------+--------------------------

Generate config success

  1. 调整配置文件:根据实际需要调整 yashandb.toml 配置文件中的安装参数,可在 group 级别设置 YashanDB 的所有建库参数,可在 node 级别设置 YashanDB 的所有配置参数。为了保证本次测试的稳定,redo 文件、数据文件以及归档文件需要单独使用一块磁盘,需要调整文件的创建路径[group.config]

REDO_FILE_NUM = 10

REDO_FILE_SIZE = "10G"

REDO_FILE_PATH = '/data2/yashan/redo'

[group.node.config]

ARCHIVE_LOCAL_DEST = '/home/yashan/archive'

  1. 执行安装:安装 YashanDB 的运行程序到其他服务器,并且启动运维服务进程 yasom 和 yasagent

[yashan@ob1 install]$ yasboot package install -t hosts.toml -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz

checking install package...

install version: yashandb 23.1.1.100

host0001 100% [====================================================================] 3s

host0002 100% [====================================================================] 3s

update host to yasom...

  1. 部署集群

[yashan@ob1 install]$ yasboot cluster deploy -t yashandb.toml

type | uuid | name | hostid | index | status | return_code | progress | cost ------------------------------------------------------------------------------------------------------------

task | e3205df3e98645ed | DeployYasdbCluster | - | yashandb | SUCCESS | 0 | 100 | 174 ------+------------------+--------------------+--------+----------+---------+-------------+----------+------

task completed, status: SUCCESS

  1. 设置 sys 用户密码:设置为 yashandb_123

[yashan@ob1 install]$ yasboot cluster password set --new-password yashandb_123 --cluster yashandb

type | uuid | name | hostid | index | status | return_code | progress | cost ----------------------------------------------------------------------------------------------------------

task | 4e11fb328e1695ac | YasdbPasswordSet | - | yashandb | SUCCESS | 0 | 100 | 3 ------+------------------+------------------+--------+----------+---------+-------------+----------+------

task completed, status: SUCCESS

  1. 安装后检查 检查整个集群的状态:

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail

hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path -------------------------------------------------------------------------------------------------------------------------------------------------

host0001 | db | 1-1:1 | 69010 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1 ----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2 ----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

检查主备的链接状态:

SQL> select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;

DEST_ID CONNECTION PEER_ADDR STATUS DATABASE_MODE ------- ----------------- ---------------------------------------------------------------- ----------------- -----------------

1 CONNECTED 192.168.7.11:1689 NORMAL OPEN 1 row fetched.

检测主备的同步情况:做一些简单的业务测试

  1. 配置参数调优 根据服务器的负载生成推荐参数

SQL> EXEC DBMS_PARAM.OPTIMIZE(NULL, NULL, 90, 90);

PL/SQL Succeed.

‍查看参数推荐报告

SQL> SELECT DBMS_PARAM.SHOW_RECOMMEND() FROM DUAL;

DBMS_PARAM.SHOW_RECO ---------------------------------------------------------------- ********** Recommended Settings For HEAP Table ***********+--------------------------------+-------------+-------------+---------+| name | current | recommend | restart |+--------------------------------+-------------+-------------+---------+| DATA_BUFFER_SIZE | 64M | 272785M | True || VM_BUFFER_SIZE | 32M | 34823M | True || WORK_AREA_STACK_SIZE | 1024K | 2M | True || WORK_AREA_POOL_SIZE | 16M | 128M | True || WORK_AREA_HEAP_SIZE | 512K | 512K | True || SHARE_POOL_SIZE | 256M | 34823M | True || LARGE_POOL_SIZE | 128M | 2048M | True || MAX_PARALLEL_WORKERS | 32 | 372 | True || SCOL_DATA_BUFFER_SIZE | 128M | 128M | True || SCOL_DATA_PRELOADERS | 2 | 2 | True || COLUMNAR_WORK_AREA_HEAP_SIZE | 64M | 32M | True || COLUMNAR_VM_BUFFER_SIZE | 2G | 128M | True || COLUMNAR_BULK_SIZE | 1024 | 1024 | True || COMPRESSION | LZ4 | LZ4 | True || PQ_POOL_SIZE | 128M | 128M | True || MAX_SESSIONS | 1024 | 1024 | True || MAX_WORKERS | 0 | 0 | True || TAB_QUEUE_WINDOW_SIZE | 4 | 4 | True || BLOOM_FILTER_FACTOR | .3 | .3 | True || DEGREE_OF_PARALLEL | 1 | 1 | True || MMS_DATA_LOADERS | 4 | 8 | True || CHECKPOINT_INTERVAL | 100000 | 256M | False || CHECKPOINT_TIMEOUT | 300 | 60 | False || REDOFILE_IO_MODE | DSYNC | DSYNC | True || DATAFILE_IO_MODE | DEFAULT | DEFAULT | True || COMMIT_LOGGING | IMMEDIATE | IMMEDIATE | False || RECOVERY_PARALLELISM | 16 | 64 | True || REDO_BUFFER_SIZE | 64M | 64M | True |+--------------------------------+-------------+-------------+---------+| total memory | 346760M |+--------------------------------+-------------+-------------+---------+

Note: You can execute 'DBMS_PARAM.APPLY_RECOMMEND()' to apply the recommend parameters.

After applying the parameters, you need to restart the database.

1 row fetched.

将参数写入配置文件

SQL> EXEC DBMS_PARAM.APPLY_RECOMMEND();

PL/SQL Succeed.

配置参数是实例级别,需要每个节点都执行该操作。

  1. 开启自动切换:设置 FailoverThreshold 为 5,并且开启自动切换

[yashan@ob1 install]$ yasboot election config set -k FailoverThreshold -v 5 --cluster yashandbgroup 1 execute Succeed

[yashan@ob1 install]$ yasboot election enable on -c yashandbgroup 1 execute Succeed

[yashan@ob1 install]$ yasboot election config show --cluster yashandbgroup 1

Protection Mode: MAXIMUM PROTECTION

Members:

[1-1:1] - Primary database

[1-2:2] - Physical standby database

Transport Lag: 0 seconds

Apply Lag: 0 seconds

Apply Rate: 2.73 MByte/s

Properties:

FailoverThreshold = 5

FailoverAutoReinstate = false

ZeroDataLossMode = true

Automatic Failover: Enabled in Zero Data Loss Mode

测试备机同步延迟 8ms

1 测试方案

  • 主机创建一张表:create table ha_test (time_col timestamp),往该表插入一条数据。
  • 获取本地时间戳,用本地时间戳 update 该表的数据,并提交。持续执行该操作。
  • 在备机上查询该表的数据,通过执行查询该表的时间戳与查询到表中的数据的时间戳做差值,这个时间差就是主备同步的延迟。(表中只有一条数据,所以执行 update 和 select 操作的时间可以忽略不计)

2 测试步骤

  1. 首先准备 TPC-C 压力测试(如何使用 TPC-C 压力测试可以参考 YashanDB 的官网,有详细的介绍)。
  2. TPC-C 配置为 300 仓 128 并发,在该配置下可以达到百万级别 tpmC 的压力测试,在这种压力业务场景下执行测试验证。
  3. 分别在主机和备机上执行测试脚本(总共做 100 次测试)。
  4. 根据脚本统计的数据,计算主备业务的时间差。

测试脚本:

#!/bin/bash#主机执行update业务操作# 修改100次for ((i=1; i<=100; i++))do

# 获取当前时间并格式化为数据库可接受的格式

current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')

echo "Current time is: $current_time"

# 修改表ha_test的数据

yasql ha_test/123@192.168.7.10:1688 -c "UPDATE ha_test SET time_col='$current_time';"

sleep 0.1done

#!/bin/bash

# 备机执行查询操作while truedo

# 获取当前时间

current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')

echo "Current time is: $current_time"

#查询表ha_test的时间列数据

yasql ha_test/123@192.168.7.11:1688 -c "select time_col from ha_test;"done

3 测试结果

  • 测试时的 redo 刷盘速度(查询 V$REDOSTAT 获知):235MB/s
  • 备机查询延迟的平均值:8ms

从 100 次测试中选取 5 次数据如下:

‍ ‍

测试仲裁自动切换,RTO<8S

RTO 的计算方式:旧主机业务中断时间同新主机执行业务成功的时间差。 1 测试步骤

1.继续构造压力测试场景(使用 TPC-C 的压力测试),执行 10 分钟左右的压力业务。

2.检测主机业务的中断时间和新主机成功执行业务的时间。

3.分别在主机和备机上执行检测脚本。

4.kill 主机进程,使主机的业务中断。 测试脚本:

#!/bin/bash# 无限循环while truedo

# 获取当前时间并格式化为数据库可接受的格式

current_time=$(date +'%Y-%m-%d %H:%M:%S.%3N')

# 打印当前时间

echo "Current time is: $current_time"

# 执行写操作

yasql ha_test/123@192.168.7.10:1688 -c "UPDATE ha_test SET time_col='$current_time';"done

2 测试结果 执行测试前集群的状态:

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail

hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path -------------------------------------------------------------------------------------------------------------------------------------------------

host0001 | db | 1-1:1 | 69010 | open | normal | primary | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1 ----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

host0002 | db | 1-2:2 | 86135 | open | normal | standby | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2 ----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

kill 主机之后,集群的状态(备机已经变成了主机)

[yashan@ob1 sync_test]$ yasboot cluster status --cluster yashandb --detail

hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path -------------------------------------------------------------------------------------------------------------------------------------------------

host0001 | db | 1-1:1 | off | - | - | - | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1 ----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

host0002 | db | 1-2:2 | 86135 | open | normal | primary | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2 ----------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Timestamps at which service was interrupted on the old primary:

Current time is: 2024-03-19 15:45:38.464

SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:38.464';

1 row affected.

Current time is: 2024-03-19 15:45:38.476

SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:38.476';

YAS-00406 connection is closed

Timestamps at which the new primary began executing the statement successfully:

Current time is: 2024-03-19 15:45:46.204

SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:46.204';

YAS-06010 the database is not in readwrite mode

Current time is: 2024-03-19 15:45:46.211

SQL> UPDATE ha_test SET time_col='2024-03-19 15:45:46.211';

1 row affected.

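3 Test summary

  • Heartbeat interval: 1s
  • Failure-detection timeout: 5s
  • Current redo flush rate: 237 MB/s
  • Service interruption time: 7.735s (from the first failed UPDATE at 15:45:38.476 to the first successful UPDATE at 15:45:46.211)
  • Failover time: under 3s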
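
The relationship between these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the two probe timestamps recorded above (first failed UPDATE on the old primary, first successful UPDATE on the new primary) and, under the assumption that failure detection consumes roughly the full 5s timeout, estimates how much of the interruption is the failover itself. The constant and function names are hypothetical, not part of YashanDB.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Return the elapsed seconds between two probe timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


# Timestamps copied from the probe output of the arbitration test.
first_failure = "2024-03-19 15:45:38.476"  # YAS-00406 on the old primary
first_success = "2024-03-19 15:45:46.211"  # first "1 row affected" on the new primary

# Failure-detection timeout from the test summary.
DETECTION_TIMEOUT_S = 5.0

interruption = seconds_between(first_failure, first_success)

# Assumption: detection takes about one full timeout; the remainder is failover.
failover_estimate = interruption - DETECTION_TIMEOUT_S

print(f"service interruption: {interruption:.3f}s")
print(f"estimated failover:   {failover_estimate:.3f}s")
```

The real detection delay depends on when the last heartbeat arrived relative to the kill, so the failover estimate is an approximation rather than a measured value.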

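Deploying one primary with two standbys: adding a standby online

1. Restore the environment and disable arbitration-based automatic failover, which applies only to one-primary, one-standby configurations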

[yashan@ob1 yasdb_home]$ yasboot election enable off -c yashandb

group 1 execute Succeed

[yashan@ob1 yasdb_home]$ yasboot election config show --cluster yashandb

group 1

Protection Mode: MAXIMUM PROTECTION

Members:

[1-2:2] - Primary database

[1-1:1] - Physical standby database

Transport Lag: 0 seconds

Apply Lag: 0 seconds

Apply Rate: 391.00 MByte/s

Properties:

FailoverThreshold = 5

FailoverAutoReinstate = false

ZeroDataLossMode = true

Automatic Failover: DISABLED

[yashan@ob1 yasdb_home]$ yasboot cluster status --cluster yashandb --detail

hostid   | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db        | 1-1:1  | 14818 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

2. Generate the configuration files hosts_add.toml and yashandb_add.toml

[yashan@ob1 install]$ yasboot config node gen -c yashandb -u yashan -p yashan --ip 192.168.7.12 --port 22 --data-path /data1/yashan/yasdb_data --install-path /data1/yashan/yasdb_home -g 1 --node 1

hostid | group | node_type | node_name | listen_addr | replication_addr | data_path

-------------------------------------------------------------------------------------------------------------

host0003 | dbg1 | db | 1-3 | 192.168.7.12:1688 | 192.168.7.12:1689 | /data1/yashan/yasdb_data

----------+-------+-----------+-----------+-------------------+-------------------+--------------------------

Generate config success

3. Run the installation: install the YashanDB binaries on the new node's server and start the yasagent service process

[yashan@ob1 install]$ yasboot host add -c yashandb -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz -t hosts_add.toml

type | uuid             | name    | hostid | index    | status  | return_code | progress | cost
------------------------------------------------------------------------------------------------
task | 63112e698b5689a0 | HostAdd | -      | yashandb | SUCCESS | 0           | 100      | 8
-----+------------------+---------+--------+----------+---------+-------------+----------+------

task completed, status: SUCCESS

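4. Add the standby. A SUCCESS status on this task does not mean the expansion has finished, because background tasks are still completing work such as data synchronization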

[yashan@ob1 install]$ yasboot node add -c yashandb -t yashandb_add.toml

type | uuid             | name    | hostid | index    | status  | return_code | progress | cost
------------------------------------------------------------------------------------------------
task | 4618495ddc9c012c | NodeAdd | -      | yashandb | SUCCESS | 0           | 100      | 10
-----+------------------+---------+--------+----------+---------+-------------+----------+------

task completed, status: SUCCESS

5. Wait for the expansion tasks to complete

[yashan@ob1 install]$ yasboot task list -c yashandb --search type=NodeAdd

uuid             | name                        | type    | index        | hostid   | status  | ret_code | progress | created_at          | cost
------------------------------------------------------------------------------------------------------------------------------------------------
ecff3c2c4b452ce1 | AddDBAlterHA                | NodeAdd | yashandb     | -        | SUCCESS | 0        | 100      | 2024-03-19 16:04:36 | 1
-----------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
8d8146ab5fff3423 | BuildDatabaseToMultiAddress | NodeAdd | yashandb.1-1 | host0001 | SUCCESS | 0        | 100      | 2024-03-19 16:04:36 | 760
-----------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
4618495ddc9c012c | NodeAdd                     | NodeAdd | yashandb     | -        | SUCCESS | 0        | 100      | 2024-03-19 16:04:36 | 10
-----------------+-----------------------------+---------+--------------+----------+---------+----------+----------+---------------------+------
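
Because the expansion only counts as finished once every NodeAdd task (including background ones such as BuildDatabaseToMultiAddress) has succeeded, the check can be scripted. The following is a minimal sketch that parses the pipe-separated text shown above; it works on a captured copy of the output and does not call yasboot itself. The helper names are hypothetical, and treating `status == SUCCESS` with `progress == 100` as "done" is an assumption drawn from the sample output.

```python
def parse_task_rows(output: str) -> list[dict[str, str]]:
    """Parse pipe-separated `yasboot task list` text into one dict per task."""
    rows: list[dict[str, str]] = []
    header: list[str] | None = None
    for line in output.splitlines():
        if "|" not in line:
            continue  # skip blank lines and dash-only separator lines
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells  # the first pipe-separated line is the header
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def expansion_finished(rows: list[dict[str, str]]) -> bool:
    """True only if there is at least one task and all of them succeeded."""
    return bool(rows) and all(
        row["status"] == "SUCCESS" and row["progress"] == "100" for row in rows
    )


# Captured output from the walkthrough, with separator lines kept as-is.
SAMPLE = """\
uuid | name | type | index | hostid | status | ret_code | progress | created_at | cost
------------------------------------------------------------------------------------
ecff3c2c4b452ce1 | AddDBAlterHA | NodeAdd | yashandb | - | SUCCESS | 0 | 100 | 2024-03-19 16:04:36 | 1
8d8146ab5fff3423 | BuildDatabaseToMultiAddress | NodeAdd | yashandb.1-1 | host0001 | SUCCESS | 0 | 100 | 2024-03-19 16:04:36 | 760
4618495ddc9c012c | NodeAdd | NodeAdd | yashandb | - | SUCCESS | 0 | 100 | 2024-03-19 16:04:36 | 10
"""

tasks = parse_task_rows(SAMPLE)
print("expansion finished:", expansion_finished(tasks))
for task in tasks:
    print(f'{task["name"]:<28} {task["status"]:<8} cost={task["cost"]}')
```

Note that the build task accounts for almost all of the elapsed time (cost 760, against 10 for the NodeAdd task itself), which is why the parent task's SUCCESS alone is not a reliable completion signal.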

6. Post-installation check: verify the cluster status

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail

hostid   | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db        | 1-1:1  | 14818 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003 | db        | 1-3:3  | 14944 | open            | normal          | standby       | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Check the primary-standby connection status:

SQL> select DEST_ID, CONNECTION, PEER_ADDR, STATUS, DATABASE_MODE from v$archive_dest_status;

DEST_ID CONNECTION        PEER_ADDR          STATUS            DATABASE_MODE
------- ----------------- ------------------ ----------------- -----------------
      1 CONNECTED         192.168.7.11:1689  NORMAL            OPEN
      2 CONNECTED         192.168.7.12:1689  NORMAL            OPEN

2 rows fetched.

7. Enable Raft-based automatic failover

[yashan@ob1 install]$ yasboot cluster config set -c yashandb -k HA_ELECTION_ENABLED -v true

type | uuid             | name                 | hostid | index    | status  | return_code | progress | cost
-------------------------------------------------------------------------------------------------------------
task | cc2a1364200f86e8 | YasdbConfigSetParent | -      | yashandb | SUCCESS | 0           | 100      | 1
-----+------------------+----------------------+--------+----------+---------+-------------+----------+------

task completed, status: SUCCESS

A video tutorial for this test is available on the YashanDB video channel.

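Raft-based automatic failover, RTO < 8s

1 Test steps

The test steps are the same as for the arbitration-based failover test and are not repeated here.

2 Test results

Cluster status before the test: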

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail

hostid   | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db        | 1-1:1  | 14818 | open            | normal          | primary       | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db        | 1-2:2  | 86135 | open            | normal          | standby       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003 | db        | 1-3:3  | 14944 | open            | normal          | standby       | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------

Cluster status after killing the primary (a standby has been promoted to primary):

[yashan@ob1 install]$ yasboot cluster status --cluster yashandb --detail

hostid   | node_type | nodeid | pid   | instance_status | database_status | database_role | listen_address    | data_path
------------------------------------------------------------------------------------------------------------------------------------------------
host0001 | db        | 1-1:1  | off   | -               | -               | -             | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0002 | db        | 1-2:2  | 86135 | open            | normal          | primary       | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
host0003 | db        | 1-3:3  | 14944 | open            | normal          | standby       | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
---------+-----------+--------+-------+-----------------+-----------------+---------------+-------------------+---------------------------------
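
A quick sanity check after a failover is that the cluster reports exactly one primary among the surviving nodes. The sketch below extracts `nodeid` and `database_role` from captured `yasboot cluster status --detail` text and counts the roles. The column positions are taken from the output shown above; the function name is hypothetical, and the script does not invoke yasboot.

```python
from collections import Counter

# Column positions in the status output, taken from the header shown above.
NODEID_COL = 2
ROLE_COL = 6


def roles_by_node(status_output: str) -> dict[str, str]:
    """Map each nodeid to its reported database_role."""
    roles: dict[str, str] = {}
    for line in status_output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Separator lines contain no "|" and yield a single cell; skip them
        # together with the header row.
        if len(cells) <= ROLE_COL or cells[0] == "hostid":
            continue
        roles[cells[NODEID_COL]] = cells[ROLE_COL]
    return roles


# Status captured after the old primary (1-1:1) was killed.
AFTER_KILL = """\
hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path
---------------------------------------------------------------------------------------------------------------
host0001 | db | 1-1:1 | off | - | - | - | 192.168.7.10:1688 | /data1/yashan/yasdb_data/db-1-1
----------+-----------+--------+-------+-----------------+-----------------+---------------+
host0002 | db | 1-2:2 | 86135 | open | normal | primary | 192.168.7.11:1688 | /data1/yashan/yasdb_data/db-1-2
host0003 | db | 1-3:3 | 14944 | open | normal | standby | 192.168.7.12:1688 | /data1/yashan/yasdb_data/db-1-3
"""

roles = roles_by_node(AFTER_KILL)
counts = Counter(roles.values())
print(roles)
print("exactly one primary:", counts["primary"] == 1)
```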

Timestamps at which service was interrupted on the old primary:

Current time is: 2024-03-19 16:31:45.309

SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.309';

1 row affected.

Current time is: 2024-03-19 16:31:45.322

SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.322';

YAS-00406 connection is closed

Timestamps at which the new primary began executing the statement successfully:

Current time is: 2024-03-19 16:31:53.250

SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.250';

YAS-06010 the database is not in readwrite mode

Current time is: 2024-03-19 16:31:53.257

SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.257';

1 row affected.

3 Test summary

  • Heartbeat interval: 1s
  • Failure-detection timeout: 5s
  • Current redo flush rate: 237 MB/s
  • Service interruption time: 7.935s
  • Failover time: under 3s
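
Rather than reading the interruption window off the probe output by eye, it can be extracted from the log. The sketch below assumes the log format shown in this test (a `Current time is:` line, the statement, then either `1 row affected.` or a `YAS-` error) and measures from the first failed statement to the first later success. The function names are hypothetical, and the sample concatenates the old-primary and new-primary excerpts into one log.

```python
import re
from datetime import datetime

TIMESTAMP = re.compile(
    r"Current time is: (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})"
)


def probe_results(log: str) -> list[tuple[datetime, bool]]:
    """Return (timestamp, succeeded) for every probe statement in the log."""
    results: list[tuple[datetime, bool]] = []
    current = None
    for raw in log.splitlines():
        line = raw.strip()
        match = TIMESTAMP.search(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        elif current is not None and "row affected" in line:
            results.append((current, True))
        elif current is not None and line.startswith("YAS-"):
            results.append((current, False))
    return results


def interruption_seconds(results: list[tuple[datetime, bool]]) -> float:
    """Seconds from the first failed probe to the first success after it."""
    first_failure = next(ts for ts, ok in results if not ok)
    recovered = next(ts for ts, ok in results if ok and ts > first_failure)
    return (recovered - first_failure).total_seconds()


# Probe output from the Raft failover test.
RAFT_LOG = """\
Current time is: 2024-03-19 16:31:45.309
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.309';
1 row affected.
Current time is: 2024-03-19 16:31:45.322
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:45.322';
YAS-00406 connection is closed
Current time is: 2024-03-19 16:31:53.250
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.250';
YAS-06010 the database is not in readwrite mode
Current time is: 2024-03-19 16:31:53.257
SQL> UPDATE ha_test SET time_col='2024-03-19 16:31:53.257';
1 row affected.
"""

results = probe_results(RAFT_LOG)
window = interruption_seconds(results)
print(f"service interruption: {window:.3f}s")
```

Measuring from the first failure rather than the last success gives a slightly shorter window (by one probe interval), which matches how the 7.935s figure in the summary is obtained.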
