@dungan 2020-03-28T03:30:25.000000Z

Redis

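Redis Clustering

Redis supports two clustering approaches: Sentinel and Cluster.

Sentinel suits lighter Redis workloads, such as the cache of a single small project with modest memory use. It is usually deployed as one master with several slaves, and its configuration is relatively involved.

Cluster suits workloads with very large data sets. It is usually deployed with multiple masters, also supports high availability for each node, and is relatively simple to configure.

Master-slave configuration

A master-slave architecture is typically used to separate reads from writes, keep backup copies of the data, and limit the damage of a single point of failure. Nodes keep their data in sync through replication.

Configuring master-slave replication in Redis is very simple: bind each slave to its master with the slaveof directive in the configuration file: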

master(192.168.109.136:6379)

# =========== redis.conf ===============
port 6379
bind 0.0.0.0
requirepass tcl

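slave1(192.168.109.137:6379) and slave2(192.168.109.138:6379)

# ============ redis.conf  ================
port 6379
bind 0.0.0.0
# The master to replicate from
slaveof 192.168.109.136 6379
# Password used to authenticate with the master
masterauth tcl
# A slave may be promoted to master during a failure, so it gets a password too
requirepass tcl
# Whether the slave is read-only
slave-read-only yes
# Disable protected mode so that clients on other hosts can connect to this instance
protected-mode no

Once configured, restart the master and the slaves, then test replication: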

master

[root@centos etc]# redis-cli 
127.0.0.1:6379> auth tcl
OK
127.0.0.1:6379> set name tcl
OK
127.0.0.1:6379>

slave1

[root@centos redis]# redis-cli
127.0.0.1:6379> auth tcl
OK
127.0.0.1:6379> get name
"tcl"
127.0.0.1:6379>

slave2

[root@centos redis]# redis-cli
127.0.0.1:6379> auth tcl
OK
127.0.0.1:6379> get name
"tcl"
127.0.0.1:6379>

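The slaves successfully read the key set on the master, so master-slave replication is working.

The logs on the master and the slave show that the replication process matches the description of Redis replication in the previous chapter: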

master

86451:M 03 Jan 16:45:50.429 * Slave 192.168.109.137:6379 asks for synchronization
86451:M 03 Jan 16:45:50.429 * Full resync requested by slave 192.168.109.137:6379
86451:M 03 Jan 16:45:50.430 * Starting BGSAVE for SYNC with target: disk
86451:M 03 Jan 16:45:50.431 * Background saving started by pid 86973
86973:C 03 Jan 16:45:50.435 * DB saved on disk
86973:C 03 Jan 16:45:50.436 * RDB: 6 MB of memory used by copy-on-write
86451:M 03 Jan 16:45:50.477 * Background saving terminated with success
86451:M 03 Jan 16:45:50.477 * Synchronization with slave 192.168.109.137:6379 succeeded
86451:M 03 Jan 17:00:51.065 * 1 changes in 900 seconds. Saving...
86451:M 03 Jan 17:00:51.066 * Background saving started by pid 87465
87465:C 03 Jan 17:00:51.075 * DB saved on disk
87465:C 03 Jan 17:00:51.075 * RDB: 6 MB of memory used by copy-on-write
86451:M 03 Jan 17:00:51.167 * Background saving terminated with success

slave

21358:S 03 Jan 16:45:50.423 * Connecting to MASTER 192.168.109.136:6379
21358:S 03 Jan 16:45:50.424 * MASTER <-> SLAVE sync started
21358:S 03 Jan 16:45:50.424 * Non blocking connect for SYNC fired the event.
21358:S 03 Jan 16:45:50.425 * Master replied to PING, replication can continue...
21358:S 03 Jan 16:45:50.426 * Partial resynchronization not possible (no cached master)
21358:S 03 Jan 16:45:50.429 * Full resync from master: 389382f83540fce7c2fcaed9b70ee26183c06c74:0
21358:S 03 Jan 16:45:50.475 * MASTER <-> SLAVE sync: receiving 191 bytes from master
21358:S 03 Jan 16:45:50.475 * MASTER <-> SLAVE sync: Flushing old data
21358:S 03 Jan 16:45:50.476 * MASTER <-> SLAVE sync: Loading DB in memory
21358:S 03 Jan 16:45:50.476 * MASTER <-> SLAVE sync: Finished with success
21358:S 03 Jan 17:00:05.588 * 1 changes in 900 seconds. Saving...
21358:S 03 Jan 17:00:05.590 * Background saving started by pid 22304
22304:C 03 Jan 17:00:05.603 * DB saved on disk
22304:C 03 Jan 17:00:05.604 * RDB: 6 MB of memory used by copy-on-write
21358:S 03 Jan 17:00:05.691 * Background saving terminated with success

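If SYNC: Connection timed out appears in the log, the firewall on the master's host is probably blocking the connection; open port 6379 there, or turn the firewall off in a test environment.

Sentinel

What is Sentinel?

Redis Sentinel is the official high-availability (HA) solution for Redis. It monitors the servers of a master-slave deployment and recovers from failures. Sentinel is a 1 Master, N Slave architecture that scales vertically (by adding memory).

Sentinels monitor not only the master and the slaves but also each other.

How Sentinel works

A Sentinel failover goes through three stages: deciding whether the master is down, electing the Sentinel that will handle the failover, and switching the Master.

`Deciding whether the Master is down`

Once per second, each Sentinel sends a PING command to every Redis instance and every other Sentinel it knows about. If an instance returns no valid reply for longer than down-after-milliseconds in the configuration file, that Sentinel considers the instance to be in the sdown (subjectively down) state.

The other Sentinels then check, also once per second, whether the Master really is subjectively down. When enough Sentinels (at least the number specified in the configuration file) confirm this within the given time window, the Master is marked odown (objectively down).

Normally each Sentinel sends an INFO command to every Redis instance it knows about once every 10 seconds. When a Master is marked objectively down, the Sentinel sends INFO to all Slaves of the ODOWN Master once per second instead.

If not enough Sentinels agree that the Master is down, its objectively down status is removed. If the Master returns a valid reply to a Sentinel's PING again, its subjectively down status is removed.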
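
The two-step decision above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the SentinelView class, its field names, and the sample timings are hypothetical, and real Sentinels exchange this information over the network rather than through a shared list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SentinelView:
    """What one Sentinel has observed about the master (hypothetical model)."""
    name: str
    ms_since_last_valid_reply: int


def is_sdown(view: SentinelView, down_after_ms: int) -> bool:
    # Subjectively down: this Sentinel alone has seen no valid reply
    # for longer than down-after-milliseconds.
    return view.ms_since_last_valid_reply > down_after_ms


def is_odown(views: List[SentinelView], down_after_ms: int, quorum: int) -> bool:
    # Objectively down: at least `quorum` Sentinels consider the master SDOWN.
    sdown_count = sum(is_sdown(view, down_after_ms) for view in views)
    return sdown_count >= quorum


views = [
    SentinelView("s1", 7000),  # no valid reply for 7 s
    SentinelView("s2", 6500),  # no valid reply for 6.5 s
    SentinelView("s3", 300),   # still receiving replies
]
print(is_odown(views, down_after_ms=5000, quorum=2))  # True
```

With a quorum of 3 the same observations would leave the master at SDOWN only, because s3 still receives replies.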

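Subjectively down and objectively down

Subjectively down (SDOWN): the down judgment that the current Sentinel makes about a Redis instance on its own.
Objectively down (ODOWN): enough Sentinels have judged the Master to be SDOWN.

Agreement is needed only when a Master is judged down, so a slave is never marked odown.

`Electing the Sentinel that handles the failover`

A Sentinel must first be elected to lead the switch. It needs at least N/2+1 votes (integer division), where N is the total number of Sentinels; that is, more than half of the Sentinels must agree that a given Sentinel leads the failover. The Sentinels exchange their leader votes through the SENTINEL is-master-down-by-addr command.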
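
As a rough illustration of the majority rule, the sketch below tallies votes and returns a leader only when some Sentinel reaches N/2+1. The function name elect_leader and the shape of the votes mapping (voter name to chosen leader) are assumptions made for this example; it models the majority rule only and none of the real protocol's epochs or timeouts.

```python
from collections import Counter
from typing import Dict, Optional


def elect_leader(votes: Dict[str, str], total_sentinels: int) -> Optional[str]:
    """Return the Sentinel that received at least N/2+1 votes, or None."""
    needed = total_sentinels // 2 + 1
    tally = Counter(votes.values())
    if not tally:
        return None
    candidate, count = tally.most_common(1)[0]
    return candidate if count >= needed else None


# Three Sentinels, two of them vote for s2: 2 >= 3 // 2 + 1, so s2 leads.
print(elect_leader({"s1": "s2", "s2": "s2", "s3": "s3"}, 3))  # s2
# A three-way split produces no leader, and the election has to be retried.
print(elect_leader({"s1": "s1", "s2": "s2", "s3": "s3"}, 3))  # None
```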

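In general, any Sentinel that finds the Master in ODOWN may be elected by the others and then performs the automatic failover of the downed master.

`Switching the Master`

The leading Sentinel sends SLAVEOF NO ONE to the chosen Slave to turn it into the Master. The updated configuration is then propagated to all other Sentinels through Pub/Sub, and each of them rewrites its own configuration file. Finally, the Sentinel sends SLAVEOF to the remaining slaves of the ODOWN Master so that they replicate from the new Master.

Configuring Sentinel

Sentinels form a group of nodes that monitors the Redis servers. If resources allow, deploy several Sentinels on the application servers where the clients run rather than next to the slaves. Otherwise the loss of one machine takes down a Sentinel and a Slave together, and the election may no longer reach a majority.

Because the Sentinel that handles a failover is elected with at least N/2+1 votes (N is the total number of Sentinels), deploying an odd number of Sentinels is recommended.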
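
A quick calculation shows why an odd count is the usual advice. The helper names below are hypothetical; the arithmetic is just the N/2+1 rule from the text.

```python
def majority(total_sentinels: int) -> int:
    # Votes needed to elect a leader: N/2+1 with integer division.
    return total_sentinels // 2 + 1


def tolerated_failures(total_sentinels: int) -> int:
    # How many Sentinels can be lost while a majority is still reachable.
    return total_sentinels - majority(total_sentinels)


for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
```

Going from 3 to 4 Sentinels raises the required majority without tolerating any additional failure, so the fourth node adds cost but no resilience.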

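After a successful failover, the contents of the configuration files master_redis.conf, slave_redis.conf and sentinel.conf all change, and the monitored target (master) in sentinel.conf is switched accordingly.

We deploy a Sentinel node on the host of each Redis instance:

192.168.109.136:27000

192.168.109.137:27000

192.168.109.138:27000

Each Sentinel uses the following sentinel.conf: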

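#================== Sentinel settings ======================
# Run Sentinel as a daemon
daemonize yes
# Sentinel port
port 27000
# Sentinel log file
logfile "/usr/local/redis/var/sentinel.log"
# Disable protected mode
protected-mode no
# ============== Master monitoring settings ===============
# Monitor a Master named mymaster at 192.168.109.136, port 6379.
# The trailing 2 is the quorum: how many Sentinels must agree that the Master is down before a failover starts.
sentinel monitor mymaster 192.168.109.136 6379 2
# Milliseconds without a valid reply after which this Sentinel considers the master down
sentinel down-after-milliseconds mymaster 5000
# Failover timeout in milliseconds (1 minute here); a failover that has not finished by then is treated as failed
sentinel failover-timeout mymaster 60000
# Maximum number of Slaves that may resync with the new Master at the same time during a failover; the smaller the number, the longer the failover takes.
# With a value of 1 the failover takes longer, but only one Slave at a time is unable to serve requests
sentinel parallel-syncs mymaster 2
# If the master requires a password, set it in every Sentinel's configuration file
sentinel auth-pass mymaster tcl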

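Starting Sentinel

After configuring Redis and Sentinel, start them in the order Master -> Slave -> Sentinel.

Sentinel is started much like Redis, by passing it its configuration file:

redis-sentinel /usr/local/redis/etc/sentinel.conf

Once it is running, check the Sentinel's status: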

[root@centos redis]# redis-cli -p 27000 -a tcl
Warning: Using a password with '-a' option on the command line interface may not be safe.
127.0.0.1:27000> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.109.136:6379,slaves=2,sentinels=3
127.0.0.1:27000> 
127.0.0.1:27000>

The output shows that master 192.168.109.136:6379 has two slaves and is monitored by 3 Sentinels.
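
When this check is scripted, the master0 line can be split into fields. The sketch below works on the text of the line exactly as shown above; the function name parse_master_line is hypothetical and no Redis client library is involved.

```python
from typing import Dict


def parse_master_line(line: str) -> Dict[str, str]:
    # "master0:name=...,status=..." -> drop the "master0" label, keep the fields.
    # partition() splits on the first colon only, so the colon inside
    # "address=host:port" is left untouched.
    _, _, fields = line.partition(":")
    return dict(item.split("=", 1) for item in fields.split(","))


line = "master0:name=mymaster,status=ok,address=192.168.109.136:6379,slaves=2,sentinels=3"
info = parse_master_line(line)
print(info["address"], info["slaves"], info["sentinels"])
# 192.168.109.136:6379 2 3
```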

The log file sentinel.log contains more detail:

13577:X 04 Jan 16:36:24.701 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
13577:X 04 Jan 16:36:24.701 # Redis version=4.0.11, bits=64, commit=00000000, modified=0, pid=13577, just started
13577:X 04 Jan 16:36:24.701 # Configuration loaded
13578:X 04 Jan 16:36:24.706 * Running mode=sentinel, port=27000.
13578:X 04 Jan 16:36:24.718 # Sentinel ID is 3f50dd6cf41ae4ddfa3f6c72a6877b05859c105f
13578:X 04 Jan 16:36:24.718 # +monitor master mymaster 192.168.109.136 6379 quorum 2
13578:X 04 Jan 16:36:24.720 * +slave slave 192.168.109.137:6379 192.168.109.137 6379 @ mymaster 192.168.109.136 6379
13578:X 04 Jan 16:36:24.724 * +slave slave 192.168.109.138:6379 192.168.109.138 6379 @ mymaster 192.168.109.136 6379
13578:X 04 Jan 16:36:36.957 * +sentinel sentinel 2d399c57ab4035377a08a8d0737b4f06b9d080a1 192.168.109.137 27000 @ mymaster 192.168.109.136 6379
13578:X 04 Jan 16:36:41.614 * +sentinel sentinel 4c00bb2bb0bd9613a7f0345eec3dd621425d6be8 192.168.109.138 27000 @ mymaster 192.168.109.136 6379

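+slave and +sentinel mean that a slave and another Sentinel, respectively, were discovered. See the redis-sentinel documentation for the meaning of the log fields.

You will also find that some entries were generated automatically in sentinel.conf: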

daemonize yes
port 27000
logfile "/usr/local/redis/var/sentinel.log"
sentinel myid 3f50dd6cf41ae4ddfa3f6c72a6877b05859c105f
protected-mode no
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 192.168.109.136 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# Generated by CONFIG REWRITE
dir "/usr/local/redis"
sentinel auth-pass mymaster tcl
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-slave mymaster 192.168.109.137 6379
sentinel known-slave mymaster 192.168.109.138 6379
sentinel known-sentinel mymaster 192.168.109.138 27000 4c00bb2bb0bd9613a7f0345eec3dd621425d6be8
sentinel known-sentinel mymaster 192.168.109.137 27000 2d399c57ab4035377a08a8d0737b4f06b9d080a1
sentinel current-epoch 0

After it starts, a Sentinel records the slaves of the master it monitors, and the other Sentinels monitoring that master, in its own configuration file.
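
Because the topology ends up in sentinel.conf, it can be read back with a small script. This is a minimal sketch that assumes the file format shown above (monitor, known-slave and known-sentinel lines as written by this Redis version); the topology function and the shortened SAMPLE text are illustrative only.

```python
from typing import Dict, List

# A shortened copy of the generated lines shown above.
SAMPLE = """\
sentinel monitor mymaster 192.168.109.136 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel known-slave mymaster 192.168.109.137 6379
sentinel known-slave mymaster 192.168.109.138 6379
sentinel known-sentinel mymaster 192.168.109.138 27000 4c00bb2bb0bd9613a7f0345eec3dd621425d6be8
"""


def topology(conf_text: str) -> Dict[str, List[str]]:
    """Collect master, slave and Sentinel addresses from sentinel.conf text."""
    keys = {"monitor": "master", "known-slave": "slaves", "known-sentinel": "sentinels"}
    result: Dict[str, List[str]] = {"master": [], "slaves": [], "sentinels": []}
    for raw in conf_text.splitlines():
        parts = raw.split()
        # Address lines look like: sentinel <directive> <name> <host> <port> ...
        if len(parts) < 5 or parts[0] != "sentinel" or parts[1] not in keys:
            continue
        result[keys[parts[1]]].append(f"{parts[3]}:{parts[4]}")
    return result


print(topology(SAMPLE))
```

Running the same function on the file before and after a failover is one way to see which address moved from slaves to master.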

Testing Sentinel

Stop the master:

systemctl stop redis-server.service

Then check the Sentinel's status:

[root@centos redis]# redis-cli -p 27000 -a tcl
Warning: Using a password with '-a' option on the command line interface may not be safe.
127.0.0.1:27000> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=192.168.109.138:6379,slaves=2,sentinels=3
127.0.0.1:27000>

192.168.109.138:6379 (slave2) has been promoted to master. Checking the role of that Redis instance confirms that it is now a master:

127.0.0.1:6379> role
1) "master"
2) (integer) 66137
3) 1) 1) "192.168.109.137"
      2) "6379"
      3) "66137"

From how Sentinel works, we know that sentinel.conf is updated on every failover. Let's check:

# The master has indeed changed from 192.168.109.136 (the original master) to 192.168.109.138 (the original slave2)
sentinel monitor mymaster 192.168.109.138 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
# Generated by CONFIG REWRITE
dir "/usr/local/redis"
sentinel auth-pass mymaster tcl
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
# 192.168.109.136 (the original master) is now a slave of 192.168.109.138 (the original slave2)
sentinel known-slave mymaster 192.168.109.136 6379
sentinel known-slave mymaster 192.168.109.137 6379
sentinel known-sentinel mymaster 192.168.109.138 27000 4c00bb2bb0bd9613a7f0345eec3dd621425d6be8
sentinel known-sentinel mymaster 192.168.109.137 27000 2d399c57ab4035377a08a8d0737b4f06b9d080a1
sentinel current-epoch 1

Now check whether slaveof has changed in the configuration of the 192.168.109.137 and 192.168.109.138 instances:

# 192.168.109.137 is now a slave of 192.168.109.138
192.168.109.137:6379> CONFIG GET slaveof
1) "slaveof"
2) "192.168.109.138 6379"
# 192.168.109.138 has been promoted to master, so its slaveof is empty
192.168.109.138:6379> CONFIG GET slaveof
1) "slaveof"
2) ""

Finally, test master-slave replication after the Sentinel failover:

master(192.168.109.138:6379)

127.0.0.1:6379> set b bbb
OK
127.0.0.1:6379>

slave1(192.168.109.137:6379)

127.0.0.1:6379> get b
"bbb"

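slave1 successfully read the key set on the master, so Sentinel is running correctly.

Cluster

What is Cluster?

Redis Cluster is an N Master, N Slave architecture. It scales horizontally and dynamically (by adding nodes), and every node can be deployed with high availability (no Sentinel monitoring needed).

It is a server-side sharding technique: data is split across nodes, which provides a degree of availability, so that in practice the cluster keeps processing commands when some node goes down or becomes unreachable.

How Cluster works

Redis Cluster does not use consistent hashing. Instead it introduces hash slots: a Redis cluster has 16384 hash slots, the slot of each key is its CRC16 checksum modulo 16384, and each node in the cluster is responsible for a subset of the hash slots.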
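
The key-to-slot mapping can be reproduced in a few lines. The sketch below implements the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0), which is the variant Redis Cluster uses, and takes the result modulo 16384. It deliberately ignores hash tags (the {...} syntax), so it only matches the server for keys without braces; the function names are chosen for this example.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0, processed MSB first."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    # Slot = CRC16(key) mod 16384; hash tags are not handled in this sketch.
    return crc16_xmodem(key.encode("utf-8")) % 16384


print(hex(crc16_xmodem(b"123456789")))  # 0x31c3
print(key_slot("foo"))                  # 12182
```

On a running cluster the result can be compared with the output of CLUSTER KEYSLOT for the same key.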

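This layout makes it easy to add or remove nodes. To add a node D, the cluster only has to move some slots from nodes A, B and C to D. To remove node A, move A's slots to B and C, then remove the now empty node A from the cluster.

Moving hash slots from one node to another does not stop the service, so adding a node, removing a node, or changing the number of hash slots a node holds never makes the cluster unavailable.

To stay available when some nodes fail or cannot communicate with the majority of nodes, the cluster uses a master-slave replication model:
each hash slot has N replicas, one of which is the master and the remaining N-1 are slaves.

Suppose the cluster has three nodes: A (hash slots 0-5460), B (hash slots 5461-10922), and C (hash slots 10923-16383).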
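
The three ranges come from splitting 16384 slots as evenly as possible. The sketch below is one way to compute such a split; it reproduces the ranges in this example, but it is an illustration written for this article, not a copy of the tool's own allocation code.

```python
from typing import List, Tuple

TOTAL_SLOTS = 16384


def split_slots(masters: int) -> List[Tuple[int, int]]:
    """Split slots 0..16383 into `masters` contiguous, near-equal ranges."""
    per_node = TOTAL_SLOTS / masters
    ranges = []
    first = 0
    cursor = 0.0
    for i in range(masters):
        last = round(cursor + per_node - 1)
        # The final master always takes whatever is left.
        if i == masters - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_node
    return ranges


print(split_slots(3))  # [(0, 5460), (5461, 10922), (10923, 16383)]
```

Adding a fourth master D then amounts to recomputing the split and moving the slots that no longer belong to A, B and C.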

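If a slave B1 was added to master B when the cluster was created, then when B goes offline the cluster promotes B1 to be the new master in its place, and B1 keeps serving hash slots 5461 to 10922, so the cluster keeps working despite the loss of B. If both B and B1 go offline, however, the Redis cluster still stops working.

Configuring Cluster

Nodes must be clean when a cluster is built, so first delete the data files dump.rdb and appendonly.aof.

A working cluster needs at least 3 master nodes. Here we create 6 Redis nodes: three masters and three slaves.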

ip	port	role
127.0.0.1	7000	master
127.0.0.1	7001	master
127.0.0.1	7002	master
127.0.0.1	7003	slave
127.0.0.1	7004	slave
127.0.0.1	7005	slave

Then, in the Redis configuration directory, create 6 directories named after the ports to hold each node's configuration file:

[root@centos ~]# cd /usr/local/redis/etc/
[root@centos etc]# mkdir 7000 7001 7002 7003 7004 7005
[root@centos etc]# ll
total 64
drwxr-xr-x 2 root  root      6 Jan  8 15:55 7000
drwxr-xr-x 2 root  root      6 Jan  8 15:55 7001
drwxr-xr-x 2 root  root      6 Jan  8 15:55 7002
drwxr-xr-x 2 root  root      6 Jan  8 15:55 7003
drwxr-xr-x 2 root  root      6 Jan  8 15:55 7004
drwxr-xr-x 2 root  root      6 Jan  8 15:55 7005
-rw-r--r-- 1 redis redis 58864 Jan  3 16:37 redis.conf
-rw-r--r-- 1 root  root    784 Jan  7 14:19 sentinel.conf

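Next, copy redis.conf into each directory and change the following settings:

# The port must match the directory name
port 7000
# A slaveof left over from master-slave mode conflicts with cluster mode; if you set it earlier, comment it out or remove it
# slaveof 127.0.0.1 6379
# Run as a daemon
daemonize yes
# Enable cluster mode for this instance
cluster-enabled yes
# File in which the node records its state and cluster information; created by the node itself, and its name must also match the directory
cluster-config-file nodes_7000.conf
# Timeout in milliseconds after which a node is considered unreachable
cluster-node-timeout 5000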
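
Since the six files differ only in the port, the per-node settings can be generated instead of edited by hand. The sketch below only renders and prints the overrides listed above; the function name render_node_conf is hypothetical, and writing the text into each etc/<port>/redis.conf (or merging it with the full redis.conf) is left out on purpose.

```python
def render_node_conf(port: int) -> str:
    """Return the cluster-related settings for the node listening on `port`."""
    lines = [
        f"port {port}",
        "daemonize yes",
        "cluster-enabled yes",
        # The state file name follows the port, like the directory name.
        f"cluster-config-file nodes_{port}.conf",
        "cluster-node-timeout 5000",
    ]
    return "\n".join(lines) + "\n"


for port in range(7000, 7006):
    print(f"# etc/{port}/redis.conf")
    print(render_node_conf(port))
```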

Once the configuration files are in place, start the nodes:

[root@centos redis]# redis-server etc/7000/redis.conf 
[root@centos redis]# redis-server etc/7001/redis.conf 
[root@centos redis]# redis-server etc/7002/redis.conf 
[root@centos redis]# redis-server etc/7003/redis.conf 
[root@centos redis]# redis-server etc/7004/redis.conf 
[root@centos redis]# redis-server etc/7005/redis.conf 
[root@centos redis]# 
[root@centos redis]# ps aux | grep redis
redis      1089  0.1  0.4 145308  4852 ?        Ssl  09:23   0:02 /usr/local/redis/bin/redis-server 0.0.0.0:6379
root      11874  0.1  0.2 153872  2868 ?        Ssl  09:55   0:00 redis-server 0.0.0.0:7000 [cluster]
root      11901  0.0  0.2 153872  2868 ?        Ssl  09:55   0:00 redis-server 0.0.0.0:7001 [cluster]
root      11921  0.0  0.2 153872  2868 ?        Ssl  09:55   0:00 redis-server 0.0.0.0:7002 [cluster]
root      11941  0.0  0.2 153872  2872 ?        Ssl  09:56   0:00 redis-server 0.0.0.0:7003 [cluster]
root      11960  0.0  0.2 153872  2880 ?        Ssl  09:56   0:00 redis-server 0.0.0.0:7004 [cluster]
root      11978  0.1  0.2 153872  2876 ?        Ssl  09:56   0:00 redis-server 0.0.0.0:7005 [cluster]

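Creating the cluster

So far the nodes are merely running; they do not yet form a cluster and still have to be joined together. Since Redis 5.0 the cluster can be managed directly with redis-cli --cluster.

# --cluster-replicas 1 creates one slave for every master; the first three nodes become masters and the last three become slaves.
# Note: each of the last three nodes may end up as the slave of any master.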
[root@centos etc]# redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 --cluster-replicas 1 
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 127.0.0.1:7003 to 127.0.0.1:7000
Adding replica 127.0.0.1:7004 to 127.0.0.1:7001
Adding replica 127.0.0.1:7005 to 127.0.0.1:7002
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: 79edcb961e111b73f2a2cccde9da209aa536a9ed 127.0.0.1:7000
   slots:[0-5460] (5461 slots) master
M: 98ccf192cce67a817a81b3ed3f830c9c9368200d 127.0.0.1:7001
   slots:[5461-10922] (5462 slots) master
M: 9124e5d1dae6808c45761a832f3d9c6de9b741d3 127.0.0.1:7002
   slots:[10923-16383] (5461 slots) master
S: 538ad78ba7bb5d3e9bd84a692a6362eac737c520 127.0.0.1:7003
   replicates 9124e5d1dae6808c45761a832f3d9c6de9b741d3
S: de488ecbe7e5955d8688513029d5ee4ce47a50f1 127.0.0.1:7004
   replicates 79edcb961e111b73f2a2cccde9da209aa536a9ed
S: 2fdeb00f98851f2e9ffdc2fb0ae2891b2ec4e928 127.0.0.1:7005
   replicates 98ccf192cce67a817a81b3ed3f830c9c9368200d
Can I set the above configuration? (type 'yes' to accept):

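Redis prints the proposed cluster configuration. M marks a master, the long string after it is the node ID, which is unique within a Redis cluster, and slots lists the hash slots assigned to the node. Typing yes applies this configuration to the cluster.

With the cluster created, test reads and writes across the nodes: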

[root@centos etc]# redis-cli -p 7000 -c
127.0.0.1:7000> set client 7000
OK
127.0.0.1:7000> exit
[root@centos etc]# 
[root@centos etc]#
[root@centos etc]# redis-cli -p 7001 -c
127.0.0.1:7001> get client
-> Redirected to slot [3847] located at 127.0.0.1:7000
"7000"
127.0.0.1:7000> exit
[root@centos etc]#
[root@centos etc]#
[root@centos etc]# redis-cli -p 7003 -c
127.0.0.1:7003> get client
-> Redirected to slot [3847] located at 127.0.0.1:7000
"7000"
127.0.0.1:7000>

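Nodes 7001 and 7003 both returned the key set on 7000 (redis-cli -c followed the redirect to 7000, which owns the key's slot), so the cluster is running correctly.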
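
The "Redirected to slot" lines come from the -c flag. Without it, a node that does not own the key's slot answers with a MOVED error that names the slot and its owner, and the client is expected to retry there. A minimal sketch of parsing that reply, with a hypothetical Moved type and parse_moved helper:

```python
from typing import NamedTuple


class Moved(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error: str) -> Moved:
    """Parse an error of the form 'MOVED <slot> <host>:<port>'."""
    kind, slot, addr = error.split()
    if kind != "MOVED":
        raise ValueError(f"not a MOVED redirect: {error!r}")
    host, _, port = addr.rpartition(":")
    return Moved(int(slot), host, int(port))


# Slot and address taken from the redirect shown in the session above.
print(parse_moved("MOVED 3847 127.0.0.1:7000"))
# Moved(slot=3847, host='127.0.0.1', port=7000)
```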

Next, test failover: kill the 7000 instance and look at the state of the whole cluster:

[root@centos etc]# ps axu | grep 7000
root      13598  0.1  0.3 160528  3484 ?        Ssl  10:12   0:02 redis-server 0.0.0.0:7000 [cluster]
root      16470  0.0  0.0 112704   952 pts/0    R+   10:52   0:00 grep --color 7000
[root@centos etc]# kill 13598
[root@centos etc]# redis-cli --cluster info 127.0.0.1:7002
Could not connect to Redis at 127.0.0.1:7000: Connection refused
127.0.0.1:7002 (9124e5d1...) -> 0 keys | 5461 slots | 1 slaves.
127.0.0.1:7004 (de488ecb...) -> 1 keys | 5461 slots | 0 slaves.
127.0.0.1:7001 (98ccf192...) -> 0 keys | 5462 slots | 1 slaves.

The status output suggests that 7004 has been promoted to master. Let's verify:

# 7004 has indeed been promoted to master
[root@centos etc]# redis-cli -p 7004 -c
127.0.0.1:7004> role
1) "master"
2) (integer) 2228
3) (empty list or set)
# Start 7000 again; it has become a slave
[root@centos etc]# redis-server 7000/redis.conf 
[root@centos etc]# redis-cli -p 7000 -c
127.0.0.1:7000> role
1) "slave"
2) "127.0.0.1"
3) (integer) 7004
4) "connected"
5) (integer) 2242
127.0.0.1:7000>

Cluster management

redis-cli --cluster is the cluster management tool that ships with Redis.

[root@centos etc]# redis-cli --cluster help
Cluster Manager Commands:
  create         host1:port1 ... hostN:portN  # create a cluster
                 --cluster-replicas <arg>
  check          host:port      # show slot assignment
  info           host:port      # show cluster status
  fix            host:port      # make sure every slot is covered by a node
  reshard        host:port      # slot management, e.g. move some slots to another node
                 --cluster-from <arg>
                 --cluster-to <arg>
                 --cluster-slots <arg>
                 --cluster-yes
                 --cluster-timeout <arg>
                 --cluster-pipeline <arg>
  rebalance      host:port   # rebalance slots across nodes
                 --cluster-weight <node1=w1...nodeN=wN>
                 --cluster-use-empty-masters
                 --cluster-timeout <arg>
                 --cluster-simulate
                 --cluster-pipeline <arg>
                 --cluster-threshold <arg>
  add-node       new_host:new_port existing_host:existing_port # add a master/slave node
                 --cluster-slave
                 --cluster-master-id <arg>
  del-node       host:port node_id  # remove a node
  call           host:port command arg arg .. arg
  set-timeout    host:port milliseconds
  import         host:port
                 --cluster-from <arg>
                 --cluster-copy
                 --cluster-replace
  help           
For check, fix, reshard, del-node, set-timeout you can specify the host and port of any working node in the cluster.

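Cluster management is also possible in interactive mode:

redis-cli -c -h host -p port : connect to a cluster node
# nodes
cluster info : print information about the cluster
cluster nodes : list all nodes currently known to the cluster, with details about each
cluster meet <ip> <port> : add the node at ip and port to the cluster
cluster forget <node_id> : remove the node identified by node_id from the cluster
cluster replicate <node_id> : make the current node a slave of the node identified by node_id
cluster saveconfig : save the node's cluster configuration file to disk
# slot
cluster addslots <slot> [slot ...] : assign one or more slots to the current node
cluster delslots <slot> [slot ...] : remove the assignment of one or more slots from the current node
cluster flushslots : remove all slots assigned to the current node, leaving it with no slots
cluster setslot <slot> node <node_id> : assign slot to the node identified by node_id; if the slot is already assigned to another node, that node drops it first
cluster setslot <slot> migrating <node_id> : migrate slot from this node to the node identified by node_id
cluster setslot <slot> importing <node_id> : import slot into this node from the node identified by node_id
cluster setslot <slot> stable : cancel the import or migration of slot
# key
cluster keyslot <key> : compute the slot in which key should be placed
cluster countkeysinslot <slot> : return the number of key-value pairs currently in slot
cluster getkeysinslot <slot> <count> : return count keys from slot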

Examples

Removing a node

# For example, to remove node 7000, first get its node ID
[root@centos etc]# redis-cli -p 7000 cluster nodes
...
79edcb961e111b73f2a2cccde9da209aa536a9ed 127.0.0.1:7000@17000 myself,slave de488ecbe7e5955d8688513029d5ee4ce47a50f1 0 1547016781000 1 connected
...
# With the ID of node 7000 in hand, remove it
[root@centos etc]# redis-cli --cluster del-node 127.0.0.1:7000  79edcb961e111b73f2a2cccde9da209aa536a9ed
>>> Removing node 79edcb961e111b73f2a2cccde9da209aa536a9ed from cluster 127.0.0.1:7000
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
# List the cluster nodes: node 7000 is indeed gone
[root@centos redis]# redis-cli -p 7004 -c
127.0.0.1:7004> CLUSTER NODES
2fdeb00f98851f2e9ffdc2fb0ae2891b2ec4e928 127.0.0.1:7005@17005 slave 98ccf192cce67a817a81b3ed3f830c9c9368200d 0 1547022314361 8 connected
de488ecbe7e5955d8688513029d5ee4ce47a50f1 127.0.0.1:7004@17004 myself,master - 0 1547022312000 7 connected 10-5460
538ad78ba7bb5d3e9bd84a692a6362eac737c520 127.0.0.1:7003@17003 slave 9124e5d1dae6808c45761a832f3d9c6de9b741d3 0 1547022313000 3 connected
9124e5d1dae6808c45761a832f3d9c6de9b741d3 127.0.0.1:7002@17002 master - 0 1547022313000 3 connected 10923-16383
98ccf192cce67a817a81b3ed3f830c9c9368200d 127.0.0.1:7001@17001 master - 0 1547022313353 8 connected 0-9 5461-10922
127.0.0.1:7004>
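
Each CLUSTER NODES line is a space-separated record: node ID, address, flags, master ID, two ping/pong timestamps, config epoch, link state, and then the slot ranges for masters. A small parser for that layout, with a hypothetical function name and field names chosen for this example:

```python
from typing import Dict


def parse_node_line(line: str) -> Dict[str, object]:
    """Split one CLUSTER NODES line into named fields."""
    parts = line.split()
    master_id = parts[3]
    return {
        "id": parts[0],
        # "127.0.0.1:7001@17001" -> keep the client address, drop the bus port.
        "addr": parts[1].split("@")[0],
        "flags": parts[2].split(","),
        "master_id": None if master_id == "-" else master_id,
        "link_state": parts[7],
        "slots": parts[8:],
    }


line = ("98ccf192cce67a817a81b3ed3f830c9c9368200d 127.0.0.1:7001@17001 "
        "master - 0 1547022313353 8 connected 0-9 5461-10922")
node = parse_node_line(line)
print(node["addr"], node["flags"], node["slots"])
# 127.0.0.1:7001 ['master'] ['0-9', '5461-10922']
```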

Adding a node

# Add the node 7000 we just removed back to the cluster
127.0.0.1:7004> CLUSTER MEET 127.0.0.1 7000
OK
127.0.0.1:7004>
127.0.0.1:7004> CLUSTER NODES
# 7000 has indeed been added back
79edcb961e111b73f2a2cccde9da209aa536a9ed 127.0.0.1:7000@17000 slave de488ecbe7e5955d8688513029d5ee4ce47a50f1 0 1547021682141 7 connected
2fdeb00f98851f2e9ffdc2fb0ae2891b2ec4e928 127.0.0.1:7005@17005 slave 98ccf192cce67a817a81b3ed3f830c9c9368200d 0 1547021682000 8 connected
de488ecbe7e5955d8688513029d5ee4ce47a50f1 127.0.0.1:7004@17004 myself,master - 0 1547021683000 7 connected 10-5460
538ad78ba7bb5d3e9bd84a692a6362eac737c520 127.0.0.1:7003@17003 slave 9124e5d1dae6808c45761a832f3d9c6de9b741d3 0 1547021680124 3 connected
9124e5d1dae6808c45761a832f3d9c6de9b741d3 127.0.0.1:7002@17002 master - 0 1547021681000 3 connected 10923-16383
98ccf192cce67a817a81b3ed3f830c9c9368200d 127.0.0.1:7001@17001 master - 0 1547021683151 8 connected 0-9 5461-10922

Moving slots

# Move 10 slots from node 7004 to 7001
[root@centos redis]# redis-cli --cluster reshard 127.0.0.1:7004 --cluster-from de488ecbe7e5955d8688513029d5ee4ce47a50f1 --cluster-to  98ccf192cce67a817a81b3ed3f830c9c9368200d --cluster-slots 10 --cluster-yes
# Check the result: 7001 now holds slots [0-9]
[root@centos redis]# redis-cli --cluster check 127.0.0.1:7001
...
>>> Performing Cluster Check (using node 127.0.0.1:7001)
M: 98ccf192cce67a817a81b3ed3f830c9c9368200d 127.0.0.1:7001
   slots:[0-9],[5461-10922] (5472 slots) master
   1 additional replica(s)
S: 2fdeb00f98851f2e9ffdc2fb0ae2891b2ec4e928 127.0.0.1:7005
   slots: (0 slots) slave
   replicates 98ccf192cce67a817a81b3ed3f830c9c9368200d
...

References

https://blog.csdn.net/qq_20597727/article/details/83385737

http://www.cnblogs.com/gomysql/p/4395504.html