Matsuo's Blog: Building a Cluster Configuration on OpenStack with Serf

KVM, the de facto standard hypervisor for OpenStack, does not include a clustering feature like VMware HA, so users have to build a cluster configuration themselves with cluster software.

In this post I use Serf, an open-source tool developed by HashiCorp, to build a cluster configuration from multiple instances on OpenStack.

Serf is available from the following URL. The documentation is well maintained, so I will follow it as I go.

https://www.serfdom.io/

I will follow the GETTING STARTED guide.

https://www.serfdom.io/intro/getting-started/install.html

Download Serf

https://www.serfdom.io/downloads.html

Copy the URL of the download link and fetch the file with a command like the following.

curl -L -O https://releases.hashicorp.com/serf/0.7.0/serf_0.7.0_linux_amd64.zip

Extract the zip file.

unzip serf_0.7.0_linux_amd64.zip
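The download URL encodes the version and platform, so the two commands above can be parameterized. The following is a minimal sketch, assuming other releases follow the same URL layout as the 0.7.0 link used here; serf_zip_name and serf_url are hypothetical helper names, not part of Serf.

```shell
#!/bin/bash
# Sketch: build the release file name and URL from version and platform.
# Assumption: the URL layout matches the 0.7.0 link used in this post.

# Print the zip file name, e.g. serf_0.7.0_linux_amd64.zip
serf_zip_name() {
    local version="$1" os="$2" arch="$3"
    echo "serf_${version}_${os}_${arch}.zip"
}

# Print the full download URL for the given version, OS, and architecture.
serf_url() {
    local version="$1"
    echo "https://releases.hashicorp.com/serf/${version}/$(serf_zip_name "$@")"
}

# Usage (requires network access):
#   curl -L -O "$(serf_url 0.7.0 linux amd64)"
#   unzip "$(serf_zip_name 0.7.0 linux amd64)"
```

Keeping the version in one place makes it easier to repeat the same steps on the second node.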

Install Serf

Serf is distributed as a single binary that can be run directly once the ZIP file has been downloaded and extracted. Copy it to /usr/local/bin.

sudo cp serf /usr/local/bin

Check that it runs.

$ serf
usage: serf [--version] [--help] <command> [<args>]

Available commands are:
    agent           Runs a Serf agent
    event           Send a custom event through the Serf cluster
    force-leave     Forces a member of the cluster to enter the "left" state
    info            Provides debugging information for operators
    join            Tell Serf agent to join cluster
    keygen          Generates a new encryption key
    keys            Manipulate the internal encryption keyring used by Serf
    leave           Gracefully leaves the Serf cluster and shuts down
    members         Lists the members of a Serf cluster
    monitor         Stream logs from a Serf agent
    query           Send a query to the Serf cluster
    reachability    Test network reachability
    rtt             Estimates network round trip time between nodes
    tags            Modify tags of a running Serf agent
    version         Prints the Serf version

If the command usage is printed like this, the installation is complete.

Create an event handler

Create an event handler that Serf calls whenever an event occurs.

#!/bin/bash

echo
echo "New event: ${SERF_EVENT}. Data follows..."
# Print each line of event data that Serf passes on standard input.
while IFS= read -r line; do
    printf '%s\n' "${line}"
done

Save this as handler.sh in a suitable directory.
In my case I put it under /home/ubuntu/serf.

Make it executable with chmod.

chmod +x handler.sh

Create a startup script

Under the /home/ubuntu/serf directory, create a Serf startup script named start-serf.sh.

nohup serf agent -node=matsuos-cluster-1 \
        -bind=10.220.0.248 \
        -event-handler=/home/ubuntu/serf/handler.sh \
        -log-level=debug &

-node sets the name by which this agent is known in the Serf cluster.
-bind sets the IP address of the NIC that Serf uses.
If it is not specified, the loopback address is used and the agent cannot communicate with other hosts.
-event-handler sets the path of the event handler created earlier.
-log-level is optional, but setting it to debug lets you see detailed logs.
The nohup command and & (ampersand) keep the agent running in the background even after the session is closed.

Make it executable with chmod.

chmod +x start-serf.sh

Start Serf

Move to /home/ubuntu/serf and run start-serf.sh.

./start-serf.sh

Apply the same settings on node 2

Apply the same settings on node 2, the cluster peer. In its startup script, set -node and -bind to that node's own name and IP address (matsuos-cluster-2 and 10.220.0.249 in my environment).

Download Serf
Install Serf
Create an event handler
Create a startup script
Start Serf
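Since only the node name and the bind address differ between the two nodes, one way to avoid editing the script on each node is to pass them as arguments. The following is a minimal sketch under that assumption; start_serf and its argument order are hypothetical, and the other options are the same ones used in start-serf.sh.

```shell
#!/bin/bash
# Sketch of a start-serf.sh variant that takes the node name and
# bind address as arguments. start_serf is a hypothetical helper.

start_serf() {
    local node="$1" bind="$2"
    # Same options as the original startup script; only -node and -bind vary.
    nohup serf agent -node="${node}" \
        -bind="${bind}" \
        -event-handler=/home/ubuntu/serf/handler.sh \
        -log-level=debug &
}

# Start the agent only when both arguments are given.
if [ "$#" -eq 2 ]; then
    start_serf "$1" "$2"
else
    echo "usage: $0 <node-name> <bind-address>" >&2
fi
```

On node 2 this would be run as ./start-serf.sh matsuos-cluster-2 10.220.0.249, and on node 1 as ./start-serf.sh matsuos-cluster-1 10.220.0.248.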

Join the cluster

A Serf agent is now running in the background on both node 1 and node 2.
At this point the agents have not communicated with each other, so neither knows the other exists.
For them to discover each other, one of them has to join the other.

Here I run the following command on node 2 to join the Serf agent on node 1.
(Joining from node 1 to node 2 works just as well.)

serf join 10.220.0.248
Successfully joined cluster by contacting 1 nodes.

Run the following command to confirm that both nodes are registered as members.

serf members
matsuos-cluster-2 10.220.0.249:7946 alive
matsuos-cluster-1 10.220.0.248:7946 alive
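The same check can be scripted. The following is a minimal sketch that reads the output of serf members and prints the status of one node, assuming the column layout shown above (name, address, status); member_status is a hypothetical helper, not a Serf command.

```shell
#!/bin/bash
# Sketch: look up the status of one node in `serf members` output.
# member_status is a hypothetical helper name.

# Reads `serf members` output on stdin and prints the status column
# (third whitespace-separated field) for the given node name.
# Prints nothing if the node is not listed.
member_status() {
    local target="$1"
    awk -v target="${target}" '$1 == target { print $3 }'
}

# Usage on a node with a running agent:
#   serf members | member_status matsuos-cluster-1
```

Reading from standard input keeps the helper independent of a running agent, so it can also be tried against saved output.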

Check the log

Check the log to see whether the event handler worked correctly.
Because the startup script uses the nohup command, standard output is written to the nohup.out file.
These are the contents of nohup.out on node 1.

==> Starting Serf agent...
==> Starting Serf agent RPC...
==> Serf agent running!
         Node name: 'matsuos-cluster-1'
         Bind addr: '10.220.0.248:7946'
          RPC addr: '127.0.0.1:7373'
         Encrypted: false
          Snapshot: false
           Profile: lan

==> Log data will now stream in as it occurs:

    2016/03/20 11:15:04 [INFO] agent: Serf agent starting
    2016/03/20 11:15:04 [INFO] serf: EventMemberJoin: matsuos-cluster-1 10.220.0.248
    2016/03/20 11:15:05 [INFO] agent: Received event: member-join
    2016/03/20 11:15:05 [DEBUG] agent: Event 'member-join' script output:
New event: member-join. Data follows...
matsuos-cluster-1      10.220.0.248

(Here a member-join event was issued because node 1 itself started)
    2016/03/20 11:15:48 [DEBUG] memberlist: TCP connection from=10.220.0.249:60034
    2016/03/20 11:15:48 [INFO] serf: EventMemberJoin: matsuos-cluster-2 10.220.0.249
    2016/03/20 11:15:48 [DEBUG] serf: messageJoinType: matsuos-cluster-2
    2016/03/20 11:15:49 [DEBUG] serf: messageJoinType: matsuos-cluster-2
    2016/03/20 11:15:49 [DEBUG] serf: messageJoinType: matsuos-cluster-2
    2016/03/20 11:15:49 [DEBUG] serf: messageJoinType: matsuos-cluster-2
    2016/03/20 11:15:49 [INFO] agent: Received event: member-join
    2016/03/20 11:15:49 [DEBUG] agent: Event 'member-join' script output:
New event: member-join. Data follows...
matsuos-cluster-2      10.220.0.249
(Here a member-join event was issued by the join command from node 2)
    2016/03/20 11:15:50 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:16:20 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:16:35 [DEBUG] memberlist: TCP connection from=10.220.0.249:60036
    2016/03/20 11:16:50 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:17:05 [DEBUG] memberlist: TCP connection from=10.220.0.249:60037
    2016/03/20 11:17:20 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:17:35 [DEBUG] memberlist: TCP connection from=10.220.0.249:60038
    2016/03/20 11:17:50 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
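With debug logging enabled, the periodic sync messages quickly bury the event lines. The following is a minimal sketch that extracts only the events received by the agent, assuming the log line format shown above; received_events is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: list the events the agent received, with their timestamps.
# received_events is a hypothetical helper name.

# Reads agent log lines on stdin and, for each "Received event" line,
# prints the date, the time, and the event name (the last field).
received_events() {
    awk '/agent: Received event:/ { print $1, $2, $NF }'
}

# Usage:
#   received_events < nohup.out
```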

Send another event

Let's kill the Serf agent on node 2.

ubuntu@matsuos-cluster-2:~/serf$ ps
PID TTY          TIME CMD
1368 pts/0    00:00:00 bash
1604 pts/0    00:00:01 serf
1629 pts/0    00:00:00 ps
ubuntu@matsuos-cluster-2:~/serf$ kill 1604
ubuntu@matsuos-cluster-2:~/serf$ ps
PID TTY          TIME CMD
1368 pts/0    00:00:00 bash
1630 pts/0    00:00:00 ps

Let's look at the nohup.out file on node 1.

    2016/03/20 11:23:20 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:23:35 [DEBUG] memberlist: TCP connection from=10.220.0.249:60050
    2016/03/20 11:23:50 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:24:05 [DEBUG] memberlist: TCP connection from=10.220.0.249:60051
    2016/03/20 11:24:20 [DEBUG] memberlist: Initiating push/pull sync with: 10.220.0.249:7946
    2016/03/20 11:24:34 [DEBUG] memberlist: Failed UDP ping: matsuos-cluster-2 (timeout reached)
(Here you can see that the heartbeat to node 2 is failing)
    2016/03/20 11:24:35 [INFO] memberlist: Suspect matsuos-cluster-2 has failed, no acks received
    2016/03/20 11:24:36 [DEBUG] memberlist: Failed UDP ping: matsuos-cluster-2 (timeout reached)
    2016/03/20 11:24:37 [INFO] memberlist: Suspect matsuos-cluster-2 has failed, no acks received
    2016/03/20 11:24:38 [DEBUG] memberlist: Failed UDP ping: matsuos-cluster-2 (timeout reached)
    2016/03/20 11:24:39 [INFO] memberlist: Suspect matsuos-cluster-2 has failed, no acks received
    2016/03/20 11:24:39 [DEBUG] memberlist: Failed UDP ping: matsuos-cluster-2 (timeout reached)
    2016/03/20 11:24:40 [INFO] memberlist: Suspect matsuos-cluster-2 has failed, no acks received
    2016/03/20 11:24:40 [INFO] memberlist: Marking matsuos-cluster-2 as failed, suspect timeout reached
    2016/03/20 11:24:40 [INFO] serf: EventMemberFailed: matsuos-cluster-2 10.220.0.249
    2016/03/20 11:24:41 [INFO] agent: Received event: member-failed
    2016/03/20 11:24:41 [DEBUG] agent: Event 'member-failed' script output:
New event: member-failed. Data follows...
matsuos-cluster-2      10.220.0.249
(Here a member-failed event was issued)
    2016/03/20 11:25:04 [INFO] serf: attempting reconnect to matsuos-cluster-2 10.220.0.249:7946
    2016/03/20 11:25:34 [INFO] serf: attempting reconnect to matsuos-cluster-2 10.220.0.249:7946

As this shows, you can build a cluster configuration by writing recovery processing into the event handler.

For example, the recovery processing might reattach a Cinder volume to another instance and restart the service.
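A handler along these lines could branch on the event type and trigger recovery only for member-failed. The following is a minimal sketch, assuming Serf passes one member per line on standard input with the name and address as the first two fields, as the handler output in the log suggests. recover_from and handle_event are hypothetical names, and the actual recovery steps are left as a placeholder.

```shell
#!/bin/bash
# Sketch of an event handler that reacts only to member-failed.
# recover_from and handle_event are hypothetical names, not part of Serf.

# Placeholder for the real recovery steps, for example reattaching
# a Cinder volume and restarting the service.
recover_from() {
    local name="$1" addr="$2"
    echo "recovery needed for ${name} (${addr})"
}

# Reads member lines on stdin and dispatches on the event name.
handle_event() {
    local event="$1" name addr rest
    while read -r name addr rest; do
        case "${event}" in
            member-failed) recover_from "${name}" "${addr}" ;;
            *)             echo "ignoring ${event} for ${name}" ;;
        esac
    done
}

# Serf sets SERF_EVENT when it invokes the handler; do nothing otherwise.
if [ -n "${SERF_EVENT:-}" ]; then
    handle_event "${SERF_EVENT}"
fi
```

Keeping the recovery steps in a separate function makes it possible to try the dispatch logic by piping a sample member line into handle_event before wiring in real operations.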


Sunday, March 20, 2016
