Add vendor folder to git

This commit is contained in:
Lucas Käldström 2017-06-26 19:23:05 +03:00
parent 66cf5eaafb
commit 183585f56f
No known key found for this signature in database
GPG key ID: 600FEFBBD0D40D21
6916 changed files with 2629581 additions and 1 deletion

vendor/github.com/coreos/etcd/raft/README.md generated vendored Normal file

@@ -0,0 +1,247 @@
# Raft library
Raft is a protocol with which a cluster of nodes can maintain a replicated state machine.
The state machine is kept in sync through the use of a replicated log.
For more details on Raft, see "In Search of an Understandable Consensus Algorithm"
(https://ramcloud.stanford.edu/raft.pdf) by Diego Ongaro and John Ousterhout.
This Raft library is stable and feature complete. As of 2016, it is **the most widely used** Raft library in production, serving tens of thousands of clusters each day. It powers distributed systems such as etcd, Kubernetes, Docker Swarm, Cloud Foundry Diego, CockroachDB, TiDB, Project Calico, Flannel, and more.
Most Raft implementations have a monolithic design, including storage handling, messaging serialization, and network transport. This library instead follows a minimalistic design philosophy by only implementing the core raft algorithm. This minimalism buys flexibility, determinism, and performance.
To keep the codebase small as well as provide flexibility, the library only implements the Raft algorithm; both network and disk IO are left to the user. Library users must implement their own transportation layer for message passing between Raft peers over the wire. Similarly, users must implement their own storage layer to persist the Raft log and state.
In order to easily test the Raft library, its behavior should be deterministic. To achieve this determinism, the library models Raft as a state machine. The state machine takes a `Message` as input. A message can either be a local timer update or a network message sent from a remote peer. The state machine's output is a 3-tuple `{[]Messages, []LogEntries, NextState}` consisting of an array of `Messages`, `log entries`, and `Raft state changes`. For state machines with the same state, the same state machine input should always generate the same state machine output.
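Conceptually, this input/output model could be pictured as the following sketch (illustrative only; the real package exposes it through the Node interface and the Ready struct described below):
```go
// The conceptual 3-tuple output: messages to send, entries to persist,
// and the resulting Raft state.
type stepOutput struct {
  Messages   []raftpb.Message
  LogEntries []raftpb.Entry
  NextState  raftpb.HardState
}

// step must be deterministic: the same state and the same input Message
// always produce the same output.
func step(state raftpb.HardState, m raftpb.Message) stepOutput {
  // ... pure transition logic ...
  return stepOutput{NextState: state}
}
```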
A simple example application, _raftexample_, is also available to help illustrate
how to use this package in practice:
https://github.com/coreos/etcd/tree/master/contrib/raftexample
# Features
This raft implementation is a full-featured implementation of the Raft protocol. Features include:
- Leader election
- Log replication
- Log compaction
- Membership changes
- Leadership transfer extension
- Efficient linearizable read-only queries served by both the leader and followers
  - the leader checks with a quorum and bypasses the Raft log before processing read-only queries
  - followers ask the leader for a safe read index before processing read-only queries
- More efficient lease-based linearizable read-only queries served by both the leader and followers
  - the leader bypasses the Raft log and processes read-only queries locally
  - followers ask the leader for a safe read index before processing read-only queries
  - this approach relies on the clocks of all the machines in the raft group
This raft implementation also includes a few optional enhancements (a configuration sketch follows this list):
- Optimistic pipelining to reduce log replication latency
- Flow control for log replication
- Batching Raft messages to reduce synchronized network I/O calls
- Batching log entries to reduce synchronized disk I/O
- Writing to the leader's disk in parallel
- Internal proposal redirection from followers to leader
- Automatic stepping down when the leader loses quorum
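A sketch of how some of these enhancements might be enabled through `Config` fields (assuming the `CheckQuorum` and `PreVote` options; the values here are illustrative):
```go
c := &Config{
  ID:              0x01,
  ElectionTick:    10,
  HeartbeatTick:   1,
  Storage:         raft.NewMemoryStorage(),
  MaxSizePerMsg:   4096,
  MaxInflightMsgs: 256,

  // Optional enhancements described above.
  CheckQuorum: true, // leader steps down automatically when it cannot reach a quorum
  PreVote:     true, // two-phase election to reduce disruption when a node rejoins
}
```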
## Notable Users
- [cockroachdb](https://github.com/cockroachdb/cockroach) A Scalable, Survivable, Strongly-Consistent SQL Database
- [dgraph](https://github.com/dgraph-io/dgraph) A Scalable, Distributed, Low Latency, High Throughput Graph Database
- [etcd](https://github.com/coreos/etcd) A distributed reliable key-value store
- [tikv](https://github.com/pingcap/tikv) A Distributed transactional key value database powered by Rust and Raft
- [swarmkit](https://github.com/docker/swarmkit) A toolkit for orchestrating distributed systems at any scale.
## Usage
The primary object in raft is a Node. You either start a Node from scratch
using raft.StartNode or start a Node from some initial state using raft.RestartNode.
To start a three-node cluster:
```go
storage := raft.NewMemoryStorage()
c := &Config{
  ID:              0x01,
  ElectionTick:    10,
  HeartbeatTick:   1,
  Storage:         storage,
  MaxSizePerMsg:   4096,
  MaxInflightMsgs: 256,
}
// Set peer list to the other nodes in the cluster.
// Note that they need to be started separately as well.
n := raft.StartNode(c, []raft.Peer{{ID: 0x02}, {ID: 0x03}})
```
You can start a single-node cluster like so:
```go
// Create storage and config as shown above.
// Set peer list to itself, so this node can become the leader of this single-node cluster.
peers := []raft.Peer{{ID: 0x01}}
n := raft.StartNode(c, peers)
```
To allow a new node to join this cluster, do not pass in any peers. First, you need to add the node to the existing cluster by calling `ProposeConfChange` on any existing node inside the cluster. Then, you can start the node with an empty peer list, like so:
```go
// Create storage and config as shown above.
n := raft.StartNode(c, nil)
```
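On one of the existing members, the membership change for the joining node might be proposed like this (a sketch; the ID 0x04 and the `ctx` value are illustrative):
```go
// Run on an existing member before starting the new node.
cc := raftpb.ConfChange{
  Type:   raftpb.ConfChangeAddNode,
  NodeID: 0x04, // the ID the new node will be started with
}
n.ProposeConfChange(ctx, cc)
```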
To restart a node from previous state:
```go
storage := raft.NewMemoryStorage()
// Recover the in-memory storage from persistent snapshot, state and entries.
storage.ApplySnapshot(snapshot)
storage.SetHardState(state)
storage.Append(entries)
c := &Config{
  ID:              0x01,
  ElectionTick:    10,
  HeartbeatTick:   1,
  Storage:         storage,
  MaxSizePerMsg:   4096,
  MaxInflightMsgs: 256,
}
// Restart raft without peer information.
// Peer information is already included in the storage.
n := raft.RestartNode(c)
```
Now that you are holding onto a Node you have a few responsibilities:
First, you must read from the Node.Ready() channel and process the updates
it contains. These steps may be performed in parallel, except as noted in step
2.
1. Write HardState, Entries, and Snapshot to persistent storage if they are
not empty. Note that when writing an Entry with Index i, any
previously-persisted entries with Index >= i must be discarded.
2. Send all Messages to the nodes named in the To field. It is important that
no messages be sent until the latest HardState has been persisted to disk,
and all Entries written by any previous Ready batch (Messages may be sent while
entries from the same batch are being persisted). To reduce the I/O latency, an
optimization can be applied to make leader write to disk in parallel with its
followers (as explained at section 10.2.1 in Raft thesis). If any Message has type
MsgSnap, call Node.ReportSnapshot() after it has been sent (these messages may be
large). Note: Marshalling messages is not thread-safe; it is important that you
make sure that no new entries are persisted while marshalling.
The easiest way to achieve this is to serialise the messages directly inside
your main raft loop.
3. Apply Snapshot (if any) and CommittedEntries to the state machine.
If any committed Entry has Type EntryConfChange, call Node.ApplyConfChange()
to apply it to the node. The configuration change may be cancelled at this point
by setting the NodeID field to zero before calling ApplyConfChange
(but ApplyConfChange must be called one way or the other, and the decision to cancel
must be based solely on the state machine and not external information such as
the observed health of the node).
4. Call Node.Advance() to signal readiness for the next batch of updates.
This may be done at any time after step 1, although all updates must be processed
in the order they were returned by Ready.
Second, all persisted log entries must be made available via an
implementation of the Storage interface. The provided MemoryStorage
type can be used for this (if you repopulate its state upon a
restart), or you can supply your own disk-backed implementation.
Third, when you receive a message from another node, pass it to Node.Step:
```go
func recvRaftRPC(ctx context.Context, m raftpb.Message) {
  n.Step(ctx, m)
}
```
Finally, you need to call `Node.Tick()` at regular intervals (probably
via a `time.Ticker`). Raft has two important timeouts: heartbeat and the
election timeout. However, internally to the raft package time is
represented by an abstract "tick".
The total state machine handling loop will look something like this:
```go
for {
  select {
  case <-s.Ticker:
    n.Tick()
  case rd := <-s.Node.Ready():
    saveToStorage(rd.State, rd.Entries, rd.Snapshot)
    send(rd.Messages)
    if !raft.IsEmptySnap(rd.Snapshot) {
      processSnapshot(rd.Snapshot)
    }
    for _, entry := range rd.CommittedEntries {
      process(entry)
      if entry.Type == raftpb.EntryConfChange {
        var cc raftpb.ConfChange
        cc.Unmarshal(entry.Data)
        s.Node.ApplyConfChange(cc)
      }
    }
    s.Node.Advance()
  case <-s.done:
    return
  }
}
```
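The `saveToStorage` and `send` helpers above are application-specific. One way `saveToStorage` might look, assuming the in-memory `storage` created earlier and ignoring errors for brevity:
```go
func saveToStorage(hardState raftpb.HardState, entries []raftpb.Entry, snapshot raftpb.Snapshot) {
  if !raft.IsEmptySnap(snapshot) {
    storage.ApplySnapshot(snapshot)
  }
  if !raft.IsEmptyHardState(hardState) {
    storage.SetHardState(hardState)
  }
  storage.Append(entries)
}
```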
To propose changes to the state machine from your node, take your application
data, serialize it into a byte slice, and call:
```go
n.Propose(ctx, data)
```
If the proposal is committed, data will appear in committed entries with type
raftpb.EntryNormal. There is no guarantee that a proposed command will be
committed; you may have to re-propose after a timeout.
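One way an application might handle this, sketched here with a hypothetical `committedWithin` helper that watches the apply loop and an illustrative timeout:
```go
for {
  ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
  err := n.Propose(ctx, data)
  cancel()
  if err == nil && committedWithin(data, 3*time.Second) {
    break // the proposal showed up in CommittedEntries
  }
  // Timed out or failed; re-propose.
}
```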
To add or remove a node in a cluster, build a ConfChange struct 'cc' and call:
```go
n.ProposeConfChange(ctx, cc)
```
After the config change is committed, a committed entry with type
raftpb.EntryConfChange will be returned. You must apply it to the node through:
```go
var cc raftpb.ConfChange
cc.Unmarshal(data)
n.ApplyConfChange(cc)
```
Note: An ID represents a unique node in a cluster for all time. A
given ID MUST be used only once even if the old node has been removed.
This means that for example IP addresses make poor node IDs since they
may be reused. Node IDs must be non-zero.
## Implementation notes
This implementation is up to date with the final Raft thesis
(https://ramcloud.stanford.edu/~ongaro/thesis.pdf), although our
implementation of the membership change protocol differs somewhat from
that described in chapter 4. The key invariant that membership changes
happen one node at a time is preserved, but in our implementation the
membership change takes effect when its entry is applied, not when it
is added to the log (so the entry is committed under the old
membership instead of the new). This is equivalent in terms of safety,
since the old and new configurations are guaranteed to overlap.
To ensure that we do not attempt to commit two membership changes at
once by matching log positions (which would be unsafe since they
should have different quorum requirements), we simply disallow any
proposed membership change while any uncommitted change appears in
the leader's log.
This approach introduces a problem when you try to remove a member
from a two-member cluster: If one of the members dies before the
other one receives the commit of the confchange entry, then the member
cannot be removed any more since the cluster cannot make progress.
For this reason it is highly recommended to use three or more nodes in
every cluster.

vendor/github.com/coreos/etcd/raft/design.md generated vendored Normal file

@@ -0,0 +1,57 @@
## Progress
Progress represents a follower's progress in the view of the leader. The leader maintains the progress of all followers, and sends `replication message`s to the follower based on its progress.
`replication message` is a `msgApp` with log entries.
A progress has two attributes: `match` and `next`. `match` is the index of the highest known matched entry. If the leader knows nothing about the follower's replication status, `match` is set to zero. `next` is the index of the first entry that will be replicated to the follower. The leader puts entries from `next` to its latest one in the next `replication message`.
A progress is in one of three states: `probe`, `replicate`, `snapshot`.
```
                            +------------------------------------------+
                            |              send snapshot               |
                            |                                          |
                  +---------+----------+                    +----------v---------+
              +--->       probe        |                    |      snapshot      |
              |   |  max inflight = 1  <--------------------+  max inflight = 0  |
              |   +---------+----------+                    +--------------------+
              |             |  1. snapshot success
              |             |     (next=snapshot.index + 1)
              |             |  2. snapshot failure
              |             |     (no change)
              |             |  3. receives msgAppResp(rej=false&&index>lastsnap.index)
              |             |     (match=m.index,next=match+1)
receives msgAppResp(rej=true)
(next=match+1)|             |
              |             |
              |             |
              |             |   receives msgAppResp(rej=false&&index>match)
              |             |   (match=m.index,next=match+1)
              |             |
              |             |
              |             |
              |   +---------v----------+
              |   |     replicate      |
              +---+  max inflight = n  |
                  +--------------------+
```
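In code, the per-follower state pictured above might be modeled roughly like this (a sketch with illustrative names, not the package's internal types):
```go
type stateType int

const (
  stateProbe stateType = iota
  stateReplicate
  stateSnapshot
)

// progressSketch mirrors the attributes above: the highest known matched
// index, the next index to send, and which of the three states we are in.
type progressSketch struct {
  match uint64
  next  uint64
  state stateType
}
```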
When the progress of a follower is in `probe` state, the leader sends at most one `replication message` per heartbeat interval. The leader sends `replication message`s slowly, probing the actual progress of the follower. A `msgHeartbeatResp` or a `msgAppResp` with a rejection might trigger the sending of the next `replication message`.
When the progress of a follower is in `replicate` state, the leader sends `replication message`s and optimistically increases `next` past the latest entry sent. This is an optimized state for replicating log entries to the follower quickly.
When the progress of a follower is in `snapshot` state, the leader stops sending any `replication message`.
A newly elected leader sets the progress of all the followers to `probe` state with `match` = 0 and `next` = last index. The leader slowly (at most once per heartbeat interval) sends a `replication message` to each follower and probes its progress.
A progress changes to `replicate` when the follower replies with a non-rejection `msgAppResp`, which implies that it has matched the index sent. At this point, the leader starts to stream log entries to the follower quickly. The progress falls back to `probe` when the follower replies with a rejection `msgAppResp` or when the link layer reports that the follower is unreachable. We aggressively reset `next` to `match`+1 since, if we receive any `msgAppResp` soon, both `match` and `next` will increase directly to the `index` in the `msgAppResp`. (We might end up sending some duplicate entries when we aggressively reset `next` too low; see the open question.)
A progress changes from `probe` to `snapshot` when the follower falls very far behind and requires a snapshot. After sending `msgSnap`, the leader waits for the success, failure, or abortion of the snapshot it sent. The progress goes back to `probe` after the result of the send is applied.
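The transitions just described might be sketched as follows, reusing the illustrative types from the sketch above:
```go
// On a non-rejecting msgAppResp with index > match: advance and stream.
func (pr *progressSketch) onAppResp(index uint64) {
  pr.match = index
  pr.next = index + 1
  pr.state = stateReplicate
}

// On a rejected msgAppResp or an unreachable report: probe again, and
// aggressively reset next (duplicate entries are tolerated).
func (pr *progressSketch) onRejectOrUnreachable() {
  pr.state = stateProbe
  pr.next = pr.match + 1
}

// When the follower is too far behind: send a snapshot and pause replication.
func (pr *progressSketch) onNeedSnapshot() {
  pr.state = stateSnapshot
}
```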
### Flow Control
1. Limit the max size of each message sent. The max should be configurable.
This lowers the cost in the probing state, since we limit the size per message; it also lowers the penalty when `next` is aggressively decreased to too low a value.
2. Limit the number of in-flight messages to less than N when in `replicate` state. N should be configurable. Most implementations will have a sending buffer on top of the actual network transport layer (so that it does not block the raft node). We want to make sure raft does not overflow that buffer, which could cause messages to be dropped and trigger a lot of unnecessary resending (see the sketch after this list).
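A sketch of the in-flight window from point 2, as a fixed-size ring of the last entry index carried by each outstanding message (illustrative, not the package's internal implementation):
```go
type inflightsSketch struct {
  start  int      // position of the oldest in-flight message
  count  int      // number of messages currently in flight
  buffer []uint64 // last entry index of each in-flight message
}

func (in *inflightsSketch) full() bool { return in.count == len(in.buffer) }

// add records a sent replication message; callers check full() first.
func (in *inflightsSketch) add(last uint64) {
  in.buffer[(in.start+in.count)%len(in.buffer)] = last
  in.count++
}

// freeTo releases every in-flight message acknowledged up to index `to`.
func (in *inflightsSketch) freeTo(to uint64) {
  for in.count > 0 && in.buffer[in.start] <= to {
    in.start = (in.start + 1) % len(in.buffer)
    in.count--
  }
}
```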

vendor/github.com/coreos/etcd/raft/diff_test.go generated vendored Normal file

@@ -0,0 +1,65 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"fmt"
"io"
"io/ioutil"
"os"
"os/exec"
"strings"
)
func diffu(a, b string) string {
if a == b {
return ""
}
aname, bname := mustTemp("base", a), mustTemp("other", b)
defer os.Remove(aname)
defer os.Remove(bname)
cmd := exec.Command("diff", "-u", aname, bname)
buf, err := cmd.CombinedOutput()
if err != nil {
if _, ok := err.(*exec.ExitError); ok {
// do nothing
return string(buf)
}
panic(err)
}
return string(buf)
}
func mustTemp(pre, body string) string {
f, err := ioutil.TempFile("", pre)
if err != nil {
panic(err)
}
_, err = io.Copy(f, strings.NewReader(body))
if err != nil {
panic(err)
}
f.Close()
return f.Name()
}
func ltoa(l *raftLog) string {
s := fmt.Sprintf("committed: %d\n", l.committed)
s += fmt.Sprintf("applied: %d\n", l.applied)
for i, e := range l.allEntries() {
s += fmt.Sprintf("#%d: %+v\n", i, e)
}
return s
}

vendor/github.com/coreos/etcd/raft/doc.go generated vendored Normal file

@@ -0,0 +1,300 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/*
Package raft sends and receives messages in the Protocol Buffer format
defined in the raftpb package.
Raft is a protocol with which a cluster of nodes can maintain a replicated state machine.
The state machine is kept in sync through the use of a replicated log.
For more details on Raft, see "In Search of an Understandable Consensus Algorithm"
(https://ramcloud.stanford.edu/raft.pdf) by Diego Ongaro and John Ousterhout.
A simple example application, _raftexample_, is also available to help illustrate
how to use this package in practice:
https://github.com/coreos/etcd/tree/master/contrib/raftexample
Usage
The primary object in raft is a Node. You either start a Node from scratch
using raft.StartNode or start a Node from some initial state using raft.RestartNode.
To start a node from scratch:
storage := raft.NewMemoryStorage()
c := &Config{
ID: 0x01,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: storage,
MaxSizePerMsg: 4096,
MaxInflightMsgs: 256,
}
n := raft.StartNode(c, []raft.Peer{{ID: 0x02}, {ID: 0x03}})
To restart a node from previous state:
storage := raft.NewMemoryStorage()
// recover the in-memory storage from persistent
// snapshot, state and entries.
storage.ApplySnapshot(snapshot)
storage.SetHardState(state)
storage.Append(entries)
c := &Config{
ID: 0x01,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: storage,
MaxSizePerMsg: 4096,
MaxInflightMsgs: 256,
}
// restart raft without peer information.
// peer information is already included in the storage.
n := raft.RestartNode(c)
Now that you are holding onto a Node you have a few responsibilities:
First, you must read from the Node.Ready() channel and process the updates
it contains. These steps may be performed in parallel, except as noted in step
2.
1. Write HardState, Entries, and Snapshot to persistent storage if they are
not empty. Note that when writing an Entry with Index i, any
previously-persisted entries with Index >= i must be discarded.
2. Send all Messages to the nodes named in the To field. It is important that
no messages be sent until the latest HardState has been persisted to disk,
and all Entries written by any previous Ready batch (Messages may be sent while
entries from the same batch are being persisted). To reduce the I/O latency, an
optimization can be applied to make leader write to disk in parallel with its
followers (as explained at section 10.2.1 in Raft thesis). If any Message has type
MsgSnap, call Node.ReportSnapshot() after it has been sent (these messages may be
large).
Note: Marshalling messages is not thread-safe; it is important that you
make sure that no new entries are persisted while marshalling.
The easiest way to achieve this is to serialise the messages directly inside
your main raft loop.
3. Apply Snapshot (if any) and CommittedEntries to the state machine.
If any committed Entry has Type EntryConfChange, call Node.ApplyConfChange()
to apply it to the node. The configuration change may be cancelled at this point
by setting the NodeID field to zero before calling ApplyConfChange
(but ApplyConfChange must be called one way or the other, and the decision to cancel
must be based solely on the state machine and not external information such as
the observed health of the node).
4. Call Node.Advance() to signal readiness for the next batch of updates.
This may be done at any time after step 1, although all updates must be processed
in the order they were returned by Ready.
Second, all persisted log entries must be made available via an
implementation of the Storage interface. The provided MemoryStorage
type can be used for this (if you repopulate its state upon a
restart), or you can supply your own disk-backed implementation.
Third, when you receive a message from another node, pass it to Node.Step:
func recvRaftRPC(ctx context.Context, m raftpb.Message) {
n.Step(ctx, m)
}
Finally, you need to call Node.Tick() at regular intervals (probably
via a time.Ticker). Raft has two important timeouts: heartbeat and the
election timeout. However, internally to the raft package time is
represented by an abstract "tick".
The total state machine handling loop will look something like this:
for {
select {
case <-s.Ticker:
n.Tick()
case rd := <-s.Node.Ready():
saveToStorage(rd.State, rd.Entries, rd.Snapshot)
send(rd.Messages)
if !raft.IsEmptySnap(rd.Snapshot) {
processSnapshot(rd.Snapshot)
}
for _, entry := range rd.CommittedEntries {
process(entry)
if entry.Type == raftpb.EntryConfChange {
var cc raftpb.ConfChange
cc.Unmarshal(entry.Data)
s.Node.ApplyConfChange(cc)
}
}
s.Node.Advance()
case <-s.done:
return
}
}
To propose changes to the state machine from your node, take your application
data, serialize it into a byte slice, and call:
n.Propose(ctx, data)
If the proposal is committed, data will appear in committed entries with type
raftpb.EntryNormal. There is no guarantee that a proposed command will be
committed; you may have to re-propose after a timeout.
To add or remove a node in a cluster, build a ConfChange struct 'cc' and call:
n.ProposeConfChange(ctx, cc)
After the config change is committed, a committed entry with type
raftpb.EntryConfChange will be returned. You must apply it to the node through:
var cc raftpb.ConfChange
cc.Unmarshal(data)
n.ApplyConfChange(cc)
Note: An ID represents a unique node in a cluster for all time. A
given ID MUST be used only once even if the old node has been removed.
This means that for example IP addresses make poor node IDs since they
may be reused. Node IDs must be non-zero.
Implementation notes
This implementation is up to date with the final Raft thesis
(https://ramcloud.stanford.edu/~ongaro/thesis.pdf), although our
implementation of the membership change protocol differs somewhat from
that described in chapter 4. The key invariant that membership changes
happen one node at a time is preserved, but in our implementation the
membership change takes effect when its entry is applied, not when it
is added to the log (so the entry is committed under the old
membership instead of the new). This is equivalent in terms of safety,
since the old and new configurations are guaranteed to overlap.
To ensure that we do not attempt to commit two membership changes at
once by matching log positions (which would be unsafe since they
should have different quorum requirements), we simply disallow any
proposed membership change while any uncommitted change appears in
the leader's log.
This approach introduces a problem when you try to remove a member
from a two-member cluster: If one of the members dies before the
other one receives the commit of the confchange entry, then the member
cannot be removed any more since the cluster cannot make progress.
For this reason it is highly recommended to use three or more nodes in
every cluster.
MessageType
Package raft sends and receives messages in Protocol Buffer format (defined
in the raftpb package). Each state (follower, candidate, leader) implements its
own 'step' method ('stepFollower', 'stepCandidate', 'stepLeader') when
advancing with the given raftpb.Message. Each step is determined by its
raftpb.MessageType. Note that every step is checked by one common method
'Step' that safety-checks the terms of node and incoming message to prevent
stale log entries:
'MsgHup' is used for election. If a node is a follower or candidate, the
'tick' function in 'raft' struct is set as 'tickElection'. If a follower or
candidate has not received any heartbeat before the election timeout, it
passes 'MsgHup' to its Step method and becomes (or remains) a candidate to
start a new election.
'MsgBeat' is an internal type that signals the leader to send a heartbeat of
the 'MsgHeartbeat' type. If a node is a leader, the 'tick' function in
the 'raft' struct is set as 'tickHeartbeat', and triggers the leader to
send periodic 'MsgHeartbeat' messages to its followers.
'MsgProp' proposes to append data to its log entries. This is a special
type to redirect proposals to leader. Therefore, send method overwrites
raftpb.Message's term with its HardState's term to avoid attaching its
local term to 'MsgProp'. When 'MsgProp' is passed to the leader's 'Step'
method, the leader first calls the 'appendEntry' method to append entries
to its log, and then calls 'bcastAppend' method to send those entries to
its peers. When passed to candidate, 'MsgProp' is dropped. When passed to
follower, 'MsgProp' is stored in follower's mailbox(msgs) by the send
method. It is stored with sender's ID and later forwarded to leader by
rafthttp package.
'MsgApp' contains log entries to replicate. A leader calls bcastAppend,
which calls sendAppend, which sends soon-to-be-replicated logs in 'MsgApp'
type. When 'MsgApp' is passed to candidate's Step method, candidate reverts
back to follower, because it indicates that there is a valid leader sending
'MsgApp' messages. Candidate and follower respond to this message in
'MsgAppResp' type.
'MsgAppResp' is the response to a log replication request ('MsgApp'). When
'MsgApp' is passed to a candidate's or follower's Step method, it responds by
calling the 'handleAppendEntries' method, which sends 'MsgAppResp' to the raft
mailbox.
'MsgVote' requests votes for election. When a node is a follower or
candidate and 'MsgHup' is passed to its Step method, then the node calls
'campaign' method to campaign itself to become a leader. Once 'campaign'
method is called, the node becomes candidate and sends 'MsgVote' to peers
in cluster to request votes. When passed to leader or candidate's Step
method and the message's Term is lower than leader's or candidate's,
'MsgVote' will be rejected ('MsgVoteResp' is returned with Reject true).
If leader or candidate receives 'MsgVote' with higher term, it will revert
back to follower. When 'MsgVote' is passed to follower, it votes for the
sender only when sender's last term is greater than MsgVote's term or
sender's last term is equal to MsgVote's term but sender's last committed
index is greater than or equal to follower's.
'MsgVoteResp' contains responses to a vote request. When 'MsgVoteResp' is
passed to a candidate, the candidate calculates how many votes it has won. If
it is more than the majority (quorum), it becomes leader and calls 'bcastAppend'.
If the candidate receives a majority of denial votes, it reverts back to
follower.
'MsgPreVote' and 'MsgPreVoteResp' are used in an optional two-phase election
protocol. When Config.PreVote is true, a pre-election is carried out first
(using the same rules as a regular election), and no node increases its term
number unless the pre-election indicates that the campaigning node would win.
This minimizes disruption when a partitioned node rejoins the cluster.
'MsgSnap' requests to install a snapshot message. When a node has just
become a leader or the leader receives 'MsgProp' message, it calls
'bcastAppend' method, which then calls 'sendAppend' method to each
follower. In 'sendAppend', if a leader fails to get term or entries,
the leader requests snapshot by sending 'MsgSnap' type message.
'MsgSnapStatus' tells the result of a snapshot install message. When a
follower rejects 'MsgSnap', it indicates that the snapshot request with
'MsgSnap' failed, typically because of network issues that prevented the
network layer from sending the snapshot to the follower. The leader then
considers the follower's progress as probe. When 'MsgSnap' was not rejected,
it indicates that the snapshot succeeded, and the leader sets the follower's
progress to probe and resumes its log replication.
'MsgHeartbeat' sends heartbeat from leader. When 'MsgHeartbeat' is passed
to candidate and message's term is higher than candidate's, the candidate
reverts back to follower and updates its committed index from the one in
this heartbeat. And it sends the message to its mailbox. When
'MsgHeartbeat' is passed to follower's Step method and message's term is
higher than follower's, the follower updates its leaderID with the ID
from the message.
'MsgHeartbeatResp' is a response to 'MsgHeartbeat'. When 'MsgHeartbeatResp'
is passed to leader's Step method, the leader knows which follower
responded. And only when the leader's last committed index is greater than
follower's Match index, the leader runs 'sendAppend` method.
'MsgUnreachable' tells that the request (message) wasn't delivered. When
'MsgUnreachable' is passed to leader's Step method, the leader discovers
that the follower that sent this 'MsgUnreachable' is not reachable, often
indicating 'MsgApp' is lost. When follower's progress state is replicate,
the leader sets it back to probe.
*/
package raft

vendor/github.com/coreos/etcd/raft/example_test.go generated vendored Normal file

@@ -0,0 +1,47 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
pb "github.com/coreos/etcd/raft/raftpb"
)
func applyToStore(ents []pb.Entry) {}
func sendMessages(msgs []pb.Message) {}
func saveStateToDisk(st pb.HardState) {}
func saveToDisk(ents []pb.Entry) {}
func ExampleNode() {
c := &Config{}
n := StartNode(c, nil)
defer n.Stop()
// stuff to n happens in other goroutines
// the last known state
var prev pb.HardState
for {
// Ready blocks until there is new state ready.
rd := <-n.Ready()
if !isHardStateEqual(prev, rd.HardState) {
saveStateToDisk(rd.HardState)
prev = rd.HardState
}
saveToDisk(rd.Entries)
go applyToStore(rd.CommittedEntries)
sendMessages(rd.Messages)
}
}

vendor/github.com/coreos/etcd/raft/log.go generated vendored Normal file

@@ -0,0 +1,358 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"fmt"
"log"
pb "github.com/coreos/etcd/raft/raftpb"
)
type raftLog struct {
// storage contains all stable entries since the last snapshot.
storage Storage
// unstable contains all unstable entries and snapshot.
// they will be saved into storage.
unstable unstable
// committed is the highest log position that is known to be in
// stable storage on a quorum of nodes.
committed uint64
// applied is the highest log position that the application has
// been instructed to apply to its state machine.
// Invariant: applied <= committed
applied uint64
logger Logger
}
// newLog returns a log using the given storage. It recovers the log to the
// state where it has just committed and applied the latest snapshot.
func newLog(storage Storage, logger Logger) *raftLog {
if storage == nil {
log.Panic("storage must not be nil")
}
log := &raftLog{
storage: storage,
logger: logger,
}
firstIndex, err := storage.FirstIndex()
if err != nil {
panic(err) // TODO(bdarnell)
}
lastIndex, err := storage.LastIndex()
if err != nil {
panic(err) // TODO(bdarnell)
}
log.unstable.offset = lastIndex + 1
log.unstable.logger = logger
// Initialize our committed and applied pointers to the time of the last compaction.
log.committed = firstIndex - 1
log.applied = firstIndex - 1
return log
}
func (l *raftLog) String() string {
return fmt.Sprintf("committed=%d, applied=%d, unstable.offset=%d, len(unstable.Entries)=%d", l.committed, l.applied, l.unstable.offset, len(l.unstable.entries))
}
// maybeAppend returns (0, false) if the entries cannot be appended. Otherwise,
// it returns (last index of new entries, true).
func (l *raftLog) maybeAppend(index, logTerm, committed uint64, ents ...pb.Entry) (lastnewi uint64, ok bool) {
if l.matchTerm(index, logTerm) {
lastnewi = index + uint64(len(ents))
ci := l.findConflict(ents)
switch {
case ci == 0:
case ci <= l.committed:
l.logger.Panicf("entry %d conflict with committed entry [committed(%d)]", ci, l.committed)
default:
offset := index + 1
l.append(ents[ci-offset:]...)
}
l.commitTo(min(committed, lastnewi))
return lastnewi, true
}
return 0, false
}
func (l *raftLog) append(ents ...pb.Entry) uint64 {
if len(ents) == 0 {
return l.lastIndex()
}
if after := ents[0].Index - 1; after < l.committed {
l.logger.Panicf("after(%d) is out of range [committed(%d)]", after, l.committed)
}
l.unstable.truncateAndAppend(ents)
return l.lastIndex()
}
// findConflict finds the index of the conflict.
// It returns the first pair of conflicting entries between the existing
// entries and the given entries, if there are any.
// If there are no conflicting entries, and the existing entries contain
// all the given entries, zero will be returned.
// If there are no conflicting entries, but the given entries contain new
// entries, the index of the first new entry will be returned.
// An entry is considered to be conflicting if it has the same index but
// a different term.
// The first entry MUST have an index equal to the argument 'from'.
// The index of the given entries MUST be continuously increasing.
func (l *raftLog) findConflict(ents []pb.Entry) uint64 {
for _, ne := range ents {
if !l.matchTerm(ne.Index, ne.Term) {
if ne.Index <= l.lastIndex() {
l.logger.Infof("found conflict at index %d [existing term: %d, conflicting term: %d]",
ne.Index, l.zeroTermOnErrCompacted(l.term(ne.Index)), ne.Term)
}
return ne.Index
}
}
return 0
}
func (l *raftLog) unstableEntries() []pb.Entry {
if len(l.unstable.entries) == 0 {
return nil
}
return l.unstable.entries
}
// nextEnts returns all the available entries for execution.
// If applied is smaller than the index of snapshot, it returns all committed
// entries after the index of snapshot.
func (l *raftLog) nextEnts() (ents []pb.Entry) {
off := max(l.applied+1, l.firstIndex())
if l.committed+1 > off {
ents, err := l.slice(off, l.committed+1, noLimit)
if err != nil {
l.logger.Panicf("unexpected error when getting unapplied entries (%v)", err)
}
return ents
}
return nil
}
// hasNextEnts returns whether there are any available entries for execution. This
// is a fast check without the heavy raftLog.slice() in raftLog.nextEnts().
func (l *raftLog) hasNextEnts() bool {
off := max(l.applied+1, l.firstIndex())
return l.committed+1 > off
}
func (l *raftLog) snapshot() (pb.Snapshot, error) {
if l.unstable.snapshot != nil {
return *l.unstable.snapshot, nil
}
return l.storage.Snapshot()
}
func (l *raftLog) firstIndex() uint64 {
if i, ok := l.unstable.maybeFirstIndex(); ok {
return i
}
index, err := l.storage.FirstIndex()
if err != nil {
panic(err) // TODO(bdarnell)
}
return index
}
func (l *raftLog) lastIndex() uint64 {
if i, ok := l.unstable.maybeLastIndex(); ok {
return i
}
i, err := l.storage.LastIndex()
if err != nil {
panic(err) // TODO(bdarnell)
}
return i
}
func (l *raftLog) commitTo(tocommit uint64) {
// never decrease commit
if l.committed < tocommit {
if l.lastIndex() < tocommit {
l.logger.Panicf("tocommit(%d) is out of range [lastIndex(%d)]. Was the raft log corrupted, truncated, or lost?", tocommit, l.lastIndex())
}
l.committed = tocommit
}
}
func (l *raftLog) appliedTo(i uint64) {
if i == 0 {
return
}
if l.committed < i || i < l.applied {
l.logger.Panicf("applied(%d) is out of range [prevApplied(%d), committed(%d)]", i, l.applied, l.committed)
}
l.applied = i
}
func (l *raftLog) stableTo(i, t uint64) { l.unstable.stableTo(i, t) }
func (l *raftLog) stableSnapTo(i uint64) { l.unstable.stableSnapTo(i) }
func (l *raftLog) lastTerm() uint64 {
t, err := l.term(l.lastIndex())
if err != nil {
l.logger.Panicf("unexpected error when getting the last term (%v)", err)
}
return t
}
func (l *raftLog) term(i uint64) (uint64, error) {
// the valid term range is [index of dummy entry, last index]
dummyIndex := l.firstIndex() - 1
if i < dummyIndex || i > l.lastIndex() {
// TODO: return an error instead?
return 0, nil
}
if t, ok := l.unstable.maybeTerm(i); ok {
return t, nil
}
t, err := l.storage.Term(i)
if err == nil {
return t, nil
}
if err == ErrCompacted || err == ErrUnavailable {
return 0, err
}
panic(err) // TODO(bdarnell)
}
func (l *raftLog) entries(i, maxsize uint64) ([]pb.Entry, error) {
if i > l.lastIndex() {
return nil, nil
}
return l.slice(i, l.lastIndex()+1, maxsize)
}
// allEntries returns all entries in the log.
func (l *raftLog) allEntries() []pb.Entry {
ents, err := l.entries(l.firstIndex(), noLimit)
if err == nil {
return ents
}
if err == ErrCompacted { // try again if there was a racing compaction
return l.allEntries()
}
// TODO (xiangli): handle error?
panic(err)
}
// isUpToDate determines if the given (lastIndex,term) log is more up-to-date
// by comparing the index and term of the last entries in the existing logs.
// If the logs have last entries with different terms, then the log with the
// later term is more up-to-date. If the logs end with the same term, then
// whichever log has the larger lastIndex is more up-to-date. If the logs are
// the same, the given log is up-to-date.
func (l *raftLog) isUpToDate(lasti, term uint64) bool {
return term > l.lastTerm() || (term == l.lastTerm() && lasti >= l.lastIndex())
}
func (l *raftLog) matchTerm(i, term uint64) bool {
t, err := l.term(i)
if err != nil {
return false
}
return t == term
}
func (l *raftLog) maybeCommit(maxIndex, term uint64) bool {
if maxIndex > l.committed && l.zeroTermOnErrCompacted(l.term(maxIndex)) == term {
l.commitTo(maxIndex)
return true
}
return false
}
func (l *raftLog) restore(s pb.Snapshot) {
l.logger.Infof("log [%s] starts to restore snapshot [index: %d, term: %d]", l, s.Metadata.Index, s.Metadata.Term)
l.committed = s.Metadata.Index
l.unstable.restore(s)
}
// slice returns a slice of log entries from lo through hi-1, inclusive.
func (l *raftLog) slice(lo, hi, maxSize uint64) ([]pb.Entry, error) {
err := l.mustCheckOutOfBounds(lo, hi)
if err != nil {
return nil, err
}
if lo == hi {
return nil, nil
}
var ents []pb.Entry
if lo < l.unstable.offset {
storedEnts, err := l.storage.Entries(lo, min(hi, l.unstable.offset), maxSize)
if err == ErrCompacted {
return nil, err
} else if err == ErrUnavailable {
l.logger.Panicf("entries[%d:%d) is unavailable from storage", lo, min(hi, l.unstable.offset))
} else if err != nil {
panic(err) // TODO(bdarnell)
}
// check if ents has reached the size limitation
if uint64(len(storedEnts)) < min(hi, l.unstable.offset)-lo {
return storedEnts, nil
}
ents = storedEnts
}
if hi > l.unstable.offset {
unstable := l.unstable.slice(max(lo, l.unstable.offset), hi)
if len(ents) > 0 {
ents = append([]pb.Entry{}, ents...)
ents = append(ents, unstable...)
} else {
ents = unstable
}
}
return limitSize(ents, maxSize), nil
}
// l.firstIndex <= lo <= hi <= l.firstIndex + len(l.entries)
func (l *raftLog) mustCheckOutOfBounds(lo, hi uint64) error {
if lo > hi {
l.logger.Panicf("invalid slice %d > %d", lo, hi)
}
fi := l.firstIndex()
if lo < fi {
return ErrCompacted
}
length := l.lastIndex() + 1 - fi
if lo < fi || hi > fi+length {
l.logger.Panicf("slice[%d,%d) out of bound [%d,%d]", lo, hi, fi, l.lastIndex())
}
return nil
}
func (l *raftLog) zeroTermOnErrCompacted(t uint64, err error) uint64 {
if err == nil {
return t
}
if err == ErrCompacted {
return 0
}
l.logger.Panicf("unexpected error (%v)", err)
return 0
}

vendor/github.com/coreos/etcd/raft/log_test.go generated vendored Normal file

@@ -0,0 +1,819 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"reflect"
"testing"
pb "github.com/coreos/etcd/raft/raftpb"
)
func TestFindConflict(t *testing.T) {
previousEnts := []pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 3}}
tests := []struct {
ents []pb.Entry
wconflict uint64
}{
// no conflict, empty ent
{[]pb.Entry{}, 0},
// no conflict
{[]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 3}}, 0},
{[]pb.Entry{{Index: 2, Term: 2}, {Index: 3, Term: 3}}, 0},
{[]pb.Entry{{Index: 3, Term: 3}}, 0},
// no conflict, but has new entries
{[]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 4}}, 4},
{[]pb.Entry{{Index: 2, Term: 2}, {Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 4}}, 4},
{[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 4}}, 4},
{[]pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 4}}, 4},
// conflicts with existing entries
{[]pb.Entry{{Index: 1, Term: 4}, {Index: 2, Term: 4}}, 1},
{[]pb.Entry{{Index: 2, Term: 1}, {Index: 3, Term: 4}, {Index: 4, Term: 4}}, 2},
{[]pb.Entry{{Index: 3, Term: 1}, {Index: 4, Term: 2}, {Index: 5, Term: 4}, {Index: 6, Term: 4}}, 3},
}
for i, tt := range tests {
raftLog := newLog(NewMemoryStorage(), raftLogger)
raftLog.append(previousEnts...)
gconflict := raftLog.findConflict(tt.ents)
if gconflict != tt.wconflict {
t.Errorf("#%d: conflict = %d, want %d", i, gconflict, tt.wconflict)
}
}
}
func TestIsUpToDate(t *testing.T) {
previousEnts := []pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 3}}
raftLog := newLog(NewMemoryStorage(), raftLogger)
raftLog.append(previousEnts...)
tests := []struct {
lastIndex uint64
term uint64
wUpToDate bool
}{
// greater term, ignore lastIndex
{raftLog.lastIndex() - 1, 4, true},
{raftLog.lastIndex(), 4, true},
{raftLog.lastIndex() + 1, 4, true},
// smaller term, ignore lastIndex
{raftLog.lastIndex() - 1, 2, false},
{raftLog.lastIndex(), 2, false},
{raftLog.lastIndex() + 1, 2, false},
// equal term, equal or larger lastIndex wins
{raftLog.lastIndex() - 1, 3, false},
{raftLog.lastIndex(), 3, true},
{raftLog.lastIndex() + 1, 3, true},
}
for i, tt := range tests {
gUpToDate := raftLog.isUpToDate(tt.lastIndex, tt.term)
if gUpToDate != tt.wUpToDate {
t.Errorf("#%d: uptodate = %v, want %v", i, gUpToDate, tt.wUpToDate)
}
}
}
func TestAppend(t *testing.T) {
previousEnts := []pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}}
tests := []struct {
ents []pb.Entry
windex uint64
wents []pb.Entry
wunstable uint64
}{
{
[]pb.Entry{},
2,
[]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}},
3,
},
{
[]pb.Entry{{Index: 3, Term: 2}},
3,
[]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 2}},
3,
},
// conflicts with index 1
{
[]pb.Entry{{Index: 1, Term: 2}},
1,
[]pb.Entry{{Index: 1, Term: 2}},
1,
},
// conflicts with index 2
{
[]pb.Entry{{Index: 2, Term: 3}, {Index: 3, Term: 3}},
3,
[]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 3}, {Index: 3, Term: 3}},
2,
},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.Append(previousEnts)
raftLog := newLog(storage, raftLogger)
index := raftLog.append(tt.ents...)
if index != tt.windex {
t.Errorf("#%d: lastIndex = %d, want %d", i, index, tt.windex)
}
g, err := raftLog.entries(1, noLimit)
if err != nil {
t.Fatalf("#%d: unexpected error %v", i, err)
}
if !reflect.DeepEqual(g, tt.wents) {
t.Errorf("#%d: logEnts = %+v, want %+v", i, g, tt.wents)
}
if goff := raftLog.unstable.offset; goff != tt.wunstable {
t.Errorf("#%d: unstable = %d, want %d", i, goff, tt.wunstable)
}
}
}
// TestLogMaybeAppend ensures:
// If the given (index, term) matches with the existing log:
// 1. If an existing entry conflicts with a new one (same index
// but different terms), delete the existing entry and all that
// follow it
// 2. Append any new entries not already in the log
// If the given (index, term) does not match with the existing log:
// return false
func TestLogMaybeAppend(t *testing.T) {
previousEnts := []pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 3}}
lastindex := uint64(3)
lastterm := uint64(3)
commit := uint64(1)
tests := []struct {
logTerm uint64
index uint64
committed uint64
ents []pb.Entry
wlasti uint64
wappend bool
wcommit uint64
wpanic bool
}{
// not match: term is different
{
lastterm - 1, lastindex, lastindex, []pb.Entry{{Index: lastindex + 1, Term: 4}},
0, false, commit, false,
},
// not match: index out of bound
{
lastterm, lastindex + 1, lastindex, []pb.Entry{{Index: lastindex + 2, Term: 4}},
0, false, commit, false,
},
// match with the last existing entry
{
lastterm, lastindex, lastindex, nil,
lastindex, true, lastindex, false,
},
{
lastterm, lastindex, lastindex + 1, nil,
lastindex, true, lastindex, false, // do not increase commit higher than lastnewi
},
{
lastterm, lastindex, lastindex - 1, nil,
lastindex, true, lastindex - 1, false, // commit up to the commit in the message
},
{
lastterm, lastindex, 0, nil,
lastindex, true, commit, false, // commit do not decrease
},
{
0, 0, lastindex, nil,
0, true, commit, false, // commit do not decrease
},
{
lastterm, lastindex, lastindex, []pb.Entry{{Index: lastindex + 1, Term: 4}},
lastindex + 1, true, lastindex, false,
},
{
lastterm, lastindex, lastindex + 1, []pb.Entry{{Index: lastindex + 1, Term: 4}},
lastindex + 1, true, lastindex + 1, false,
},
{
lastterm, lastindex, lastindex + 2, []pb.Entry{{Index: lastindex + 1, Term: 4}},
lastindex + 1, true, lastindex + 1, false, // do not increase commit higher than lastnewi
},
{
lastterm, lastindex, lastindex + 2, []pb.Entry{{Index: lastindex + 1, Term: 4}, {Index: lastindex + 2, Term: 4}},
lastindex + 2, true, lastindex + 2, false,
},
// match with the entry in the middle
{
lastterm - 1, lastindex - 1, lastindex, []pb.Entry{{Index: lastindex, Term: 4}},
lastindex, true, lastindex, false,
},
{
lastterm - 2, lastindex - 2, lastindex, []pb.Entry{{Index: lastindex - 1, Term: 4}},
lastindex - 1, true, lastindex - 1, false,
},
{
lastterm - 3, lastindex - 3, lastindex, []pb.Entry{{Index: lastindex - 2, Term: 4}},
lastindex - 2, true, lastindex - 2, true, // conflict with existing committed entry
},
{
lastterm - 2, lastindex - 2, lastindex, []pb.Entry{{Index: lastindex - 1, Term: 4}, {Index: lastindex, Term: 4}},
lastindex, true, lastindex, false,
},
}
for i, tt := range tests {
raftLog := newLog(NewMemoryStorage(), raftLogger)
raftLog.append(previousEnts...)
raftLog.committed = commit
func() {
defer func() {
if r := recover(); r != nil {
if !tt.wpanic {
t.Errorf("%d: panic = %v, want %v", i, true, tt.wpanic)
}
}
}()
glasti, gappend := raftLog.maybeAppend(tt.index, tt.logTerm, tt.committed, tt.ents...)
gcommit := raftLog.committed
if glasti != tt.wlasti {
t.Errorf("#%d: lastindex = %d, want %d", i, glasti, tt.wlasti)
}
if gappend != tt.wappend {
t.Errorf("#%d: append = %v, want %v", i, gappend, tt.wappend)
}
if gcommit != tt.wcommit {
t.Errorf("#%d: committed = %d, want %d", i, gcommit, tt.wcommit)
}
if gappend && len(tt.ents) != 0 {
gents, err := raftLog.slice(raftLog.lastIndex()-uint64(len(tt.ents))+1, raftLog.lastIndex()+1, noLimit)
if err != nil {
t.Fatalf("unexpected error %v", err)
}
if !reflect.DeepEqual(tt.ents, gents) {
t.Errorf("%d: appended entries = %v, want %v", i, gents, tt.ents)
}
}
}()
}
}
// TestCompactionSideEffects ensures that all the log related functionality works correctly after
// a compaction.
func TestCompactionSideEffects(t *testing.T) {
var i uint64
// Populate the log with 1000 entries; 750 in stable storage and 250 in unstable.
lastIndex := uint64(1000)
unstableIndex := uint64(750)
lastTerm := lastIndex
storage := NewMemoryStorage()
for i = 1; i <= unstableIndex; i++ {
storage.Append([]pb.Entry{{Term: uint64(i), Index: uint64(i)}})
}
raftLog := newLog(storage, raftLogger)
for i = unstableIndex; i < lastIndex; i++ {
raftLog.append(pb.Entry{Term: uint64(i + 1), Index: uint64(i + 1)})
}
ok := raftLog.maybeCommit(lastIndex, lastTerm)
if !ok {
t.Fatalf("maybeCommit returned false")
}
raftLog.appliedTo(raftLog.committed)
offset := uint64(500)
storage.Compact(offset)
if raftLog.lastIndex() != lastIndex {
t.Errorf("lastIndex = %d, want %d", raftLog.lastIndex(), lastIndex)
}
for j := offset; j <= raftLog.lastIndex(); j++ {
if mustTerm(raftLog.term(j)) != j {
t.Errorf("term(%d) = %d, want %d", j, mustTerm(raftLog.term(j)), j)
}
}
for j := offset; j <= raftLog.lastIndex(); j++ {
if !raftLog.matchTerm(j, j) {
t.Errorf("matchTerm(%d) = false, want true", j)
}
}
unstableEnts := raftLog.unstableEntries()
if g := len(unstableEnts); g != 250 {
t.Errorf("len(unstableEntries) = %d, want = %d", g, 250)
}
if unstableEnts[0].Index != 751 {
t.Errorf("Index = %d, want = %d", unstableEnts[0].Index, 751)
}
prev := raftLog.lastIndex()
raftLog.append(pb.Entry{Index: raftLog.lastIndex() + 1, Term: raftLog.lastIndex() + 1})
if raftLog.lastIndex() != prev+1 {
t.Errorf("lastIndex = %d, want = %d", raftLog.lastIndex(), prev+1)
}
ents, err := raftLog.entries(raftLog.lastIndex(), noLimit)
if err != nil {
t.Fatalf("unexpected error %v", err)
}
if len(ents) != 1 {
t.Errorf("len(entries) = %d, want = %d", len(ents), 1)
}
}
func TestHasNextEnts(t *testing.T) {
snap := pb.Snapshot{
Metadata: pb.SnapshotMetadata{Term: 1, Index: 3},
}
ents := []pb.Entry{
{Term: 1, Index: 4},
{Term: 1, Index: 5},
{Term: 1, Index: 6},
}
tests := []struct {
applied uint64
hasNext bool
}{
{0, true},
{3, true},
{4, true},
{5, false},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.ApplySnapshot(snap)
raftLog := newLog(storage, raftLogger)
raftLog.append(ents...)
raftLog.maybeCommit(5, 1)
raftLog.appliedTo(tt.applied)
hasNext := raftLog.hasNextEnts()
if hasNext != tt.hasNext {
t.Errorf("#%d: hasNext = %v, want %v", i, hasNext, tt.hasNext)
}
}
}
func TestNextEnts(t *testing.T) {
snap := pb.Snapshot{
Metadata: pb.SnapshotMetadata{Term: 1, Index: 3},
}
ents := []pb.Entry{
{Term: 1, Index: 4},
{Term: 1, Index: 5},
{Term: 1, Index: 6},
}
tests := []struct {
applied uint64
wents []pb.Entry
}{
{0, ents[:2]},
{3, ents[:2]},
{4, ents[1:2]},
{5, nil},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.ApplySnapshot(snap)
raftLog := newLog(storage, raftLogger)
raftLog.append(ents...)
raftLog.maybeCommit(5, 1)
raftLog.appliedTo(tt.applied)
nents := raftLog.nextEnts()
if !reflect.DeepEqual(nents, tt.wents) {
t.Errorf("#%d: nents = %+v, want %+v", i, nents, tt.wents)
}
}
}
// TestUnstableEnts ensures unstableEntries returns the unstable part of the
// entries correctly.
func TestUnstableEnts(t *testing.T) {
previousEnts := []pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}}
tests := []struct {
unstable uint64
wents []pb.Entry
}{
{3, nil},
{1, previousEnts},
}
for i, tt := range tests {
// append stable entries to storage
storage := NewMemoryStorage()
storage.Append(previousEnts[:tt.unstable-1])
// append unstable entries to raftlog
raftLog := newLog(storage, raftLogger)
raftLog.append(previousEnts[tt.unstable-1:]...)
ents := raftLog.unstableEntries()
if l := len(ents); l > 0 {
raftLog.stableTo(ents[l-1].Index, ents[l-1].Term)
}
if !reflect.DeepEqual(ents, tt.wents) {
t.Errorf("#%d: unstableEnts = %+v, want %+v", i, ents, tt.wents)
}
w := previousEnts[len(previousEnts)-1].Index + 1
if g := raftLog.unstable.offset; g != w {
t.Errorf("#%d: unstable = %d, want %d", i, g, w)
}
}
}
func TestCommitTo(t *testing.T) {
previousEnts := []pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}, {Term: 3, Index: 3}}
commit := uint64(2)
tests := []struct {
commit uint64
wcommit uint64
wpanic bool
}{
{3, 3, false},
{1, 2, false}, // never decrease
{4, 0, true}, // commit out of range -> panic
}
for i, tt := range tests {
func() {
defer func() {
if r := recover(); r != nil {
if !tt.wpanic {
t.Errorf("%d: panic = %v, want %v", i, true, tt.wpanic)
}
}
}()
raftLog := newLog(NewMemoryStorage(), raftLogger)
raftLog.append(previousEnts...)
raftLog.committed = commit
raftLog.commitTo(tt.commit)
if raftLog.committed != tt.wcommit {
t.Errorf("#%d: committed = %d, want %d", i, raftLog.committed, tt.wcommit)
}
}()
}
}
func TestStableTo(t *testing.T) {
tests := []struct {
stablei uint64
stablet uint64
wunstable uint64
}{
{1, 1, 2},
{2, 2, 3},
{2, 1, 1}, // bad term
{3, 1, 1}, // bad index
}
for i, tt := range tests {
raftLog := newLog(NewMemoryStorage(), raftLogger)
raftLog.append([]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}}...)
raftLog.stableTo(tt.stablei, tt.stablet)
if raftLog.unstable.offset != tt.wunstable {
t.Errorf("#%d: unstable = %d, want %d", i, raftLog.unstable.offset, tt.wunstable)
}
}
}
func TestStableToWithSnap(t *testing.T) {
snapi, snapt := uint64(5), uint64(2)
tests := []struct {
stablei uint64
stablet uint64
newEnts []pb.Entry
wunstable uint64
}{
{snapi + 1, snapt, nil, snapi + 1},
{snapi, snapt, nil, snapi + 1},
{snapi - 1, snapt, nil, snapi + 1},
{snapi + 1, snapt + 1, nil, snapi + 1},
{snapi, snapt + 1, nil, snapi + 1},
{snapi - 1, snapt + 1, nil, snapi + 1},
{snapi + 1, snapt, []pb.Entry{{Index: snapi + 1, Term: snapt}}, snapi + 2},
{snapi, snapt, []pb.Entry{{Index: snapi + 1, Term: snapt}}, snapi + 1},
{snapi - 1, snapt, []pb.Entry{{Index: snapi + 1, Term: snapt}}, snapi + 1},
{snapi + 1, snapt + 1, []pb.Entry{{Index: snapi + 1, Term: snapt}}, snapi + 1},
{snapi, snapt + 1, []pb.Entry{{Index: snapi + 1, Term: snapt}}, snapi + 1},
{snapi - 1, snapt + 1, []pb.Entry{{Index: snapi + 1, Term: snapt}}, snapi + 1},
}
for i, tt := range tests {
s := NewMemoryStorage()
s.ApplySnapshot(pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: snapi, Term: snapt}})
raftLog := newLog(s, raftLogger)
raftLog.append(tt.newEnts...)
raftLog.stableTo(tt.stablei, tt.stablet)
if raftLog.unstable.offset != tt.wunstable {
t.Errorf("#%d: unstable = %d, want %d", i, raftLog.unstable.offset, tt.wunstable)
}
}
}
// TestCompaction ensures that the number of log entries is correct after compactions.
func TestCompaction(t *testing.T) {
tests := []struct {
lastIndex uint64
compact []uint64
wleft []int
wallow bool
}{
// out of upper bound
{1000, []uint64{1001}, []int{-1}, false},
{1000, []uint64{300, 500, 800, 900}, []int{700, 500, 200, 100}, true},
// out of lower bound
{1000, []uint64{300, 299}, []int{700, -1}, false},
}
for i, tt := range tests {
func() {
defer func() {
if r := recover(); r != nil {
if tt.wallow {
t.Errorf("%d: allow = %v, want %v: %v", i, false, true, r)
}
}
}()
storage := NewMemoryStorage()
for i := uint64(1); i <= tt.lastIndex; i++ {
storage.Append([]pb.Entry{{Index: i}})
}
raftLog := newLog(storage, raftLogger)
raftLog.maybeCommit(tt.lastIndex, 0)
raftLog.appliedTo(raftLog.committed)
for j := 0; j < len(tt.compact); j++ {
err := storage.Compact(tt.compact[j])
if err != nil {
if tt.wallow {
t.Errorf("#%d.%d allow = %t, want %t", i, j, false, tt.wallow)
}
continue
}
if len(raftLog.allEntries()) != tt.wleft[j] {
t.Errorf("#%d.%d len = %d, want %d", i, j, len(raftLog.allEntries()), tt.wleft[j])
}
}
}()
}
}
func TestLogRestore(t *testing.T) {
index := uint64(1000)
term := uint64(1000)
snap := pb.SnapshotMetadata{Index: index, Term: term}
storage := NewMemoryStorage()
storage.ApplySnapshot(pb.Snapshot{Metadata: snap})
raftLog := newLog(storage, raftLogger)
if len(raftLog.allEntries()) != 0 {
t.Errorf("len = %d, want 0", len(raftLog.allEntries()))
}
if raftLog.firstIndex() != index+1 {
t.Errorf("firstIndex = %d, want %d", raftLog.firstIndex(), index+1)
}
if raftLog.committed != index {
t.Errorf("committed = %d, want %d", raftLog.committed, index)
}
if raftLog.unstable.offset != index+1 {
t.Errorf("unstable = %d, want %d", raftLog.unstable.offset, index+1)
}
if mustTerm(raftLog.term(index)) != term {
t.Errorf("term = %d, want %d", mustTerm(raftLog.term(index)), term)
}
}
func TestIsOutOfBounds(t *testing.T) {
offset := uint64(100)
num := uint64(100)
storage := NewMemoryStorage()
storage.ApplySnapshot(pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: offset}})
l := newLog(storage, raftLogger)
for i := uint64(1); i <= num; i++ {
l.append(pb.Entry{Index: i + offset})
}
first := offset + 1
tests := []struct {
lo, hi uint64
wpanic bool
wErrCompacted bool
}{
{
first - 2, first + 1,
false,
true,
},
{
first - 1, first + 1,
false,
true,
},
{
first, first,
false,
false,
},
{
first + num/2, first + num/2,
false,
false,
},
{
first + num - 1, first + num - 1,
false,
false,
},
{
first + num, first + num,
false,
false,
},
{
first + num, first + num + 1,
true,
false,
},
{
first + num + 1, first + num + 1,
true,
false,
},
}
for i, tt := range tests {
func() {
defer func() {
if r := recover(); r != nil {
if !tt.wpanic {
t.Errorf("%d: panic = %v, want %v: %v", i, true, false, r)
}
}
}()
err := l.mustCheckOutOfBounds(tt.lo, tt.hi)
if tt.wpanic {
t.Errorf("%d: panic = %v, want %v", i, false, true)
}
if tt.wErrCompacted && err != ErrCompacted {
t.Errorf("%d: err = %v, want %v", i, err, ErrCompacted)
}
if !tt.wErrCompacted && err != nil {
t.Errorf("%d: unexpected err %v", i, err)
}
}()
}
}
func TestTerm(t *testing.T) {
var i uint64
offset := uint64(100)
num := uint64(100)
storage := NewMemoryStorage()
storage.ApplySnapshot(pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: offset, Term: 1}})
l := newLog(storage, raftLogger)
for i = 1; i < num; i++ {
l.append(pb.Entry{Index: offset + i, Term: i})
}
tests := []struct {
index uint64
w uint64
}{
{offset - 1, 0},
{offset, 1},
{offset + num/2, num / 2},
{offset + num - 1, num - 1},
{offset + num, 0},
}
for j, tt := range tests {
term := mustTerm(l.term(tt.index))
if term != tt.w {
t.Errorf("#%d: at = %d, want %d", j, term, tt.w)
}
}
}
func TestTermWithUnstableSnapshot(t *testing.T) {
storagesnapi := uint64(100)
unstablesnapi := storagesnapi + 5
storage := NewMemoryStorage()
storage.ApplySnapshot(pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: storagesnapi, Term: 1}})
l := newLog(storage, raftLogger)
l.restore(pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: unstablesnapi, Term: 1}})
tests := []struct {
index uint64
w uint64
}{
// cannot get term from storage
{storagesnapi, 0},
// cannot get term from the gap between storage ents and unstable snapshot
{storagesnapi + 1, 0},
{unstablesnapi - 1, 0},
// get term from unstable snapshot index
{unstablesnapi, 1},
}
for i, tt := range tests {
term := mustTerm(l.term(tt.index))
if term != tt.w {
t.Errorf("#%d: at = %d, want %d", i, term, tt.w)
}
}
}
func TestSlice(t *testing.T) {
var i uint64
offset := uint64(100)
num := uint64(100)
last := offset + num
half := offset + num/2
halfe := pb.Entry{Index: half, Term: half}
storage := NewMemoryStorage()
storage.ApplySnapshot(pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: offset}})
for i = 1; i < num/2; i++ {
storage.Append([]pb.Entry{{Index: offset + i, Term: offset + i}})
}
l := newLog(storage, raftLogger)
for i = num / 2; i < num; i++ {
l.append(pb.Entry{Index: offset + i, Term: offset + i})
}
tests := []struct {
from uint64
to uint64
limit uint64
w []pb.Entry
wpanic bool
}{
// test no limit
{offset - 1, offset + 1, noLimit, nil, false},
{offset, offset + 1, noLimit, nil, false},
{half - 1, half + 1, noLimit, []pb.Entry{{Index: half - 1, Term: half - 1}, {Index: half, Term: half}}, false},
{half, half + 1, noLimit, []pb.Entry{{Index: half, Term: half}}, false},
{last - 1, last, noLimit, []pb.Entry{{Index: last - 1, Term: last - 1}}, false},
{last, last + 1, noLimit, nil, true},
// test limit
{half - 1, half + 1, 0, []pb.Entry{{Index: half - 1, Term: half - 1}}, false},
{half - 1, half + 1, uint64(halfe.Size() + 1), []pb.Entry{{Index: half - 1, Term: half - 1}}, false},
{half - 2, half + 1, uint64(halfe.Size() + 1), []pb.Entry{{Index: half - 2, Term: half - 2}}, false},
{half - 1, half + 1, uint64(halfe.Size() * 2), []pb.Entry{{Index: half - 1, Term: half - 1}, {Index: half, Term: half}}, false},
{half - 1, half + 2, uint64(halfe.Size() * 3), []pb.Entry{{Index: half - 1, Term: half - 1}, {Index: half, Term: half}, {Index: half + 1, Term: half + 1}}, false},
{half, half + 2, uint64(halfe.Size()), []pb.Entry{{Index: half, Term: half}}, false},
{half, half + 2, uint64(halfe.Size() * 2), []pb.Entry{{Index: half, Term: half}, {Index: half + 1, Term: half + 1}}, false},
}
for j, tt := range tests {
func() {
defer func() {
if r := recover(); r != nil {
if !tt.wpanic {
t.Errorf("%d: panic = %v, want %v: %v", j, true, false, r)
}
}
}()
g, err := l.slice(tt.from, tt.to, tt.limit)
if tt.from <= offset && err != ErrCompacted {
t.Fatalf("#%d: err = %v, want %v", j, err, ErrCompacted)
}
if tt.from > offset && err != nil {
t.Fatalf("#%d: unexpected error %v", j, err)
}
if !reflect.DeepEqual(g, tt.w) {
t.Errorf("#%d: from %d to %d = %v, want %v", j, tt.from, tt.to, g, tt.w)
}
}()
}
}
func mustTerm(term uint64, err error) uint64 {
if err != nil {
panic(err)
}
return term
}

139
vendor/github.com/coreos/etcd/raft/log_unstable.go generated vendored Normal file
View file

@ -0,0 +1,139 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import pb "github.com/coreos/etcd/raft/raftpb"
// unstable.entries[i] has raft log position i+unstable.offset.
// Note that unstable.offset may be less than the highest log
// position in storage; this means that the next write to storage
// might need to truncate the log before persisting unstable.entries.
type unstable struct {
// the incoming unstable snapshot, if any.
snapshot *pb.Snapshot
// all entries that have not yet been written to storage.
entries []pb.Entry
offset uint64
logger Logger
}
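// Illustrative note (not in the original source): unstable.entries[i] sits at
// raft log position i+offset, so with offset = 5 and entries holding indices
// 5, 6 and 7, the entry at log position 6 is entries[6-offset], i.e. entries[1].
// A minimal sketch, assuming the caller builds the value directly:
//
//	u := unstable{offset: 5, entries: []pb.Entry{{Index: 5}, {Index: 6}, {Index: 7}}}
//	e := u.entries[6-u.offset] // the entry at raft log position 6
//	_ = e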
// maybeFirstIndex returns the index of the first possible entry in entries
// if it has a snapshot.
func (u *unstable) maybeFirstIndex() (uint64, bool) {
if u.snapshot != nil {
return u.snapshot.Metadata.Index + 1, true
}
return 0, false
}
// maybeLastIndex returns the last index if it has at least one
// unstable entry or snapshot.
func (u *unstable) maybeLastIndex() (uint64, bool) {
if l := len(u.entries); l != 0 {
return u.offset + uint64(l) - 1, true
}
if u.snapshot != nil {
return u.snapshot.Metadata.Index, true
}
return 0, false
}
// maybeTerm returns the term of the entry at index i, if there
// is any.
func (u *unstable) maybeTerm(i uint64) (uint64, bool) {
if i < u.offset {
if u.snapshot == nil {
return 0, false
}
if u.snapshot.Metadata.Index == i {
return u.snapshot.Metadata.Term, true
}
return 0, false
}
last, ok := u.maybeLastIndex()
if !ok {
return 0, false
}
if i > last {
return 0, false
}
return u.entries[i-u.offset].Term, true
}
func (u *unstable) stableTo(i, t uint64) {
gt, ok := u.maybeTerm(i)
if !ok {
return
}
	// if i < offset, the term is matched with the snapshot.
	// only update the unstable entries if the term is matched with
	// an unstable entry.
if gt == t && i >= u.offset {
u.entries = u.entries[i+1-u.offset:]
u.offset = i + 1
}
}
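// Illustrative example (not in the original source): with offset = 5, entries
// at indices 5 and 6 (both term 1), and a snapshot at index 4, stableTo(5, 1)
// drops the entry at index 5 and advances offset to 6; stableTo(5, 2) is a
// no-op because the term does not match; stableTo(4, 1) is also a no-op
// because index 4 matches the snapshot rather than an unstable entry.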
func (u *unstable) stableSnapTo(i uint64) {
if u.snapshot != nil && u.snapshot.Metadata.Index == i {
u.snapshot = nil
}
}
func (u *unstable) restore(s pb.Snapshot) {
u.offset = s.Metadata.Index + 1
u.entries = nil
u.snapshot = &s
}
func (u *unstable) truncateAndAppend(ents []pb.Entry) {
after := ents[0].Index
switch {
case after == u.offset+uint64(len(u.entries)):
// after is the next index in the u.entries
// directly append
u.entries = append(u.entries, ents...)
case after <= u.offset:
u.logger.Infof("replace the unstable entries from index %d", after)
// The log is being truncated to before our current offset
// portion, so set the offset and replace the entries
u.offset = after
u.entries = ents
default:
// truncate to after and copy to u.entries
// then append
u.logger.Infof("truncate the unstable entries before index %d", after)
u.entries = append([]pb.Entry{}, u.slice(u.offset, after)...)
u.entries = append(u.entries, ents...)
}
}
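// Illustrative example (not in the original source) of the three cases above,
// assuming an unstable value built directly by the caller:
//
//	u := unstable{offset: 5, entries: []pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}, {Index: 7, Term: 1}}, logger: raftLogger}
//	u.truncateAndAppend([]pb.Entry{{Index: 6, Term: 2}})
//	// u.entries is now [{5 1} {6 2}] and u.offset stays 5; appending entries
//	// starting at index 8 would extend the slice, while appending entries
//	// starting at index 4 would replace the whole slice and move offset to 4.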
func (u *unstable) slice(lo uint64, hi uint64) []pb.Entry {
u.mustCheckOutOfBounds(lo, hi)
return u.entries[lo-u.offset : hi-u.offset]
}
// u.offset <= lo <= hi <= u.offset+len(u.entries)
func (u *unstable) mustCheckOutOfBounds(lo, hi uint64) {
if lo > hi {
u.logger.Panicf("invalid unstable.slice %d > %d", lo, hi)
}
upper := u.offset + uint64(len(u.entries))
if lo < u.offset || hi > upper {
u.logger.Panicf("unstable.slice[%d,%d) out of bound [%d,%d]", lo, hi, u.offset, upper)
}
}

359
vendor/github.com/coreos/etcd/raft/log_unstable_test.go generated vendored Normal file
View file

@ -0,0 +1,359 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"reflect"
"testing"
pb "github.com/coreos/etcd/raft/raftpb"
)
func TestUnstableMaybeFirstIndex(t *testing.T) {
tests := []struct {
entries []pb.Entry
offset uint64
snap *pb.Snapshot
wok bool
windex uint64
}{
// no snapshot
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
false, 0,
},
{
[]pb.Entry{}, 0, nil,
false, 0,
},
// has snapshot
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
true, 5,
},
{
[]pb.Entry{}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
true, 5,
},
}
for i, tt := range tests {
u := unstable{
entries: tt.entries,
offset: tt.offset,
snapshot: tt.snap,
logger: raftLogger,
}
index, ok := u.maybeFirstIndex()
if ok != tt.wok {
t.Errorf("#%d: ok = %t, want %t", i, ok, tt.wok)
}
if index != tt.windex {
t.Errorf("#%d: index = %d, want %d", i, index, tt.windex)
}
}
}
func TestMaybeLastIndex(t *testing.T) {
tests := []struct {
entries []pb.Entry
offset uint64
snap *pb.Snapshot
wok bool
windex uint64
}{
// last in entries
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
true, 5,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
true, 5,
},
// last in snapshot
{
[]pb.Entry{}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
true, 4,
},
// empty unstable
{
[]pb.Entry{}, 0, nil,
false, 0,
},
}
for i, tt := range tests {
u := unstable{
entries: tt.entries,
offset: tt.offset,
snapshot: tt.snap,
logger: raftLogger,
}
index, ok := u.maybeLastIndex()
if ok != tt.wok {
t.Errorf("#%d: ok = %t, want %t", i, ok, tt.wok)
}
if index != tt.windex {
t.Errorf("#%d: index = %d, want %d", i, index, tt.windex)
}
}
}
func TestUnstableMaybeTerm(t *testing.T) {
tests := []struct {
entries []pb.Entry
offset uint64
snap *pb.Snapshot
index uint64
wok bool
wterm uint64
}{
// term from entries
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
5,
true, 1,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
6,
false, 0,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
4,
false, 0,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
5,
true, 1,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
6,
false, 0,
},
// term from snapshot
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
4,
true, 1,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
3,
false, 0,
},
{
[]pb.Entry{}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
5,
false, 0,
},
{
[]pb.Entry{}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
4,
true, 1,
},
{
[]pb.Entry{}, 0, nil,
5,
false, 0,
},
}
for i, tt := range tests {
u := unstable{
entries: tt.entries,
offset: tt.offset,
snapshot: tt.snap,
logger: raftLogger,
}
term, ok := u.maybeTerm(tt.index)
if ok != tt.wok {
t.Errorf("#%d: ok = %t, want %t", i, ok, tt.wok)
}
if term != tt.wterm {
t.Errorf("#%d: term = %d, want %d", i, term, tt.wterm)
}
}
}
func TestUnstableRestore(t *testing.T) {
u := unstable{
entries: []pb.Entry{{Index: 5, Term: 1}},
offset: 5,
snapshot: &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
logger: raftLogger,
}
s := pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 6, Term: 2}}
u.restore(s)
if u.offset != s.Metadata.Index+1 {
t.Errorf("offset = %d, want %d", u.offset, s.Metadata.Index+1)
}
if len(u.entries) != 0 {
t.Errorf("len = %d, want 0", len(u.entries))
}
if !reflect.DeepEqual(u.snapshot, &s) {
t.Errorf("snap = %v, want %v", u.snapshot, &s)
}
}
func TestUnstableStableTo(t *testing.T) {
tests := []struct {
entries []pb.Entry
offset uint64
snap *pb.Snapshot
index, term uint64
woffset uint64
wlen int
}{
{
[]pb.Entry{}, 0, nil,
5, 1,
0, 0,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
5, 1, // stable to the first entry
6, 0,
},
{
[]pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}}, 5, nil,
5, 1, // stable to the first entry
6, 1,
},
{
[]pb.Entry{{Index: 6, Term: 2}}, 6, nil,
6, 1, // stable to the first entry and term mismatch
6, 1,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
4, 1, // stable to old entry
5, 1,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
4, 2, // stable to old entry
5, 1,
},
// with snapshot
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
5, 1, // stable to the first entry
6, 0,
},
{
[]pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
5, 1, // stable to the first entry
6, 1,
},
{
[]pb.Entry{{Index: 6, Term: 2}}, 6, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 5, Term: 1}},
6, 1, // stable to the first entry and term mismatch
6, 1,
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 1}},
4, 1, // stable to snapshot
5, 1,
},
{
[]pb.Entry{{Index: 5, Term: 2}}, 5, &pb.Snapshot{Metadata: pb.SnapshotMetadata{Index: 4, Term: 2}},
4, 1, // stable to old entry
5, 1,
},
}
for i, tt := range tests {
u := unstable{
entries: tt.entries,
offset: tt.offset,
snapshot: tt.snap,
logger: raftLogger,
}
u.stableTo(tt.index, tt.term)
if u.offset != tt.woffset {
t.Errorf("#%d: offset = %d, want %d", i, u.offset, tt.woffset)
}
if len(u.entries) != tt.wlen {
t.Errorf("#%d: len = %d, want %d", i, len(u.entries), tt.wlen)
}
}
}
func TestUnstableTruncateAndAppend(t *testing.T) {
tests := []struct {
entries []pb.Entry
offset uint64
snap *pb.Snapshot
toappend []pb.Entry
woffset uint64
wentries []pb.Entry
}{
// append to the end
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
[]pb.Entry{{Index: 6, Term: 1}, {Index: 7, Term: 1}},
5, []pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}, {Index: 7, Term: 1}},
},
// replace the unstable entries
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
[]pb.Entry{{Index: 5, Term: 2}, {Index: 6, Term: 2}},
5, []pb.Entry{{Index: 5, Term: 2}, {Index: 6, Term: 2}},
},
{
[]pb.Entry{{Index: 5, Term: 1}}, 5, nil,
[]pb.Entry{{Index: 4, Term: 2}, {Index: 5, Term: 2}, {Index: 6, Term: 2}},
4, []pb.Entry{{Index: 4, Term: 2}, {Index: 5, Term: 2}, {Index: 6, Term: 2}},
},
// truncate the existing entries and append
{
[]pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}, {Index: 7, Term: 1}}, 5, nil,
[]pb.Entry{{Index: 6, Term: 2}},
5, []pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 2}},
},
{
[]pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}, {Index: 7, Term: 1}}, 5, nil,
[]pb.Entry{{Index: 7, Term: 2}, {Index: 8, Term: 2}},
5, []pb.Entry{{Index: 5, Term: 1}, {Index: 6, Term: 1}, {Index: 7, Term: 2}, {Index: 8, Term: 2}},
},
}
for i, tt := range tests {
u := unstable{
entries: tt.entries,
offset: tt.offset,
snapshot: tt.snap,
logger: raftLogger,
}
u.truncateAndAppend(tt.toappend)
if u.offset != tt.woffset {
t.Errorf("#%d: offset = %d, want %d", i, u.offset, tt.woffset)
}
if !reflect.DeepEqual(u.entries, tt.wentries) {
t.Errorf("#%d: entries = %v, want %v", i, u.entries, tt.wentries)
}
}
}

126
vendor/github.com/coreos/etcd/raft/logger.go generated vendored Normal file
View file

@ -0,0 +1,126 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"fmt"
"io/ioutil"
"log"
"os"
)
type Logger interface {
Debug(v ...interface{})
Debugf(format string, v ...interface{})
Error(v ...interface{})
Errorf(format string, v ...interface{})
Info(v ...interface{})
Infof(format string, v ...interface{})
Warning(v ...interface{})
Warningf(format string, v ...interface{})
Fatal(v ...interface{})
Fatalf(format string, v ...interface{})
Panic(v ...interface{})
Panicf(format string, v ...interface{})
}
func SetLogger(l Logger) { raftLogger = l }
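// Illustrative usage (not in the original source): an application can silence
// raft logging or route it elsewhere by installing its own Logger, for example
// SetLogger(discardLogger), or a DefaultLogger wrapping its own *log.Logger.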
var (
defaultLogger = &DefaultLogger{Logger: log.New(os.Stderr, "raft", log.LstdFlags)}
discardLogger = &DefaultLogger{Logger: log.New(ioutil.Discard, "", 0)}
raftLogger = Logger(defaultLogger)
)
const (
calldepth = 2
)
// DefaultLogger is a default implementation of the Logger interface.
type DefaultLogger struct {
*log.Logger
debug bool
}
func (l *DefaultLogger) EnableTimestamps() {
l.SetFlags(l.Flags() | log.Ldate | log.Ltime)
}
func (l *DefaultLogger) EnableDebug() {
l.debug = true
}
func (l *DefaultLogger) Debug(v ...interface{}) {
if l.debug {
l.Output(calldepth, header("DEBUG", fmt.Sprint(v...)))
}
}
func (l *DefaultLogger) Debugf(format string, v ...interface{}) {
if l.debug {
l.Output(calldepth, header("DEBUG", fmt.Sprintf(format, v...)))
}
}
func (l *DefaultLogger) Info(v ...interface{}) {
l.Output(calldepth, header("INFO", fmt.Sprint(v...)))
}
func (l *DefaultLogger) Infof(format string, v ...interface{}) {
l.Output(calldepth, header("INFO", fmt.Sprintf(format, v...)))
}
func (l *DefaultLogger) Error(v ...interface{}) {
l.Output(calldepth, header("ERROR", fmt.Sprint(v...)))
}
func (l *DefaultLogger) Errorf(format string, v ...interface{}) {
l.Output(calldepth, header("ERROR", fmt.Sprintf(format, v...)))
}
func (l *DefaultLogger) Warning(v ...interface{}) {
l.Output(calldepth, header("WARN", fmt.Sprint(v...)))
}
func (l *DefaultLogger) Warningf(format string, v ...interface{}) {
l.Output(calldepth, header("WARN", fmt.Sprintf(format, v...)))
}
func (l *DefaultLogger) Fatal(v ...interface{}) {
l.Output(calldepth, header("FATAL", fmt.Sprint(v...)))
os.Exit(1)
}
func (l *DefaultLogger) Fatalf(format string, v ...interface{}) {
l.Output(calldepth, header("FATAL", fmt.Sprintf(format, v...)))
os.Exit(1)
}
func (l *DefaultLogger) Panic(v ...interface{}) {
l.Logger.Panic(v)
}
func (l *DefaultLogger) Panicf(format string, v ...interface{}) {
l.Logger.Panicf(format, v...)
}
func header(lvl, msg string) string {
return fmt.Sprintf("%s: %s", lvl, msg)
}

521
vendor/github.com/coreos/etcd/raft/node.go generated vendored Normal file
View file

@ -0,0 +1,521 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"errors"
pb "github.com/coreos/etcd/raft/raftpb"
"golang.org/x/net/context"
)
type SnapshotStatus int
const (
SnapshotFinish SnapshotStatus = 1
SnapshotFailure SnapshotStatus = 2
)
var (
emptyState = pb.HardState{}
// ErrStopped is returned by methods on Nodes that have been stopped.
ErrStopped = errors.New("raft: stopped")
)
// SoftState provides state that is useful for logging and debugging.
// The state is volatile and does not need to be persisted to the WAL.
type SoftState struct {
Lead uint64 // must use atomic operations to access; keep 64-bit aligned.
RaftState StateType
}
func (a *SoftState) equal(b *SoftState) bool {
return a.Lead == b.Lead && a.RaftState == b.RaftState
}
// Ready encapsulates the entries and messages that are ready to read,
// be saved to stable storage, committed or sent to other peers.
// All fields in Ready are read-only.
type Ready struct {
// The current volatile state of a Node.
// SoftState will be nil if there is no update.
// It is not required to consume or store SoftState.
*SoftState
// The current state of a Node to be saved to stable storage BEFORE
// Messages are sent.
// HardState will be equal to empty state if there is no update.
pb.HardState
	// ReadStates can be used by the node to serve linearizable read requests locally
	// when its applied index is greater than the index in ReadState.
	// Note that the readState will be returned when raft receives msgReadIndex.
	// The returned state is only valid for the request that requested to read.
ReadStates []ReadState
// Entries specifies entries to be saved to stable storage BEFORE
// Messages are sent.
Entries []pb.Entry
// Snapshot specifies the snapshot to be saved to stable storage.
Snapshot pb.Snapshot
// CommittedEntries specifies entries to be committed to a
// store/state-machine. These have previously been committed to stable
// store.
CommittedEntries []pb.Entry
// Messages specifies outbound messages to be sent AFTER Entries are
// committed to stable storage.
// If it contains a MsgSnap message, the application MUST report back to raft
// when the snapshot has been received or has failed by calling ReportSnapshot.
Messages []pb.Message
}
func isHardStateEqual(a, b pb.HardState) bool {
return a.Term == b.Term && a.Vote == b.Vote && a.Commit == b.Commit
}
// IsEmptyHardState returns true if the given HardState is empty.
func IsEmptyHardState(st pb.HardState) bool {
return isHardStateEqual(st, emptyState)
}
// IsEmptySnap returns true if the given Snapshot is empty.
func IsEmptySnap(sp pb.Snapshot) bool {
return sp.Metadata.Index == 0
}
func (rd Ready) containsUpdates() bool {
return rd.SoftState != nil || !IsEmptyHardState(rd.HardState) ||
!IsEmptySnap(rd.Snapshot) || len(rd.Entries) > 0 ||
len(rd.CommittedEntries) > 0 || len(rd.Messages) > 0 || len(rd.ReadStates) != 0
}
// Node represents a node in a raft cluster.
type Node interface {
// Tick increments the internal logical clock for the Node by a single tick. Election
// timeouts and heartbeat timeouts are in units of ticks.
Tick()
// Campaign causes the Node to transition to candidate state and start campaigning to become leader.
Campaign(ctx context.Context) error
// Propose proposes that data be appended to the log.
Propose(ctx context.Context, data []byte) error
	// ProposeConfChange proposes a configuration change.
	// At most one ConfChange can be in the process of going through consensus.
	// The application must call ApplyConfChange when applying an EntryConfChange type entry.
ProposeConfChange(ctx context.Context, cc pb.ConfChange) error
// Step advances the state machine using the given message. ctx.Err() will be returned, if any.
Step(ctx context.Context, msg pb.Message) error
// Ready returns a channel that returns the current point-in-time state.
// Users of the Node must call Advance after retrieving the state returned by Ready.
//
// NOTE: No committed entries from the next Ready may be applied until all committed entries
// and snapshots from the previous one have finished.
Ready() <-chan Ready
// Advance notifies the Node that the application has saved progress up to the last Ready.
// It prepares the node to return the next available Ready.
//
	// The application should generally call Advance after it applies the entries in the last Ready.
//
// However, as an optimization, the application may call Advance while it is applying the
	// commands. For example, when the last Ready contains a snapshot, the application might take
// a long time to apply the snapshot data. To continue receiving Ready without blocking raft
// progress, it can call Advance before finishing applying the last ready.
Advance()
// ApplyConfChange applies config change to the local node.
// Returns an opaque ConfState protobuf which must be recorded
// in snapshots. Will never return nil; it returns a pointer only
// to match MemoryStorage.Compact.
ApplyConfChange(cc pb.ConfChange) *pb.ConfState
// TransferLeadership attempts to transfer leadership to the given transferee.
TransferLeadership(ctx context.Context, lead, transferee uint64)
	// ReadIndex requests a read state. The read state will be set in the ready.
// Read state has a read index. Once the application advances further than the read
// index, any linearizable read requests issued before the read request can be
// processed safely. The read state will have the same rctx attached.
ReadIndex(ctx context.Context, rctx []byte) error
// Status returns the current status of the raft state machine.
Status() Status
// ReportUnreachable reports the given node is not reachable for the last send.
ReportUnreachable(id uint64)
// ReportSnapshot reports the status of the sent snapshot.
ReportSnapshot(id uint64, status SnapshotStatus)
// Stop performs any necessary termination of the Node.
Stop()
}
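// An illustrative sketch (not in the original source) of the consumer loop the
// Ready/Advance contract above implies. saveToStorage, send, and process are
// hypothetical application-side helpers, and ticker/ctx are assumed to come
// from the application:
//
//	for {
//		select {
//		case <-ticker.C:
//			n.Tick()
//		case rd := <-n.Ready():
//			saveToStorage(rd.HardState, rd.Entries, rd.Snapshot) // persist before sending
//			send(rd.Messages)
//			for _, entry := range rd.CommittedEntries {
//				process(entry)
//				if entry.Type == pb.EntryConfChange {
//					var cc pb.ConfChange
//					cc.Unmarshal(entry.Data)
//					n.ApplyConfChange(cc)
//				}
//			}
//			n.Advance()
//		case <-ctx.Done():
//			return
//		}
//	}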
type Peer struct {
ID uint64
Context []byte
}
// StartNode returns a new Node given configuration and a list of raft peers.
// It appends a ConfChangeAddNode entry for each given peer to the initial log.
func StartNode(c *Config, peers []Peer) Node {
r := newRaft(c)
// become the follower at term 1 and apply initial configuration
// entries of term 1
r.becomeFollower(1, None)
for _, peer := range peers {
cc := pb.ConfChange{Type: pb.ConfChangeAddNode, NodeID: peer.ID, Context: peer.Context}
d, err := cc.Marshal()
if err != nil {
panic("unexpected marshal error")
}
e := pb.Entry{Type: pb.EntryConfChange, Term: 1, Index: r.raftLog.lastIndex() + 1, Data: d}
r.raftLog.append(e)
}
// Mark these initial entries as committed.
// TODO(bdarnell): These entries are still unstable; do we need to preserve
// the invariant that committed < unstable?
r.raftLog.committed = r.raftLog.lastIndex()
// Now apply them, mainly so that the application can call Campaign
// immediately after StartNode in tests. Note that these nodes will
// be added to raft twice: here and when the application's Ready
// loop calls ApplyConfChange. The calls to addNode must come after
// all calls to raftLog.append so progress.next is set after these
// bootstrapping entries (it is an error if we try to append these
// entries since they have already been committed).
// We do not set raftLog.applied so the application will be able
// to observe all conf changes via Ready.CommittedEntries.
for _, peer := range peers {
r.addNode(peer.ID)
}
n := newNode()
n.logger = c.Logger
go n.run(r)
return &n
}
// RestartNode is similar to StartNode but does not take a list of peers.
// The current membership of the cluster will be restored from the Storage.
// If the caller has an existing state machine, pass in the last log index that
// has been applied to it; otherwise use zero.
func RestartNode(c *Config) Node {
r := newRaft(c)
n := newNode()
n.logger = c.Logger
go n.run(r)
return &n
}
// node is the canonical implementation of the Node interface
type node struct {
propc chan pb.Message
recvc chan pb.Message
confc chan pb.ConfChange
confstatec chan pb.ConfState
readyc chan Ready
advancec chan struct{}
tickc chan struct{}
done chan struct{}
stop chan struct{}
status chan chan Status
logger Logger
}
func newNode() node {
return node{
propc: make(chan pb.Message),
recvc: make(chan pb.Message),
confc: make(chan pb.ConfChange),
confstatec: make(chan pb.ConfState),
readyc: make(chan Ready),
advancec: make(chan struct{}),
// make tickc a buffered chan, so raft node can buffer some ticks when the node
		// is busy processing raft messages. Raft node will resume processing buffered
// ticks when it becomes idle.
tickc: make(chan struct{}, 128),
done: make(chan struct{}),
stop: make(chan struct{}),
status: make(chan chan Status),
}
}
func (n *node) Stop() {
select {
case n.stop <- struct{}{}:
// Not already stopped, so trigger it
case <-n.done:
// Node has already been stopped - no need to do anything
return
}
// Block until the stop has been acknowledged by run()
<-n.done
}
func (n *node) run(r *raft) {
var propc chan pb.Message
var readyc chan Ready
var advancec chan struct{}
var prevLastUnstablei, prevLastUnstablet uint64
var havePrevLastUnstablei bool
var prevSnapi uint64
var rd Ready
lead := None
prevSoftSt := r.softState()
prevHardSt := emptyState
for {
if advancec != nil {
readyc = nil
} else {
rd = newReady(r, prevSoftSt, prevHardSt)
if rd.containsUpdates() {
readyc = n.readyc
} else {
readyc = nil
}
}
if lead != r.lead {
if r.hasLeader() {
if lead == None {
r.logger.Infof("raft.node: %x elected leader %x at term %d", r.id, r.lead, r.Term)
} else {
r.logger.Infof("raft.node: %x changed leader from %x to %x at term %d", r.id, lead, r.lead, r.Term)
}
propc = n.propc
} else {
r.logger.Infof("raft.node: %x lost leader %x at term %d", r.id, lead, r.Term)
propc = nil
}
lead = r.lead
}
select {
// TODO: maybe buffer the config propose if there exists one (the way
// described in raft dissertation)
// Currently it is dropped in Step silently.
case m := <-propc:
m.From = r.id
r.Step(m)
case m := <-n.recvc:
// filter out response message from unknown From.
if _, ok := r.prs[m.From]; ok || !IsResponseMsg(m.Type) {
r.Step(m) // raft never returns an error
}
case cc := <-n.confc:
if cc.NodeID == None {
r.resetPendingConf()
select {
case n.confstatec <- pb.ConfState{Nodes: r.nodes()}:
case <-n.done:
}
break
}
switch cc.Type {
case pb.ConfChangeAddNode:
r.addNode(cc.NodeID)
case pb.ConfChangeRemoveNode:
// block incoming proposal when local node is
// removed
if cc.NodeID == r.id {
propc = nil
}
r.removeNode(cc.NodeID)
case pb.ConfChangeUpdateNode:
r.resetPendingConf()
default:
panic("unexpected conf type")
}
select {
case n.confstatec <- pb.ConfState{Nodes: r.nodes()}:
case <-n.done:
}
case <-n.tickc:
r.tick()
case readyc <- rd:
if rd.SoftState != nil {
prevSoftSt = rd.SoftState
}
if len(rd.Entries) > 0 {
prevLastUnstablei = rd.Entries[len(rd.Entries)-1].Index
prevLastUnstablet = rd.Entries[len(rd.Entries)-1].Term
havePrevLastUnstablei = true
}
if !IsEmptyHardState(rd.HardState) {
prevHardSt = rd.HardState
}
if !IsEmptySnap(rd.Snapshot) {
prevSnapi = rd.Snapshot.Metadata.Index
}
r.msgs = nil
r.readStates = nil
advancec = n.advancec
case <-advancec:
if prevHardSt.Commit != 0 {
r.raftLog.appliedTo(prevHardSt.Commit)
}
if havePrevLastUnstablei {
r.raftLog.stableTo(prevLastUnstablei, prevLastUnstablet)
havePrevLastUnstablei = false
}
r.raftLog.stableSnapTo(prevSnapi)
advancec = nil
case c := <-n.status:
c <- getStatus(r)
case <-n.stop:
close(n.done)
return
}
}
}
// Tick increments the internal logical clock for this Node. Election timeouts
// and heartbeat timeouts are in units of ticks.
func (n *node) Tick() {
select {
case n.tickc <- struct{}{}:
case <-n.done:
default:
n.logger.Warningf("A tick missed to fire. Node blocks too long!")
}
}
func (n *node) Campaign(ctx context.Context) error { return n.step(ctx, pb.Message{Type: pb.MsgHup}) }
func (n *node) Propose(ctx context.Context, data []byte) error {
return n.step(ctx, pb.Message{Type: pb.MsgProp, Entries: []pb.Entry{{Data: data}}})
}
func (n *node) Step(ctx context.Context, m pb.Message) error {
	// ignore unexpected local messages received over the network
if IsLocalMsg(m.Type) {
// TODO: return an error?
return nil
}
return n.step(ctx, m)
}
func (n *node) ProposeConfChange(ctx context.Context, cc pb.ConfChange) error {
data, err := cc.Marshal()
if err != nil {
return err
}
return n.Step(ctx, pb.Message{Type: pb.MsgProp, Entries: []pb.Entry{{Type: pb.EntryConfChange, Data: data}}})
}
// Step advances the state machine using msgs. The ctx.Err() will be returned,
// if any.
func (n *node) step(ctx context.Context, m pb.Message) error {
ch := n.recvc
if m.Type == pb.MsgProp {
ch = n.propc
}
select {
case ch <- m:
return nil
case <-ctx.Done():
return ctx.Err()
case <-n.done:
return ErrStopped
}
}
func (n *node) Ready() <-chan Ready { return n.readyc }
func (n *node) Advance() {
select {
case n.advancec <- struct{}{}:
case <-n.done:
}
}
func (n *node) ApplyConfChange(cc pb.ConfChange) *pb.ConfState {
var cs pb.ConfState
select {
case n.confc <- cc:
case <-n.done:
}
select {
case cs = <-n.confstatec:
case <-n.done:
}
return &cs
}
func (n *node) Status() Status {
c := make(chan Status)
select {
case n.status <- c:
return <-c
case <-n.done:
return Status{}
}
}
func (n *node) ReportUnreachable(id uint64) {
select {
case n.recvc <- pb.Message{Type: pb.MsgUnreachable, From: id}:
case <-n.done:
}
}
func (n *node) ReportSnapshot(id uint64, status SnapshotStatus) {
rej := status == SnapshotFailure
select {
case n.recvc <- pb.Message{Type: pb.MsgSnapStatus, From: id, Reject: rej}:
case <-n.done:
}
}
func (n *node) TransferLeadership(ctx context.Context, lead, transferee uint64) {
select {
	// manually set 'from' and 'to', so that the leader can voluntarily transfer its leadership
case n.recvc <- pb.Message{Type: pb.MsgTransferLeader, From: transferee, To: lead}:
case <-n.done:
case <-ctx.Done():
}
}
func (n *node) ReadIndex(ctx context.Context, rctx []byte) error {
return n.step(ctx, pb.Message{Type: pb.MsgReadIndex, Entries: []pb.Entry{{Data: rctx}}})
}
func newReady(r *raft, prevSoftSt *SoftState, prevHardSt pb.HardState) Ready {
rd := Ready{
Entries: r.raftLog.unstableEntries(),
CommittedEntries: r.raftLog.nextEnts(),
Messages: r.msgs,
}
if softSt := r.softState(); !softSt.equal(prevSoftSt) {
rd.SoftState = softSt
}
if hardSt := r.hardState(); !isHardStateEqual(hardSt, prevHardSt) {
rd.HardState = hardSt
}
if r.raftLog.unstable.snapshot != nil {
rd.Snapshot = *r.raftLog.unstable.snapshot
}
if len(r.readStates) != 0 {
rd.ReadStates = r.readStates
}
return rd
}

52
vendor/github.com/coreos/etcd/raft/node_bench_test.go generated vendored Normal file
View file

@ -0,0 +1,52 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"testing"
"time"
"golang.org/x/net/context"
)
func BenchmarkOneNode(b *testing.B) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
go n.run(r)
defer n.Stop()
n.Campaign(ctx)
go func() {
for i := 0; i < b.N; i++ {
n.Propose(ctx, []byte("foo"))
}
}()
for {
rd := <-n.Ready()
s.Append(rd.Entries)
// a reasonable disk sync latency
time.Sleep(1 * time.Millisecond)
n.Advance()
if rd.HardState.Commit == uint64(b.N+1) {
return
}
}
}

689
vendor/github.com/coreos/etcd/raft/node_test.go generated vendored Normal file
View file

@ -0,0 +1,689 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"bytes"
"reflect"
"testing"
"time"
"github.com/coreos/etcd/pkg/testutil"
"github.com/coreos/etcd/raft/raftpb"
"golang.org/x/net/context"
)
// TestNodeStep ensures that node.Step sends msgProp to propc chan
// and other kinds of messages to recvc chan.
func TestNodeStep(t *testing.T) {
for i, msgn := range raftpb.MessageType_name {
n := &node{
propc: make(chan raftpb.Message, 1),
recvc: make(chan raftpb.Message, 1),
}
msgt := raftpb.MessageType(i)
n.Step(context.TODO(), raftpb.Message{Type: msgt})
		// Proposals go to the propc chan. Others go to the recvc chan.
if msgt == raftpb.MsgProp {
select {
case <-n.propc:
default:
t.Errorf("%d: cannot receive %s on propc chan", msgt, msgn)
}
} else {
if IsLocalMsg(msgt) {
select {
case <-n.recvc:
t.Errorf("%d: step should ignore %s", msgt, msgn)
default:
}
} else {
select {
case <-n.recvc:
default:
t.Errorf("%d: cannot receive %s on recvc chan", msgt, msgn)
}
}
}
}
}
// Cancel and Stop should unblock Step()
func TestNodeStepUnblock(t *testing.T) {
// a node without buffer to block step
n := &node{
propc: make(chan raftpb.Message),
done: make(chan struct{}),
}
ctx, cancel := context.WithCancel(context.Background())
stopFunc := func() { close(n.done) }
tests := []struct {
unblock func()
werr error
}{
{stopFunc, ErrStopped},
{cancel, context.Canceled},
}
for i, tt := range tests {
errc := make(chan error, 1)
go func() {
err := n.Step(ctx, raftpb.Message{Type: raftpb.MsgProp})
errc <- err
}()
tt.unblock()
select {
case err := <-errc:
if err != tt.werr {
t.Errorf("#%d: err = %v, want %v", i, err, tt.werr)
}
//clean up side-effect
if ctx.Err() != nil {
ctx = context.TODO()
}
select {
case <-n.done:
n.done = make(chan struct{})
default:
}
case <-time.After(1 * time.Second):
t.Fatalf("#%d: failed to unblock step", i)
}
}
}
// TestNodePropose ensures that node.Propose sends the given proposal to the underlying raft.
func TestNodePropose(t *testing.T) {
msgs := []raftpb.Message{}
appendStep := func(r *raft, m raftpb.Message) {
msgs = append(msgs, m)
}
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
go n.run(r)
n.Campaign(context.TODO())
for {
rd := <-n.Ready()
s.Append(rd.Entries)
		// once this raft becomes leader, change the step function to appendStep
if rd.SoftState.Lead == r.id {
r.step = appendStep
n.Advance()
break
}
n.Advance()
}
n.Propose(context.TODO(), []byte("somedata"))
n.Stop()
if len(msgs) != 1 {
t.Fatalf("len(msgs) = %d, want %d", len(msgs), 1)
}
if msgs[0].Type != raftpb.MsgProp {
t.Errorf("msg type = %d, want %d", msgs[0].Type, raftpb.MsgProp)
}
if !bytes.Equal(msgs[0].Entries[0].Data, []byte("somedata")) {
t.Errorf("data = %v, want %v", msgs[0].Entries[0].Data, []byte("somedata"))
}
}
// TestNodeReadIndex ensures that node.ReadIndex sends the MsgReadIndex message to the underlying raft.
// It also ensures that ReadState can be read out through ready chan.
func TestNodeReadIndex(t *testing.T) {
msgs := []raftpb.Message{}
appendStep := func(r *raft, m raftpb.Message) {
msgs = append(msgs, m)
}
wrs := []ReadState{{Index: uint64(1), RequestCtx: []byte("somedata")}}
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
r.readStates = wrs
go n.run(r)
n.Campaign(context.TODO())
for {
rd := <-n.Ready()
if !reflect.DeepEqual(rd.ReadStates, wrs) {
t.Errorf("ReadStates = %v, want %v", rd.ReadStates, wrs)
}
s.Append(rd.Entries)
if rd.SoftState.Lead == r.id {
n.Advance()
break
}
n.Advance()
}
r.step = appendStep
wrequestCtx := []byte("somedata2")
n.ReadIndex(context.TODO(), wrequestCtx)
n.Stop()
if len(msgs) != 1 {
t.Fatalf("len(msgs) = %d, want %d", len(msgs), 1)
}
if msgs[0].Type != raftpb.MsgReadIndex {
t.Errorf("msg type = %d, want %d", msgs[0].Type, raftpb.MsgReadIndex)
}
if !bytes.Equal(msgs[0].Entries[0].Data, wrequestCtx) {
t.Errorf("data = %v, want %v", msgs[0].Entries[0].Data, wrequestCtx)
}
}
// TestNodeReadIndexToOldLeader ensures that a raftpb.MsgReadIndex sent to an old leader
// gets forwarded to the new leader and that the 'send' method does not attach its term.
func TestNodeReadIndexToOldLeader(t *testing.T) {
r1 := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r2 := newTestRaft(2, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r3 := newTestRaft(3, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
nt := newNetwork(r1, r2, r3)
// elect r1 as leader
nt.send(raftpb.Message{From: 1, To: 1, Type: raftpb.MsgHup})
var testEntries = []raftpb.Entry{{Data: []byte("testdata")}}
// send readindex request to r2(follower)
r2.Step(raftpb.Message{From: 2, To: 2, Type: raftpb.MsgReadIndex, Entries: testEntries})
// verify r2(follower) forwards this message to r1(leader) with term not set
if len(r2.msgs) != 1 {
t.Fatalf("len(r2.msgs) expected 1, got %d", len(r2.msgs))
}
readIndxMsg1 := raftpb.Message{From: 2, To: 1, Type: raftpb.MsgReadIndex, Entries: testEntries}
if !reflect.DeepEqual(r2.msgs[0], readIndxMsg1) {
t.Fatalf("r2.msgs[0] expected %+v, got %+v", readIndxMsg1, r2.msgs[0])
}
// send readindex request to r3(follower)
r3.Step(raftpb.Message{From: 3, To: 3, Type: raftpb.MsgReadIndex, Entries: testEntries})
// verify r3(follower) forwards this message to r1(leader) with term not set as well.
if len(r3.msgs) != 1 {
t.Fatalf("len(r3.msgs) expected 1, got %d", len(r3.msgs))
}
readIndxMsg2 := raftpb.Message{From: 3, To: 1, Type: raftpb.MsgReadIndex, Entries: testEntries}
if !reflect.DeepEqual(r3.msgs[0], readIndxMsg2) {
t.Fatalf("r3.msgs[0] expected %+v, got %+v", readIndxMsg2, r3.msgs[0])
}
// now elect r3 as leader
nt.send(raftpb.Message{From: 3, To: 3, Type: raftpb.MsgHup})
	// let r1 step the two messages we previously got from r2 and r3
r1.Step(readIndxMsg1)
r1.Step(readIndxMsg2)
// verify r1(follower) forwards these messages again to r3(new leader)
if len(r1.msgs) != 2 {
t.Fatalf("len(r1.msgs) expected 1, got %d", len(r1.msgs))
}
readIndxMsg3 := raftpb.Message{From: 1, To: 3, Type: raftpb.MsgReadIndex, Entries: testEntries}
if !reflect.DeepEqual(r1.msgs[0], readIndxMsg3) {
t.Fatalf("r1.msgs[0] expected %+v, got %+v", readIndxMsg3, r1.msgs[0])
}
if !reflect.DeepEqual(r1.msgs[1], readIndxMsg3) {
t.Fatalf("r1.msgs[1] expected %+v, got %+v", readIndxMsg3, r1.msgs[1])
}
}
// TestNodeProposeConfig ensures that node.ProposeConfChange sends the given configuration proposal
// to the underlying raft.
func TestNodeProposeConfig(t *testing.T) {
msgs := []raftpb.Message{}
appendStep := func(r *raft, m raftpb.Message) {
msgs = append(msgs, m)
}
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
go n.run(r)
n.Campaign(context.TODO())
for {
rd := <-n.Ready()
s.Append(rd.Entries)
		// once this raft becomes leader, change the step function to appendStep
if rd.SoftState.Lead == r.id {
r.step = appendStep
n.Advance()
break
}
n.Advance()
}
cc := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 1}
ccdata, err := cc.Marshal()
if err != nil {
t.Fatal(err)
}
n.ProposeConfChange(context.TODO(), cc)
n.Stop()
if len(msgs) != 1 {
t.Fatalf("len(msgs) = %d, want %d", len(msgs), 1)
}
if msgs[0].Type != raftpb.MsgProp {
t.Errorf("msg type = %d, want %d", msgs[0].Type, raftpb.MsgProp)
}
if !bytes.Equal(msgs[0].Entries[0].Data, ccdata) {
t.Errorf("data = %v, want %v", msgs[0].Entries[0].Data, ccdata)
}
}
// TestNodeProposeAddDuplicateNode ensures that proposing to add the same node twice does
// not affect a later proposal to add a new node.
func TestNodeProposeAddDuplicateNode(t *testing.T) {
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
go n.run(r)
n.Campaign(context.TODO())
rdyEntries := make([]raftpb.Entry, 0)
ticker := time.NewTicker(time.Millisecond * 100)
done := make(chan struct{})
stop := make(chan struct{})
applyConfChan := make(chan struct{})
go func() {
defer close(done)
for {
select {
case <-stop:
return
case <-ticker.C:
n.Tick()
case rd := <-n.Ready():
s.Append(rd.Entries)
for _, e := range rd.Entries {
rdyEntries = append(rdyEntries, e)
switch e.Type {
case raftpb.EntryNormal:
case raftpb.EntryConfChange:
var cc raftpb.ConfChange
cc.Unmarshal(e.Data)
n.ApplyConfChange(cc)
applyConfChan <- struct{}{}
}
}
n.Advance()
}
}
}()
cc1 := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 1}
ccdata1, _ := cc1.Marshal()
n.ProposeConfChange(context.TODO(), cc1)
<-applyConfChan
	// try to add the same node again
n.ProposeConfChange(context.TODO(), cc1)
<-applyConfChan
// the new node join should be ok
cc2 := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 2}
ccdata2, _ := cc2.Marshal()
n.ProposeConfChange(context.TODO(), cc2)
<-applyConfChan
close(stop)
<-done
if len(rdyEntries) != 4 {
t.Errorf("len(entry) = %d, want %d, %v\n", len(rdyEntries), 4, rdyEntries)
}
if !bytes.Equal(rdyEntries[1].Data, ccdata1) {
t.Errorf("data = %v, want %v", rdyEntries[1].Data, ccdata1)
}
if !bytes.Equal(rdyEntries[3].Data, ccdata2) {
t.Errorf("data = %v, want %v", rdyEntries[3].Data, ccdata2)
}
n.Stop()
}
// TestBlockProposal ensures that the node blocks proposals when it does not
// know who the current leader is, and accepts proposals once it learns
// who the current leader is.
func TestBlockProposal(t *testing.T) {
n := newNode()
r := newTestRaft(1, []uint64{1}, 10, 1, NewMemoryStorage())
go n.run(r)
defer n.Stop()
errc := make(chan error, 1)
go func() {
errc <- n.Propose(context.TODO(), []byte("somedata"))
}()
testutil.WaitSchedule()
select {
case err := <-errc:
t.Errorf("err = %v, want blocking", err)
default:
}
n.Campaign(context.TODO())
select {
case err := <-errc:
if err != nil {
t.Errorf("err = %v, want %v", err, nil)
}
case <-time.After(10 * time.Second):
t.Errorf("blocking proposal, want unblocking")
}
}
// TestNodeTick ensures that node.Tick() will increase the
// elapsed of the underlying raft state machine.
func TestNodeTick(t *testing.T) {
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
go n.run(r)
elapsed := r.electionElapsed
n.Tick()
testutil.WaitSchedule()
n.Stop()
if r.electionElapsed != elapsed+1 {
t.Errorf("elapsed = %d, want %d", r.electionElapsed, elapsed+1)
}
}
// TestNodeStop ensures that node.Stop() blocks until the node has stopped
// processing, and that it is idempotent
func TestNodeStop(t *testing.T) {
n := newNode()
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1}, 10, 1, s)
donec := make(chan struct{})
go func() {
n.run(r)
close(donec)
}()
status := n.Status()
n.Stop()
select {
case <-donec:
case <-time.After(time.Second):
t.Fatalf("timed out waiting for node to stop!")
}
emptyStatus := Status{}
if reflect.DeepEqual(status, emptyStatus) {
t.Errorf("status = %v, want not empty", status)
}
	// Subsequent Status calls should return an empty status, since the node is stopped.
status = n.Status()
if !reflect.DeepEqual(status, emptyStatus) {
t.Errorf("status = %v, want empty", status)
}
// Subsequent Stops should have no effect.
n.Stop()
}
func TestReadyContainUpdates(t *testing.T) {
tests := []struct {
rd Ready
wcontain bool
}{
{Ready{}, false},
{Ready{SoftState: &SoftState{Lead: 1}}, true},
{Ready{HardState: raftpb.HardState{Vote: 1}}, true},
{Ready{Entries: make([]raftpb.Entry, 1, 1)}, true},
{Ready{CommittedEntries: make([]raftpb.Entry, 1, 1)}, true},
{Ready{Messages: make([]raftpb.Message, 1, 1)}, true},
{Ready{Snapshot: raftpb.Snapshot{Metadata: raftpb.SnapshotMetadata{Index: 1}}}, true},
}
for i, tt := range tests {
if g := tt.rd.containsUpdates(); g != tt.wcontain {
t.Errorf("#%d: containUpdates = %v, want %v", i, g, tt.wcontain)
}
}
}
// TestNodeStart ensures that a node can be started correctly. The node should
// start with correct configuration change entries, and can accept and commit
// proposals.
func TestNodeStart(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
cc := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 1}
ccdata, err := cc.Marshal()
if err != nil {
t.Fatalf("unexpected marshal error: %v", err)
}
wants := []Ready{
{
HardState: raftpb.HardState{Term: 1, Commit: 1, Vote: 0},
Entries: []raftpb.Entry{
{Type: raftpb.EntryConfChange, Term: 1, Index: 1, Data: ccdata},
},
CommittedEntries: []raftpb.Entry{
{Type: raftpb.EntryConfChange, Term: 1, Index: 1, Data: ccdata},
},
},
{
HardState: raftpb.HardState{Term: 2, Commit: 3, Vote: 1},
Entries: []raftpb.Entry{{Term: 2, Index: 3, Data: []byte("foo")}},
CommittedEntries: []raftpb.Entry{{Term: 2, Index: 3, Data: []byte("foo")}},
},
}
storage := NewMemoryStorage()
c := &Config{
ID: 1,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: storage,
MaxSizePerMsg: noLimit,
MaxInflightMsgs: 256,
}
n := StartNode(c, []Peer{{ID: 1}})
defer n.Stop()
g := <-n.Ready()
if !reflect.DeepEqual(g, wants[0]) {
t.Fatalf("#%d: g = %+v,\n w %+v", 1, g, wants[0])
} else {
storage.Append(g.Entries)
n.Advance()
}
n.Campaign(ctx)
rd := <-n.Ready()
storage.Append(rd.Entries)
n.Advance()
n.Propose(ctx, []byte("foo"))
if g2 := <-n.Ready(); !reflect.DeepEqual(g2, wants[1]) {
t.Errorf("#%d: g = %+v,\n w %+v", 2, g2, wants[1])
} else {
storage.Append(g2.Entries)
n.Advance()
}
select {
case rd := <-n.Ready():
t.Errorf("unexpected Ready: %+v", rd)
case <-time.After(time.Millisecond):
}
}
func TestNodeRestart(t *testing.T) {
entries := []raftpb.Entry{
{Term: 1, Index: 1},
{Term: 1, Index: 2, Data: []byte("foo")},
}
st := raftpb.HardState{Term: 1, Commit: 1}
want := Ready{
HardState: st,
		// commit up to the commit index in st
CommittedEntries: entries[:st.Commit],
}
storage := NewMemoryStorage()
storage.SetHardState(st)
storage.Append(entries)
c := &Config{
ID: 1,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: storage,
MaxSizePerMsg: noLimit,
MaxInflightMsgs: 256,
}
n := RestartNode(c)
defer n.Stop()
if g := <-n.Ready(); !reflect.DeepEqual(g, want) {
t.Errorf("g = %+v,\n w %+v", g, want)
}
n.Advance()
select {
case rd := <-n.Ready():
t.Errorf("unexpected Ready: %+v", rd)
case <-time.After(time.Millisecond):
}
}
func TestNodeRestartFromSnapshot(t *testing.T) {
snap := raftpb.Snapshot{
Metadata: raftpb.SnapshotMetadata{
ConfState: raftpb.ConfState{Nodes: []uint64{1, 2}},
Index: 2,
Term: 1,
},
}
entries := []raftpb.Entry{
{Term: 1, Index: 3, Data: []byte("foo")},
}
st := raftpb.HardState{Term: 1, Commit: 3}
want := Ready{
HardState: st,
		// commit up to the commit index in st
CommittedEntries: entries,
}
s := NewMemoryStorage()
s.SetHardState(st)
s.ApplySnapshot(snap)
s.Append(entries)
c := &Config{
ID: 1,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: s,
MaxSizePerMsg: noLimit,
MaxInflightMsgs: 256,
}
n := RestartNode(c)
defer n.Stop()
if g := <-n.Ready(); !reflect.DeepEqual(g, want) {
t.Errorf("g = %+v,\n w %+v", g, want)
} else {
n.Advance()
}
select {
case rd := <-n.Ready():
t.Errorf("unexpected Ready: %+v", rd)
case <-time.After(time.Millisecond):
}
}
func TestNodeAdvance(t *testing.T) {
ctx, cancel := context.WithCancel(context.Background())
defer cancel()
storage := NewMemoryStorage()
c := &Config{
ID: 1,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: storage,
MaxSizePerMsg: noLimit,
MaxInflightMsgs: 256,
}
n := StartNode(c, []Peer{{ID: 1}})
defer n.Stop()
rd := <-n.Ready()
storage.Append(rd.Entries)
n.Advance()
n.Campaign(ctx)
<-n.Ready()
n.Propose(ctx, []byte("foo"))
select {
case rd = <-n.Ready():
t.Fatalf("unexpected Ready before Advance: %+v", rd)
case <-time.After(time.Millisecond):
}
storage.Append(rd.Entries)
n.Advance()
select {
case <-n.Ready():
case <-time.After(100 * time.Millisecond):
t.Errorf("expect Ready after Advance, but there is no Ready available")
}
}
func TestSoftStateEqual(t *testing.T) {
tests := []struct {
st *SoftState
we bool
}{
{&SoftState{}, true},
{&SoftState{Lead: 1}, false},
{&SoftState{RaftState: StateLeader}, false},
}
for i, tt := range tests {
if g := tt.st.equal(&SoftState{}); g != tt.we {
t.Errorf("#%d, equal = %v, want %v", i, g, tt.we)
}
}
}
func TestIsHardStateEqual(t *testing.T) {
tests := []struct {
st raftpb.HardState
we bool
}{
{emptyState, true},
{raftpb.HardState{Vote: 1}, false},
{raftpb.HardState{Commit: 1}, false},
{raftpb.HardState{Term: 1}, false},
}
for i, tt := range tests {
if isHardStateEqual(tt.st, emptyState) != tt.we {
t.Errorf("#%d, equal = %v, want %v", i, isHardStateEqual(tt.st, emptyState), tt.we)
}
}
}

279
vendor/github.com/coreos/etcd/raft/progress.go generated vendored Normal file
View file

@ -0,0 +1,279 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import "fmt"
const (
ProgressStateProbe ProgressStateType = iota
ProgressStateReplicate
ProgressStateSnapshot
)
type ProgressStateType uint64
var prstmap = [...]string{
"ProgressStateProbe",
"ProgressStateReplicate",
"ProgressStateSnapshot",
}
func (st ProgressStateType) String() string { return prstmap[uint64(st)] }
// Progress represents a follower's progress in the view of the leader. The leader maintains
// progresses of all followers, and sends entries to the follower based on its progress.
type Progress struct {
Match, Next uint64
// State defines how the leader should interact with the follower.
//
// When in ProgressStateProbe, leader sends at most one replication message
// per heartbeat interval. It also probes actual progress of the follower.
//
// When in ProgressStateReplicate, leader optimistically increases next
// to the latest entry sent after sending replication message. This is
// an optimized state for fast replicating log entries to the follower.
//
	// When in ProgressStateSnapshot, the leader has already sent out a snapshot
	// and stops sending any replication message.
State ProgressStateType
// Paused is used in ProgressStateProbe.
// When Paused is true, raft should pause sending replication message to this peer.
Paused bool
// PendingSnapshot is used in ProgressStateSnapshot.
// If there is a pending snapshot, the pendingSnapshot will be set to the
// index of the snapshot. If pendingSnapshot is set, the replication process of
// this Progress will be paused. raft will not resend snapshot until the pending one
// is reported to be failed.
PendingSnapshot uint64
// RecentActive is true if the progress is recently active. Receiving any messages
// from the corresponding follower indicates the progress is active.
// RecentActive can be reset to false after an election timeout.
RecentActive bool
// inflights is a sliding window for the inflight messages.
// Each inflight message contains one or more log entries.
// The max number of entries per message is defined in raft config as MaxSizePerMsg.
// Thus inflight effectively limits both the number of inflight messages
// and the bandwidth each Progress can use.
// When inflights is full, no more message should be sent.
// When a leader sends out a message, the index of the last
// entry should be added to inflights. The index MUST be added
// into inflights in order.
// When a leader receives a reply, the previous inflights should
// be freed by calling inflights.freeTo with the index of the last
// received entry.
ins *inflights
}
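// Illustrative note (not in the original source): a follower's Progress
// typically starts in ProgressStateProbe, moves to ProgressStateReplicate once
// an append is accepted (maybeUpdate succeeds), falls back toward probing when
// an append is rejected (maybeDecrTo), and enters ProgressStateSnapshot when
// the follower is so far behind that the leader must send a snapshot instead
// of log entries.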
func (pr *Progress) resetState(state ProgressStateType) {
pr.Paused = false
pr.PendingSnapshot = 0
pr.State = state
pr.ins.reset()
}
func (pr *Progress) becomeProbe() {
// If the original state is ProgressStateSnapshot, progress knows that
// the pending snapshot has been sent to this peer successfully, then
// probes from pendingSnapshot + 1.
if pr.State == ProgressStateSnapshot {
pendingSnapshot := pr.PendingSnapshot
pr.resetState(ProgressStateProbe)
pr.Next = max(pr.Match+1, pendingSnapshot+1)
} else {
pr.resetState(ProgressStateProbe)
pr.Next = pr.Match + 1
}
}
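// becomeReplicate transitions into ProgressStateReplicate and resets Next to Match+1.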
func (pr *Progress) becomeReplicate() {
pr.resetState(ProgressStateReplicate)
pr.Next = pr.Match + 1
}
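// becomeSnapshot transitions into ProgressStateSnapshot and records the index of
// the pending snapshot.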
func (pr *Progress) becomeSnapshot(snapshoti uint64) {
pr.resetState(ProgressStateSnapshot)
pr.PendingSnapshot = snapshoti
}
// maybeUpdate returns false if the given index n comes from an outdated message.
// Otherwise it updates the progress and returns true.
func (pr *Progress) maybeUpdate(n uint64) bool {
var updated bool
if pr.Match < n {
pr.Match = n
updated = true
pr.resume()
}
if pr.Next < n+1 {
pr.Next = n + 1
}
return updated
}
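// optimisticUpdate advances Next to just past the sent entry at index n without
// waiting for the follower's acknowledgement; used while in ProgressStateReplicate.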
func (pr *Progress) optimisticUpdate(n uint64) { pr.Next = n + 1 }
// maybeDecrTo returns false if the given rejected index comes from an out-of-order message.
// Otherwise it decreases the progress's next index to min(rejected, last+1) and returns true.
func (pr *Progress) maybeDecrTo(rejected, last uint64) bool {
if pr.State == ProgressStateReplicate {
// the rejection must be stale if the progress has matched and "rejected"
// is smaller than "match".
if rejected <= pr.Match {
return false
}
// directly decrease next to match + 1
pr.Next = pr.Match + 1
return true
}
// the rejection must be stale if "rejected" does not match next - 1
if pr.Next-1 != rejected {
return false
}
if pr.Next = min(rejected, last+1); pr.Next < 1 {
pr.Next = 1
}
pr.resume()
return true
}
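// pause and resume toggle the Paused flag, which throttles replication probes
// while the progress is in ProgressStateProbe.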
func (pr *Progress) pause() { pr.Paused = true }
func (pr *Progress) resume() { pr.Paused = false }
// IsPaused returns whether sending log entries to this node has been
// paused. A node may be paused because it has rejected recent
// MsgApps, is currently waiting for a snapshot, or has reached the
// MaxInflightMsgs limit.
func (pr *Progress) IsPaused() bool {
switch pr.State {
case ProgressStateProbe:
return pr.Paused
case ProgressStateReplicate:
return pr.ins.full()
case ProgressStateSnapshot:
return true
default:
panic("unexpected state")
}
}
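// snapshotFailure clears the pending snapshot so that a new snapshot may be sent.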
func (pr *Progress) snapshotFailure() { pr.PendingSnapshot = 0 }
// needSnapshotAbort returns true if the snapshot progress's Match
// is equal to or higher than the pendingSnapshot.
func (pr *Progress) needSnapshotAbort() bool {
return pr.State == ProgressStateSnapshot && pr.Match >= pr.PendingSnapshot
}
func (pr *Progress) String() string {
return fmt.Sprintf("next = %d, match = %d, state = %s, waiting = %v, pendingSnapshot = %d", pr.Next, pr.Match, pr.State, pr.IsPaused(), pr.PendingSnapshot)
}
type inflights struct {
// the starting index in the buffer
start int
// number of inflights in the buffer
count int
// the size of the buffer
size int
// buffer contains the index of the last entry
// inside one message.
buffer []uint64
}
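// newInflights returns an inflights window that can hold at most size inflight
// messages; the backing buffer is allocated lazily as messages are added.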
func newInflights(size int) *inflights {
return &inflights{
size: size,
}
}
// add adds an inflight into inflights
func (in *inflights) add(inflight uint64) {
if in.full() {
panic("cannot add into a full inflights")
}
next := in.start + in.count
size := in.size
if next >= size {
next -= size
}
if next >= len(in.buffer) {
in.growBuf()
}
in.buffer[next] = inflight
in.count++
}
// grow the inflight buffer by doubling up to inflights.size. We grow on demand
// instead of preallocating to inflights.size to handle systems which have
// thousands of Raft groups per process.
func (in *inflights) growBuf() {
newSize := len(in.buffer) * 2
if newSize == 0 {
newSize = 1
} else if newSize > in.size {
newSize = in.size
}
newBuffer := make([]uint64, newSize)
copy(newBuffer, in.buffer)
in.buffer = newBuffer
}
// freeTo frees the inflights smaller than or equal to the given `to` flight.
func (in *inflights) freeTo(to uint64) {
if in.count == 0 || to < in.buffer[in.start] {
// out of the left side of the window
return
}
i, idx := 0, in.start
for i = 0; i < in.count; i++ {
if to < in.buffer[idx] { // found the first large inflight
break
}
// increase index and maybe rotate
size := in.size
if idx++; idx >= size {
idx -= size
}
}
// free i inflights and set new start index
in.count -= i
in.start = idx
if in.count == 0 {
// inflights is empty, reset the start index so that we don't grow the
// buffer unnecessarily.
in.start = 0
}
}
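// freeFirstOne releases the earliest inflight message in the window.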
func (in *inflights) freeFirstOne() { in.freeTo(in.buffer[in.start]) }
// full returns true if the inflights is full.
func (in *inflights) full() bool {
return in.count == in.size
}
// reset frees all inflights.
func (in *inflights) reset() {
in.count = 0
in.start = 0
}
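// An illustrative sketch of how the sliding window above is driven (for
// explanation only, not part of the package API): the leader adds the index of
// the last entry of every outgoing message and frees slots as acks arrive.
//
// in := newInflights(4)
// in.add(5)     // sent entries up to index 5
// in.add(7)     // sent entries up to index 7
// in.full()     // false: only two of the four slots are used
// in.freeTo(5)  // the follower acknowledged index 5; the first slot is freed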

189
vendor/github.com/coreos/etcd/raft/progress_test.go generated vendored Normal file
View file

@ -0,0 +1,189 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"reflect"
"testing"
)
func TestInflightsAdd(t *testing.T) {
// no rotating case
in := &inflights{
size: 10,
buffer: make([]uint64, 10),
}
for i := 0; i < 5; i++ {
in.add(uint64(i))
}
wantIn := &inflights{
start: 0,
count: 5,
size: 10,
// ↓------------
buffer: []uint64{0, 1, 2, 3, 4, 0, 0, 0, 0, 0},
}
if !reflect.DeepEqual(in, wantIn) {
t.Fatalf("in = %+v, want %+v", in, wantIn)
}
for i := 5; i < 10; i++ {
in.add(uint64(i))
}
wantIn2 := &inflights{
start: 0,
count: 10,
size: 10,
// ↓---------------------------
buffer: []uint64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
}
if !reflect.DeepEqual(in, wantIn2) {
t.Fatalf("in = %+v, want %+v", in, wantIn2)
}
// rotating case
in2 := &inflights{
start: 5,
size: 10,
buffer: make([]uint64, 10),
}
for i := 0; i < 5; i++ {
in2.add(uint64(i))
}
wantIn21 := &inflights{
start: 5,
count: 5,
size: 10,
// ↓------------
buffer: []uint64{0, 0, 0, 0, 0, 0, 1, 2, 3, 4},
}
if !reflect.DeepEqual(in2, wantIn21) {
t.Fatalf("in = %+v, want %+v", in2, wantIn21)
}
for i := 5; i < 10; i++ {
in2.add(uint64(i))
}
wantIn22 := &inflights{
start: 5,
count: 10,
size: 10,
// -------------- ↓------------
buffer: []uint64{5, 6, 7, 8, 9, 0, 1, 2, 3, 4},
}
if !reflect.DeepEqual(in2, wantIn22) {
t.Fatalf("in = %+v, want %+v", in2, wantIn22)
}
}
func TestInflightFreeTo(t *testing.T) {
// no rotating case
in := newInflights(10)
for i := 0; i < 10; i++ {
in.add(uint64(i))
}
in.freeTo(4)
wantIn := &inflights{
start: 5,
count: 5,
size: 10,
// ↓------------
buffer: []uint64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
}
if !reflect.DeepEqual(in, wantIn) {
t.Fatalf("in = %+v, want %+v", in, wantIn)
}
in.freeTo(8)
wantIn2 := &inflights{
start: 9,
count: 1,
size: 10,
// ↓
buffer: []uint64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
}
if !reflect.DeepEqual(in, wantIn2) {
t.Fatalf("in = %+v, want %+v", in, wantIn2)
}
// rotating case
for i := 10; i < 15; i++ {
in.add(uint64(i))
}
in.freeTo(12)
wantIn3 := &inflights{
start: 3,
count: 2,
size: 10,
// ↓-----
buffer: []uint64{10, 11, 12, 13, 14, 5, 6, 7, 8, 9},
}
if !reflect.DeepEqual(in, wantIn3) {
t.Fatalf("in = %+v, want %+v", in, wantIn3)
}
in.freeTo(14)
wantIn4 := &inflights{
start: 0,
count: 0,
size: 10,
// ↓
buffer: []uint64{10, 11, 12, 13, 14, 5, 6, 7, 8, 9},
}
if !reflect.DeepEqual(in, wantIn4) {
t.Fatalf("in = %+v, want %+v", in, wantIn4)
}
}
func TestInflightFreeFirstOne(t *testing.T) {
in := newInflights(10)
for i := 0; i < 10; i++ {
in.add(uint64(i))
}
in.freeFirstOne()
wantIn := &inflights{
start: 1,
count: 9,
size: 10,
// ↓------------------------
buffer: []uint64{0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
}
if !reflect.DeepEqual(in, wantIn) {
t.Fatalf("in = %+v, want %+v", in, wantIn)
}
}

1253
vendor/github.com/coreos/etcd/raft/raft.go generated vendored Normal file

File diff suppressed because it is too large Load diff

View file

@ -0,0 +1,155 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"testing"
pb "github.com/coreos/etcd/raft/raftpb"
)
// TestMsgAppFlowControlFull ensures:
// 1. msgApp can fill the sending window until full
// 2. when the window is full, no more msgApp can be sent.
func TestMsgAppFlowControlFull(t *testing.T) {
r := newTestRaft(1, []uint64{1, 2}, 5, 1, NewMemoryStorage())
r.becomeCandidate()
r.becomeLeader()
pr2 := r.prs[2]
// force the progress to be in replicate state
pr2.becomeReplicate()
// fill in the inflights window
for i := 0; i < r.maxInflight; i++ {
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
ms := r.readMessages()
if len(ms) != 1 {
t.Fatalf("#%d: len(ms) = %d, want 1", i, len(ms))
}
}
// ensure 1
if !pr2.ins.full() {
t.Fatalf("inflights.full = %t, want %t", pr2.ins.full(), true)
}
// ensure 2
for i := 0; i < 10; i++ {
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
ms := r.readMessages()
if len(ms) != 0 {
t.Fatalf("#%d: len(ms) = %d, want 0", i, len(ms))
}
}
}
// TestMsgAppFlowControlMoveForward ensures msgAppResp can move
// the sending window forward correctly:
// 1. a valid msgAppResp.index moves the window past all smaller or equal indexes.
// 2. an out-of-date msgAppResp has no effect on the sliding window.
func TestMsgAppFlowControlMoveForward(t *testing.T) {
r := newTestRaft(1, []uint64{1, 2}, 5, 1, NewMemoryStorage())
r.becomeCandidate()
r.becomeLeader()
pr2 := r.prs[2]
// force the progress to be in replicate state
pr2.becomeReplicate()
// fill in the inflights window
for i := 0; i < r.maxInflight; i++ {
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
r.readMessages()
}
// 1 is noop, 2 is the first proposal we just sent.
// so we start with 2.
for tt := 2; tt < r.maxInflight; tt++ {
// move forward the window
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgAppResp, Index: uint64(tt)})
r.readMessages()
// fill in the inflights window again
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
ms := r.readMessages()
if len(ms) != 1 {
t.Fatalf("#%d: len(ms) = %d, want 1", tt, len(ms))
}
// ensure 1
if !pr2.ins.full() {
t.Fatalf("inflights.full = %t, want %t", pr2.ins.full(), true)
}
// ensure 2
for i := 0; i < tt; i++ {
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgAppResp, Index: uint64(i)})
if !pr2.ins.full() {
t.Fatalf("#%d: inflights.full = %t, want %t", tt, pr2.ins.full(), true)
}
}
}
}
// TestMsgAppFlowControlRecvHeartbeat ensures a heartbeat response
// frees one slot if the window is full.
func TestMsgAppFlowControlRecvHeartbeat(t *testing.T) {
r := newTestRaft(1, []uint64{1, 2}, 5, 1, NewMemoryStorage())
r.becomeCandidate()
r.becomeLeader()
pr2 := r.prs[2]
// force the progress to be in replicate state
pr2.becomeReplicate()
// fill in the inflights window
for i := 0; i < r.maxInflight; i++ {
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
r.readMessages()
}
for tt := 1; tt < 5; tt++ {
if !pr2.ins.full() {
t.Fatalf("#%d: inflights.full = %t, want %t", tt, pr2.ins.full(), true)
}
// recv tt msgHeartbeatResp and expect one free slot
for i := 0; i < tt; i++ {
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgHeartbeatResp})
r.readMessages()
if pr2.ins.full() {
t.Fatalf("#%d.%d: inflights.full = %t, want %t", tt, i, pr2.ins.full(), false)
}
}
// one slot
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
ms := r.readMessages()
if len(ms) != 1 {
t.Fatalf("#%d: free slot = 0, want 1", tt)
}
// and just one slot
for i := 0; i < 10; i++ {
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
ms1 := r.readMessages()
if len(ms1) != 0 {
t.Fatalf("#%d.%d: len(ms) = %d, want 0", tt, i, len(ms1))
}
}
// clear all pending messages.
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgHeartbeatResp})
r.readMessages()
}
}

936
vendor/github.com/coreos/etcd/raft/raft_paper_test.go generated vendored Normal file
View file

@ -0,0 +1,936 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
/*
This file contains tests which verify that the scenarios described
in the raft paper (https://ramcloud.stanford.edu/raft.pdf) are
handled by the raft implementation correctly. Each test focuses on
several sentences written in the paper. This should help us prevent
most implementation bugs.
Each test is composed of three parts: init, test, and check.
The init part uses a simple and understandable way to simulate the initial state.
The test part uses the Step function to generate the scenario. The check part verifies
the outgoing messages and state.
*/
package raft
import (
"fmt"
"testing"
"reflect"
"sort"
pb "github.com/coreos/etcd/raft/raftpb"
)
func TestFollowerUpdateTermFromMessage(t *testing.T) {
testUpdateTermFromMessage(t, StateFollower)
}
func TestCandidateUpdateTermFromMessage(t *testing.T) {
testUpdateTermFromMessage(t, StateCandidate)
}
func TestLeaderUpdateTermFromMessage(t *testing.T) {
testUpdateTermFromMessage(t, StateLeader)
}
// testUpdateTermFromMessage tests that if one server's current term is
// smaller than another's, then it updates its current term to the larger
// value. If a candidate or leader discovers that its term is out of date,
// it immediately reverts to follower state.
// Reference: section 5.1
func testUpdateTermFromMessage(t *testing.T, state StateType) {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
switch state {
case StateFollower:
r.becomeFollower(1, 2)
case StateCandidate:
r.becomeCandidate()
case StateLeader:
r.becomeCandidate()
r.becomeLeader()
}
r.Step(pb.Message{Type: pb.MsgApp, Term: 2})
if r.Term != 2 {
t.Errorf("term = %d, want %d", r.Term, 2)
}
if r.state != StateFollower {
t.Errorf("state = %v, want %v", r.state, StateFollower)
}
}
// TestRejectStaleTermMessage tests that if a server receives a request with
// a stale term number, it rejects the request.
// Our implementation ignores the request instead.
// Reference: section 5.1
func TestRejectStaleTermMessage(t *testing.T) {
called := false
fakeStep := func(r *raft, m pb.Message) {
called = true
}
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r.step = fakeStep
r.loadState(pb.HardState{Term: 2})
r.Step(pb.Message{Type: pb.MsgApp, Term: r.Term - 1})
if called {
t.Errorf("stepFunc called = %v, want %v", called, false)
}
}
// TestStartAsFollower tests that when servers start up, they begin as followers.
// Reference: section 5.2
func TestStartAsFollower(t *testing.T) {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
if r.state != StateFollower {
t.Errorf("state = %s, want %s", r.state, StateFollower)
}
}
// TestLeaderBcastBeat tests that if the leader receives a heartbeat tick,
// it will send a MsgHeartbeat with m.Index = 0, m.LogTerm = 0 and empty entries
// as heartbeat to all followers.
// Reference: section 5.2
func TestLeaderBcastBeat(t *testing.T) {
// heartbeat interval
hi := 1
r := newTestRaft(1, []uint64{1, 2, 3}, 10, hi, NewMemoryStorage())
r.becomeCandidate()
r.becomeLeader()
for i := 0; i < 10; i++ {
r.appendEntry(pb.Entry{Index: uint64(i) + 1})
}
for i := 0; i < hi; i++ {
r.tick()
}
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
wmsgs := []pb.Message{
{From: 1, To: 2, Term: 1, Type: pb.MsgHeartbeat},
{From: 1, To: 3, Term: 1, Type: pb.MsgHeartbeat},
}
if !reflect.DeepEqual(msgs, wmsgs) {
t.Errorf("msgs = %v, want %v", msgs, wmsgs)
}
}
func TestFollowerStartElection(t *testing.T) {
testNonleaderStartElection(t, StateFollower)
}
func TestCandidateStartNewElection(t *testing.T) {
testNonleaderStartElection(t, StateCandidate)
}
// testNonleaderStartElection tests that if a follower receives no communication
// over election timeout, it begins an election to choose a new leader. It
// increments its current term and transitions to candidate state. It then
// votes for itself and issues RequestVote RPCs in parallel to each of the
// other servers in the cluster.
// Reference: section 5.2
// Also if a candidate fails to obtain a majority, it will time out and
// start a new election by incrementing its term and initiating another
// round of RequestVote RPCs.
// Reference: section 5.2
func testNonleaderStartElection(t *testing.T, state StateType) {
// election timeout
et := 10
r := newTestRaft(1, []uint64{1, 2, 3}, et, 1, NewMemoryStorage())
switch state {
case StateFollower:
r.becomeFollower(1, 2)
case StateCandidate:
r.becomeCandidate()
}
for i := 1; i < 2*et; i++ {
r.tick()
}
if r.Term != 2 {
t.Errorf("term = %d, want 2", r.Term)
}
if r.state != StateCandidate {
t.Errorf("state = %s, want %s", r.state, StateCandidate)
}
if !r.votes[r.id] {
t.Errorf("vote for self = false, want true")
}
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
wmsgs := []pb.Message{
{From: 1, To: 2, Term: 2, Type: pb.MsgVote},
{From: 1, To: 3, Term: 2, Type: pb.MsgVote},
}
if !reflect.DeepEqual(msgs, wmsgs) {
t.Errorf("msgs = %v, want %v", msgs, wmsgs)
}
}
// TestLeaderElectionInOneRoundRPC tests all cases that may happen in
// leader election during one round of RequestVote RPC:
// a) it wins the election
// b) it loses the election
// c) it is unclear about the result
// Reference: section 5.2
func TestLeaderElectionInOneRoundRPC(t *testing.T) {
tests := []struct {
size int
votes map[uint64]bool
state StateType
}{
// win the election when receiving votes from a majority of the servers
{1, map[uint64]bool{}, StateLeader},
{3, map[uint64]bool{2: true, 3: true}, StateLeader},
{3, map[uint64]bool{2: true}, StateLeader},
{5, map[uint64]bool{2: true, 3: true, 4: true, 5: true}, StateLeader},
{5, map[uint64]bool{2: true, 3: true, 4: true}, StateLeader},
{5, map[uint64]bool{2: true, 3: true}, StateLeader},
// return to follower state if it receives vote denial from a majority
{3, map[uint64]bool{2: false, 3: false}, StateFollower},
{5, map[uint64]bool{2: false, 3: false, 4: false, 5: false}, StateFollower},
{5, map[uint64]bool{2: true, 3: false, 4: false, 5: false}, StateFollower},
// stay in candidate if it does not obtain the majority
{3, map[uint64]bool{}, StateCandidate},
{5, map[uint64]bool{2: true}, StateCandidate},
{5, map[uint64]bool{2: false, 3: false}, StateCandidate},
{5, map[uint64]bool{}, StateCandidate},
}
for i, tt := range tests {
r := newTestRaft(1, idsBySize(tt.size), 10, 1, NewMemoryStorage())
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
for id, vote := range tt.votes {
r.Step(pb.Message{From: id, To: 1, Type: pb.MsgVoteResp, Reject: !vote})
}
if r.state != tt.state {
t.Errorf("#%d: state = %s, want %s", i, r.state, tt.state)
}
if g := r.Term; g != 1 {
t.Errorf("#%d: term = %d, want %d", i, g, 1)
}
}
}
// TestFollowerVote tests that each follower will vote for at most one
// candidate in a given term, on a first-come-first-served basis.
// Reference: section 5.2
func TestFollowerVote(t *testing.T) {
tests := []struct {
vote uint64
nvote uint64
wreject bool
}{
{None, 1, false},
{None, 2, false},
{1, 1, false},
{2, 2, false},
{1, 2, true},
{2, 1, true},
}
for i, tt := range tests {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r.loadState(pb.HardState{Term: 1, Vote: tt.vote})
r.Step(pb.Message{From: tt.nvote, To: 1, Term: 1, Type: pb.MsgVote})
msgs := r.readMessages()
wmsgs := []pb.Message{
{From: 1, To: tt.nvote, Term: 1, Type: pb.MsgVoteResp, Reject: tt.wreject},
}
if !reflect.DeepEqual(msgs, wmsgs) {
t.Errorf("#%d: msgs = %v, want %v", i, msgs, wmsgs)
}
}
}
// TestCandidateFallback tests that while waiting for votes,
// if a candidate receives an AppendEntries RPC from another server claiming
// to be leader whose term is at least as large as the candidate's current term,
// it recognizes the leader as legitimate and returns to follower state.
// Reference: section 5.2
func TestCandidateFallback(t *testing.T) {
tests := []pb.Message{
{From: 2, To: 1, Term: 1, Type: pb.MsgApp},
{From: 2, To: 1, Term: 2, Type: pb.MsgApp},
}
for i, tt := range tests {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
if r.state != StateCandidate {
t.Fatalf("unexpected state = %s, want %s", r.state, StateCandidate)
}
r.Step(tt)
if g := r.state; g != StateFollower {
t.Errorf("#%d: state = %s, want %s", i, g, StateFollower)
}
if g := r.Term; g != tt.Term {
t.Errorf("#%d: term = %d, want %d", i, g, tt.Term)
}
}
}
func TestFollowerElectionTimeoutRandomized(t *testing.T) {
SetLogger(discardLogger)
defer SetLogger(defaultLogger)
testNonleaderElectionTimeoutRandomized(t, StateFollower)
}
func TestCandidateElectionTimeoutRandomized(t *testing.T) {
SetLogger(discardLogger)
defer SetLogger(defaultLogger)
testNonleaderElectionTimeoutRandomized(t, StateCandidate)
}
// testNonleaderElectionTimeoutRandomized tests that election timeout for
// follower or candidate is randomized.
// Reference: section 5.2
func testNonleaderElectionTimeoutRandomized(t *testing.T, state StateType) {
et := 10
r := newTestRaft(1, []uint64{1, 2, 3}, et, 1, NewMemoryStorage())
timeouts := make(map[int]bool)
for round := 0; round < 50*et; round++ {
switch state {
case StateFollower:
r.becomeFollower(r.Term+1, 2)
case StateCandidate:
r.becomeCandidate()
}
time := 0
for len(r.readMessages()) == 0 {
r.tick()
time++
}
timeouts[time] = true
}
for d := et + 1; d < 2*et; d++ {
if !timeouts[d] {
t.Errorf("timeout in %d ticks should happen", d)
}
}
}
func TestFollowersElectionTimeoutNonconflict(t *testing.T) {
SetLogger(discardLogger)
defer SetLogger(defaultLogger)
testNonleadersElectionTimeoutNonconflict(t, StateFollower)
}
func TestCandidatesElectionTimeoutNonconflict(t *testing.T) {
SetLogger(discardLogger)
defer SetLogger(defaultLogger)
testNonleadersElectionTimeoutNonconflict(t, StateCandidate)
}
// testNonleadersElectionTimeoutNonconflict tests that in most cases only a
// single server (follower or candidate) will time out, which reduces the
// likelihood of a split vote in the new election.
// Reference: section 5.2
func testNonleadersElectionTimeoutNonconflict(t *testing.T, state StateType) {
et := 10
size := 5
rs := make([]*raft, size)
ids := idsBySize(size)
for k := range rs {
rs[k] = newTestRaft(ids[k], ids, et, 1, NewMemoryStorage())
}
conflicts := 0
for round := 0; round < 1000; round++ {
for _, r := range rs {
switch state {
case StateFollower:
r.becomeFollower(r.Term+1, None)
case StateCandidate:
r.becomeCandidate()
}
}
timeoutNum := 0
for timeoutNum == 0 {
for _, r := range rs {
r.tick()
if len(r.readMessages()) > 0 {
timeoutNum++
}
}
}
// several rafts time out at the same tick
if timeoutNum > 1 {
conflicts++
}
}
if g := float64(conflicts) / 1000; g > 0.3 {
t.Errorf("probability of conflicts = %v, want <= 0.3", g)
}
}
// TestLeaderStartReplication tests that when receiving client proposals,
// the leader appends the proposal to its log as a new entry, then issues
// AppendEntries RPCs in parallel to each of the other servers to replicate
// the entry. Also, when sending an AppendEntries RPC, the leader includes
// the index and term of the entry in its log that immediately precedes
// the new entries.
// Also, it writes the new entry into stable storage.
// Reference: section 5.3
func TestLeaderStartReplication(t *testing.T) {
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, s)
r.becomeCandidate()
r.becomeLeader()
commitNoopEntry(r, s)
li := r.raftLog.lastIndex()
ents := []pb.Entry{{Data: []byte("some data")}}
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: ents})
if g := r.raftLog.lastIndex(); g != li+1 {
t.Errorf("lastIndex = %d, want %d", g, li+1)
}
if g := r.raftLog.committed; g != li {
t.Errorf("committed = %d, want %d", g, li)
}
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
wents := []pb.Entry{{Index: li + 1, Term: 1, Data: []byte("some data")}}
wmsgs := []pb.Message{
{From: 1, To: 2, Term: 1, Type: pb.MsgApp, Index: li, LogTerm: 1, Entries: wents, Commit: li},
{From: 1, To: 3, Term: 1, Type: pb.MsgApp, Index: li, LogTerm: 1, Entries: wents, Commit: li},
}
if !reflect.DeepEqual(msgs, wmsgs) {
t.Errorf("msgs = %+v, want %+v", msgs, wmsgs)
}
if g := r.raftLog.unstableEntries(); !reflect.DeepEqual(g, wents) {
t.Errorf("ents = %+v, want %+v", g, wents)
}
}
// TestLeaderCommitEntry tests that when the entry has been safely replicated,
// the leader gives out the applied entries, which can be applied to its state
// machine.
// Also, the leader keeps track of the highest index it knows to be committed,
// and it includes that index in future AppendEntries RPCs so that the other
// servers eventually find out.
// Reference: section 5.3
func TestLeaderCommitEntry(t *testing.T) {
s := NewMemoryStorage()
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, s)
r.becomeCandidate()
r.becomeLeader()
commitNoopEntry(r, s)
li := r.raftLog.lastIndex()
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("some data")}}})
for _, m := range r.readMessages() {
r.Step(acceptAndReply(m))
}
if g := r.raftLog.committed; g != li+1 {
t.Errorf("committed = %d, want %d", g, li+1)
}
wents := []pb.Entry{{Index: li + 1, Term: 1, Data: []byte("some data")}}
if g := r.raftLog.nextEnts(); !reflect.DeepEqual(g, wents) {
t.Errorf("nextEnts = %+v, want %+v", g, wents)
}
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
for i, m := range msgs {
if w := uint64(i + 2); m.To != w {
t.Errorf("to = %x, want %x", m.To, w)
}
if m.Type != pb.MsgApp {
t.Errorf("type = %v, want %v", m.Type, pb.MsgApp)
}
if m.Commit != li+1 {
t.Errorf("commit = %d, want %d", m.Commit, li+1)
}
}
}
// TestLeaderAcknowledgeCommit tests that a log entry is committed once the
// leader that created the entry has replicated it on a majority of the servers.
// Reference: section 5.3
func TestLeaderAcknowledgeCommit(t *testing.T) {
tests := []struct {
size int
acceptors map[uint64]bool
wack bool
}{
{1, nil, true},
{3, nil, false},
{3, map[uint64]bool{2: true}, true},
{3, map[uint64]bool{2: true, 3: true}, true},
{5, nil, false},
{5, map[uint64]bool{2: true}, false},
{5, map[uint64]bool{2: true, 3: true}, true},
{5, map[uint64]bool{2: true, 3: true, 4: true}, true},
{5, map[uint64]bool{2: true, 3: true, 4: true, 5: true}, true},
}
for i, tt := range tests {
s := NewMemoryStorage()
r := newTestRaft(1, idsBySize(tt.size), 10, 1, s)
r.becomeCandidate()
r.becomeLeader()
commitNoopEntry(r, s)
li := r.raftLog.lastIndex()
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("some data")}}})
for _, m := range r.readMessages() {
if tt.acceptors[m.To] {
r.Step(acceptAndReply(m))
}
}
if g := r.raftLog.committed > li; g != tt.wack {
t.Errorf("#%d: ack commit = %v, want %v", i, g, tt.wack)
}
}
}
// TestLeaderCommitPrecedingEntries tests that when the leader commits a log entry,
// it also commits all preceding entries in the leader's log, including
// entries created by previous leaders.
// Also, it applies the entry to its local state machine (in log order).
// Reference: section 5.3
func TestLeaderCommitPrecedingEntries(t *testing.T) {
tests := [][]pb.Entry{
{},
{{Term: 2, Index: 1}},
{{Term: 1, Index: 1}, {Term: 2, Index: 2}},
{{Term: 1, Index: 1}},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.Append(tt)
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, storage)
r.loadState(pb.HardState{Term: 2})
r.becomeCandidate()
r.becomeLeader()
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("some data")}}})
for _, m := range r.readMessages() {
r.Step(acceptAndReply(m))
}
li := uint64(len(tt))
wents := append(tt, pb.Entry{Term: 3, Index: li + 1}, pb.Entry{Term: 3, Index: li + 2, Data: []byte("some data")})
if g := r.raftLog.nextEnts(); !reflect.DeepEqual(g, wents) {
t.Errorf("#%d: ents = %+v, want %+v", i, g, wents)
}
}
}
// TestFollowerCommitEntry tests that once a follower learns that a log entry
// is committed, it applies the entry to its local state machine (in log order).
// Reference: section 5.3
func TestFollowerCommitEntry(t *testing.T) {
tests := []struct {
ents []pb.Entry
commit uint64
}{
{
[]pb.Entry{
{Term: 1, Index: 1, Data: []byte("some data")},
},
1,
},
{
[]pb.Entry{
{Term: 1, Index: 1, Data: []byte("some data")},
{Term: 1, Index: 2, Data: []byte("some data2")},
},
2,
},
{
[]pb.Entry{
{Term: 1, Index: 1, Data: []byte("some data2")},
{Term: 1, Index: 2, Data: []byte("some data")},
},
2,
},
{
[]pb.Entry{
{Term: 1, Index: 1, Data: []byte("some data")},
{Term: 1, Index: 2, Data: []byte("some data2")},
},
1,
},
}
for i, tt := range tests {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r.becomeFollower(1, 2)
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgApp, Term: 1, Entries: tt.ents, Commit: tt.commit})
if g := r.raftLog.committed; g != tt.commit {
t.Errorf("#%d: committed = %d, want %d", i, g, tt.commit)
}
wents := tt.ents[:int(tt.commit)]
if g := r.raftLog.nextEnts(); !reflect.DeepEqual(g, wents) {
t.Errorf("#%d: nextEnts = %v, want %v", i, g, wents)
}
}
}
// TestFollowerCheckMsgApp tests that if the follower does not find an
// entry in its log with the same index and term as the one in AppendEntries RPC,
// then it refuses the new entries. Otherwise it replies that it accepts the
// append entries.
// Reference: section 5.3
func TestFollowerCheckMsgApp(t *testing.T) {
ents := []pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}}
tests := []struct {
term uint64
index uint64
windex uint64
wreject bool
wrejectHint uint64
}{
// match with committed entries
{0, 0, 1, false, 0},
{ents[0].Term, ents[0].Index, 1, false, 0},
// match with uncommitted entries
{ents[1].Term, ents[1].Index, 2, false, 0},
// mismatch with existing entry
{ents[0].Term, ents[1].Index, ents[1].Index, true, 2},
// nonexistent entry
{ents[1].Term + 1, ents[1].Index + 1, ents[1].Index + 1, true, 2},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.Append(ents)
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, storage)
r.loadState(pb.HardState{Commit: 1})
r.becomeFollower(2, 2)
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgApp, Term: 2, LogTerm: tt.term, Index: tt.index})
msgs := r.readMessages()
wmsgs := []pb.Message{
{From: 1, To: 2, Type: pb.MsgAppResp, Term: 2, Index: tt.windex, Reject: tt.wreject, RejectHint: tt.wrejectHint},
}
if !reflect.DeepEqual(msgs, wmsgs) {
t.Errorf("#%d: msgs = %+v, want %+v", i, msgs, wmsgs)
}
}
}
// TestFollowerAppendEntries tests that when AppendEntries RPC is valid,
// the follower will delete the existing conflict entry and all that follow it,
// and append any new entries not already in the log.
// Also, it writes the new entry into stable storage.
// Reference: section 5.3
func TestFollowerAppendEntries(t *testing.T) {
tests := []struct {
index, term uint64
ents []pb.Entry
wents []pb.Entry
wunstable []pb.Entry
}{
{
2, 2,
[]pb.Entry{{Term: 3, Index: 3}},
[]pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}, {Term: 3, Index: 3}},
[]pb.Entry{{Term: 3, Index: 3}},
},
{
1, 1,
[]pb.Entry{{Term: 3, Index: 2}, {Term: 4, Index: 3}},
[]pb.Entry{{Term: 1, Index: 1}, {Term: 3, Index: 2}, {Term: 4, Index: 3}},
[]pb.Entry{{Term: 3, Index: 2}, {Term: 4, Index: 3}},
},
{
0, 0,
[]pb.Entry{{Term: 1, Index: 1}},
[]pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}},
nil,
},
{
0, 0,
[]pb.Entry{{Term: 3, Index: 1}},
[]pb.Entry{{Term: 3, Index: 1}},
[]pb.Entry{{Term: 3, Index: 1}},
},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.Append([]pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}})
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, storage)
r.becomeFollower(2, 2)
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgApp, Term: 2, LogTerm: tt.term, Index: tt.index, Entries: tt.ents})
if g := r.raftLog.allEntries(); !reflect.DeepEqual(g, tt.wents) {
t.Errorf("#%d: ents = %+v, want %+v", i, g, tt.wents)
}
if g := r.raftLog.unstableEntries(); !reflect.DeepEqual(g, tt.wunstable) {
t.Errorf("#%d: unstableEnts = %+v, want %+v", i, g, tt.wunstable)
}
}
}
// TestLeaderSyncFollowerLog tests that the leader could bring a follower's log
// into consistency with its own.
// Reference: section 5.3, figure 7
func TestLeaderSyncFollowerLog(t *testing.T) {
ents := []pb.Entry{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 4, Index: 4}, {Term: 4, Index: 5},
{Term: 5, Index: 6}, {Term: 5, Index: 7},
{Term: 6, Index: 8}, {Term: 6, Index: 9}, {Term: 6, Index: 10},
}
term := uint64(8)
tests := [][]pb.Entry{
{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 4, Index: 4}, {Term: 4, Index: 5},
{Term: 5, Index: 6}, {Term: 5, Index: 7},
{Term: 6, Index: 8}, {Term: 6, Index: 9},
},
{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 4, Index: 4},
},
{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 4, Index: 4}, {Term: 4, Index: 5},
{Term: 5, Index: 6}, {Term: 5, Index: 7},
{Term: 6, Index: 8}, {Term: 6, Index: 9}, {Term: 6, Index: 10}, {Term: 6, Index: 11},
},
{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 4, Index: 4}, {Term: 4, Index: 5},
{Term: 5, Index: 6}, {Term: 5, Index: 7},
{Term: 6, Index: 8}, {Term: 6, Index: 9}, {Term: 6, Index: 10},
{Term: 7, Index: 11}, {Term: 7, Index: 12},
},
{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 4, Index: 4}, {Term: 4, Index: 5}, {Term: 4, Index: 6}, {Term: 4, Index: 7},
},
{
{},
{Term: 1, Index: 1}, {Term: 1, Index: 2}, {Term: 1, Index: 3},
{Term: 2, Index: 4}, {Term: 2, Index: 5}, {Term: 2, Index: 6},
{Term: 3, Index: 7}, {Term: 3, Index: 8}, {Term: 3, Index: 9}, {Term: 3, Index: 10}, {Term: 3, Index: 11},
},
}
for i, tt := range tests {
leadStorage := NewMemoryStorage()
leadStorage.Append(ents)
lead := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, leadStorage)
lead.loadState(pb.HardState{Commit: lead.raftLog.lastIndex(), Term: term})
followerStorage := NewMemoryStorage()
followerStorage.Append(tt)
follower := newTestRaft(2, []uint64{1, 2, 3}, 10, 1, followerStorage)
follower.loadState(pb.HardState{Term: term - 1})
// It is necessary to have a three-node cluster.
// The second may have a more up-to-date log than the first one, so the
// first node needs the vote from the third node to become the leader.
n := newNetwork(lead, follower, nopStepper)
n.send(pb.Message{From: 1, To: 1, Type: pb.MsgHup})
// The election occurs in the term after the one we loaded with
// lead.loadState above.
n.send(pb.Message{From: 3, To: 1, Type: pb.MsgVoteResp, Term: term + 1})
n.send(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{}}})
if g := diffu(ltoa(lead.raftLog), ltoa(follower.raftLog)); g != "" {
t.Errorf("#%d: log diff:\n%s", i, g)
}
}
}
// TestVoteRequest tests that the vote request includes information about the candidate's log
// and is sent to all of the other nodes.
// Reference: section 5.4.1
func TestVoteRequest(t *testing.T) {
tests := []struct {
ents []pb.Entry
wterm uint64
}{
{[]pb.Entry{{Term: 1, Index: 1}}, 2},
{[]pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}}, 3},
}
for j, tt := range tests {
r := newTestRaft(1, []uint64{1, 2, 3}, 10, 1, NewMemoryStorage())
r.Step(pb.Message{
From: 2, To: 1, Type: pb.MsgApp, Term: tt.wterm - 1, LogTerm: 0, Index: 0, Entries: tt.ents,
})
r.readMessages()
for i := 1; i < r.electionTimeout*2; i++ {
r.tickElection()
}
msgs := r.readMessages()
sort.Sort(messageSlice(msgs))
if len(msgs) != 2 {
t.Fatalf("#%d: len(msg) = %d, want %d", j, len(msgs), 2)
}
for i, m := range msgs {
if m.Type != pb.MsgVote {
t.Errorf("#%d: msgType = %d, want %d", i, m.Type, pb.MsgVote)
}
if m.To != uint64(i+2) {
t.Errorf("#%d: to = %d, want %d", i, m.To, i+2)
}
if m.Term != tt.wterm {
t.Errorf("#%d: term = %d, want %d", i, m.Term, tt.wterm)
}
windex, wlogterm := tt.ents[len(tt.ents)-1].Index, tt.ents[len(tt.ents)-1].Term
if m.Index != windex {
t.Errorf("#%d: index = %d, want %d", i, m.Index, windex)
}
if m.LogTerm != wlogterm {
t.Errorf("#%d: logterm = %d, want %d", i, m.LogTerm, wlogterm)
}
}
}
}
// TestVoter tests that the voter denies its vote if its own log is more up-to-date
// than that of the candidate.
// Reference: section 5.4.1
func TestVoter(t *testing.T) {
tests := []struct {
ents []pb.Entry
logterm uint64
index uint64
wreject bool
}{
// same logterm
{[]pb.Entry{{Term: 1, Index: 1}}, 1, 1, false},
{[]pb.Entry{{Term: 1, Index: 1}}, 1, 2, false},
{[]pb.Entry{{Term: 1, Index: 1}, {Term: 1, Index: 2}}, 1, 1, true},
// candidate higher logterm
{[]pb.Entry{{Term: 1, Index: 1}}, 2, 1, false},
{[]pb.Entry{{Term: 1, Index: 1}}, 2, 2, false},
{[]pb.Entry{{Term: 1, Index: 1}, {Term: 1, Index: 2}}, 2, 1, false},
// voter higher logterm
{[]pb.Entry{{Term: 2, Index: 1}}, 1, 1, true},
{[]pb.Entry{{Term: 2, Index: 1}}, 1, 2, true},
{[]pb.Entry{{Term: 2, Index: 1}, {Term: 1, Index: 2}}, 1, 1, true},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.Append(tt.ents)
r := newTestRaft(1, []uint64{1, 2}, 10, 1, storage)
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgVote, Term: 3, LogTerm: tt.logterm, Index: tt.index})
msgs := r.readMessages()
if len(msgs) != 1 {
t.Fatalf("#%d: len(msg) = %d, want %d", i, len(msgs), 1)
}
m := msgs[0]
if m.Type != pb.MsgVoteResp {
t.Errorf("#%d: msgType = %d, want %d", i, m.Type, pb.MsgVoteResp)
}
if m.Reject != tt.wreject {
t.Errorf("#%d: reject = %t, want %t", i, m.Reject, tt.wreject)
}
}
}
// TestLeaderOnlyCommitsLogFromCurrentTerm tests that only log entries from the leader's
// current term are committed by counting replicas.
// Reference: section 5.4.2
func TestLeaderOnlyCommitsLogFromCurrentTerm(t *testing.T) {
ents := []pb.Entry{{Term: 1, Index: 1}, {Term: 2, Index: 2}}
tests := []struct {
index uint64
wcommit uint64
}{
// do not commit log entries in previous terms
{1, 0},
{2, 0},
// commit log in current term
{3, 3},
}
for i, tt := range tests {
storage := NewMemoryStorage()
storage.Append(ents)
r := newTestRaft(1, []uint64{1, 2}, 10, 1, storage)
r.loadState(pb.HardState{Term: 2})
// become leader at term 3
r.becomeCandidate()
r.becomeLeader()
r.readMessages()
// propose an entry in the current term
r.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{}}})
r.Step(pb.Message{From: 2, To: 1, Type: pb.MsgAppResp, Term: r.Term, Index: tt.index})
if r.raftLog.committed != tt.wcommit {
t.Errorf("#%d: commit = %d, want %d", i, r.raftLog.committed, tt.wcommit)
}
}
}
type messageSlice []pb.Message
func (s messageSlice) Len() int { return len(s) }
func (s messageSlice) Less(i, j int) bool { return fmt.Sprint(s[i]) < fmt.Sprint(s[j]) }
func (s messageSlice) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
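// commitNoopEntry is a test helper that simulates every follower accepting the
// leader's initial empty entry so that it becomes committed, then persists the
// unstable entries and marks them stable and applied.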
func commitNoopEntry(r *raft, s *MemoryStorage) {
if r.state != StateLeader {
panic("it should only be used when it is the leader")
}
r.bcastAppend()
// simulate the response of MsgApp
msgs := r.readMessages()
for _, m := range msgs {
if m.Type != pb.MsgApp || len(m.Entries) != 1 || m.Entries[0].Data != nil {
panic("not a message to append noop entry")
}
r.Step(acceptAndReply(m))
}
// ignore further messages to refresh followers' commit index
r.readMessages()
s.Append(r.raftLog.unstableEntries())
r.raftLog.appliedTo(r.raftLog.committed)
r.raftLog.stableTo(r.raftLog.lastIndex(), r.raftLog.lastTerm())
}
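// acceptAndReply builds the MsgAppResp a follower would send back after accepting
// the given MsgApp.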
func acceptAndReply(m pb.Message) pb.Message {
if m.Type != pb.MsgApp {
panic("type should be MsgApp")
}
return pb.Message{
From: m.To,
To: m.From,
Term: m.Term,
Type: pb.MsgAppResp,
Index: m.Index + uint64(len(m.Entries)),
}
}

134
vendor/github.com/coreos/etcd/raft/raft_snap_test.go generated vendored Normal file
View file

@ -0,0 +1,134 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"testing"
pb "github.com/coreos/etcd/raft/raftpb"
)
var (
testingSnap = pb.Snapshot{
Metadata: pb.SnapshotMetadata{
Index: 11, // magic number
Term: 11, // magic number
ConfState: pb.ConfState{Nodes: []uint64{1, 2}},
},
}
)
func TestSendingSnapshotSetPendingSnapshot(t *testing.T) {
storage := NewMemoryStorage()
sm := newTestRaft(1, []uint64{1}, 10, 1, storage)
sm.restore(testingSnap)
sm.becomeCandidate()
sm.becomeLeader()
// force set the next index of node 2, so that
// node 2 needs a snapshot
sm.prs[2].Next = sm.raftLog.firstIndex()
sm.Step(pb.Message{From: 2, To: 1, Type: pb.MsgAppResp, Index: sm.prs[2].Next - 1, Reject: true})
if sm.prs[2].PendingSnapshot != 11 {
t.Fatalf("PendingSnapshot = %d, want 11", sm.prs[2].PendingSnapshot)
}
}
func TestPendingSnapshotPauseReplication(t *testing.T) {
storage := NewMemoryStorage()
sm := newTestRaft(1, []uint64{1, 2}, 10, 1, storage)
sm.restore(testingSnap)
sm.becomeCandidate()
sm.becomeLeader()
sm.prs[2].becomeSnapshot(11)
sm.Step(pb.Message{From: 1, To: 1, Type: pb.MsgProp, Entries: []pb.Entry{{Data: []byte("somedata")}}})
msgs := sm.readMessages()
if len(msgs) != 0 {
t.Fatalf("len(msgs) = %d, want 0", len(msgs))
}
}
func TestSnapshotFailure(t *testing.T) {
storage := NewMemoryStorage()
sm := newTestRaft(1, []uint64{1, 2}, 10, 1, storage)
sm.restore(testingSnap)
sm.becomeCandidate()
sm.becomeLeader()
sm.prs[2].Next = 1
sm.prs[2].becomeSnapshot(11)
sm.Step(pb.Message{From: 2, To: 1, Type: pb.MsgSnapStatus, Reject: true})
if sm.prs[2].PendingSnapshot != 0 {
t.Fatalf("PendingSnapshot = %d, want 0", sm.prs[2].PendingSnapshot)
}
if sm.prs[2].Next != 1 {
t.Fatalf("Next = %d, want 1", sm.prs[2].Next)
}
if !sm.prs[2].Paused {
t.Errorf("Paused = %v, want true", sm.prs[2].Paused)
}
}
func TestSnapshotSucceed(t *testing.T) {
storage := NewMemoryStorage()
sm := newTestRaft(1, []uint64{1, 2}, 10, 1, storage)
sm.restore(testingSnap)
sm.becomeCandidate()
sm.becomeLeader()
sm.prs[2].Next = 1
sm.prs[2].becomeSnapshot(11)
sm.Step(pb.Message{From: 2, To: 1, Type: pb.MsgSnapStatus, Reject: false})
if sm.prs[2].PendingSnapshot != 0 {
t.Fatalf("PendingSnapshot = %d, want 0", sm.prs[2].PendingSnapshot)
}
if sm.prs[2].Next != 12 {
t.Fatalf("Next = %d, want 12", sm.prs[2].Next)
}
if !sm.prs[2].Paused {
t.Errorf("Paused = %v, want true", sm.prs[2].Paused)
}
}
func TestSnapshotAbort(t *testing.T) {
storage := NewMemoryStorage()
sm := newTestRaft(1, []uint64{1, 2}, 10, 1, storage)
sm.restore(testingSnap)
sm.becomeCandidate()
sm.becomeLeader()
sm.prs[2].Next = 1
sm.prs[2].becomeSnapshot(11)
// A successful msgAppResp with an index higher than or equal to the
// pending snapshot should abort the pending snapshot.
sm.Step(pb.Message{From: 2, To: 1, Type: pb.MsgAppResp, Index: 11})
if sm.prs[2].PendingSnapshot != 0 {
t.Fatalf("PendingSnapshot = %d, want 0", sm.prs[2].PendingSnapshot)
}
if sm.prs[2].Next != 12 {
t.Fatalf("Next = %d, want 12", sm.prs[2].Next)
}
}

3246
vendor/github.com/coreos/etcd/raft/raft_test.go generated vendored Normal file

File diff suppressed because it is too large Load diff

1900
vendor/github.com/coreos/etcd/raft/raftpb/raft.pb.go generated vendored Normal file

File diff suppressed because it is too large Load diff

93
vendor/github.com/coreos/etcd/raft/raftpb/raft.proto generated vendored Normal file
View file

@ -0,0 +1,93 @@
syntax = "proto2";
package raftpb;
import "gogoproto/gogo.proto";
option (gogoproto.marshaler_all) = true;
option (gogoproto.sizer_all) = true;
option (gogoproto.unmarshaler_all) = true;
option (gogoproto.goproto_getters_all) = false;
option (gogoproto.goproto_enum_prefix_all) = false;
enum EntryType {
EntryNormal = 0;
EntryConfChange = 1;
}
message Entry {
optional uint64 Term = 2 [(gogoproto.nullable) = false]; // must be 64-bit aligned for atomic operations
optional uint64 Index = 3 [(gogoproto.nullable) = false]; // must be 64-bit aligned for atomic operations
optional EntryType Type = 1 [(gogoproto.nullable) = false];
optional bytes Data = 4;
}
message SnapshotMetadata {
optional ConfState conf_state = 1 [(gogoproto.nullable) = false];
optional uint64 index = 2 [(gogoproto.nullable) = false];
optional uint64 term = 3 [(gogoproto.nullable) = false];
}
message Snapshot {
optional bytes data = 1;
optional SnapshotMetadata metadata = 2 [(gogoproto.nullable) = false];
}
enum MessageType {
MsgHup = 0;
MsgBeat = 1;
MsgProp = 2;
MsgApp = 3;
MsgAppResp = 4;
MsgVote = 5;
MsgVoteResp = 6;
MsgSnap = 7;
MsgHeartbeat = 8;
MsgHeartbeatResp = 9;
MsgUnreachable = 10;
MsgSnapStatus = 11;
MsgCheckQuorum = 12;
MsgTransferLeader = 13;
MsgTimeoutNow = 14;
MsgReadIndex = 15;
MsgReadIndexResp = 16;
MsgPreVote = 17;
MsgPreVoteResp = 18;
}
message Message {
optional MessageType type = 1 [(gogoproto.nullable) = false];
optional uint64 to = 2 [(gogoproto.nullable) = false];
optional uint64 from = 3 [(gogoproto.nullable) = false];
optional uint64 term = 4 [(gogoproto.nullable) = false];
optional uint64 logTerm = 5 [(gogoproto.nullable) = false];
optional uint64 index = 6 [(gogoproto.nullable) = false];
repeated Entry entries = 7 [(gogoproto.nullable) = false];
optional uint64 commit = 8 [(gogoproto.nullable) = false];
optional Snapshot snapshot = 9 [(gogoproto.nullable) = false];
optional bool reject = 10 [(gogoproto.nullable) = false];
optional uint64 rejectHint = 11 [(gogoproto.nullable) = false];
optional bytes context = 12;
}
message HardState {
optional uint64 term = 1 [(gogoproto.nullable) = false];
optional uint64 vote = 2 [(gogoproto.nullable) = false];
optional uint64 commit = 3 [(gogoproto.nullable) = false];
}
message ConfState {
repeated uint64 nodes = 1;
}
enum ConfChangeType {
ConfChangeAddNode = 0;
ConfChangeRemoveNode = 1;
ConfChangeUpdateNode = 2;
}
message ConfChange {
optional uint64 ID = 1 [(gogoproto.nullable) = false];
optional ConfChangeType Type = 2 [(gogoproto.nullable) = false];
optional uint64 NodeID = 3 [(gogoproto.nullable) = false];
optional bytes Context = 4;
}

16
vendor/github.com/coreos/etcd/raft/rafttest/doc.go generated vendored Normal file
View file

@ -0,0 +1,16 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
// Package rafttest provides functional tests for etcd's raft implementation.
package rafttest

183
vendor/github.com/coreos/etcd/raft/rafttest/network.go generated vendored Normal file
View file

@ -0,0 +1,183 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package rafttest
import (
"math/rand"
"sync"
"time"
"github.com/coreos/etcd/raft/raftpb"
)
// a network interface
type iface interface {
send(m raftpb.Message)
recv() chan raftpb.Message
disconnect()
connect()
}
// a network
type network interface {
// drop messages at the given rate (1.0 drops all messages)
drop(from, to uint64, rate float64)
// delay messages for (0, d] randomly at the given rate (1.0 delays all messages)
// do we need rate here?
delay(from, to uint64, d time.Duration, rate float64)
disconnect(id uint64)
connect(id uint64)
// heal heals the network
heal()
}
type raftNetwork struct {
mu sync.Mutex
disconnected map[uint64]bool
dropmap map[conn]float64
delaymap map[conn]delay
recvQueues map[uint64]chan raftpb.Message
}
type conn struct {
from, to uint64
}
type delay struct {
d time.Duration
rate float64
}
func newRaftNetwork(nodes ...uint64) *raftNetwork {
pn := &raftNetwork{
recvQueues: make(map[uint64]chan raftpb.Message),
dropmap: make(map[conn]float64),
delaymap: make(map[conn]delay),
disconnected: make(map[uint64]bool),
}
for _, n := range nodes {
pn.recvQueues[n] = make(chan raftpb.Message, 1024)
}
return pn
}
func (rn *raftNetwork) nodeNetwork(id uint64) iface {
return &nodeNetwork{id: id, raftNetwork: rn}
}
func (rn *raftNetwork) send(m raftpb.Message) {
rn.mu.Lock()
to := rn.recvQueues[m.To]
if rn.disconnected[m.To] {
to = nil
}
drop := rn.dropmap[conn{m.From, m.To}]
dl := rn.delaymap[conn{m.From, m.To}]
rn.mu.Unlock()
if to == nil {
return
}
if drop != 0 && rand.Float64() < drop {
return
}
// TODO: shall we dl without blocking the send call?
if dl.d != 0 && rand.Float64() < dl.rate {
rd := rand.Int63n(int64(dl.d))
time.Sleep(time.Duration(rd))
}
// use marshal/unmarshal to copy message to avoid data race.
b, err := m.Marshal()
if err != nil {
panic(err)
}
var cm raftpb.Message
err = cm.Unmarshal(b)
if err != nil {
panic(err)
}
select {
case to <- cm:
default:
// drop messages when the receiver queue is full.
}
}
func (rn *raftNetwork) recvFrom(from uint64) chan raftpb.Message {
rn.mu.Lock()
fromc := rn.recvQueues[from]
if rn.disconnected[from] {
fromc = nil
}
rn.mu.Unlock()
return fromc
}
func (rn *raftNetwork) drop(from, to uint64, rate float64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.dropmap[conn{from, to}] = rate
}
func (rn *raftNetwork) delay(from, to uint64, d time.Duration, rate float64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.delaymap[conn{from, to}] = delay{d, rate}
}
func (rn *raftNetwork) heal() {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.dropmap = make(map[conn]float64)
rn.delaymap = make(map[conn]delay)
}
func (rn *raftNetwork) disconnect(id uint64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.disconnected[id] = true
}
func (rn *raftNetwork) connect(id uint64) {
rn.mu.Lock()
defer rn.mu.Unlock()
rn.disconnected[id] = false
}
type nodeNetwork struct {
id uint64
*raftNetwork
}
func (nt *nodeNetwork) connect() {
nt.raftNetwork.connect(nt.id)
}
func (nt *nodeNetwork) disconnect() {
nt.raftNetwork.disconnect(nt.id)
}
func (nt *nodeNetwork) send(m raftpb.Message) {
nt.raftNetwork.send(m)
}
func (nt *nodeNetwork) recv() chan raftpb.Message {
return nt.recvFrom(nt.id)
}

View file

@ -0,0 +1,72 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package rafttest
import (
"testing"
"time"
"github.com/coreos/etcd/raft/raftpb"
)
func TestNetworkDrop(t *testing.T) {
// drop around 10% of messages
sent := 1000
droprate := 0.1
nt := newRaftNetwork(1, 2)
nt.drop(1, 2, droprate)
for i := 0; i < sent; i++ {
nt.send(raftpb.Message{From: 1, To: 2})
}
c := nt.recvFrom(2)
received := 0
done := false
for !done {
select {
case <-c:
received++
default:
done = true
}
}
drop := sent - received
if drop > int((droprate+0.1)*float64(sent)) || drop < int((droprate-0.1)*float64(sent)) {
t.Errorf("drop = %d, want around %.2f", drop, droprate*float64(sent))
}
}
func TestNetworkDelay(t *testing.T) {
sent := 1000
delay := time.Millisecond
delayrate := 0.1
nt := newRaftNetwork(1, 2)
nt.delay(1, 2, delay, delayrate)
var total time.Duration
for i := 0; i < sent; i++ {
s := time.Now()
nt.send(raftpb.Message{From: 1, To: 2})
total += time.Since(s)
}
w := time.Duration(float64(sent)*delayrate/2) * delay
// there is some overhead in the send call since it generates random numbers.
if total < w {
t.Errorf("total = %v, want > %v", total, w)
}
}

150
vendor/github.com/coreos/etcd/raft/rafttest/node.go generated vendored Normal file
View file

@ -0,0 +1,150 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package rafttest
import (
"log"
"sync"
"time"
"github.com/coreos/etcd/raft"
"github.com/coreos/etcd/raft/raftpb"
"golang.org/x/net/context"
)
type node struct {
raft.Node
id uint64
iface iface
stopc chan struct{}
pausec chan bool
// stable
storage *raft.MemoryStorage
mu sync.Mutex // guards state
state raftpb.HardState
}
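// startNode creates a raft node with the given id and peers backed by an
// in-memory storage, wires it to the provided network interface, and starts
// its run loop.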
func startNode(id uint64, peers []raft.Peer, iface iface) *node {
st := raft.NewMemoryStorage()
c := &raft.Config{
ID: id,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: st,
MaxSizePerMsg: 1024 * 1024,
MaxInflightMsgs: 256,
}
rn := raft.StartNode(c, peers)
n := &node{
Node: rn,
id: id,
storage: st,
iface: iface,
pausec: make(chan bool),
}
n.start()
return n
}
func (n *node) start() {
n.stopc = make(chan struct{})
ticker := time.Tick(5 * time.Millisecond)
go func() {
for {
select {
case <-ticker:
n.Tick()
case rd := <-n.Ready():
if !raft.IsEmptyHardState(rd.HardState) {
n.mu.Lock()
n.state = rd.HardState
n.mu.Unlock()
n.storage.SetHardState(n.state)
}
n.storage.Append(rd.Entries)
time.Sleep(time.Millisecond)
// TODO: make send async, more like real world...
for _, m := range rd.Messages {
n.iface.send(m)
}
n.Advance()
case m := <-n.iface.recv():
go n.Step(context.TODO(), m)
case <-n.stopc:
n.Stop()
log.Printf("raft.%d: stop", n.id)
n.Node = nil
close(n.stopc)
return
case p := <-n.pausec:
recvms := make([]raftpb.Message, 0)
for p {
select {
case m := <-n.iface.recv():
recvms = append(recvms, m)
case p = <-n.pausec:
}
}
// step all pending messages
for _, m := range recvms {
n.Step(context.TODO(), m)
}
}
}
}()
}
// stop stops the node. Stopping a stopped node might panic.
// All in-memory state of the node is discarded.
// All stable state MUST be unchanged.
func (n *node) stop() {
n.iface.disconnect()
n.stopc <- struct{}{}
// wait for the shutdown
<-n.stopc
}
// restart restarts the node. Restarting an already started node
// blocks and might affect the future stop operation.
func (n *node) restart() {
// wait for the shutdown
<-n.stopc
c := &raft.Config{
ID: n.id,
ElectionTick: 10,
HeartbeatTick: 1,
Storage: n.storage,
MaxSizePerMsg: 1024 * 1024,
MaxInflightMsgs: 256,
}
n.Node = raft.RestartNode(c)
n.start()
n.iface.connect()
}
// pause pauses the node.
// The paused node buffers the received messages and processes
// all of them when it resumes.
func (n *node) pause() {
n.pausec <- true
}
// resume resumes the paused node.
func (n *node) resume() {
n.pausec <- false
}

View file

@ -0,0 +1,53 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package rafttest
import (
"testing"
"time"
"github.com/coreos/etcd/raft"
"golang.org/x/net/context"
)
func BenchmarkProposal3Nodes(b *testing.B) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}}
nt := newRaftNetwork(1, 2, 3)
nodes := make([]*node, 0)
for i := 1; i <= 3; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
// get ready and warm up
time.Sleep(50 * time.Millisecond)
b.ResetTimer()
for i := 0; i < b.N; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
for _, n := range nodes {
if n.state.Commit != uint64(b.N+4) {
continue
}
}
b.StopTimer()
for _, n := range nodes {
n.stop()
}
}

View file

@ -0,0 +1,175 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package rafttest
import (
"testing"
"time"
"github.com/coreos/etcd/raft"
"golang.org/x/net/context"
)
func TestBasicProgress(t *testing.T) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}, {4, nil}, {5, nil}}
nt := newRaftNetwork(1, 2, 3, 4, 5)
nodes := make([]*node, 0)
for i := 1; i <= 5; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
waitLeader(nodes)
for i := 0; i < 100; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
if !waitCommitConverge(nodes, 100) {
t.Errorf("commits failed to converge!")
}
for _, n := range nodes {
n.stop()
}
}
func TestRestart(t *testing.T) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}, {4, nil}, {5, nil}}
nt := newRaftNetwork(1, 2, 3, 4, 5)
nodes := make([]*node, 0)
for i := 1; i <= 5; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
l := waitLeader(nodes)
k1, k2 := (l+1)%5, (l+2)%5
for i := 0; i < 30; i++ {
nodes[l].Propose(context.TODO(), []byte("somedata"))
}
nodes[k1].stop()
for i := 0; i < 30; i++ {
nodes[(l+3)%5].Propose(context.TODO(), []byte("somedata"))
}
nodes[k2].stop()
for i := 0; i < 30; i++ {
nodes[(l+4)%5].Propose(context.TODO(), []byte("somedata"))
}
nodes[k2].restart()
for i := 0; i < 30; i++ {
nodes[l].Propose(context.TODO(), []byte("somedata"))
}
nodes[k1].restart()
if !waitCommitConverge(nodes, 120) {
t.Errorf("commits failed to converge!")
}
for _, n := range nodes {
n.stop()
}
}
func TestPause(t *testing.T) {
peers := []raft.Peer{{1, nil}, {2, nil}, {3, nil}, {4, nil}, {5, nil}}
nt := newRaftNetwork(1, 2, 3, 4, 5)
nodes := make([]*node, 0)
for i := 1; i <= 5; i++ {
n := startNode(uint64(i), peers, nt.nodeNetwork(uint64(i)))
nodes = append(nodes, n)
}
waitLeader(nodes)
for i := 0; i < 30; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[1].pause()
for i := 0; i < 30; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[2].pause()
for i := 0; i < 30; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[2].resume()
for i := 0; i < 30; i++ {
nodes[0].Propose(context.TODO(), []byte("somedata"))
}
nodes[1].resume()
if !waitCommitConverge(nodes, 120) {
t.Errorf("commits failed to converge!")
}
for _, n := range nodes {
n.stop()
}
}
func waitLeader(ns []*node) int {
var l map[uint64]struct{}
var lindex int
for {
l = make(map[uint64]struct{})
for i, n := range ns {
lead := n.Status().SoftState.Lead
if lead != 0 {
l[lead] = struct{}{}
if n.id == lead {
lindex = i
}
}
}
if len(l) == 1 {
return lindex
}
}
}
func waitCommitConverge(ns []*node, target uint64) bool {
var c map[uint64]struct{}
for i := 0; i < 50; i++ {
c = make(map[uint64]struct{})
var good int
for _, n := range ns {
commit := n.Node.Status().HardState.Commit
c[commit] = struct{}{}
if commit > target {
good++
}
}
if len(c) == 1 && good == len(ns) {
return true
}
time.Sleep(100 * time.Millisecond)
}
return false
}

264
vendor/github.com/coreos/etcd/raft/rawnode.go generated vendored Normal file
View file

@ -0,0 +1,264 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"errors"
pb "github.com/coreos/etcd/raft/raftpb"
)
// ErrStepLocalMsg is returned when trying to step a local raft message.
var ErrStepLocalMsg = errors.New("raft: cannot step raft local message")
// ErrStepPeerNotFound is returned when trying to step a response message
// but no peer can be found in raft.prs for that node.
var ErrStepPeerNotFound = errors.New("raft: cannot step as peer not found")
// RawNode is a thread-unsafe Node.
// The methods of this struct correspond to the methods of Node and are described
// more fully there.
type RawNode struct {
raft *raft
prevSoftSt *SoftState
prevHardSt pb.HardState
}
func (rn *RawNode) newReady() Ready {
return newReady(rn.raft, rn.prevSoftSt, rn.prevHardSt)
}
func (rn *RawNode) commitReady(rd Ready) {
if rd.SoftState != nil {
rn.prevSoftSt = rd.SoftState
}
if !IsEmptyHardState(rd.HardState) {
rn.prevHardSt = rd.HardState
}
if rn.prevHardSt.Commit != 0 {
// In most cases, prevHardSt and rd.HardState will be the same
// because when there are new entries to apply we just sent a
// HardState with an updated Commit value. However, on initial
// startup the two are different because we don't send a HardState
// until something changes, but we do send any un-applied but
// committed entries (and previously-committed entries may be
// incorporated into the snapshot, even if rd.CommittedEntries is
// empty). Therefore we mark all committed entries as applied
// whether they were included in rd.HardState or not.
rn.raft.raftLog.appliedTo(rn.prevHardSt.Commit)
}
if len(rd.Entries) > 0 {
e := rd.Entries[len(rd.Entries)-1]
rn.raft.raftLog.stableTo(e.Index, e.Term)
}
if !IsEmptySnap(rd.Snapshot) {
rn.raft.raftLog.stableSnapTo(rd.Snapshot.Metadata.Index)
}
if len(rd.ReadStates) != 0 {
rn.raft.readStates = nil
}
}
// NewRawNode returns a new RawNode given configuration and a list of raft peers.
func NewRawNode(config *Config, peers []Peer) (*RawNode, error) {
if config.ID == 0 {
panic("config.ID must not be zero")
}
r := newRaft(config)
rn := &RawNode{
raft: r,
}
lastIndex, err := config.Storage.LastIndex()
if err != nil {
panic(err) // TODO(bdarnell)
}
// If the log is empty, this is a new RawNode (like StartNode); otherwise it's
// restoring an existing RawNode (like RestartNode).
// TODO(bdarnell): rethink RawNode initialization and whether the application needs
// to be able to tell us when it expects the RawNode to exist.
if lastIndex == 0 {
r.becomeFollower(1, None)
ents := make([]pb.Entry, len(peers))
for i, peer := range peers {
cc := pb.ConfChange{Type: pb.ConfChangeAddNode, NodeID: peer.ID, Context: peer.Context}
data, err := cc.Marshal()
if err != nil {
panic("unexpected marshal error")
}
ents[i] = pb.Entry{Type: pb.EntryConfChange, Term: 1, Index: uint64(i + 1), Data: data}
}
r.raftLog.append(ents...)
r.raftLog.committed = uint64(len(ents))
for _, peer := range peers {
r.addNode(peer.ID)
}
}
// Set the initial hard and soft states after performing all initialization.
rn.prevSoftSt = r.softState()
if lastIndex == 0 {
rn.prevHardSt = emptyState
} else {
rn.prevHardSt = r.hardState()
}
return rn, nil
}
// Tick advances the internal logical clock by a single tick.
func (rn *RawNode) Tick() {
rn.raft.tick()
}
// TickQuiesced advances the internal logical clock by a single tick without
// performing any other state machine processing. It allows the caller to avoid
// periodic heartbeats and elections when all of the peers in a Raft group are
// known to be at the same state. Expected usage is to periodically invoke Tick
// or TickQuiesced depending on whether the group is "active" or "quiesced".
//
// WARNING: Be very careful about using this method as it subverts the Raft
// state machine. You should probably be using Tick instead.
func (rn *RawNode) TickQuiesced() {
rn.raft.electionElapsed++
}
// Campaign causes this RawNode to transition to candidate state.
func (rn *RawNode) Campaign() error {
return rn.raft.Step(pb.Message{
Type: pb.MsgHup,
})
}
// Propose proposes data be appended to the raft log.
func (rn *RawNode) Propose(data []byte) error {
return rn.raft.Step(pb.Message{
Type: pb.MsgProp,
From: rn.raft.id,
Entries: []pb.Entry{
{Data: data},
}})
}
// ProposeConfChange proposes a config change.
func (rn *RawNode) ProposeConfChange(cc pb.ConfChange) error {
data, err := cc.Marshal()
if err != nil {
return err
}
return rn.raft.Step(pb.Message{
Type: pb.MsgProp,
Entries: []pb.Entry{
{Type: pb.EntryConfChange, Data: data},
},
})
}
// ApplyConfChange applies a config change to the local node.
func (rn *RawNode) ApplyConfChange(cc pb.ConfChange) *pb.ConfState {
if cc.NodeID == None {
rn.raft.resetPendingConf()
return &pb.ConfState{Nodes: rn.raft.nodes()}
}
switch cc.Type {
case pb.ConfChangeAddNode:
rn.raft.addNode(cc.NodeID)
case pb.ConfChangeRemoveNode:
rn.raft.removeNode(cc.NodeID)
case pb.ConfChangeUpdateNode:
rn.raft.resetPendingConf()
default:
panic("unexpected conf type")
}
return &pb.ConfState{Nodes: rn.raft.nodes()}
}
// Step advances the state machine using the given message.
func (rn *RawNode) Step(m pb.Message) error {
// ignore unexpected local messages received over the network
if IsLocalMsg(m.Type) {
return ErrStepLocalMsg
}
if _, ok := rn.raft.prs[m.From]; ok || !IsResponseMsg(m.Type) {
return rn.raft.Step(m)
}
return ErrStepPeerNotFound
}
// Ready returns the current point-in-time state of this RawNode.
func (rn *RawNode) Ready() Ready {
rd := rn.newReady()
rn.raft.msgs = nil
return rd
}
// HasReady is called when the RawNode user needs to check if there is any Ready pending.
// Checking logic in this method should be consistent with Ready.containsUpdates().
func (rn *RawNode) HasReady() bool {
r := rn.raft
if !r.softState().equal(rn.prevSoftSt) {
return true
}
if hardSt := r.hardState(); !IsEmptyHardState(hardSt) && !isHardStateEqual(hardSt, rn.prevHardSt) {
return true
}
if r.raftLog.unstable.snapshot != nil && !IsEmptySnap(*r.raftLog.unstable.snapshot) {
return true
}
if len(r.msgs) > 0 || len(r.raftLog.unstableEntries()) > 0 || r.raftLog.hasNextEnts() {
return true
}
if len(r.readStates) != 0 {
return true
}
return false
}
// Advance notifies the RawNode that the application has applied and saved progress in the
// last Ready results.
func (rn *RawNode) Advance(rd Ready) {
rn.commitReady(rd)
}
// Status returns the current status of the given group.
func (rn *RawNode) Status() *Status {
status := getStatus(rn.raft)
return &status
}
// ReportUnreachable reports the given node is not reachable for the last send.
func (rn *RawNode) ReportUnreachable(id uint64) {
_ = rn.raft.Step(pb.Message{Type: pb.MsgUnreachable, From: id})
}
// ReportSnapshot reports the status of the sent snapshot.
func (rn *RawNode) ReportSnapshot(id uint64, status SnapshotStatus) {
rej := status == SnapshotFailure
_ = rn.raft.Step(pb.Message{Type: pb.MsgSnapStatus, From: id, Reject: rej})
}
// TransferLeader tries to transfer leadership to the given transferee.
func (rn *RawNode) TransferLeader(transferee uint64) {
_ = rn.raft.Step(pb.Message{Type: pb.MsgTransferLeader, From: transferee})
}
// ReadIndex requests a read state. The read state will be set in ready.
// Read State has a read index. Once the application advances further than the read
// index, any linearizable read requests issued before the read request can be
// processed safely. The read state will have the same rctx attached.
func (rn *RawNode) ReadIndex(rctx []byte) {
_ = rn.raft.Step(pb.Message{Type: pb.MsgReadIndex, Entries: []pb.Entry{{Data: rctx}}})
}

398
vendor/github.com/coreos/etcd/raft/rawnode_test.go generated vendored Normal file
View file

@ -0,0 +1,398 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"bytes"
"reflect"
"testing"
"github.com/coreos/etcd/raft/raftpb"
)
// TestRawNodeStep ensures that RawNode.Step ignores local messages.
func TestRawNodeStep(t *testing.T) {
for i, msgn := range raftpb.MessageType_name {
s := NewMemoryStorage()
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, s), []Peer{{ID: 1}})
if err != nil {
t.Fatal(err)
}
msgt := raftpb.MessageType(i)
err = rawNode.Step(raftpb.Message{Type: msgt})
// LocalMsg should be ignored.
if IsLocalMsg(msgt) {
if err != ErrStepLocalMsg {
t.Errorf("%d: step should ignore %s", msgt, msgn)
}
}
}
}
// TestNodeStepUnblock from node_test.go has no equivalent in rawNode because there is
// no goroutine in RawNode.
// TestRawNodeProposeAndConfChange ensures that RawNode.Propose and RawNode.ProposeConfChange
// send the given proposal and ConfChange to the underlying raft.
func TestRawNodeProposeAndConfChange(t *testing.T) {
s := NewMemoryStorage()
var err error
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, s), []Peer{{ID: 1}})
if err != nil {
t.Fatal(err)
}
rd := rawNode.Ready()
s.Append(rd.Entries)
rawNode.Advance(rd)
rawNode.Campaign()
proposed := false
var (
lastIndex uint64
ccdata []byte
)
for {
rd = rawNode.Ready()
s.Append(rd.Entries)
// Once we are the leader, propose a command and a ConfChange.
if !proposed && rd.SoftState.Lead == rawNode.raft.id {
rawNode.Propose([]byte("somedata"))
cc := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 1}
ccdata, err = cc.Marshal()
if err != nil {
t.Fatal(err)
}
rawNode.ProposeConfChange(cc)
proposed = true
}
rawNode.Advance(rd)
// Exit when we have four entries: one ConfChange, one no-op for the election,
// our proposed command and proposed ConfChange.
lastIndex, err = s.LastIndex()
if err != nil {
t.Fatal(err)
}
if lastIndex >= 4 {
break
}
}
entries, err := s.Entries(lastIndex-1, lastIndex+1, noLimit)
if err != nil {
t.Fatal(err)
}
if len(entries) != 2 {
t.Fatalf("len(entries) = %d, want %d", len(entries), 2)
}
if !bytes.Equal(entries[0].Data, []byte("somedata")) {
t.Errorf("entries[0].Data = %v, want %v", entries[0].Data, []byte("somedata"))
}
if entries[1].Type != raftpb.EntryConfChange {
t.Fatalf("type = %v, want %v", entries[1].Type, raftpb.EntryConfChange)
}
if !bytes.Equal(entries[1].Data, ccdata) {
t.Errorf("data = %v, want %v", entries[1].Data, ccdata)
}
}
// TestRawNodeProposeAddDuplicateNode ensures that proposing the same node twice
// does not affect a later proposal to add a new node.
func TestRawNodeProposeAddDuplicateNode(t *testing.T) {
s := NewMemoryStorage()
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, s), []Peer{{ID: 1}})
if err != nil {
t.Fatal(err)
}
rd := rawNode.Ready()
s.Append(rd.Entries)
rawNode.Advance(rd)
rawNode.Campaign()
for {
rd = rawNode.Ready()
s.Append(rd.Entries)
if rd.SoftState.Lead == rawNode.raft.id {
rawNode.Advance(rd)
break
}
rawNode.Advance(rd)
}
proposeConfChangeAndApply := func(cc raftpb.ConfChange) {
rawNode.ProposeConfChange(cc)
rd = rawNode.Ready()
s.Append(rd.Entries)
for _, entry := range rd.CommittedEntries {
if entry.Type == raftpb.EntryConfChange {
var cc raftpb.ConfChange
cc.Unmarshal(entry.Data)
rawNode.ApplyConfChange(cc)
}
}
rawNode.Advance(rd)
}
cc1 := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 1}
ccdata1, err := cc1.Marshal()
if err != nil {
t.Fatal(err)
}
proposeConfChangeAndApply(cc1)
// try to add the same node again
proposeConfChangeAndApply(cc1)
// the new node join should be ok
cc2 := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 2}
ccdata2, err := cc2.Marshal()
if err != nil {
t.Fatal(err)
}
proposeConfChangeAndApply(cc2)
lastIndex, err := s.LastIndex()
if err != nil {
t.Fatal(err)
}
// the last three entries should be: ConfChange cc1, cc1, cc2
entries, err := s.Entries(lastIndex-2, lastIndex+1, noLimit)
if err != nil {
t.Fatal(err)
}
if len(entries) != 3 {
t.Fatalf("len(entries) = %d, want %d", len(entries), 3)
}
if !bytes.Equal(entries[0].Data, ccdata1) {
t.Errorf("entries[0].Data = %v, want %v", entries[0].Data, ccdata1)
}
if !bytes.Equal(entries[2].Data, ccdata2) {
t.Errorf("entries[2].Data = %v, want %v", entries[2].Data, ccdata2)
}
}
// TestRawNodeReadIndex ensures that RawNode.ReadIndex sends the MsgReadIndex message
// to the underlying raft. It also ensures that ReadState can be read out.
func TestRawNodeReadIndex(t *testing.T) {
msgs := []raftpb.Message{}
appendStep := func(r *raft, m raftpb.Message) {
msgs = append(msgs, m)
}
wrs := []ReadState{{Index: uint64(1), RequestCtx: []byte("somedata")}}
s := NewMemoryStorage()
c := newTestConfig(1, nil, 10, 1, s)
rawNode, err := NewRawNode(c, []Peer{{ID: 1}})
if err != nil {
t.Fatal(err)
}
rawNode.raft.readStates = wrs
// ensure the ReadStates can be read out
hasReady := rawNode.HasReady()
if !hasReady {
t.Errorf("HasReady() returns %t, want %t", hasReady, true)
}
rd := rawNode.Ready()
if !reflect.DeepEqual(rd.ReadStates, wrs) {
t.Errorf("ReadStates = %d, want %d", rd.ReadStates, wrs)
}
s.Append(rd.Entries)
rawNode.Advance(rd)
// ensure raft.readStates is reset after advance
if rawNode.raft.readStates != nil {
t.Errorf("readStates = %v, want %v", rawNode.raft.readStates, nil)
}
wrequestCtx := []byte("somedata2")
rawNode.Campaign()
for {
rd = rawNode.Ready()
s.Append(rd.Entries)
if rd.SoftState.Lead == rawNode.raft.id {
rawNode.Advance(rd)
// Once we are the leader, issue a ReadIndex request
rawNode.raft.step = appendStep
rawNode.ReadIndex(wrequestCtx)
break
}
rawNode.Advance(rd)
}
// ensure that MsgReadIndex message is sent to the underlying raft
if len(msgs) != 1 {
t.Fatalf("len(msgs) = %d, want %d", len(msgs), 1)
}
if msgs[0].Type != raftpb.MsgReadIndex {
t.Errorf("msg type = %d, want %d", msgs[0].Type, raftpb.MsgReadIndex)
}
if !bytes.Equal(msgs[0].Entries[0].Data, wrequestCtx) {
t.Errorf("data = %v, want %v", msgs[0].Entries[0].Data, wrequestCtx)
}
}
// TestBlockProposal from node_test.go has no equivalent in rawNode because there is
// no leader check in RawNode.
// TestNodeTick from node_test.go has no equivalent in rawNode because
// it reaches into the raft object which is not exposed.
// TestNodeStop from node_test.go has no equivalent in rawNode because there is
// no goroutine in RawNode.
// TestRawNodeStart ensures that a node can be started correctly. The node should
// start with correct configuration change entries, and can accept and commit
// proposals.
func TestRawNodeStart(t *testing.T) {
cc := raftpb.ConfChange{Type: raftpb.ConfChangeAddNode, NodeID: 1}
ccdata, err := cc.Marshal()
if err != nil {
t.Fatalf("unexpected marshal error: %v", err)
}
wants := []Ready{
{
HardState: raftpb.HardState{Term: 1, Commit: 1, Vote: 0},
Entries: []raftpb.Entry{
{Type: raftpb.EntryConfChange, Term: 1, Index: 1, Data: ccdata},
},
CommittedEntries: []raftpb.Entry{
{Type: raftpb.EntryConfChange, Term: 1, Index: 1, Data: ccdata},
},
},
{
HardState: raftpb.HardState{Term: 2, Commit: 3, Vote: 1},
Entries: []raftpb.Entry{{Term: 2, Index: 3, Data: []byte("foo")}},
CommittedEntries: []raftpb.Entry{{Term: 2, Index: 3, Data: []byte("foo")}},
},
}
storage := NewMemoryStorage()
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, storage), []Peer{{ID: 1}})
if err != nil {
t.Fatal(err)
}
rd := rawNode.Ready()
t.Logf("rd %v", rd)
if !reflect.DeepEqual(rd, wants[0]) {
t.Fatalf("#%d: g = %+v,\n w %+v", 1, rd, wants[0])
} else {
storage.Append(rd.Entries)
rawNode.Advance(rd)
}
storage.Append(rd.Entries)
rawNode.Advance(rd)
rawNode.Campaign()
rd = rawNode.Ready()
storage.Append(rd.Entries)
rawNode.Advance(rd)
rawNode.Propose([]byte("foo"))
if rd = rawNode.Ready(); !reflect.DeepEqual(rd, wants[1]) {
t.Errorf("#%d: g = %+v,\n w %+v", 2, rd, wants[1])
} else {
storage.Append(rd.Entries)
rawNode.Advance(rd)
}
if rawNode.HasReady() {
t.Errorf("unexpected Ready: %+v", rawNode.Ready())
}
}
func TestRawNodeRestart(t *testing.T) {
entries := []raftpb.Entry{
{Term: 1, Index: 1},
{Term: 1, Index: 2, Data: []byte("foo")},
}
st := raftpb.HardState{Term: 1, Commit: 1}
want := Ready{
HardState: emptyState,
// commit up to commit index in st
CommittedEntries: entries[:st.Commit],
}
storage := NewMemoryStorage()
storage.SetHardState(st)
storage.Append(entries)
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, storage), nil)
if err != nil {
t.Fatal(err)
}
rd := rawNode.Ready()
if !reflect.DeepEqual(rd, want) {
t.Errorf("g = %+v,\n w %+v", rd, want)
}
rawNode.Advance(rd)
if rawNode.HasReady() {
t.Errorf("unexpected Ready: %+v", rawNode.Ready())
}
}
func TestRawNodeRestartFromSnapshot(t *testing.T) {
snap := raftpb.Snapshot{
Metadata: raftpb.SnapshotMetadata{
ConfState: raftpb.ConfState{Nodes: []uint64{1, 2}},
Index: 2,
Term: 1,
},
}
entries := []raftpb.Entry{
{Term: 1, Index: 3, Data: []byte("foo")},
}
st := raftpb.HardState{Term: 1, Commit: 3}
want := Ready{
HardState: emptyState,
// commit up to commit index in st
CommittedEntries: entries,
}
s := NewMemoryStorage()
s.SetHardState(st)
s.ApplySnapshot(snap)
s.Append(entries)
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, s), nil)
if err != nil {
t.Fatal(err)
}
if rd := rawNode.Ready(); !reflect.DeepEqual(rd, want) {
t.Errorf("g = %+v,\n w %+v", rd, want)
} else {
rawNode.Advance(rd)
}
if rawNode.HasReady() {
t.Errorf("unexpected Ready: %+v", rawNode.HasReady())
}
}
// TestNodeAdvance from node_test.go has no equivalent in rawNode because there is
// no dependency check between Ready() and Advance()
func TestRawNodeStatus(t *testing.T) {
storage := NewMemoryStorage()
rawNode, err := NewRawNode(newTestConfig(1, nil, 10, 1, storage), []Peer{{ID: 1}})
if err != nil {
t.Fatal(err)
}
status := rawNode.Status()
if status == nil {
t.Errorf("expected status struct, got nil")
}
}

118
vendor/github.com/coreos/etcd/raft/read_only.go generated vendored Normal file
View file

@ -0,0 +1,118 @@
// Copyright 2016 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import pb "github.com/coreos/etcd/raft/raftpb"
// ReadState provides state for read only query.
// It is the caller's responsibility to call ReadIndex first before getting
// this state from Ready. It is also the caller's duty to differentiate whether this
// state is the one it requested through RequestCtx, e.g. by attaching a unique id
// as RequestCtx.
type ReadState struct {
Index uint64
RequestCtx []byte
}
type readIndexStatus struct {
req pb.Message
index uint64
acks map[uint64]struct{}
}
type readOnly struct {
option ReadOnlyOption
pendingReadIndex map[string]*readIndexStatus
readIndexQueue []string
}
func newReadOnly(option ReadOnlyOption) *readOnly {
return &readOnly{
option: option,
pendingReadIndex: make(map[string]*readIndexStatus),
}
}
// addRequest adds a read only request into the readonly struct.
// `index` is the commit index of the raft state machine when it received
// the read only request.
// `m` is the original read only request message from the local or remote node.
func (ro *readOnly) addRequest(index uint64, m pb.Message) {
ctx := string(m.Entries[0].Data)
if _, ok := ro.pendingReadIndex[ctx]; ok {
return
}
ro.pendingReadIndex[ctx] = &readIndexStatus{index: index, req: m, acks: make(map[uint64]struct{})}
ro.readIndexQueue = append(ro.readIndexQueue, ctx)
}
// recvAck notifies the readonly struct that the raft state machine received
// an acknowledgment of the heartbeat attached to the read only request
// context.
func (ro *readOnly) recvAck(m pb.Message) int {
rs, ok := ro.pendingReadIndex[string(m.Context)]
if !ok {
return 0
}
rs.acks[m.From] = struct{}{}
// add one to include an ack from local node
return len(rs.acks) + 1
}
// advance advances the read only request queue kept by the readonly struct.
// It dequeues the requests until it finds the read only request that has
// the same context as the given `m`.
func (ro *readOnly) advance(m pb.Message) []*readIndexStatus {
var (
i int
found bool
)
ctx := string(m.Context)
rss := []*readIndexStatus{}
for _, okctx := range ro.readIndexQueue {
i++
rs, ok := ro.pendingReadIndex[okctx]
if !ok {
panic("cannot find corresponding read state from pending map")
}
rss = append(rss, rs)
if okctx == ctx {
found = true
break
}
}
if found {
ro.readIndexQueue = ro.readIndexQueue[i:]
for _, rs := range rss {
delete(ro.pendingReadIndex, string(rs.req.Entries[0].Data))
}
return rss
}
return nil
}
// lastPendingRequestCtx returns the context of the last pending read only
// request in readonly struct.
func (ro *readOnly) lastPendingRequestCtx() string {
if len(ro.readIndexQueue) == 0 {
return ""
}
return ro.readIndexQueue[len(ro.readIndexQueue)-1]
}
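A minimal sketch (not part of the vendored sources) of the read-only flow from the application side, assuming hypothetical appliedIndex and serveRead stand-ins supplied by the caller: ReadIndex attaches a request context, and the matching ReadState later shows up in a Ready.
package readexample
import (
	"bytes"
	"github.com/coreos/etcd/raft"
)
// handleReadStates issues a read request and matches the ReadStates from a later
// Ready against its request context.
func handleReadStates(rn *raft.RawNode, appliedIndex uint64, serveRead func()) {
	rctx := []byte("read-42") // must be unique per ReadIndex request
	rn.ReadIndex(rctx)
	// Later, while processing a Ready (compressed here for illustration):
	rd := rn.Ready()
	for _, rs := range rd.ReadStates {
		if !bytes.Equal(rs.RequestCtx, rctx) {
			continue
		}
		// The read is linearizable once the applied index has caught up to rs.Index.
		if appliedIndex >= rs.Index {
			serveRead()
		}
	}
	rn.Advance(rd)
}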

76
vendor/github.com/coreos/etcd/raft/status.go generated vendored Normal file
View file

@ -0,0 +1,76 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"fmt"
pb "github.com/coreos/etcd/raft/raftpb"
)
type Status struct {
ID uint64
pb.HardState
SoftState
Applied uint64
Progress map[uint64]Progress
}
// getStatus gets a copy of the current raft status.
func getStatus(r *raft) Status {
s := Status{ID: r.id}
s.HardState = r.hardState()
s.SoftState = *r.softState()
s.Applied = r.raftLog.applied
if s.RaftState == StateLeader {
s.Progress = make(map[uint64]Progress)
for id, p := range r.prs {
s.Progress[id] = *p
}
}
return s
}
// MarshalJSON translates the raft status into JSON.
// TODO: try to simplify this by introducing ID type into raft
func (s Status) MarshalJSON() ([]byte, error) {
j := fmt.Sprintf(`{"id":"%x","term":%d,"vote":"%x","commit":%d,"lead":"%x","raftState":%q,"progress":{`,
s.ID, s.Term, s.Vote, s.Commit, s.Lead, s.RaftState)
if len(s.Progress) == 0 {
j += "}}"
} else {
for k, v := range s.Progress {
subj := fmt.Sprintf(`"%x":{"match":%d,"next":%d,"state":%q},`, k, v.Match, v.Next, v.State)
j += subj
}
// remove the trailing ","
j = j[:len(j)-1] + "}}"
}
return []byte(j), nil
}
func (s Status) String() string {
b, err := s.MarshalJSON()
if err != nil {
raftLogger.Panicf("unexpected error: %v", err)
}
return string(b)
}

271
vendor/github.com/coreos/etcd/raft/storage.go generated vendored Normal file
View file

@ -0,0 +1,271 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"errors"
"sync"
pb "github.com/coreos/etcd/raft/raftpb"
)
// ErrCompacted is returned by Storage.Entries/Compact when a requested
// index is unavailable because it predates the last snapshot.
var ErrCompacted = errors.New("requested index is unavailable due to compaction")
// ErrSnapOutOfDate is returned by Storage.CreateSnapshot when a requested
// index is older than the existing snapshot.
var ErrSnapOutOfDate = errors.New("requested index is older than the existing snapshot")
// ErrUnavailable is returned by Storage interface when the requested log entries
// are unavailable.
var ErrUnavailable = errors.New("requested entry at index is unavailable")
// ErrSnapshotTemporarilyUnavailable is returned by the Storage interface when the required
// snapshot is temporarily unavailable.
var ErrSnapshotTemporarilyUnavailable = errors.New("snapshot is temporarily unavailable")
// Storage is an interface that may be implemented by the application
// to retrieve log entries from storage.
//
// If any Storage method returns an error, the raft instance will
// become inoperable and refuse to participate in elections; the
// application is responsible for cleanup and recovery in this case.
type Storage interface {
// InitialState returns the saved HardState and ConfState information.
InitialState() (pb.HardState, pb.ConfState, error)
// Entries returns a slice of log entries in the range [lo,hi).
// MaxSize limits the total size of the log entries returned, but
// Entries returns at least one entry if any.
Entries(lo, hi, maxSize uint64) ([]pb.Entry, error)
// Term returns the term of entry i, which must be in the range
// [FirstIndex()-1, LastIndex()]. The term of the entry before
// FirstIndex is retained for matching purposes even though the
// rest of that entry may not be available.
Term(i uint64) (uint64, error)
// LastIndex returns the index of the last entry in the log.
LastIndex() (uint64, error)
// FirstIndex returns the index of the first log entry that is
// possibly available via Entries (older entries have been incorporated
// into the latest Snapshot; if storage only contains the dummy entry the
// first log entry is not available).
FirstIndex() (uint64, error)
// Snapshot returns the most recent snapshot.
// If snapshot is temporarily unavailable, it should return ErrSnapshotTemporarilyUnavailable,
// so raft state machine could know that Storage needs some time to prepare
// snapshot and call Snapshot later.
Snapshot() (pb.Snapshot, error)
}
// MemoryStorage implements the Storage interface backed by an
// in-memory array.
type MemoryStorage struct {
// Protects access to all fields. Most methods of MemoryStorage are
// run on the raft goroutine, but Append() is run on an application
// goroutine.
sync.Mutex
hardState pb.HardState
snapshot pb.Snapshot
// ents[i] has raft log position i+snapshot.Metadata.Index
ents []pb.Entry
}
// NewMemoryStorage creates an empty MemoryStorage.
func NewMemoryStorage() *MemoryStorage {
return &MemoryStorage{
// When starting from scratch populate the list with a dummy entry at term zero.
ents: make([]pb.Entry, 1),
}
}
// InitialState implements the Storage interface.
func (ms *MemoryStorage) InitialState() (pb.HardState, pb.ConfState, error) {
return ms.hardState, ms.snapshot.Metadata.ConfState, nil
}
// SetHardState saves the current HardState.
func (ms *MemoryStorage) SetHardState(st pb.HardState) error {
ms.Lock()
defer ms.Unlock()
ms.hardState = st
return nil
}
// Entries implements the Storage interface.
func (ms *MemoryStorage) Entries(lo, hi, maxSize uint64) ([]pb.Entry, error) {
ms.Lock()
defer ms.Unlock()
offset := ms.ents[0].Index
if lo <= offset {
return nil, ErrCompacted
}
if hi > ms.lastIndex()+1 {
raftLogger.Panicf("entries' hi(%d) is out of bound lastindex(%d)", hi, ms.lastIndex())
}
// only contains dummy entries.
if len(ms.ents) == 1 {
return nil, ErrUnavailable
}
ents := ms.ents[lo-offset : hi-offset]
return limitSize(ents, maxSize), nil
}
// Term implements the Storage interface.
func (ms *MemoryStorage) Term(i uint64) (uint64, error) {
ms.Lock()
defer ms.Unlock()
offset := ms.ents[0].Index
if i < offset {
return 0, ErrCompacted
}
if int(i-offset) >= len(ms.ents) {
return 0, ErrUnavailable
}
return ms.ents[i-offset].Term, nil
}
// LastIndex implements the Storage interface.
func (ms *MemoryStorage) LastIndex() (uint64, error) {
ms.Lock()
defer ms.Unlock()
return ms.lastIndex(), nil
}
func (ms *MemoryStorage) lastIndex() uint64 {
return ms.ents[0].Index + uint64(len(ms.ents)) - 1
}
// FirstIndex implements the Storage interface.
func (ms *MemoryStorage) FirstIndex() (uint64, error) {
ms.Lock()
defer ms.Unlock()
return ms.firstIndex(), nil
}
func (ms *MemoryStorage) firstIndex() uint64 {
return ms.ents[0].Index + 1
}
// Snapshot implements the Storage interface.
func (ms *MemoryStorage) Snapshot() (pb.Snapshot, error) {
ms.Lock()
defer ms.Unlock()
return ms.snapshot, nil
}
// ApplySnapshot overwrites the contents of this Storage object with
// those of the given snapshot.
func (ms *MemoryStorage) ApplySnapshot(snap pb.Snapshot) error {
ms.Lock()
defer ms.Unlock()
// handle check for old snapshot being applied
msIndex := ms.snapshot.Metadata.Index
snapIndex := snap.Metadata.Index
if msIndex >= snapIndex {
return ErrSnapOutOfDate
}
ms.snapshot = snap
ms.ents = []pb.Entry{{Term: snap.Metadata.Term, Index: snap.Metadata.Index}}
return nil
}
// CreateSnapshot makes a snapshot which can be retrieved with Snapshot() and
// can be used to reconstruct the state at that point.
// If any configuration changes have been made since the last compaction,
// the result of the last ApplyConfChange must be passed in.
func (ms *MemoryStorage) CreateSnapshot(i uint64, cs *pb.ConfState, data []byte) (pb.Snapshot, error) {
ms.Lock()
defer ms.Unlock()
if i <= ms.snapshot.Metadata.Index {
return pb.Snapshot{}, ErrSnapOutOfDate
}
offset := ms.ents[0].Index
if i > ms.lastIndex() {
raftLogger.Panicf("snapshot %d is out of bound lastindex(%d)", i, ms.lastIndex())
}
ms.snapshot.Metadata.Index = i
ms.snapshot.Metadata.Term = ms.ents[i-offset].Term
if cs != nil {
ms.snapshot.Metadata.ConfState = *cs
}
ms.snapshot.Data = data
return ms.snapshot, nil
}
// Compact discards all log entries prior to compactIndex.
// It is the application's responsibility to not attempt to compact an index
// greater than raftLog.applied.
func (ms *MemoryStorage) Compact(compactIndex uint64) error {
ms.Lock()
defer ms.Unlock()
offset := ms.ents[0].Index
if compactIndex <= offset {
return ErrCompacted
}
if compactIndex > ms.lastIndex() {
raftLogger.Panicf("compact %d is out of bound lastindex(%d)", compactIndex, ms.lastIndex())
}
i := compactIndex - offset
ents := make([]pb.Entry, 1, 1+uint64(len(ms.ents))-i)
ents[0].Index = ms.ents[i].Index
ents[0].Term = ms.ents[i].Term
ents = append(ents, ms.ents[i+1:]...)
ms.ents = ents
return nil
}
// Append the new entries to storage.
// TODO (xiangli): ensure the entries are continuous and
// entries[0].Index > ms.entries[0].Index
func (ms *MemoryStorage) Append(entries []pb.Entry) error {
if len(entries) == 0 {
return nil
}
ms.Lock()
defer ms.Unlock()
first := ms.firstIndex()
last := entries[0].Index + uint64(len(entries)) - 1
// shortcut if there is no new entry.
if last < first {
return nil
}
// truncate compacted entries
if first > entries[0].Index {
entries = entries[first-entries[0].Index:]
}
offset := entries[0].Index - ms.ents[0].Index
switch {
case uint64(len(ms.ents)) > offset:
ms.ents = append([]pb.Entry{}, ms.ents[:offset]...)
ms.ents = append(ms.ents, entries...)
case uint64(len(ms.ents)) == offset:
ms.ents = append(ms.ents, entries...)
default:
raftLogger.Panicf("missing log entry [last: %d, append at: %d]",
ms.lastIndex(), entries[0].Index)
}
return nil
}

285
vendor/github.com/coreos/etcd/raft/storage_test.go generated vendored Normal file
View file

@ -0,0 +1,285 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"math"
"reflect"
"testing"
pb "github.com/coreos/etcd/raft/raftpb"
)
func TestStorageTerm(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}}
tests := []struct {
i uint64
werr error
wterm uint64
wpanic bool
}{
{2, ErrCompacted, 0, false},
{3, nil, 3, false},
{4, nil, 4, false},
{5, nil, 5, false},
{6, ErrUnavailable, 0, false},
}
for i, tt := range tests {
s := &MemoryStorage{ents: ents}
func() {
defer func() {
if r := recover(); r != nil {
if !tt.wpanic {
t.Errorf("%d: panic = %v, want %v", i, true, tt.wpanic)
}
}
}()
term, err := s.Term(tt.i)
if err != tt.werr {
t.Errorf("#%d: err = %v, want %v", i, err, tt.werr)
}
if term != tt.wterm {
t.Errorf("#%d: term = %d, want %d", i, term, tt.wterm)
}
}()
}
}
func TestStorageEntries(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 6}}
tests := []struct {
lo, hi, maxsize uint64
werr error
wentries []pb.Entry
}{
{2, 6, math.MaxUint64, ErrCompacted, nil},
{3, 4, math.MaxUint64, ErrCompacted, nil},
{4, 5, math.MaxUint64, nil, []pb.Entry{{Index: 4, Term: 4}}},
{4, 6, math.MaxUint64, nil, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
{4, 7, math.MaxUint64, nil, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 6}}},
// even if maxsize is zero, the first entry should be returned
{4, 7, 0, nil, []pb.Entry{{Index: 4, Term: 4}}},
// limit to 2
{4, 7, uint64(ents[1].Size() + ents[2].Size()), nil, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
// limit to 2
{4, 7, uint64(ents[1].Size() + ents[2].Size() + ents[3].Size()/2), nil, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
{4, 7, uint64(ents[1].Size() + ents[2].Size() + ents[3].Size() - 1), nil, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
// all
{4, 7, uint64(ents[1].Size() + ents[2].Size() + ents[3].Size()), nil, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 6}}},
}
for i, tt := range tests {
s := &MemoryStorage{ents: ents}
entries, err := s.Entries(tt.lo, tt.hi, tt.maxsize)
if err != tt.werr {
t.Errorf("#%d: err = %v, want %v", i, err, tt.werr)
}
if !reflect.DeepEqual(entries, tt.wentries) {
t.Errorf("#%d: entries = %v, want %v", i, entries, tt.wentries)
}
}
}
func TestStorageLastIndex(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}}
s := &MemoryStorage{ents: ents}
last, err := s.LastIndex()
if err != nil {
t.Errorf("err = %v, want nil", err)
}
if last != 5 {
t.Errorf("term = %d, want %d", last, 5)
}
s.Append([]pb.Entry{{Index: 6, Term: 5}})
last, err = s.LastIndex()
if err != nil {
t.Errorf("err = %v, want nil", err)
}
if last != 6 {
t.Errorf("last = %d, want %d", last, 5)
}
}
func TestStorageFirstIndex(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}}
s := &MemoryStorage{ents: ents}
first, err := s.FirstIndex()
if err != nil {
t.Errorf("err = %v, want nil", err)
}
if first != 4 {
t.Errorf("first = %d, want %d", first, 4)
}
s.Compact(4)
first, err = s.FirstIndex()
if err != nil {
t.Errorf("err = %v, want nil", err)
}
if first != 5 {
t.Errorf("first = %d, want %d", first, 5)
}
}
func TestStorageCompact(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}}
tests := []struct {
i uint64
werr error
windex uint64
wterm uint64
wlen int
}{
{2, ErrCompacted, 3, 3, 3},
{3, ErrCompacted, 3, 3, 3},
{4, nil, 4, 4, 2},
{5, nil, 5, 5, 1},
}
for i, tt := range tests {
s := &MemoryStorage{ents: ents}
err := s.Compact(tt.i)
if err != tt.werr {
t.Errorf("#%d: err = %v, want %v", i, err, tt.werr)
}
if s.ents[0].Index != tt.windex {
t.Errorf("#%d: index = %d, want %d", i, s.ents[0].Index, tt.windex)
}
if s.ents[0].Term != tt.wterm {
t.Errorf("#%d: term = %d, want %d", i, s.ents[0].Term, tt.wterm)
}
if len(s.ents) != tt.wlen {
t.Errorf("#%d: len = %d, want %d", i, len(s.ents), tt.wlen)
}
}
}
func TestStorageCreateSnapshot(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}}
cs := &pb.ConfState{Nodes: []uint64{1, 2, 3}}
data := []byte("data")
tests := []struct {
i uint64
werr error
wsnap pb.Snapshot
}{
{4, nil, pb.Snapshot{Data: data, Metadata: pb.SnapshotMetadata{Index: 4, Term: 4, ConfState: *cs}}},
{5, nil, pb.Snapshot{Data: data, Metadata: pb.SnapshotMetadata{Index: 5, Term: 5, ConfState: *cs}}},
}
for i, tt := range tests {
s := &MemoryStorage{ents: ents}
snap, err := s.CreateSnapshot(tt.i, cs, data)
if err != tt.werr {
t.Errorf("#%d: err = %v, want %v", i, err, tt.werr)
}
if !reflect.DeepEqual(snap, tt.wsnap) {
t.Errorf("#%d: snap = %+v, want %+v", i, snap, tt.wsnap)
}
}
}
func TestStorageAppend(t *testing.T) {
ents := []pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}}
tests := []struct {
entries []pb.Entry
werr error
wentries []pb.Entry
}{
{
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}},
nil,
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}},
},
{
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 6}, {Index: 5, Term: 6}},
nil,
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 6}, {Index: 5, Term: 6}},
},
{
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 5}},
nil,
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 5}},
},
// truncate incoming entries, truncate the existing entries and append
{
[]pb.Entry{{Index: 2, Term: 3}, {Index: 3, Term: 3}, {Index: 4, Term: 5}},
nil,
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 5}},
},
// truncate the existing entries and append
{
[]pb.Entry{{Index: 4, Term: 5}},
nil,
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 5}},
},
// direct append
{
[]pb.Entry{{Index: 6, Term: 5}},
nil,
[]pb.Entry{{Index: 3, Term: 3}, {Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 5}},
},
}
for i, tt := range tests {
s := &MemoryStorage{ents: ents}
err := s.Append(tt.entries)
if err != tt.werr {
t.Errorf("#%d: err = %v, want %v", i, err, tt.werr)
}
if !reflect.DeepEqual(s.ents, tt.wentries) {
t.Errorf("#%d: entries = %v, want %v", i, s.ents, tt.wentries)
}
}
}
func TestStorageApplySnapshot(t *testing.T) {
cs := &pb.ConfState{Nodes: []uint64{1, 2, 3}}
data := []byte("data")
tests := []pb.Snapshot{{Data: data, Metadata: pb.SnapshotMetadata{Index: 4, Term: 4, ConfState: *cs}},
{Data: data, Metadata: pb.SnapshotMetadata{Index: 3, Term: 3, ConfState: *cs}},
}
s := NewMemoryStorage()
// Apply snapshot successfully
i := 0
tt := tests[i]
err := s.ApplySnapshot(tt)
if err != nil {
t.Errorf("#%d: err = %v, want %v", i, err, nil)
}
// Apply snapshot fails due to ErrSnapOutOfDate
i = 1
tt = tests[i]
err = s.ApplySnapshot(tt)
if err != ErrSnapOutOfDate {
t.Errorf("#%d: err = %v, want %v", i, err, ErrSnapOutOfDate)
}
}

129
vendor/github.com/coreos/etcd/raft/util.go generated vendored Normal file
View file

@ -0,0 +1,129 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"bytes"
"fmt"
pb "github.com/coreos/etcd/raft/raftpb"
)
func (st StateType) MarshalJSON() ([]byte, error) {
return []byte(fmt.Sprintf("%q", st.String())), nil
}
// uint64Slice implements sort interface
type uint64Slice []uint64
func (p uint64Slice) Len() int { return len(p) }
func (p uint64Slice) Less(i, j int) bool { return p[i] < p[j] }
func (p uint64Slice) Swap(i, j int) { p[i], p[j] = p[j], p[i] }
func min(a, b uint64) uint64 {
if a > b {
return b
}
return a
}
func max(a, b uint64) uint64 {
if a > b {
return a
}
return b
}
func IsLocalMsg(msgt pb.MessageType) bool {
return msgt == pb.MsgHup || msgt == pb.MsgBeat || msgt == pb.MsgUnreachable ||
msgt == pb.MsgSnapStatus || msgt == pb.MsgCheckQuorum
}
func IsResponseMsg(msgt pb.MessageType) bool {
return msgt == pb.MsgAppResp || msgt == pb.MsgVoteResp || msgt == pb.MsgHeartbeatResp || msgt == pb.MsgUnreachable || msgt == pb.MsgPreVoteResp
}
// voteRespMsgType maps vote and prevote message types to their corresponding responses.
func voteRespMsgType(msgt pb.MessageType) pb.MessageType {
switch msgt {
case pb.MsgVote:
return pb.MsgVoteResp
case pb.MsgPreVote:
return pb.MsgPreVoteResp
default:
panic(fmt.Sprintf("not a vote message: %s", msgt))
}
}
// EntryFormatter can be implemented by the application to provide human-readable formatting
// of entry data. Nil is a valid EntryFormatter and will use a default format.
type EntryFormatter func([]byte) string
// DescribeMessage returns a concise human-readable description of a
// Message for debugging.
func DescribeMessage(m pb.Message, f EntryFormatter) string {
var buf bytes.Buffer
fmt.Fprintf(&buf, "%x->%x %v Term:%d Log:%d/%d", m.From, m.To, m.Type, m.Term, m.LogTerm, m.Index)
if m.Reject {
fmt.Fprintf(&buf, " Rejected")
if m.RejectHint != 0 {
fmt.Fprintf(&buf, "(Hint:%d)", m.RejectHint)
}
}
if m.Commit != 0 {
fmt.Fprintf(&buf, " Commit:%d", m.Commit)
}
if len(m.Entries) > 0 {
fmt.Fprintf(&buf, " Entries:[")
for i, e := range m.Entries {
if i != 0 {
buf.WriteString(", ")
}
buf.WriteString(DescribeEntry(e, f))
}
fmt.Fprintf(&buf, "]")
}
if !IsEmptySnap(m.Snapshot) {
fmt.Fprintf(&buf, " Snapshot:%v", m.Snapshot)
}
return buf.String()
}
// DescribeEntry returns a concise human-readable description of an
// Entry for debugging.
func DescribeEntry(e pb.Entry, f EntryFormatter) string {
var formatted string
if e.Type == pb.EntryNormal && f != nil {
formatted = f(e.Data)
} else {
formatted = fmt.Sprintf("%q", e.Data)
}
return fmt.Sprintf("%d/%d %s %s", e.Term, e.Index, e.Type, formatted)
}
func limitSize(ents []pb.Entry, maxSize uint64) []pb.Entry {
if len(ents) == 0 {
return ents
}
size := ents[0].Size()
var limit int
for limit = 1; limit < len(ents); limit++ {
size += ents[limit].Size()
if uint64(size) > maxSize {
break
}
}
return ents[:limit]
}

106
vendor/github.com/coreos/etcd/raft/util_test.go generated vendored Normal file
View file

@ -0,0 +1,106 @@
// Copyright 2015 The etcd Authors
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package raft
import (
"math"
"reflect"
"strings"
"testing"
pb "github.com/coreos/etcd/raft/raftpb"
)
var testFormatter EntryFormatter = func(data []byte) string {
return strings.ToUpper(string(data))
}
func TestDescribeEntry(t *testing.T) {
entry := pb.Entry{
Term: 1,
Index: 2,
Type: pb.EntryNormal,
Data: []byte("hello\x00world"),
}
defaultFormatted := DescribeEntry(entry, nil)
if defaultFormatted != "1/2 EntryNormal \"hello\\x00world\"" {
t.Errorf("unexpected default output: %s", defaultFormatted)
}
customFormatted := DescribeEntry(entry, testFormatter)
if customFormatted != "1/2 EntryNormal HELLO\x00WORLD" {
t.Errorf("unexpected custom output: %s", customFormatted)
}
}
func TestLimitSize(t *testing.T) {
ents := []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 6}}
tests := []struct {
maxsize uint64
wentries []pb.Entry
}{
{math.MaxUint64, []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 6}}},
// even if maxsize is zero, the first entry should be returned
{0, []pb.Entry{{Index: 4, Term: 4}}},
// limit to 2
{uint64(ents[0].Size() + ents[1].Size()), []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
// limit to 2
{uint64(ents[0].Size() + ents[1].Size() + ents[2].Size()/2), []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
{uint64(ents[0].Size() + ents[1].Size() + ents[2].Size() - 1), []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}}},
// all
{uint64(ents[0].Size() + ents[1].Size() + ents[2].Size()), []pb.Entry{{Index: 4, Term: 4}, {Index: 5, Term: 5}, {Index: 6, Term: 6}}},
}
for i, tt := range tests {
if !reflect.DeepEqual(limitSize(ents, tt.maxsize), tt.wentries) {
t.Errorf("#%d: entries = %v, want %v", i, limitSize(ents, tt.maxsize), tt.wentries)
}
}
}
func TestIsLocalMsg(t *testing.T) {
tests := []struct {
msgt pb.MessageType
isLocal bool
}{
{pb.MsgHup, true},
{pb.MsgBeat, true},
{pb.MsgUnreachable, true},
{pb.MsgSnapStatus, true},
{pb.MsgCheckQuorum, true},
{pb.MsgTransferLeader, false},
{pb.MsgProp, false},
{pb.MsgApp, false},
{pb.MsgAppResp, false},
{pb.MsgVote, false},
{pb.MsgVoteResp, false},
{pb.MsgSnap, false},
{pb.MsgHeartbeat, false},
{pb.MsgHeartbeatResp, false},
{pb.MsgTimeoutNow, false},
{pb.MsgReadIndex, false},
{pb.MsgReadIndexResp, false},
{pb.MsgPreVote, false},
{pb.MsgPreVoteResp, false},
}
for i, tt := range tests {
got := IsLocalMsg(tt.msgt)
if got != tt.isLocal {
t.Errorf("#%d: got %v, want %v", i, got, tt.isLocal)
}
}
}