
[State Sync] Add a new Storage Service implementation (server-side) #9199

Merged
merged 2 commits on Sep 17, 2021

Conversation

JoshLind (Contributor)

Motivation

This PR adds a new server-side implementation for the Storage Service. The Storage Service is an inter-node service allowing Diem nodes to request storage data (e.g., transactions and proofs) directly from other nodes over the Diem network. This PR sets up the framework (template) for the service by implementing the server-side request/response processing plumbing and outlining the required interface between the server and the local Diem DB.

To achieve this, the PR offers 2 commits:

  1. Add the new server implementation (as a library to later be exposed over the network)
  2. Add several (simple) tests for the server implementation. These will need to be improved and expanded in the future.

Notes:

  • The implementation offered here is by no means complete (and you'll notice many TODOs). Instead, this PR is meant to set up the framework so that we can start building on the server-side implementation and make the Storage Service a living thing 😄
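To make the framework concrete, here is a minimal sketch of the server-side plumbing this PR sets up. The request/response variants and the StorageReaderInterface method shown are illustrative assumptions; only the overall shape (and the handle_request signature discussed below) follows the PR.

// Illustrative request/response/error types; the real definitions live in the
// new storage-service types crate (the exact variants here are assumptions).
pub enum StorageServiceRequest {
    GetTransactions { start_version: u64, num_transactions: u64 },
}

pub enum StorageServiceResponse {
    Transactions(Vec<u8>), // placeholder payload
}

#[derive(Debug)]
pub enum Error {
    UnexpectedErrorEncountered(String),
}

// A stand-in for the StorageReaderInterface the server expects from local storage.
pub trait StorageReaderInterface {
    fn get_transactions(&self, start_version: u64, num_transactions: u64) -> Result<Vec<u8>, Error>;
}

// The server plumbs each inbound request to the storage reader and wraps the
// result in a response.
pub struct StorageServiceServer<T: StorageReaderInterface> {
    storage: T,
}

impl<T: StorageReaderInterface> StorageServiceServer<T> {
    pub fn new(storage: T) -> Self {
        Self { storage }
    }

    pub fn handle_request(
        &self,
        request: StorageServiceRequest,
    ) -> Result<StorageServiceResponse, Error> {
        match request {
            StorageServiceRequest::GetTransactions { start_version, num_transactions } => {
                let transactions = self.storage.get_transactions(start_version, num_transactions)?;
                Ok(StorageServiceResponse::Transactions(transactions))
            }
        }
    }
}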

Have you read the Contributing Guidelines on pull requests?

Yes.

Test Plan

A set of simple unit tests has been added for this functionality.

Related PRs

None, but this PR relates to: #8906

@JoshLind JoshLind requested a review from a team as a code owner September 16, 2021 16:54
@bors-libra bors-libra added this to In Review in bors Sep 16, 2021
/// The underlying implementation of the StorageReaderInterface, used by the
/// storage server.
pub struct StorageReader {
    storage: Arc<RwLock<DbReaderWriter>>,
@msmouse msmouse Sep 16, 2021

Does it ever write to the DB? Why not an Arc<dyn DbReader>?

And why is the RwLock needed?

@JoshLind JoshLind (Contributor Author) Sep 16, 2021

@msmouse, it doesn't ever write to the DB. However, I'm trying to be consistent in the state sync and storage service code: everything will get an Arc<RwLock<DbReaderWriter>>. The reason is that we'll have several state sync (and storage service) threads reading and writing to the DB concurrently, and it's simpler to give them all a shared RwLock object than to try to do some magic around reader-only and writer-only access. IIUC, the DiemDB struct today does not have a global lock (e.g., a mutex), correct? If so, I want to avoid any scary situations.

Contributor

it looks like the DbReaderWriter is just

pub struct DbReaderWriter {
    pub reader: Arc<dyn DbReader>,
    pub writer: Arc<dyn DbWriter>,
}

so whoever is constructing the StorageReader could just clone the inner Arc<dyn DbReader> and hand it down.

As for concurrency, part of the DbReader API contract is to be thread safe, so you don't need any locks/synchronization to call it. After all, we're effectively doing read-only calls to an MVCC database here, which supports unlimited read concurrency.
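As a rough illustration of that suggestion (the DbReaderWriter shape is taken from the snippet above; the StorageReader constructor is hypothetical):

use std::sync::Arc;

// Stand-ins for the real storage-interface traits referenced above.
pub trait DbReader: Send + Sync {}
pub trait DbWriter: Send + Sync {}

pub struct DbReaderWriter {
    pub reader: Arc<dyn DbReader>,
    pub writer: Arc<dyn DbWriter>,
}

// The storage server only needs read access, so it could hold just the reader.
pub struct StorageReader {
    storage: Arc<dyn DbReader>,
}

impl StorageReader {
    // Whoever constructs the StorageReader clones the inner reader and hands it down.
    pub fn new(db_rw: &DbReaderWriter) -> Self {
        Self { storage: Arc::clone(&db_rw.reader) }
    }
}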

@JoshLind JoshLind (Contributor Author) Sep 17, 2021

What happens if a single DbReader call (e.g., get_transactions()) makes several independent calls to the database (e.g., get_transactions, get_transaction_info and get_events_by_version)? Each of these calls is going to read different objects in the database. MVCC might guarantee that each read works, but there are no guarantees across several independent reads, correct? A writer might write between them and violate the higher-level assumptions (at the DbReader API). Likewise with interleaved writes. Note: the get_transactions() example is a little simple (you could argue it's bound by versions, which shouldn't change once written), but I'm not comfortable making that assumption for everything. Perhaps this has all been thought about before, though? With a single read/write lock around the entire API, it's much easier to reason about: each API call can be thought of as executing as a single "database transaction" or atomic operation. Happy to be convinced, but I'm still not there yet 😄
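A minimal sketch of the property being argued for, using illustrative types and a plain std::sync::RwLock: because both reads happen under one read guard, no writer can interleave between them.

use std::sync::{Arc, RwLock};

// Illustrative stand-in for the database handle guarded by the lock.
struct Db {
    transactions: Vec<String>,
    transaction_infos: Vec<String>,
}

// Both reads happen under the same read guard, so a writer (which must take
// the write guard) cannot modify the database between them.
fn get_transactions_and_infos(
    db: &Arc<RwLock<Db>>,
    start: usize,
    limit: usize,
) -> (Vec<String>, Vec<String>) {
    let guard = db.read().unwrap(); // held until the end of this function
    let txns = guard.transactions[start..start + limit].to_vec();
    let infos = guard.transaction_infos[start..start + limit].to_vec();
    (txns, infos)
}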

@msmouse msmouse Sep 17, 2021

The DB interfaces always take a ledger_version parameter where necessary, to guarantee a consistent view of the blockchain across reads. Unless there's a flaw, it never assumes the user is asking for "the latest".
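In other words, something like the following hedged sketch (a hypothetical VersionedReader trait, not the real DbReader signatures): reads pinned to an explicit ledger_version stay self-consistent, since data at or below an already-committed version never changes.

// Hypothetical read-only interface; the real DbReader methods differ, but the
// idea is the same: every read is pinned to an explicit ledger_version.
trait VersionedReader {
    fn get_transactions(&self, start_version: u64, limit: u64, ledger_version: u64) -> Vec<String>;
    fn get_events(&self, version: u64, ledger_version: u64) -> Vec<String>;
}

// No lock is needed: both calls read data at or below `ledger_version`, which
// never changes once committed, so concurrent writers (appending newer
// versions) cannot invalidate this view.
fn read_consistent_view<R: VersionedReader>(reader: &R, ledger_version: u64) -> (Vec<String>, Vec<String>) {
    let txns = reader.get_transactions(0, 10, ledger_version);
    let events = reader.get_events(5, ledger_version);
    (txns, events)
}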

@JoshLind JoshLind (Contributor Author) Sep 17, 2021

@msmouse, yeah, my feeling is that in most cases we might be alright. But, race conditions can be very sneaky and I'm still not 100% sure we won't have issues. We're also moving to a world where state sync is going to be multi-threaded and writing to the DB concurrently (e.g., writing different epochs, account states, transactions in parallel, etc.). Thus, I want to do this simple thing now (i.e., wrap all uses of state sync and storage code in a single RwLock). If we find that the single RwLock becomes a point of contention, I'll happily optimize it, but I'd like to make sure that everything is safe and simple for now. Does that make sense?

msmouse

As long as we are on the same page.

@JoshLind (Contributor Author)

Absolutely, thanks @msmouse (and @phlip9)! I've noted this down as something to dig into in the future (once we have an initial end-to-end prototype). 😄

msmouse

But still, we shouldn't need to write to the DB concurrently. It's not worth it. 😛

@JoshLind (Contributor Author)

@msmouse, I think we'll see. I just don't want to rule this out now, especially as our current design is going to be processing data chunks in parallel 😄

// Returns the frozen subtrees at the specified version.
//fn get_frozen_subtrees(
// version,
//) -> Result<Vec<HashValue>, Error>
Contributor

What is this API for? bootstrapping the TransactionAccumulatorSummary?

@JoshLind JoshLind (Contributor Author) Sep 16, 2021

It's for the case where a node grabs all account states and needs to bootstrap its view of the transaction accumulator. However, I'll remove it for now until I'm absolutely sure we need it. I think there's a way around it.

#[error("Unexpected error encountered: {0}")]
UnexpectedErrorEncountered(String),
}

Contributor

add a pub type Result<T, E = Error> = ::std::result::Result<T, E>;

@JoshLind (Contributor Author)

Can you elaborate? Why would we want this? These errors should be internal to the storage service only.

Contributor

I think this is just more convenience along with the Error type, so you don't have to keep typing Error. It's optional.
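For reference, a minimal sketch of the suggested alias (the alias itself is taken from the comment above; get_transactions here is just a hypothetical signature to show the before/after):

#[derive(Debug)]
pub enum Error {
    UnexpectedErrorEncountered(String),
}

// The suggested alias: defaults the error type so callers can write Result<T>.
pub type Result<T, E = Error> = ::std::result::Result<T, E>;

// Without the alias: fn get_transactions(start: u64) -> std::result::Result<Vec<u8>, Error>
// With the alias, the error type is implied:
fn get_transactions(_start: u64) -> Result<Vec<u8>> {
    Ok(vec![])
}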

@JoshLind JoshLind (Contributor Author) Sep 17, 2021

As weird as it sounds, I prefer typing Error everywhere, especially since so much of our code uses anyhow (which I'm not a big fan of). This way, I know exactly what Error type I'm getting 😄. It's also consistent with the rest of our code elsewhere.

pub fn handle_request(
    &self,
    request: StorageServiceRequest,
) -> Result<StorageServiceResponse, Error> {
Contributor

I guess we probably want handle_request to return just StorageServiceResponse and convert any errors into StorageServiceErrors? Even if DiemDB returns an error, it should probably just return an "Internal Error" variant

@JoshLind (Contributor Author)

I've done this for now, but we'll need to give it more attention in the future. For example, if some errors are really bad (e.g., internal invariant violations), we may want them to bubble up to main to cause the node to die. But we can deal with that later 😄
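A rough sketch of the shape being described, assuming a StorageServiceError enum with an InternalError variant on the wire (those names follow the comment above and are not necessarily the exact types in the PR):

// Internal errors stay inside the server; the wire-level response carries a
// sanitized error instead.
#[derive(Debug)]
pub enum Error {
    UnexpectedErrorEncountered(String),
}

#[derive(Debug)]
pub enum StorageServiceResponse {
    Transactions(Vec<u8>),
    Error(StorageServiceError),
}

#[derive(Debug)]
pub enum StorageServiceError {
    InternalError, // anything that goes wrong inside the node, including DiemDB errors
}

fn process(request_result: Result<Vec<u8>, Error>) -> StorageServiceResponse {
    match request_result {
        Ok(transactions) => StorageServiceResponse::Transactions(transactions),
        // Convert any internal error into a generic wire-level error variant.
        Err(_error) => StorageServiceResponse::Error(StorageServiceError::InternalError),
    }
}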

Comment on lines +162 to +164
"state-sync/storage-service/server",
"state-sync/storage-service/types",
Contributor

So, something I hadn't been thinking about before now: think about how this works out from the logging perspective, and how you find the logs with all of these crates. You'll have to check how the modules work out and hopefully they're consistent. For example, I found out yesterday that there were 3 different ones for storage, and only one for, say, state sync.

@JoshLind (Contributor Author)

Yeah, it's not great. But I'd argue that this is a limitation with the logger and setup, not with our code structure. For example, it's important to split up the code here because of circular dependencies and not wanting to tie the server and client together (which is what state sync currently does). When I come to add the logger I'll try to dig in a little.

gregnazario previously approved these changes Sep 17, 2021
@JoshLind (Contributor Author)

Thanks, all! 😄

/land

@bors-libra bors-libra moved this from In Review to Queued in bors Sep 17, 2021
@bors-libra bors-libra moved this from Queued to Testing in bors Sep 17, 2021
@github-actions

Cluster Test Result

Test runner setup time spent 281 secs
Compatibility test results for land_a9975df1 ==> land_6122cb78 (PR)
1. All instances running land_a9975df1, generating some traffic on network
2. First full node land_a9975df1 ==> land_6122cb78, to validate new full node to old validator node traffic
3. First Validator node land_a9975df1 ==> land_6122cb78, to validate storage compatibility
4. First batch validators (14) land_a9975df1 ==> land_6122cb78, to test consensus and traffic between old full nodes and new validator node
5. First batch full nodes (14) land_a9975df1 ==> land_6122cb78
6. Second batch validators (15) land_a9975df1 ==> land_6122cb78, to upgrade rest of the validators
7. Second batch of full nodes (15) land_a9975df1 ==> land_6122cb78, to finish the network upgrade, time spent 669 secs
all up: 1183 TPS, 3838 ms latency, 4400 ms p99 latency, no expired txns, time spent 250 secs
Logs: http://kibana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-17T23:19:28Z',to:'2021-09-17T23:42:21Z'))
Dashboard: http://grafana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/d/performance/performance?from=1631920768000&to=1631922141000
Validator 1 logs: http://kibana.ct-0-k8s-testnet.aws.hlw3truzy4ls.com/app/kibana#/discover?_g=(time:(from:'2021-09-17T23:19:28Z',to:'2021-09-17T23:42:21Z'))&_a=(columns:!(log),query:(language:kuery,query:'kubernetes.pod_name:"val-1"'),sort:!(!('@timestamp',desc)))

Repro cmd:

./scripts/cti --tag land_a9975df1 --cluster-test-tag land_6122cb78 -E BATCH_SIZE=15 -E UPDATE_TO_TAG=land_6122cb78 --report report.json --suite land_blocking_compat

🎉 Land-blocking cluster test passed! 👌

@bors-libra bors-libra temporarily deployed to Sccache September 17, 2021 23:42
@bors-libra bors-libra temporarily deployed to Docker September 17, 2021 23:43
@bors-libra bors-libra temporarily deployed to Sccache September 17, 2021 23:43