The ./net/ directory in the monorepo contains scripts useful for creation and manipulation of a test network.
The test network allows you to run a fully isolated set of validators and clients on a configurable hardware setup.
It's intended to be both dev and CD friendly.
The test networks can run in GCP, AWS or colo. Whichever cloud provider you choose, you will need the credentials set up on your machine.
For GCP, you will need a working `gcloud` command from the Google Cloud SDK; if you do not have it, follow the guide at https://cloud.google.com/sdk?hl=en

Before running any scripts, authenticate with:

```bash
$ gcloud auth login
```
If you are running the scripts on a headless machine, you can use curl to issue requests to confirm your auth.
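For instance, a minimal sketch of a headless login and check (the tokeninfo endpoint shown is just one way to confirm the token works):

```bash
# Complete the login without a local browser
gcloud auth login --no-launch-browser

# Confirm the credentials work by asking Google to validate the access token
curl -s "https://www.googleapis.com/oauth2/v3/tokeninfo?access_token=$(gcloud auth print-access-token)"
```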
If you are doing this for the first time, you might need to set up the project:

```bash
gcloud config set project principal-lane-200702
```
For AWS, obtain your credentials from the AWS IAM Console and configure the AWS CLI with:

```bash
$ aws configure
```

More information on AWS CLI configuration can be found in the AWS documentation.
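As a quick sanity check that the credentials are picked up, you can ask the CLI which identity it is using:

```bash
# Prints the account ID, user ID and ARN of the configured credentials
aws sts get-caller-identity
```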
Metrics collection relies on two environment variables that are patched to the remote nodes by net.sh:

* `RUST_LOG` to enable metrics reporting in principle
* `SOLANA_METRICS_CONFIG` to tell agave where to log the metrics

> [!NOTE]
> Anza employees should follow the guide in Notion to set up the InfluxDB account.

To configure a metrics database you will need:

* `${host}`: the host name of the InfluxDB instance you can access, for example `https://internal-metrics.solana.com:8086`
* `${user}`: the name of an InfluxDB user account with enough rights to create a new InfluxDB database, for example `solana`

You will normally only need to do this once. Once this is done, you will be able to save the metrics configuration and load it later from the environment. To create the database, run:

```bash
./init-metrics.sh -c testnet-dev-${user} ${user}
```

This updates the `net/config/config` file with a line like the following:

```bash
export SOLANA_METRICS_CONFIG="host=${host},db=testnet-dev-${user},u=${user},p=some_secret"
```

* You can keep this value in your shell environment and use `./init-metrics.sh -e` to quickly load it, or run `./init-metrics.sh` with appropriate arguments every time you set up a new cluster.
* Keeping `SOLANA_METRICS_CONFIG` in your env is appropriate, but you may want to use different databases for different runs of net.sh.
* Alternatively, set `SOLANA_METRICS_CONFIG` directly in the `./net/config/config` file.

By default, metrics are only logged by agave if `RUST_LOG` is set to `info` or higher. You can provide it as environment for the `./net.sh start` command, or set it in your shell environment, for example:

```bash
RUST_LOG="info,solana_runtime=debug"
```

Note: this only works if you store `SOLANA_METRICS_CONFIG` in your shell environment.
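For example, if you want the configuration to persist across shells, you could export it from your shell profile; the values below are placeholders following the format shown above:

```bash
# e.g. in ~/.bashrc or ~/.profile; replace host, db, user and password with your own
export RUST_LOG=info
export SOLANA_METRICS_CONFIG="host=https://internal-metrics.solana.com:8086,db=testnet-dev-${USER},u=${USER},p=some_secret"
```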
You can check the active metrics settings and write a test datapoint using the helper scripts in `./scripts/` (sourcing `configure-metrics.sh` yields `INFLUX_*` values like the ones shown):

```bash
cd ./scripts/
source ./configure-metrics.sh
INFLUX_HOST=https://internal-metrics.solana.com:8086
INFLUX_DATABASE=testnet-dev-solana
INFLUX_USERNAME=solana
INFLUX_PASSWORD=********
./metrics-write-datapoint.sh "testnet-deploy net-create-begin=1"
```
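If you want to confirm the datapoint actually landed, you can query InfluxDB directly over its HTTP API. This is only a sketch: it assumes an InfluxDB 1.x endpoint like the one above and reuses the `INFLUX_*` variables from `configure-metrics.sh`.

```bash
# Ask for the most recent points in the testnet-deploy measurement
curl -sG "${INFLUX_HOST}/query" \
  -u "${INFLUX_USERNAME}:${INFLUX_PASSWORD}" \
  --data-urlencode "db=${INFLUX_DATABASE}" \
  --data-urlencode "q=SELECT * FROM \"testnet-deploy\" ORDER BY time DESC LIMIT 5"
```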
Remember that metrics are only reported if `RUST_LOG` is set to `info` or `debug`.

NOTE: This example uses GCE. If you are using AWS EC2, replace `./gce.sh` with `./ec2.sh` in the commands.
```bash
# In Agave repo
cd net/
# Create a GCE testnet with 4 additional validator nodes (beyond the bootstrap node) and 1 client (billing starts here)
./gce.sh create -n 4 -c 1
# Configure the metrics database and validate credentials using environment variable `SOLANA_METRICS_CONFIG` (skip this if you are not using metrics)
./init-metrics.sh -c testnet-dev-${USER} ${USER}
# Deploy the network from the local workspace and start processes on all nodes including bench-tps on the client node
RUST_LOG=info ./net.sh start
# Show help on how to ssh into any testnet node to access logs/etc
./ssh.sh
# Stop running processes on all nodes
./net.sh stop
# Dispose of the network (billing stops here)
./gce.sh delete
```
All the commands below are run from the `./net/` directory in the agave repo.

The `./gce.sh` command controls creation and destruction of the nodes in the test net. It does not actually run any software.

* `./gce.sh create -n 4 -c 2` creates a cluster with 4 validators plus client node(s) for load generation; this is a minimal viable setup for all solana features to work.
* `net/config/config` will contain the config file of the testnet just created.
* If you do not have `SOLANA_METRICS_CONFIG` set in your shell env, gce.sh may complain about metrics not being configured; this is perfectly fine.
* `./gce.sh info` lists active test cluster nodes, which allows you to get their IP addresses for SSH access and/or debugging.
* `./gce.sh delete` destroys the nodes (save the electricity and $$$: destroy your test nets the moment you no longer need them). Alternatively, pass the `--self-destruct-hours=N` argument to gce.sh so the cluster cleans itself up automatically.
* `./init-metrics.sh -c testnet-dev-${user} ${user}` creates a new metrics database from scratch and sets `SOLANA_METRICS_CONFIG` in `./net/config/config` (which is exactly what init-metrics.sh does for you); alternatively, `./init-metrics.sh -e` loads the metrics config from `SOLANA_METRICS_CONFIG` into the testnet config file.

The `./net.sh` script controls the payload on the testnet nodes, i.e. bootstrapping, the validators and bench-tps. In principle, you can run everything by hand, but `./net.sh` makes it easier.
Run `./net.sh start` to actually run the test network. The script will take 5-10 minutes to run; at the end it should print something like:
```
--- Deployment Successful
Bootstrap validator deployment took 164 seconds
Additional validator deployment (5 validators, 0 blockstreamer nodes) took 120 seconds
Client deployment (1 instances) took 11 seconds
Network start logs in /home/sol/agave/net/log
```
You can also check the logs to make sure successful test transfers are being reported.

Other useful commands and options:

* `./net.sh start bench-tps=2="--tx_count 2500"` will start 2 clients with a bench-tps workload sending 2500 transactions per batch.
* The `--tx_count` argument is passed to the bench-tps program; see its manual for more options.
* `./net.sh sanity` tests the deployment; it is also run by the start command.
* `./net.sh stop` stops the validators and client. This does not kill the machines, so you can still study the logs etc.
* `./net.sh start --nobuild` will skip the source compilation; you will generally want that if you are only changing configuration files rather than code, or just want to re-run the last test.
* To connect to the nodes:
  * `./gce.sh info` to get the public IPs
  * `./ssh.sh <IP>` to get a shell on the node
  * `sudo su` will give you root access on the nodes
  * Nodes run the latest Ubuntu LTS image
* You can also interact with the nodes using the solana cli:
```bash
source net/config/config
../target/release/solana -u http://${validatorIpList[1]}:8899 airdrop 1
../target/release/solana -u http://${validatorIpList[1]}:8899 feature status
../target/release/solana -u http://${validatorIpList[1]}:8899 feature activate
../target/release/solana --url http://${validatorIpList[0]}:8899 validators
```
## Tips
### Automation
You will want to have a script like this pretty much immediately to avoid making mistakes in the init process:
```bash
./gce.sh create -n4 -c2 --custom-machine-type "--machine-type n1-standard-16" --client-machine-type "--machine-type n1-standard-4"
./init-metrics.sh -e
RUST_LOG=info ./net.sh start -c bench-tps=2="--tx_count 25000"
```
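For example, a sketch of a small wrapper script around the same commands; the shebang, `set -ex` and the `cd` line are additions so a failure at any step aborts the run:

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: adjust node counts and machine types to your needs.
set -ex

# Assumes this file is saved inside the net/ directory of the repo.
cd "$(dirname "$0")"

./gce.sh create -n4 -c2 \
  --custom-machine-type "--machine-type n1-standard-16" \
  --client-machine-type "--machine-type n1-standard-4"
./init-metrics.sh -e
RUST_LOG=info ./net.sh start -c bench-tps=2="--tx_count 25000"
```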
### Inscrutable "nothing works, everything times out" state
Note that net.sh and `gce.sh info` do not actually check whether the nodes are still alive in gcloud;
they just assume the information in the config file is correct. So if your nodes got killed or timed out, these commands will lie to you. In that case, just use `gce.sh delete` to reset. To see what gcloud actually has, use the sketch below.
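A minimal way to cross-check against gcloud directly (gce.sh-created instance names typically include your username, but adjust the grep to your setup):

```bash
# List all instances gcloud actually knows about in the current project,
# then narrow down to the ones that look like yours
gcloud compute instances list | grep "$USER"
```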
### Running the network over public IP addresses
By default, private IP addresses are used with all instances in the same
availability zone to avoid GCE network egress charges. However, to run the
network over public IP addresses:
```bash
$ ./gce.sh create -P ...
```
or
```bash
$ ./ec2.sh create -P ...
```
### Deploying a tarball-based network
To deploy the latest pre-built `edge` channel tarball (i.e., the latest from the `master`
branch), once the testnet has been created, run:
```bash
$ ./net.sh start -t edge
```
### Enabling CUDA
> [!NOTE]
> CUDA is currently not available on GCE
First ensure the network instances are created with GPU enabled:
```bash
$ ./gce.sh create -g ...
```
or
```bash
$ ./ec2.sh create -g ...
```
If deploying a tarball-based network, nothing further is required, as GPU presence
is detected at runtime and the CUDA build is auto-selected.
### Partition testing
To induce a partition: `net.sh netem --config-file <config file path>`
To remove the partition: `net.sh netem --config-file <config file path> --netem-cmd cleanup`
The partitioning is also removed if you do `net.sh stop` or `restart`.
An example config that produces 3 almost equal partitions:
```json
{
  "partitions": [
    34,
    33,
    33
  ],
  "interconnects": [
    {
      "a": 0,
      "b": 1,
      "config": "loss 15% delay 25ms"
    },
    {
      "a": 1,
      "b": 0,
      "config": "loss 15% delay 25ms"
    },
    {
      "a": 0,
      "b": 2,
      "config": "loss 10% delay 15ms"
    },
    {
      "a": 2,
      "b": 0,
      "config": "loss 10% delay 15ms"
    },
    {
      "a": 2,
      "b": 1,
      "config": "loss 5% delay 5ms"
    },
    {
      "a": 1,
      "b": 2,
      "config": "loss 5% delay 5ms"
    }
  ]
}
```
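Assuming you save the JSON above as, say, `partition.json` (the filename is arbitrary), applying and clearing the partition could look like:

```bash
# Apply the partitions and interconnect impairments described in partition.json
./net.sh netem --config-file partition.json

# Remove the partition again
./net.sh netem --config-file partition.json --netem-cmd cleanup
```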