
Rework cluster metrics dashboard to support the modern clusters

Michael Vines 5 years ago
parent
commit
5f5824d78d

+ 2 - 2
docs/src/running-validator/validator-start.md

@@ -24,8 +24,8 @@ solana transaction-count
 Inspect the network explorer at
 [https://explorer.solana.com/](https://explorer.solana.com/) for activity.
 
-View the [metrics dashboard](https://metrics.solana.com:3000/d/testnet-beta/testnet-monitor-beta?var-testnet=testnet)
-for more detail on cluster activity.
+View the [metrics dashboard](https://metrics.solana.com:3000/d/monitor/cluster-telemetry) for more
+detail on cluster activity.
 
 
 ## Confirm your Installation
 

+ 2 - 2
docs/src/running-validator/validator-troubleshoot.md

@@ -5,7 +5,7 @@ testnet participants, [https://discord.gg/pquxPsq](https://discord.gg/pquxPsq).
 
 
 ## Useful Links & Discussion
 * [Network Explorer](http://explorer.solana.com/)
-* [Testnet Metrics Dashboard](https://metrics.solana.com:3000/d/testnet-edge/testnet-monitor-edge?refresh=60s&orgId=2)
+* [Testnet Metrics Dashboard](https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge?refresh=60s&orgId=2)
 * Validator chat channels
   * [\#validator-support](https://discord.gg/rZsenD) General support channel for any Validator related queries.
   * [\#tourdesol](https://discord.gg/BdujK2) Discussion and support channel for Tour de SOL participants ([What is Tour de SOL?](https://solana.com/tds/)).
@@ -14,6 +14,6 @@ testnet participants, [https://discord.gg/pquxPsq](https://discord.gg/pquxPsq).
 * [Core software repo](https://github.com/solana-labs/solana)
 * [Tour de SOL Docs](https://docs.solana.com/tour-de-sol)
 * [TdS repo](https://github.com/solana-labs/tour-de-sol)
-* [TdS metrics dashboard](https://metrics.solana.com:3000/d/testnet-edge/testnet-monitor-edge?refresh=1m&from=now-15m&to=now&var-testnet=tds&orgId=2&var-datasource=TdS%20Metrics%20%28read-only%29)
+* [TdS metrics dashboard](https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge?refresh=1m&from=now-15m&to=now&var-testnet=tds)
 
 
 Can't find what you're looking for? Send an email to ryan@solana.com or reach out to @rshea\#2622 on Discord.

+ 1 - 1
docs/src/tour-de-sol/useful-links.md

@@ -6,7 +6,7 @@ description: Where to go after you've read this guide
 
 
 * [Solana Docs](https://docs.solana.com/)
 * [Network Explorer](http://explorer.solana.com/)
-* [TdS metrics dashboard](https://metrics.solana.com:3000/d/testnet/testnet-monitor?refresh=1m&from=now-15m&to=now&orgId=2&var-datasource=Solana%20Metrics%20(read-only)&var-testnet=tds&var-hostid=All9)
+* [TdS metrics dashboard](https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge?refresh=1m&from=now-15m&to=now&var-testnet=tds)
 * Validator chat channels
   * [\#validator-support](https://discord.gg/rZsenD) General support channel for any Validator related queries that don’t fall under Tour de SOL.
   * [\#tourdesol](https://discord.gg/BdujK2) Discussion and support channel for Tour de SOL participants.

+ 9 - 8
metrics/README.md

@@ -4,13 +4,14 @@
 
 
 There are three versions of the testnet dashboard, corresponding to the three
 release channels:
-* https://metrics.solana.com:3000/d/testnet-edge/testnet-monitor-edge
-* https://metrics.solana.com:3000/d/testnet-beta/testnet-monitor-beta
-* https://metrics.solana.com:3000/d/testnet/testnet-monitor
+* https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge
+* https://metrics.solana.com:3000/d/monitor-beta/cluster-telemetry-beta
+* https://metrics.solana.com:3000/d/monitor/cluster-telemetry
 
 
 The dashboard for each channel is defined from the
-`metrics/testnet-monitor.json` source file in the git branch associated with
-that channel, and deployed by automation running `ci/publish-metrics-dashboard.sh`.
+`metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json` source
+file in the git branch associated with that channel, and deployed by automation
+running `ci/publish-metrics-dashboard.sh`.
 
 
 A deploy can be triggered at any time via the `New Build` button of
 https://buildkite.com/solana-labs/publish-metrics-dashboard.
@@ -18,7 +19,7 @@ https://buildkite.com/solana-labs/publish-metrics-dashboard.
 ### Modifying a Dashboard
 
 Dashboard updates are accomplished by modifying
-`metrics/scripts/grafana-provisioning/dashboards/testnet-monitor.json`,
+`metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json`,
 **manual edits made directly in Grafana will be overwritten**.
 
 * Check out metrics to add at https://metrics.solana.com:8888/ in the data explorer.
@@ -32,13 +33,13 @@ Dashboard updates are accomplished by modifying
    `Settings` menu for the dashboard
 3. Edit dashboard as desired
 4. Extract the JSON Model by selecting `JSON Model` in the `Settings` menu.  Copy the JSON to the clipboard
-    and paste into `metrics/scripts/grafana-provisioning/dashboards/testnet-monitor.json`,
+    and paste into `metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json`,
 5. Delete your development dashboard: `Settings` => `Delete`
 
 ### Deploying a Dashboard Manually
 
 If you need to immediately deploy a dashboard using the contents of
-`testnet-monitor.json` in your local workspace,
+`cluster-monitor.json` in your local workspace,
 ```
 ```
 $ export GRAFANA_API_TOKEN="an API key from https://metrics.solana.com:3000/org/apikeys"
 $ metrics/publish-metrics-dashboard.sh (edge|beta|stable)

+ 4 - 4
metrics/publish-metrics-dashboard.sh

@@ -11,13 +11,13 @@ fi
 
 
 case $CHANNEL in
 edge)
-  DASHBOARD=testnet-monitor-edge
+  DASHBOARD=cluster-telemetry-edge
   ;;
 beta)
-  DASHBOARD=testnet-monitor-beta
+  DASHBOARD=cluster-telemetry-beta
   ;;
 stable)
-  DASHBOARD=testnet-monitor
+  DASHBOARD=cluster-telemetry
   ;;
 *)
   echo "Error: Invalid CHANNEL=$CHANNEL"
@@ -31,7 +31,7 @@ if [[ -z $GRAFANA_API_TOKEN ]]; then
   exit 1
 fi
 
-DASHBOARD_JSON=scripts/grafana-provisioning/dashboards/testnet-monitor.json
+DASHBOARD_JSON=scripts/grafana-provisioning/dashboards/cluster-monitor.json
 if [[ ! -r $DASHBOARD_JSON ]]; then
   echo Error: $DASHBOARD_JSON not found
 fi
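
The renamed dashboards follow a regular naming scheme: the README URLs pair a Grafana uid (`monitor`, `monitor-beta`, `monitor-edge`) with the `DASHBOARD` slug set in the `case` statement above. A minimal Python sketch of that mapping follows; the function names `dashboard_identity` and `dashboard_url` are hypothetical and not part of the repository.

```python
from typing import Tuple

# Base of the dashboard URLs listed in metrics/README.md.
BASE_URL = "https://metrics.solana.com:3000/d"


def dashboard_identity(channel: str) -> Tuple[str, str]:
    """Return (uid, slug) for a release channel (hypothetical helper)."""
    if channel not in ("edge", "beta", "stable"):
        # Mirrors the "Error: Invalid CHANNEL" branch of the shell script.
        raise ValueError("Invalid CHANNEL={}".format(channel))
    if channel == "stable":
        # The stable channel carries no suffix.
        return "monitor", "cluster-telemetry"
    return "monitor-" + channel, "cluster-telemetry-" + channel


def dashboard_url(channel: str) -> str:
    """Build the public dashboard URL for a release channel."""
    uid, slug = dashboard_identity(channel)
    return "{}/{}/{}".format(BASE_URL, uid, slug)


if __name__ == "__main__":
    for name in ("edge", "beta", "stable"):
        print(name, dashboard_url(name))
```

Running it prints one URL per channel; these should match the three URLs in the README hunk.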

+ 25 - 21
metrics/scripts/adjust-dashboard-for-channel.py

@@ -21,7 +21,7 @@ with open(dashboard_json, 'r') as read_file:
     data = json.load(read_file)
 
 if channel == 'local':
-    data['title'] = 'Local Testnet Monitor'
+    data['title'] = 'Local Cluster Monitor'
     data['uid'] = 'local'
     data['links'] = []
     data['templating']['list'] = [{'current': {'text': '$datasource',
@@ -66,10 +66,9 @@ if channel == 'local':
                                     'useTags': False}]
 
 elif channel == 'stable':
-    # Stable dashboard only allows the user to select between the stable
-    # testnet databases
-    data['title'] = 'Testnet Monitor'
-    data['uid'] = 'testnet'
+    # Stable dashboard only allows the user to select between public clusters
+    data['title'] = 'Cluster Telemetry'
+    data['uid'] = 'monitor'
     data['templating']['list'] = [{'current': {'text': '$datasource',
                                                'value': '$datasource'},
                                    'hide': 1,
@@ -81,20 +80,26 @@ elif channel == 'stable':
                                    'regex': '',
                                    'type': 'datasource'},
                                   {'allValue': None,
-                                   'current': {'text': 'testnet',
-                                               'value': 'testnet'},
+                                   'current': {'text': 'Developer Testnet',
+                                               'value': 'devnet'},
                                    'hide': 1,
                                    'includeAll': False,
                                    'label': 'Testnet',
                                    'multi': False,
                                    'name': 'testnet',
-                                   'options': [{'selected': False,
-                                                'text': 'testnet',
-                                                'value': 'testnet'},
-                                               {'selected': True,
-                                                'text': 'testnet-perf',
-                                                'value': 'testnet-perf'}],
-                                   'query': 'testnet,testnet-perf',
+                                   'options': [{'selected': True,
+                                                'text': 'Developer Testnet',
+                                                'value': 'devnet'},
+                                               {'selected': False,
+                                                'text': 'Mainnet Beta',
+                                                'value': 'mainnet-beta'},
+                                               {'selected': False,
+                                                'text': 'Tour de SOL Testnet',
+                                                'value': 'tds'},
+                                               {'selected': False,
+                                                'text': 'Soft Launch Testnet',
+                                                'value': 'cluster'}],
+                                   'query': 'devnet,mainnet-beta,tds,cluster',
                                    'type': 'custom'},
                                    {'allValue': ".*",
                                     'datasource': '$datasource',
@@ -114,10 +119,9 @@ elif channel == 'stable':
                                     'type': 'query',
                                     'useTags': False}]
 else:
-    # Non-stable dashboard only allows the user to select between all testnet
-    # databases
-    data['title'] = 'Testnet Monitor ({})'.format(channel)
-    data['uid'] = 'testnet-' + channel
+    # Non-stable dashboard includes all the dev clusters
+    data['title'] = 'Cluster Telemetry ({})'.format(channel)
+    data['uid'] = 'monitor-' + channel
     data['templating']['list'] = [{'current': {'text': '$datasource',
                                                'value': '$datasource'},
                                    'hide': 1,
@@ -129,8 +133,8 @@ else:
                                    'regex': '',
                                    'type': 'datasource'},
                                    {'allValue': ".*",
-                                   'current': {'text': 'testnet',
-                                               'value': 'testnet'},
+                                   'current': {'text': 'Developer Testnet',
+                                               'value': 'devnet'},
                                    'datasource': '$datasource',
                                    'hide': 1,
                                    'includeAll': False,
@@ -140,7 +144,7 @@ else:
                                    'options': [],
                                    'query': 'show databases',
                                    'refresh': 1,
-                                   'regex': 'testnet.*',
+                                   'regex': '(devnet|cluster|tds|mainnet-beta|testnet.*)',
                                    'sort': 1,
                                    'tagValuesQuery': '',
                                    'tags': [],
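
The non-stable template variable now filters the `show databases` result with `(devnet|cluster|tds|mainnet-beta|testnet.*)` instead of `testnet.*`. The sketch below compares the two filters on a sample list. The database names are illustrative, and Python's unanchored `re.search` is only an approximation of how Grafana applies the regex, so treat the result as a rough check rather than a statement of Grafana's behavior.

```python
import re

# Filters copied from the removed and added 'regex' lines in the diff.
OLD_FILTER = re.compile(r"testnet.*")
NEW_FILTER = re.compile(r"(devnet|cluster|tds|mainnet-beta|testnet.*)")

# Illustrative database names (assumed, not read from a real server).
SAMPLE_DATABASES = [
    "devnet",
    "mainnet-beta",
    "tds",
    "cluster",
    "testnet-automation",
    "_internal",
]


def selectable(databases, pattern):
    """Return the names that the given filter would keep."""
    return [name for name in databases if pattern.search(name)]


if __name__ == "__main__":
    print("old:", selectable(SAMPLE_DATABASES, OLD_FILTER))
    print("new:", selectable(SAMPLE_DATABASES, NEW_FILTER))
```

Because the pattern is unanchored in this sketch, any name that merely contains `cluster` or `tds` would also pass; anchoring may be worth considering if that turns out to matter.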

+ 13 - 15
metrics/scripts/grafana-provisioning/dashboards/testnet-monitor.json → metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json

@@ -27,21 +27,21 @@
       "title": "Stable",
       "tooltip": "",
       "type": "link",
-      "url": "https://metrics.solana.com:3000/d/testnet/testnet-monitor"
+      "url": "https://metrics.solana.com:3000/d/monitor/cluster-telemetry"
     },
     {
       "icon": "dashboard",
       "tags": [],
       "title": "Beta",
       "type": "link",
-      "url": "https://metrics.solana.com:3000/d/testnet-beta/testnet-monitor-beta"
+      "url": "https://metrics.solana.com:3000/d/monitor-beta/cluster-telemetry-beta"
     },
     {
       "icon": "dashboard",
       "tags": [],
       "title": "Edge",
       "type": "link",
-      "url": "https://metrics.solana.com:3000/d/testnet-edge/testnet-monitor-edge"
+      "url": "https://metrics.solana.com:3000/d/monitor-edge/cluster-telemetry-edge"
     }
   ],
   "panels": [
@@ -4618,7 +4618,7 @@
       },
       "yaxes": [
         {
-          "format": "µs",
+          "format": "\u00b5s",
           "label": null,
           "logBase": 1,
           "max": null,
@@ -5385,7 +5385,7 @@
       },
       "yaxes": [
         {
-          "format": "µs",
+          "format": "\u00b5s",
           "label": null,
           "logBase": 1,
           "max": null,
@@ -5752,7 +5752,7 @@
       },
       "yaxes": [
         {
-          "format": "µs",
+          "format": "\u00b5s",
           "label": null,
           "logBase": 1,
           "max": null,
@@ -6727,7 +6727,7 @@
       },
       "yaxes": [
         {
-          "format": "µs",
+          "format": "\u00b5s",
           "label": null,
           "logBase": 1,
           "max": null,
@@ -10181,7 +10181,6 @@
     "list": [
       {
         "current": {
-          "selected": true,
           "text": "$datasource",
           "value": "$datasource"
         },
@@ -10197,9 +10196,8 @@
       {
         "allValue": ".*",
         "current": {
-          "selected": false,
-          "text": "testnet",
-          "value": "testnet"
+          "text": "Developer Testnet",
+          "value": "devnet"
         },
         "datasource": "$datasource",
         "hide": 1,
@@ -10210,7 +10208,7 @@
         "options": [],
         "query": "show databases",
         "refresh": 1,
-        "regex": "testnet.*",
+        "regex": "(devnet|cluster|tds|mainnet-beta|testnet.*)",
         "sort": 1,
         "tagValuesQuery": "",
         "tags": [],
@@ -10269,7 +10267,7 @@
     ]
   },
   "timezone": "",
-  "title": "Testnet Monitor (edge)",
-  "uid": "testnet-edge",
+  "title": "Cluster Telemetry (edge)",
+  "uid": "monitor-edge",
   "version": 2
-}
+}
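
The four `"format"` hunks above replace a literal `µs` with `\u00b5s`. In JSON these are two spellings of the same string, so the change should not affect what Grafana displays. The sketch below checks that equivalence with Python's standard `json` module. It also shows that `json.dumps` emits the escaped form by default, which may be how the escapes were introduced, though the diff does not say.

```python
import json

# U+00B5 MICRO SIGN followed by "s", built with a Python escape so the
# source of this sketch stays ASCII-only.
MICROSECONDS = "\u00b5s"

# The same JSON document spelled two ways: with the JSON escape sequence
# and with the literal character.
escaped_doc = '{"format": "\\u00b5s"}'
literal_doc = '{"format": "' + MICROSECONDS + '"}'

# Both parse to the same Python object.
same = json.loads(escaped_doc) == json.loads(literal_doc)

# Default serialization escapes non-ASCII; ensure_ascii=False keeps it.
default_dump = json.dumps({"format": MICROSECONDS})
literal_dump = json.dumps({"format": MICROSECONDS}, ensure_ascii=False)

if __name__ == "__main__":
    print(same)
    print(default_dump)
    print(literal_dump)
```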

+ 1 - 1
metrics/scripts/start.sh

@@ -34,7 +34,7 @@ source lib/config.sh
 if [[ ! -f lib/grafana-provisioning ]]; then
   cp -va grafana-provisioning lib
   ./adjust-dashboard-for-channel.py \
-    lib/grafana-provisioning/dashboards/testnet-monitor.json local
+    lib/grafana-provisioning/dashboards/cluster-monitor.json local
 
 
   mkdir -p lib/grafana-provisioning/datasources
   cat > lib/grafana-provisioning/datasources/datasource.yml <<EOF

+ 1 - 1
system-test/automation_utils.sh

@@ -106,7 +106,7 @@ function upload_results_to_slack() {
     BUILDKITE_BUILD_URL="https://buildkite.com/solana-labs/"
   fi
 
-  GRAFANA_URL="https://metrics.solana.com:3000/d/testnet-${CHANNEL:-edge}/testnet-monitor-${CHANNEL:-edge}?var-testnet=${TESTNET_TAG:-testnet-automation}&from=${TESTNET_START_UNIX_MSECS:-0}&to=${TESTNET_FINISH_UNIX_MSECS:-0}"
+  GRAFANA_URL="https://metrics.solana.com:3000/d/monitor-${CHANNEL:-edge}/cluster-telemetry-${CHANNEL:-edge}?var-testnet=${TESTNET_TAG:-testnet-automation}&from=${TESTNET_START_UNIX_MSECS:-0}&to=${TESTNET_FINISH_UNIX_MSECS:-0}"
 
 
   [[ -n $RESULT_DETAILS ]] || RESULT_DETAILS="Undefined"
   [[ -n $TEST_CONFIGURATION ]] || TEST_CONFIGURATION="Undefined"
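
The updated `GRAFANA_URL` relies on the shell's `${VAR:-default}` expansion, which substitutes the default when the variable is unset or empty. The Python sketch below reproduces that construction so the resulting link can be checked outside the shell; `grafana_url` is a hypothetical helper, not a function in `automation_utils.sh`.

```python
import os


def grafana_url(env=None):
    """Rebuild the GRAFANA_URL string from environment-style settings."""
    if env is None:
        env = os.environ

    def value(name, default):
        # Same rule as ${NAME:-default}: fall back when unset or empty.
        return env.get(name) or default

    channel = value("CHANNEL", "edge")
    return (
        "https://metrics.solana.com:3000"
        "/d/monitor-{0}/cluster-telemetry-{0}".format(channel)
        + "?var-testnet=" + value("TESTNET_TAG", "testnet-automation")
        + "&from=" + value("TESTNET_START_UNIX_MSECS", "0")
        + "&to=" + value("TESTNET_FINISH_UNIX_MSECS", "0")
    )


if __name__ == "__main__":
    # With nothing set, every default applies.
    print(grafana_url({}))
    print(grafana_url({"CHANNEL": "beta", "TESTNET_TAG": "tds"}))
```

One thing the sketch makes visible: `CHANNEL=stable` would produce `monitor-stable/cluster-telemetry-stable`, whereas `adjust-dashboard-for-channel.py` gives the stable dashboard the uid `monitor`. Whether the automation ever runs with that value is not shown in this diff.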