# Metrics Stack

Self-contained monitoring stack using VictoriaMetrics, vmagent, Grafana, and Uptime Kuma.
Deploy one instance per client site. Access remotely over VPN.

## Stack Components

| Service | Purpose | Default Port |
|---|---|---|
| VictoriaMetrics | Time-series metric storage | 8428 |
| vmagent | Prometheus-compatible scrape agent | 8429 |
| Grafana | Dashboards and visualization | 3000 |
| Uptime Kuma | Availability monitoring + alerting | 3001 |
| node_exporter | Host metrics (this machine) | internal only |
| snmp_exporter | SNMP metrics for network devices | 9116 (optional) |

---

## Initial Setup

### 1. Configure environment

```bash
cp .env.example .env
```

Edit `.env`:
- Set `BIND_HOST` to this machine's LAN IP
- Set `CLIENT_NAME` to identify the client
- Set a strong password for `GF_ADMIN_PASSWORD`
- Set `TZ` to the correct timezone
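
Before starting the stack, it can help to confirm that the four settings above are actually filled in. The following is a minimal sketch, not part of the stack: `check_env` is a hypothetical helper, and it assumes `.env` uses plain `KEY=value` lines.

```bash
# Hypothetical helper (not shipped with the stack): report any of the
# variables named above that are missing or empty in an env file.
check_env() {
  local file="${1:-.env}" var missing=0
  for var in BIND_HOST CLIENT_NAME GF_ADMIN_PASSWORD TZ; do
    # Require a line of the form VAR=<at least one character>
    if ! grep -Eq "^${var}=.+" "$file" 2>/dev/null; then
      echo "missing or empty: ${var}"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_env .env` from the stack directory; a non-zero exit status means at least one value still needs to be set.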

### 2. Configure endpoints

Edit `vmagent/config/scrape.yml`:
- Update the `linux-host` job with this machine's hostname and site name
- Add any other endpoints (see "Adding Endpoints" below)

### 3. Start the stack

```bash
podman-compose up -d
```
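
Once the containers are up, each web-facing service from the Stack Components table should answer on its default port. The following is a minimal sketch, assuming the default ports are unchanged and `curl` is installed; `stack_urls` is a hypothetical helper, and `192.0.2.10` stands in for your `BIND_HOST`.

```bash
# Hypothetical helper: print the base URL of each web-facing service,
# using the default ports from the Stack Components table.
stack_urls() {
  local host="$1" port
  for port in 8428 8429 3000 3001; do
    printf 'http://%s:%s/\n' "$host" "$port"
  done
}

# Example (run manually): print the HTTP status code returned by each service.
# stack_urls 192.0.2.10 | while read -r url; do
#   printf '%s -> ' "$url"
#   curl -s -o /dev/null -m 5 -w '%{http_code}\n' "$url"
# done
```

Any HTTP status code indicates the service is listening; `000` from `curl` means the connection failed.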

### 4. Finish Uptime Kuma setup

1. Browse to `http://BIND_HOST:3001` and complete the initial setup wizard
2. Note the username/password you set
3. In `vmagent/config/scrape.yml`, uncomment the `uptime_kuma` job and fill in those credentials
4. Run `podman-compose restart vmagent`

---

## Adding Endpoints

Open `vmagent/config/scrape.yml`. The file has two sections:

- **ACTIVE JOBS**: jobs that are currently running
- **TEMPLATES**: commented-out job blocks, one per endpoint type

To add a new endpoint:

1. Find the matching template at the bottom of `scrape.yml`
2. Copy the entire commented block (from `# - job_name:` to the end of the block)
3. Paste it into the **ACTIVE JOBS** section
4. Uncomment it (remove the leading `# ` from each line)
5. Fill in the IP addresses, hostnames, and site label
6. Restart vmagent:

```bash
podman-compose restart vmagent
```
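
Step 4 can be done in any editor. For reference, stripping the comment prefix is a one-line `sed` substitution. The following is a minimal sketch: `uncomment_block` is a hypothetical helper, and the sample job in the usage comment is illustrative only, not taken from the shipped `scrape.yml`.

```bash
# Hypothetical helper: read a commented-out YAML block on stdin and strip
# the first "# " from each line, preserving indentation before and after it.
uncomment_block() {
  sed -E 's/^([[:space:]]*)# /\1/'
}

# Example with an illustrative template (not from the shipped scrape.yml):
# printf '%s\n' '# - job_name: example' '#   static_configs:' | uncomment_block
```

Only the first `# ` on each line is removed, so the YAML indentation that follows it survives intact. Lines consisting of a bare `#` are left unchanged and remain valid YAML comments.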

### Available templates

| Template | Exporter needed on target | Port |
|---|---|---|
| Windows Domain Controller | windows_exporter | 9182 |
| Hyper-V Host | windows_exporter (with hyperv collector) | 9182 |
| Windows General Purpose Server | windows_exporter | 9182 |
| Linux Server | node_exporter | 9100 |
| SNMP Device | snmp_exporter (runs in this stack) | n/a |

### Installing windows_exporter

Download the latest `.msi` from:
https://github.com/prometheus-community/windows_exporter/releases

For Hyper-V hosts, ensure the `hyperv` collector is enabled. You can set this
in the MSI installer or by modifying the service arguments post-install:

```
--collectors.enabled "[defaults],hyperv,cpu_info,physical_disk,process"
```

### Enabling SNMP monitoring

1. Uncomment the `snmp-exporter` service in `podman-compose.yml`
2. Download a pre-built `snmp.yml` from:
   https://github.com/prometheus/snmp_exporter/releases
3. Place it at `snmp_exporter/snmp.yml`
4. Uncomment and configure the `snmp-devices` job template in `scrape.yml`
5. Restart the stack: `podman-compose up -d`
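
After the restart, a single device can be probed by hand through snmp_exporter's `/snmp` endpoint, which takes `target` and `module` query parameters. The following is a minimal sketch with several assumptions: `snmp_probe_url` is a hypothetical helper, the exporter is reachable on port 9116, the `if_mib` module is present in your `snmp.yml`, and both addresses are placeholders. Newer snmp_exporter releases may also expect an `auth` parameter.

```bash
# Hypothetical helper: build a snmp_exporter probe URL for one device.
# Arguments: exporter host, device IP, module name (assumed default: if_mib).
snmp_probe_url() {
  local exporter="$1" target="$2" module="${3:-if_mib}"
  printf 'http://%s:9116/snmp?target=%s&module=%s\n' "$exporter" "$target" "$module"
}

# Example (run manually; both addresses are placeholders):
# curl -s "$(snmp_probe_url 192.0.2.10 192.0.2.50)" | head
```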

---

## Useful Commands

```bash
# Start the stack
podman-compose up -d

# Stop the stack
podman-compose down

# Restart a single service (e.g., after editing scrape.yml)
podman-compose restart vmagent

# View logs for a service
podman-compose logs -f vmagent
podman-compose logs -f victoriametrics

# Check running containers
podman-compose ps

# Pull latest images and restart
podman-compose pull && podman-compose up -d
```

## Verify vmagent is scraping

Browse to `http://BIND_HOST:8429/targets` to see all configured scrape targets
and their current status (up/down, last scrape time, errors).
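
For scripted checks, the same status can be read as JSON. The following is a minimal sketch, assuming vmagent's Prometheus-compatible `/api/v1/targets` endpoint and that each target object carries a `"health"` field as in the Prometheus API. `count_down_targets` is a hypothetical helper; an output of `0` means no target is reported as down.

```bash
# Hypothetical helper: count targets reported as down in a
# Prometheus-style /api/v1/targets JSON document read from stdin.
count_down_targets() {
  grep -Eo '"health"[[:space:]]*:[[:space:]]*"down"' | wc -l | tr -d ' '
}

# Example (run manually; 192.0.2.10 is a placeholder for BIND_HOST):
# curl -s http://192.0.2.10:8429/api/v1/targets | count_down_targets
```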

---

## Directory Structure

```
metrics/
├── .env                              # Active config (do not commit)
├── .env.example                      # Config template
├── podman-compose.yml                # Stack definition
├── vmagent/
│   └── config/
│       └── scrape.yml                # Endpoint config; edit this to add endpoints
├── grafana/
│   ├── data/                         # Grafana database (auto-created)
│   └── provisioning/
│       └── datasources/
│           └── victoriametrics.yml   # Auto-wires VictoriaMetrics as datasource
├── victoriametrics/
│   └── data/                         # Metric storage (auto-created)
├── uptime_kuma/
│   └── data/                         # Uptime Kuma database (auto-created)
└── snmp_exporter/
    └── snmp.yml                      # SNMP module config (download separately)
```