metrics_template/README.md
2026-03-06 18:41:07 -05:00

# Metrics Stack
Self-contained monitoring stack using VictoriaMetrics, vmagent, Grafana, and Uptime Kuma.
Deploy one instance per client site. Access remotely over VPN.
## Stack Components
| Service | Purpose | Default Port |
|---|---|---|
| VictoriaMetrics | Time-series metric storage | 8428 |
| vmagent | Prometheus-compatible scrape agent | 8429 |
| Grafana | Dashboards and visualization | 3000 |
| Uptime Kuma | Availability monitoring + alerting | 3001 |
| node_exporter | Host metrics (this machine) | internal only |
| snmp_exporter | SNMP metrics for network devices | 9116 (optional) |
---
## Initial Setup
### 1. Configure environment
```bash
cp .env.example .env
```
Edit `.env`:
- Set `BIND_HOST` to this machine's LAN IP
- Set `CLIENT_NAME` to identify the client
- Set a strong password for `GF_ADMIN_PASSWORD`
- Set `TZ` to the correct timezone
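
Before starting the stack, it can help to confirm that none of these settings was left blank. The helper below is a minimal sketch, not part of the template: `check_env` is a hypothetical name, and it assumes `.env` uses plain `KEY=value` lines for the four variables listed above.

```bash
# Report any required key that is missing or empty in an env file.
# The key list mirrors the settings named above; extend it if your
# .env.example defines more.
check_env() {
  env_file="${1:-.env}"
  missing=0
  for key in BIND_HOST CLIENT_NAME GF_ADMIN_PASSWORD TZ; do
    # Require "KEY=" followed by at least one character
    if ! grep -Eq "^${key}=.+" "$env_file"; then
      echo "missing or empty: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_env .env` from the stack directory; it prints one line per problem and returns non-zero if anything is missing.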
### 2. Configure endpoints
Edit `vmagent/config/scrape.yml`:
- Update the `linux-host` job with this machine's hostname and site name
- Add any other endpoints (see "Adding Endpoints" below)
### 3. Start the stack
```bash
podman-compose up -d
```
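Once the containers are up, a quick probe of each published port confirms the services are answering. This is a sketch under assumptions: `health_urls` and `check_stack` are hypothetical helpers, the ports are the defaults from the table above, and the `/health` and `/api/health` paths are the usual health endpoints for VictoriaMetrics components and Grafana. Uptime Kuma is probed at its root page because no dedicated health path is assumed here.

```bash
# Print one health-check URL per service for the given host.
health_urls() {
  host="$1"
  printf 'http://%s:8428/health\n' "$host"      # VictoriaMetrics
  printf 'http://%s:8429/health\n' "$host"      # vmagent
  printf 'http://%s:3000/api/health\n' "$host"  # Grafana
  printf 'http://%s:3001/\n' "$host"            # Uptime Kuma (root page)
}

# Probe every URL and report which ones answer without an HTTP error.
check_stack() {
  health_urls "$1" | while read -r url; do
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "ok   $url"
    else
      echo "FAIL $url"
    fi
  done
}
```

Call it as `check_stack <BIND_HOST value>`, for example `check_stack 192.0.2.10`.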
### 4. Finish Uptime Kuma setup
1. Browse to `http://BIND_HOST:3001` and complete the initial setup wizard
2. Note the username/password you set
3. In `vmagent/config/scrape.yml`, uncomment the `uptime_kuma` job and fill in those credentials
4. Run `podman-compose restart vmagent`
---
## Adding Endpoints
Open `vmagent/config/scrape.yml`. The file has two sections:
- **ACTIVE JOBS**: jobs that are currently running
- **TEMPLATES**: commented-out job blocks, one per endpoint type

To add a new endpoint:
1. Find the matching template at the bottom of `scrape.yml`
2. Copy the entire commented block (from `# - job_name:` to the end of the block)
3. Paste it into the **ACTIVE JOBS** section
4. Uncomment it (remove the leading `# ` from each line)
5. Fill in the IP addresses, hostnames, and site label
6. Restart vmagent:
```bash
podman-compose restart vmagent
```
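Step 4 can be done by hand in an editor, but a small filter avoids missing a line. The following is a rough sketch: `uncomment_block` is a hypothetical helper, and it assumes each template line starts with `# ` in column 0, as described above. The job shown in the usage comment is illustrative, not taken from `scrape.yml`.

```bash
# Strip one leading "# " from each line read on stdin, so nested
# indentation and any inner comments are preserved. A bare "#" line
# becomes an empty line.
uncomment_block() {
  sed -e 's/^# //' -e 's/^#$//'
}

# Example:
#   printf '%s\n' '# - job_name: linux-server' '#   static_configs:' | uncomment_block
```

Paste the filter's output into the **ACTIVE JOBS** section, then fill in the addresses and labels as in step 5.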
### Available templates
| Template | Exporter needed on target | Port |
|---|---|---|
| Windows Domain Controller | windows_exporter | 9182 |
| Hyper-V Host | windows_exporter (with hyperv collector) | 9182 |
| Windows General Purpose Server | windows_exporter | 9182 |
| Linux Server | node_exporter | 9100 |
| SNMP Device | snmp_exporter (runs in this stack) | n/a |
### Installing windows_exporter
Download the latest `.msi` from:
https://github.com/prometheus-community/windows_exporter/releases

For Hyper-V hosts, ensure the `hyperv` collector is enabled. You can set this
in the MSI installer or by modifying the service arguments post-install:
```
--collectors.enabled "[defaults],hyperv,cpu_info,physical_disk,process"
```
### Enabling SNMP monitoring
1. Uncomment the `snmp-exporter` service in `podman-compose.yml`
2. Download a pre-built `snmp.yml` from:
https://github.com/prometheus/snmp_exporter/releases
3. Place it at `snmp_exporter/snmp.yml`
4. Uncomment and configure the `snmp-devices` job template in `scrape.yml`
5. Restart the stack: `podman-compose up -d`
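
To test a device before relying on the scrape job, you can query snmp_exporter directly. The sketch below assumes the exporter is published on port 9116 as in the component table and that the downloaded `snmp.yml` contains the `if_mib` module; `snmp_probe_url` is a hypothetical helper and the addresses are placeholders.

```bash
# Print the snmp_exporter probe URL for one device.
# Arguments: exporter host, device IP, module name (defaults to if_mib).
snmp_probe_url() {
  printf 'http://%s:9116/snmp?target=%s&module=%s\n' "$1" "$2" "${3:-if_mib}"
}

# Example (needs the snmp-exporter service running):
#   curl -fsS "$(snmp_probe_url 192.0.2.10 192.0.2.1)" | head
```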
---
## Useful Commands
```bash
# Start the stack
podman-compose up -d
# Stop the stack
podman-compose down
# Restart a single service (e.g., after editing scrape.yml)
podman-compose restart vmagent
# View logs for a service
podman-compose logs -f vmagent
podman-compose logs -f victoriametrics
# Check running containers
podman-compose ps
# Pull latest images and restart
podman-compose pull && podman-compose up -d
```
## Verify vmagent is scraping
Browse to `http://BIND_HOST:8429/targets` to see all configured scrape targets
and their current status (up/down, last scrape time, errors).
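
The same check can be scripted. vmagent also serves a Prometheus-compatible JSON view of its targets at `/api/v1/targets`; the sketch below counts entries whose `health` field is `down`. `count_down_targets` is a hypothetical helper, and the text match is deliberately simple, so treat it as a rough check and not a JSON parser.

```bash
# Count targets reported as down in a Prometheus-style targets JSON
# document read from stdin. Prints 0 when nothing is down.
count_down_targets() {
  { grep -Eo '"health"[[:space:]]*:[[:space:]]*"down"' || true; } \
    | wc -l | tr -d '[:space:]'
}

# Example (needs a running stack; replace BIND_HOST with the real address):
#   curl -fsS "http://BIND_HOST:8429/api/v1/targets" | count_down_targets
```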

---
## Directory Structure
```
metrics/
├── .env                             # Active config (do not commit)
├── .env.example                     # Config template
├── podman-compose.yml               # Stack definition
├── vmagent/
│   └── config/
│       └── scrape.yml               # Endpoint config; edit this to add endpoints
├── grafana/
│   ├── data/                        # Grafana database (auto-created)
│   └── provisioning/
│       └── datasources/
│           └── victoriametrics.yml  # Auto-wires VictoriaMetrics as datasource
├── victoriametrics/
│   └── data/                        # Metric storage (auto-created)
├── uptime_kuma/
│   └── data/                        # Uptime Kuma database (auto-created)
└── snmp_exporter/
    └── snmp.yml                     # SNMP module config (download separately)
```