Module Overview
Congratulations on reaching the final module! You've built a comprehensive monitoring integration spanning LibreNMS, Telegraf/InfluxDB, and VictoriaLogs. Now it's time to validate your work, test every component, and ensure production readiness.
This module provides systematic verification procedures, performance benchmarks, hands-on exercises, and knowledge assessment quizzes. By the end, you'll have confidence that your integration is robust, optimized, and ready for production deployment.
By completing this module, you will be able to:
- Verify data source connectivity and accuracy
- Validate dashboard functionality across all panels
- Test alert rules and notification delivery
- Measure query performance and optimize bottlenecks
- Complete production readiness checklists
- Demonstrate mastery through practical exercises and quiz assessments
Estimated Completion Time: 25-30 minutes
Section 1: Data Source Verification
Before trusting your dashboards in production, you must verify that each data source is properly configured, returning accurate data, and performing within acceptable parameters. This section provides systematic testing procedures for LibreNMS, InfluxDB, and VictoriaLogs.
Step 1: LibreNMS Data Source Verification
LibreNMS serves as your network inventory and SNMP polling engine. Verify that Grafana can query device data, interface statistics, and alert information correctly.
Test 1: Verify LibreNMS connectivity and authentication
# Navigate to Grafana Data Sources
# URL: http://your-grafana-server:3000/datasources
# Click on your LibreNMS MySQL data source
# Look for green "Data source is working" message
# Test query from Explore tab:
SELECT hostname, sysName, os
FROM devices
WHERE status = 1
LIMIT 5;
# Expected output: List of 5 active devices with their OS types
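If the LibreNMS API is enabled, you can cross-check the same inventory outside Grafana. A minimal sketch, assuming an API token has been generated (the hostname and token below are placeholders):
# Query the LibreNMS API for devices that are up (placeholder host and token)
curl -s -H "X-Auth-Token: YOUR_API_TOKEN" \
  "http://librenms-server/api/v0/devices?type=up" | head -c 500
# The hostnames returned should match the MySQL query results above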
Test 2: Validate interface data accuracy
# Test query for interface statistics
SELECT
devices.hostname,
ports.ifName,
ports.ifOperStatus,
ports.ifSpeed,
ports.ifInOctets_rate,
ports.ifOutOctets_rate
FROM ports
JOIN devices ON ports.device_id = devices.device_id
WHERE devices.hostname = 'core-switch-01'
AND ports.ifOperStatus = 'up'
ORDER BY ports.ifInOctets_rate DESC
LIMIT 10;
# Expected output: Top 10 busiest interfaces with current rates
# Verify rates match SNMP walk data or switch CLI output
You should see:
- Green health status in Grafana data source configuration
- Device queries return expected hostnames and metadata
- Interface rates match known traffic patterns
- Query execution time under 2 seconds for typical device queries
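To confirm the query-time target from the command line rather than the Grafana UI, you can time the same kind of interface query directly against the LibreNMS database. A minimal sketch, assuming shell access to the MySQL/MariaDB host (the user and database name are common defaults and may differ in your install):
# Time a top-interfaces query directly against the LibreNMS database
time mysql -u librenms -p librenms -e "
SELECT devices.hostname, ports.ifName, ports.ifInOctets_rate
FROM ports
JOIN devices ON ports.device_id = devices.device_id
WHERE ports.ifOperStatus = 'up'
ORDER BY ports.ifInOctets_rate DESC
LIMIT 10;"
# The 'real' time reported should stay under the 2-second target above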
Step 2: InfluxDB Data Source Verification
InfluxDB stores time-series metrics from Telegraf agents. Verify metric collection, retention policies, and query performance.
Test 1: Verify InfluxDB connectivity and database access
# From InfluxDB server or remote client
influx -precision rfc3339
# List available databases
SHOW DATABASES
# Expected output should include: telegraf
# Use telegraf database
USE telegraf
# Show measurements (metric types)
SHOW MEASUREMENTS
# Expected output: cpu, disk, mem, net, system, etc.
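This step also covers retention policies, so confirm how long Telegraf data is kept. A minimal sketch using the InfluxDB 1.x CLI (policy names and durations depend on your configuration):
# Check retention policies on the telegraf database
influx -execute 'SHOW RETENTION POLICIES ON telegraf'
# Expected output: at least the default "autogen" policy; verify the duration
# column matches your intended retention period for raw metrics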
Test 2: Validate metric collection and data freshness
# Check most recent data point for each host
SELECT
LAST(usage_idle) as last_cpu_idle,
host
FROM cpu
WHERE time > now() - 5m
GROUP BY host
# Verify data within last 60 seconds (Telegraf default interval: 10s)
SELECT
time,
host,
usage_idle
FROM cpu
WHERE time > now() - 2m
ORDER BY time DESC
LIMIT 20
If queries return no data: Check Telegraf agent status on monitored hosts (systemctl status telegraf),
verify InfluxDB output configuration in /etc/telegraf/telegraf.conf, check firewall rules
allowing port 8086, and review InfluxDB logs for write errors (/var/log/influxdb/influxd.log).
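Those checks can be run from the shell; a minimal sketch, assuming a standard Telegraf and InfluxDB 1.x install (paths and ports shown are the defaults and may differ in your environment):
# On the monitored host: confirm the agent is running and its config is valid
systemctl status telegraf
telegraf --test --config /etc/telegraf/telegraf.conf | head -n 20
# On the InfluxDB server: confirm the write endpoint responds
curl -i http://localhost:8086/ping        # expect HTTP 204 No Content
# Look for recent write errors
grep -i error /var/log/influxdb/influxd.log | tail -n 20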
You should see:
- Telegraf database exists and contains expected measurements
- Recent data points (within last 60 seconds) for all monitored hosts
- Query execution times under 1 second for 24-hour ranges
- No write errors in InfluxDB logs
Step 3: VictoriaLogs Data Source Verification
VictoriaLogs aggregates syslog and application logs. Verify log ingestion, filtering, and query performance using LogQL syntax.
Test 1: Verify VictoriaLogs connectivity
# Check VictoriaLogs health from command line
curl -s http://victorialogs-server:9428/health
# Expected output: {"status":"ok"}
# Test basic log query in Grafana Explore
{job="syslog"} | limit 100
# Query by severity
{job="syslog"} |~ "error|critical|alert" | limit 50
For best query performance: Use label filters ({job="syslog", hostname="..."}) before line filters (|~),
limit time ranges to necessary duration, use count_over_time for aggregations rather than
retrieving all log lines.
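As a quick sanity check outside Grafana, you can also query VictoriaLogs' HTTP API directly. A minimal sketch, assuming the default query endpoint on port 9428; note that this endpoint expects VictoriaLogs' native LogsQL, so the query below uses a simple time filter plus word filter rather than the Grafana query syntax shown above:
# Retrieve error lines from the last hour and count them locally
curl -s http://victorialogs-server:9428/select/logsql/query \
  --data-urlencode 'query=_time:1h error' | wc -l
# Adjust the word filter to terms you expect in recent logs; a non-empty
# result confirms both ingestion and the query path are working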
You should see:
- Health endpoint returns {"status":"ok"}
- Recent logs appear within last 60 seconds
- Severity filtering returns appropriate log entries
- Query execution times under 3 seconds for 24-hour ranges
Step 4: Query Performance Baseline Measurements
Establish performance baselines to identify degradation over time as your monitoring environment scales.
| Query Type | Data Source | Time Range | Target Response Time |
|---|---|---|---|
| Device inventory list | LibreNMS | N/A | < 2 seconds |
| Interface top 10 bandwidth | LibreNMS | Current | < 3 seconds |
| CPU usage time series | InfluxDB | 24 hours | < 1 second |
| Memory aggregation | InfluxDB | 7 days | < 2 seconds |
| Log search by keyword | VictoriaLogs | 1 hour | < 2 seconds |
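To make these baselines repeatable, script the measurements and keep the output for comparison over time. A minimal sketch, assuming CLI access to each backend (hostnames, credentials, and databases are placeholders):
#!/usr/bin/env bash
# Rough baseline timings for each data source; run periodically and record the results
echo "--- LibreNMS (MySQL) ---"
time mysql -u librenms -p librenms -e "SELECT COUNT(*) FROM devices WHERE status = 1;" > /dev/null
echo "--- InfluxDB ---"
time influx -database telegraf -execute "SELECT MEAN(usage_idle) FROM cpu WHERE time > now() - 24h GROUP BY host" > /dev/null
echo "--- VictoriaLogs ---"
time curl -s http://victorialogs-server:9428/select/logsql/query --data-urlencode 'query=_time:1h error' > /dev/null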
Section 2: Dashboard Functionality Testing
Your dashboards are the primary interface for monitoring operations. This section provides comprehensive testing procedures to ensure every interactive element functions correctly.
Step 1: Variable Selection and Panel Updates
Dashboard variables allow dynamic filtering. Test that variable changes propagate to all dependent panels.
Test procedure:
- Navigate to your unified monitoring dashboard
- Document current variable values
- Change the variable value (select different host)
- Verify all panels update within 2-3 seconds
- Test multi-select variables with multiple values
- Test "All" option for aggregate data
You should see:
- All panels refresh within 2-3 seconds
- No panels show "No data"
- Panel titles update to reflect new variable values
- Query inspector shows updated WHERE clauses
Step 2: Time Range Picker Validation
The time range picker is crucial for historical analysis. Verify it affects all visualizations correctly.
| Time Range | Expected Behavior |
|---|---|
| Last 5 minutes | All panels show only last 5 minutes of data |
| Last 6 hours | Data compressed to 30s or 1m aggregation |
| Last 7 days | Downsampled to 5m or 10m intervals |
Step 3: Refresh and Auto-Refresh Testing
For real-time monitoring, auto-refresh is essential. Test both manual refresh and automatic intervals.
- Click refresh button - observe all panels reload
- Set auto-refresh to 10s
- Open browser console → Network tab
- Observe requests firing every 10 seconds
- Verify panels update with changing metrics
Aggressive auto-refresh (5s or less) can overload data sources. For production NOC screens, use 30s or 1m refresh rates. For troubleshooting, 10s is acceptable. Disable when not actively monitoring.
Section 3: Alert Rule Validation
Alerting is critical for proactive monitoring. This section ensures alert rules trigger correctly, notifications are delivered, and alert lifecycle functions properly.
Step 1: Alert Condition Trigger Testing
Intentionally trigger each alert to verify threshold accuracy and notification delivery.
Test high CPU alert:
# SSH to a monitored host
ssh admin@test-server-01
# Generate CPU load
stress --cpu 4 --timeout 120s
# Watch Grafana alert state transition:
# Normal → Pending → Firing
# Verify notification received
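If the stress utility is not installed on the test host, a rough equivalent can be improvised with core tools (run interactively over SSH; adjust the number of loops to the host's CPU count):
# Fallback CPU load: four busy loops for roughly two minutes, then clean up
for i in 1 2 3 4; do yes > /dev/null & done
sleep 120 && pkill -x yes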
Test interface down alert:
# Safely shut down a test interface
ssh admin@test-switch
config t
interface GigabitEthernet1/0/24
shutdown
# Wait for LibreNMS poller cycle
# Verify alert fires in Grafana
# Restore interface:
no shutdown
For each alert rule, verify:
- State transitions Normal → Pending → Firing correctly
- Alert evaluation follows configured interval
- Pending duration matches configured threshold
- Alert annotations appear on dashboards
Step 2: Notification Channel Verification
Test that alerts are delivered to all configured notification channels.
| Channel Type | Test Method |
|---|---|
| Email (SMTP) | Send test notification button |
| Slack | Send test notification button |
| PagerDuty | Trigger test alert |
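Beyond the built-in test buttons, it helps to verify the delivery path itself from the shell. A minimal sketch for a Slack incoming webhook (the webhook URL is a placeholder; email and PagerDuty have analogous out-of-band tests):
# Post a test message directly to the webhook used by the Slack contact point
curl -X POST -H 'Content-type: application/json' \
  --data '{"text":"Monitoring integration test message - please ignore"}' \
  https://hooks.slack.com/services/XXXX/XXXX/XXXX
# A message appearing in the target channel confirms the webhook works independently of Grafana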
# Minimum alert notification should contain:
# - Alert name/title
# - Current metric value
# - Threshold value
# - Affected host/device
# - Timestamp
# - Dashboard link
# - Instructions/runbook link
Step 3: Alert Silencing for Maintenance
Test that you can silence alerts during planned maintenance.
- Navigate to Alerting → Silences
- Click "New silence"
- Configure matcher: hostname=test-server-01
- Set duration: 1 hour
- Trigger alert condition
- Verify alert shows "Suppressed" instead of firing
Create silences 15 minutes before maintenance. Use descriptive comments including ticket number. Set duration slightly longer than estimated maintenance. Delete silence manually if work completes early.
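Silences can also be created programmatically, which is useful for scripted maintenance windows. A hedged sketch against Grafana's Alertmanager-compatible API, assuming Grafana-managed alerting, a service account token exported as GRAFANA_TOKEN, and GNU date (the URL and matcher values are placeholders):
# Create a 1-hour silence for test-server-01 via the Grafana-managed Alertmanager API
START=$(date -u +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u -d '+1 hour' +%Y-%m-%dT%H:%M:%SZ)
curl -s -X POST http://your-grafana-server:3000/api/alertmanager/grafana/api/v2/silences \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"matchers\": [{\"name\": \"hostname\", \"value\": \"test-server-01\", \"isRegex\": false}],
       \"startsAt\": \"$START\", \"endsAt\": \"$END\",
       \"comment\": \"Planned maintenance - reference your ticket here\", \"createdBy\": \"ops-team\"}"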
Section 4: Knowledge Assessment Quiz
Test your understanding of concepts covered throughout this lab series.
Which query language is used to retrieve data from LibreNMS in Grafana?
LibreNMS uses a MySQL database to store device inventory and SNMP polling data. When configuring LibreNMS as a data source in Grafana, you select "MySQL" and write standard SQL queries.
To convert LibreNMS interface traffic from octets to Mbps, what calculation is required?
1 byte (octet) = 8 bits. Formula: (octets/sec * 8) / 1,000,000 = Mbps. Example: 10,000,000 octets/sec * 8 = 80,000,000 bits/sec = 80 Mbps.
What is the PRIMARY operational benefit of integrating LibreNMS, Telegraf, and VictoriaLogs?
The key value is correlation: when interface errors (LibreNMS), CPU spikes (Telegraf), and application errors (VictoriaLogs) occur simultaneously, correlation dramatically reduces MTTR.
Section 5: Hands-On Practical Exercises
Apply your knowledge through practical exercises that simulate real-world scenarios.
Exercise 1: Generate Test Load and Observe Dashboard Changes
Objective: Verify dashboards accurately reflect system load changes in real-time.
- Open your unified monitoring dashboard
- Set auto-refresh to 10 seconds
- Note baseline CPU/memory for test server
- SSH to the server and generate load: stress --cpu 2 --vm 2 --vm-bytes 512M --timeout 120s
- Observe panels update within 30 seconds
- Document peak values
- Verify metrics return to baseline
Expected: CPU spike to ~100%, memory +512MB, return to baseline within 1 minute.
Exercise 2: Correlate Data Across Sources
Scenario: Web application unreachable. Use dashboard to identify cause.
- Stop the web service: systemctl stop nginx
- Generate test traffic: curl http://test-webserver
- Check VictoriaLogs for HTTP errors
- Check LibreNMS for interface status
- Check InfluxDB for connection failures
- Correlate timestamps: which issue occurred first?
- Document analysis and resolution
Expected: Identify stopped service as root cause, demonstrate timestamp correlation.
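If you prefer to drive the correlation from the shell rather than the dashboard, the same three checks can be run directly against each backend. A hedged sketch with placeholder hostnames, credentials, and label values:
# VictoriaLogs: recent web server errors (native LogsQL time + word filters)
curl -s http://victorialogs-server:9428/select/logsql/query \
  --data-urlencode 'query=_time:15m nginx error' | tail -n 5
# LibreNMS: operational status of the test switch's ports
mysql -u librenms -p librenms -e \
  "SELECT devices.hostname, ports.ifName, ports.ifOperStatus
   FROM ports JOIN devices ON ports.device_id = devices.device_id
   WHERE devices.hostname = 'test-switch';"
# InfluxDB: most recent CPU sample from the web server
influx -database telegraf -execute \
  "SELECT LAST(usage_idle) FROM cpu WHERE host = 'test-webserver'"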
Exercise 3: Create Custom Dashboard Variable
Objective: Build multi-select variable for filtering by location.
- Edit dashboard → Settings → Variables → Add
- Name: location, Type: Query
- Data source: LibreNMS MySQL
- Query: SELECT DISTINCT location FROM devices WHERE status = 1
- Enable "Multi-value" and "Include All"
- Update panels to use: WHERE location IN ($location)
- Test selection and verify updates
Expected: Dropdown filters entire dashboard, "All" shows all devices.
Section 6: Production Readiness Checklist
Complete this comprehensive checklist before production deployment.
The checklist covers five areas:
- Infrastructure Health
- Performance Validation
- Alerting and Notifications
- Security and Access Control
- Backup and Documentation
Section 7: Next Steps & Advanced Topics
You've built a production-grade unified monitoring solution. Here are advanced topics to expand your capabilities.
1. Advanced Alerting Strategies
Predictive Alerting: Use the Grafana Machine Learning plugin or InfluxQL's HOLT_WINTERS() function to forecast metric values and predict issues before they occur.
Composite Alerts: Create alerts evaluating multiple conditions across data sources.
Dynamic Thresholds: Use percentile-based alerts that adapt to normal patterns instead of static thresholds.
2. Scaling for Enterprise
High Availability: Deploy Grafana behind load balancer with shared PostgreSQL backend.
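For the high-availability setup described above, the key requirement is that every Grafana instance shares the same database. A hedged sketch of the relevant grafana.ini section, written as a shell heredoc with placeholder connection details (in practice, apply it through your configuration management tooling):
# Point every Grafana instance at the same shared PostgreSQL backend (placeholder values)
cat >> /etc/grafana/grafana.ini <<'EOF'
[database]
type = postgres
host = postgres.example.internal:5432
name = grafana
user = grafana
password = CHANGE_ME
EOF
systemctl restart grafana-server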
Performance: Implement query result caching, database replication for read scaling, and CDN for static assets.
Multi-tenancy: Use Grafana organizations for customer/department isolation.
3. Integration Opportunities
ITSM Integration: Connect alerts to ServiceNow, Jira for automatic ticket creation.
ChatOps: Deploy Grafana Slack bot for dashboard queries from chat.
Automation: Use Terraform for infrastructure-as-code dashboard deployments.
4. Continuous Improvement
Monthly Review: Analyze alert fatigue metrics, dashboard usage statistics, and query performance trends.
Feedback Loop: Collect input from operations team on dashboard effectiveness.
Stay Updated: Follow Grafana Labs blog, join community forums, attend virtual meetups.
🎉 Congratulations!
You've completed the Grafana Dashboard Integration Lab! You've learned to:
- Configure three distinct data sources (LibreNMS, Telegraf, VictoriaLogs)
- Write queries in SQL, InfluxQL, and LogQL
- Build integrated dashboards for unified monitoring
- Troubleshoot common integration issues
- Apply production-grade best practices
- Validate deployments through comprehensive testing
What's Next? Apply these skills to your production environment, share your dashboards with your team, and continue learning through the WholeStack Solutions platform.