Module 8 of 8

✅ Verification & Knowledge Assessment

Testing Your Integration and Validating Production Readiness

Module Overview

Congratulations on reaching the final module! You've built a comprehensive monitoring integration spanning LibreNMS, Telegraf/InfluxDB, and VictoriaLogs. Now it's time to validate your work, test every component, and ensure production readiness.

This module provides systematic verification procedures, performance benchmarks, hands-on exercises, and knowledge assessment quizzes. By the end, you'll have confidence that your integration is robust, optimized, and ready for production deployment.

🎯 Learning Objectives

By completing this module, you will be able to:

  • Verify data source connectivity and accuracy
  • Validate dashboard functionality across all panels
  • Test alert rules and notification delivery
  • Measure query performance and optimize bottlenecks
  • Complete production readiness checklists
  • Demonstrate mastery through practical exercises and quiz assessments

Estimated Completion Time: 25-30 minutes

Section 1: Data Source Verification

Before trusting your dashboards in production, you must verify that each data source is properly configured, returning accurate data, and performing within acceptable parameters. This section provides systematic testing procedures for LibreNMS, InfluxDB, and VictoriaLogs.

Step 1: LibreNMS Data Source Verification

LibreNMS serves as your network inventory and SNMP polling engine. Verify that Grafana can query device data, interface statistics, and alert information correctly.

Test 1: Verify LibreNMS connectivity and authentication

# Navigate to Grafana Data Sources
# URL: http://your-grafana-server:3000/datasources
# Click on your LibreNMS MySQL data source
# Look for the green "Data source is working" message

# Test query from the Explore tab:
SELECT hostname, sysName, os
FROM devices
WHERE status = 1
LIMIT 5;

# Expected output: list of 5 active devices with their OS types

Test 2: Validate interface data accuracy

# Test query for interface statistics
SELECT devices.hostname, ports.ifName, ports.ifOperStatus, ports.ifSpeed,
       ports.ifInOctets_rate, ports.ifOutOctets_rate
FROM ports
JOIN devices ON ports.device_id = devices.device_id
WHERE devices.hostname = 'core-switch-01'
  AND ports.ifOperStatus = 'up'
ORDER BY ports.ifInOctets_rate DESC
LIMIT 10;

# Expected output: top 10 busiest interfaces with current rates
# Verify rates match SNMP walk data or switch CLI output (see the cross-check sketch below)
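
If you want to cross-check a LibreNMS rate against the device itself, one approach is to sample the interface octet counter twice over SNMP and compute the delta. The following is a minimal sketch, assuming net-snmp tools with standard MIBs are installed, SNMPv2c read access with the community string public, and a hypothetical ifIndex of 10024; adjust all three for your environment.

#!/usr/bin/env bash
# Rough interface-rate cross-check: sample ifHCInOctets twice and compute bits/sec.
# Assumptions: net-snmp installed, SNMPv2c community "public", ifIndex 10024 (hypothetical).
HOST="core-switch-01"
IFINDEX=10024
INTERVAL=10

first=$(snmpget -v2c -c public -Oqv "$HOST" IF-MIB::ifHCInOctets.$IFINDEX)
sleep "$INTERVAL"
second=$(snmpget -v2c -c public -Oqv "$HOST" IF-MIB::ifHCInOctets.$IFINDEX)

# Convert the octet delta to bits per second: (delta * 8) / interval
echo "inbound rate: $(( (second - first) * 8 / INTERVAL )) bits/sec"

The result should be in the same ballpark as ports.ifInOctets_rate converted to bits; small differences are expected because LibreNMS averages over its polling interval.
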
✅ LibreNMS Verification Complete

You should see:

  • Green health status in Grafana data source configuration
  • Device queries return expected hostnames and metadata
  • Interface rates match known traffic patterns
  • Query execution time under 2 seconds for typical device queries

Step 2: InfluxDB Data Source Verification

InfluxDB stores time-series metrics from Telegraf agents. Verify metric collection, retention policies, and query performance.

Test 1: Verify InfluxDB connectivity and database access

# From the InfluxDB server or a remote client
influx -precision rfc3339

# List available databases
SHOW DATABASES
# Expected output should include: telegraf

# Use the telegraf database
USE telegraf

# Show measurements (metric types)
SHOW MEASUREMENTS
# Expected output: cpu, disk, mem, net, system, etc.

Test 2: Validate metric collection and data freshness

# Check the most recent data point for each host
SELECT LAST(usage_idle) AS last_cpu_idle, host
FROM cpu
WHERE time > now() - 5m
GROUP BY host

# Verify data within the last 60 seconds (Telegraf default interval: 10s)
SELECT time, host, usage_idle
FROM cpu
WHERE time > now() - 2m
ORDER BY time DESC
LIMIT 20
⚠️ Common InfluxDB Issues

If queries return no data: Check Telegraf agent status on monitored hosts (systemctl status telegraf), verify InfluxDB output configuration in /etc/telegraf/telegraf.conf, check firewall rules allowing port 8086, and review InfluxDB logs for write errors (/var/log/influxdb/influxd.log).
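
The commands below walk through those four checks in order. This is a minimal sketch assuming systemd-managed Telegraf, the default config and log paths, and a placeholder hostname influxdb-server; adjust for your distribution and environment.

#!/usr/bin/env bash
# Quick triage when InfluxDB queries return no data.

# 1. Is the Telegraf agent running on the monitored host?
systemctl status telegraf --no-pager

# 2. Is the InfluxDB output configured and pointed at the right server?
grep -A5 '\[\[outputs.influxdb\]\]' /etc/telegraf/telegraf.conf

# 3. Can this host reach InfluxDB on port 8086?
curl -s -o /dev/null -w '%{http_code}\n' http://influxdb-server:8086/ping   # expect 204

# 4. Any write errors on the InfluxDB side?
tail -n 50 /var/log/influxdb/influxd.log | grep -i error
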

✅ InfluxDB Verification Complete

You should see:

  • Telegraf database exists and contains expected measurements
  • Recent data points (within last 60 seconds) for all monitored hosts
  • Query execution times under 1 second for 24-hour ranges
  • No write errors in InfluxDB logs

Step 3: VictoriaLogs Data Source Verification

VictoriaLogs aggregates syslog and application logs. Verify log ingestion, filtering, and query performance using LogQL syntax.

Test 1: Verify VictoriaLogs connectivity

# Check VictoriaLogs health from the command line
curl -s http://victorialogs-server:9428/health
# Expected output: {"status":"ok"}

# Test a basic log query in Grafana Explore
{job="syslog"} | limit 100

# Query by severity
{job="syslog"} |~ "error|critical|alert" | limit 50
💡 VictoriaLogs Performance Tips

For best query performance: apply label filters ({job="syslog", hostname="..."}) before line filters (|~), limit time ranges to the duration you actually need, and use count_over_time for aggregations rather than retrieving all log lines.
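
To see the effect of a stream (label) filter outside Grafana, you can time a filtered versus an unfiltered query against the VictoriaLogs HTTP API. This sketch assumes the LogsQL query endpoint at its default path /select/logsql/query on port 9428, a syslog stream labeled job="syslog", and the _time: filter syntax; verify all of these against your VictoriaLogs version before relying on it.

#!/usr/bin/env bash
# Compare query time with and without a stream (label) filter.
VL="http://victorialogs-server:9428"

# Broad query: last hour of every stream, filtered only by keyword (slower)
time curl -s "$VL/select/logsql/query" \
  --data-urlencode 'query=_time:1h error' > /dev/null

# Narrow query: restrict to one stream first, then filter and limit (faster)
time curl -s "$VL/select/logsql/query" \
  --data-urlencode 'query={job="syslog"} _time:1h error | limit 50' > /dev/null
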

✅ VictoriaLogs Verification Complete

You should see:

  • Health endpoint returns {"status":"ok"}
  • Recent logs appear within last 60 seconds
  • Severity filtering returns appropriate log entries
  • Query execution times under 3 seconds for 24-hour ranges

Step 4: Query Performance Baseline Measurements

Establish performance baselines to identify degradation over time as your monitoring environment scales.

Query Type                 | Data Source   | Time Range | Target Response Time
Device inventory list      | LibreNMS      | N/A        | < 2 seconds
Interface top 10 bandwidth | LibreNMS      | Current    | < 3 seconds
CPU usage time series      | InfluxDB      | 24 hours   | < 1 second
Memory aggregation         | InfluxDB      | 7 days     | < 2 seconds
Log search by keyword      | VictoriaLogs  | 1 hour     | < 2 seconds
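
A simple way to capture these baselines is to time representative queries from the command line and record the results for later comparison. The sketch below assumes the mysql and influx CLI clients are installed and that the hostnames (librenms-db, influxdb-server, victorialogs-server) and credentials are placeholders for your own; the VictoriaLogs query syntax is likewise an assumption to adapt.

#!/usr/bin/env bash
# Record baseline query times for each data source.

echo "== LibreNMS: device inventory list =="
# -p prompts for the password; "librenms" is the database name
time mysql -h librenms-db -u librenms -p librenms \
  -e "SELECT hostname, sysName, os FROM devices WHERE status = 1;" > /dev/null

echo "== InfluxDB: 24-hour CPU usage time series =="
time influx -host influxdb-server -database telegraf \
  -execute "SELECT MEAN(usage_idle) FROM cpu WHERE time > now() - 24h GROUP BY time(1m), host" > /dev/null

echo "== VictoriaLogs: 1-hour keyword search =="
time curl -s "http://victorialogs-server:9428/select/logsql/query" \
  --data-urlencode 'query={job="syslog"} _time:1h error' > /dev/null
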

Section 2: Dashboard Functionality Testing

Your dashboards are the primary interface for monitoring operations. This section provides comprehensive testing procedures to ensure every interactive element functions correctly.

Step 1: Variable Selection and Panel Updates

Dashboard variables allow dynamic filtering. Test that variable changes propagate to all dependent panels.

Test procedure:

  1. Navigate to your unified monitoring dashboard
  2. Document current variable values
  3. Change the variable value (select different host)
  4. Verify all panels update within 2-3 seconds
  5. Test multi-select variables with multiple values
  6. Test "All" option for aggregate data
✅ Expected Results

All panels refresh within 2-3 seconds. No panels show "No data". Panel titles update to reflect new variable values. Query inspector shows updated WHERE clauses.

Step 2: Time Range Picker Validation

The time range picker is crucial for historical analysis. Verify it affects all visualizations correctly.

Time Range     | Expected Behavior
Last 5 minutes | All panels show only the last 5 minutes of data
Last 6 hours   | Data compressed to 30s or 1m aggregation
Last 7 days    | Downsampled to 5m or 10m intervals
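
In InfluxQL-backed panels, that downsampling is usually expressed with GROUP BY time() intervals (Grafana typically substitutes its $__interval variable). The sketch below illustrates the idea with fixed intervals from the influx CLI; the exact intervals and measurement are assumptions based on the default Telegraf cpu plugin.

# 6-hour view: aggregate CPU usage into 1-minute buckets
influx -database telegraf -execute \
  "SELECT MEAN(usage_idle) FROM cpu WHERE time > now() - 6h GROUP BY time(1m), host"

# 7-day view: coarser 10-minute buckets keep point counts (and render times) low
influx -database telegraf -execute \
  "SELECT MEAN(usage_idle) FROM cpu WHERE time > now() - 7d GROUP BY time(10m), host"
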

Step 3: Refresh and Auto-Refresh Testing

For real-time monitoring, auto-refresh is essential. Test both manual refresh and automatic intervals.

  1. Click refresh button - observe all panels reload
  2. Set auto-refresh to 10s
  3. Open browser developer tools → Network tab
  4. Observe requests firing every 10 seconds
  5. Verify panels update with changing metrics
⚠️ Auto-Refresh Performance Impact

Aggressive auto-refresh (5s or less) can overload data sources. For production NOC screens, use 30s or 1m refresh rates. For troubleshooting, 10s is acceptable. Disable when not actively monitoring.

Section 3: Alert Rule Validation

Alerting is critical for proactive monitoring. This section ensures alert rules trigger correctly, notifications are delivered, and alert lifecycle functions properly.

Step 1: Alert Condition Trigger Testing

Intentionally trigger each alert to verify threshold accuracy and notification delivery.

Test high CPU alert:

# SSH to a monitored host
ssh admin@test-server-01

# Generate CPU load
stress --cpu 4 --timeout 120s

# Watch the Grafana alert state transition:
# Normal → Pending → Firing
# Verify the notification is received

Test interface down alert:

# Safely shut down a test interface
ssh admin@test-switch
config t
interface GigabitEthernet1/0/24
shutdown

# Wait for the LibreNMS poller cycle
# Verify the alert fires in Grafana

# Restore the interface:
no shutdown
✅ Alert Trigger Validation Complete

For each alert rule, verify:

  • State transitions Normal → Pending → Firing correctly
  • Alert evaluation follows configured interval
  • Pending duration matches configured threshold
  • Alert annotations appear on dashboards

Step 2: Notification Channel Verification

Test that alerts are delivered to all configured notification channels.

Channel Type | Test Method
Email (SMTP) | Send test notification button
Slack        | Send test notification button
PagerDuty    | Trigger test alert
# A minimum alert notification should contain:
# - Alert name/title
# - Current metric value
# - Threshold value
# - Affected host/device
# - Timestamp
# - Dashboard link
# - Instructions/runbook link
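
If you want to confirm delivery outside Grafana's built-in test button, a quick option for Slack is to post directly to the channel's incoming webhook. The sketch below uses Slack's standard incoming-webhook API; the webhook URL shown is a placeholder you would replace with your own.

#!/usr/bin/env bash
# Post a test message to a Slack incoming webhook (URL is a placeholder).
WEBHOOK_URL="https://hooks.slack.com/services/XXXX/YYYY/ZZZZ"

curl -s -X POST "$WEBHOOK_URL" \
  -H 'Content-Type: application/json' \
  --data '{"text":"Test notification from the Grafana verification lab: alert plumbing OK"}'
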

Step 3: Alert Silencing for Maintenance

Test that you can silence alerts during planned maintenance.

  1. Navigate to Alerting → Silences
  2. Click "New silence"
  3. Configure matcher: hostname=test-server-01
  4. Set duration: 1 hour
  5. Trigger alert condition
  6. Verify alert shows "Suppressed" instead of firing
🛠️ Maintenance Window Best Practices

Create silences 15 minutes before maintenance. Use descriptive comments including ticket number. Set duration slightly longer than estimated maintenance. Delete silence manually if work completes early.
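
Silences can also be created by script, which is useful when maintenance windows are driven by automation. The sketch below targets Grafana's Alertmanager-compatible silences API under unified alerting; the API path, the service-account token in $GRAFANA_TOKEN, and GNU date are all assumptions to verify against your Grafana version.

#!/usr/bin/env bash
# Create a 1-hour silence for hostname=test-server-01 via Grafana's Alertmanager-compatible API.
GRAFANA_URL="http://your-grafana-server:3000"
START=$(date -u +%Y-%m-%dT%H:%M:%SZ)
END=$(date -u -d '+1 hour' +%Y-%m-%dT%H:%M:%SZ)

curl -s -X POST "$GRAFANA_URL/api/alertmanager/grafana/api/v2/silences" \
  -H "Authorization: Bearer $GRAFANA_TOKEN" \
  -H 'Content-Type: application/json' \
  -d "{
    \"matchers\": [{\"name\": \"hostname\", \"value\": \"test-server-01\", \"isRegex\": false}],
    \"startsAt\": \"$START\",
    \"endsAt\": \"$END\",
    \"createdBy\": \"maintenance-script\",
    \"comment\": \"Planned maintenance - CHG-1234\"
  }"
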

Section 4: Knowledge Assessment Quiz

Test your understanding of concepts covered throughout this lab series.

Question 1: Data Source Selection

Which query language is used to retrieve data from LibreNMS in Grafana?

A) InfluxQL
B) LogQL
C) SQL (MySQL dialect)
D) PromQL
Correct Answer: C - SQL (MySQL dialect)
LibreNMS uses a MySQL database to store device inventory and SNMP polling data. When configuring LibreNMS as a data source in Grafana, you select "MySQL" and write standard SQL queries.
Question 2: Metric Conversion

To convert LibreNMS interface traffic from octets to Mbps, what calculation is required?

A) Divide octets by 1,000,000
B) Multiply octets by 8, then divide by 1,000,000
C) Multiply octets by 1,000,000
D) Divide octets by 8
Correct Answer: B - Multiply octets by 8, then divide by 1,000,000
1 byte (octet) = 8 bits. Formula: (octets * 8) / 1000000 = Mbps. Example: 10,000,000 octets/sec = 80 Mbps.
Question 3: Unified Monitoring Benefits

What is the PRIMARY operational benefit of integrating LibreNMS, Telegraf, and VictoriaLogs?

A) Reduces infrastructure costs
B) Enables faster root cause analysis through data correlation
C) Eliminates need for individual tools
D) Automatically fixes issues
Correct Answer: B - Enables faster root cause analysis through data correlation
The key value is correlation: when interface errors (LibreNMS), CPU spikes (Telegraf), and application errors (VictoriaLogs) occur simultaneously, correlation dramatically reduces MTTR.


Section 5: Hands-On Practical Exercises

Apply your knowledge through practical exercises that simulate real-world scenarios.

Exercise 1: Generate Test Load and Observe Dashboard Changes

Objective: Verify dashboards accurately reflect system load changes in real-time.

  1. Open your unified monitoring dashboard
  2. Set auto-refresh to 10 seconds
  3. Note baseline CPU/memory for test server
  4. SSH to server: stress --cpu 2 --vm 2 --vm-bytes 512M --timeout 120s
  5. Observe panels update within 30 seconds
  6. Document peak values
  7. Verify metrics return to baseline

Expected: CPU spike to ~100%, memory +512MB, baseline within 1 minute.

Exercise 2: Correlate Data Across Sources

Scenario: Web application unreachable. Use dashboard to identify cause.

  1. Stop web service: systemctl stop nginx
  2. Generate test traffic: curl http://test-webserver
  3. Check VictoriaLogs for HTTP errors
  4. Check LibreNMS for interface status
  5. Check InfluxDB for connection failures
  6. Correlate timestamps: which issue occurred first? (a query sketch follows this exercise)
  7. Document analysis and resolution

Expected: Identify stopped service as root cause, demonstrate timestamp correlation.
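
One way to pull the relevant evidence side by side for step 6 is to query each source from the shell and compare timestamps. The commands below are a sketch using the same placeholder hostnames and credentials as earlier sections; the VictoriaLogs query syntax and the hostname label are assumptions that depend on how your syslog ingestion is configured.

#!/usr/bin/env bash
# Gather evidence from each source for timestamp correlation.

# When did the web server last log before going quiet? (VictoriaLogs)
curl -s "http://victorialogs-server:9428/select/logsql/query" \
  --data-urlencode 'query={hostname="test-webserver"} _time:30m nginx | limit 20'

# Was the server's interface up the whole time? (LibreNMS)
mysql -h librenms-db -u librenms -p librenms -e \
  "SELECT devices.hostname, ports.ifName, ports.ifOperStatus
   FROM ports JOIN devices ON ports.device_id = devices.device_id
   WHERE devices.hostname = 'test-webserver';"

# Did host-level metrics keep flowing? (InfluxDB)
influx -database telegraf -execute \
  "SELECT LAST(usage_idle) FROM cpu WHERE host = 'test-webserver'"
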

Exercise 3: Create Custom Dashboard Variable

Objective: Build multi-select variable for filtering by location.

  1. Edit dashboard → Settings → Variables → Add
  2. Name: location, Type: Query
  3. Data source: LibreNMS MySQL
  4. Query: SELECT DISTINCT location FROM devices WHERE status = 1
  5. Enable "Multi-value" and "Include All"
  6. Update panels to use: WHERE location IN ($location)
  7. Test selection and verify updates

Expected: Dropdown filters entire dashboard, "All" shows all devices.
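
Before wiring the variable into panels, it can help to run the variable query by hand and confirm it returns the location values you expect. A minimal check, assuming the mysql client and LibreNMS credentials (placeholders) are available:

# Run the variable query directly against the LibreNMS database
mysql -h librenms-db -u librenms -p librenms -e \
  "SELECT DISTINCT location FROM devices WHERE status = 1;"
# Each returned row becomes one entry in the $location dropdown
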

Section 6: Production Readiness Checklist

Complete this comprehensive checklist before production deployment.

The checklist covers five areas:

  • Infrastructure Health
  • Performance Validation
  • Alerting and Notifications
  • Security and Access Control
  • Backup and Documentation


Section 7: Next Steps & Advanced Topics

You've built a production-grade unified monitoring solution. Here are advanced topics to expand your capabilities.

1. Advanced Alerting Strategies

Predictive Alerting: Use the Grafana ML plugin or InfluxQL's HOLT_WINTERS() function to forecast metric trends and predict issues before they occur.

Composite Alerts: Create alerts evaluating multiple conditions across data sources.

Dynamic Thresholds: Use percentile-based alerts that adapt to normal patterns instead of static thresholds.
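
As a concrete starting point for dynamic thresholds, you can derive a per-host baseline from a percentile of recent history and compare current values against it. The InfluxQL below sketches that idea with the Telegraf cpu measurement via the influx CLI; how you attach the comparison to a Grafana alert rule depends on your alerting setup.

# Baseline: 95th percentile of per-host user CPU over the last 7 days
influx -database telegraf -execute \
  "SELECT PERCENTILE(usage_user, 95) FROM cpu WHERE time > now() - 7d GROUP BY host"

# Current: 5-minute average per host; alert when it exceeds the baseline above
influx -database telegraf -execute \
  "SELECT MEAN(usage_user) FROM cpu WHERE time > now() - 5m GROUP BY host"
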

2. Scaling for Enterprise

High Availability: Deploy Grafana behind load balancer with shared PostgreSQL backend.

Performance: Implement query result caching, database replication for read scaling, and CDN for static assets.

Multi-tenancy: Use Grafana organizations for customer/department isolation.

3. Integration Opportunities

ITSM Integration: Connect alerts to ServiceNow, Jira for automatic ticket creation.

ChatOps: Deploy Grafana Slack bot for dashboard queries from chat.

Automation: Use Terraform for infrastructure-as-code dashboard deployments.

4. Continuous Improvement

Monthly Review: Analyze alert fatigue metrics, dashboard usage statistics, and query performance trends.

Feedback Loop: Collect input from operations team on dashboard effectiveness.

Stay Updated: Follow Grafana Labs blog, join community forums, attend virtual meetups.

🎉 Congratulations!

You've completed the Grafana Dashboard Integration Lab! You've learned to:

  • Configure three distinct data sources (LibreNMS, Telegraf, VictoriaLogs)
  • Write queries in SQL, InfluxQL, and LogQL
  • Build integrated dashboards for unified monitoring
  • Troubleshoot common integration issues
  • Apply production-grade best practices
  • Validate deployments through comprehensive testing

What's Next? Apply these skills to your production environment, share your dashboards with your team, and continue learning through the WholeStack Solutions platform.