🔍 Module 5: VictoriaLogs LogQL Queries

Mastering Log Query Language for VictoriaLogs Integration

⏱️ Estimated Time: 25-30 minutes

Progress: Module 5 of 8 (62.5%)

🎯 Module Overview

VictoriaLogs provides powerful log aggregation and querying capabilities that integrate seamlessly with Grafana. This module teaches you how to configure the VictoriaLogs data source in Grafana and craft effective LogQL queries to extract meaningful insights from your log data.

Learning Objectives

  • Configure Loki data source in Grafana with the critical /select/logsql endpoint
  • Master LogQL syntax including label filtering and stream selection
  • Parse JSON logs and extract structured data with regex patterns
  • Perform rate calculations and aggregations on log data
  • Apply label naming conventions for consistent log organization
  • Troubleshoot common LogQL query errors

1. Loki Data Source Configuration for VictoriaLogs

VictoriaLogs implements the Loki API, making it compatible with Grafana's Loki data source. This allows you to use Grafana's mature Loki query builder without requiring custom plugins.

Step 1: Add Loki Data Source

  1. Log into Grafana (typically http://localhost:3000)
  2. Click ☰ menu → Connections → Data sources
  3. Click "Add new data source"
  4. Search for and select "Loki"

💡 Why Loki?

VictoriaLogs implements the Loki API for compatibility, so we use the Loki data source type. Name it VictoriaLogs for clarity.

Step 2: Configure the Critical /select/logsql Endpoint

⚠️ CRITICAL: The /select/logsql Endpoint

This is the most important configuration step. VictoriaLogs requires /select/logsql as its query endpoint. Without this exact path, queries will fail with 404 errors.

URL Configuration Examples:

# Local development
http://localhost:9428/select/logsql

# Docker Compose (using service name)
http://victorialogs:9428/select/logsql

# Production with hostname
http://victorialogs.example.com:9428/select/logsql

# Production with HTTPS
https://victorialogs.example.com/select/logsql

Complete Configuration Settings:

  • Name: VictoriaLogs - a descriptive name for identification
  • URL: http://victorialogs:9428/select/logsql - must end with /select/logsql
  • Access: Server (default) - the Grafana server makes the requests
  • Timeout: 60 seconds - adjust for large queries if needed

💡 Understanding the Endpoint

  • /select - VictoriaLogs query API namespace
  • /logsql - LogQL compatibility interface
  • Port 9428 is VictoriaLogs' default HTTP port
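
You can also verify the endpoint outside Grafana with a direct HTTP call. This is a minimal sketch assuming a local instance on the default port; adjust the host to match your deployment.

# Should return matching log lines as JSON (one object per line)
curl http://localhost:9428/select/logsql/query -d 'query=error' -d 'limit=5'

# Quick liveness check
curl http://localhost:9428/health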

Step 3: Test and Verify Connection

  1. Scroll to the bottom of the configuration page
  2. Click "Save & test"
  3. Wait for the connection test (1-3 seconds)

✅ Expected Success Message

"Data source connected and labels found."

This confirms Grafana can reach VictoriaLogs and query data successfully.

⚠️ Common Connection Errors

  • "HTTP 404" - missing /select/logsql; verify the URL ends with the exact path
  • "Connection refused" - VictoriaLogs is not running; check with curl http://localhost:9428/health
  • "Timeout" - network or firewall issue; verify network connectivity

Step 4: Verify in Explore View

Test the connection with a simple query:

  1. Navigate to ☰ menu → Explore
  2. Select VictoriaLogs from the data source dropdown
  3. Enter query: {_stream=~".+"} (matches all streams)
  4. Click "Run query" or press Shift + Enter

✅ What to Expect

You should see log lines with timestamps and labels. If "No logs found," the connection works but you may need different label filters based on your data.
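
If no logs have been shipped yet, you can push a single test line by hand and re-run the query. This is a minimal sketch assuming a local instance and the Loki-compatible push endpoint shown in Section 7; the job and level values are arbitrary.

# Push one test log line (timestamp in nanoseconds)
curl -s -X POST http://localhost:9428/insert/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"streams":[{"stream":{"job":"test","level":"info"},"values":[["'"$(date +%s%N)"'","hello from curl"]]}]}'

# Then query it in Explore with:
{job="test"}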

2. LogQL Syntax with Label Filtering

LogQL (Log Query Language) is the query language for VictoriaLogs. Understanding its syntax is essential for effective log querying and dashboard building.

🔑 LogQL Query Structure

# Basic structure
{label_selectors} |= "text filter" | parser | field_filters

# Example
{job="nginx", level="error"} |= "database" | json | status >= 500

Queries consist of:

  1. Stream Selector: {label="value"} - Filters log streams
  2. Line Filter: |= "text" - Searches log content
  3. Parser: | json or | regexp - Extracts fields
  4. Field Filter: | field > value - Filters on parsed fields

Label Matching Operators

  • = - exact match, e.g. {job="nginx"}
  • != - not equal, e.g. {level!="debug"}
  • =~ - regex match, e.g. {host=~"web.*"}
  • !~ - negated regex match, e.g. {path!~"/health.*"}

Label Filtering Examples

# Single label
{job="nginx"}

# Multiple labels (AND logic)
{job="nginx", level="error"}

# Regex pattern - all web servers
{host=~"web[0-9]+"}

# Exclude pattern
{path!~"/health.*"}

# Complex combination
{job="nginx", environment="production", level="error"}

Line Filter Operators

  • |= - contains string, e.g. |= "error"
  • != - does not contain, e.g. != "debug"
  • |~ - matches regex, e.g. |~ "error|warn"
  • !~ - does not match regex, e.g. !~ "GET /health"

Text Filtering Examples

# Contains "database"
{job="api"} |= "database"

# Exclude health checks
{job="nginx"} != "health check"

# Regex OR - errors OR warnings
{job="app"} |~ "error|warn|fatal"

# Multiple filters (pipeline)
{job="api"} |= "timeout" != "expected timeout"

# HTTP error codes 4xx or 5xx
{job="nginx"} |~ "\" [45][0-9]{2} "

⚠️ Performance Tip

Use simple text matching (|=) before regex (|~) when possible. Regex is slower, especially with complex patterns.
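
For example, anchoring on a cheap substring first lets the regex run over far fewer lines (the job label here is illustrative):

# Cheap substring match narrows the stream, regex refines it
{job="api"} |= "timeout" |~ "(connect|read) timeout after \\d+ms"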

3. Log Parsing (JSON Extraction & Regex Patterns)

VictoriaLogs provides parsers to extract structured fields from logs, enabling field-level filtering and aggregations.

JSON Parser

The | json parser extracts fields from JSON-formatted logs.

Example JSON Log

{
  "timestamp": "2025-01-22T10:30:45Z",
  "level": "error",
  "message": "Database connection failed",
  "service": "user-api",
  "metadata": {
    "user_id": "12345",
    "duration_ms": 5432,
    "error_code": "CONN_TIMEOUT"
  }
}

JSON Parsing Syntax

# Parse all JSON fields
{job="api"} | json

# Parse and filter on field
{job="api"} | json | level="error"

# Nested fields use underscores: metadata.user_id becomes metadata_user_id
{job="api"} | json | metadata_error_code="CONN_TIMEOUT"

# Multiple filters
{job="api"} | json | level="error" | metadata_duration_ms > 5000

💡 Field Naming for Nested JSON

Nested JSON fields are flattened with underscores:

  • metadata.user_id → metadata_user_id
  • metadata.duration_ms → metadata_duration_ms
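
For example, filtering the sample log above on both nested fields after flattening:

# Matches the example log: user 12345 with a 5432 ms duration
{job="api"} | json | metadata_user_id="12345" | metadata_duration_ms > 5000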

Numeric Comparisons

# Greater than
{job="api"} | json | duration_ms > 1000

# Range filtering
{job="nginx"} | json | status >= 400 | status < 500

# Multiple numeric filters
{job="api"} | json | retry_count > 2 | duration_ms > 5000

Regex Parser

The | regexp parser extracts fields from unstructured logs using named capture groups.

Regex Syntax

# Basic syntax with named groups
| regexp "pattern with (?P<name>regex)"

# Extract IP and status from Apache logs
{job="nginx"} 
| regexp "^(?P<ip>\\d+\\.\\d+\\.\\d+\\.\\d+) .* (?P<status>\\d{3})"
| status >= 400

Common Log Format Patterns

# Apache/Nginx access log
{job="nginx"} 
| regexp "^(?P<ip>[\\d.]+) .* \"(?P<method>\\w+) (?P<path>[^ ]+) [^\"]+\" (?P<status>\\d{3})"
| status >= 500

# Application log with level and component
{job="app"} 
| regexp "\\[(?P<level>\\w+)\\] (?P<component>\\w+): (?P<message>.*)"
| level="ERROR"

# Extract timestamp, user, and action
{job="audit"} 
| regexp "\\[(?P<timestamp>[^\\]]+)\\] user=(?P<user>\\w+) action=(?P<action>\\w+)"
| action=~"delete|modify"

Regex Building Blocks

  • IP address: [\d.]+ - matches e.g. 192.168.1.1
  • Integer: \d+ - matches e.g. 12345
  • Decimal: [\d.]+ - matches e.g. 99.99
  • Word: \w+ - matches e.g. error, user_id
  • Path: [^ ]+ - matches e.g. /api/users/123
  • Rest of line: .* - matches any remaining text

⚠️ Performance: Filter Before Parsing

# ✅ GOOD: Filter first, then parse
{job="nginx"} |= "POST" | regexp "..." | status >= 400

# ❌ BAD: Parse everything, then filter
{job="nginx"} | regexp "..." | method="POST" | status >= 400

4. Rate Calculations and Aggregations

LogQL enables powerful aggregations to calculate metrics from logs: rates, counts, and statistics.

Range Vectors

Aggregations operate on time windows specified in square brackets:

# Time window syntax
[5m]   # Last 5 minutes
[1h]   # Last 1 hour
[24h]  # Last 24 hours
[7d]   # Last 7 days

count_over_time() - Count Log Lines

# Count errors in last 5 minutes
count_over_time({job="api", level="error"}[5m])

# Count 404 errors
count_over_time({job="nginx"} | json | status="404" [5m])

# Count database failures
count_over_time({job="api"} |= "database" |= "failed" [15m])

rate() - Calculate Rate per Second

# Error rate per second
rate({job="api", level="error"}[5m])

# Request rate for specific endpoint
rate({job="nginx"} | json | path="/api/users" [5m])

# Failed auth rate
rate({job="auth"} | json | event="auth_failed" [10m])

💡 count_over_time() vs rate()

  • count_over_time() - Absolute counts ("total errors in last hour")
  • rate() - Per-second rate ("error rate trend over time")
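
As a rough worked example (the numbers are illustrative): if a stream logs 300 errors in a 5-minute window, the two functions report the same activity at different scales.

# Returns 300 (total lines in the window)
count_over_time({job="api", level="error"}[5m])

# Returns 1.0 (300 lines / 300 seconds)
rate({job="api", level="error"}[5m])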

Aggregation Operators

sum() and sum by()

# Total error count
sum(count_over_time({level="error"}[5m]))

# Error count per service
sum by (service) (count_over_time({level="error"} | json [5m]))

# Request rate per HTTP method
sum by (method) (rate({job="nginx"} | json [5m]))

# Status code distribution
sum by (status) (count_over_time({job="nginx"} | json [5m]))

Other Aggregation Operators

  • avg() - average value, e.g. avg(rate({job="api"}[5m]))
  • min() - minimum value, e.g. min(count_over_time(...))
  • max() - maximum value, e.g. max(rate({job="nginx"}[5m]))
  • topk() - top K series, e.g. topk(5, sum by (service) (rate(...)))

⚠️ Common Aggregation Mistakes

  • Missing time range: Always specify [5m], [1h], etc.
  • Too short range: Use at least [1m] for meaningful rates
  • Forgot to parse: Include | json before field filtering
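
The same query written with and without these mistakes (the stream selector is illustrative):

# ❌ Missing time range and parser
sum by (status) (count_over_time({job="nginx"} | status="500"))

# ✅ Parser included, time range specified
sum by (status) (count_over_time({job="nginx"} | json | status="500" [5m]))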

5. Comprehensive LogQL Query Examples

Real-world queries demonstrating LogQL capabilities for production monitoring.

Example 1: API Error Monitoring

Use case: Track API errors by endpoint and status code

sum by (endpoint, status) (
  count_over_time(
    {job="api", environment="production"} 
    | json endpoint, status, method
    | status >= 400
    | endpoint!~"/health.*|/metrics.*"
    [5m]
  )
)

Example 2: Database Connection Failures

Use case: Monitor database connection issues by service

sum by (service) (
  rate(
    {job=~".*-api", level="error"} 
    | json service, error_code
    | error_code=~"DB_.*|POOL_.*"
    [10m]
  )
)

Example 3: Failed Authentication Analysis

Use case: Identify potential brute force attacks

# Failed attempts by country
topk(10,
  sum by (country) (
    count_over_time(
      {job="auth-service"} 
      | json event, country
      | event="auth_failed"
      [1h]
    )
  )
)

# High-frequency failures (brute force detection)
sum by (ip_address) (
  count_over_time(
    {job="auth-service"} 
    | json event, ip_address
    | event="auth_failed"
    [5m]
  )
) > 10

Example 4: Payment Transaction Monitoring

Use case: Track payment success rates

# Success rate by payment method
sum by (payment_method) (
  count_over_time(
    {job="payment-api"} 
    | json transaction_status, payment_method
    | transaction_status="success"
    [15m]
  )
)
/
sum by (payment_method) (
  count_over_time(
    {job="payment-api"} 
    | json payment_method
    [15m]
  )
)

Example 5: Nginx Performance Analysis

Use case: Monitor slow requests and traffic patterns

# Request rate by method
sum by (method) (
  rate({job="nginx"} | json method [5m])
)

# Slow requests (>2s) by endpoint
sum by (path) (
  count_over_time(
    {job="nginx"} 
    | json path, request_time
    | request_time > 2.0
    [5m]
  )
)

# Top requested endpoints
topk(10,
  sum by (path) (
    rate({job="nginx"} | json path [15m])
  )
)

Example 6: Application Exception Tracking

Use case: Monitor critical exceptions

# Exception rate by type
sum by (exception_type) (
  rate(
    {level="error"} 
    | regexp "(?P<exception_type>\\w+Exception):"
    | environment="production"
    [10m]
  )
)

# Critical exception alert
count_over_time(
  {level="error"} 
  | json exception_type
  | exception_type=~"NullPointer.*|OutOfMemory.*"
  [5m]
) > 5

Example 7: Microservices Request Tracing

Use case: Track requests across services

# Find all logs for specific request
{environment="production"} 
| json request_id, service
| request_id="abc-def-123"

# Service latency distribution
sum by (service) (
  count_over_time(
    {job=~".*-api"} 
    | json service, duration_ms
    | duration_ms > 1000
    [15m]
  )
)

Example 8: Kubernetes Pod Monitoring

Use case: Monitor pod restarts and OOMKills

# Pod restart events
sum by (namespace, reason) (
  count_over_time(
    {job="kubernetes-events"} 
    | json event_type, namespace, reason
    | event_type="Warning"
    | reason=~"BackOff|CrashLoopBackOff|OOMKilled"
    [1h]
  )
)

6. Label Naming Conventions

Consistent label naming improves query performance and maintainability across your log infrastructure.

Understanding Label Cardinality

  • Low (< 100 unique values): environment, level, service - ✅ use as labels
  • Medium (100-1000 unique values): host, container_name - ⚠️ use carefully
  • High (> 1000 unique values): user_id, request_id, IP - ❌ extract from log content instead

⚠️ High Cardinality Impact

High-cardinality labels create performance problems:

  • Slower queries (more streams to scan)
  • Higher memory usage
  • Storage bloat from metadata overhead

Solution: Keep high-cardinality data in log content, extract with | json
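
In practice that means keeping identifiers out of the stream selector and filtering on parsed fields instead (the user_id field is illustrative):

# ❌ High-cardinality label creates one stream per user
{job="api", user_id="12345"}

# ✅ Low-cardinality labels select the stream; the identifier is extracted at query time
{job="api", environment="production"} | json | user_id="12345"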

Recommended Label Schema

  • job - application/service name (user-api, nginx, payment-service)
  • environment - deployment environment (production, staging, development)
  • level - log severity (error, warn, info, debug)
  • host - server hostname (web01, api-server-prod-01)
  • namespace - Kubernetes namespace (default, production, monitoring)
  • region - cloud region (us-east-1, eu-west-1)

💡 Label Naming Best Practices

  • Use lowercase: environment not Environment
  • Use underscores: error_code not errorCode
  • Be concise: level not log_level
  • Avoid special characters (only alphanumeric + underscore)
  • Match Prometheus conventions if using both

7. Integration with Log Shippers (Brief)

Log shippers (Promtail, Fluent Bit, Vector) send logs to VictoriaLogs and apply labels at ingestion time. This section provides a brief overview; detailed configuration is covered in Module 7.

Promtail Configuration Example

# promtail-config.yaml
clients:
  - url: http://victorialogs:9428/insert/loki/api/v1/push

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          environment: production
          host: ${HOSTNAME}                # requires running Promtail with -config.expand-env=true
          __path__: /var/log/nginx/*.log   # files to tail; adjust to your log location
    pipeline_stages:
      - regex:
          expression: '.*\[(?P<level>\w+)\].*'   # extract the bracketed level, e.g. [error]
      - labels:
          level:

Fluent Bit Configuration Example

# fluent-bit.conf
[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    Tag               app.logs

[FILTER]
    Name              record_modifier
    Match             *
    Record            job application
    Record            environment production
    Record            host ${HOSTNAME}

[OUTPUT]
    Name              http
    Match             *
    Host              victorialogs
    Port              9428
    URI               /insert/loki/api/v1/push
    Format            json

💡 Key Principle

Apply low-cardinality labels at the shipper level (environment, job, host). Extract high-cardinality data during querying with parsers.

8. Common LogQL Errors and Solutions

Error Reference Table

  • "parse error" - syntax error; check for missing quotes, braces, or pipes
  • "no streams" - no matching logs; verify labels with {_stream=~".+"}
  • "json parser: invalid" - logs are not JSON; verify the log format or use a different parser
  • "regex error" - invalid regex pattern; test the regex separately and check escaping
  • "range vector not allowed" - missing aggregation; wrap the query in count_over_time(...[5m])
  • "timeout exceeded" - query too complex; add filters or reduce the time range
  • "unknown label" - label doesn't exist; labels must be applied at ingestion

Query Performance Troubleshooting

Performance Optimization Checklist

# ❌ SLOW: No label filters
{_stream=~".+"} | json

# ✅ FAST: Specific labels
{job="nginx", environment="production"} | json

# ❌ SLOW: Complex regex on all logs
{job="api"} |~ "^.*user.*transaction.*failed.*$"

# ✅ FAST: Text filter first
{job="api"} |= "transaction failed"

# ❌ SLOW: Large time range
{job="nginx"}[7d]

# ✅ FAST: Aggregated with reasonable range
sum(count_over_time({job="nginx"} | json [1h]))

Best Practices

  1. Use specific labels - More labels = smaller dataset
  2. Filter before parsing - |= before | json
  3. Limit time ranges - Start with 5m-15m, expand as needed
  4. Avoid wildcards at start - nginx.* better than .*nginx
  5. Use aggregations for graphs - Don't query individual lines for time series

Debugging Process

  1. Start simple: {job="nginx"}
  2. Verify data exists: Try {_stream=~".*nginx.*"}
  3. Add filters incrementally: One at a time
  4. Check parser output: {job="api"} | json (view fields)
  5. Test regex separately: Use online regex testers
  6. Verify time range: Ensure logs exist in selected period

⚠️ Top 5 Mistakes to Avoid

  1. Forgetting time ranges: rate(...) needs [5m]
  2. Using high-cardinality labels (user IDs, IPs)
  3. Not escaping regex: \\. for literal dots
  4. Parsing before filtering (always filter first)
  5. Querying too much data (use specific labels)

Module Summary

🎓 What You Learned

  • ✅ Configured Loki data source with critical /select/logsql endpoint
  • ✅ Mastered LogQL syntax: labels, filters, parsers
  • ✅ Parsed JSON logs and extracted fields with regex
  • ✅ Performed rate calculations and aggregations
  • ✅ Applied label naming conventions
  • ✅ Implemented 8 real-world query examples
  • ✅ Understood log shipper integration basics
  • ✅ Learned to troubleshoot common errors

💡 Key Takeaways

  • Endpoint - always use /select/logsql
  • Cardinality - keep low-cardinality values as labels, high-cardinality values in log content
  • Performance - filter with |= before parsing with | json
  • Aggregations - always specify a time range such as [5m]

🎉 Module 5 Complete!

You can now query logs effectively in VictoriaLogs using LogQL. Next, Module 6 teaches you how to build comprehensive Grafana dashboards using these queries.