🔍 Module 5: VictoriaLogs LogQL Queries

Mastering Log Query Language for VictoriaLogs Integration

⏱️ Estimated Time: 25-30 minutes

Progress: Module 5 of 8 (62.5%)

🎯 Module Overview

VictoriaLogs provides powerful log aggregation and querying capabilities that integrate seamlessly with Grafana. This module teaches you how to configure the VictoriaLogs data source in Grafana and craft effective LogQL queries to extract meaningful insights from your log data.

Learning Objectives

  • Configure Loki data source in Grafana with the critical /select/logsql endpoint
  • Master LogQL syntax including label filtering and stream selection
  • Parse JSON logs and extract structured data with regex patterns
  • Perform rate calculations and aggregations on log data
  • Apply label naming conventions for consistent log organization
  • Troubleshoot common LogQL query errors

1. Loki Data Source Configuration for VictoriaLogs

VictoriaLogs implements the Loki API, making it compatible with Grafana's Loki data source. This allows you to use Grafana's mature Loki query builder without requiring custom plugins.

Step 1: Add Loki Data Source

  1. Log into Grafana (typically http://localhost:3000)
  2. Click ☰ menu → Connections → Data sources
  3. Click "Add new data source"
  4. Search for and select "Loki"

💡 Why Loki?

VictoriaLogs implements the Loki API for compatibility, so we use the Loki data source type. Name it VictoriaLogs for clarity.

Step 2: Configure the Critical /select/logsql Endpoint

⚠️ CRITICAL: The /select/logsql Endpoint

This is the most important configuration step. VictoriaLogs requires /select/logsql as its query endpoint. Without this exact path, queries will fail with 404 errors.

URL Configuration Examples:

# Local development
http://localhost:9428/select/logsql

# Docker Compose (using service name)
http://victorialogs:9428/select/logsql

# Production with hostname
http://victorialogs.example.com:9428/select/logsql

# Production with HTTPS
https://victorialogs.example.com/select/logsql

Complete Configuration Settings:

  • Name: VictoriaLogs - a descriptive name for identification
  • URL: http://victorialogs:9428/select/logsql - must end with /select/logsql
  • Access: Server (default) - the Grafana server makes the requests
  • Timeout: 60 seconds - adjust for large queries if needed

💡 Understanding the Endpoint

  • /select - VictoriaLogs query API namespace
  • /logsql - LogQL compatibility interface
  • Port 9428 is VictoriaLogs' default HTTP port
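
You can also verify the endpoint outside Grafana with a direct HTTP call. This is a minimal sketch assuming a local instance on the default port; adjust the host to match your deployment.

# Should return matching log lines as JSON (one object per line)
curl http://localhost:9428/select/logsql/query -d 'query=error' -d 'limit=5'

# Quick liveness check
curl http://localhost:9428/health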

Step 3: Test and Verify Connection

  1. Scroll to the bottom of the configuration page
  2. Click "Save & test"
  3. Wait for the connection test (1-3 seconds)

✅ Expected Success Message

"Data source connected and labels found."

This confirms Grafana can reach VictoriaLogs and query data successfully.

⚠️ Common Connection Errors

  • "HTTP 404" - missing /select/logsql; verify the URL ends with the exact path
  • "Connection refused" - VictoriaLogs is not running; check with curl http://localhost:9428/health
  • "Timeout" - network or firewall issue; verify network connectivity

Step 4: Verify in Explore View

Test the connection with a simple query:

  1. Navigate to ☰ menu → Explore
  2. Select VictoriaLogs from the data source dropdown
  3. Enter query: {_stream=~".+"} (matches all streams)
  4. Click "Run query" or press Shift + Enter

✅ What to Expect

You should see log lines with timestamps and labels. If "No logs found," the connection works but you may need different label filters based on your data.
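
If no logs have been shipped yet, you can push a single test line by hand and re-run the query. This is a minimal sketch assuming a local instance and the Loki-compatible push endpoint shown in Section 7; the job and level values are arbitrary.

# Push one test log line (timestamp in nanoseconds)
curl -s -X POST http://localhost:9428/insert/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"streams":[{"stream":{"job":"test","level":"info"},"values":[["'"$(date +%s%N)"'","hello from curl"]]}]}'

# Then query it in Explore with:
{job="test"}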

2. LogQL Syntax with Label Filtering

LogQL (Log Query Language) is the query language for VictoriaLogs. Understanding its syntax is essential for effective log querying and dashboard building.

🔑 LogQL Query Structure

# Basic structure
{label_selectors} |= "text filter" | parser | field_filters

# Example
{job="nginx", level="error"} |= "database" | json | status >= 500

Queries consist of:

  1. Stream Selector: {label="value"} - Filters log streams
  2. Line Filter: |= "text" - Searches log content
  3. Parser: | json or | regexp - Extracts fields
  4. Field Filter: | field > value - Filters on parsed fields

Label Matching Operators

  • = - exact match, e.g. {job="nginx"}
  • != - not equal, e.g. {level!="debug"}
  • =~ - regex match, e.g. {host=~"web.*"}
  • !~ - negated regex match, e.g. {path!~"/health.*"}

Label Filtering Examples

# Single label
{job="nginx"}

# Multiple labels (AND logic)
{job="nginx", level="error"}

# Regex pattern - all web servers
{host=~"web[0-9]+"}

# Exclude pattern
{path!~"/health.*"}

# Complex combination
{job="nginx", environment="production", level="error"}

Line Filter Operators

  • |= - contains string, e.g. |= "error"
  • != - does not contain, e.g. != "debug"
  • |~ - matches regex, e.g. |~ "error|warn"
  • !~ - does not match regex, e.g. !~ "GET /health"

Text Filtering Examples

# Contains "database"
{job="api"} |= "database"

# Exclude health checks
{job="nginx"} != "health check"

# Regex OR - errors OR warnings
{job="app"} |~ "error|warn|fatal"

# Multiple filters (pipeline)
{job="api"} |= "timeout" != "expected timeout"

# HTTP error codes 4xx or 5xx
{job="nginx"} |~ "\" [45][0-9]{2} "

⚠️ Performance Tip

Use simple text matching (|=) before regex (|~) when possible. Regex is slower, especially with complex patterns.
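
For example, anchoring on a cheap substring first lets the regex run over far fewer lines (the job label here is illustrative):

# Cheap substring match narrows the stream, regex refines it
{job="api"} |= "timeout" |~ "(connect|read) timeout after \\d+ms"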

3. Log Parsing (JSON Extraction & Regex Patterns)

VictoriaLogs provides parsers to extract structured fields from logs, enabling field-level filtering and aggregations.

JSON Parser

The | json parser extracts fields from JSON-formatted logs.

Example JSON Log

{
  "timestamp": "2025-01-22T10:30:45Z",
  "level": "error",
  "message": "Database connection failed",
  "service": "user-api",
  "metadata": {
    "user_id": "12345",
    "duration_ms": 5432,
    "error_code": "CONN_TIMEOUT"
  }
}

JSON Parsing Syntax

# Parse all JSON fields
{job="api"} | json

# Parse and filter on field
{job="api"} | json | level="error"

# Nested fields use underscores: metadata.user_id becomes metadata_user_id
{job="api"} | json | metadata_error_code="CONN_TIMEOUT"

# Multiple filters
{job="api"} | json | level="error" | metadata_duration_ms > 5000

💡 Field Naming for Nested JSON

Nested JSON fields are flattened with underscores:

  • metadata.user_id → metadata_user_id
  • metadata.duration_ms → metadata_duration_ms
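
For example, filtering the sample log above on both nested fields after flattening:

# Matches the example log: user 12345 with a 5432 ms duration
{job="api"} | json | metadata_user_id="12345" | metadata_duration_ms > 5000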

Numeric Comparisons

# Greater than
{job="api"} | json | duration_ms > 1000

# Range filtering
{job="nginx"} | json | status >= 400 | status < 500

# Multiple numeric filters
{job="api"} | json | retry_count > 2 | duration_ms > 5000

Regex Parser

The | regexp parser extracts fields from unstructured logs using named capture groups.

Regex Syntax

# Basic syntax with named groups
| regexp "pattern with (?P<name>regex)"

# Extract IP and status from Apache logs
{job="nginx"} 
| regexp "^(?P<ip>\\d+\\.\\d+\\.\\d+\\.\\d+) .* (?P<status>\\d{3})"
| status >= 400

Common Log Format Patterns

# Apache/Nginx access log
{job="nginx"} 
| regexp "^(?P<ip>[\\d.]+) .* \"(?P<method>\\w+) (?P<path>[^ ]+) [^\"]+\" (?P<status>\\d{3})"
| status >= 500

# Application log with level and component
{job="app"} 
| regexp "\\[(?P<level>\\w+)\\] (?P<component>\\w+): (?P<message>.*)"
| level="ERROR"

# Extract timestamp, user, and action
{job="audit"} 
| regexp "\\[(?P<timestamp>[^\\]]+)\\] user=(?P<user>\\w+) action=(?P<action>\\w+)"
| action=~"delete|modify"

Regex Building Blocks

  • IP address: [\d.]+ - matches e.g. 192.168.1.1
  • Integer: \d+ - matches e.g. 12345
  • Decimal: [\d.]+ - matches e.g. 99.99
  • Word: \w+ - matches e.g. error, user_id
  • Path: [^ ]+ - matches e.g. /api/users/123
  • Rest of line: .* - matches any remaining text

⚠️ Performance: Filter Before Parsing

# ✅ GOOD: Filter first, then parse
{job="nginx"} |= "POST" | regexp "..." | status >= 400

# ❌ BAD: Parse everything, then filter
{job="nginx"} | regexp "..." | method="POST" | status >= 400

4. Rate Calculations and Aggregations

LogQL enables powerful aggregations to calculate metrics from logs: rates, counts, and statistics.

Range Vectors

Aggregations operate on time windows specified in square brackets:

# Time window syntax
[5m]   # Last 5 minutes
[1h]   # Last 1 hour
[24h]  # Last 24 hours
[7d]   # Last 7 days

count_over_time() - Count Log Lines

# Count errors in last 5 minutes
count_over_time({job="api", level="error"}[5m])

# Count 404 errors
count_over_time({job="nginx"} | json | status="404" [5m])

# Count database failures
count_over_time({job="api"} |= "database" |= "failed" [15m])

rate() - Calculate Rate per Second

# Error rate per second
rate({job="api", level="error"}[5m])

# Request rate for specific endpoint
rate({job="nginx"} | json | path="/api/users" [5m])

# Failed auth rate
rate({job="auth"} | json | event="auth_failed" [10m])

💡 count_over_time() vs rate()

  • count_over_time() - Absolute counts ("total errors in last hour")
  • rate() - Per-second rate ("error rate trend over time")
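
As a rough worked example (the numbers are illustrative): if a stream logs 300 errors in a 5-minute window, the two functions report the same activity at different scales.

# Returns 300 (total lines in the window)
count_over_time({job="api", level="error"}[5m])

# Returns 1.0 (300 lines / 300 seconds)
rate({job="api", level="error"}[5m])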

Aggregation Operators

sum() and sum by()

# Total error count
sum(count_over_time({level="error"}[5m]))

# Error count per service
sum by (service) (count_over_time({level="error"} | json [5m]))

# Request rate per HTTP method
sum by (method) (rate({job="nginx"} | json [5m]))

# Status code distribution
sum by (status) (count_over_time({job="nginx"} | json [5m]))

Other Aggregation Operators

  • avg() - average value, e.g. avg(rate({job="api"}[5m]))
  • min() - minimum value, e.g. min(count_over_time(...))
  • max() - maximum value, e.g. max(rate({job="nginx"}[5m]))
  • topk() - top K series, e.g. topk(5, sum by (service) (rate(...)))

⚠️ Common Aggregation Mistakes

  • Missing time range: Always specify [5m], [1h], etc.
  • Too short range: Use at least [1m] for meaningful rates
  • Forgot to parse: Include | json before field filtering
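
The same query written with and without these mistakes (the stream selector is illustrative):

# ❌ Missing time range and parser
sum by (status) (count_over_time({job="nginx"} | status="500"))

# ✅ Parser included, time range specified
sum by (status) (count_over_time({job="nginx"} | json | status="500" [5m]))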

5. Comprehensive LogQL Query Examples

Real-world queries demonstrating LogQL capabilities for production monitoring.

Example 1: API Error Monitoring

Use case: Track API errors by endpoint and status code

sum by (endpoint, status) (
  count_over_time(
    {job="api", environment="production"} 
    | json endpoint, status, method
    | status >= 400
    | endpoint!~"/health.*|/metrics.*"
    [5m]
  )
)

Example 2: Database Connection Failures

Use case: Monitor database connection issues by service

sum by (service) (
  rate(
    {job=~".*-api", level="error"} 
    | json service, error_code
    | error_code=~"DB_.*|POOL_.*"
    [10m]
  )
)

Example 3: Failed Authentication Analysis

Use case: Identify potential brute force attacks

# Failed attempts by country
topk(10,
  sum by (country) (
    count_over_time(
      {job="auth-service"} 
      | json event, country
      | event="auth_failed"
      [1h]
    )
  )
)

# High-frequency failures (brute force detection)
sum by (ip_address) (
  count_over_time(
    {job="auth-service"} 
    | json event, ip_address
    | event="auth_failed"
    [5m]
  )
) > 10

Example 4: Payment Transaction Monitoring

Use case: Track payment success rates

# Success rate by payment method
sum by (payment_method) (
  count_over_time(
    {job="payment-api"} 
    | json transaction_status, payment_method
    | transaction_status="success"
    [15m]
  )
)
/
sum by (payment_method) (
  count_over_time(
    {job="payment-api"} 
    | json payment_method
    [15m]
  )
)

Example 5: Nginx Performance Analysis

Use case: Monitor slow requests and traffic patterns

# Request rate by method
sum by (method) (
  rate({job="nginx"} | json method [5m])
)

# Slow requests (>2s) by endpoint
sum by (path) (
  count_over_time(
    {job="nginx"} 
    | json path, request_time
    | request_time > 2.0
    [5m]
  )
)

# Top requested endpoints
topk(10,
  sum by (path) (
    rate({job="nginx"} | json path [15m])
  )
)

Example 6: Application Exception Tracking

Use case: Monitor critical exceptions

# Exception rate by type
sum by (exception_type) (
  rate(
    {level="error"} 
    | regexp "(?P<exception_type>\\w+Exception):"
    | environment="production"
    [10m]
  )
)

# Critical exception alert
count_over_time(
  {level="error"} 
  | json exception_type
  | exception_type=~"NullPointer.*|OutOfMemory.*"
  [5m]
) > 5

Example 7: Microservices Request Tracing

Use case: Track requests across services

# Find all logs for specific request
{environment="production"} 
| json request_id, service
| request_id="abc-def-123"

# Service latency distribution
sum by (service) (
  count_over_time(
    {job=~".*-api"} 
    | json service, duration_ms
    | duration_ms > 1000
    [15m]
  )
)

Example 8: Kubernetes Pod Monitoring

Use case: Monitor pod restarts and OOMKills

# Pod restart events
sum by (namespace, reason) (
  count_over_time(
    {job="kubernetes-events"} 
    | json event_type, namespace, reason
    | event_type="Warning"
    | reason=~"BackOff|CrashLoopBackOff|OOMKilled"
    [1h]
  )
)

6. Label Naming Conventions

Consistent label naming improves query performance and maintainability across your log infrastructure.

Understanding Label Cardinality

  • Low (< 100 unique values): environment, level, service - ✅ use as labels
  • Medium (100-1000 unique values): host, container_name - ⚠️ use carefully
  • High (> 1000 unique values): user_id, request_id, IP - ❌ extract from log content instead

⚠️ High Cardinality Impact

High-cardinality labels create performance problems:

  • Slower queries (more streams to scan)
  • Higher memory usage
  • Storage bloat from metadata overhead

Solution: Keep high-cardinality data in log content, extract with | json
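
In practice that means keeping identifiers out of the stream selector and filtering on parsed fields instead (the user_id field is illustrative):

# ❌ High-cardinality label creates one stream per user
{job="api", user_id="12345"}

# ✅ Low-cardinality labels select the stream; the identifier is extracted at query time
{job="api", environment="production"} | json | user_id="12345"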

Recommended Label Schema

  • job - application/service name (user-api, nginx, payment-service)
  • environment - deployment environment (production, staging, development)
  • level - log severity (error, warn, info, debug)
  • host - server hostname (web01, api-server-prod-01)
  • namespace - Kubernetes namespace (default, production, monitoring)
  • region - cloud region (us-east-1, eu-west-1)

💡 Label Naming Best Practices

  • Use lowercase: environment not Environment
  • Use underscores: error_code not errorCode
  • Be concise: level not log_level
  • Avoid special characters (only alphanumeric + underscore)
  • Match Prometheus conventions if using both

7. Integration with Log Shippers (Brief)

Log shippers (Promtail, Fluent Bit, Vector) send logs to VictoriaLogs and apply labels at ingestion time. This section provides a brief overview; detailed configuration is covered in Module 7.

Promtail Configuration Example

# promtail-config.yaml
clients:
  - url: http://victorialogs:9428/insert/loki/api/v1/push

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          environment: production
          host: ${HOSTNAME}                # requires running Promtail with -config.expand-env=true
          __path__: /var/log/nginx/*.log   # files to tail; adjust to your log location
    pipeline_stages:
      - regex:
          expression: '.*\[(?P<level>\w+)\].*'   # extract the bracketed level, e.g. [error]
      - labels:
          level:

Fluent Bit Configuration Example

# fluent-bit.conf
[INPUT]
    Name              tail
    Path              /var/log/app/*.log
    Tag               app.logs

[FILTER]
    Name              record_modifier
    Match             *
    Record            job application
    Record            environment production
    Record            host ${HOSTNAME}

[OUTPUT]
    Name              http
    Match             *
    Host              victorialogs
    Port              9428
    URI               /insert/loki/api/v1/push
    Format            json

💡 Key Principle

Apply low-cardinality labels at the shipper level (environment, job, host). Extract high-cardinality data during querying with parsers.

8. Common LogQL Errors and Solutions

Error Reference Table

  • "parse error" - syntax error; check for missing quotes, braces, or pipes
  • "no streams" - no matching logs; verify labels with {_stream=~".+"}
  • "json parser: invalid" - logs are not JSON; verify the log format or use a different parser
  • "regex error" - invalid regex pattern; test the regex separately and check escaping
  • "range vector not allowed" - missing aggregation; wrap the query in count_over_time(...[5m])
  • "timeout exceeded" - query too complex; add filters or reduce the time range
  • "unknown label" - label doesn't exist; labels must be applied at ingestion

Query Performance Troubleshooting

Performance Optimization Checklist

# ❌ SLOW: No label filters
{_stream=~".+"} | json

# ✅ FAST: Specific labels
{job="nginx", environment="production"} | json

# ❌ SLOW: Complex regex on all logs
{job="api"} |~ "^.*user.*transaction.*failed.*$"

# ✅ FAST: Text filter first
{job="api"} |= "transaction failed"

# ❌ SLOW: Large time range
{job="nginx"}[7d]

# ✅ FAST: Aggregated with reasonable range
sum(count_over_time({job="nginx"} | json [1h]))

Best Practices

  1. Use specific labels - More labels = smaller dataset
  2. Filter before parsing - |= before | json
  3. Limit time ranges - Start with 5m-15m, expand as needed
  4. Avoid wildcards at start - nginx.* better than .*nginx
  5. Use aggregations for graphs - Don't query individual lines for time series

Debugging Process

  1. Start simple: {job="nginx"}
  2. Verify data exists: Try {_stream=~".*nginx.*"}
  3. Add filters incrementally: One at a time
  4. Check parser output: {job="api"} | json (view fields)
  5. Test regex separately: Use online regex testers
  6. Verify time range: Ensure logs exist in selected period

⚠️ Top 5 Mistakes to Avoid

  1. Forgetting time ranges: rate(...) needs [5m]
  2. Using high-cardinality labels (user IDs, IPs)
  3. Not escaping regex: \\. for literal dots
  4. Parsing before filtering (always filter first)
  5. Querying too much data (use specific labels)

Module Summary

🎓 What You Learned

  • ✅ Configured Loki data source with critical /select/logsql endpoint
  • ✅ Mastered LogQL syntax: labels, filters, parsers
  • ✅ Parsed JSON logs and extracted fields with regex
  • ✅ Performed rate calculations and aggregations
  • ✅ Applied label naming conventions
  • ✅ Implemented 8 real-world query examples
  • ✅ Understood log shipper integration basics
  • ✅ Learned to troubleshoot common errors

💡 Key Takeaways

  • Endpoint - always use /select/logsql
  • Cardinality - keep low-cardinality values as labels, high-cardinality values in log content
  • Performance - filter with |= before parsing with | json
  • Aggregations - always specify a time range such as [5m]

🎉 Module 5 Complete!

You can now query logs effectively in VictoriaLogs using LogQL. Next, Module 6 teaches you how to build comprehensive Grafana dashboards using these queries.