Security Features Documentation

July 23, 2025 · View on GitHub

Overview

The url-preview library implements comprehensive security measures to protect against common web scraping vulnerabilities and ensure safe URL preview generation. This document details all security features available in version 0.5.0 and above.

URL Validation
Content Security
Network Security
Error Handling
Usage Examples
Best Practices

URL Validation

Overview

The library provides robust URL validation through the UrlValidator class, which prevents access to potentially dangerous URLs and implements SSRF (Server-Side Request Forgery) protection.

Features

1. Scheme Validation

Default allowed schemes: http, https
Blocks dangerous schemes like file://, ftp://, javascript:, etc.
Configurable through UrlValidationConfig

2. Private IP Blocking

Blocks access to private IP ranges (RFC 1918)
Prevents SSRF attacks targeting internal networks
Includes IPv4 ranges:
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
- 169.254.0.0/16 (link-local)
- 100.64.0.0/10 (carrier-grade NAT)
Includes IPv6 ranges:
- fe80::/10 (link-local)
- fc00::/7 (unique local)

3. Localhost Blocking

Blocks localhost, 127.0.0.1, ::1
Prevents access to local services

4. Domain Filtering

Blacklist: Block specific domains or subdomains
Whitelist: Only allow specific domains (when configured)
Subdomain matching included

5. Redirect Control

Configurable maximum redirect count (default: 10)
Prevents redirect loops and excessive redirections

Configuration

use url_preview::{UrlValidationConfig, UrlValidator};
use std::collections::HashSet;

let mut config = UrlValidationConfig::default();

// Customize allowed schemes
config.allowed_schemes.insert("https".to_string());
config.allowed_schemes.remove("http"); // HTTPS only

// Add blocked domains
config.blocked_domains.insert("malicious.com".to_string());
config.blocked_domains.insert("phishing.site".to_string());

// Or use whitelist mode
config.allowed_domains.insert("trusted.com".to_string());
config.allowed_domains.insert("mycompany.com".to_string());

// Adjust security settings
config.block_private_ips = true;
config.block_localhost = true;
config.max_redirects = 5;

let validator = UrlValidator::new(config);

Content Security

Overview

The library implements content security measures to prevent resource exhaustion and ensure safe content processing.

Features

1. Content Size Limits

Default maximum: 10MB
Prevents memory exhaustion from large files
Enforced both via Content-Length header and streaming

2. Download Time Limits

Default maximum: 30 seconds
Prevents slow-loris style attacks
Ensures timely response

3. Content Type Filtering

Default allowed types:
- text/html
- application/xhtml+xml
- text/plain
- application/json
Prevents downloading binary files or unexpected content

Configuration

use url_preview::{ContentLimits, FetcherConfig};
use std::collections::HashSet;

let mut content_limits = ContentLimits::default();

// Adjust size limit (5MB)
content_limits.max_content_size = 5 * 1024 * 1024;

// Adjust time limit (15 seconds)
content_limits.max_download_time = 15;

// Customize allowed content types
content_limits.allowed_content_types.clear();
content_limits.allowed_content_types.insert("text/html".to_string());
content_limits.allowed_content_types.insert("application/json".to_string());

let config = FetcherConfig {
    content_limits,
    ..Default::default()
};

Network Security

Built-in Features

1. Timeout Protection

Default request timeout: 10 seconds
Twitter-specific timeout: 30 seconds
Configurable per-request

2. Connection Pooling

Maximum 10 idle connections per host
Prevents connection exhaustion

3. Retry Logic

Maximum 3 retries with exponential backoff
Only retries on server errors (5xx) and timeouts
Prevents amplification attacks

4. Concurrent Request Limiting

Default limit: 500 concurrent requests
Prevents resource exhaustion
Configurable via PreviewServiceConfig

SSL/TLS Security

The library uses reqwest's default TLS configuration, which includes:

TLS 1.2+ support
Certificate validation
Hostname verification

Error Handling

Security-Specific Error Types

The library provides detailed error types for security-related issues:

pub enum PreviewError {
    // URL validation errors
    InvalidUrlScheme(String),
    InvalidUrl(String),
    DomainNotAllowed(String),
    DomainBlocked(String),
    LocalhostBlocked,
    PrivateIpBlocked(String),
    
    // Content security errors
    ContentSizeExceeded { size: usize, limit: usize },
    DownloadTimeExceeded { elapsed: u64, limit: u64 },
    ContentTypeNotAllowed(String),
    
    // Other security-relevant errors
    TimeoutError(String),
    DnsError(String),
    ConnectionError(String),
}

Error Logging

With the logging feature enabled, all security events are logged appropriately:

URL validation failures: WARN level
Content security violations: WARN level
Network errors: ERROR level

Usage Examples

Basic Usage with Default Security

use url_preview::{PreviewService, PreviewError};

#[tokio::main]
async fn main() -> Result<(), PreviewError> {
    // Default configuration includes all security features
    let service = PreviewService::new();
    
    // This will fail due to localhost blocking
    match service.generate_preview("http://localhost:8080").await {
        Err(PreviewError::LocalhostBlocked) => {
            println!("Localhost access blocked as expected");
        }
        _ => unreachable!(),
    }
    
    // This will fail due to private IP blocking
    match service.generate_preview("http://192.168.1.1").await {
        Err(PreviewError::PrivateIpBlocked(_)) => {
            println!("Private IP access blocked as expected");
        }
        _ => unreachable!(),
    }
    
    Ok(())
}

Custom Security Configuration

use url_preview::{PreviewService, PreviewServiceConfig, FetcherConfig};
use url_preview::{UrlValidationConfig, ContentLimits};
use std::collections::HashSet;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create custom URL validation
    let mut url_validation = UrlValidationConfig::default();
    url_validation.allowed_domains.insert("github.com".to_string());
    url_validation.allowed_domains.insert("twitter.com".to_string());
    url_validation.max_redirects = 3;
    
    // Create custom content limits
    let mut content_limits = ContentLimits::default();
    content_limits.max_content_size = 5 * 1024 * 1024; // 5MB
    content_limits.max_download_time = 20; // 20 seconds
    
    // Configure fetcher
    let fetcher_config = FetcherConfig {
        url_validation,
        content_limits,
        ..Default::default()
    };
    
    // Create service with custom config
    let service_config = PreviewServiceConfig {
        fetcher_config: Some(fetcher_config),
        ..Default::default()
    };
    
    let service = PreviewService::with_config(service_config);
    
    // Only github.com and twitter.com URLs will work
    let preview = service.generate_preview("https://github.com/rust-lang/rust").await?;
    println!("Title: {:?}", preview.title);
    
    Ok(())
}

Handling Security Errors

use url_preview::{PreviewService, PreviewError};

async fn safe_preview(url: &str) -> Result<String, String> {
    let service = PreviewService::new();
    
    match service.generate_preview(url).await {
        Ok(preview) => Ok(preview.title.unwrap_or_else(|| "No title".to_string())),
        Err(e) => match e {
            PreviewError::InvalidUrlScheme(scheme) => {
                Err(format!("Unsafe URL scheme: {}", scheme))
            }
            PreviewError::PrivateIpBlocked(ip) => {
                Err(format!("Access to private IP {} is blocked", ip))
            }
            PreviewError::LocalhostBlocked => {
                Err("Access to localhost is blocked".to_string())
            }
            PreviewError::DomainBlocked(domain) => {
                Err(format!("Domain {} is blocked", domain))
            }
            PreviewError::ContentSizeExceeded { size, limit } => {
                Err(format!("Content too large: {} bytes (limit: {})", size, limit))
            }
            PreviewError::ContentTypeNotAllowed(ct) => {
                Err(format!("Content type {} not allowed", ct))
            }
            _ => Err(format!("Preview failed: {}", e)),
        }
    }
}

Best Practices

1. Always Use HTTPS When Possible

let mut config = UrlValidationConfig::default();
config.allowed_schemes.clear();
config.allowed_schemes.insert("https".to_string());

2. Implement Domain Whitelisting for Production

let mut config = UrlValidationConfig::default();
// Only allow specific trusted domains
config.allowed_domains.insert("trusted-news-site.com".to_string());
config.allowed_domains.insert("official-blog.com".to_string());

3. Adjust Limits Based on Use Case

// For previewing only lightweight pages
let mut limits = ContentLimits::default();
limits.max_content_size = 1 * 1024 * 1024; // 1MB
limits.max_download_time = 10; // 10 seconds

// For specific content types only
limits.allowed_content_types.clear();
limits.allowed_content_types.insert("text/html".to_string());

4. Monitor Security Events

use url_preview::setup_logging;

// Enable logging to monitor security events
setup_logging(None);

// Security events will be logged at appropriate levels

5. Regular Security Updates

Keep the library updated to get latest security patches
Review and update domain lists periodically
Monitor for new attack vectors and adjust configuration

Security Considerations

What This Library Protects Against

SSRF (Server-Side Request Forgery)
- Private IP filtering
- Localhost blocking
- Scheme validation
Resource Exhaustion
- Content size limits
- Download time limits
- Connection pooling
- Concurrent request limiting
Malicious Redirects
- Redirect count limiting
- Each redirect is validated
Unexpected Content
- Content type filtering
- UTF-8 validation

What This Library Does NOT Protect Against

JavaScript Execution
- The library only parses HTML, it doesn't execute JavaScript
- Dynamic content loaded by JavaScript won't be captured
Advanced SSRF Techniques
- DNS rebinding attacks
- Time-of-check to time-of-use (TOCTOU) attacks
- IPv6 address confusion
Content Validation
- The library doesn't sanitize extracted content
- XSS prevention is the responsibility of the consuming application
Rate Limiting Per Domain
- The library limits total concurrent requests but not per-domain
- Implement additional rate limiting if needed

Conclusion

The url-preview library provides comprehensive security features suitable for most URL preview generation use cases. By following the configuration guidelines and best practices outlined in this document, you can ensure safe and secure preview generation in your applications.

For additional security requirements or custom implementations, consider extending the validation logic or implementing additional layers of security in your application.