Environment Variables
info: This feature is only available in Crawlab Pro.
Crawlab allows users to set environment variables that are made available to spiders at runtime.
Why Use Environment Variables?
Using environment variables in Crawlab offers several important advantages:
- Configuration Separation: Keep configuration separate from code, following best practices
- Sensitive Data Management: Securely store API keys, credentials, and tokens outside your code
- Environment-Specific Settings: Easily configure different values for development, testing, and production
- Centralized Management: Update configurations for all spiders from a single location
- Runtime Flexibility: Change behavior of spiders without modifying code
- Enhanced Security: Avoid hardcoding sensitive information in spider scripts
Create Environment Variables
- Navigate to the System Settings page.
- Click the Environment tab.
- Click the New Environment Variable button.
- Fill in the configuration form.
- Click the Confirm button.
Use Environment Variables
Assuming we have set an environment variable with the key FOO and the value BAR, we can access it in a spider script using the following sample code.
import os
foo = os.environ.get('FOO')
print(foo) # BAR
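If a variable may be unset, you can supply a fallback or fail fast. A minimal sketch (the default value here is just a placeholder):
import os
# Provide a fallback so the spider still runs when FOO is not set
foo = os.environ.get('FOO', 'default-value')
# Alternatively, fail fast: os.environ['FOO'] raises KeyError when FOO is missing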
Example Use Cases
Example 1: API Key Management
Scenario: Secure storage of a scraping API key
import os
import requests
api_key = os.environ.get('SCRAPINGHUB_API_KEY')
response = requests.get(
'https://api.scrapinghub.com/v1/projects',
headers={'Authorization': f'Bearer {api_key}'}
)
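Because os.environ.get returns None for unset keys, it can help to validate the key before making any requests. A minimal sketch building on the example above:
import os
api_key = os.environ.get('SCRAPINGHUB_API_KEY')
if not api_key:
    # Fail fast with a clear message instead of sending an unauthenticated request
    raise RuntimeError('SCRAPINGHUB_API_KEY is not set')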
Example 2: Database Configuration
Scenario: Database connection settings for different environments
import os
import psycopg2
db_config = {
'host': os.environ.get('DB_HOST'),
'port': os.environ.get('DB_PORT', '5432'), # Default port fallback
'user': os.environ.get('DB_USER'),
'password': os.environ.get('DB_PASSWORD'),
'database': os.environ.get('DB_NAME')
}
conn = psycopg2.connect(**db_config)
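Once connected, the configuration can be verified with a trivial query. The sketch below assumes the connection above succeeded:
# Verify the connection works before the spider starts writing data
with conn:
    with conn.cursor() as cur:
        cur.execute('SELECT version()')
        print(cur.fetchone())
conn.close()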
Example 3: Environment-Specific Settings
Scenario: Different configurations for development/production
import os
ENVIRONMENT = os.environ.get('APP_ENV', 'development')
BASE_URL = os.environ.get(
'API_BASE_URL',
'https://dev.api.example.com' if ENVIRONMENT == 'development'
else 'https://api.example.com'
)
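The resolved base URL can then drive request targets. In this sketch, which builds on the BASE_URL defined above, the /items path is only an illustrative endpoint:
import requests
# Hypothetical endpoint used for illustration; adjust to your API
response = requests.get(f'{BASE_URL}/items', timeout=30)
print(response.status_code)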
Example 4: Proxy Rotation for Web Crawlers
Scenario: Managing multiple proxies with authentication for large-scale scraping
import os
import random
import requests
from urllib.parse import urlparse
# Get proxy list from environment variable (comma-separated)
# Filter out empty entries so an unset variable yields an empty list
PROXIES = [p.strip() for p in os.environ.get('CRAWLER_PROXIES', '').split(',') if p.strip()]
PROXY_USER = os.environ.get('PROXY_USER', '')
PROXY_PASS = os.environ.get('PROXY_PASS', '')
def get_random_proxy():
if not PROXIES:
return None # Fallback to no proxy
proxy = random.choice(PROXIES)
# Add authentication if credentials exist
if PROXY_USER and PROXY_PASS:
parsed = urlparse(proxy)
proxy = f"{parsed.scheme}://{PROXY_USER}:{PROXY_PASS}@{parsed.hostname}:{parsed.port}"
return {
'http': proxy,
'https': proxy
}
# Usage example
response = requests.get(
'https://example.com',
proxies=get_random_proxy(),
timeout=30
)
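If a proxy fails mid-crawl, retrying with a freshly chosen proxy keeps the crawler moving. A minimal sketch building on get_random_proxy() above (the retry count is an arbitrary choice):
import requests
def fetch_with_retries(url, max_retries=3):
    # Rotate to a new random proxy on each attempt
    for attempt in range(max_retries):
        try:
            return requests.get(url, proxies=get_random_proxy(), timeout=30)
        except requests.RequestException:
            # Re-raise once all attempts are exhausted
            if attempt == max_retries - 1:
                raise
response = fetch_with_retries('https://example.com')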