Data Integration
You can integrate your spiders with the Crawlab SDK, which allows you to view scraped results visually in Crawlab.
The Crawlab SDK supports integration with web crawler frameworks such as Scrapy, and with programming languages including Python, Node.js, Go, and Java.
By default, the Crawlab Python SDK (crawlab-sdk) is pre-installed in Crawlab's base image, so you can use it directly in the Crawlab Docker image.
Basic Usage
The code snippets below show how to save a basic item in different programming languages. The item is a dictionary with
key hello and value crawlab. Once the code is executed, the item is saved to the database and displayed in the Crawlab
web interface.
- Python
- Node.js
- Go
- Java
```python
from crawlab import save_item

# Save dictionary as item
save_item({'hello': 'crawlab'})
```
```js
const { saveItem } = require('@crawlab/sdk');

// Save object as item
saveItem({ hello: 'crawlab' });
```
Node.js is only supported in Crawlab Pro.
```go
package main

import "github.com/crawlab-team/crawlab-sdk-go"

func main() {
	// Save map as item
	crawlab.SaveItem(map[string]interface{}{
		"hello": "crawlab",
	})
}
```
Go is only supported in Crawlab Pro.
```java
import io.crawlab.sdk.CrawlabSdk;
import java.util.HashMap;

public class Main {
    public static void main(String[] args) {
        // Save HashMap as item
        CrawlabSdk.saveItem(new HashMap<String, Object>() {{
            put("hello", "crawlab");
        }});
    }
}
```
Java is only supported in Crawlab Pro.
Scrapy
Scrapy is a popular Python framework for efficient, scalable web crawling.
Integrating Scrapy with Crawlab is easy: simply add crawlab.CrawlabPipeline to ITEM_PIPELINES in settings.py.
```python
ITEM_PIPELINES = {
    'crawlab.CrawlabPipeline': 888,
}
```
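Under the hood, a Scrapy item pipeline exposes a process_item(item, spider) hook that Scrapy calls for every item a spider yields. The sketch below shows the general shape of such a pipeline with a stand-in save function; it is an illustration, not the actual CrawlabPipeline implementation:

```python
# Illustrative sketch of a Scrapy item pipeline. Scrapy calls
# process_item() for each yielded item; a pipeline like CrawlabPipeline
# forwards the item to Crawlab's result storage (stubbed here) and
# returns it so lower-priority pipelines can process it too.
saved = []

def save_item(item):
    """Stand-in for crawlab.save_item."""
    saved.append(item)

class SketchPipeline:
    def process_item(self, item, spider):
        save_item(dict(item))
        return item  # pass the item on to the next pipeline

pipeline = SketchPipeline()
for item in [{"hello": "crawlab"}]:
    pipeline.process_item(item, spider=None)
```

The number 888 in ITEM_PIPELINES is the pipeline's priority: pipelines run in ascending order, so a high value lets other pipelines transform the item before it is saved.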
More Examples
Crawlab integrates easily with other web crawling frameworks.
Refer to Examples for more detailed data integration examples.
Data Preview
Crawlab provides a data preview feature that allows users to inspect crawled results directly in the UI.
View Task Data
You can view task data by following the steps below:
- Navigate to the Tasks detail page
- Click on the Data tab to view the task data
View Spider Data
You can view spider data by following the steps below:
- Navigate to the Spiders detail page
- Click on the Data tab to view the spider data