> **Building with AI coding agents?** If you're using an AI coding agent, install the official Scalekit plugin. It gives your agent full awareness of the Scalekit API — reducing hallucinations and enabling faster, more accurate code generation.
>
> - **Claude Code**: `/plugin marketplace add scalekit-inc/claude-code-authstack` then `/plugin install <auth-type>@scalekit-auth-stack`
> - **GitHub Copilot CLI**: `copilot plugin marketplace add scalekit-inc/github-copilot-authstack` then `copilot plugin install <auth-type>@scalekit-auth-stack`
> - **Codex**: run the bash installer, restart, then open Plugin Directory and enable `<auth-type>`
> - **Skills CLI** (Windsurf, Cline, 40+ agents): `npx skills add scalekit-inc/skills --list` to browse, then rerun with `--skill <skill-name>` to install
>
> `<auth-type>` / `<skill-name>`: `agentkit`, `full-stack-auth`, `mcp-auth`, `modular-sso`, `modular-scim` — [Full setup guide](https://docs.scalekit.com/dev-kit/build-with-ai/)

---

# Exa

**Authentication:** API Key
**Categories:** Data, Analytics, AI, Automation
## What you can do

Connect this agent connector to let your agent:

- **Find similar** — Find web pages similar to a given URL using Exa's neural similarity search
- **Search** — Search the web using Exa's AI-powered semantic or keyword search engine
- **Research** — Run in-depth research on a topic using Exa's neural search
- **Crawl** — Crawl one or more web pages by URL and extract their content, including full text, highlights, and AI-generated summaries
- **List websets** — List all Exa Websets in your account with optional pagination
- **Create websets** — Execute a complex web query designed to discover and return large sets of URLs (up to thousands) matching specific criteria

## Authentication

This connector uses **API Key** authentication. Your users provide their Exa API key once, and Scalekit stores and manages it securely. Your agent code never handles keys directly — you only pass a `connectionName` and a user `identifier`.

Before calling this connector from your code, create the Exa connection in **AgentKit** > **Connections** and copy the exact **Connection name** from that connection into your code. The value in code must match the dashboard exactly.

## Set up the connector

Register your Exa API key with Scalekit so it can authenticate and proxy requests on behalf of your users. Unlike OAuth connectors, Exa uses API key authentication — there is no redirect URI or OAuth flow.

1. ### Generate an Exa API key

   - Sign in to [dashboard.exa.ai/api-keys](https://dashboard.exa.ai/api-keys). Under **Management**, click **API Keys**.

   - Click **+ Create Key**, enter a name (e.g., `Agent Auth`), and confirm.

   - In the **Secret Key** column, click the eye icon to reveal the key and copy it. Store it somewhere safe — you will not be able to view it again.

   > Image: Exa dashboard API Keys page showing existing keys and the + Create Key button

2. ### Create a connection in Scalekit

   - In [Scalekit dashboard](https://app.scalekit.com), go to **AgentKit** > **Connections** > **Create Connection**. Find **Exa** and click **Create**.

   - Note the **Connection name** — you will use this as `connection_name` in your code (e.g., `exa`).

   > Image: Scalekit connection configuration for Exa showing the connection name and API Key authentication type

3. ### Add a connected account

   Connected accounts link a specific user identifier in your system to an Exa API key. Add accounts via the dashboard for testing, or via the Scalekit API in production.

   **Via dashboard (for testing)**

   - Open the connection you created and click the **Connected Accounts** tab → **Add account**.

   - Fill in:
     - **Your User's ID** — a unique identifier for this user in your system (e.g., `user_123`)
     - **API Key** — the Exa API key you copied in step 1

   - Click **Save**.

   > Image: Add connected account form for Exa in Scalekit dashboard

   **Via API (for production)**

   
     ### Node.js

```typescript
await scalekit.actions.upsertConnectedAccount({
  connectionName: 'exa',
  identifier: 'user_123',
  credentials: { api_key: 'your-exa-api-key' },
});
```

     ### Python

```python
scalekit_client.actions.upsert_connected_account(
    connection_name="exa",
    identifier="user_123",
    credentials={"api_key": "your-exa-api-key"}
)
```

   

> tip: Production usage tip
>
> In production, call `upsertConnectedAccount` when a user connects their Exa account — for example, after they paste their API key into a settings page in your app.

> note: Credits and rate limits
>
> Each Exa API key has a default limit of 10 QPS. Search, find-similar, and get-contents cost 1 credit per request, plus additional credits per content item (text, highlights, or summary) returned. `exa_research` and `exa_websets` run multiple sub-queries internally and consume significantly more credits. Monitor usage at [dashboard.exa.ai](https://dashboard.exa.ai) → **Usage**.

## Code examples

Once a connected account is set up, make API calls through the Scalekit proxy. Scalekit injects the Exa API key automatically — you never handle credentials in your application code.

## Proxy API calls

  ### Node.js

```typescript
import { ScalekitClient } from '@scalekit-sdk/node'; // Scalekit Node SDK

const connectionName = 'exa';     // connection name from your Scalekit dashboard
const identifier = 'user_123';    // your user's unique identifier

// Get your credentials from app.scalekit.com → Developers → Settings → API Credentials
const scalekit = new ScalekitClient(
  process.env.SCALEKIT_ENV_URL,
  process.env.SCALEKIT_CLIENT_ID,
  process.env.SCALEKIT_CLIENT_SECRET
);
const actions = scalekit.actions;

// Make a request via Scalekit proxy — no API key needed here
const result = await actions.request({
  connectionName,
  identifier,
  path: '/search',
  method: 'POST',
  body: { query: 'LLM observability tools 2025', num_results: 5 },
});
console.log(result.data);
```

  ### Python

```python
import os

import scalekit.client
from dotenv import load_dotenv
load_dotenv()

connection_name = "exa"    # connection name from your Scalekit dashboard
identifier = "user_123"    # your user's unique identifier

# Get your credentials from app.scalekit.com → Developers → Settings → API Credentials
scalekit_client = scalekit.client.ScalekitClient(
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
    env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit_client.actions

# Semantic search via Scalekit proxy — no API key needed here
result = actions.request(
    connection_name=connection_name,
    identifier=identifier,
    path="/search",
    method="POST",
    json={"query": "LLM observability tools 2025", "num_results": 5}
)
print(result)
```

> tip: No OAuth flow needed
>
> Exa uses API key auth — unlike OAuth connectors, there is no authorization link or redirect flow. Once you call `upsertConnectedAccount` (or add an account via the dashboard), your users can make requests immediately.

## Scalekit tools

Use `execute_tool` to call Exa tools directly from your code. Scalekit resolves the connected account, injects the API key, and returns a structured response — no raw HTTP needed.

### Semantic search

Search the web by meaning, not just keywords. This example searches for companies in the AI infrastructure space and returns a short text excerpt for each result.

```python title="examples/exa_semantic_search.py"
import os

import scalekit.client
from dotenv import load_dotenv
load_dotenv()

scalekit_client = scalekit.client.ScalekitClient(
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
    env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit_client.actions

# Resolve connected account
response = actions.get_or_create_connected_account(
    connection_name="exa",
    identifier="user_123"
)
connected_account = response.connected_account

# Search for AI infrastructure companies, returning page text
# (parameters per the exa_search entry in the Tool list)
result = actions.execute_tool(
    tool_name="exa_search",
    connected_account_id=connected_account.id,
    tool_input={
        "query": "AI infrastructure companies building GPU cloud platforms",
        "num_results": 10,
        "type": "neural",
        "category": "company",
        "include_text": True,
        "max_characters": 500
    }
)

for item in result.result.get("results", []):
    print(f"{item['title']}: {item['url']}")
    print(f"  → {item.get('text', 'No text')[:200]}\n")
```

### Search with full content enrichment

Retrieve the full page text alongside search results — useful when you want to pass source material directly into an LLM context window.

> note: Credit cost
>
> Requesting page text via `include_text` costs 1 extra credit per result. With 10 results, this doubles your per-request cost. Set `num_results` conservatively when enriching content.

```python title="examples/exa_search_with_content.py"
result = actions.execute_tool(
    tool_name="exa_search",
    connected_account_id=connected_account.id,
    tool_input={
        "query": "OpenAI API rate limits and pricing 2025",
        "num_results": 5,
        "type": "keyword",                     # keyword mode for precise terms
        "include_domains": ["openai.com", "platform.openai.com"],
        "include_text": True,
        "max_characters": 2000                 # cap text to save tokens
    }
)

for item in result.result.get("results", []):
    print(f"## {item['title']}")
    print(f"URL: {item['url']}")
    if item.get("text"):
        print(item["text"][:500])
    print()
```

### Find similar pages

Discover pages that are semantically similar to a known URL — useful for competitive research, finding alternative data sources, or discovering similar products.

```python title="examples/exa_find_similar.py"
# Find companies similar to a known competitor
result = actions.execute_tool(
    tool_name="exa_find_similar",
    connected_account_id=connected_account.id,
    tool_input={
        "url": "https://www.linear.app",
        "num_results": 10,
        "exclude_domains": ["linear.app"],                   # exclude the source URL itself
        "start_published_date": "2024-01-01T00:00:00.000Z",  # only pages published since 2024
        "include_text": True,                                # short text excerpt per result
        "max_characters": 600
    }
)

print("Similar companies to Linear:")
for item in result.result.get("results", []):
    print(f"  {item['title']} — {item['url']}")
    if item.get("text"):
        print(f"    {item['text'][:200]}")

### Get content for known URLs

Extract structured content from a list of URLs you already have — from a CRM export, a prior search, or a manually curated list. No search query required.

```python title="examples/exa_crawl.py"
# Enrich a list of company URLs from your CRM
company_urls = [
    "https://www.anthropic.com",
    "https://mistral.ai",
    "https://cohere.com",
]

result = actions.execute_tool(
    tool_name="exa_crawl",
    connected_account_id=connected_account.id,
    tool_input={
        "urls": company_urls,
        "include_summary": True,
        "summary_query": (
            "What AI models or products does this company offer, "
            "and who are their target customers?"
        ),
        "max_characters": 2000
    }
)

for item in result.result.get("results", []):
    print(f"{item['url']}: {item.get('summary', 'No summary')}")

### Get a direct answer

Ask a question and get a synthesized natural language answer grounded in live web sources. Returns the answer and the source URLs used — ready to display or inject into a citation-aware LLM prompt.

```python title="examples/exa_answer.py"
result = actions.execute_tool(
    tool_name="exa_answer",
    connected_account_id=connected_account.id,
    tool_input={
        "query": "What are the context window sizes and pricing for Claude Sonnet and GPT-4o as of 2025?",
        "num_results": 8,
        "include_text": True,                          # include source snippets
        "include_domains": ["anthropic.com", "openai.com", "platform.openai.com"]
    }
)

print("Answer:", result.result.get("answer"))
print("\nSources:")
for source in result.result.get("sources", []):
    print(f"  - {source['title']}: {source['url']}")
```
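To display the answer with numbered citations, the payload can be flattened into a markdown string. This helper is a sketch that only assumes the `answer`/`sources` shape printed above:

```python
def format_answer_with_citations(payload: dict) -> str:
    """Render an exa_answer payload as markdown with numbered citations."""
    lines = [payload.get("answer", "").strip(), "", "Sources:"]
    for i, src in enumerate(payload.get("sources", []), start=1):
        lines.append(f"[{i}] {src.get('title', src['url'])}: {src['url']}")
    return "\n".join(lines)

payload = {
    "answer": "Claude Sonnet supports a 200K-token context window.",
    "sources": [{"title": "Models overview", "url": "https://docs.anthropic.com/models"}],
}
print(format_answer_with_citations(payload))
```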

### Deep research on a topic

Run in-depth research on a topic using Exa's neural search. Per the `exa_research` tool below, this performs a semantic search and returns each source with full page text and an AI-generated summary, giving you structured multi-source material your code can consume directly. Use `summary_query` to focus the per-source summaries on the question you care about.

> caution: Higher credit cost
>
> `exa_research` gathers full page text and summaries for every source, so it consumes significantly more credits than a plain `exa_search` call. Start with a low `num_results` value while testing.

```python title="examples/exa_research.py"
result = actions.execute_tool(
    tool_name="exa_research",
    connected_account_id=connected_account.id,
    tool_input={
        "query": "Competitive landscape of AI coding assistants in 2025 — key players, pricing, and differentiators",
        "num_results": 10,
        "max_characters": 3000,
        "summary_query": "What is this product's pricing and key differentiator?",
        "start_published_date": "2024-06-01T00:00:00.000Z"
    }
)

for item in result.result.get("results", []):
    print(f"## {item['title']}")
    print(f"URL: {item['url']}")
    print(f"Summary: {item.get('summary', 'No summary')}\n")
```

### LangChain integration

Let an LLM decide which Exa tool to call based on natural language. This example builds an agent that can search, retrieve content, and answer research questions on demand.

```python title="examples/exa_langchain.py"
import os

import scalekit.client
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import (
    ChatPromptTemplate, SystemMessagePromptTemplate,
    HumanMessagePromptTemplate, MessagesPlaceholder, PromptTemplate
)
load_dotenv()

scalekit_client = scalekit.client.ScalekitClient(
    client_id=os.getenv("SCALEKIT_CLIENT_ID"),
    client_secret=os.getenv("SCALEKIT_CLIENT_SECRET"),
    env_url=os.getenv("SCALEKIT_ENV_URL"),
)
actions = scalekit_client.actions

identifier = "user_123"

# Resolve connected account (API key auth — no OAuth redirect needed)
actions.get_or_create_connected_account(
    connection_name="exa",
    identifier=identifier
)

# Load all Exa tools in LangChain format. Use page_size=100 so connector tool lists are not truncated.
tools = actions.langchain.get_tools(
    identifier=identifier,
    providers=["EXA"],
    page_size=100
)

prompt = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate(prompt=PromptTemplate(
        input_variables=[],
        template=(
            "You are a research assistant with access to Exa web search tools. "
            "Use exa_search for general queries, exa_answer for direct questions, "
            "exa_find_similar for competitive analysis, and exa_research for deep multi-source topics. "
            "Always cite your sources."
        )
    )),
    MessagesPlaceholder(variable_name="chat_history", optional=True),
    HumanMessagePromptTemplate(prompt=PromptTemplate(
        input_variables=["input"], template="{input}"
    )),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

llm = ChatOpenAI(model="gpt-4o")
agent = create_openai_tools_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = agent_executor.invoke({
    "input": "Who are the top 5 competitors to Notion for team knowledge management? Summarize each and compare their pricing."
})
print(result["output"])
```

## Tool list

Use the exact tool names listed below when you call `execute_tool`. If you're not sure which name to use, list the tools available for the current user first (for example via `actions.langchain.get_tools`).


### `exa_answer`

Get a natural language answer to a question by searching the web with Exa and synthesizing results. Returns a direct answer with citations to the source pages. Ideal for factual questions, current events, and research queries. Rate limit: 60 requests/minute.

Parameters:

- `query` (`string`, required): The question or query to answer from web sources.
- `exclude_domains` (`array`, optional): JSON array of domains to exclude from answer sources.
- `include_domains` (`array`, optional): JSON array of domains to restrict source search to. Example: ["reuters.com","bbc.com"]
- `include_text` (`boolean`, optional): When true, also returns the source page text alongside the synthesized answer.
- `num_results` (`integer`, optional): Number of web sources to use when generating the answer (1–20). More sources improves accuracy but costs more credits.

### `exa_crawl`

Crawl one or more web pages by URL and extract their content including full text, highlights, and AI-generated summaries. Useful for reading specific pages discovered via search. Rate limit: 60 requests/minute. Credit consumption depends on number of URLs.

Parameters:

- `urls` (`array`, required): JSON array of URLs to crawl and extract content from.
- `highlights_per_url` (`integer`, optional): Number of highlight sentences to return per URL when include_highlights is true. Defaults to 3.
- `include_highlights` (`boolean`, optional): When true, returns the most relevant sentence-level highlights from each page.
- `include_html_tags` (`boolean`, optional): When true, retains HTML tags in the extracted text. Defaults to false (plain text only).
- `include_summary` (`boolean`, optional): When true, returns an AI-generated summary for each crawled page.
- `max_characters` (`integer`, optional): Maximum characters of text to extract per page. Defaults to 5000.
- `summary_query` (`string`, optional): Optional query to focus the AI summary on a specific aspect of the page.

### `exa_delete_webset`

Delete an Exa Webset by its ID. This permanently removes the webset and all its collected items. This action cannot be undone.

Parameters:

- `webset_id` (`string`, required): The ID of the webset to delete.

### `exa_find_similar`

Find web pages similar to a given URL using Exa's neural similarity search. Useful for competitor research, finding related articles, or discovering similar companies. Optionally returns page text, highlights, or summaries. Rate limit: 60 requests/minute.

Parameters:

- `url` (`string`, required): The URL to find similar pages for.
- `end_published_date` (`string`, optional): Only return pages published before this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z
- `exclude_domains` (`array`, optional): Array of domains to exclude from results.
- `include_domains` (`array`, optional): Array of domains to restrict results to.
- `include_text` (`boolean`, optional): When true, returns the full text content of each result page.
- `max_characters` (`integer`, optional): Maximum characters of page text to return per result when include_text is true. Defaults to 3000.
- `num_results` (`integer`, optional): Number of similar results to return (1–100). Defaults to 10.
- `start_published_date` (`string`, optional): Only return pages published after this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z

### `exa_get_webset`

Get the status and details of an existing Exa Webset by its ID. Use this to poll the status of an async webset created with Create Webset. Returns metadata including status (created, running, completed, cancelled), progress, and configuration.

Parameters:

- `webset_id` (`string`, required): The ID of the webset to retrieve.

### `exa_list_webset_items`

List the collected URLs and items from a completed Exa Webset. Use this after polling Get Webset until its status is 'completed' to retrieve the discovered results.

Parameters:

- `webset_id` (`string`, required): The ID of the webset to retrieve items from.
- `count` (`integer`, optional): Number of items to return per page. Defaults to 10.
- `cursor` (`string`, optional): Pagination cursor from a previous response to fetch the next page of items.

### `exa_list_websets`

List all Exa Websets in your account with optional pagination. Returns a list of websets with their IDs, statuses, and configurations.

Parameters:

- `count` (`integer`, optional): Number of websets to return per page. Defaults to 10.
- `cursor` (`string`, optional): Pagination cursor from a previous response to fetch the next page.
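The `count`/`cursor` pagination pattern above can be wrapped in a generic helper. Here `fetch_page` stands in for an `execute_tool(tool_name="exa_list_websets", ...)` call, and the response shape (an item list under `"websets"` plus a `"next_cursor"` field) is an assumption — check the actual response for the exact field names:

```python
def paginate(fetch_page, count: int = 10):
    """Yield items across all pages of a cursor-paginated endpoint.

    fetch_page(count, cursor) must return a dict with an item list under
    "websets" and an optional "next_cursor" for the following page
    (assumed field names -- verify against the real response).
    """
    cursor = None
    while True:
        page = fetch_page(count, cursor)
        yield from page.get("websets", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Because the fetch function is injected, the same helper works for `exa_list_webset_items` or any other cursor-paginated tool.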

### `exa_research`

Run in-depth research on a topic using Exa's neural search. Performs a semantic search and returns results with full page text and AI-generated summaries, providing structured multi-source research output. Best for comprehensive topic analysis. Rate limit: 60 requests/minute.

Parameters:

- `query` (`string`, required): The research topic or question to investigate across the web.
- `category` (`string`, optional): Restrict research to a specific content category for more targeted results.
- `exclude_domains` (`array`, optional): JSON array of domains to exclude from research results.
- `include_domains` (`array`, optional): JSON array of domains to restrict research sources to. Useful to focus on authoritative sources.
- `max_characters` (`integer`, optional): Maximum characters of text to extract per source page. Defaults to 5000.
- `num_results` (`integer`, optional): Number of sources to gather for the research (1–20). More sources provide broader coverage.
- `start_published_date` (`string`, optional): Only include sources published after this date. ISO 8601 format.
- `summary_query` (`string`, optional): Optional focused question to guide the AI page summaries. Defaults to the main research query.

### `exa_search`

Search the web using Exa's AI-powered semantic or keyword search engine. Supports filtering by domain, date range, content category, and result type. Optionally returns page text, highlights, or summaries alongside search results. Rate limit: 60 requests/minute.

Parameters:

- `query` (`string`, required): The search query. For neural/auto type, natural language works best. For keyword type, use specific terms.
- `category` (`string`, optional): Restrict results to a specific content category.
- `end_published_date` (`string`, optional): Only return pages published before this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z
- `exclude_domains` (`array`, optional): JSON array of domains to exclude from results. Example: ["reddit.com","quora.com"]
- `include_domains` (`array`, optional): JSON array of domains to restrict results to. Example: ["techcrunch.com","wired.com"]
- `include_text` (`boolean`, optional): When true, returns the full text content of each result page (up to max_characters).
- `max_characters` (`integer`, optional): Maximum characters of page text to return per result when include_text is true. Defaults to 3000.
- `num_results` (`integer`, optional): Number of results to return (1–100). Defaults to 10.
- `start_published_date` (`string`, optional): Only return pages published after this date. ISO 8601 format: YYYY-MM-DDTHH:MM:SS.000Z
- `type` (`string`, optional): Search type: 'neural' for semantic AI search (best for natural language), 'keyword' for exact-match keyword search, 'auto' to let Exa decide.
- `use_autoprompt` (`boolean`, optional): When true, Exa automatically rewrites the query to be more semantically effective.
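The date filters above expect the full ISO 8601 timestamp format rather than a bare date. A small helper (hypothetical, not part of any SDK) avoids hand-writing it:

```python
from datetime import date

def exa_date(d: date) -> str:
    """Format a date as the ISO 8601 timestamp Exa date filters expect."""
    return d.strftime("%Y-%m-%dT00:00:00.000Z")

print(exa_date(date(2024, 1, 1)))  # 2024-01-01T00:00:00.000Z
```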

### `exa_websets`

Execute a complex web query designed to discover and return large sets of URLs (up to thousands) matching specific criteria. Websets are ideal for lead generation, market research, competitor analysis, and large-scale data collection. Returns a webset ID — poll its status with `exa_get_webset`, then fetch results with `exa_list_webset_items`. High credit consumption.

Parameters:

- `query` (`string`, required): The search query describing what kinds of pages or entities to find. Be specific and descriptive for best results.
- `count` (`integer`, optional): Target number of URLs to collect. Can range from hundreds to thousands. Higher counts take longer and consume more credits.
- `entity_type` (`string`, optional): The type of entity to search for. Helps Exa understand what constitutes a valid result match.
- `exclude_domains` (`array`, optional): JSON array of domains to exclude from webset results.
- `external_id` (`string`, optional): Optional external identifier to tag this webset for reference in your system.
- `include_domains` (`array`, optional): JSON array of domains to restrict webset sources to.
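The webset tools above are asynchronous: `exa_websets` returns an ID, `exa_get_webset` reports its status, and `exa_list_webset_items` returns results once the status is `completed`. A generic polling loop for that flow might look like the sketch below — the injected `get_status` callable stands in for an `execute_tool(tool_name="exa_get_webset", ...)` call so the helper stays testable without network access:

```python
import time

def poll_until_complete(get_status, webset_id: str,
                        interval: float = 5.0, max_attempts: int = 60) -> dict:
    """Poll an async webset until it completes.

    get_status: callable taking a webset ID and returning a dict with a
    "status" key ("created", "running", "completed", or "cancelled"),
    mirroring the exa_get_webset statuses described above.
    """
    for _ in range(max_attempts):
        webset = get_status(webset_id)
        status = webset.get("status")
        if status == "completed":
            return webset
        if status == "cancelled":
            raise RuntimeError(f"Webset {webset_id} was cancelled")
        time.sleep(interval)
    raise TimeoutError(f"Webset {webset_id} did not complete in time")

# In production, get_status would wrap something like:
#   actions.execute_tool(tool_name="exa_get_webset",
#                        connected_account_id=connected_account.id,
#                        tool_input={"webset_id": webset_id}).result
```

Once the helper returns, call `exa_list_webset_items` with the same webset ID to page through the collected URLs.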


---

## More Scalekit documentation

| Resource | What it contains | When to use it |
|----------|-----------------|----------------|
| [/llms.txt](/llms.txt) | Structured index with routing hints per product area | Start here — find which documentation set covers your topic before loading full content |
| [/llms-full.txt](/llms-full.txt) | Complete documentation for all Scalekit products in one file | Use when you need exhaustive context across multiple products or when the topic spans several areas |
| [sitemap-0.xml](https://docs.scalekit.com/sitemap-0.xml) | Full URL list of every documentation page | Use to discover specific page URLs you can fetch for targeted, page-level answers |
