CorpusIQ User Guide

Welcome to the CorpusIQ Apps SDK User Guide. This guide explains how to install, configure, and use CorpusIQ to search across all your connected data sources directly from ChatGPT.

Table of Contents

  1. Introduction
  2. Getting Started
  3. Core Concepts
  4. Using CorpusIQ
  5. Managing Connectors
  6. Advanced Features
  7. Configuration
  8. Security
  9. Getting Help
  10. Next Steps
  11. Appendix

Introduction

What is CorpusIQ?

CorpusIQ is a production-ready MCP (Model Context Protocol) server that seamlessly integrates with ChatGPT, enabling you to search across multiple connected data sources in one unified interface. Think of it as your intelligent search assistant that knows where to find information across your entire digital workspace.

What Can You Do With CorpusIQ?

  • Unified Search: Search across Gmail, OneDrive, QuickBooks, and more from a single interface
  • Conversational Queries: Use natural language to find what you need
  • Beautiful Interface: View results in an intuitive, interactive widget
  • Easy Management: Connect and disconnect data sources with simple clicks
  • Privacy-First: Your data stays secure with enterprise-grade security

Who Should Use CorpusIQ?

  • Business Professionals: Find emails, documents, and financial data instantly
  • Developers: Access technical documentation, code repositories, and issue trackers
  • Researchers: Search across papers, notes, and reference materials
  • Teams: Collaborate with shared access to organizational knowledge
  • Anyone: Anyone who needs to search across multiple data sources efficiently

Getting Started

Prerequisites

Before you begin, ensure you have:

  • Python 3.10 or higher installed on your system
  • A ChatGPT account with access to Developer Mode (check with your admin)
  • A few minutes to set up and configure

Quick Installation

Follow these steps to get CorpusIQ running:

Step 1: Clone and Install

# Clone the repository
git clone https://github.com/CorpusIQ/corpusiq-openai-sdk.git
cd corpusiq-openai-sdk

# Create a virtual environment
python3 -m venv .venv

# Activate the virtual environment
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

Step 2: Configure

# Copy the example configuration
cp .env.example .env

# Edit the configuration file
# You can use any text editor like nano, vim, or VS Code
nano .env

Recommended settings for first-time users:

# Allow ChatGPT to connect
CORPUSIQ_CORS_ALLOW_ORIGINS_CSV=https://chat.openai.com

# Debug mode: set to true only for local testing (keep false in production)
CORPUSIQ_DEBUG_MODE=false

# Set logging level
CORPUSIQ_LOG_LEVEL=INFO

# Configure rate limiting
CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE=60

Step 3: Start the Server

python -m corpusiq

You should see:

INFO: Starting CorpusIQ Apps SDK server...
INFO: Uvicorn running on http://0.0.0.0:8000

Keep this terminal window open while using CorpusIQ.

Step 4: Create HTTPS Tunnel

ChatGPT requires an HTTPS connection. Use one of these free options:

Option A: Cloudflare Tunnel (Recommended)

# Install cloudflared
# macOS: brew install cloudflare/cloudflare/cloudflared
# Windows/Linux: See https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/

# Start the tunnel
cloudflared tunnel --url http://localhost:8000

Option B: ngrok

# Install ngrok from https://ngrok.com/download

# Start the tunnel
ngrok http 8000

Copy the HTTPS URL provided (e.g., https://abc123.trycloudflare.com).

Step 5: Connect to ChatGPT

  1. Open ChatGPT at https://chat.openai.com
  2. Go to Settings → Apps & Connectors → Advanced Settings
  3. Enable Developer Mode
  4. Click Create Connector or Add MCP Server
  5. Paste your HTTPS tunnel URL
  6. Click Connect

ChatGPT will automatically discover your CorpusIQ server!

Core Concepts

Model Context Protocol (MCP)

MCP is a standardized protocol that allows AI applications like ChatGPT to communicate with external services. Think of it as a universal translator between ChatGPT and your data sources.

Key Benefits:

  • Standardized: Works across different AI platforms
  • Secure: Built-in authentication and authorization
  • Extensible: Easy to add new capabilities
  • Efficient: Optimized for real-time conversations
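
To make the protocol a little more concrete, here is a minimal sketch of the kind of message an MCP client sends when it invokes a tool: a JSON-RPC 2.0 request with the method tools/call. The helper name build_tool_call is hypothetical, and the argument names are taken from the Tools section of this guide; how CorpusIQ handles these messages internally is not shown here.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request (illustrative helper)."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Example: ask the search tool for up to five results.
    print(build_tool_call(1, "corpus_search", {"query": "Q4 budget review", "maxResults": 5}))
```

In normal use ChatGPT builds and sends these messages for you; the sketch is only meant to show what travels over the connection.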

Tools

CorpusIQ provides two main tools that ChatGPT can use:

1. corpus_search

Search across all your connected data sources with natural language queries.

Parameters:

  • query (required): Your search question in plain English
  • maxResults (optional): How many results to return (1-20, default: 5)
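
The parameter rules above (a required query, and an optional maxResults between 1 and 20 that defaults to 5) can be sketched as a small validation function. This is an illustration under assumptions, not the server's actual code: the function name normalize_search_args and the choice to raise ValueError are hypothetical.

```python
from typing import Any

DEFAULT_MAX_RESULTS = 5
MIN_RESULTS, MAX_RESULTS = 1, 20


def normalize_search_args(args: dict[str, Any]) -> dict[str, Any]:
    """Validate search arguments and fill in the default result count (hypothetical helper)."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")

    max_results = args.get("maxResults", DEFAULT_MAX_RESULTS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not MIN_RESULTS <= max_results <= MAX_RESULTS:
        raise ValueError(f"maxResults must be between {MIN_RESULTS} and {MAX_RESULTS}")

    return {"query": query.strip(), "maxResults": max_results}
```

A server could equally clamp out-of-range values instead of rejecting them; rejecting is used here only because it makes the limits explicit.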

Examples:

"Find emails about the Q4 budget review"
"Show me documents related to the Smith project"
"What did Sarah say about the deadline?"

2. open_connectors

Open the interactive connector management interface to add, remove, or configure data sources.

No parameters required - just opens the interface!

The Widget Interface

CorpusIQ includes a beautiful, interactive HTML widget that displays:

  • Search Results: Formatted and easy to read
  • Connector Status: See which data sources are connected
  • Quick Actions: Connect or disconnect sources with one click
  • Real-time Updates: Changes reflect immediately

The widget appears automatically when you use CorpusIQ tools in ChatGPT.

Using CorpusIQ

To search across your connected data sources, simply ask ChatGPT:

Search my corpus for "quarterly financial reports"

Or:

Find information about project Phoenix in my connected sources

ChatGPT will understand these natural language requests and use the corpus_search tool automatically.

Advanced Search Queries

You can make your searches more specific:

Time-based queries:

"Find emails from last week about the merger"
"Show me documents created in January"

Source-specific queries:

"Search my Gmail for messages from john@example.com"
"Find QuickBooks invoices over $10,000"

Content-type queries:

"Find all PDF documents about marketing strategy"
"Show me spreadsheets with budget data"

Combining criteria:

"Find emails from Sarah last month about the website redesign"

Adjusting Result Count

Control how many results you receive:

"Search my corpus for 'project updates' and show me 10 results"

Or be explicit:

"Find documents about annual planning, maximum 3 results"

Understanding Search Results

Results typically include:

  • Title: Document or email subject
  • Source: Which connector found this (Gmail, OneDrive, etc.)
  • Snippet: Relevant excerpt showing why it matched
  • Date: When it was created or modified
  • Link: Direct link to open the original (if available)
  • Relevance Score: How well it matches your query
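
As a rough mental model, each result can be pictured as a small record with the fields listed above. The sketch below is an assumption for illustration: the class name SearchResult, the field names, and the 0.0 to 1.0 relevance scale are hypothetical and may not match what the widget actually receives.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    title: str                  # document title or email subject
    source: str                 # connector that produced the hit, e.g. "Gmail"
    snippet: str                # excerpt showing why it matched
    modified: date              # created or last-modified date
    relevance: float            # assumed scale: 0.0 (weak) to 1.0 (strong)
    link: Optional[str] = None  # not every source exposes a direct link


def rank(results: list[SearchResult]) -> list[SearchResult]:
    """Order results by relevance, highest first; newer items win ties."""
    return sorted(results, key=lambda r: (r.relevance, r.modified), reverse=True)
```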

Managing Connectors

Viewing Connected Sources

To see what data sources are currently connected:

"Show me my connected data sources"

Or:

"Open the connectors interface"

Adding a New Connector

  1. Ask ChatGPT: "Open connectors"
  2. In the widget, click Add Connector
  3. Choose from available sources:
    • Gmail
    • Google Drive
    • OneDrive
    • Dropbox
    • QuickBooks
    • Salesforce
    • Slack
    • And more…
  4. Follow the OAuth flow to authorize access
  5. Wait for “Connected” status

Removing a Connector

  1. Open the connectors interface
  2. Find the connector you want to remove
  3. Click Disconnect or Remove
  4. Confirm your choice

Note: Removing a connector doesn’t delete your data; it just stops CorpusIQ from searching it.

Connector Status

Each connector shows its status:

  • 🟢 Connected: Active and ready to search
  • 🟡 Connecting: Authorization in progress
  • 🔴 Disconnected: Not active
  • ⚠️ Error: Connection issue (check logs)
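
One way to think about these states is as a small enumeration in which only Connected sources take part in a search. The names ConnectorStatus and searchable below are hypothetical and shown only to illustrate the idea.

```python
from enum import Enum


class ConnectorStatus(Enum):
    CONNECTED = "connected"        # active and ready to search
    CONNECTING = "connecting"      # authorization in progress
    DISCONNECTED = "disconnected"  # not active
    ERROR = "error"                # connection issue; check logs


def searchable(connectors: dict[str, ConnectorStatus]) -> list[str]:
    """Return the names of connectors that can currently be searched."""
    return sorted(
        name for name, status in connectors.items()
        if status is ConnectorStatus.CONNECTED
    )
```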

Troubleshooting Connectors

Connector won’t connect:

  • Check your internet connection
  • Ensure you have access to the data source
  • Try disconnecting and reconnecting
  • Check that the OAuth credentials are valid

Connector shows error:

  • Look at server logs for details
  • Verify API quotas haven’t been exceeded
  • Check if the service is experiencing downtime
  • Ensure tokens haven’t expired

Advanced Features

Custom Search Operators

Use these operators for precise searches:

  • Exact phrase: Use quotes "exact phrase here"
  • Exclude terms: Use minus -term
  • Either/or: Use OR between terms
  • Required term: Use +term
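
To show how a query using these operators breaks down, here is a minimal parser sketch. It is not CorpusIQ's query engine: the names ParsedQuery and parse_query are hypothetical, and the sketch makes simplifying assumptions (operators apply to single words, OR joins plain words only, and a stray OR is treated as an ordinary term).

```python
import re
from dataclasses import dataclass, field

# A token is either a double-quoted phrase or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|(\S+)')


@dataclass
class ParsedQuery:
    phrases: list[str] = field(default_factory=list)       # "exact phrase"
    required: list[str] = field(default_factory=list)      # +term
    excluded: list[str] = field(default_factory=list)      # -term
    either: list[list[str]] = field(default_factory=list)  # a OR b
    terms: list[str] = field(default_factory=list)         # plain words


def _is_plain(token: tuple[str, str]) -> bool:
    phrase, word = token
    return not phrase and bool(word) and word[0] not in "+-" and word != "OR"


def parse_query(text: str) -> ParsedQuery:
    parsed = ParsedQuery()
    tokens = TOKEN.findall(text)  # list of (phrase, word) pairs
    i = 0
    while i < len(tokens):
        phrase, word = tokens[i]
        if phrase:
            parsed.phrases.append(phrase)
        elif word.startswith("-") and len(word) > 1:
            parsed.excluded.append(word[1:])
        elif word.startswith("+") and len(word) > 1:
            parsed.required.append(word[1:])
        else:
            group = [word]
            # Absorb "a OR b OR c" chains of plain words into one group.
            while (i + 2 < len(tokens)
                   and tokens[i + 1] == ("", "OR")
                   and _is_plain(tokens[i + 2])):
                group.append(tokens[i + 2][1])
                i += 2
            if len(group) > 1:
                parsed.either.append(group)
            else:
                parsed.terms.append(word)
        i += 1
    return parsed
```

For instance, parse_query('budget OR forecast +2024') yields one either-group containing budget and forecast, plus 2024 as a required term.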

Examples:

"annual report" -draft
budget OR forecast +2024
"project phoenix" -cancelled

Filtering by Connector

Search specific sources only:

"Search only my Gmail for messages about the conference"
"Find documents in my OneDrive folder about the merger"

Date Range Searches

Specify time periods:

"Find emails from the last 7 days"
"Show me documents modified this month"
"Search for invoices from Q4 2023"

Saved Searches (Coming Soon)

Save frequently used searches for quick access:

"Save this search as 'Weekly Reports'"
"Run my saved search 'Client Communications'"

Batch Operations (Coming Soon)

Perform actions on multiple results:

"Export all search results to CSV"
"Download the top 5 matching documents"

Configuration

Environment Variables

Configure CorpusIQ by editing the .env file:

Core Settings

# CORS Configuration
CORPUSIQ_CORS_ALLOW_ORIGINS_CSV=https://chat.openai.com

# Set to true only during development
CORPUSIQ_DEBUG_MODE=false

# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
CORPUSIQ_LOG_LEVEL=INFO
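
As an illustration of how these three values might be interpreted, the sketch below reads them from a mapping of environment variables, splits the comma-separated origins, and rejects unknown log levels. The names CoreSettings and load_core_settings, the accepted truthy strings, and the fallback defaults are assumptions; the server's own settings loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


@dataclass(frozen=True)
class CoreSettings:
    cors_allow_origins: tuple[str, ...]
    debug_mode: bool
    log_level: str


def load_core_settings(env: Mapping[str, str] = os.environ) -> CoreSettings:
    """Parse the core CORPUSIQ_* variables (hypothetical loader)."""
    origins_csv = env.get("CORPUSIQ_CORS_ALLOW_ORIGINS_CSV", "")
    origins = tuple(o.strip() for o in origins_csv.split(",") if o.strip())

    # Assumed truthy spellings; anything else counts as false.
    debug = env.get("CORPUSIQ_DEBUG_MODE", "false").strip().lower() in {"1", "true", "yes"}

    level = env.get("CORPUSIQ_LOG_LEVEL", "INFO").strip().upper()
    if level not in VALID_LOG_LEVELS:
        raise ValueError(f"invalid log level: {level!r}")

    return CoreSettings(origins, debug, level)
```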

Rate Limiting

# Maximum requests per minute per IP
CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE=60

# Increase for heavy usage:
# CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE=120
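
To illustrate what a per-IP, per-minute limit means in practice, here is a minimal sliding-window limiter. It is a sketch of one common technique, not necessarily the algorithm CorpusIQ uses; the class name SlidingWindowLimiter and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within the last `window_seconds`."""

    def __init__(self, limit: int = 60, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

With the default of 60 requests per minute, the 61st request from the same IP inside any 60-second span would be refused, while other clients are unaffected.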

Performance Tuning

# Maximum request size in bytes (2097152 = 2 MiB)
CORPUSIQ_MAX_REQUEST_SIZE=2097152

# Request timeout (seconds)
CORPUSIQ_REQUEST_TIMEOUT=30

OAuth Settings (Production Only)

CORPUSIQ_OAUTH_RESOURCE_URL=https://your-domain.com
CORPUSIQ_OAUTH_ISSUER=https://your-auth-provider.com
CORPUSIQ_OAUTH_AUTHORIZATION_ENDPOINT=https://your-auth-provider.com/authorize
CORPUSIQ_OAUTH_TOKEN_ENDPOINT=https://your-auth-provider.com/token
CORPUSIQ_OAUTH_JWKS_URI=https://your-auth-provider.com/.well-known/jwks.json

Applying Configuration Changes

After modifying .env:

  1. Stop the server (Ctrl+C in the terminal)
  2. Restart: python -m corpusiq
  3. Refresh the connector in ChatGPT if needed

Configuration Best Practices

  • Development: Set CORPUSIQ_DEBUG_MODE=true for detailed logs
  • Production: Keep CORPUSIQ_DEBUG_MODE=false for security
  • High Traffic: Increase rate limits gradually
  • Security: Never commit .env file to version control
  • Backups: Keep a copy of your working configuration

Security

Data Privacy

CorpusIQ takes your privacy seriously:

  • Your Data Stays Yours: We don’t store or log search queries or results
  • Encrypted Transit: All communication uses HTTPS/TLS
  • Minimal Access: Only requests what’s needed from data sources
  • Token Security: OAuth tokens are securely managed
  • No Third-Party Sharing: Your data never leaves the CorpusIQ-ChatGPT connection

Access Control

Control who can use your CorpusIQ instance:

  • OAuth Authentication: Require login for all requests (production)
  • IP Whitelisting: Restrict access to specific IP addresses
  • Rate Limiting: Prevent abuse with request limits
  • CORS: Only allow ChatGPT to connect
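
Two of these controls, the CORS allow-list and IP restrictions, come down to simple membership checks. The sketch below uses Python's standard ipaddress module; the function names origin_allowed and ip_allowed are hypothetical, and the decision to ignore a wildcard origin is an assumption made for the example rather than documented CorpusIQ behavior.

```python
import ipaddress
from typing import Iterable


def origin_allowed(origin: str, allowed_origins: Iterable[str]) -> bool:
    """Exact-match origin check; a wildcard '*' entry is deliberately not honored."""
    return origin in {o for o in allowed_origins if o != "*"}


def ip_allowed(client_ip: str, allowed_networks: Iterable[str]) -> bool:
    """True if client_ip falls inside any allowed address or CIDR block."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network, strict=False)
        for network in allowed_networks
    )
```

For example, ip_allowed("10.1.2.3", ["10.0.0.0/8"]) is true, while an address outside every listed block is refused.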

Best Security Practices

  1. Always use HTTPS in production (never HTTP)
  2. Keep dependencies updated regularly
  3. Monitor access logs for suspicious activity
  4. Use strong OAuth credentials with regular rotation
  5. Enable rate limiting to prevent abuse
  6. Set appropriate CORS origins (don’t use *)
  7. Review connector permissions regularly
  8. Implement token validation in production
  9. Use environment variables for secrets (never hardcode)
  10. Conduct regular security audits of your deployment

Reporting Security Issues

Found a security vulnerability? Please report it responsibly:

  1. Don’t create a public GitHub issue
  2. Do email the security contact listed in SECURITY.md
  3. Include detailed reproduction steps
  4. Allow time for a fix before public disclosure

See SECURITY.md for our complete security policy.

Getting Help

Documentation Resources

Community Support

  • GitHub Issues: Report bugs or request features
  • Discussions: Ask questions and share ideas
  • Documentation: This guide and related docs

Professional Support

For enterprise support options, contact the CorpusIQ team.

Next Steps

Now that you understand CorpusIQ, here are suggested next steps:

  1. Try Advanced Searches: Experiment with different query types
  2. Connect More Sources: Add additional data connectors
  3. Customize Configuration: Tune settings for your needs
  4. Read Best Practices: Learn optimal usage patterns
  5. Join the Community: Share your experience and learn from others

Appendix

Glossary

  • MCP: Model Context Protocol - standard for AI-service communication
  • Tool: A capability that ChatGPT can invoke (like search)
  • Connector: Integration with a data source (Gmail, OneDrive, etc.)
  • Widget: The interactive UI that displays results
  • OAuth: Secure authentication standard for accessing data sources
  • CORS: Cross-Origin Resource Sharing - security feature for web requests
  • Rate Limiting: Restricting request frequency to prevent abuse

Keyboard Shortcuts

When using the CorpusIQ widget:

  • Tab: Navigate between fields and buttons
  • Enter: Submit search or activate button
  • Escape: Close modal or cancel action
  • Ctrl/Cmd + K: Focus search box (if implemented)

System Requirements

Minimum:

  • Python 3.10
  • 512 MB RAM
  • 100 MB disk space
  • Internet connection

Recommended:

  • Python 3.11+
  • 1 GB RAM
  • 500 MB disk space
  • Stable high-speed internet

Version History

See CHANGES.md for detailed version history and release notes.


Last Updated: January 2026
Version: 0.1.0
Maintained By: CorpusIQ Team

For the latest documentation, visit: https://github.com/CorpusIQ/corpusiq-openai-sdk