CorpusIQ User Guide
Welcome to the CorpusIQ Apps SDK User Guide! This guide explains how to install, configure, and use CorpusIQ to search across all your connected data sources directly from ChatGPT.
Table of Contents
- Introduction
- Getting Started
- Core Concepts
- Using CorpusIQ
- Managing Connectors
- Advanced Features
- Configuration
- Security
- Getting Help
- Appendix
Introduction
What is CorpusIQ?
CorpusIQ is a production-ready MCP (Model Context Protocol) server that seamlessly integrates with ChatGPT, enabling you to search across multiple connected data sources in one unified interface. Think of it as your intelligent search assistant that knows where to find information across your entire digital workspace.
What Can You Do With CorpusIQ?
- Unified Search: Search across Gmail, OneDrive, QuickBooks, and more from a single interface
- Conversational Queries: Use natural language to find what you need
- Beautiful Interface: View results in an intuitive, interactive widget
- Easy Management: Connect and disconnect data sources with simple clicks
- Privacy-First: Search queries and results are not stored or logged, and all traffic is encrypted with HTTPS/TLS
Who Should Use CorpusIQ?
- Business Professionals: Find emails, documents, and financial data instantly
- Developers: Access technical documentation, code repositories, and issue trackers
- Researchers: Search across papers, notes, and reference materials
- Teams: Collaborate with shared access to organizational knowledge
- Anyone who needs to search across multiple data sources efficiently
Getting Started
Prerequisites
Before you begin, ensure you have:
- Python 3.10 or higher installed on your system
- A ChatGPT account with access to Developer Mode (check with your admin)
- A few minutes to set up and configure
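If you are unsure which interpreter python3 resolves to on your system, a short script can confirm the 3.10 minimum before you install anything. This is an illustrative helper; meets_minimum is our own name and not part of CorpusIQ.

```python
import sys

MINIMUM = (3, 10)  # minimum Python version from the prerequisites above


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM) -> bool:
    """Return True if the (major, minor, ...) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {found} OK")
    else:
        sys.exit(f"Python {found} found; CorpusIQ needs 3.10 or higher")
```

Run it with the interpreter you plan to use for the virtual environment, for example python3 check_python.py (the file name is arbitrary).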
Quick Installation
Follow these steps to get CorpusIQ running:
Step 1: Clone and Install
# Clone the repository
git clone https://github.com/CorpusIQ/corpusiq-openai-sdk.git
cd corpusiq-openai-sdk
# Create a virtual environment
python3 -m venv .venv
# Activate the virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -e .
Step 2: Configure
# Copy the example configuration
cp .env.example .env
# Edit the configuration file
# You can use any text editor like nano, vim, or VS Code
nano .env
Recommended settings for first-time users:
# Allow ChatGPT to connect
CORPUSIQ_CORS_ALLOW_ORIGINS_CSV=https://chat.openai.com
# Debug mode: set to true only while testing locally (keep false in production)
CORPUSIQ_DEBUG_MODE=false
# Set logging level
CORPUSIQ_LOG_LEVEL=INFO
# Configure rate limiting
CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE=60
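For a quick sanity check of your .env before starting the server, the sketch below reads simple KEY=VALUE lines and flags values that differ from the recommendations above. It is a hypothetical helper, not part of CorpusIQ: it assumes one setting per line with no quoting or inline comments, and CorpusIQ's own settings loader may parse the file differently.

```python
from pathlib import Path

VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split once so values may contain '='
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings: dict[str, str]) -> list[str]:
    """Return human-readable warnings for risky or invalid values."""
    warnings = []
    if settings.get("CORPUSIQ_DEBUG_MODE", "false").lower() == "true":
        warnings.append("CORPUSIQ_DEBUG_MODE is true; keep it false outside local testing")
    level = settings.get("CORPUSIQ_LOG_LEVEL", "INFO").upper()
    if level not in VALID_LOG_LEVELS:
        warnings.append(f"CORPUSIQ_LOG_LEVEL={level} is not a standard logging level")
    limit = settings.get("CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE", "60")
    if not limit.isdigit() or int(limit) == 0:
        warnings.append("CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE should be a positive integer")
    return warnings


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        for warning in check_settings(parse_env(env_path.read_text())):
            print("WARNING:", warning)
```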
Step 3: Start the Server
python -m corpusiq
You should see:
INFO: Starting CorpusIQ Apps SDK server...
INFO: Uvicorn running on http://0.0.0.0:8000
Keep this terminal window open while using CorpusIQ.
Step 4: Create HTTPS Tunnel
ChatGPT requires an HTTPS connection. Use one of these free options:
Option A: Cloudflare Tunnel (Recommended)
# Install cloudflared
# macOS: brew install cloudflare/cloudflare/cloudflared
# Windows/Linux: See https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/
# Start the tunnel
cloudflared tunnel --url http://localhost:8000
Option B: ngrok
# Install ngrok from https://ngrok.com/download
# Start the tunnel
ngrok http 8000
Copy the HTTPS URL provided (e.g., https://abc123.trycloudflare.com).
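A common mistake is pasting the local http://localhost:8000 address instead of the tunnel's HTTPS URL. A minimal standard-library check, assuming only that ChatGPT needs an absolute https:// URL reachable from outside your machine; looks_like_tunnel_url is a hypothetical name.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1", "0.0.0.0"}


def looks_like_tunnel_url(url: str) -> bool:
    """True if `url` is an absolute HTTPS URL whose host is not a common local address."""
    parsed = urlparse(url.strip())
    if parsed.scheme != "https" or not parsed.hostname:
        return False
    # A local address only works on your own machine, not from ChatGPT's servers.
    return parsed.hostname not in LOCAL_HOSTS
```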
Step 5: Connect to ChatGPT
- Open ChatGPT at https://chat.openai.com
- Go to Settings → Apps & Connectors → Advanced Settings
- Enable Developer Mode
- Click Create Connector or Add MCP Server
- Paste your HTTPS tunnel URL
- Click Connect
ChatGPT will automatically discover your CorpusIQ server!
Core Concepts
Model Context Protocol (MCP)
MCP is a standardized protocol that allows AI applications like ChatGPT to communicate with external services. Think of it as a universal translator between ChatGPT and your data sources.
Key Benefits:
- Standardized: Works across different AI platforms
- Secure: Built-in authentication and authorization
- Extensible: Easy to add new capabilities
- Efficient: Optimized for real-time conversations
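Under the hood, MCP messages are JSON-RPC 2.0 requests. As an illustration, this sketch builds the discovery request a client such as ChatGPT sends to ask a server which tools it offers. The HTTP transport and session handling are omitted, and jsonrpc_request is a hypothetical helper rather than CorpusIQ code.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # a client must not reuse a request ID within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object of the kind MCP exchanges."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discovery: ask the server which tools it offers.
list_tools = jsonrpc_request("tools/list")
print(json.dumps(list_tools))  # prints {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```

The server replies with a result listing each tool's name, description, and input schema; for CorpusIQ that list would include corpus_search and open_connectors.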
Tools
CorpusIQ provides two main tools that ChatGPT can use:
1. corpus_search
Search across all your connected data sources with natural language queries.
Parameters:
- query (required): Your search question in plain English
- maxResults (optional): How many results to return (1-20, default: 5)
Examples:
"Find emails about the Q4 budget review"
"Show me documents related to the Smith project"
"What did Sarah say about the deadline?"
2. open_connectors
Open the interactive connector management interface to add, remove, or configure data sources.
No parameters required - just opens the interface!
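When ChatGPT decides to search, it sends an MCP tools/call request that names the tool and passes its arguments. The sketch below builds that request for corpus_search and enforces the documented constraints (query required, maxResults between 1 and 20, default 5). It is an illustration only: corpus_search_call is a hypothetical helper, and it rejects out-of-range values, whereas CorpusIQ may handle them differently.

```python
import json

MIN_RESULTS, MAX_RESULTS, DEFAULT_RESULTS = 1, 20, 5  # limits from the parameter list above


def corpus_search_call(request_id: int, query: str, max_results: int = DEFAULT_RESULTS) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for the corpus_search tool."""
    if not query.strip():
        raise ValueError("query is required")
    if not MIN_RESULTS <= max_results <= MAX_RESULTS:
        raise ValueError(f"maxResults must be between {MIN_RESULTS} and {MAX_RESULTS}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "corpus_search",
            "arguments": {"query": query, "maxResults": max_results},
        },
    }


print(json.dumps(corpus_search_call(7, "Find emails about the Q4 budget review"), indent=2))
```

Because open_connectors takes no parameters, its tools/call request would carry an empty arguments object.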
The Widget Interface
CorpusIQ includes a beautiful, interactive HTML widget that displays:
- Search Results: Formatted and easy to read
- Connector Status: See which data sources are connected
- Quick Actions: Connect or disconnect sources with one click
- Real-time Updates: Changes reflect immediately
The widget appears automatically when you use CorpusIQ tools in ChatGPT.
Using CorpusIQ
Basic Search
To search across your connected data sources, simply ask ChatGPT:
Search my corpus for "quarterly financial reports"
Or:
Find information about project Phoenix in my connected sources
ChatGPT will understand these natural language requests and use the corpus_search tool automatically.
Advanced Search Queries
You can make your searches more specific:
Time-based queries:
"Find emails from last week about the merger"
"Show me documents created in January"
Source-specific queries:
"Search my Gmail for messages from john@example.com"
"Find QuickBooks invoices over $10,000"
Content-type queries:
"Find all PDF documents about marketing strategy"
"Show me spreadsheets with budget data"
Combining criteria:
"Find emails from Sarah last month about the website redesign"
Adjusting Result Count
Control how many results you receive (between 1 and 20; the default is 5):
"Search my corpus for 'project updates' and show me 10 results"
Or set an upper limit:
"Find documents about annual planning, maximum 3 results"
Understanding Search Results
Results typically include:
- Title: Document or email subject
- Source: Which connector found this (Gmail, OneDrive, etc.)
- Snippet: Relevant excerpt showing why it matched
- Date: When it was created or modified
- Link: Direct link to open the original (if available)
- Relevance Score: How well it matches your query
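If you consume results programmatically (for example, while testing your server), it can help to model them explicitly. The field names below mirror the list above but are assumptions, as is the relevance scale; check API_REFERENCE.md for the actual response schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SearchResult:
    title: str                  # document title or email subject
    source: str                 # connector that produced the hit, e.g. "Gmail"
    snippet: str                # excerpt showing why it matched
    modified_on: date           # created or modified date
    relevance: float            # higher means a closer match (scale assumed)
    link: Optional[str] = None  # not every source provides a link


def top_results(results: list[SearchResult], limit: int = 5) -> list[SearchResult]:
    """Return the `limit` most relevant results, newest first among equal scores."""
    return sorted(results, key=lambda r: (r.relevance, r.modified_on), reverse=True)[:limit]
```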
Managing Connectors
Viewing Connected Sources
To see what data sources are currently connected:
"Show me my connected data sources"
Or:
"Open the connectors interface"
Adding a New Connector
- Ask ChatGPT: "Open connectors"
- In the widget, click Add Connector
- Choose from available sources:
  - Gmail
  - Google Drive
  - OneDrive
  - Dropbox
  - QuickBooks
  - Salesforce
  - Slack
  - And more…
- Follow the OAuth flow to authorize access
- Wait for “Connected” status
Removing a Connector
- Open the connectors interface
- Find the connector you want to remove
- Click Disconnect or Remove
- Confirm your choice
Note: Removing a connector doesn’t delete your data; it just stops CorpusIQ from searching it.
Connector Status
Each connector shows its status:
- 🟢 Connected: Active and ready to search
- 🟡 Connecting: Authorization in progress
- 🔴 Disconnected: Not active
- ⚠️ Error: Connection issue (check logs)
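A script that reads connector state might represent these four states as an enum. The value strings, and the assumption that only connected sources take part in a search, are illustrative rather than CorpusIQ's documented API.

```python
from enum import Enum


class ConnectorStatus(Enum):
    CONNECTED = "connected"        # active and ready to search
    CONNECTING = "connecting"      # authorization in progress
    DISCONNECTED = "disconnected"  # not active
    ERROR = "error"                # connection issue; check the server logs


def is_searchable(status: ConnectorStatus) -> bool:
    """Assume only connected sources take part in a search."""
    return status is ConnectorStatus.CONNECTED


def needs_attention(statuses: dict[str, ConnectorStatus]) -> list[str]:
    """Names of connectors in the error state, sorted for stable output."""
    return sorted(name for name, s in statuses.items() if s is ConnectorStatus.ERROR)
```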
Troubleshooting Connectors
Connector won’t connect:
- Check your internet connection
- Ensure you have access to the data source
- Try disconnecting and reconnecting
- Check that the OAuth credentials are valid
Connector shows error:
- Look at server logs for details
- Verify API quotas haven’t been exceeded
- Check if the service is experiencing downtime
- Ensure tokens haven’t expired
Advanced Features
Custom Search Operators
Use these operators for precise searches:
- Exact phrase: Use quotes
"exact phrase here" - Exclude terms: Use minus
-term - Either/or: Use
ORbetween terms - Required term: Use
+term
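To make the operator semantics concrete, here is a small parser that sorts a query into exact phrases, required terms, excluded terms, and optional terms. It is a hypothetical sketch of one common interpretation: every unprefixed term is treated as optional, so OR simply separates alternatives. CorpusIQ and each connector's native search may interpret these operators differently.

```python
import re
from dataclasses import dataclass, field

# A token is either an optionally prefixed "quoted phrase" or a run of non-space characters.
TOKEN = re.compile(r'([+-]?)"([^"]+)"|(\S+)')


@dataclass
class ParsedQuery:
    phrases: list[str] = field(default_factory=list)   # "exact phrase" (must match as-is)
    required: list[str] = field(default_factory=list)  # +term
    excluded: list[str] = field(default_factory=list)  # -term or -"phrase"
    optional: list[str] = field(default_factory=list)  # plain terms; OR just separates them


def parse_query(query: str) -> ParsedQuery:
    """Sort each token of `query` into one of the operator categories."""
    parsed = ParsedQuery()
    for prefix, phrase, word in TOKEN.findall(query):
        if phrase:
            (parsed.excluded if prefix == "-" else parsed.phrases).append(phrase)
        elif word == "OR":
            continue  # alternatives are already treated as optional terms
        elif word.startswith("-") and len(word) > 1:
            parsed.excluded.append(word[1:])
        elif word.startswith("+") and len(word) > 1:
            parsed.required.append(word[1:])
        else:
            parsed.optional.append(word)
    return parsed


print(parse_query('"annual report" -draft'))
```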
Examples:
"annual report" -draft
budget OR forecast +2024
"project phoenix" -cancelled
Filtering by Connector
Search specific sources only:
"Search only my Gmail for messages about the conference"
"Find documents in my OneDrive folder about the merger"
Date Range Searches
Specify time periods:
"Find emails from the last 7 days"
"Show me documents modified this month"
"Search for invoices from Q4 2023"
Saved Searches (Coming Soon)
Save frequently used searches for quick access:
"Save this search as 'Weekly Reports'"
"Run my saved search 'Client Communications'"
Batch Operations (Coming Soon)
Perform actions on multiple results:
"Export all search results to CSV"
"Download the top 5 matching documents"
Configuration
Environment Variables
Configure CorpusIQ by editing the .env file:
Core Settings
# CORS Configuration
CORPUSIQ_CORS_ALLOW_ORIGINS_CSV=https://chat.openai.com
# Set to true only during development
CORPUSIQ_DEBUG_MODE=false
# Logging level: DEBUG, INFO, WARNING, ERROR, CRITICAL
CORPUSIQ_LOG_LEVEL=INFO
Rate Limiting
# Maximum requests per minute per IP
CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE=60
# Increase for heavy usage:
# CORPUSIQ_RATE_LIMIT_REQUESTS_PER_MINUTE=120
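This guide does not describe how CorpusIQ enforces the per-IP limit internally. As an illustration of the rule itself, here is a sliding-window limiter that allows at most 60 requests per client in any 60-second span; unlike a fixed per-minute counter, it cannot be gamed by bursting just before and after a minute boundary. The class name is hypothetical, and the injectable clock exists only to make it easy to test.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within any `window`-second span."""

    def __init__(self, limit: int = 60, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)  # client IP -> request timestamps

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit; a server would typically answer HTTP 429
        hits.append(now)
        return True
```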
Performance Tuning
# Maximum request size in bytes (2097152 bytes = 2 MiB)
CORPUSIQ_MAX_REQUEST_SIZE=2097152
# Request timeout (seconds)
CORPUSIQ_REQUEST_TIMEOUT=30
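The example size limit works out to 2 × 1024 × 1024 = 2097152 bytes. A server applying such a limit would typically reject an oversized body with HTTP 413 (Content Too Large). The helper below is a hypothetical sketch of that check based on the Content-Length header, not CorpusIQ's code.

```python
from typing import Optional

MAX_REQUEST_SIZE = 2 * 1024 * 1024  # 2097152 bytes (2 MiB), the example value above


def size_check_status(content_length: Optional[str], limit: int = MAX_REQUEST_SIZE) -> int:
    """Map a Content-Length header to an HTTP status: 200 to proceed,
    400 if the header is malformed, 413 if the declared body exceeds `limit`.

    A missing header (e.g. a chunked upload) is let through here; a real
    server would also need to count bytes while reading the body.
    """
    if content_length is None:
        return 200
    try:
        size = int(content_length)
    except ValueError:
        return 400
    if size < 0:
        return 400
    return 413 if size > limit else 200
```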
OAuth Settings (Production Only)
CORPUSIQ_OAUTH_RESOURCE_URL=https://your-domain.com
CORPUSIQ_OAUTH_ISSUER=https://your-auth-provider.com
CORPUSIQ_OAUTH_AUTHORIZATION_ENDPOINT=https://your-auth-provider.com/authorize
CORPUSIQ_OAUTH_TOKEN_ENDPOINT=https://your-auth-provider.com/token
CORPUSIQ_OAUTH_JWKS_URI=https://your-auth-provider.com/.well-known/jwks.json
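Before switching on OAuth, a quick consistency check of these five values can catch typos: every URL should use HTTPS, and the authorization, token, and JWKS endpoints usually live on the issuer's host. The helper is hypothetical, and the "same host as the issuer" rule is a common convention rather than a requirement, so it only produces warnings.

```python
from urllib.parse import urlparse

OAUTH_KEYS = (
    "CORPUSIQ_OAUTH_RESOURCE_URL",
    "CORPUSIQ_OAUTH_ISSUER",
    "CORPUSIQ_OAUTH_AUTHORIZATION_ENDPOINT",
    "CORPUSIQ_OAUTH_TOKEN_ENDPOINT",
    "CORPUSIQ_OAUTH_JWKS_URI",
)
# Endpoints that usually live on the issuer's host (a convention, not a rule).
ISSUER_HOSTED_KEYS = OAUTH_KEYS[2:]


def oauth_warnings(settings: dict[str, str]) -> list[str]:
    """Return warnings for missing, non-HTTPS, or off-issuer OAuth URLs."""
    warnings = []
    issuer_host = urlparse(settings.get("CORPUSIQ_OAUTH_ISSUER", "")).hostname
    for key in OAUTH_KEYS:
        value = settings.get(key, "")
        if not value:
            warnings.append(f"{key} is not set")
            continue
        parsed = urlparse(value)
        if parsed.scheme != "https":
            warnings.append(f"{key} should use https://")
        if key in ISSUER_HOSTED_KEYS and issuer_host and parsed.hostname != issuer_host:
            warnings.append(f"{key} is not on the issuer host {issuer_host}")
    return warnings
```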
Applying Configuration Changes
After modifying .env:
- Stop the server (Ctrl+C in the terminal)
- Restart it: python -m corpusiq
- Refresh the connector in ChatGPT if needed
Configuration Best Practices
- Development: Enable CORPUSIQ_DEBUG_MODE for detailed logs
- Production: Disable CORPUSIQ_DEBUG_MODE for security
- High Traffic: Increase rate limits gradually
- Security: Never commit the .env file to version control
- Backups: Keep a copy of your working configuration
Security
Data Privacy
CorpusIQ takes your privacy seriously:
- Your Data Stays Yours: We don’t store or log search queries or results
- Encrypted Transit: All communication uses HTTPS/TLS
- Minimal Access: Only requests what’s needed from data sources
- Token Security: OAuth tokens are securely managed
- No Third-Party Sharing: Your data never leaves the CorpusIQ-ChatGPT connection
Access Control
Control who can use your CorpusIQ instance:
- OAuth Authentication: Require login for all requests (production)
- IP Whitelisting: Restrict access to specific IP addresses
- Rate Limiting: Prevent abuse with request limits
- CORS: Only allow ChatGPT to connect
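This guide lists no setting for IP allowlisting, so if you enforce it yourself (in a reverse proxy or custom middleware), the membership check is straightforward with Python's ipaddress module. The networks below are documentation-range placeholders. Note that behind a tunnel or proxy the connection's peer address is the tunnel, not the original client, so the client IP would have to come from a forwarding header set by a proxy you trust.

```python
import ipaddress

# Example allowlist (hypothetical): one office range and a single host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def ip_allowed(client_ip: str, networks=ALLOWED_NETWORKS) -> bool:
    """True if `client_ip` falls inside any allowed network; False if it is invalid."""
    try:
        address = ipaddress.ip_address(client_ip)
    except ValueError:
        return False
    # An IPv6 address is simply never "in" an IPv4 network (and vice versa).
    return any(address in network for network in networks)
```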
Best Security Practices
- Always use HTTPS in production (never HTTP)
- Keep dependencies updated regularly
- Monitor access logs for suspicious activity
- Use strong OAuth credentials with regular rotation
- Enable rate limiting to prevent abuse
- Set appropriate CORS origins (don’t use *)
- Review connector permissions regularly
- Implement token validation in production
- Use environment variables for secrets (never hardcode)
- Audit your deployment’s security regularly
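CORPUSIQ_CORS_ALLOW_ORIGINS_CSV is a comma-separated list of origins. To make "don’t use *" concrete, this sketch splits the value and flags wildcards, non-HTTPS origins, and entries with a path or trailing slash. It assumes the setting holds plain comma-separated origins compared by exact match; cors_origin_problems is a hypothetical helper.

```python
from urllib.parse import urlparse


def cors_origin_problems(csv_value: str) -> list[str]:
    """List problems in a comma-separated CORS origin setting (empty list means it looks fine)."""
    origins = [item.strip() for item in csv_value.split(",") if item.strip()]
    if not origins:
        return ["no origins configured"]
    problems = []
    for origin in origins:
        if "*" in origin:
            problems.append(f"wildcard {origin!r} lets any website call the server")
            continue
        parsed = urlparse(origin)
        if parsed.scheme != "https":
            problems.append(f"{origin!r} is not an HTTPS origin")
        if parsed.path or parsed.query or parsed.fragment:
            # Browsers send Origin as scheme://host[:port] with no path or trailing slash.
            problems.append(f"{origin!r} includes a path, so an exact-match check will reject it")
    return problems
```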
Reporting Security Issues
Found a security vulnerability? Please report it responsibly:
- Don’t create a public GitHub issue
- Do email the security contact listed in SECURITY.md
- Include detailed reproduction steps
- Allow time for a fix before public disclosure
See SECURITY.md for our complete security policy.
Getting Help
Documentation Resources
- Quick Start: QUICKSTART.md - Get running in 5 minutes
- Deployment Guide: DEPLOYMENT.md - Production deployment
- API Reference: API_REFERENCE.md - Technical details
- FAQ: FAQ.md - Common questions answered
- Troubleshooting: TROUBLESHOOTING.md - Fix common issues
Community Support
- GitHub Issues: Report bugs or request features
- Discussions: Ask questions and share ideas
- Documentation: This guide and related docs
Professional Support
For enterprise support options, contact the CorpusIQ team.
Next Steps
Now that you understand CorpusIQ, here are suggested next steps:
- Try Advanced Searches: Experiment with different query types
- Connect More Sources: Add additional data connectors
- Customize Configuration: Tune settings for your needs
- Read Best Practices: Learn optimal usage patterns
- Join the Community: Share your experience and learn from others
Appendix
Glossary
- MCP: Model Context Protocol - standard for AI-service communication
- Tool: A capability that ChatGPT can invoke (like search)
- Connector: Integration with a data source (Gmail, OneDrive, etc.)
- Widget: The interactive UI that displays results
- OAuth: Secure authentication standard for accessing data sources
- CORS: Cross-Origin Resource Sharing - security feature for web requests
- Rate Limiting: Restricting request frequency to prevent abuse
Keyboard Shortcuts
When using the CorpusIQ widget:
- Tab: Navigate between fields and buttons
- Enter: Submit search or activate button
- Escape: Close modal or cancel action
- Ctrl/Cmd + K: Focus search box (if implemented)
System Requirements
Minimum:
- Python 3.10
- 512 MB RAM
- 100 MB disk space
- Internet connection
Recommended:
- Python 3.11+
- 1 GB RAM
- 500 MB disk space
- Stable high-speed internet
Version History
See CHANGES.md for detailed version history and release notes.
Last Updated: January 2026
Version: 0.1.0
Maintained By: CorpusIQ Team
For the latest documentation, visit: https://github.com/CorpusIQ/corpusiq-openai-sdk