MongoDB Configuration and Setup - pacificnm/wiki-ai GitHub Wiki

MongoDB Configuration and Setup

This document provides comprehensive information about MongoDB configuration, connection, and usage in the wiki-ai project.

Overview

The wiki-ai project uses MongoDB Atlas (cloud-hosted MongoDB) with Mongoose as the ODM (Object Document Mapper). The database configuration includes connection management, retry logic, health checking, and automatic initialization.

Environment Configuration

Required Environment Variables

# MongoDB Atlas Configuration
MONGO_URI=mongodb+srv://<username>:<password>@<cluster-url>/?retryWrites=true&w=majority&appName=<app-name>
MONGO_DB_NAME=wiki-ai

# Optional Configuration
LOG_LEVEL=info
NODE_ENV=development

MongoDB Atlas Connection String Format

mongodb+srv://<username>:<password>@<cluster-url>/?retryWrites=true&w=majority&appName=<app-name>

Components:

  • <username>: MongoDB Atlas database user
  • <password>: Database user password
  • <cluster-url>: Your cluster's connection URL
  • retryWrites=true: Automatically retry write operations
  • w=majority: Write concern for replica sets
  • appName: Application identifier for monitoring
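Because the URI embeds credentials, it should never be logged verbatim. A minimal sketch of a masking helper, using Node's built-in WHATWG `URL` parser (the `maskMongoUri` name is illustrative, not part of the project):

```javascript
// Mask the password portion of a MongoDB connection string before logging.
// NOTE: maskMongoUri is a hypothetical helper, not part of wiki-ai itself.
function maskMongoUri(uri) {
  const parsed = new URL(uri); // WHATWG URL accepts mongodb+srv:// schemes
  if (parsed.password) {
    parsed.password = '****';
  }
  return parsed.toString();
}

const uri = 'mongodb+srv://appUser:s3cret@cluster0.example.net/?retryWrites=true&w=majority';
console.log(maskMongoUri(uri));
```

The masked form keeps the username, host, and query parameters intact, so it is still useful for debugging connection issues.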

Database Configuration

Connection Options

The database is configured with optimized settings for MongoDB Atlas:

const connectionOptions = {
  maxPoolSize: 10,              // Maximum 10 concurrent connections
  serverSelectionTimeoutMS: 10000,  // 10 seconds to select server
  socketTimeoutMS: 45000,       // 45 seconds socket timeout
  connectTimeoutMS: 10000,      // 10 seconds connection timeout
  maxIdleTimeMS: 30000,        // 30 seconds idle timeout
  retryWrites: true,           // Retry failed writes
  w: 'majority'                // Write concern
};

Connection Features

  1. Automatic Retry Logic: Exponential backoff with up to 5 retry attempts
  2. Connection Pooling: Efficient connection reuse
  3. Health Monitoring: Continuous connection health checks
  4. Graceful Shutdown: Proper cleanup on application termination
  5. Error Handling: Comprehensive error logging and recovery
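The retry behaviour in point 1 can be sketched as a generic helper; this is an illustration of the technique, not the project's actual implementation (which lives in `config/database.js`). Exponential backoff doubles the wait after each failed attempt:

```javascript
// Generic exponential-backoff retry (sketch; not wiki-ai's actual code).
// connectFn: async function that attempts the connection and throws on failure.
async function connectWithRetry(connectFn, maxAttempts = 5, baseDelayMs = 1000) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connectFn();
    } catch (err) {
      if (attempt === maxAttempts) throw err; // out of retries, surface the error
      const delay = baseDelayMs * 2 ** (attempt - 1); // 1s, 2s, 4s, 8s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

With a base delay of 1 second, the waits between the five attempts are 1 s, 2 s, 4 s, and 8 s.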

Database Structure

Collections

The wiki-ai database uses the following collections:

| Collection | Purpose | Key Features |
|------------|---------|--------------|
| users | User accounts and profiles | Firebase UID integration, role-based access |
| documents | Wiki documents and metadata | Versioning, categorization, tagging |
| versions | Document version history | Markdown content, change tracking |
| categories | Document organization | Hierarchical structure, slug-based URLs |
| comments | Document comments | Version-specific commenting |
| attachments | File attachments | Document and version linking |
| logs | Application logging | Structured error and activity logs |
| aisuggestions | AI-generated content | Tags, summaries, and suggestions |
| accesscontrol | Document sharing | User permissions and access levels |
| favorites | User bookmarks | Quick access to preferred documents |

Indexes

Automatically created indexes for optimal performance:

Users Collection:

  • { email: 1 } (unique)
  • { uid: 1 } (unique)
  • { createdAt: 1 }

Documents Collection:

  • { userId: 1 }
  • { title: 'text', content: 'text' } (full-text search)
  • { createdAt: -1 }

Sessions Collection:

  • { sessionId: 1 } (unique)
  • { userId: 1 }
  • { expiresAt: 1 } (TTL index)

Connection Management

Database Connection

import { connectToDatabase } from './config/database.js';

// Connect with default settings
await connectToDatabase();

// Connect with custom URI
await connectToDatabase('mongodb+srv://custom-uri');

Connection State Monitoring

import { dbState } from './config/database.js';

console.log(dbState.isConnected);  // Boolean
console.log(dbState.error);        // Last error (if any)
console.log(dbState.retryAttempts); // Number of retry attempts

Health Checking

import { checkDatabaseHealth } from './config/database.js';

const health = await checkDatabaseHealth();
console.log(health.status); // 'connected' | 'disconnected' | 'error'
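Mongoose exposes the connection state as `mongoose.connection.readyState`, a numeric code (0 = disconnected, 1 = connected, 2 = connecting, 3 = disconnecting). A health check typically maps that code onto the three statuses above; this sketch shows the mapping only (the project's `checkDatabaseHealth` may also ping the server):

```javascript
// Map Mongoose's numeric readyState onto the health statuses used above.
// Sketch only; transitional states (connecting/disconnecting) are reported
// as 'error' here, which is a design choice, not the project's actual rule.
function readyStateToStatus(readyState) {
  if (readyState === 1) return 'connected';
  if (readyState === 0) return 'disconnected';
  return 'error';
}
```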

Database Operations

Basic Operations

Create a Document

import Document from './models/Document.js';

const document = new Document({
  userId: user._id,
  title: 'My Wiki Page',
  description: 'A sample wiki page',
  tags: ['sample', 'wiki'],
  markdown: '# Hello World\n\nThis is content.'
});

await document.save();

Find Documents

// Find all documents for a user
const userDocs = await Document.find({ userId: user._id });

// Find with pagination
const docs = await Document
  .find()
  .skip(page * limit)
  .limit(limit)
  .sort({ createdAt: -1 });

// Text search
const searchResults = await Document.find({
  $text: { $search: 'search terms' }
});

Update Documents

// Update single document
await Document.findByIdAndUpdate(docId, {
  title: 'Updated Title',
  updatedAt: new Date()
});

// Update multiple documents
await Document.updateMany(
  { userId: user._id },
  { $set: { updatedAt: new Date() } }
);

Delete Documents

// Delete single document
await Document.findByIdAndDelete(docId);

// Delete multiple documents
await Document.deleteMany({ userId: user._id });

Advanced Operations

Aggregation Pipeline

const stats = await Document.aggregate([
  {
    $match: { userId: user._id }
  },
  {
    $group: {
      _id: null,
      totalDocs: { $sum: 1 },
      avgTags: { $avg: { $size: '$tags' } }
    }
  }
]);

Population (Joins)

const document = await Document
  .findById(docId)
  .populate('userId', 'displayName email')
  .populate('categoryIds', 'name slug')
  .populate({
    path: 'versionHistory',
    select: 'createdAt createdBy reason',
    options: { sort: { createdAt: -1 }, limit: 10 }
  });

Transactions

import mongoose from 'mongoose';

const session = await mongoose.startSession();

try {
  await session.withTransaction(async () => {
    // Create document
    const document = new Document({ /* data */ });
    await document.save({ session });
    
    // Create initial version
    const version = new Version({ 
      documentId: document._id,
      /* other data */
    });
    await version.save({ session });
    
    // Update document with version reference
    document.currentVersionId = version._id;
    await document.save({ session });
  });
} finally {
  await session.endSession();
}

Error Handling

Connection Errors

import { connectToDatabase } from './config/database.js';
import { logger } from './middleware/logger.js';

try {
  await connectToDatabase();
} catch (error) {
  logger.error('Database connection failed', { 
    error: error.message,
    code: error.code 
  });
  process.exit(1);
}

Operation Errors

import mongoose from 'mongoose';
import { logger } from './middleware/logger.js';

try {
  const document = await Document.findById(docId);
  if (!document) {
    throw new NotFoundError('Document not found'); // application-defined error class
  }
} catch (error) {
  if (error instanceof mongoose.Error.ValidationError) {
    // Handle validation errors
    logger.error('Validation error', { errors: error.errors });
  } else if (error instanceof mongoose.Error.CastError) {
    // Handle invalid ObjectId
    logger.error('Invalid document ID', { id: docId });
  } else {
    // Handle other errors
    logger.error('Database operation failed', { error: error.message });
  }
  throw error;
}

Performance Optimization

Query Optimization

  1. Use Indexes: Ensure queries use appropriate indexes
  2. Limit Fields: Use .select() to fetch only needed fields
  3. Pagination: Implement proper pagination for large datasets
  4. Aggregation: Use aggregation pipeline for complex queries

// Optimized query example
const documents = await Document
  .find({ userId: user._id, published: true })
  .select('title description createdAt tags')
  .sort({ createdAt: -1 })
  .limit(20)
  .lean(); // Returns plain objects instead of Mongoose documents

Connection Optimization

  1. Connection Pooling: Configured with maxPoolSize: 10
  2. Connection Reuse: Single connection instance across application
  3. Idle Timeout: Automatic cleanup of idle connections
  4. Write Concerns: Balanced consistency and performance

Monitoring and Statistics

Database Statistics

import { getDatabaseStats } from './config/database.js';

const stats = await getDatabaseStats();
console.log({
  collections: stats.collections,
  dataSize: stats.dataSize,
  storageSize: stats.storageSize,
  objects: stats.objects
});

Connection Monitoring

The database configuration includes automatic monitoring:

  • Connection state tracking
  • Error logging
  • Retry attempt counting
  • Performance metrics
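These hooks are typically wired to Mongoose connection events (`connected`, `error`, `disconnected`). A minimal, framework-free sketch of the state object behind `dbState` (names and structure are illustrative; the real object lives in `config/database.js`):

```javascript
// Minimal connection-state tracker (illustrative sketch, not wiki-ai's actual code).
class ConnectionState {
  constructor() {
    this.isConnected = false;
    this.error = null;
    this.retryAttempts = 0;
  }
  onConnected() { this.isConnected = true; this.error = null; this.retryAttempts = 0; }
  onError(err) { this.error = err; }
  onDisconnected() { this.isConnected = false; }
  onRetry() { this.retryAttempts += 1; }
}
```

In the real module these methods would be registered as event listeners, e.g. `mongoose.connection.on('connected', () => state.onConnected())`.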

Backup and Recovery

Automated Backup

import { backupDatabase } from './config/database.js';

// Create backup
const backupPath = await backupDatabase('./backups/');
console.log(`Backup created: ${backupPath}`);

MongoDB Atlas Backup

MongoDB Atlas provides automatic backup features:

  1. Continuous Backup: Point-in-time recovery
  2. Snapshot Backup: Scheduled snapshots
  3. Cross-Region Backup: Geographic redundancy

Restore Procedures

  1. Atlas Dashboard: Use MongoDB Atlas interface for restores
  2. mongorestore: Command-line tool for local restores
  3. Application-level: Custom restore logic for specific data

Security Considerations

Connection Security

  1. TLS/SSL: All connections encrypted by default
  2. Authentication: Database user credentials required
  3. Network Access: IP whitelisting in Atlas
  4. Connection String: Secure credential storage

Data Security

  1. Field Validation: Mongoose schema validation
  2. Input Sanitization: Prevent injection attacks
  3. Access Control: User-based permissions
  4. Audit Logging: Track data access and changes
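Point 2 (input sanitization) usually means stripping user-supplied keys that start with `$` or contain `.`, which could otherwise inject MongoDB query operators. A minimal recursive sketch (the helper name is illustrative; packages such as `mongo-sanitize` provide hardened versions of the same idea):

```javascript
// Strip keys beginning with '$' or containing '.' to block operator injection.
// Illustrative sketch; use a vetted library in production.
function sanitizeQueryInput(value) {
  if (Array.isArray(value)) return value.map(sanitizeQueryInput);
  if (value !== null && typeof value === 'object') {
    const clean = {};
    for (const [key, val] of Object.entries(value)) {
      if (key.startsWith('$') || key.includes('.')) continue; // drop dangerous keys
      clean[key] = sanitizeQueryInput(val);
    }
    return clean;
  }
  return value;
}
```

Without this, a request body like `{ "$where": "..." }` passed straight into a query filter would be interpreted as an operator rather than data.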

Troubleshooting

Common Issues

Connection Timeouts

Error: Server selection timed out after 30000 ms

Solutions:

  • Check network connectivity
  • Verify Atlas IP whitelist
  • Increase serverSelectionTimeoutMS

Authentication Failures

Error: Authentication failed

Solutions:

  • Verify username/password
  • Check database user permissions
  • Ensure correct database name

Memory Issues

Error: Allocation failed - JavaScript heap out of memory

Solutions:

  • Implement pagination
  • Use .lean() for read-only operations
  • Limit result set sizes

Debug Configuration

# Enable debug logging
DEBUG=mongoose:*
LOG_LEVEL=debug

Monitoring Tools

  1. MongoDB Atlas Monitoring: Built-in performance monitoring
  2. Application Logs: Winston-based structured logging
  3. Health Checks: Automated connection health verification

Best Practices

Schema Design

  1. Normalize vs Denormalize: Balance based on read/write patterns
  2. Embed vs Reference: Consider document size and update frequency
  3. Index Strategy: Create indexes based on query patterns
  4. Data Types: Use appropriate MongoDB data types

Query Patterns

  1. Avoid Large Documents: Keep documents under 16MB
  2. Batch Operations: Use bulk operations for multiple changes
  3. Connection Management: Reuse connections across requests
  4. Error Handling: Implement comprehensive error handling
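Point 2 above maps to MongoDB's `bulkWrite`, which sends many writes in a single round trip. Building the operations array is plain JavaScript; executing it (`await Document.bulkWrite(ops)`) requires a live connection, so only the construction is shown here (the `buildTagUpdateOps` helper is illustrative):

```javascript
// Build a bulkWrite operations array: one updateOne per changed document.
// Executed with: await Document.bulkWrite(ops);  (requires a live connection)
// buildTagUpdateOps is an illustrative helper, not part of wiki-ai.
function buildTagUpdateOps(updates) {
  return updates.map(({ docId, tags }) => ({
    updateOne: {
      filter: { _id: docId },
      update: { $set: { tags, updatedAt: new Date() } }
    }
  }));
}

const ops = buildTagUpdateOps([
  { docId: 'doc1', tags: ['wiki'] },
  { docId: 'doc2', tags: ['ai', 'notes'] }
]);
```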

Development Workflow

  1. Schema Migration: Use migration scripts for schema changes
  2. Seed Data: Automated data seeding for development
  3. Testing: Use separate test databases
  4. Documentation: Keep schema documentation updated

API Endpoints

Health Check

GET /api/health

Returns database connection status and basic statistics.

Database Statistics (Admin)

GET /api/admin/database/stats

Returns detailed database statistics for administrators.

Migration Guide

From Local MongoDB to Atlas

  1. Export Data: Use mongodump to export local data
  2. Create Atlas Cluster: Set up MongoDB Atlas cluster
  3. Import Data: Use mongorestore or Atlas migration tools
  4. Update Configuration: Change connection string
  5. Test Connection: Verify application connectivity

Schema Changes

  1. Create Migration Script: Write migration logic
  2. Backup Database: Always backup before migrations
  3. Test Migration: Run on development environment first
  4. Apply Migration: Execute on production during maintenance window
  5. Verify Results: Confirm migration success
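Steps 1-5 are often organised around a small ordered-migration runner. The sketch below is generic (the version-tracking scheme and names are illustrative, not the project's actual tooling): it applies, in order, every migration newer than the current schema version.

```javascript
// Apply all migrations newer than currentVersion, in ascending order.
// Illustrative sketch; real runners persist currentVersion in the database.
async function runMigrations(currentVersion, migrations) {
  const pending = migrations
    .filter((m) => m.version > currentVersion)
    .sort((a, b) => a.version - b.version);
  for (const migration of pending) {
    await migration.up(); // each step should follow a fresh backup (step 2)
    currentVersion = migration.version;
  }
  return currentVersion;
}
```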

This documentation provides comprehensive coverage of MongoDB usage in the wiki-ai project, from basic setup to advanced operations and troubleshooting.
