Conversational Log Export Formats: Slack and ChatGPT Standards
This analysis examines the technical specifications and structural characteristics of conversational log formats exported from Slack and ChatGPT. Both platforms utilize JSON-based export formats with distinct organizational principles that reflect their respective conversational paradigms and technical architectures.
Slack Export Format Specification
Export Architecture and Types
Slack provides multiple export options depending on subscription tier and administrative needs. The primary export mechanism delivers data as compressed ZIP archives containing hierarchical JSON structures[1]. However, Slack also offers limited CSV export functionality for specific administrative tasks.
For CSV exports, workspace owners and administrators can generate a "Channel audit report" containing channel listings and metadata, plus manual exports of workspace analytics including member and channel overview data. These CSV exports serve administrative and migration planning purposes but do not include actual conversation content.
The comprehensive JSON exports contain complete workspace data with availability varying by subscription level. Free plan users access only the last 90 days of data, while Business+ and Enterprise users can export complete historical records[4]. Export completion notifications are delivered via email with downloadable ZIP file links.
JSON Export Structure
The root directory contains critical catalog files that index workspace organization[1]. The channels.json file provides a comprehensive channel listing, with entries structured like this:
[
{
"id": "C1234567890",
"name": "general",
"created": 1234567890,
"creator": "U0987654321",
"is_archived": false,
"is_channel": true,
"members": ["U0987654321", "U1122334455", "..."],
"topic": {
"value": "Company announcements and general discussion",
"creator": "U0987654321",
"last_set": 1234567890
},
"purpose": {
"value": "This channel is for workspace-wide communication...",
"creator": "U0987654321",
"last_set": 1234567890
}
},
// ... additional channels
]
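As a minimal sketch of how this catalog might be consumed (assuming the export ZIP has been extracted to a local slack_export/ directory, a name chosen here purely for illustration), the file can be loaded and indexed by channel ID for later cross-referencing:

```python
import json
from pathlib import Path

# Hypothetical extraction location; adjust to wherever the export ZIP was unpacked.
EXPORT_DIR = Path("slack_export")

with open(EXPORT_DIR / "channels.json", encoding="utf-8") as f:
    channels = json.load(f)

# Index channels by their Slack ID so message files can later be related
# back to channel metadata (name, topic, archive status).
channels_by_id = {c["id"]: c for c in channels}

for channel in channels:
    status = "archived" if channel.get("is_archived") else "active"
    print(f'{channel["name"]} ({status}), {len(channel.get("members", []))} members')
```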
The users.json file contains user profiles with this structure:
[
{
"id": "U0987654321",
"name": "john.doe",
"real_name": "John Doe",
"profile": {
"email": "[email protected]", // Requires admin config
"display_name": "John",
"status_text": "Working remotely",
"image_72": "https://secure.gravatar.com/..."
},
"is_admin": false,
"is_owner": false,
"is_bot": false,
"updated": 1234567890
},
// ... additional users
]
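A similar lookup can be built for users so that message author IDs can be resolved to readable names later. The display_name helper and its fallback order below are illustrative choices, not part of the export specification:

```python
import json
from pathlib import Path

EXPORT_DIR = Path("slack_export")  # hypothetical extraction directory

with open(EXPORT_DIR / "users.json", encoding="utf-8") as f:
    users = json.load(f)

def display_name(user: dict) -> str:
    """Prefer the profile display name, then real_name, then the handle."""
    profile = user.get("profile", {})
    return profile.get("display_name") or user.get("real_name") or user.get("name", "unknown")

# Map user IDs (e.g. "U0987654321") to human-readable names.
users_by_id = {u["id"]: display_name(u) for u in users}
```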
Individual conversations are organized into separate folders within the archive, with each channel receiving its own directory. Within conversation folders, message history is segmented into daily JSON files using standardized naming like 2024-06-09.json[1]. This temporal organization facilitates efficient processing and targeted analysis of specific time periods.
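Because the daily file names follow the YYYY-MM-DD.json pattern, a simple lexicographic sort yields chronological order. The sketch below (using a hypothetical general channel directory) concatenates a channel's full history:

```python
import json
from pathlib import Path

# Hypothetical paths; each channel directory is named after the channel
# and holds one YYYY-MM-DD.json file per day of activity.
channel_dir = Path("slack_export") / "general"

all_messages = []
for day_file in sorted(channel_dir.glob("*.json")):  # lexicographic sort == chronological
    with open(day_file, encoding="utf-8") as f:
        all_messages.extend(json.load(f))

print(f"Loaded {len(all_messages)} messages from {channel_dir.name}")
```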
Message Object Schema
Each message follows a standardized structure aligned with Slack's API format[1]. A typical message structure appears as:
[
{
"type": "message",
"user": "U0987654321", // References users.json entry
"text": "Hello everyone! How's the project going?",
"ts": "1234567890.123456", // Timestamp + unique identifier
"edited": {
"user": "U0987654321",
"ts": "1234567891.000000"
},
"reactions": [
{
"name": "thumbsup",
"users": ["U1122334455", "U2233445566"],
"count": 2
}
]
},
{
"type": "message",
"user": "U1122334455",
"text": "Here's the latest update <@U0987654321>",
"ts": "1234567892.234567",
"files": [
{
"id": "F1234567890",
"name": "project_update.pdf",
"title": "Q2 Project Status",
"mimetype": "application/pdf",
"url_private": "https://files.slack.com/files-pri/...",
"size": 245760
}
]
},
// ... additional messages
]
Complex messages may include a blocks array containing Block Kit objects for rich formatting and interactive elements[7]. Messages with attachments contain a files array with metadata and links, though actual file content is not included in exports and requires separate downloading through provided URLs[1].
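Putting the pieces together, a message can be rendered by resolving its user ID against users.json and formatting the ts value, a string whose integer part is a Unix timestamp in seconds. The sketch below reuses users_by_id and all_messages from the earlier examples and only handles plain-text messages:

```python
from datetime import datetime, timezone

def render_message(msg: dict, users_by_id: dict) -> str:
    """Format one exported Slack message as a readable line (illustrative only)."""
    author = users_by_id.get(msg.get("user", ""), msg.get("user", "unknown"))
    # "ts" looks like "1234567890.123456"; the integer part is a Unix epoch in seconds.
    ts = datetime.fromtimestamp(float(msg["ts"]), tz=timezone.utc)
    line = f'[{ts:%Y-%m-%d %H:%M:%S}] {author}: {msg.get("text", "")}'
    for reaction in msg.get("reactions", []):
        line += f'  (:{reaction["name"]}: x{reaction["count"]})'
    return line

for msg in all_messages:
    if msg.get("type") == "message":
        print(render_message(msg, users_by_id))
```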
File Handling Limitations
A critical limitation is that actual file content is excluded from exports[1]. Instead, the system provides file links directing back to original workspace files, requiring continued workspace access for complete data retrieval. This design reduces export sizes but creates dependencies on ongoing workspace availability for comprehensive data preservation.
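For archival tooling, one practical approach is to collect the url_private links up front so the referenced files can be fetched separately; note that these URLs are not publicly accessible, so any download step would need to authenticate with credentials that still have access to the workspace. A minimal sketch, reusing all_messages from above:

```python
# Gather metadata for every file referenced in the exported messages.
# Downloading the content itself is a separate, authenticated step.
pending_downloads = [
    {"id": f["id"], "name": f["name"], "url": f["url_private"]}
    for msg in all_messages
    for f in msg.get("files", [])
    if "url_private" in f
]
print(f"{len(pending_downloads)} file(s) must be downloaded separately")
```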
ChatGPT Export Format Specification
Export Structure
ChatGPT exports use a fundamentally different approach, delivering conversation history as a single conversations.json file within a ZIP archive[6]. This monolithic structure reflects ChatGPT's conversation-centric design where each interaction represents a complete thread between user and AI assistant.
The top-level structure contains an array of conversation objects:
[
{
"title": "Python Data Analysis Help",
"create_time": 1234567890.123,
"update_time": 1234567950.456,
"mapping": {
"root-node-id": {
"id": "root-node-id",
"parent": null,
"children": ["first-message-id"]
},
"first-message-id": {
"id": "first-message-id",
"parent": "root-node-id",
"children": ["response-id-1"],
"message": {
"id": "msg-001",
"author": {"role": "user"},
"create_time": 1234567890.123,
"content": {
"content_type": "text",
"parts": ["How do I analyze CSV data in Python?"]
}
}
},
"response-id-1": {
"id": "response-id-1",
"parent": "first-message-id",
"children": ["follow-up-id"],
"message": {
"id": "msg-002",
"author": {"role": "assistant"},
"create_time": 1234567895.678,
"content": {
"content_type": "text",
"parts": ["You can use pandas library for CSV analysis. Here's how to get started:\n\n```python\nimport pandas as pd\ndf = pd.read_csv('your_file.csv')\nprint(df.head())\n```\n\nThis will load your CSV and show the first 5 rows..."]
}
}
}
// ... additional nodes in conversation tree
},
"current_node": "response-id-1" // Points to active leaf node
},
// ... additional conversations
]
Conversation Tree Architecture
ChatGPT conversations are structured as trees that accommodate branching dialogues and alternative response paths[6]. Each conversation object contains a mapping field serving as a lookup table for all message nodes, indexed by unique identifiers.
The current_node field identifies the most recent active message in the thread[6]. To reconstruct complete conversations chronologically, applications must traverse backwards from the current node using parent relationships between nodes until reaching the root node with a null parent value. This tree structure enables the platform's unique capability for exploring alternative conversation branches.
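A minimal sketch of this traversal is shown below: it walks parent links from current_node up to the root, collects message nodes along the way, and reverses the result to obtain chronological order. The isinstance check is a defensive assumption, since parts may contain non-string entries for non-text content types:

```python
import json

with open("conversations.json", encoding="utf-8") as f:
    conversations = json.load(f)

def linear_thread(conversation: dict) -> list[dict]:
    """Walk parent links from current_node back to the root, then reverse,
    yielding the active branch in chronological order."""
    mapping = conversation["mapping"]
    node_id = conversation.get("current_node")
    thread = []
    while node_id is not None:
        node = mapping[node_id]
        message = node.get("message")  # root nodes may carry no message
        if message and message.get("content", {}).get("parts"):
            thread.append(message)
        node_id = node.get("parent")
    return list(reversed(thread))

for convo in conversations:
    print(convo["title"])
    for message in linear_thread(convo):
        role = message["author"]["role"]
        text = "".join(p for p in message["content"]["parts"] if isinstance(p, str))
        print(f"  {role}: {text[:80]}")
```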
Message Node Structure and Content Schema
Individual message nodes contain essential navigation and content information[6]. Within message objects, the author field specifies the source through a role value like "user" or "assistant"[6]. The content object contains a content_type field (commonly "text") and a parts array with actual message segments, typically as string elements[6].
Here's how a branching conversation might look in the mapping structure:
{
"mapping": {
"root": {"id": "root", "parent": null, "children": ["user-q1"]},
"user-q1": {
"id": "user-q1",
"parent": "root",
"children": ["assistant-r1a", "assistant-r1b"], // Two alternative responses
"message": {
"author": {"role": "user"},
"content": {
"content_type": "text",
"parts": ["Explain machine learning in simple terms"]
}
}
},
"assistant-r1a": {
"id": "assistant-r1a",
"parent": "user-q1",
"children": ["user-follow-up-a"],
"message": {
"author": {"role": "assistant"},
"content": {
"content_type": "text",
"parts": ["Machine learning is like teaching computers to recognize patterns..."]
}
}
},
"assistant-r1b": { // Alternative response branch
"id": "assistant-r1b",
"parent": "user-q1",
"children": [],
"message": {
"author": {"role": "assistant"},
"content": {
"content_type": "text",
"parts": ["Think of machine learning as a way to train computers using examples..."]
}
}
}
// ... additional nodes
},
"current_node": "assistant-r1a" // Active conversation path
}
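When the alternative branches themselves are of interest, a depth-first walk over the children links enumerates every root-to-leaf path. The helper below is an illustrative sketch rather than part of the export format:

```python
def all_branches(conversation: dict) -> list[list[str]]:
    """Depth-first walk of the mapping tree, returning every root-to-leaf
    path as a list of node IDs (one entry per alternative branch)."""
    mapping = conversation["mapping"]
    roots = [nid for nid, node in mapping.items() if node.get("parent") is None]
    branches = []

    def dfs(node_id: str, path: list[str]) -> None:
        path = path + [node_id]
        children = mapping[node_id].get("children", [])
        if not children:
            branches.append(path)
        for child in children:
            dfs(child, path)

    for root in roots:
        dfs(root, [])
    return branches
```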
Temporal and Metadata Fields
Conversation objects include create_time (Unix timestamp of thread creation) and update_time (most recent modification)[6]. The title field contains user-defined or system-generated conversation identifiers that facilitate navigation and organization[6].
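Both fields are floating-point Unix epochs and convert directly with the standard library; a small sketch, reusing the conversations list loaded earlier:

```python
from datetime import datetime, timezone

convo = conversations[0]
created = datetime.fromtimestamp(convo["create_time"], tz=timezone.utc)
updated = datetime.fromtimestamp(convo["update_time"], tz=timezone.utc)
print(f'{convo["title"]}: created {created:%Y-%m-%d %H:%M}, last updated {updated:%Y-%m-%d %H:%M}')
```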
Comparative Analysis
Structural Philosophy Differences
The fundamental difference reflects distinct conversational paradigms. Slack's distributed architecture optimizes for workspace-scale communication with multiple parallel conversations, requiring efficient organization and selective access[1]. The hierarchical structure prioritizes workspace organization and administrative control.
ChatGPT's monolithic approach reflects focus on individual user interactions with conversation continuity taking precedence[6]. The tree-based structure accommodates unique branching capabilities not present in traditional group communication platforms.
Data Completeness and Limitations
Both platforms exhibit specific export limitations. Slack excludes actual file content, providing only links requiring continued workspace access[1]. User email information requires specific administrative configuration for inclusion in exports[2].
ChatGPT provides complete conversational content but lacks detailed metadata like individual message timing or system performance metrics[6]. The focus on conversation content over technical metadata reflects primary use cases but may limit analytical applications.
Implementation Considerations
Organizations working with these export formats must consider structural differences, inherent limitations, and processing requirements when developing analysis, migration, or archival tools. Slack's distributed JSON format requires comprehensive parsing of multiple files and directories, while ChatGPT's tree structure demands recursive traversal algorithms for proper conversation reconstruction.
Understanding these architectural distinctions is crucial for selecting appropriate processing approaches and managing expectations regarding data completeness and accessibility in exported formats. The provided format examples demonstrate the practical structure developers will encounter when implementing parsers for these export formats.
[1] https://slack.com/help/articles/220556107-How-to-read-Slack-data-exports
[2] https://stackoverflow.com/questions/45037695/how-to-export-slack-users-email-as-json
[4] https://www.mimecast.com/blog/export-slack-conversations/
[6] https://community.openai.com/t/decoding-exported-data-by-parsing-conversations-json-and-or-chat-html/403144
[7] https://api.slack.com/reference/block-kit/composition-objects