Integrating MyFitnessPal Data into the Digital Persona via MCP - Hackshaven/digital-persona GitHub Wiki

Integrating MyFitnessPal Data into the Digital Persona via MCP

Introduction and Goals

Integrating MyFitnessPal (MFP) data into the Digital Persona system requires a privacy-first, modular pipeline that imports personal health metrics (nutrition, weight, water intake, exercise) into the AI’s long-term memory. We will use the python-myfitnesspal library to fetch the user’s daily MFP logs and normalize them into a semantic JSON format (e.g. JSON-LD/ActivityStreams 2.0) compatible with the Model Context Protocol (MCP). The data will be stored locally under the user’s control and served through a local MCP server endpoint for on-demand queries by the Digital Persona AI. This plan ensures daily synchronization of data, robust error handling for authentication/scraping issues, and adherence to the project’s ethical guidelines (local storage, encryption in transit, user consent). Key objectives include:

  • Daily Data Ingestion: Automatically retrieve nutrition diaries (calories, macros), weight logs, water intake, and exercise entries from MyFitnessPal each day.

  • Semantic Normalization: Convert raw MFP data into a structured JSON memory record (timestamped, with meaningful field names and optional tags) that aligns with Digital Persona’s memory schema (e.g. JSON-LD with ActivityStreams vocabulary).

  • MCP Endpoint Exposure: Host a lightweight local HTTP server implementing MCP endpoints (GET routes) to serve the processed data. The Digital Persona (as an MCP client) can query these endpoints to retrieve context when needed.

  • Privacy & Security: Favor local file storage (user-owned “memory vault”) and encrypted communication for any data transfer. Ensure no personal data is sent to third-party cloud services, in line with the project’s privacy-by-design, user-control ethos. Use HTTPS for the MCP server (even if on localhost) and consider encrypting the data at rest (disk encryption or an encrypted vault).

  • Reliability & Logging: Incorporate retry and error-handling strategies for MFP’s cookie-based auth (which relies on browser cookies) and scraping (e.g. site changes or downtime). Maintain logs of ingestion runs and provide monitors or alerts if data sync fails or if data appears inconsistent.

  • Extensibility: Design the connector to accommodate future enhancements, such as automatic summaries of weekly intake, alignment with user fitness goals, or trend detection (e.g. weight change alerts). These can be delivered via the AI’s memory or additional MCP endpoints.

The following sections detail the proposed architecture, data schema mapping, MCP interface design, sample implementation code, and considerations for security and future extensions.

Architecture Overview

Figure: MyFitnessPal to Digital Persona Data Flow (conceptual)

  • MyFitnessPal Data Source: The user logs meals, water, weight, and exercise in MyFitnessPal. We access this data using the python-myfitnesspal Python library, which scrapes MFP on the user’s behalf. Authentication: As of v2.x, the library uses your web browser’s saved MFP session cookies to authenticate (direct login is no longer possible due to MFP’s added CAPTCHA in 2022). The user must be logged into MyFitnessPal in a local browser; the connector will read the cookies (via browser_cookie3) to gain authorized access to the diary data.

  • Connector Script (Local Ingestion Pipeline): A scheduled Python script (or background process) runs daily (e.g. midnight or early morning) to fetch the previous day’s data. It uses myfitnesspal.Client() to retrieve:

    • Nutrition diary: via client.get_date(year, month, day), yielding a Day object with totals for calories and nutrients, water intake, and notes.

    • Exercise entries: from Day.exercises (cardio and strength exercises with details like minutes and calories burned).

    • Weight measurements: via client.get_measurements('Weight', date) for that day’s weigh-in (if any). The connector then normalizes this data into a structured JSON object (see next section) and stores it in the user’s memory vault (e.g. as a file in a persona_memory/health/ directory, or in a local database). All data is timestamped (with the date) and tagged by type. The connector logs the operation (timestamp and success/failure) to an ingestion log. On failure (e.g. network error or auth issue), it can retry after a short delay or mark the day for re-fetch on next run.

  • MCP Server (Local Endpoint): The system runs a lightweight web server (e.g. a FastAPI or Flask app) locally, exposing RESTful endpoints following the Model Context Protocol conventions. This MCP server reads from the local data store and serves JSON responses to authorized clients (in this case, the Digital Persona AI). For security, the server binds to localhost (or a secure local network interface) and uses HTTPS with a local certificate (or is tunneled through an encrypted channel). Authentication could be as simple as an API key or token checked on each request, to ensure only the persona system accesses it.

  • Digital Persona Memory Integration: The Digital Persona AI (which can be an LLM-based agent) acts as an MCP client. Whenever it needs up-to-date personal context – for example, “How many calories did I consume yesterday?” or retrieving user’s recent health data – it will issue an HTTP GET to the MCP server’s endpoint (through a secure channel). The MCP server responds with the JSON-LD structured data, which the AI can then incorporate into its reasoning or conversation. This on-demand retrieval avoids stuffing all data into the prompt and aligns with the project’s memory architecture of pulling facts as needed. The memory records can also be proactively loaded into the AI’s long-term memory store or vector index if needed for semantic search queries (e.g. “Find all days I drank less than 1L of water”).

The architecture is modular: MyFitnessPal acts as a pluggable data source feeding into a unified memory format, using an open standard interface (MCP) to communicate with the AI. This reflects the project’s emphasis on modular data streams and user-owned storage. It also keeps sensitive health data under the user’s control at all times – data is fetched locally and never sent to third-party servers (aside from the initial retrieval from MFP itself).

Daily Data Ingestion Pipeline (MyFitnessPal Connector)

To ensure up-to-date information, the connector will run a daily ingestion workflow:

  1. Scheduled Trigger: Use a scheduler (cron job on Linux/macOS, Task Scheduler on Windows, or a persistent background thread) to invoke the connector script once per day. A recommended schedule is nightly, shortly after the user’s typical logging for the day is complete (e.g. 12:30 AM local time for the previous day’s data). This ensures all meals and activities of the day are captured.

  2. Fetch Data via python-myfitnesspal: The script initializes the MFP client and fetches the relevant entries:

    • Daily Nutrition Totals: client.get_date(Y, M, D) returns a Day object representing that date. The Day.totals property provides a dictionary of total nutrients consumed: calories, carbohydrates, fat, protein, sodium, sugar, etc. For example, day.totals might return: {'calories': 2001, 'carbohydrates': 369, 'fat': 22, 'protein': 110, 'sodium': 3326, 'sugar': 103}. Calories are presumably in kilocalories, and the other nutrients in grams or milligrams (the library returns raw numbers without units). We will capture all available fields.

    • Water Intake: Using day.water yields the water consumption logged for that day. By default this returns a number (e.g. 1 in the docs example). In MyFitnessPal UI, water is tracked in cups (8 oz) by default; advanced usage of the library can make it unit-aware. We will treat the day.water value as number of cups (or liters if configured) and later convert it to a standard unit (milliliters) in our normalized data. If no water was logged, this may be 0 or None, which we handle as 0.

    • Exercises/Activities: The Day.exercises attribute provides a list of exercise logs for that day. It typically has two entries: [Cardiovascular, Strength]. Each can be converted to a list via get_as_list(). For example, day.exercises[0].get_as_list() might return a list of cardio exercises with their details (name, minutes, calories burned). One entry might look like: {'name': 'Running (jogging), 8 kph...', 'nutrition_information': {'minutes': 25, 'calories burned': 211}}. Strength exercises similarly have sets, reps/set, and weight/set in their info. The connector will iterate through both cardio and strength lists and collect each exercise’s data into a structured form (standardizing keys like calories_burned, minutes, etc.). If the user hasn’t logged any exercises, these lists may be empty; we’ll handle that by returning an empty array for exercises.

    • Weight Measurement: MyFitnessPal allows logging body weight (and other measurements) on a given date. We use client.get_measurements('Weight', date) to retrieve the weight entry for the target date. This returns an OrderedDict of date->value pairs. For a single date query, it will contain that date if a weight was logged, e.g. {2025-07-11: 171.2} with the weight value (likely in the user’s preferred unit, often pounds). We extract the weight value if present. If the user did not log weight that day, the result may be empty. In that case, we might choose to omit the weight field for that day’s record or leave it as null. (Optional: We could also choose to carry forward the last known weight as the “current weight”, but for data integrity, it’s better to record only actual logged measurements and treat missing days as no data).

  3. Error Handling & Retry: During data fetch, a few issues can arise:

    • Authentication failure: If the MFP session cookies are missing or expired, myfitnesspal.Client() calls may fail to retrieve data (the library might return empty data or throw an exception). Recognizing this (e.g. by catching exceptions or checking if fetched data is empty when it shouldn’t be) should trigger a specific handler: the connector can log an auth error and prompt the user to log in again in their browser to refresh the cookies. In a robust setup, the connector might send a notification or log entry like “MyFitnessPal authentication expired – please log in via browser” rather than silently failing. This ensures the user can fix the issue.

    • Site changes or scraping errors: The python-myfitnesspal library relies on MFP’s website structure. If MyFitnessPal changes their HTML or if the site is temporarily unreachable, calls could fail. We will implement retry logic (e.g. try up to 3 times with a short delay on network errors). If still unsuccessful, log the failure and skip that day; the next day’s run can attempt again (possibly fetching two days if yesterday failed, to backfill). Keeping the library updated (monitoring the python-myfitnesspal GitHub for updates) is also advisable to handle site changes.

    • Data consistency issues: If the user hasn’t logged anything on a given day, the data might be partially empty (e.g. day.totals missing keys). In the 3rd-party example script, if fewer than 6 nutrients were present, they padded with None. Our approach will be to explicitly set 0 for any missing nutrient fields (assuming if not present, it’s 0). This ensures each day’s record has a consistent schema.

  4. Data Normalization: After fetching, the connector transforms the raw data into our target JSON format. This involves:

    • Converting units to a consistent system. For instance, if weight is in pounds, convert to kilograms (or store the unit alongside value). If water is in “cups”, convert to milliliters (1 cup = 236.6 ml) for precision, or to liters. We will decide on a standard (SI units for metric consistency: kg, ml, etc.). Calories are already in kcal which is standard.

    • Renaming fields to be self-explanatory and semantically rich. “carbohydrates” might become “carbs_g” or remain “carbohydrates_g”; “sodium” to “sodium_mg”; etc., making units explicit. We might also nest all nutrition totals under a sub-object for clarity (e.g. "nutrition": { "calories": 2000, "protein_g": 110, ... }).

    • Adding metadata: Each record will include the date (ISO 8601 string) and a type or context to clarify what it represents. For semantic interoperability, we can treat each daily record as an ActivityStreams “Object” or a custom type like "type": "FoodDiaryEntry" in JSON-LD. We can also include an "id" (e.g. a URI or unique string for the entry) if needed for reference. Optionally, a list of "tag" or "keywords" can be added to label the entry (e.g. ["health", "nutrition"] or tags like "low-carb" if the data meets some criteria). Trait tags could also be added in the future (for example, if the system analyzes the entry and tags it with "healthy_day" or "over_target").

  5. Local Storage in Memory Vault: The normalized JSON is then saved to the user’s personal data store. Possible approaches:

    • Append to a cumulative JSON file (e.g. nutrition_log.json containing an array of entries, or a daily dictionary inside).

    • Use one file per day (e.g. health/myfitnesspal/2025-07-11.json) as suggested in the ingestion guidelines. This makes it easy to add new entries and potentially manually edit specific days.

    • Use a lightweight database (like SQLite) to store records with date as key. However, simple JSON files are human-inspectable and align with the project’s transparency goals (users can open and read their data).

    For this plan, we’ll assume a file-based store: e.g. a directory persona_memory/health/ with files myfitnesspal_<date>.json or a structured subfolder. This vault is local-first (not uploaded to cloud) and can be inside an encrypted container or disk. The connector ensures file writes are atomic (writing to a temp file then renaming) to avoid corruption. All files will be readable only by the user/AI process (proper OS permissions).

  6. Logging and Monitoring: Each run of the pipeline will log its outcome. For example, an entry in ingestion_log.txt might be added: 2025-07-12 00:30 - Fetched MyFitnessPal data for 2025-07-11 (calories=1800, weight=170.5 lb). If an error occurred, it would log an error message. These logs help in troubleshooting and can be surfaced to the user (e.g. an alert if data hasn’t been updated for 2 days). Over time, the connector could also monitor trends (e.g. rapid weight changes or several days with no user logging), though that veers into extension territory.
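
The atomic write mentioned above (write to a temp file, then rename) can be sketched as a small helper. This is a minimal illustration; the `save_record_atomically` name and the `<date>.json` filename pattern are assumptions, not fixed project conventions.

```python
import json
import os
import tempfile

def save_record_atomically(record: dict, vault_dir: str) -> str:
    """Write a daily record to <vault_dir>/<date>.json via temp file + rename,
    so a crash mid-write never leaves a corrupt or half-written file behind."""
    os.makedirs(vault_dir, exist_ok=True)
    final_path = os.path.join(vault_dir, f"{record['date']}.json")
    # The temp file must live in the same directory: rename is only atomic
    # within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=vault_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(record, f, indent=2)
        os.replace(tmp_path, final_path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise
    return final_path
```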

By the end of this pipeline, the user’s daily MFP data is securely captured and stored in a normalized, timestamped JSON record ready to be served to the AI.
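
The retry behavior described in step 3 could be implemented with a small helper like the following sketch; the three-attempt default and five-second delay are illustrative values, not requirements of the library.

```python
import time

def fetch_with_retry(fetch, attempts=3, delay_s=5.0):
    """Call `fetch()` up to `attempts` times, sleeping between failures.

    Re-raises the last exception so the caller can log the day as
    'needs backfill' and try again on the next run."""
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # network errors, scraping failures, etc.
            last_exc = exc
            if attempt < attempts:
                time.sleep(delay_s)
    raise last_exc
```

A daily run would then wrap the fetch, e.g. `record = fetch_with_retry(lambda: fetch_day_record(yesterday))`.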

Data Normalization and MCP-Compatible Schema

The Model Context Protocol doesn’t mandate a specific JSON schema for data; it standardizes the communication (request/response) between AI and data sources. We have flexibility to define a schema that best represents MyFitnessPal data in the Digital Persona’s semantic memory. We will use a JSON-LD structure inspired by ActivityStreams 2.0, as this is a widely used format for time-stamped activities and can be extended with custom fields. Each daily log will be one JSON object. Below is an example schema for a single day’s entry, followed by a table mapping MFP fields to our normalized fields:

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "FoodDiaryEntry",  
  "date": "2025-07-11",
  "nutrition": {
    "calories_kcal": 1850,
    "carbohydrates_g": 250,
    "fat_g": 60,
    "protein_g": 100,
    "sodium_mg": 2300,
    "sugar_g": 80
  },
  "water_ml": 2000,
  "weight_kg": 78.0,
  "exercises": [
    {
      "name": "Walking, dog",
      "minutes": 60,
      "calories_burned": 200
    },
    {
      "name": "Bench Press",
      "sets": 3,
      "reps_per_set": 12,
      "weight_kg": 40
    }
  ],
  "tags": ["health", "nutrition", "exercise"]
}

In this JSON-LD example, we set a custom "type": "FoodDiaryEntry" to denote the nature of the object (not a standard ActivityStreams type, but we could define it in an @context). The date is the key index. Nutrition totals are grouped under "nutrition" with explicit units in keys. Water and weight have _ml and _kg suffixes to clarify units. The exercises array contains each exercise with relevant fields (we separate strength vs cardio by presence of calories_burned vs sets, etc., or we could add a "category": "cardio/strength" field). Tags can help with retrieval or classification (for instance, the system could tag an entry with "high-protein" or "met-goal" later).

Mapping MyFitnessPal Fields to Normalized Schema:

| MyFitnessPal Data Field | Example Value | Normalized JSON Field | Description |
| --- | --- | --- | --- |
| Date (implicit) | 2025-07-11 | date (ISO 8601 string) | The date of the entry (YYYY-MM-DD). |
| Calories | 1850 kcal | nutrition.calories_kcal | Total calories consumed (kilocalories). |
| Carbohydrates | 250 g | nutrition.carbohydrates_g | Total carbohydrates (grams). |
| Fat | 60 g | nutrition.fat_g | Total fat (grams). |
| Protein | 100 g | nutrition.protein_g | Total protein (grams). |
| Sodium | 2300 mg | nutrition.sodium_mg | Total sodium (milligrams). |
| Sugar | 80 g | nutrition.sugar_g | Total sugars (grams). |
| Water | 8 cups (~2000 ml) | water_ml | Water intake (milliliters). MFP logs in cups; converted to ml. |
| Weight | 171.0 lb (77.6 kg) | weight_kg | Body weight (kilograms) on that date. |
| Cardio Exercise Name | "Walking, dog" | exercises[].name | Name/description of exercise. |
| Cardio Exercise Duration | 60 minutes | exercises[].minutes | Duration of cardio (minutes). |
| Cardio Exercise Calories | 200 kcal burned | exercises[].calories_burned | Calories burned in exercise (kcal). |
| Strength Exercise Name | "Bench Press" | exercises[].name | Name of strength exercise. |
| Strength Exercise Sets | 3 | exercises[].sets | Number of sets performed. |
| Strength Exercise Reps/Set | 12 | exercises[].reps_per_set | Repetitions per set. |
| Strength Exercise Weight/Set | 40 kg | exercises[].weight_kg | Weight used per set (kg). |
| Daily Notes (if any) | "Felt energetic..." | note (optional) | Any text note the user entered for that day. |

Table: Mapping of MyFitnessPal fields to the normalized MCP schema (with units). Data fields marked with MFP references show where the values come from in the python-myfitnesspal output.

This schema is JSON-LD friendly. For example, we could define an @context such that "calories_kcal" is linked to a standard nutrition ontology, or use Schema.org terms (Schema.org has a NutritionInformation type, though not in JSON-LD by default). However, given this is primarily for internal use between our MCP server and the AI, a simple descriptive schema is sufficient. The key is that it’s structured and self-descriptive, enabling the AI (or any other tool) to interpret the data unambiguously. The Digital Persona’s memory system benefits from such semantic structure – it can retrieve “facts” like a JSON field rather than scraping text, and even do simple computations or comparisons across entries if needed. This is aligned with the project’s aim to use structured memory with semantic tagging.
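
For illustration, an extended @context could alias our keys to Schema.org's NutritionInformation properties. The mapping below is a hypothetical sketch, not a finalized vocabulary:

```json
{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "schema": "https://schema.org/",
      "calories_kcal": "schema:calories",
      "carbohydrates_g": "schema:carbohydrateContent",
      "fat_g": "schema:fatContent",
      "protein_g": "schema:proteinContent",
      "sodium_mg": "schema:sodiumContent",
      "sugar_g": "schema:sugarContent"
    }
  ],
  "type": "FoodDiaryEntry"
}
```

Since the data is consumed only by our own MCP server and AI, adopting such a context can safely be deferred.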

Example Semantic Representation: In ActivityStreams terms, we might represent the day as an Activity where the actor is the user (digital persona) and the object could be a Note or DataRecord. But since this is personal data rather than an action taken, modeling it as just an Object with a custom type is fine. If desired, we could wrap it in an Activity like: "type": "Update", "actor": "PersonasName", "object": { ... above JSON ... }, indicating the persona’s data update. However, this adds complexity with little benefit for now. The straightforward approach is to use the JSON as shown.

Finally, once normalized, the data is inserted into the AI’s long-term memory store. This could simply mean that the JSON files in the vault are considered part of the AI’s knowledge base. If the persona uses a vector store or database for memory, we might additionally store some of this information there (embedding raw numbers into vectors is of little use, but a textual summary of each day could be stored for semantic search). Primarily, the AI will rely on the MCP query to get this info on demand, rather than needing all of it in an embedding index.

MCP Server Design and Endpoint Specification

To expose the ingested data to the AI, we implement a local MCP server. According to Anthropic’s spec, an MCP server is essentially a RESTful API that the AI can call for data. Our server will run locally (e.g. http://127.0.0.1:3939 or another port). We define the following endpoints and usage patterns:

  • GET /mcp/myfitnesspal/daily/{date} – Retrieve the normalized MFP data for a specific date. The date can be given as YYYY-MM-DD (we’ll parse it). This returns a JSON object as described above. Example: a GET to /mcp/myfitnesspal/daily/2025-07-11 returns the JSON for that day (including nutrition, water, weight, exercises, etc.). If no data is available for that date (e.g. the user hadn’t joined or logged), the endpoint returns an error or empty result.

  • GET /mcp/myfitnesspal/latest – Returns the latest available entry (e.g. yesterday’s data or the most recent day logged). This is useful for quickly fetching the most recent stats without specifying date. Internally, this could simply read the most recent file or database entry. The response format is the same JSON object structure. We might also include in the JSON a field "date" so the AI knows which date it corresponds to (though it’s already in the content).

  • GET /mcp/myfitnesspal/weight_history?start=YYYY-MM-DD&end=YYYY-MM-DD – (Optional) Returns an array of weight entries between the given dates. Each entry could be a small JSON with date and weight. This allows the AI to fetch, for instance, the past month of weight logs in one call if needed (useful for trend analysis or answering questions like “How has my weight changed in the last month?”). If not needed immediately, we can omit this until an AI use-case arises. Similarly, one could imagine endpoints for nutrient history (e.g. all calories over a period).

  • Query Parameters / Filtering: The endpoints can support simple queries. For example, we could allow /daily?date=2025-07-01 as an alternative to path param. Or allow filtering fields (e.g. ?fields=calories,weight to return only those fields if the AI needs a lighter response). Initially, the AI likely will fetch the whole record and pick what it needs, so this may be unnecessary.

Response Format: All responses are JSON and follow the schema defined. The server will include appropriate HTTP headers (Content-Type: application/json). For MCP, there’s likely an expected pattern where the AI’s request can be in a format like:

{ "server_url": ".../mcp/myfitnesspal/daily/2025-07-11", "query": {} }

but since we are designing our own connector, a direct GET is fine. The main idea is that the AI (or the persona’s retrieval module) knows the endpoint and how to call it. The MCP spec emphasizes secure, two-way connections, but here the flow is mostly one-way (the AI reading data). We can log requests as they arrive for auditing.

  • Security for MCP Endpoint: Only the local AI should use this. To enforce this:

    • Bind the server to localhost (no external interface).

    • Use an auth token in a header or require a specific custom header the AI includes. Even though on localhost, this prevents other local processes from easily snooping if they don’t have the token.

    • Optionally serve over HTTPS. On localhost, one might skip TLS, but the safest approach per guidelines is to use TLS even internally. We could generate a self-signed cert and have the AI client trust it, or use a local CA. This prevents any MITM even on local network (in case the endpoints are ever exposed beyond one machine).

The MCP server implementation will likely be a small web server code. For example, using FastAPI (Python) we can quickly define these routes. The server can either read directly from the saved JSON files or keep data in memory. Since the data volume is small (one JSON per day), reading from disk on each request is fine. If performance becomes an issue (unlikely for single-user daily data), we can cache the JSON in memory when the ingestion runs.
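
The routes themselves are thin wrappers around file lookups. A framework-agnostic sketch of the logic a FastAPI or Flask route could delegate to, assuming one <date>.json file per day in a vault directory and a shared-secret header (the directory layout, token value, and header name are all illustrative assumptions):

```python
import json
import os
from typing import Optional

VAULT_DIR = "memory_vault/health/myfitnesspal"  # assumed layout: one <date>.json per day
API_TOKEN = "change-me-local-only"              # placeholder; load from env/keychain in practice

def load_daily(vault_dir: str, date_str: str) -> Optional[dict]:
    """Return the record for YYYY-MM-DD, or None if that day was never ingested."""
    path = os.path.join(vault_dir, f"{date_str}.json")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return json.load(f)

def load_latest(vault_dir: str) -> Optional[dict]:
    """Return the most recent record; ISO dates sort lexicographically, so the max filename wins."""
    days = sorted(f[:-5] for f in os.listdir(vault_dir) if f.endswith(".json"))
    return load_daily(vault_dir, days[-1]) if days else None

def authorized(headers: dict) -> bool:
    """Shared-token check; a route would return HTTP 401 when this is False."""
    return headers.get("X-Persona-Token") == API_TOKEN
```

A FastAPI route for GET /mcp/myfitnesspal/daily/{date} would then just call `load_daily` and raise a 404 on None.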

Example Endpoint Usage:

  • Digital Persona’s retrieval module might call: GET http://localhost:3939/mcp/myfitnesspal/daily/2025-07-11. The server responds with the JSON. The AI then uses that to answer a question or incorporate into its context.

  • If the AI wants the latest data, it calls GET /mcp/myfitnesspal/latest. The server finds the latest date (say 2025-07-12) and returns that entry’s JSON.

  • If the user asks “What was my average protein intake this week?”, the AI could fetch each day via the API (or if we build an aggregate endpoint, call one endpoint). However, since multiple calls are less efficient, a future extension could be to have an endpoint that returns a summary or a week’s data in one go. At first, focusing on per-day access is simpler and aligns with how data is stored (daily files).
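
From the client side, each of these calls is a plain authenticated GET. A minimal sketch with urllib; the X-Persona-Token header is a hypothetical auth scheme for this connector, not something the MCP spec mandates:

```python
import json
import urllib.request

def get_mfp_record(base_url: str, date_str: str, token: str) -> dict:
    """Fetch one day's normalized record from the local MCP server.

    `X-Persona-Token` is an assumed shared-secret header; use whatever
    auth mechanism the server actually enforces."""
    url = f"{base_url}/mcp/myfitnesspal/daily/{date_str}"
    req = urllib.request.Request(url, headers={"X-Persona-Token": token})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```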

MCP Integration Note: In the project’s context, using an MCP server for personal data was exemplified by pulling chat logs from an external service (Limitless AI) on a schedule. Our use of MCP here is analogous: we’ve created a local MCP-compatible API for MyFitnessPal data. This means Digital Persona doesn’t need to know MFP specifics – it simply knows that an MCP source “MyFitnessPal” is available and returns structured health data. This standardization fits the project’s modular design where new data sources can be “plugged in” via MCP without custom prompt engineering for each source. It also keeps the AI’s architecture clean: the AI can request data when needed instead of being overloaded with all data upfront, and the MCP server provides a uniform interface to that data.

Local Data Storage and Memory Integration

All ingested MyFitnessPal data will reside in the user’s personal data vault (on local disk). This vault is essentially part of the Digital Persona’s long-term memory store. By storing daily logs as structured JSON, we ensure the data is transparent and user-editable. The user can review their nutrition logs, correct any errors, or delete entries if desired (for example, if they want to exercise the “right to be forgotten” for certain data). This approach echoes the memory architecture principle that the user should have full insight and control over what the AI remembers.

To protect this sensitive health data, we will:

  • Encrypt Data at Rest: If possible, the data folder should be encrypted. This could be done via full-disk encryption (BitLocker, FileVault, LUKS, etc.), or by the connector encrypting the JSON files individually using a user-provided key. Given the complexity, leveraging OS-level encryption is a practical solution so that when the user’s system is locked, the vault is secure. If implementing file encryption at application level, we’d use a strong symmetric cipher (AES-256) with a key stored in the user’s OS keychain or derived from a passphrase.

  • Backup and Sync: By default, data stays local (which minimizes breach risk). If the user wants backups (e.g. to a personal cloud or device), we can recommend client-side encrypted backups. For instance, the user could keep an encrypted archive of the memory vault on a cloud drive, ensuring that even if the cloud is breached, the data remains unreadable without the key.

  • Integration into AI Memory: The Digital Persona’s memory subsystem will treat these MyFitnessPal entries as part of the knowledge base. Concretely, there are a few ways this integration can happen:

    • On-demand retrieval (pull model): The AI asks for data via MCP as needed. For example, if the AI’s reasoning engine determines a question involves the user’s diet or health, it triggers a call to the /mcp/myfitnesspal/... endpoint to get the facts, which it then incorporates into its response. This is akin to a tool use or function call in the AI’s chain-of-thought.

    • Pre-loading relevant context: If the user is actively in a conversation where health data might be relevant (say discussing fitness or diet), the system could proactively fetch the latest data and include a summary in the AI’s context. For example, it might pull “Today’s nutrition summary” and give it to the AI so it can comment appropriately (“You consumed 2000 kcal today, which is just under your goal.”).

    • Memory Indexing: Over time, the daily logs can be indexed for trends. The persona might not need a vector embedding for pure numeric data, but we could generate a short text summary for each day (e.g. “On 2025-07-11, you ate 1850 kcal and 100g protein, drank 2L water, weight was 78.0kg.”) and index those summaries in a vector store. Then the AI could semantically search if needed (e.g. find when I last ate more than 200g carbs – though a direct query might be easier than semantic search for numeric conditions). For now, indexing can be simple: maybe store an average of each week, etc. The structured storage itself allows programmatic querying without needing ML – a future improvement is to have a query interface where the AI can ask complex questions and the system calculates the answer from the data.
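
The per-day textual summary suggested above is straightforward to generate from the normalized record; the phrasing here is only an example, and the field names follow the schema defined earlier.

```python
def summarize_day(record: dict) -> str:
    """Render a daily record as a short sentence suitable for embedding/indexing."""
    n = record.get("nutrition", {})
    parts = [
        f"On {record['date']}, you ate {n.get('calories_kcal', 0)} kcal",
        f"and {n.get('protein_g', 0)}g protein",
        f"drank {record.get('water_ml', 0) / 1000:.1f}L water",
    ]
    if record.get("weight_kg") is not None:
        parts.append(f"weight was {record['weight_kg']}kg")
    return ", ".join(parts) + "."
```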

All these ensure that the MyFitnessPal data becomes an integral part of the persona’s memory, enabling personalized, context-aware interactions. For example, the persona could recall “You usually drink about 8 cups of water a day, but yesterday you only had 2 cups; is everything alright?” or answer “In the past week, your average calorie intake was X, compared to Y the week before.” These are powerful capabilities unlocked by integrating the data.

Importantly, data sovereignty is maintained: the user can at any time inspect the memory vault, revoke the AI’s access to it, or delete the data entirely. The AI only knows what’s in these files and only when it queries them, meaning the user has a strong handle on the flow of information. This aligns perfectly with the project’s mission of user-controlled, privacy-first AI personas.

Sample Connector Implementation (Python Code Snippet)

Below is a simplified example of how one might implement the MyFitnessPal connector and MCP server in Python. This code illustrates the key steps: fetching data via python-myfitnesspal, formatting the JSON, and exposing endpoints with FastAPI. (Note: proper error handling and security measures should be added in a real implementation.)

import myfitnesspal
from datetime import date
from fastapi import FastAPI, HTTPException

# Initialize MFP client (uses saved browser session cookies for auth)
client = myfitnesspal.Client()  # reads cookies from the default browser cookie jar
app = FastAPI()

def fetch_day_record(day: date):
    """Fetch and normalize MyFitnessPal data for a given date."""
    # Retrieve day data from MyFitnessPal
    mfp_day = client.get_date(day.year, day.month, day.day)

    if mfp_day is None:
        return None  # e.g., if fetch failed or no data

    # Nutrition totals
    totals = mfp_day.totals  # e.g. {'calories': ..., 'protein': ..., ...}

    # Ensure all expected fields are present, fill 0 if missing
    nutrients = {
        "calories_kcal": totals.get("calories", 0),
        "carbohydrates_g": totals.get("carbohydrates", 0),
        "fat_g": totals.get("fat", 0),
        "protein_g": totals.get("protein", 0),
        "sodium_mg": totals.get("sodium", 0),
        "sugar_g": totals.get("sugar", 0)
    }

    # Water intake
    water_cups = mfp_day.water or 0  # number of cups (8 oz) by default
    water_ml = water_cups * 236.588  # convert cups to ml (approx)

    # Exercises
    exercises_list = []

    for ex_cat in mfp_day.exercises:         # iterate categories (cardio, strength)
        for ex in ex_cat.get_as_list():      # each exercise as dict
            info = ex["nutrition_information"]
            ex_entry = {"name": ex["name"]}
            if "calories burned" in info:    # cardio exercise:contentReference[oaicite:71]{index=71}
                ex_entry["minutes"] = info.get("minutes")
                ex_entry["calories_burned"] = info.get("calories burned")
            else:                            # strength exercise:contentReference[oaicite:72]{index=72}
                ex_entry["sets"] = info.get("sets")
                ex_entry["reps_per_set"] = info.get("reps/set")

                # convert weight to kg if it's in lbs (assuming user logs in lbs)
                weight_val = info.get("weight/set")
                if weight_val is not None:
                    ex_entry["weight_kg"] = round(weight_val * 0.453592, 2)  # lbs to kg
            exercises_list.append(ex_entry)

    # Weight (if logged on that day)
    weight_val = None

    try:
        weight_data = client.get_measurements("Weight", day, day)  # OrderedDict:contentReference[oaicite:73]{index=73}
        if weight_data and day in weight_data:
            weight_val = weight_data[day]

    except Exception:
        weight_val = None

    weight_kg = round(weight_val * 0.453592, 2) if (weight_val is not None) else None

    # Compile the normalized record

    record = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "FoodDiaryEntry",
        "date": day.isoformat(),
        "nutrition": nutrients,
        "water_ml": round(water_ml),
        "weight_kg": weight_kg,
        "exercises": exercises_list
    }

    return record

# Example of daily job usage:
today = date.today()
yesterday = today.replace(day=today.day-1)
record = fetch_day_record(yesterday)

if record:
    # Here you would save the record to a file, e.g., "2025-07-11.json"
    with open(f"memory_vault/health/myfitnesspal/{record['date']}.json", "w") as f:
        import json; json.dump(record, f)

# MCP API endpoints:

@app.get("/mcp/myfitnesspal/daily/{req_date}")

def get_daily(req_date: str):
    """MCP endpoint to retrieve data for a specific date (YYYY-MM-DD)."""
    try:
        with open(f"memory_vault/health/myfitnesspal/{req_date}.json") as f:
            import json
            return json.load(f)
    except FileNotFoundError:
        raise HTTPException(status_code=404, detail="Date not found")

@app.get("/mcp/myfitnesspal/latest")

def get_latest():
    """MCP endpoint to retrieve the most recent available entry."""
    import os, glob

    files = glob.glob("memory_vault/health/myfitnesspal/*.json")

    if not files:
        raise HTTPException(status_code=404, detail="No data available")
    latest_file = max(files)  # by filename date, assuming YYYY-MM-DD sort
    with open(latest_file) as f:
        import json
        return json.load(f)

Explanation: In this code, fetch_day_record() wraps the library calls: it gets the day object, extracts totals, water, exercises (iterating cardio and strength separately), and weight. It converts units (cups to ml, lbs to kg) and builds the JSON dict. The FastAPI app defines two endpoints: one for a specific date and one for the latest entry. In a real deployment, we would start this FastAPI app as a service on localhost (for example with Uvicorn). We would also add some form of authentication (e.g., an API key required on each request) and HTTPS configuration.
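
As a sketch of that deployment step — assuming the snippet above is saved as `mfp_mcp_server.py` (a hypothetical filename) — a self-signed certificate can be generated with OpenSSL and the app served over TLS on the loopback interface only:

```shell
# Generate a self-signed certificate for localhost (valid ~1 year)
openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem \
  -days 365 -nodes -subj "/CN=localhost"

# Serve the FastAPI app over HTTPS, bound to loopback only
# (commented out here because it runs until interrupted):
# uvicorn mfp_mcp_server:app --host 127.0.0.1 --port 8443 \
#   --ssl-keyfile key.pem --ssl-certfile cert.pem
```

Binding to 127.0.0.1 rather than 0.0.0.0 ensures the endpoints are never reachable from the network even if the firewall is misconfigured.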

Authentication Edge Case: Because myfitnesspal.Client() relies on browser cookies, if this script is run headless or by an automated scheduler, it must have access to a valid cookie jar. If the scheduler runs under a user account that has a browser session, it will find the cookies; if not, we might need to explicitly specify the cookie jar. The library allows passing a cookiejar parameter to Client for custom scenarios. For instance, we could periodically export cookies from the browser or prompt the user to log in via a script once. In our code, we assume the default behavior is sufficient (the script runs as the logged-in user with browser cookies available).
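
As noted, Client accepts a cookiejar parameter. Below is a minimal sketch, assuming the user has exported their browser cookies to a Netscape-format file at a hypothetical path `mfp_cookies.txt`; the helper loads the jar and flags already-expired cookies so the sync job can warn the user before a fetch fails:

```python
import http.cookiejar
import time

def load_mfp_cookiejar(path: str) -> http.cookiejar.MozillaCookieJar:
    """Load a Netscape-format cookies.txt exported from the browser."""
    jar = http.cookiejar.MozillaCookieJar(path)
    # Load even expired/session cookies so we can inspect and report them
    jar.load(ignore_discard=True, ignore_expires=True)
    return jar

def expired_cookie_names(jar) -> list:
    """Return the names of cookies that are already past their expiry time."""
    now = time.time()
    return [c.name for c in jar if c.expires is not None and c.expires < now]

# Hypothetical usage -- pass the jar explicitly instead of relying on
# browser_cookie3's automatic cookie discovery:
# client = myfitnesspal.Client(cookiejar=load_mfp_cookiejar("mfp_cookies.txt"))
```

This is useful when the connector runs headless under a scheduler account that has no browser profile of its own.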

Saving Credentials: Note that we did not use Client(username, password) – this is intentional, because direct login is broken by MFP’s anti-bot measures. The reliance on browser cookies is a slight maintenance burden (the user must stay logged in). However, cookies often remain valid for weeks; the script should handle the case where they expire (which we would detect as a failure to fetch data). One strategy: if we catch an HTTP 403 or similar, we know the session is invalid. We could then attempt to open a headless browser or instruct the user to refresh their login. For now, simply logging the error is acceptable, since this is an informed user in control.
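
One way to sketch that error handling is a hypothetical safe_fetch wrapper (an assumption of this sketch: the real failure may surface as a requests exception or a parse error rather than a clean 403, so we catch broadly and log):

```python
import logging

logger = logging.getLogger("mfp_connector")

def safe_fetch(client, day):
    """Fetch a day's data, returning None (and logging the error) on failure.

    An expired browser session typically surfaces as an exception here
    (e.g. an HTTP 403 or an unexpected page layout), so the daily job can
    log the problem and prompt the user to refresh their login.
    """
    try:
        return client.get_date(day.year, day.month, day.day)
    except Exception as exc:
        logger.error("MyFitnessPal fetch failed for %s: %s", day, exc)
        return None
```

The daily job then treats a None result as "skip today and alert the user" rather than crashing the scheduler.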

Privacy, Security, and Ethical Considerations

Maintaining user privacy and data security is paramount in this integration. We incorporate several safeguards:

  • Local-Only Data Processing: All MyFitnessPal data is fetched and stored locally. No third-party analytics or cloud storages are involved. The design follows a local-first architecture where personal data “resides primarily on the user’s device or in a user-controlled vault rather than on centralized servers”. This dramatically reduces exposure to breaches, as the data never leaves the user’s possession unencrypted.

  • Encrypted Transport: The MCP server can be configured to use TLS even on localhost. This might seem unnecessary on a single machine, but it prevents other processes from eavesdropping via debugging proxies, and is critical if the MCP server were ever accessible over a network. We’d use a self-signed certificate or one issued by the user’s own CA. Additionally, if the Digital Persona AI runs in a separate process, we ensure it communicates with the MCP server over HTTPS with certificate verification, or via an IPC mechanism that is secure. In essence, TLS 1.3 should be enforced for any network transport of this data, treating it with the same care as health data under HIPAA guidelines would be (which call for encryption as best practice).

  • Authentication & Authorization: The MCP server should require a token or key that only the Digital Persona app/agent knows. This could be a static pre-shared token stored in the persona’s config, or a token derived from the user’s authentication to the persona. Since everything is local, a simple implementation is fine (for example, requiring a custom header X-API-Key: <someguid> and the server checks this). This prevents a malicious local program from simply querying the endpoints and reading the user’s diet data. In a scenario where the persona might be part of a larger system or accessed by third-party tools, we would enforce OAuth-like scopes – e.g., only the persona AI is given scope to read health data, and if another app tried, it’d be denied.

  • Data Minimization: We are only collecting data that serves the purpose of the Digital Persona’s functionality (nutrition and fitness info to inform the AI). We are not grabbing extraneous data from MFP like community posts or friend lists. This aligns with the principle of data minimization – collect only what is needed for the use-case (in this case, personal health logs for memory). If, in the future, certain MFP data is not used, we should stop collecting it. For instance, if the user never logs water or we decide water isn’t useful, we could skip it to reduce data stored.

  • Consent and Transparency: The user is initiating this integration and is fully aware of what data is being captured. All data stored is visible in the vault in human-readable form. If the user disables the connector or uninstalls the persona, they can delete these logs. If any summarization or analysis is done on the data, that too can be stored or at least presented to the user for verification (e.g., if the AI generates a summary like “you are eating 20% fat on average,” the raw data supporting that is accessible).

  • Legal Considerations: MyFitnessPal’s terms of service should be respected. The python-myfitnesspal library scrapes the site in a manner akin to how a user would via their browser. We should ensure this usage doesn’t violate their terms (the library has existed for years and is openly available, which suggests it’s generally tolerated for personal use). The account’s data is being accessed by the user themselves (via our tool), and not shared externally, so it likely falls under fair use of their own data. Still, it’s worth checking MFP’s latest terms for any clauses against scraping or automation. If there were concerns, an alternative could be to use official data export (if available) or an API if MFP ever offers one openly. As of now, the private API was not accessible without permission, hence scraping is the workaround.

  • Integrity and Audit: We maintain logs of when data was fetched and perhaps checksums of data files. This can help detect if any data tampering occurred. For example, if an attacker tried to modify the user’s health data (a bizarre scenario, but for completeness), the user could detect it via log discrepancies or checksum failures. Also, should the user have any doubts about the AI’s statements regarding their health, they can cross-verify with the actual MFP app or the JSON logs, ensuring the AI isn’t hallucinating or misrepresenting the data. This fosters trust: the AI’s knowledge about the user can always be audited against the source of truth the user controls.
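
The pre-shared token idea above can be sketched with a constant-time comparison (stdlib only; PERSONA_MCP_KEY is a hypothetical environment variable, and the FastAPI wiring is indicated in the trailing comment rather than implemented here):

```python
import hmac
import os

# Hypothetical env var holding the pre-shared token; the fallback value is
# for local development only and should be replaced in practice.
EXPECTED_KEY = os.environ.get("PERSONA_MCP_KEY", "dev-only-key")

def api_key_valid(provided: str) -> bool:
    """Compare a supplied X-API-Key header value against the shared secret.

    hmac.compare_digest runs in constant time, which avoids leaking
    information about the key through timing differences.
    """
    return hmac.compare_digest(provided.encode(), EXPECTED_KEY.encode())

# In FastAPI this could be wired as a dependency, e.g.:
#   def require_key(x_api_key: str = Header(default="")):
#       if not api_key_valid(x_api_key):
#           raise HTTPException(status_code=401, detail="Invalid API key")
```

A static token is adequate for a single-machine deployment; rotating it simply means updating the environment variable and the persona's config together.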

By addressing these aspects, we ensure that integrating fitness data does not create new privacy vulnerabilities. In fact, it demonstrates the value of the Digital Persona approach – unlike handing data to a cloud AI service, here the user keeps full ownership and the integration is done on the user’s terms.

Future Extensions and Enhancements

This integration opens the door to numerous value-added features that can enhance the Digital Persona’s usefulness in health and wellness contexts. Some ideas for future development:

  • Automated Summary Generation: The connector or the AI can generate daily or weekly summaries of the user’s fitness data. For example, an MCP endpoint /mcp/myfitnesspal/summary/weekly could return something like: “Week of 2025-07-05 to 07-11: Avg calories 1900/day, protein 105g/day; weight decreased 0.5 kg; met water goal 5/7 days.” The AI could present this in conversation, or even proactively comment: “Here’s your weekly nutrition summary.” These summaries could be generated by analyzing the stored data (and we could leverage the AI itself to draft them, given the data – a kind of self-analysis).

  • Goal Tracking and Alerts: If the user has specific goals (e.g. a calorie target of 1800 kcal/day, or a goal to lose 5 kg in 3 months), we can incorporate those into the data model. For instance, we might add a field in each daily record for “calorie_goal” or have a separate config file for goals. The AI can then compare the daily actual vs goal and advise the user. An extension could be an endpoint like /mcp/myfitnesspal/goals to retrieve current goals and progress. The persona, being user-aligned, could use this to encourage or remind the user in a supportive manner – effectively acting as a fitness buddy.

  • Trend Detection & Anomaly Alerts: Over a longer period, the system could detect trends such as “Your average weight has been rising for 4 weeks” or “You consistently consume low protein on weekends”. These insights could be surfaced by the AI. We could implement this via a background analysis that flags patterns and either tag the data (e.g. tag certain weeks as “overeat_week”) or have the AI compute it on the fly. If implementing algorithmically, one might use simple stats or even ML on the time series of data. A specific extension could be integrating with libraries like Pandas or using a small TensorFlow model to predict weight change based on calories (just as an idea). However, even rule-based insights are valuable.

  • Broader Health Context Integration: MyFitnessPal data could be combined with other personal data for richer context. For example, if the user also connects Apple Health or Google Fit data (steps, heart rate, sleep), the persona could correlate these: “On days you sleep < 6 hours, you tend to consume more sugar.” Our connector is a blueprint – additional connectors could feed in sleep data, step counts, etc., all normalized in a similar semantic way (possibly using standards like HL7 FHIR Observations for health metrics). The MCP server could host multiple endpoints (e.g. /mcp/applehealth/steps, /mcp/myfitnesspal) or unify them. The current design is modular enough to add more sources without disrupting the AI, since each is just another context source.

  • Interactive Queries via MCP: Right now, the MCP endpoints are fixed to predefined data. In the future, we could implement a query interface where the AI can ask complex questions and get answers. For instance, an endpoint /mcp/myfitnesspal/query where the AI can send a query like {"operation": "average", "field": "protein_g", "since": "2025-07-01", "until": "2025-07-31"} and the server computes the answer (e.g. returns {"average_protein_g": 102.3}). This would offload computation from the AI and ensure accuracy for numeric queries. A more standardized approach could be to integrate a small query language or use existing ones like JSONiq or even SQL via an API. This is speculative, but as the MCP ecosystem grows, tools might emerge to allow such data queries.
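
As a sketch of the summary and query ideas above — assuming daily records shaped like the connector's output, stored as YYYY-MM-DD.json files in a vault directory — a hypothetical average_field helper could back both a weekly summary and a {"operation": "average", ...} query endpoint:

```python
import json
import os
from statistics import mean

def average_field(vault_dir: str, field: str, since: str, until: str):
    """Average one nutrition field over stored daily records in [since, until].

    Relies on ISO date filenames (YYYY-MM-DD.json) sorting chronologically,
    so a plain string comparison is enough to select the date range.
    Returns None when no records fall in the range.
    """
    values = []
    for name in sorted(os.listdir(vault_dir)):
        if not name.endswith(".json"):
            continue
        day = name[:-5]  # strip the ".json" suffix
        if since <= day <= until:
            with open(os.path.join(vault_dir, name)) as f:
                record = json.load(f)
            values.append(record["nutrition"][field])
    return round(mean(values), 1) if values else None

# Hypothetical usage:
# average_field("memory_vault/health/myfitnesspal",
#               "protein_g", "2025-07-01", "2025-07-31")
```

Computing such aggregates server-side keeps numeric answers exact, rather than asking the language model to do arithmetic over raw records.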

Codex-Friendly Implementation Prompt: To streamline development of this connector (especially for those using AI coding assistants like GitHub Copilot or OpenAI Codex), we can craft a prompt that describes the task succinctly. For example, a prompt for an AI code generator could be:

Implement a Python FastAPI server that serves MyFitnessPal data via a Model Context Protocol (MCP) API. Use the python-myfitnesspal library to fetch data.  

Requirements:  
- A function to fetch daily data (calories, carbs, fat, protein, sodium, sugar, water, exercises, weight) for a given date using myfitnesspal.Client.  
- Normalize the data into a JSON with fields: date, nutrition totals, water (ml), weight (kg), exercises list (with name, minutes/calories or sets/reps/weight).  

- Provide GET endpoints:  
    * /mcp/myfitnesspal/daily/{date} -> returns JSON for that date.  
    * /mcp/myfitnesspal/latest -> returns JSON for the most recent date.  

- Secure the endpoints with a token and run on localhost.  
- Handle errors: if data not available or fetch fails, return 404 or error message.  
- Use FastAPI and uvicorn to run the server.  
This prompt instructs a GPT-powered IDE to scaffold the code. We would also include guidance about browser_cookie3 (the library uses it automatically) and perhaps an example of parsing exercises. The assistant would then generate code similar to our snippet. We mention this to highlight that the project can leverage AI not just as the end persona, but also in building its components – a meta approach consistent with rapid prototyping.

  • User Interface & Visualization: Down the line, the data could also be presented in a UI for the user (outside the AI). For instance, a simple web dashboard that reads the same JSON files to show graphs of weight over time, calorie intake vs goal, etc. This wasn’t the main ask, but since we are already storing the data nicely, it’s an easy addition. The Digital Persona could even describe or annotate those graphs using its context (an interplay of visual data and commentary).

In conclusion, the MyFitnessPal connector via MCP provides a robust foundation for the Digital Persona to incorporate health and fitness data into its model of the user. By following the above architecture and best practices, we ensure the integration is secure, user-centric, and extensible. The AI will be empowered with factual, up-to-date knowledge of the user’s physical data, enabling more personalized and context-aware interactions – all while respecting the user’s privacy and agency over their own data. This is a concrete step toward a digital twin that truly knows the user’s daily life and can support their goals in an ethical way.

Sources: The approach draws on project guidelines for data ingestion, memory architecture recommendations, and official documentation of the MyFitnessPal Python API for data retrieval (diary totals, exercises, measurements). Security and privacy measures are informed by the project’s ethical implementation papers and the overall mission of a privacy-first personal AI. The MCP concept is based on Anthropic’s open standard for connecting AI to data sources, which this design leverages in a localized context.
