Active Issue: Relating Source - STIXProject/specifications GitHub Wiki
Consensus
Information identifying and characterizing sources of CTI should be broken out into a separate "top level" Source construct rather than embedded within each "top level" construct.
Open Questions
How should the relationship between a "top level" construct instance and its Source be asserted?
Questions to consider
- Should Source follow the "one way to do things" with relationships or should it be an exception to the rule?
- Is Source a key CTI object or only metadata?
- Should there be a distinction between the producer of the STIX and the source of the content?
  - If so, how should that distinction be conveyed?
- How do we deal with anonymous sources?
  - Should a separate Source object be created each time an anonymous source is asserted, or should one general anonymous Source object be the target of every anonymous source assertion?
- How do we deal with deanonymizing an anonymous source?
- How do we deal with third party source assertions?
- How do we deal with complex source chains (e.g., Z sends me STIX that is a translation of STIX produced by Y that was a STIX codification of information created by X)?
- How do we deal with uncertainty/confidence on source assertions?
- How important is bandwidth efficiency?
- What are the best approaches for dealing with the issue?
Proposal #1
Follow the "one way of doing things" for relationships and assert source relationships for all "top level" construct instances using the Relationship object with a relationship nature of "Has Source".
This proposal strongly asserts that Source is a key CTI object and not simply metadata.
Advantages:
- Consistency (one way of doing relationships)
- Treats Source as a key CTI object and allows its characterization and correlation like any other CTI object
- Inherently graph-based to support analysis
- Enables assertions for both the producer of the STIX itself and the creator of the content itself
  - In the large majority of cases these will be the same, and this approach allows them to be asserted consistently
- Enables support for anonymous sources and for deanonymizing sources
- Supports third party source assertions
- Inherently supports complex source chains in a consistent fashion
- Allows assertion of confidence for any source assertion
- When the exact same content is received from multiple sources, allows each source to be characterized (with confidence) separately
- Supports more flexible pivoting on Source
Disadvantages:
- Could result in more verbose content (a few extra lines for the "Has Source" relationship of each construct).
  - Can be mitigated by a many-to-one "Has Source" relationship, which would offer the most efficient representation available.
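The many-to-one mitigation could look roughly like the Python sketch below. The proposal does not define how a many-to-one relationship is serialized, so letting "from" hold a list of ids is purely an assumption here, as are the function name `compact_has_source` and the shortened ids.

```python
from collections import defaultdict
from typing import Dict, List


def compact_has_source(relationships: List[dict]) -> List[dict]:
    """Merge one-to-one "Has Source" relationships that share a target.

    ASSUMPTION: a many-to-one relationship is written by letting "from"
    hold a list of ids. The proposal does not specify this serialization.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for rel in relationships:
        if rel.get("relationship_nature") == "Has Source":
            grouped[rel["to"]].append(rel["from"])
    return [
        {
            "type": "relationship",
            "relationship_nature": "Has Source",
            "from": sorted(ids),
            "to": source_id,
        }
        for source_id, ids in grouped.items()
    ]


# Hypothetical, shortened ids: three constructs that share one Source.
rels = [
    {"type": "relationship", "from": f"example:ind-{n}", "to": "example:src-1",
     "relationship_nature": "Has Source"}
    for n in (1, 2, 3)
]
compact = compact_has_source(rels)
print(len(rels), "relationships reduced to", len(compact))
```

A real merge would also have to decide what happens to the id and timestamp of each original relationship, which this sketch drops.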
Examples
Example #1: simple indicator with attributed source for the information
{
"id": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
"type": "source",
"timestamp": "2015-12-21T19:59:11Z",
"name": "US-CERT"
}
{
"id": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
"type": "indicator",
"timestamp": "2015-12-21T19:59:11Z",
"title": "Sakurel Malware",
"indicator_expression": "this would be an observable pattern for a particular file hash using the new CybOX patterning language under consideration",
"indicator_type": ["File Hash Watchlist"]
}
{
"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f",
"type": "relationship",
"timestamp": "2015-12-21T19:59:12Z",
"from": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
"to": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
"relationship_nature": "Has Source"
}
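A consumer of the three objects above would find an indicator's source by following the "Has Source" relationship. The Python sketch below is one minimal way to do that; the helper name `sources_of` is hypothetical, and the long indicator_expression value is left out for brevity.

```python
from typing import Dict, List

# Objects from Example #1 (timestamps and indicator_expression omitted).
objects: List[dict] = [
    {"id": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
     "type": "source", "name": "US-CERT"},
    {"id": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
     "type": "indicator", "title": "Sakurel Malware"},
    {"id": "example:rel-9d0c539e-a874-42c7-a055-3e900b98724f",
     "type": "relationship",
     "from": "example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432",
     "to": "example:src-83dc6b53-ac3d-40e0-82ef-eab173c7ee1e",
     "relationship_nature": "Has Source"},
]


def sources_of(object_id: str, objects: List[dict]) -> List[dict]:
    """Return every object that object_id points to via "Has Source"."""
    by_id: Dict[str, dict] = {o["id"]: o for o in objects}
    return [
        by_id[o["to"]]
        for o in objects
        if o["type"] == "relationship"
        and o.get("relationship_nature") == "Has Source"
        and o.get("from") == object_id
        and o.get("to") in by_id
    ]


names = [s["name"] for s in
         sources_of("example:ind-b8e37090-5d62-45a1-ac2e-a88601b08432", objects)]
print(names)
```

Because the lookup returns a list, the same code covers the case where several parties assert a source for the same construct.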
Proposal #2 (from TWIGS proposal)
We believe that the "producer" of a STIX object is a distinct data point from the sources of information used for the analysis. We believe that there's value in understanding which single STIX producer created an object and is responsible for it separately from understanding the sources that the producer used in creating it.
In our approach, the producer is tracked via a direct reference from every top-level object. This reference would be included as an optional field on all objects, including relationship. We've tentatively named this field created_by_ref.
We're open to various options for how to capture bibliography-style information (sources used to generate the STIX object), but do have a proposal as listed in "Other Proposals" below.
Short example:
{
"type": "indicator",
"id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"title": "Some indicator",
"created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
"otherindicatorthings":"..."
}
In the example above, if this indicator were created by MITRE, then the created_by_ref field would point to the Source Object for MITRE, with the ID identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1. Note that, unlike in STIX 1.2, this Source object would be re-usable and would primarily capture Identity information (as it is in Sean's proposal).
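Resolving the producer under this proposal is a single lookup, as the Python sketch below tries to show. The helper name `creator_of` and the in-memory `identities` dictionary are assumptions for illustration; the identity name "mitre.org" is taken from the full example later in this proposal.

```python
from typing import Dict, Optional

# Identity objects keyed by id (hypothetical in-memory store).
identities: Dict[str, dict] = {
    "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1": {
        "type": "identity",
        "id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
        "name": "mitre.org",
    }
}

indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "title": "Some indicator",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
}


def creator_of(obj: dict, identities: Dict[str, dict]) -> Optional[dict]:
    """Follow created_by_ref to the producing identity.

    Returns None when the optional field is absent or when the
    referenced identity is not in the store.
    """
    ref = obj.get("created_by_ref")
    if ref is None:
        return None
    return identities.get(ref)


creator = creator_of(indicator, identities)
print(creator["name"] if creator else "unknown producer")
```

The field is optional in the proposal, so a consumer has to handle the None case rather than assume every object names its producer.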
Features
Assured non-ambiguity
Using a created_by_ref direct reference is a non-ambiguous way of saying who is responsible for publishing and updating that STIX object. With the relationship approach, that immediately becomes ambiguous and complicated due to the potential for multiple producers and low confidence producers. While the real source of information may be ambiguous and complicated, the STIX object creator should not be. This allows us to do things like mandate that only STIX Object creators can update content and better track responsibility in the ecosystem.
Object Creator can be anonymous if they want
The created_by_ref field is optional. This was done on purpose to allow Organizations or Individuals to be anonymous if they wish. This was an important use case requested by governmental organizations and some more secretive groups who wish to be able to provide information, but don't want the general populace to know who they are.
Easier to record who created relationships
A direct reference also avoids chains of "source" relationships. For example, if I issue an indicator and then issue a relationship object saying that I created the indicator, how do I indicate that I also created the relationship object? Do I need to have another "source" relationship saying that I'm the source of the first source relationship? Or do we assume that "source" relationships have a source of whoever they point to, which is inconsistent?
Having it as a direct reference in the TLO ensures there's a single, concise way to do that across all TLOs, even relationships.
More compatible with future digital signatures
This proposal will help when we start to cryptographically check for tampering of STIX Objects in the future. Using a created_by_ref direct reference also ensures that the creator of a STIX Object is included in the HMAC for the Object. This means we will be able to tell if someone has tampered with that reference. When the content is passed around as a block, you can understand who created it and, when it is signed by the creator, be assured that it is accurate.
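The Python sketch below illustrates the point with the standard-library `hmac` module: because created_by_ref sits inside the object, changing it changes the MAC. The shared key and the canonical form (JSON with sorted keys and compact separators) are assumptions made for this sketch, not part of the proposal. An HMAC also relies on a shared secret, so a public-key signature scheme would look different.

```python
import hashlib
import hmac
import json


def object_mac(obj: dict, key: bytes) -> str:
    """HMAC-SHA256 over a canonical serialization of the whole object.

    ASSUMPTION: canonical form is JSON with sorted keys and no
    insignificant whitespace; the proposal does not define one.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


key = b"hypothetical-shared-secret"

indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "title": "Some indicator",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
}
mac = object_mac(indicator, key)

# Someone rewrites the creator reference after the MAC was computed.
tampered = dict(indicator,
                created_by_ref="identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56")

untouched_ok = hmac.compare_digest(mac, object_mac(indicator, key))
tampered_ok = hmac.compare_digest(mac, object_mac(tampered, key))
print(untouched_ok, tampered_ok)
```

With a separate "Has Source" relationship object, the creator assertion would live outside the protected object and would need its own protection.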
Avoids superfluous relationship objects
We feel that the relationship object should be reserved to represent relationships between objects in the cyber threat domain. You can use relationships to represent everything, but that doesn't mean you should. Using it to represent who created a given STIX construct is beyond that purpose.
It also simply avoids either a high volume of extra relationships (an additional one for each TLO) or having a relationship with multiple target nodes. While a relationship with multiple targets is easy to represent in a serialization, handling that in code can become very tricky and should be avoided.
Helps prevent false ownership claims
This approach also makes it harder for another party to claim ownership of an existing construct. For example, if I issue an indicator I would say that I created it via issuing a relationship. What if you issue another relationship saying that you actually created that indicator? How should a consumer evaluate that? Having the source directly embedded in the object mitigates this by requiring an object update to change the source in the object itself, which can more easily be detected and evaluated.
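Detecting such a change could be as simple as comparing two versions of the same object, as in the Python sketch below. The helper name `creator_changed` is hypothetical, and how a consumer obtains the earlier version is outside the scope of this proposal.

```python
def creator_changed(old: dict, new: dict) -> bool:
    """True when two versions of the same object name different creators."""
    if old["id"] != new["id"]:
        raise ValueError("not two versions of the same object")
    return old.get("created_by_ref") != new.get("created_by_ref")


original = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
}
# A later copy in which another party claims to be the creator.
claimed = dict(original,
               created_by_ref="identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56")

print(creator_changed(original, original))
print(creator_changed(original, claimed))
```

Under the relationship approach there is no object update to compare, only a second relationship whose weight the consumer has to judge.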
Disadvantages:
- With the current proposal, less flexibility in expressing confidence for the references to the sources of information used within the STIX object.
  - Can be mitigated by replacing the references array with a relationship-based model if the community decides it requires the confidence.
Full Example
{
"identities": [
{
"type": "identity",
"id": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
"name": "mitre.org"
},
{
"type": "identity",
"id": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56",
"name": "fireeye.com"
}
],
"indicators": [
{
"type": "indicator",
"id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
"timestamp": "2015-12-21T19:59:11Z",
"references": [
{
"url": "http://fireeye.com/APT1",
"analysis_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"
}
],
"title": "Sakurel Malware",
"indicator_expression": "this would be an observable pattern for a particular file hash using the new CybOX patterning language under consideration",
"indicator_type": ["File Hash Watchlist"]
}
]
}
Related TWIGS Proposals
Create an Identity Object
We feel that the InformationSourceType in STIX 1.2 should be restricted to Identity information only. Removing references and other information ensures that these source objects are re-usable across constructs.
Capturing Original Source Information
To record the underlying sources of information used to create the STIX Object, we propose an embedded list of references in which the object creator describes the material they drew on while creating the STIX Object.
e.g.
{
"type": "indicator",
"id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
"title": "Some indicator",
"created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
"references": [
{
"url": "http://fireeye.com/APT1",
"created_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56"
}
],
"otherindicatorthings":"..."
}
In the example above, we can see that this Indicator Object has a url pointing to the location of the document, and a created_by_ref field (in the reference) that points directly to the source identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56, which would be the ID of FireEye.
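A consumer could list the documents behind an object, together with who produced each one, roughly as in the Python sketch below. The helper name `reference_producers` and the `identities` lookup table are assumptions for illustration, and the field names follow the short example in this section.

```python
from typing import Dict, List, Optional, Tuple

identities: Dict[str, dict] = {
    "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56": {
        "type": "identity",
        "id": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56",
        "name": "fireeye.com",
    }
}

indicator = {
    "type": "indicator",
    "id": "indicator--26ffb872-1dd9-446e-b6f5-d58527e5b5d2",
    "title": "Some indicator",
    "created_by_ref": "identity--8ae20dde-83d4-4218-88fd-41ef0dabf9d1",
    "references": [
        {
            "url": "http://fireeye.com/APT1",
            "created_by_ref": "identity--18d129bb-71f5-4d58-a8c0-19c1976c2f56",
        }
    ],
}


def reference_producers(
    obj: dict, identities: Dict[str, dict]
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Return (url, producer name) for each embedded reference.

    The producer name is None when the reference has no created_by_ref
    or when the identity it points to is unknown.
    """
    pairs = []
    for ref in obj.get("references", []):
        producer = identities.get(ref.get("created_by_ref"), {})
        pairs.append((ref.get("url"), producer.get("name")))
    return pairs


pairs = reference_producers(indicator, identities)
print(pairs)
```

Note that the object's own created_by_ref (the STIX producer) is never consulted here, which reflects the separation between producer and information source that this proposal argues for.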
The exact approach for storing references would still need to be discussed and confirmed, though we do have a ROUGH diagram showing how the relationships would work when tracking who made the object and where the information used in the object came from: https://docs.google.com/drawings/d/1IfU0u_5y2ZbyEbmrLIo5nXgcBX-wSP3ssQjAB8-9iks/edit?usp=sharing
We're open to the idea that the reference producers could be expressed via a relationship if the community felt that was better.