How structured is your data?

In my opinion, asking if data is "structured" is less like asking "what type of meat is this?" and much more like asking "how cooked is this meat?"

Purely unstructured data is data with no formal model that it has been forced to conform to.

The difficulty is that data is often "implicitly structured". Per Wikipedia:

The term is imprecise for several reasons:

  • Structure, while not formally defined, can still be implied.
  • Data with some form of structure may still be characterized as unstructured if its structure is not helpful for the processing task at hand.
  • Unstructured information might have some structure (semi-structured) or even be highly structured but in ways that are unanticipated or unannounced.

For Data Engineering, the definition is a bit easier to approach. Do you have a schema that the data has been forced to adhere to? Then it's structured. If not, it's unstructured.
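As a rough illustration of that test, here is a minimal sketch in plain Python. The schema, the field names, and the `conforms` helper are all hypothetical, made up for this example; a real pipeline would more likely rely on a database or a validation library to enforce the schema.

```python
from typing import Any, Dict

# Hypothetical schema: field name -> required Python type.
USER_SCHEMA: Dict[str, type] = {
    "user_id": str,
    "first_name": str,
    "number_of_points": int,
}


def conforms(record: Dict[str, Any], schema: Dict[str, type]) -> bool:
    """Return True only if the record has exactly the schema's fields, each with the expected type."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[name], expected) for name, expected in schema.items())


# Fits the schema, so by the definition above it is structured.
structured = {"user_id": "u-1", "first_name": "Ada", "number_of_points": 10}

# Has fields the schema never anticipated and is missing others.
unstructured = {"user_id": "u-2", "notes": "called twice, prefers email"}

print(conforms(structured, USER_SCHEMA))    # True
print(conforms(unstructured, USER_SCHEMA))  # False
```

Note that the second record is not meaningless. It simply has not been forced into this particular schema.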

This is why we often keep unstructured data in a Data Lake, then move structured data to a Data Warehouse using a structure that makes sense for our current purposes. Data structures should be prepared for iteration and change as the project evolves. This is, of course, a spectrum: a single record can contain both structured and unstructured pieces.

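from dataclasses import dataclass
from typing import Any, Dict
from uuid import UUID


@dataclass
class User:

    # Highly structured user data: fixed fields with fixed types
    user_id: UUID
    first_name: str
    number_of_points: int
    dollars_in_wallet: float

    # Semi-structured user data: keys are strings, values can be anything
    special_tag_cloud: Dict[str, Any]

    # (Almost) unstructured user data
    # ("almost" because it is still tied to a user)
    special_data: Any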

All data can be "structured", but in structuring any data we risk losing whatever part of it does not conform to the schema.
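To make that risk concrete, here is a small sketch. The field names, the `referral_source` value, and the `force_into_schema` helper are invented for illustration. It forces a raw record into a fixed set of fields and reports what fell outside the schema.

```python
from typing import Any, Dict, Tuple

# Hypothetical warehouse schema: only these fields survive.
SCHEMA_FIELDS = ("user_id", "first_name", "number_of_points")


def force_into_schema(raw: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a raw record into the part that fits the schema and the part that does not."""
    kept = {name: raw.get(name) for name in SCHEMA_FIELDS}
    dropped = {name: value for name, value in raw.items() if name not in SCHEMA_FIELDS}
    return kept, dropped


raw_event = {
    "user_id": "u-1",
    "first_name": "Ada",
    "number_of_points": 10,
    "referral_source": "newsletter",  # not anticipated by the schema
}

kept, dropped = force_into_schema(raw_event)
print(kept)
print(dropped)  # {'referral_source': 'newsletter'}
```

If only `kept` is written to the warehouse, `referral_source` is gone unless the raw record is also retained somewhere, for example in the lake.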