Concepts - aemadrid/orientdb GitHub Wiki

Main concepts of OrientDB


Storage

It's the real physical database. It can be:

  • local, where the access is made in the same process
  • remote, by using the network to access a remote storage
  • memory, where all data remains in memory without using the file system at all

A Storage is composed of multiple Clusters and Data Segments. To avoid data corruption, move the real files in your file system only by using the OrientDB APIs.

Cluster

OrientDB uses clusters to store links to the data. A cluster is a very generic way to group records, and it is a concept that does not exist in the Relational world. You can use a cluster to group all the records of a certain type, or by a specific value. Examples:

  • Use the cluster "Person" to group all the records of type "Person". This approach is similar to the RDBMS where each table is a cluster.
  • Use the cluster "Cache" to group the most frequently accessed records.
  • Use the cluster "Today" to group all the records created today.
  • Use the cluster "CityCar" to group all the city cars.

These are some examples of the clustering concept. If you have a background in the Relational DBMS world, you can think of a cluster as a table and use it to group all the records by type.

A cluster can be local (physical) or in-memory.

Note: Logical Clusters are no longer supported as of 1.0.

Local Physical Cluster

Each local physical cluster is mapped to its own files in the underlying file system. It uses two or more files: one or more files with the extension "ocl" (OrientDB Cluster) and exactly one file with the extension "och" (OrientDB Cluster Holes).

For example, if you create the "Person" cluster, the following files will be created in the folder that contains your database:

  • person.0.ocl
  • person.och

The first file contains the pointers to the record content in ODA (OrientDB Data Segment). The '0' in the name indicates that more successive data files can be created for this cluster. You can split a physical cluster into multiple real files. This behavior depends on your configuration. When a cluster file is full, a new file will be used.

The second file is the "Hole" file that stores the holes in the cluster that were generated by deleted data.

NOTE: You can move real files in your file system only by using the OrientDB APIs.

In-Memory cluster

The information stored in this kind of cluster is volatile and is never stored on disk. Use this cluster only to work with temporary data. If you need an In-Memory database, create it as an In-memory Database. In-memory databases have only In-memory clusters.

Data Segment

OrientDB uses data segments to store the record content. A data segment behaves similarly to the physical cluster files: it uses two or more files. These are one or more files with the extension "oda" (OrientDB DAta) and exactly one file with the extension "odh" (OrientDB Data Holes).

By default OrientDB creates the first data segment named "default". In the folder that contains your database you will find the following files:

  • default.0.oda
  • default.odh

The first file is the one that contains the real data. The '0' in the name indicates that more successive data files can be created for this data segment. You can split a data segment into multiple real files. This behavior depends on your configuration. When a data segment file is full, a new file will be used.

NOTE: You can move real files in your file system only by using the OrientDB APIs.

Interaction between components: load record use case:
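The descriptions above (a cluster holds pointers, a data segment holds content) suggest a two-step load path. The following is a minimal in-memory sketch of that idea only. The class name `LoadRecordSketch` and its maps are hypothetical and do not reflect OrientDB's real file layout or API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only: not OrientDB's on-disk format or API.
class LoadRecordSketch {
    // Cluster: record position -> offset of the content in the data segment.
    private final Map<Long, Integer> clusterPointers = new HashMap<>();
    // Data segment: offset -> record content.
    private final Map<Integer, String> dataSegment = new HashMap<>();
    private long nextPosition = 0;
    private int nextOffset = 0;

    /** Stores the content in the data segment and registers a pointer in the cluster. */
    long store(String content) {
        int offset = nextOffset++;
        dataSegment.put(offset, content);
        long position = nextPosition++;
        clusterPointers.put(position, offset);
        return position;
    }

    /** Loads a record: cluster lookup first, then a read from the data segment. */
    String load(long position) {
        Integer offset = clusterPointers.get(position);
        if (offset == null) {
            return null; // no such record in this cluster
        }
        return dataSegment.get(offset);
    }
}
```

The point of the indirection is that the cluster stays small (pointers only) while the record content lives in the data segment.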

Record

A record is the smallest unit that can be loaded from and stored in the database.

Record types

There are several types of records.

Document

It's the most flexible record type available in OrientDB. It's softly typed: types are defined by schema classes with their constraints, but documents can also be used in schema-less mode. Fields are handled in a flexible way. A document can easily be imported and exported in JSON format. Example of a Document in JSON format:

    {
      "name": "Jay",
      "surname": "Miner",
      "job": "Developer",
      "creations": [
        { "name": "Amiga 1000",
          "company": "Commodore Inc."
        },
        { "name": "Amiga 500",
          "company": "Commodore Inc."
        }
      ]
    }

OrientDB Documents support complex relationships. From a programmer's perspective this can be seen as a sort of persistent Map<String,Object>.
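To make the Map<String,Object> analogy concrete, here is a small sketch that builds the JSON document above using plain Java collections. It deliberately does not use any OrientDB class. `DocumentAsMapSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy of a document; not the OrientDB document API.
class DocumentAsMapSketch {
    /** Builds one entry of the "creations" list. */
    static Map<String, Object> creation(String name, String company) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("name", name);
        c.put("company", company);
        return c;
    }

    /** Builds the same structure as the JSON example. */
    static Map<String, Object> jayMiner() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Jay");
        doc.put("surname", "Miner");
        doc.put("job", "Developer");
        List<Map<String, Object>> creations = new ArrayList<>();
        creations.add(creation("Amiga 1000", "Commodore Inc."));
        creations.add(creation("Amiga 500", "Commodore Inc."));
        doc.put("creations", creations);
        return doc;
    }
}
```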

Flat

Records are strings. No fields are supported, no indexing, no schema.

RecordID

In OrientDB, each record has a unique ID. The RecordID is composed in this way:

    #<cluster>:<position>

Where:

  • cluster is the cluster id. Positive numbers mean physical clusters. Negative numbers mean temporary records, like those used in result sets for queries that use projections.
  • position is the absolute position of the record inside the cluster.

NOTE: After the release 1.0rc4 the prefix character # is mandatory to recognize a RecordID.
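As an illustration, a RecordID can be split into its two parts with a few lines of code. This sketch assumes the textual form `#<cluster>:<position>` (for example `#5:23`). The class `RecordIdSketch` is hypothetical and is not OrientDB's own RecordID implementation.

```java
// Hypothetical helper for illustration; not an OrientDB class.
class RecordIdSketch {
    final int cluster;
    final long position;

    RecordIdSketch(int cluster, long position) {
        this.cluster = cluster;
        this.position = position;
    }

    /** Parses text such as "#5:23"; the '#' prefix is required. */
    static RecordIdSketch parse(String text) {
        if (text == null || !text.startsWith("#")) {
            throw new IllegalArgumentException("RecordID must start with '#': " + text);
        }
        int colon = text.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("RecordID must contain ':': " + text);
        }
        int cluster = Integer.parseInt(text.substring(1, colon));
        long position = Long.parseLong(text.substring(colon + 1));
        return new RecordIdSketch(cluster, position);
    }

    /** Negative cluster ids denote temporary records. */
    boolean isTemporary() {
        return cluster < 0;
    }

    @Override
    public String toString() {
        return "#" + cluster + ":" + position;
    }
}
```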

A record never loses its identity unless it is deleted. Once it is deleted, its identity could be recycled and assigned to a new record. See the Inverse relationships section to know more about this.

You can access a record directly by its RecordID. For this reason you don't need to create a field as a primary key like in a Relational DBMS.

Record version

Each record maintains its own version number that is incremented at every update. When a record is created, the version is zero. In optimistic transactions the version is checked in order to avoid conflicts at commit time.
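The version check can be pictured as a compare-then-increment step. The sketch below is a simplified, single-threaded model of that idea under the rules stated above (version starts at zero and is incremented at every update). `VersionCheckSketch` is a hypothetical name, and this is not how OrientDB implements transactions internally.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified single-threaded model of an optimistic version check.
class VersionCheckSketch {
    private static final class Stored {
        String content;
        int version;
    }

    private final Map<String, Stored> records = new HashMap<>();

    /** A newly created record starts at version zero. */
    void create(String rid, String content) {
        Stored s = new Stored();
        s.content = content;
        s.version = 0;
        records.put(rid, s);
    }

    int versionOf(String rid) {
        return records.get(rid).version;
    }

    /** Applies the update only if the version the caller read is still current. */
    boolean update(String rid, String newContent, int expectedVersion) {
        Stored s = records.get(rid);
        if (s == null || s.version != expectedVersion) {
            return false; // conflict: someone else updated the record first
        }
        s.content = newContent;
        s.version++;
        return true;
    }
}
```

A caller that receives `false` would typically reload the record and retry with the fresh version.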

Class

A Class is a concept taken from the Object Oriented paradigm. In OrientDB it defines a type of record. It's the closest concept to a Relational DBMS Table. A Class can be schema-less, schema-full or mixed.

A class can inherit from another, shaping a tree of classes. Inheritance means that the sub-class extends the parent one, inheriting all the attributes.

Each class has its own clusters. A class must have at least one cluster defined (its default cluster), but can have more than one. In that case, OrientDB by default writes new records to the default cluster, but reads always involve all the defined clusters.

When you create a new class, by default a new physical cluster is created with the same name as the class, in lowercase.
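The relationship between a class and its clusters can be modeled in a few lines. This is a sketch of the rules described above (default cluster named after the class in lowercase, writes go to the default cluster, reads span all clusters). `ClassClustersSketch` is a hypothetical name, not an OrientDB type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical model of a class and its clusters; not the OrientDB schema API.
class ClassClustersSketch {
    final String className;
    private final List<String> clusters = new ArrayList<>();

    ClassClustersSketch(String className) {
        this.className = className;
        // The default cluster takes the class name in lowercase.
        clusters.add(className.toLowerCase(Locale.ROOT));
    }

    void addCluster(String name) {
        clusters.add(name);
    }

    /** New records are written to the default cluster. */
    String clusterForWrites() {
        return clusters.get(0);
    }

    /** Reads involve every cluster defined for the class. */
    List<String> clustersForReads() {
        return new ArrayList<>(clusters);
    }
}
```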

Abstract Class

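If you know Object Orientation, you already know what an abstract class is. For everyone else: an abstract class is a class that serves only as a parent for other classes and cannot have records of its own.

To create a new abstract class, see the class creation documentation. Abstract classes are useful to fully support Object Orientation without cluttering the database with auto-created clusters that are always empty. NOTE: available since 1.2.0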

When to use class or cluster in queries?

Look at this example: you create the class "Invoice" and the 2 clusters "invoice2011" and "invoice2012". This allows you to query all the invoices by using the class as the target in a SQL select:

    SELECT FROM Invoice

If you want to filter by the year 2012 and you've created a "year" field in the Invoice class, do:

    SELECT FROM Invoice WHERE year = 2012

But splitting the Class Invoice in multiple clusters and inserting the invoice in the right cluster, one per year, allows you to reach the same goal using:

    SELECT FROM cluster:invoice2012

This is much faster because OrientDB doesn't need to browse all the clusters, only the right one.

The Class/Cluster combination is very powerful and lets you solve many use cases.
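On the application side, the one-cluster-per-year approach comes down to deriving the cluster name from the year. The helper below is a hypothetical sketch that only builds the query strings shown above. It assumes the `invoice<year>` naming used in this example and does not execute anything against a database.

```java
// Hypothetical helper that builds the query strings from the example.
class InvoiceClusterSketch {
    /** Cluster naming convention assumed from the example: "invoice" + year. */
    static String clusterNameFor(int year) {
        return "invoice" + year;
    }

    /** Query that filters on the "year" field across all clusters of the class. */
    static String queryByField(int year) {
        return "SELECT FROM Invoice WHERE year = " + year;
    }

    /** Query that targets only the cluster for the given year. */
    static String queryByCluster(int year) {
        return "SELECT FROM cluster:" + clusterNameFor(year);
    }
}
```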

Relationships

OrientDB supports two kinds of relationships: referenced and embedded. OrientDB can manage relationships in a schema-full scenario (see http://code.google.com/p/orient/wiki/Schema#Define_relationships) or in a schema-less scenario.

Referenced relationships

Relationships in OrientDB are managed natively without computing costly JOINs as in Relational DBMSs. In fact, OrientDB stores the direct link(s) to the target objects of the relationship. This speeds up loading an entire graph of connected objects, as in Graph and Object DBMSs. Example:

                      customer
      Record A     ------------->    Record B
    CLASS=Invoice                 CLASS=Customer
      RID=5:23                       RID=10:2

Record A will contain the reference to Record B in the property called "customer". Note that both records are reachable by other records since they have a RecordID.

1-1 and N-1 referenced relationships

These kinds of relationships are expressed using the LINK type.

1-N and N-M referenced relationships

These kinds of relationships are expressed using a collection of links, such as:

  • LINKLIST, as an ordered list of links
  • LINKSET, as an unordered set of links. It doesn't accept duplicates
  • LINKMAP, as an ordered map of links with a String as key. It doesn't accept duplicate keys
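For readers who think in Java collections, the three link collection types behave roughly like a List, a Set and a Map of RecordIDs. The sketch below uses plain Java collections holding RID strings as an analogy only. `LinkCollectionsSketch` is hypothetical, and the replace-on-duplicate-key behavior shown for the map is Java's, which is an assumption rather than a statement about LINKMAP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Analogy with plain Java collections; not the OrientDB collection types.
class LinkCollectionsSketch {
    /** LINKLIST analogue: keeps order and allows the same link twice. */
    static List<String> asLinkList(String... rids) {
        return new ArrayList<>(Arrays.asList(rids));
    }

    /** LINKSET analogue: duplicates collapse into one entry. */
    static Set<String> asLinkSet(String... rids) {
        return new HashSet<>(Arrays.asList(rids));
    }

    /** LINKMAP analogue: String keys, each key present at most once. */
    static Map<String, String> linkMapExample() {
        Map<String, String> links = new LinkedHashMap<>();
        links.put("billing", "#10:2");
        links.put("shipping", "#10:3");
        links.put("billing", "#10:4"); // same key: Java replaces the previous value
        return links;
    }
}
```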

Embedded relationships

Embedded records, instead, are contained inside the record that embeds them. This is a stronger kind of relationship than the reference. It can be represented like the UML Composition relationship. The embedded record will not have its own RecordID, since it can't be directly referenced by other records. It's only accessible through the container record. If the container record is deleted, then the embedded record will be deleted too. Example:

                      address
      Record A     <>---------->   Record B
    CLASS=Account               CLASS=Address
      RID=5:23                     NO RID!

Record A will contain the entire Record B in the property called "address". Record B can be reached only by traversing the container record.

Example:

    SELECT FROM account WHERE address.city = 'Rome'
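The difference between the two kinds of relationship can be shown with plain maps: a referenced record is stored as a RecordID, while an embedded record is stored as a nested structure that a dotted path such as `address.city` can walk into. This is an analogy in plain Java, with hypothetical names, and not OrientDB's field resolution code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java analogy; class and method names are hypothetical.
class EmbeddedVsReferencedSketch {
    /** Invoice with a referenced customer: only the RecordID is stored. */
    static Map<String, Object> invoice() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("customer", "#10:2");
        return doc;
    }

    /** Account with an embedded address: the whole record is stored inside. */
    static Map<String, Object> account() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Rome");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("address", address);
        return doc;
    }

    /** Resolves a dotted path such as "address.city" through embedded maps. */
    static Object field(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String part : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-embedded value
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }
}
```

In this sketch a link is just a string, so following `customer.name` yields nothing. A real database would load the linked record instead.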

1-1 and N-1 embedded relationships

These kinds of relationships are expressed using the EMBEDDED type.

1-N and N-M embedded relationships

These kinds of relationships are expressed using a collection of embedded records, such as:

  • EMBEDDEDLIST, as an ordered list of records
  • EMBEDDEDSET, as an unordered set of records. It doesn't accept duplicates
  • EMBEDDEDMAP, as an ordered map of records as values with a String as key. It doesn't accept duplicate keys

Inverse relationships

Until support for Inverse Relationships is implemented natively, the application developer is responsible for maintaining their integrity. For this reason, when a relationship is changed, the developer needs to update the referenced object by hand, removing the back relationship to the original.
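The manual bookkeeping described above can be sketched as follows: whenever the forward link changes, the back link on the old target is removed and a back link on the new target is added. This is an in-memory illustration with hypothetical names (`InverseLinkSketch`, `setCustomer`). A real application would apply the same steps to its records, ideally inside one transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// In-memory illustration of keeping both sides of a relationship in sync.
class InverseLinkSketch {
    // Forward side: invoice RID -> customer RID.
    private final Map<String, String> customerOfInvoice = new HashMap<>();
    // Back side: customer RID -> RIDs of its invoices.
    private final Map<String, Set<String>> invoicesOfCustomer = new HashMap<>();

    /** Sets the forward link and repairs the back links by hand. */
    void setCustomer(String invoice, String customer) {
        String previous = customerOfInvoice.put(invoice, customer);
        if (previous != null && !previous.equals(customer)) {
            Set<String> old = invoicesOfCustomer.get(previous);
            if (old != null) {
                old.remove(invoice); // remove the stale back relationship
            }
        }
        invoicesOfCustomer.computeIfAbsent(customer, k -> new HashSet<>()).add(invoice);
    }

    Set<String> invoicesOf(String customer) {
        return invoicesOfCustomer.getOrDefault(customer, new HashSet<>());
    }
}
```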

Database

A database is an interface to access the real Storage. The database knows all the high-level concepts such as Query, Schema, Metadata, Indexes, etc. OrientDB provides multiple database types. Take a look at the Database types to know more about them.

Each server or JVM can handle multiple database instances, but the database name must be UNIQUE. So you can't manage 2 databases named "customer" in 2 different paths at the same time. To handle this case, use $ (dollar) as the separator instead of / (slash). OrientDB will bind the entire name, so it will be unique, but at the file system level it will replace $ with /, allowing multiple databases with the same name in different paths. Example:

    test$customers -> test/customers
    production$customers -> production/customers
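The mapping from database name to file system path is a plain character substitution. A one-method sketch, with the hypothetical name `DatabaseNameSketch`, shows the conversion described above. It does not call into OrientDB.

```java
// Hypothetical helper showing the $ to / conversion.
class DatabaseNameSketch {
    /** Converts a unique database name into its file system path. */
    static String toFileSystemPath(String databaseName) {
        return databaseName.replace('$', '/');
    }
}
```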

The database must be opened as:

    ODatabaseDocumentTx test = new ODatabaseDocumentTx("remote:localhost/test$customers");
    ODatabaseDocumentTx production = new ODatabaseDocumentTx("remote:localhost/production$customers");

Database URL

OrientDB has its own URL format:

    <engine>:<db-name>

Where:

  • db-name is the database name and depends on the engine used (see below)
  • engine can be:
      • remote: The storage is opened via a remote network connection. It requires an OrientDB Server up and running. In this mode, the database is shared among multiple clients. Syntax: remote:<server>:[<port>]/db-name. The port is optional and, if not specified, is 2424. Example: remote:localhost/petshop
      • local: Direct access via the local File System using the path. In this configuration OrientDB runs as embedded. The database can't be opened by multiple processes (if you want this, use "remote" instead). It's the fastest access because it avoids any network connection and transfer. Example: local:C:/temp/databases/petshop/petshop
      • memory: Opens a database completely in memory. Example: memory:petshop
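Splitting a URL of the form `<engine>:<db-name>` only requires cutting at the first colon, because the db-name part may itself contain colons (as in the local example). The sketch below, with the hypothetical name `DatabaseUrlSketch`, illustrates this. It is not OrientDB's own URL parser and it does not interpret the server or port inside a remote db-name.

```java
// Hypothetical parser for the <engine>:<db-name> format.
class DatabaseUrlSketch {
    final String engine;
    final String dbName;

    DatabaseUrlSketch(String engine, String dbName) {
        this.engine = engine;
        this.dbName = dbName;
    }

    /** Splits at the first ':' only, so paths such as C:/temp stay intact. */
    static DatabaseUrlSketch parse(String url) {
        int colon = url.indexOf(':');
        if (colon <= 0) {
            throw new IllegalArgumentException("Expected <engine>:<db-name>: " + url);
        }
        String engine = url.substring(0, colon);
        boolean known = engine.equals("remote") || engine.equals("local") || engine.equals("memory");
        if (!known) {
            throw new IllegalArgumentException("Unknown engine: " + engine);
        }
        return new DatabaseUrlSketch(engine, url.substring(colon + 1));
    }
}
```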

Database usage

The database must always be closed once you've finished working with it.

NOTE: OrientDB automatically closes all open storages when the process terminates gracefully (not when it is force-killed). This is guaranteed only if the Operating System allows a graceful shutdown.
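A common way to make sure the database is always closed is to put the close call in a finally block, so that it runs even when the work in between fails. The sketch below uses a stand-in class, `SketchDatabase`, which is hypothetical and only tracks an open/closed flag. With a real database object, the same try/finally shape would apply.

```java
// Stand-in types for illustration; SketchDatabase is not an OrientDB class.
class CloseSketch {
    static final class SketchDatabase {
        private boolean open = true;

        boolean isOpen() {
            return open;
        }

        void close() {
            open = false;
        }
    }

    /** Runs some work and closes the database whether or not the work fails. */
    static SketchDatabase useAndClose(boolean fail) {
        SketchDatabase db = new SketchDatabase();
        try {
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
        } catch (IllegalStateException e) {
            // A real application would log or rethrow here.
        } finally {
            db.close(); // always executed
        }
        return db;
    }
}
```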
