Concepts - aemadrid/orientdb GitHub Wiki
A Storage is the real physical database. It can be:
- local, where the access is made in the same process
- remote, by using the network to access a remote storage
- memory, where all data remains in memory and the file system is not used at all
A Storage is composed of multiple Clusters and Data Segments. To avoid data corruption, move the real files in your file system only by using the OrientDB APIs.
OrientDB uses clusters to store links to the data. A cluster is a very generic way to group records, and it is a concept that does not exist in the Relational world. You can use a cluster to group all the records of a certain type, or by a specific value. Examples:
- Use the cluster "Person" to group all the records of type "Person". This approach is similar to the RDBMS where each table is a cluster.
- Use the cluster "Cache" to group all the records most accessed.
- Use the cluster "Today" to group all the records created today
- Use the cluster "CityCar" to group all the city cars
These are just some examples of how clusters can be used. If you have a background in the Relational DBMS world, you can think of a cluster as a table and use it to group all the records by type.
A cluster can be local (physical) or in-memory.
Note: Logical Clusters are not supported anymore since 1.0.
A physical cluster is mapped directly to files in the underlying File System. It uses two or more files: one or more files with the extension "ocl" (OrientDB Cluster) and exactly one file with the extension "och" (OrientDB Cluster Holes).
For example, if you create the "Person" cluster, the following files will be created in the folder that contains your database:
- person.0.ocl
- person.och
The first file contains the pointers to the record content in ODA (OrientDB Data Segment). The '0' in the name indicates that more successive data files can be created for this cluster. You can split a physical cluster into multiple real files. This behavior depends on your configuration. When a cluster file is full, a new file will be used.
The second file is the "Hole" file that stores the holes in the cluster that were generated by deleted data.
NOTE: You can move real files in your file system only by using the OrientDB APIs.
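The file layout described above can be sketched as a small naming helper. This is only an illustration in plain Java: the class `ClusterFilesSketch` and its method are hypothetical, and the sketch assumes that file names use the lowercased cluster name and that additional pointer files are numbered sequentially after 0, which the text implies but does not spell out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the OrientDB API.
final class ClusterFilesSketch {

    // Lists the file names expected for a physical cluster that is split
    // into fileCount pointer files (assumption: numbered 0, 1, 2, ...).
    static List<String> fileNames(String clusterName, int fileCount) {
        String base = clusterName.toLowerCase(Locale.ROOT);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < fileCount; i++) {
            names.add(base + "." + i + ".ocl"); // pointers to the record content
        }
        names.add(base + ".och"); // the single "holes" file
        return names;
    }
}
```

With one pointer file, `ClusterFilesSketch.fileNames("Person", 1)` yields the two names listed in the example above.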
The information stored in an in-memory cluster is volatile and is never stored on disk. Use this kind of cluster only to work with temporary data. If you need an in-memory database, create it as an In-memory Database. In-memory databases have only in-memory clusters.
OrientDB uses data segments to store the record content. A data segment behaves similarly to the physical cluster files: it uses two or more files. One or more files with the extension "oda" (OrientDB DAta) and exactly one file with the extension "odh" (OrientDB Data Holes).
By default OrientDB creates the first data segment named "default". In the folder that contains your database you will find the following files:
- default.0.oda
- default.odh
The first file is the one that contains the real data. The '0' in the name indicates that more successive data files can be created for this data segment. You can split a data segment into multiple real files. This behavior depends on your configuration. When a data segment file is full, a new file will be used.
NOTE: You can move real files in your file system only by using the OrientDB APIs.
Interaction between components: load record use case:
A record is the smallest unit that can be loaded from, and stored into, the database.
There are several types of records.
The Document is the most flexible record type available in OrientDB. It's softly typed: types are the schema classes with their defined constraints, but documents can also be used in schema-less mode. It handles fields in a flexible way. A document can be easily imported and exported in JSON format. Example of a Document in JSON format:
{
"name": "Jay",
"surname": "Miner",
"job": "Developer",
"creations": [
{ "name": "Amiga 1000",
"company": "Commodore Inc."
},
{ "name": "Amiga 500",
"company": "Commodore Inc."
}
]
}
OrientDB Documents support complex relationships. From a programmer's perspective this can be seen as a sort of persistent Map<String,Object>.
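The Map<String,Object> view can be sketched in plain Java, without using the OrientDB document API at all. The class `DocumentAsMap` and its methods are hypothetical names chosen for this illustration; the field names and values come from the JSON example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: models the JSON document as nested maps and lists.
// This is plain Java, not the OrientDB document API.
final class DocumentAsMap {

    // Builds one entry of the "creations" list.
    static Map<String, Object> creation(String name, String company) {
        Map<String, Object> c = new LinkedHashMap<>();
        c.put("name", name);
        c.put("company", company);
        return c;
    }

    // Builds the "Jay Miner" document from the JSON example.
    static Map<String, Object> jayMiner() {
        List<Map<String, Object>> creations = new ArrayList<>();
        creations.add(creation("Amiga 1000", "Commodore Inc."));
        creations.add(creation("Amiga 500", "Commodore Inc."));

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "Jay");
        doc.put("surname", "Miner");
        doc.put("job", "Developer");
        doc.put("creations", creations); // a field can hold a nested collection
        return doc;
    }
}
```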
Flat records are strings. No fields are supported, no indexing, no schema.
In OrientDB, each record has a unique ID. The RecordID is composed in this way: #<cluster>:<position>
Where:
- cluster, is the cluster id. Positive numbers mean physical clusters. Negative numbers mean temporary records, like those used in result sets for queries when using projections.
- position, is the absolute position of the record inside a cluster.
NOTE: After the release 1.0rc4 the prefix character # is mandatory to recognize a RecordID.
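As a rough sketch, a RecordID string can be split into its two parts as follows. It assumes the textual form is the `#` prefix, the cluster id, a `:` separator and the position (for example `#5:23`). The class `RecordIdSketch` is hypothetical and is not the OrientDB implementation; its validation rules are assumptions.

```java
// Hypothetical value class: parses a RecordID string such as "#5:23".
// Not the OrientDB API; names and validation rules are assumptions.
final class RecordIdSketch {
    final int cluster;   // cluster id; negative values denote temporary records
    final long position; // absolute position of the record inside the cluster

    RecordIdSketch(int cluster, long position) {
        this.cluster = cluster;
        this.position = position;
    }

    static RecordIdSketch parse(String text) {
        if (text == null || !text.startsWith("#")) {
            throw new IllegalArgumentException("RecordID must start with '#': " + text);
        }
        int sep = text.indexOf(':');
        if (sep < 0) {
            throw new IllegalArgumentException("RecordID must contain ':': " + text);
        }
        int cluster = Integer.parseInt(text.substring(1, sep));
        long position = Long.parseLong(text.substring(sep + 1));
        return new RecordIdSketch(cluster, position);
    }

    // Negative cluster ids mark temporary records.
    boolean isTemporary() {
        return cluster < 0;
    }

    @Override
    public String toString() {
        return "#" + cluster + ":" + position;
    }
}
```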
A record never loses its identity unless it is deleted. Once deleted, its identity could be recycled and assigned to a new record. See Inverse relationships to learn more about this.
You can access a record directly by its RecordID. For this reason you don't need to create a field as a primary key like in a Relational DBMS.
Each record maintains its own version number that is incremented at every update. When a record is created, the version is zero. In optimistic transactions the version is checked in order to avoid conflicts at commit time.
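The version check can be illustrated with a minimal in-memory sketch. The class `VersionedRecord` and the method `updateIfUnchanged` are hypothetical; this is not how OrientDB implements transactions, only the general compare-then-increment idea behind an optimistic check.

```java
// Hypothetical in-memory record with a version counter.
// Not OrientDB's transaction implementation.
final class VersionedRecord {
    private String content;
    private int version = 0; // a newly created record starts at version zero

    VersionedRecord(String content) {
        this.content = content;
    }

    synchronized int version() {
        return version;
    }

    synchronized String content() {
        return content;
    }

    // Applies the update only if the caller read the current version.
    // Returns false when the record changed since it was read (a conflict).
    synchronized boolean updateIfUnchanged(int versionSeenByCaller, String newContent) {
        if (versionSeenByCaller != version) {
            return false;
        }
        content = newContent;
        version++; // every successful update increments the version
        return true;
    }
}
```

Two writers that both read version 0 illustrate the conflict: the first update succeeds and moves the record to version 1, so the second update, still carrying version 0, is rejected and has to reload the record before retrying.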
A Class is a concept taken from the Object Oriented paradigm. In OrientDB it defines a type of record. It's the closest concept to a Relational DBMS Table. A class can be schema-less, schema-full or mixed.
A class can inherit from another, shaping a tree of classes. Inheritance means that the sub-class extends the parent one, inheriting all the attributes.
Each class has its clusters. A class must have at least one cluster defined (its default cluster), but can support multiple ones. In this case by default OrientDB will write new records in the default cluster, but reads will always involve all the defined clusters.
When you create a new class, by default a new physical cluster is created with the same name as the class in lowercase.
If you know Object Orientation you already know what an abstract class is. For everyone else:
- http://en.wikipedia.org/wiki/Abstract_type
- http://docs.oracle.com/javase/tutorial/java/IandI/abstract.html
In short, an abstract class is a class that can't have instances; it is usually used as a base class to be extended by concrete classes.
To create a new abstract class look at [...]
Abstract classes are useful to fully support Object Orientation without filling the database with auto-created clusters that are always empty.
NOTE: available since 1.2.0
Look at this example: you create the class "Invoice" and the 2 clusters "invoice2011" and "invoice2012". This allows you to query all the invoices by using the class as target in an SQL select:
SELECT FROM Invoice
If you want to filter by the year 2012 and you've created a "year" field in the Invoice class, do:
SELECT FROM Invoice WHERE year = 2012
But splitting the Class Invoice in multiple clusters and inserting the invoice in the right cluster, one per year, allows you to reach the same goal using:
SELECT FROM cluster:invoice2012
This is much faster because OrientDB doesn't need to browse all the clusters of the class, only the right one.
The Class/Cluster combination is very powerful and allows you to solve many use cases.
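Why targeting a single cluster reads less data can be seen in a toy model. The class `ClassWithClusters` below is hypothetical and keeps everything in memory; each record is reduced to its year, and the `recordsVisited` counter is an assumption added only to make the difference observable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a class backed by several clusters.
// Each record is represented only by its "year" value.
final class ClassWithClusters {
    private final Map<String, List<Integer>> clusters = new LinkedHashMap<>();
    int recordsVisited = 0; // how many records the last query had to read

    void insert(String clusterName, int year) {
        clusters.computeIfAbsent(clusterName, k -> new ArrayList<>()).add(year);
    }

    // Like "SELECT FROM Invoice WHERE year = ?": reads every cluster of the class.
    List<Integer> selectFromClassWhereYear(int year) {
        recordsVisited = 0;
        List<Integer> result = new ArrayList<>();
        for (List<Integer> cluster : clusters.values()) {
            for (int y : cluster) {
                recordsVisited++;
                if (y == year) {
                    result.add(y);
                }
            }
        }
        return result;
    }

    // Like "SELECT FROM cluster:<name>": reads only the named cluster.
    List<Integer> selectFromCluster(String clusterName) {
        List<Integer> cluster = clusters.getOrDefault(clusterName, new ArrayList<>());
        recordsVisited = cluster.size();
        return new ArrayList<>(cluster);
    }
}
```

Both queries return the same invoices when every 2012 invoice was inserted into "invoice2012", but the cluster query never touches the records stored in "invoice2011".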
OrientDB supports two kinds of relationships: referenced and embedded. OrientDB can manage relationships in a schema-full scenario (see http://code.google.com/p/orient/wiki/Schema#Define_relationships) or in a schema-less one.
Relationships in OrientDB are managed natively without computing costly JOINs as in Relational DBMSs. In fact, OrientDB stores the direct link(s) to the target objects of the relationship. This speeds up loading an entire graph of connected objects, as in Graph and Object DBMSs. Example:
                  customer
  Record A  ------------->  Record B
CLASS=Invoice             CLASS=Customer
  RID=5:23                  RID=10:2
Record A will contain the reference to Record B in the property called "customer". Note that both records are reachable by other records since they have a RecordID.
1-1 and N-1 referenced relationships are expressed using the LINK type.
1-N and N-M referenced relationships are expressed using collections of links such as:
- LINKLIST, as an ordered list of links
- LINKSET, as an unordered set of links. It doesn't accept duplicates
- LINKMAP, as an ordered map of links with a String as key. It doesn't accept duplicate keys
Embedded records, instead, are contained inside the record that embeds them. This is a stronger kind of relationship than a reference. It can be represented like the UML Composition relationship. The embedded record does not have its own RecordID, so it can't be directly referenced by other records. It's only accessible through the container record. If the container record is deleted, the embedded record is deleted too. Example:
                  address
  Record A  <>---------->  Record B
CLASS=Account             CLASS=Address
  RID=5:23                  NO RID!
Record A will contain the entire Record B in the property called "address". Record B can be reached only by traversing the container record.
Example:
SELECT FROM account WHERE address.city = 'Rome'
1-1 and N-1 embedded relationships are expressed using the EMBEDDED type.
1-N and N-M embedded relationships are expressed using collections of embedded records such as:
- EMBEDDEDLIST, as an ordered list of records
- EMBEDDEDSET, as an unordered set of records. It doesn't accept duplicates
- EMBEDDEDMAP, as an ordered map of records as values with a String as key. It doesn't accept duplicate keys
Until support for Inverse Relationships is implemented natively, the application developer is responsible for maintaining their integrity (see issue [...]). For this reason, when a relationship is changed, the developer needs to update the referenced object by hand, removing the back relationship to the original.
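Maintaining the back relationship by hand could look like the sketch below. It uses plain Java object references in place of LINK and LINKSET fields. The classes `CustomerSketch` and `InvoiceSketch` are hypothetical, and the approach shown (updating both sides inside one setter) is just one possible way for an application to keep the two sides consistent.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical domain objects; Java references stand in for LINK / LINKSET fields.
final class CustomerSketch {
    final String name;
    final Set<InvoiceSketch> invoices = new LinkedHashSet<>(); // back relationship

    CustomerSketch(String name) {
        this.name = name;
    }
}

final class InvoiceSketch {
    private CustomerSketch customer; // forward relationship

    CustomerSketch customer() {
        return customer;
    }

    // Changes the forward link and keeps the back relationship consistent by hand.
    void setCustomer(CustomerSketch newCustomer) {
        if (customer != null) {
            customer.invoices.remove(this); // remove the back link from the old target
        }
        customer = newCustomer;
        if (newCustomer != null) {
            newCustomer.invoices.add(this); // add the back link on the new target
        }
    }
}
```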
A database is an interface to access the real Storage. The database knows all the high-level concepts such as Query, Schema, Metadata, Indexes, etc. OrientDB provides multiple database types. Take a look at the Database types to learn more.
Each server or JVM can handle multiple database instances, but the database name must be UNIQUE. So you can't manage 2 databases named "customer" in 2 different paths at the same time. To handle this case use $ (dollar) as the separator instead of / (slash). OrientDB will bind the entire name, so it will be unique, but at the file system level it will convert $ to /, allowing multiple databases with the same name in different paths. Example:
test$customers -> test/customers
production$customers -> production/customers
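The name-to-path conversion in the example amounts to a character replacement. The helper below is a hypothetical illustration of that mapping, not the code OrientDB uses internally.

```java
// Hypothetical helper: shows the name-to-path mapping described above.
final class DatabaseNameSketch {

    // Converts the '$' separator of a database name to '/' for the file system path.
    static String toPath(String databaseName) {
        return databaseName.replace('$', '/');
    }
}
```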
The database must be opened as:
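import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
ODatabaseDocumentTx test = new ODatabaseDocumentTx("remote:localhost/test$customers");
ODatabaseDocumentTx production = new ODatabaseDocumentTx("remote:localhost/production$customers");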
OrientDB has its own URL format:
<engine>:<db-name>
Where:
- db-name is the database name and depends on the engine used (see below)
- engine can be:
Engine | Description | Example |
---|---|---|
remote | The storage will be opened via a remote network connection. It requires an OrientDB Server up and running. In this mode, the database is shared among multiple clients. Syntax: remote:<server>[:<port>]/db-name. The port is optional; if not specified, it defaults to 2424. | remote:localhost/petshop |
local | Direct access via the local File System using the path. In this configuration OrientDB runs embedded. The database can't be opened by multiple processes (use "remote" instead if you need this). It's the fastest access because it avoids any network connection and transfer. | local:C:/temp/databases/petshop/petshop |
memory | Open a database completely in memory | memory:petshop |
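Splitting a URL of this format into its engine and db-name parts can be sketched as follows. The class `DatabaseUrlSketch` is hypothetical and not part of the OrientDB API; the only engines it accepts are the three listed in the table, and the error handling is an assumption.

```java
// Hypothetical parser for the <engine>:<db-name> URL format.
// Not part of the OrientDB API.
final class DatabaseUrlSketch {
    final String engine; // "remote", "local" or "memory"
    final String dbName; // engine-specific remainder of the URL

    private DatabaseUrlSketch(String engine, String dbName) {
        this.engine = engine;
        this.dbName = dbName;
    }

    static DatabaseUrlSketch parse(String url) {
        // Split on the FIRST colon only: the db-name part may contain more
        // colons, for example a Windows drive letter or a remote port.
        int sep = url.indexOf(':');
        if (sep <= 0 || sep == url.length() - 1) {
            throw new IllegalArgumentException("Expected <engine>:<db-name> but got: " + url);
        }
        String engine = url.substring(0, sep);
        if (!engine.equals("remote") && !engine.equals("local") && !engine.equals("memory")) {
            throw new IllegalArgumentException("Unknown engine: " + engine);
        }
        return new DatabaseUrlSketch(engine, url.substring(sep + 1));
    }
}
```

Splitting on the first colon matters for the "local" example in the table, whose path itself contains a colon after the drive letter.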
The database must always be closed once you've finished working with it.
NOTE: OrientDB automatically closes all opened storages when the process terminates gracefully (not when it is killed by force). This is assured if the Operating System allows a graceful shutdown.