Advanced features - HaxeFoundation/hashlink GitHub Wiki

While HashLink is a high-level virtual machine, it is highly compatible with C code because it follows the C conventions for data layout and function calls.

Structs

Each HashLink class instance starts with an hl_type* type information pointer, which makes it possible to know what object a pointer represents, perform reflection, etc.

However, if you want to call C functions directly from HashLink, you may need to define structures that match a C struct. This can be done with the @:struct Haxe metadata, which omits the type information at the beginning of the data.

So the following Haxe struct:

@:struct class Point {
   var x : Float;
   var y : Float;
}

Is exactly equivalent to C:

struct _Point {
  double x;
  double y;
};
typedef struct _Point *Point;

Please note that structs are still always passed by reference and never by value in function calls, so in C terms the argument is always a pointer to the struct.

As a consequence, when a struct needs to be passed as a Dynamic or stored into an Array, a small vdynamic* wrapper will be allocated each time in order to store both the data and the type information.
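The layout equivalence described above can be checked from the C side. The following is a minimal sketch, not HashLink's own code: point_length_sq is a hypothetical helper name, and the static assertions only verify that the fields start at offset 0, which is what a missing hl_type* header implies.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

/* Same layout as the Haxe @:struct Point: two doubles and no type header. */
struct _Point {
  double x;
  double y;
};
typedef struct _Point *Point;

/* With no header in front, x sits at offset 0 and y right after it. */
static_assert(offsetof(struct _Point, x) == 0, "x is the first field");
static_assert(offsetof(struct _Point, y) == sizeof(double), "y follows x");

/* Hypothetical helper: takes the struct by pointer, never by value. */
double point_length_sq(Point p) {
  return p->x * p->x + p->y * p->y;
}
```

Because the argument is a pointer, any write the C function makes to p->x or p->y is visible through the same pointer after the call.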

Packed struct

When wrapping C/C++ data, you will sometimes encounter structures that embed other structures by value rather than through a pointer, and you may want to keep that layout, for example:

struct _Point {
  double x;
  double y;
};
struct _Line{
   struct _Point start;
   struct _Point end;
};

To define such a type in Haxe/HL, you can use the @:packed metadata, which "packs" the struct into its parent object:

@:struct class Point {
    public var x : Float;
    public var y : Float;
}
@:struct class Line {
    @:packed public var start(default,never) : Point;
    @:packed public var end(default,never) : Point;
}

Please note that you cannot assign a packed struct, as that would mean copying the struct by value, which is currently not supported. Also, when you read a packed struct on the Haxe side, remember that you get a pointer into the parent object. You should therefore retain a reference to that parent object so that it does not get garbage collected, which would leave your interior pointer invalid.
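To illustrate what "a pointer inside your object" means, here is a small C sketch of the same layout. It is only an illustration under the struct definitions shown above; line_end is a hypothetical helper name, not part of HashLink.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

struct _Point {
  double x;
  double y;
};
struct _Line {
  struct _Point start;  /* embedded by value, like a @:packed field */
  struct _Point end;
};

/* Both points live inside the line itself: no extra pointers are stored. */
static_assert(sizeof(struct _Line) == 2 * sizeof(struct _Point),
              "a line is exactly two points");
static_assert(offsetof(struct _Line, end) == sizeof(struct _Point),
              "end starts right after start");

/* Hypothetical helper: reading the field yields an interior pointer, not a copy. */
struct _Point *line_end(struct _Line *l) {
  return &l->end;
}
```

The returned pointer is only valid for as long as the enclosing line is alive, which is why the parent object must stay referenced.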

C Array

Another commonly encountered C data structure is a contiguous array of structs, stored one after another in memory:

struct _Point {
  double x;
  double y;
};
struct _Point points[256];

In Haxe, creating an Array of objects gives you an Array of pointers to these objects, but in Haxe/HL you can use hl.CArray to directly allocate several of these structs or objects in a single block:

@:struct class Point {
    public var x : Float;
    public var y : Float;
}
class PointMap {
    var points : hl.CArray<Point>;
    public function new() {
        points = hl.CArray.alloc(Point, 256);       
    }
}

CArray is very fast, but must be used with care:

no null check or bounds check is performed
an array access is just a pointer offset computation: it does not by itself read any memory
you can allocate many objects behind a single GC-allocated pointer, which spares the Garbage Collector from scanning what could otherwise be thousands of separate small objects
similar to @:packed, you need to keep a reference to your CArray for as long as you hold any live reference to one of its values, or it might be garbage collected and your values later overwritten by another allocation

C Array support requires Haxe 4.4+ with -D hl-ver=1.14.0
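The "memory offset" behaviour can be pictured with plain C. This sketch uses calloc instead of the HashLink GC allocator, and points_alloc and points_at are hypothetical helper names chosen for illustration only.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t */
#include <stdlib.h>  /* calloc */

struct _Point {
  double x;
  double y;
};

/* Hypothetical helper: one zeroed allocation holding `count` points back to back. */
struct _Point *points_alloc(size_t count) {
  return calloc(count, sizeof(struct _Point));
}

/* Element access is base address + index * element size.
   No null check and no bounds check is done here, mirroring the caveats above. */
struct _Point *points_at(struct _Point *base, size_t index) {
  assert(sizeof(struct _Point) == 2 * sizeof(double)); /* elements are tightly packed */
  return base + index;
}
```

Calling points_at(pts, 10) only computes an address; memory is touched once a field such as ->x is read or written through the result.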

Prefetch

One of the most common causes of bad CPU performance these days is not the CPU performing too much computation, but the latency of loading data from DRAM into the CPU cache. In some cases, for example when you access memory sequentially, the CPU is able to fetch the data ahead of time, resulting in low latency. But if you access random or poorly ordered data, the CPU will often stall on each new memory access, making the whole code much slower than it should be.

Performance can be significantly improved by prefetching manually. This consists of fetching ahead some memory that you know will be required later, giving the CPU time to load it asynchronously so that it is more likely to be in the cache when you need it.

For example, let's say you have an Array of randomly ordered objects. If you iterate over it as follows while the objects are not in the CPU cache, you will get some DRAM stalls:

var sum = 0.;
for( obj in array )
   sum += obj.complexCalculus(); // potential DRAM stall on obj field accesses

Instead, you can choose to prefetch them ahead, then perform the computation:

// first pass: request the cache lines holding the fields we will need
for( obj in array ) {
   (untyped $prefetch)(obj.field1,2);
   (untyped $prefetch)(obj.field2,2);
}
var sum = 0.;
for( obj in array )
   sum += obj.complexCalculus(); // obj fields are now more likely to be in cache

Please note that:

you can tell which memory line to prefetch by calling $prefetch(obj,mode) or $prefetch(obj.field,mode): the cache line (usually 64 bytes) that contains this address will be fetched
if your computation is not complex enough, or if your objects are already in the CPU cache, adding prefetch and a second loop will decrease performance
mode can be 0/1/2 for the CPU cache Level0/1/2, mode 3 is the NTA prefetch hint to avoid cache pollution, and mode 4 is the prefetch-write hint, which signals that you will write to the memory afterwards
you should use Intel VTune (see below) before performing any DRAM/prefetch optimization

Prefetch support requires Haxe 4.4+ with -D hl-ver=1.14.0
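For readers more familiar with C, the same two-pass pattern can be sketched with the GCC/Clang __builtin_prefetch intrinsic. This is an analogy only: it does not show how HashLink implements $prefetch, and struct obj, its fields and sum_with_prefetch are hypothetical names.

```c
#include <assert.h>  /* assert */
#include <stddef.h>  /* size_t */

/* Hypothetical object type standing in for the Haxe objects in the array. */
struct obj {
  double field1;
  double field2;
};

double sum_with_prefetch(struct obj **array, size_t count) {
  assert(array != NULL);

  /* First pass: hint that each object will be read soon (rw = 0, read)
     and should be kept in cache (locality = 3). This issues no actual load. */
  for (size_t i = 0; i < count; i++)
    __builtin_prefetch(array[i], 0, 3);

  /* Second pass: the real work, ideally finding the objects in cache. */
  double sum = 0.0;
  for (size_t i = 0; i < count; i++)
    sum += array[i]->field1 * array[i]->field2;
  return sum;
}
```

A prefetch is only a hint, so the result is identical with or without it; whether it is faster has to be measured on the real workload.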

Intel VTune support

Starting from HL 1.14, the JIT supports Intel VTune, allowing you to get very fine-grained, low-level performance information for your Haxe/HL code. Please note that for high-level performance measurement and optimization you should instead use the Sampling Profiler.

Simply start your HL code from VTune, then you will be able to get this kind of detailed report in Microarchitecture Exploration mode: