Rubex v0.1 goals - SciRuby/rubex GitHub Wiki

Rubex mini proposal

Over the next few weeks, Rubex will be made ready for an alpha release by adding the features needed to wrap a real-world Ruby library, rcsv. You can find the Rubex code that I've written for wrapping rcsv here. The aim is to eventually be able to compile this rcsv Rubex wrapper.

In the next iteration of Rubex, I will start adding support for heap-based memory allocation and deallocation through internal Rubex classes. These will let the programmer allocate structs and have them collected by the Ruby GC, so that memory does not have to be freed by hand. Support for Ruby-like error handling with begin-rescue-ensure, and for Ruby blocks, will also be added later.

I will elaborate more on these other objectives later.

For this proposal, I will outline the goals that must be accomplished in the next few weeks to bring Rubex to a level where it can be called a (somewhat) production-ready language.

Goals:

  • Support for string literals and Ruby-like string interpolation.
  • Get line and column numbers for each symbol and have them printed in the output file.
  • Support for class keyword and inheritance using the < directive.
  • Support for defining class methods using the self keyword.
  • Support for calling methods like [], []=, ==, !=, etc. on arbitrary Ruby objects.
  • Make Rubex aware of certain Ruby primitive data types like string, hash and array by letting users specify the data types of certain variables, so that methods called on these types can be optimized with C API calls instead of going to Ruby-land and calling a Ruby method.
  • Support for initializing Arrays, Hashes and Strings with literals (like [], {} and "").
  • Support for initializing Ruby classes (like StringIO.new).
  • Support for Ruby symbols.
  • Support for binary operators like |, &, etc.
  • Support for C functions with the cdef keyword.
  • Support for passing by reference to C functions (using & for actual parameters and * in formal parameters).
  • Ability to declare C function callbacks as pointers to functions inside structs and formal arguments of functions.
  • Ability to send function pointers to other C functions as callbacks.
  • Support for Ruby-style raise clauses.
  • Support for error handling with begin-rescue-ensure.

Particulars:

Get line and column numbers for each symbol and have them printed in the output file.

Currently, when the compiler reads symbols from a file, no information about their location (file name and line number) is stored. This first milestone will change that.

Implementation: Oedipus Lex can provide the line number and column number of each matched element through its <> option.

Each statement (an instance of Rubex::AST::Statement) will have line_no and file_name attributes for storing its line number and file name.
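To make the idea concrete, here is a minimal C sketch of the bookkeeping involved, assuming a hand-written scanner rather than Oedipus Lex (which is a Ruby tool). It tracks a line and column while input is consumed, then prints the position into generated C as a standard #line directive so that C compiler diagnostics point back at the Rubex source. SourcePos, advance, emit_line_directive and the file name example.rubex are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical source position record: file name plus 1-based line/column. */
typedef struct {
    const char *file_name;
    int line;
    int column;
} SourcePos;

/* Advance pos over the first n characters of src, the way a lexer would
   after consuming a token: a newline starts a new line at column 1. */
static void advance(SourcePos *pos, const char *src, size_t n)
{
    for (size_t i = 0; i < n && src[i] != '\0'; i++) {
        if (src[i] == '\n') {
            pos->line++;
            pos->column = 1;
        } else {
            pos->column++;
        }
    }
}

/* Write a C #line directive so diagnostics in the generated C refer to
   the original Rubex file and line. Returns the snprintf result. */
static int emit_line_directive(char *buf, size_t size, const SourcePos *pos)
{
    return snprintf(buf, size, "#line %d \"%s\"\n", pos->line, pos->file_name);
}
```

Emitting #line is only one option; a plain comment carrying the same file and line would serve human readers equally well.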

Support for string literals and Ruby-like string interpolation.

This feature will add support for string literals using double quotes (e.g. "this is a string.") and will also allow string interpolation using the Ruby-like #{} syntax within strings. After this milestone, the following code will be possible:

def strings
  char* s = "My name is Ted."

  obj = "My name is also Ted."
  int a = 4
  float b = 5.6

  print "The number a is : #{a}\nThe number b is: #{b}"
  print "Char star says #{s} and obj says #{obj}."

  return obj
end
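For C-typed variables, interpolation can be thought of as choosing a conversion specifier from each variable's declared type. The sketch below shows one plausible lowering of the first print statement for int a and float b using snprintf; it is an assumption, not Rubex's actual output, and it does not cover Ruby-object operands such as obj, which would need Ruby's own to_s conversion. interpolate_a_b is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

/* Lowering of "The number a is : #{a}\nThe number b is: #{b}" for
   int a and float b. The float is promoted to double when passed to
   snprintf, so %g is the right specifier. Returns the formatted length. */
static int interpolate_a_b(char *buf, size_t size, int a, float b)
{
    return snprintf(buf, size,
                    "The number a is : %d\nThe number b is: %g", a, b);
}
```

With a = 4 and b = 5.6f the buffer holds "The number a is : 4\nThe number b is: 5.6".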

Support for class keyword and inheritance using the < directive.

Support for the class keyword will let users encapsulate methods inside Ruby classes instead of being forced to define them under Object, as was previously the case. As in Ruby, class names must start with a capital letter. Users will also be able to inherit from their own classes or from built-in Ruby classes, which makes it possible to create custom error classes. After this milestone, the following Rubex code will be possible:

class Kustom
  def bye
    print "Bye world!"
  end
end

class Klass < Kustom
  def hello
    print "Hello world!"
  end
end
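Inheritance means that a method not found in a class is looked up in its superclass, and so on up the chain; that is why an instance of Klass can respond to bye. The toy C model below (not Ruby's real data structures) shows this lookup; Method, Class and find_method are hypothetical names, and the root class here has no superclass, whereas in Ruby Kustom would inherit from Object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    const char *(*impl)(void);
} Method;

typedef struct Class {
    const char *name;
    const struct Class *superclass; /* NULL for the root of this toy model */
    const Method *methods;
    size_t method_count;
} Class;

/* Walk from klass up through its superclasses until the method is found. */
static const Method *find_method(const Class *klass, const char *name)
{
    for (const Class *c = klass; c != NULL; c = c->superclass) {
        for (size_t i = 0; i < c->method_count; i++) {
            if (strcmp(c->methods[i].name, name) == 0)
                return &c->methods[i];
        }
    }
    return NULL; /* Ruby would end up raising NoMethodError here */
}

static const char *bye_impl(void)   { return "Bye world!"; }
static const char *hello_impl(void) { return "Hello world!"; }

static const Method kustom_methods[] = { { "bye", bye_impl } };
static const Method klass_methods[]  = { { "hello", hello_impl } };

/* Klass < Kustom */
static const Class Kustom = { "Kustom", NULL, kustom_methods, 1 };
static const Class Klass  = { "Klass", &Kustom, klass_methods, 1 };
```

Lookup is one-directional: Klass finds bye through Kustom, but Kustom does not see hello.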

Support for defining class methods using the self keyword.

Once support for Ruby classes has been added, defining class methods using self can also be supported. For now, class methods can only be added using the self. syntax; class << self will not be supported yet. The following Rubex code can then be written:

class Bhau
  def self.say_what(string)
    int i = 0

    while i < 10
      print string
      i += 1
    end
  end
end

Support for calling methods like [], []= etc. on arbitrary Ruby objects.

Until now, it was not possible to get the individual elements of a string or an array. This milestone will focus on the ability to call Ruby methods that look like operators on any Ruby object, which will make it possible to access elements of an Array or a Hash. Equality checks will also be supported.

The following code will work after this:

class StringCaller
  def call_now(string)
    print string[0]
    string[2] = "h"
    print string

    return string
  end
end

Make Rubex aware of certain Ruby primitive data types like string, hash and array

One of the problems with C extensions is that method calls into Ruby-land are very expensive. If Rubex knows that a particular Ruby object is of a specific type (like a String, Array or Hash), it can translate certain method calls on that object directly into CRuby C API calls instead of going through a Ruby method call.

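For example, if a statement like a.size is encountered and a is a string, Rubex can translate it directly to RSTRING_LEN(a) instead of rb_funcall(a, rb_intern("size"), 0). The former is much faster than the latter. Note that RSTRING_LEN returns the length in bytes as a C long, so it matches String#size only when every character is a single byte, and the result has to be converted back (e.g. with LONG2NUM) wherever a Ruby Integer is expected.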
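The cost difference can be illustrated with a toy model in plain C (not CRuby's actual object layout): one path looks the method up by name at run time, the other reads the length field directly because the type is known when the code is generated. Obj, dynamic_call and STR_LEN are hypothetical names.

```c
#include <assert.h>
#include <string.h>

typedef enum { TAG_STRING, TAG_ARRAY } Tag;

/* Toy object: a type tag plus a length field. */
typedef struct {
    Tag tag;
    long len;
} Obj;

/* What a type-aware compiler can emit: a direct field read, no lookup. */
#define STR_LEN(o) ((o)->len)

/* Stand-in for a dynamic call: every call pays for a name comparison
   and a type check before doing any real work. */
static long dynamic_call(const Obj *o, const char *method)
{
    if (strcmp(method, "size") == 0 &&
        (o->tag == TAG_STRING || o->tag == TAG_ARRAY))
        return o->len;
    return -1; /* unknown method */
}
```

Both paths return the same value for a string; only the dynamic one has to discover what to do at run time.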

The String class in Ruby will be represented in Rubex by the string data type, Hash by hash and Array by array. Code using these will look like this:

class RubyTypes
  def these_types(string str, array arr, hash h)
    str[4] = "a"
    arr.append(44)

    print str
    print arr

    print h["fff"]
    h["fff"] = 565
    print h["fff"]

    return h
  end
end

Support for initializing Arrays, Hashes and Strings with literals (like [], {} and "").

This milestone will add support for initializing hashes, strings and arrays with the literals familiar to every Ruby developer. Users will also be able to populate these data structures using expressions or other literals. Under the hood, this amounts to initializing Ruby data structures implicitly through the CRuby C API.

Example code:

class DataInit
  def init_this(a, b, c)
    arr = [1,2,3,4,5,6]
    str = "Hello world! Lets have a picnic!"
    h = {
      "hello" => arr,
      "world" => 666,
      "message" => str
    }

    print h["hello"]

    return h
  end
end

Support for initializing Ruby classes (like StringIO.new).

This feature will add support for instantiating Ruby classes from within Rubex. Whenever an identifier starting with a capital letter is encountered, it will be treated as a Ruby constant, and any methods called on that constant will be translated into the corresponding C API calls.

User-defined Ruby classes in Rubex can also be instantiated this way.

Example code:

class InitRubyClass
  def init_classes
    a = String.new
    a[0] = "5"

    f = StringIO.new("Hello! This is a test")
    s = f.read

    return s
  end
end

Support for Ruby symbols.

Symbols are an integral part of Ruby and will be supported in this milestone.

Example code:

class RubySymbols
  def symbol_support(a, b)
    hash h = {}
    h[:first] = a
    h[:second] = b

    other_hash = {
      :third => 69
    }
    h[:third] = other_hash

    return h
  end
end
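Ruby symbols are interned: each distinct name maps to a single identifier, so comparing two symbols is just an integer comparison. The deliberately naive C intern table below (linear search, fixed capacity) shows that property; intern and MAX_SYMBOLS are hypothetical names. In the CRuby C API the corresponding call is rb_intern, and ID2SYM turns the resulting ID into a Symbol object.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMBOLS 64

static const char *symbol_names[MAX_SYMBOLS];
static int symbol_count = 0;

/* Return the id for name, assigning a new one on first sight.
   The table stores the pointer, so the caller must keep name alive. */
static int intern(const char *name)
{
    for (int i = 0; i < symbol_count; i++) {
        if (strcmp(symbol_names[i], name) == 0)
            return i;
    }
    assert(symbol_count < MAX_SYMBOLS); /* toy table has a fixed capacity */
    symbol_names[symbol_count] = name;
    return symbol_count++;
}
```

A real implementation would use a hash table, but the guarantee is the same: :first always yields the same id.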

Support for C functions with the cdef keyword.

Until now, all the methods that you could define through Rubex were exposed to Ruby, i.e. they could be called directly from a Ruby script. However, for some tasks it is important to define pure C functions (functions visible only to the generated C program), for example to provide functions as callbacks to other functions. Rubex will allow this with the cdef keyword.

The user will have to specify the return type of a function defined with cdef. A cdef function's scope will be local to the class in which it is defined, and subclasses will inherit it. These functions will not be callable from a Ruby script; however, Ruby methods defined in Rubex will be able to call them just like any other function.

Example code:

class CFunctions
  def pure_ruby_method
    a = 55
    b = 5.43
    int c = first_c_function(a, b)

    return a + c
  end

  cdef int first_c_function(int a, float b)
    int c = a + 5
    int d = (c * b + 3)/5

    return c - d
  end
end
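For comparison, here is roughly what the CFunctions example means in plain C. This is a hand-written sketch, not actual Rubex output: static keeps first_c_function private to the translation unit, mirroring how a cdef function is not callable from Ruby, and pure_ruby_method stands in for the Ruby-visible method with C types in place of Ruby objects.

```c
#include <assert.h>

/* Private helper, like a cdef function. */
static int first_c_function(int a, float b)
{
    int c = a + 5;
    int d = (int)((c * b + 3) / 5); /* float result truncated to int */

    return c - d;
}

/* Stand-in for the Ruby-visible method; a and b are Ruby objects in the
   original and are converted to C types when passed to the cdef function. */
static long pure_ruby_method(void)
{
    long a = 55;
    float b = 5.43f;
    int c = first_c_function((int)a, b);

    return a + c;
}
```

Here first_c_function(55, 5.43f) computes c = 60 and d = 65, returning -5, so pure_ruby_method returns 50.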

Support for passing by reference to C functions

Passing pointers to variables, which is how C emulates pass-by-reference, is fundamental to C programming. Rubex will support this with C-like syntax: users can declare formal arguments of C functions as pointers with the * operator (as they would for any other data type) and pass the address of a variable using &.

Rubex will not support the -> operator for accessing members through a struct pointer. Instead, users will first have to dereference the pointer using [0] (i.e. selecting the 0th element of an array of structs starting at that pointer) and then use . to refer to members of the struct.

Example code:

struct attribs do
  int a, b
  float c
end

class CallByReference
  def reference_call
    attribs a
    int b_flat = 460
    a.a = 56
    a.b = 65
    a.c = 23

    c_function(&a, b_flat)

    return a.a
  end

  cdef void c_function(attribs *a, int b)
    a[0].a = b
  end
end
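The CallByReference example corresponds closely to the following plain C, shown as a sketch rather than Rubex's generated code. In C, a[0].a and a->a are equivalent: both dereference the pointer and then select the member, which is why the [0]. form loses nothing.

```c
#include <assert.h>

struct attribs {
    int a, b;
    float c;
};

static void c_function(struct attribs *a, int b)
{
    a[0].a = b; /* same as a->a = b */
}

static int reference_call(void)
{
    struct attribs a;
    int b_flat = 460;

    a.a = 56;
    a.b = 65;
    a.c = 23;

    c_function(&a, b_flat); /* pass the address so the callee can modify a */

    return a.a; /* the change made through the pointer is visible here */
}
```

reference_call returns 460, not 56, because c_function wrote through the pointer.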

Ability to send function pointers to other C functions as callbacks.

This will allow users to pass C functions defined with cdef as function pointers to other functions, so that they can be used as callbacks. This is crucial for wrapping many modern C libraries and will therefore be supported in the upcoming release.

Example:

TODO
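In the meantime, the C below sketches the two callback forms listed in the goals: a function pointer stored inside a struct, and one passed as a formal argument. The names (field_cb, struct parser, parse_fields, run_parser, count_field) are hypothetical and do not come from rcsv or any CSV library; they only illustrate the kind of C that such Rubex code would need to produce.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*field_cb)(const char *field, void *data);

struct parser {
    field_cb on_field; /* callback stored inside a struct */
    void *data;        /* user data handed back to the callback */
};

/* Call p->on_field once per comma-separated field in line.
   Fields longer than 255 characters are truncated. */
static void parse_fields(const struct parser *p, const char *line)
{
    char buf[256];
    size_t n = 0;

    for (const char *s = line; ; s++) {
        if (*s == ',' || *s == '\0') {
            buf[n] = '\0';
            p->on_field(buf, p->data);
            n = 0;
            if (*s == '\0')
                break;
        } else if (n < sizeof buf - 1) {
            buf[n++] = *s;
        }
    }
}

/* Callback passed as a formal argument. */
static void run_parser(const char *line, field_cb cb, void *data)
{
    struct parser p = { cb, data };
    parse_fields(&p, line);
}

/* A callback that counts fields; in Rubex this would be a cdef function. */
static void count_field(const char *field, void *data)
{
    (void)field;
    (*(int *)data)++;
}
```

Passing count_field to run_parser with the line "a,b,c" invokes it three times.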

Support for Ruby-style raise clauses and support error handling with begin-rescue-ensure.

This is critical functionality for raising errors and handling them the way Ruby does. It will build error handling right into Rubex, so the programmer will not need to do much work to handle exceptions in C extensions.

This functionality will be added in a future release and will not be part of the release containing the features above.