Xaraya M.R. van der Boom 
Request for Comments: 0045 Xaraya Development Group 
Category: Best Current Practice April 2003 

RFC-0045: Multi database considerations

Status of this Memo

This document specifies a Xaraya Best Current Practices for the Xaraya Community, and requests discussion and suggestions for improvements. Distribution of this memo is unlimited.

Copyright Notice

Copyright © The Digital Development Foundation (2003). All Rights Reserved.

Abstract

1 This RFC documents the considerations we take into account for supporting multiple databases. Each database has its own peculiarities and specific requirements. The document will be helpful to developers to ensure the code runs with all supported databases.


Table of Contents


1. Introduction

2 For each database we support there is a section in this document containing information which needs to be taken into account writing portable code for all supported databases.

3 Also some dedicated sections will be written for special situations.


2. Overall consideration

1. Naming Rules
Since ADODB supports many databases, database objects (tables, rows, indexes, etc.) must be restricted to no more than twenty-six (26) characters. An object must start with a letter or an underscore; the rest of the string can contain letters, digits, and underscores. Special characters are not permitted.
2. Default Values
When a non-null string column is created in MySQL, the default value for that column will be '' if not specified. For other databases, such as Oracle and PostgreSQL, the default value will be NULL. If you do not specify a default value in Oracle or PostgreSQL and try to inserting NULL values in non-NULL columns, the database will give errors.
To resolve this problem, always set a default value in the table definition when creating a table. All NOT NULL columns must have an explicit default value declared unless you can be sure that column will always be explicitly set when the table is updated.
3. Reserved Words
Every database reserves certain words and you cannot use those words to name objects. Because ADODB supports a variety of databases, it is important not to use a generic name on an object (e.g. "type"). Always include the prefix "xar_" on column names and indexes.
A complete list of the reserved words for each database can be found at:
4. Index Names
Index names must be unique across all Xaraya modules. To guarantee that an index name will be unique, the index should include the site table prefix returned from xarDBGetSiteTablePrefix(), module name or abbreviated module name, table name and/or column name.
For example:
          $index = array('name'   => 'i_' . xarDBGetSiteTablePrefix() . '_cat_cid',
                         'fields' => array('xar_cid'),
                         'unique' => FALSE);

          Creates the non-unique index "i_xar_cat_cid"
          
          
5. Looping Over Results
Correctly loop over the result set returned from the database. Otherwise the internal cursor may return a database error.
Incorrect:
          if (!$result->EOF) {
            while(!$result->EOF) {
              list($value1, $value2) = $result->fields;
              $result->MoveNext();
              processValues($value1, $value2);
              ...
            }
          }
          
          
Correct:
          while(!$result->EOF) {
            list($value1, $value2) = $result->fields;                 
            processValues($value1, $value2);
            ...
            $result->MoveNext();
          }
          
          
6. Binary Data (BLOB)
They require special treatment for several databases. Try to avoid them where possible.
7. Table and Column Aliases
Never use AS for table aliases - always use AS for column aliases (for now).
Example:
          SELECT a.myverylongfieldname AS afield1, b.myotherlongfield AS bfield2, COUNT(*) AS mycount
          FROM myverylongtablename a, myotherlongtablename b [Oracle constraint]
          WHERE afield1 = ...
          GROUP BY a.myverylongfieldname, b.myotherlongtablename [SQL Server constraint]
          ORDER BY mycount DESC
          
          

3. MySQL

1. Group By
In MYSQL, you can use columns or calculations in the SELECT expressions that don't appear in the GROUP BY part. This is not true for other databases. When writing a SELECT using GROUP BY, you must include the columns or calculations in the GROUP BY that appear in the SELECT expression.
In general GROUP BY is used with group functions, such as MAX, MIN, SUM and COUNT.
For example:
          SELECT order.custid, customer.name, SUM(order.value) AS total_value
          FROM order, customer
          WHERE order.custid = customer.custid
          GROUP BY order.custid, customer.name;
          
          
2. Multi-table DELETE
Multi-table DELETE is supported starting from MySQL 4.0. However, neither PostgreSQL nor Oracle support this functionality. DELETE syntax should delete rows from only one table.

4. PostgreSQL

1. Sequences on Module Initialization
PostgreSQL sequences are not created when a table is created on module initialization but are created when the sequence is first called by the ADODB function GenID(). If a module needs to insert initial records into a table on initialization, the initial value of the sequence must be determined by calling GenID(). You cannot INSERT the initial rows into the table as an error will occur because the sequence value returned from GenID() will be incorrect.
2. SELECT DISTINCT with ORDER BY
The results of a SELECT can be returned in sorted order or with duplicate rows removed but the ORDER BY columns must appear in SELECT DISTINCT target list.
For example:
          SELECT DISTINCT firstname, lastname, address
          FROM customer
          ORDER BY lastname;
          
          
3. HAVING Clauses
You cannot use column aliases for expressions in HAVING clauses, but you can use that same expresssion in the HAVING clause.

5. Oracle

1. Outer Joins
Not all databases support outer joins. Furthermore the syntax for outer joins differs dramatically between database vendors. One portable (and possibly slower) method of implementing outer joins is using UNION:
For example an ANSI-92 left outer join between two tables t1 and t2 could look like:
          SELECT t1.col1, t1.col2, t2.cola
          FROM t1 
          LEFT JOIN t2 ON t1.col = t2.col
          
          
This can be re-written using UNION:
          SELECT t1.col1, t1.col2, t2.cola 
          FROM t1, t2
          WHERE t1.col = t2.col 
          UNION ALL
              SELECT col1, col2, null 
              FROM t1
              WHERE t1.col NOT IN (SELECT DISTINCT col FROM t2)
          
          
2. Binary Data (BLOB)
You need to use UpdateBlob() to insert data into a binary field (cfr. xarSession__phpWrite). Try to avoid binary fields and use text fields where possible.
3. Text Data (TEXT)
You may need to TO_CHAR text fields when mixing data types (cfr. Dynamic_VariableTable_DataStore). No cross-database recommendations yet.

6. MS SQL Server

1. IDENTITY
IDENTITY fields cannot be used - use the normal Xaraya GenID() approach (cfr. ADODB xarmssql driver).
2. GROUP BY Fields
You cannot use column aliases for expressions in GROUP BY fields, but you can use that same expresssion as a GROUP BY field.
3. Data Types
Numeric values cannot be inserted directly in varchar/text fields. Use bind variables and type casting (cfr. xarVar__SetVarByAlias and modules_init()).
4. Binary Data (BLOB)
You need to use UpdateBlob() to insert data into a binary field (cfr. xarSession__phpWrite). Try to avoid binary fields and use text fields where possible.
5. Text Data (TEXT)
You may need to CAST text fields to VARCHAR(8000) [or smaller] when mixing data types (cfr. Dynamic_VariableTable_DataStore). No cross-database recommendations yet.
6. Empty Strings
From the Transact-SQL user guide : The empty string is interpreted as a single space in all char, varchar, nchar, nvarchar, and text concatenation, and in varchar insert and assignment statements.
From the SQL Server 2000 help : Interpretation of an empty string is controlled by the compatibility level, which is set with the sp_dbcmptlevel system stored procedure. If the compatibility level is 65 or lower, SQL Server interprets empty strings as single spaces. If the compatibility level is 70 or 80, SQL Server interprets empty strings as empty strings. For more information, see sp_dbcmptlevel. Expect differences in interpretation of empty strings compared to earlier versions of SQL Server.
In short, you have no idea what will be stored or what you'll get back. The good news is that SELECT ... FROM ... WHERE myfield = '' will work either way. The bad news is that if you try to check for empty($myfield) in PHP afterwards, you may get a difference since a single space is ... not empty.
Solution : the ADODB mssql driver was adapted to automatically replace single spaces by empty strings in results, so that you can forget about this problem...

7. Data Binding

Data binding concerns the linking of data in PHP to database queries. Binding can work in either direction - passing data from PHP variables to queries, and passing query results direct to PHP variables. For maximum portability, Xaraya only supports the passing of data from PHP into queries.
1. Direct Query Building
One way of binding data to queries is to insert the data directly into a query. The result of this method is a complete query, ready to execute.
Data inserted into a query will be either numeric or a string (we will disregard BLOBs for now, as these require special handling). Numeric values can be inserted directly into a query. Strings need to be quoted, and quoted strings need internal quotes escaped, the escaping method being particular to the type of database. The ADODB database connection object provides a method for quoting and escaping a string.
Example: the following query selects the 'xar_name' column from table 'xar_table' for a given ID:
          // $id must be numeric - make sure it gets validated.
          $id = 123;
          $dbconn =& xarDBGetConn();
          $query = 'select xar_name from xar_table where xar_id = ' . $id;
          $dbconn->Execute($query);
          
          
Example: the following query selects the 'xar_id' column from table 'xar_table' for a given name. Note that it is not necessary to add further quotes to the string in the WHERE-clause, as the qstr() method does that. The method will also esacape any quotes or special characters that may be present in $name:
          // $name is a string variable, so it will get quoted.
          $name = 'a_name';
          $dbconn =& xarDBGetConn();
          $query = 'select xar_id from xar_table where xar_name = ' . $dbconn->qstr($name);
          $dbconn->Execute($query);
          
          
With this method, you retain control over whether a value is treated as numeric or a string, regardless of the real data type of the variable that is being inserted into the query. It is therefore important to ensure variables are properly validated, as an unquoted string being treated as a numeric value could allow various exploits to be used against the database.
The qstr() method supercedes the deprecated xarVarPrepForStore(), but is not a direct replacement without some effort to remove the extra quotes around strings, from queries that used xarVarPrepForStore().
2. Bind Variables
The second (and preferred) method of binding PHP data to queries, is to used the ADODB bind variable syntax. This method allows you to put placeholders in your query, and bind data to those placeholders during the parse stage when executing the query. Some database types support this kind of data binding natively, some do not. Either way, the method is the same as ADODB will provide emulation for bind variables where necessary.
A placeholder is the '?' character. Any number of placeholders can be added to a query, and they will be parsed in order, from left to right. If you need to use a literal '?' in a query, then it can be passed in as bind data. This does not affect the use of '?' within bind data itself.
The data for binding takes the form of an ordered array, that is passed in to the Execute() or SelectLimit() methods of the database connection object. The datatype of the data will be used to determine how that data will be bound to the query: PHP strings will be bound as query strings, PHP integers as query integers etc. It is therefore important to ensure the datatypes are correct when the query is executed.
String quoting is not necessary when using bind variables, nor is escaping of special characters within a string value.
Example: this query uses two bind variables, a string in the SELECT-clause and an integer in the WHERE-clause. In this example, if $id happened to be a string (even though it contains numeric data), then the database would need to perform an implicit type conversion. Normally this is not an issue, but some complex queries can result in type conversions being less predictable then you would hope for.
          // $notes is a string and $id is numeric
          $notes = 'This is a "note"';
          $id = 123;
          $dbconn =& xarDBGetConn();
          $query = 'select xar_name, ? as notes from xar_table where xar_id = ?';
          $dbconn->Execute($query, array($notes, $id));
          
          
Example: the following query demonstrates one way to handle a common need to build queries dynamically. Because the bind variables are added dynamically, the bind values must also be built up the same way. Note: error handling is omited for clarity.
          $name = 'a_name';
          $id = 123;
          $dbconn =& xarDBGetConn();
          // Initialise the where-clause and bind data arrays.
          $where = array();
          $bind = array();
          // Add some where-clause conditions (one or the other or both)
          if (test1) {
             $where[] = 'xar_name = ?';
             $bind[] = $name;
          }
          if (test2) {
             $where[] = 'xar_id = ?';
             $bind[] = $id;
          }
          // Execute the query
          $query = 'select * from xar_table where ' . implode(' and ', $where);
          $dbconn->Execute($query, $bind);
          
          
With this method, queries can be smaller and more legible, though it can sometimes be difficult to match the placeholders to the bind array elements when debugging. Although these examples are SELECT statements, the binding method applies equally to other DML statements, such as UPDATE, INSERT and DELETE.
Databases that support data binding natively can take advantage of query caching, where a single parsed query can be brought back from cache for execution many times, using different bound values. This can improve performance considerably.

8. Revision history

80 

2003-04-18: MrB: created
2003-05-05: Richard Cave: First stab at database requirements.
2003-05-14: Richard Cave: Added MySQL multi-table DELETE.
2003-06-12: Richard Cave: Added PostgreSQL SELECT DISTINCT with ORDER BY and updated index names.
2003-02-21: Jason Judge: Added section on data binding.
2005-07-02: Mike's Pub: Added section on MS SQL Server and comments about Binary Data/Text Data/HAVING