Tuning SQL Relay

Connect To The Database Using a Local Connection

One of the most common mistakes that people make when running SQL Relay on the same machine as the database is not using a "local connection" to the database. It's an easy mistake to make, but when remedied, it can result in a substantial improvement in performance.

Most databases support "remote connections" (usually over an inet socket) and "local connections" (usually over a unix socket or IPC). If you're running SQL Relay on a seperate machine from the database, you have to configure it to connect to the database using a remote connection. However, if you're running SQL Relay on the same machine as the database, then you should configure it to connect to the database using a local connection if possible.

MySQL

MySQL supports local connections over a unix socket.

By default, MySQL is usually configured to accept local connections. To make sure MySQL is configuerd to accept local connections, edit the MySQL configuration file (usually /etc/my.cnf) and add a line like the following to the [mysqld] section if it's not already there:

socket=/var/lib/mysql/mysql.sock

Now restart MySQL.

To configure SQL Relay to connect to MySQL using the local connection, edit the configuration file and make sure that the connectstring for the MySQL instance uses the socket parameter, as follows:

user=testuser;password=testpassword;db=testdb;socket=/var/lib/mysql/mysql.sock

as opposed to:

user=testuser;password=testpassword;db=testdb;host=localhost;port=5432

PostgreSQL

PostgreSQL supports local connections over a unix socket.

By default, PostgreSQL is configured to accept local connections. I'm not sure you can even turn it off. By default, PostgreSQL listens on /tmp/.s.PGSQL.5432. The directory can be overridden by modifying the PostgreSQL configuration file (usually /var/lib/pgsql/data/postgresql.conf). The following parameters affect which file PostgreSQL listens on and that file's properties. By default these parameters are commented out. To modify the parameters, remove the leading # and set the parameter. By default, the parameters look like this:

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal

#port = 5432

The socket file that PostgreSQL will listen on will be found in /tmp by default, or in whatever directory is specified by unix_socket_directory. The file always start with .s.PGSQL. and end with whatever port is specified with the port parameter. Note that the port parameter also determines which inet port PostgreSQL will listen on. By default it will have 777 permissions and be owned by whatever user and group the postmaster process is running as. You can override the permissions using the unix_socket_permissions parameter and the group using the unix_socket_group parameter.

The postmaster process also takes several command line parameters. The -k parameter overrides the unix_socket_directory parameter and the -p parameter overrides the port parameter.

After you modify postgresql.conf, restart PostgreSQL.

To configure SQL Relay to connect to PostgreSQL using the local connection, edit the configuration file and make sure that the connectstring for the PostgreSQL instance uses the host and port parameters as follows:

user=testuser;password=testpassword;host=/tmp;port=5432;

as opposed to:

user=testuser;password=testpassword;host=localhost;port=5432;

Ordinarily, the host parameter tells SQL Relay what host to connect to, but if it starts with a / then it indicates what directory to look for the socket file in. Likewise, the port parameter ordinarily tells SQL Relay what port to connect to, but if host starts with a / then it indicates that SQL Relay should look for a file in the directory specifid by the host parameter starting with .s.PGSQL. and ending in the port number.

Oracle

Oracle supports local connections over IPC using shared memory.

By default, Oracle isn't usually set up to listen for IPC connections. To set up Oracle to listen for IPC connections, edit $ORACLE_HOME/network/admin/listener.ora and look for a section like this:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = localhost.localdomain)(PORT = 1521))
      )
    )
  )

Add an additional ADDRESS section as follows:

LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
        (ADDRESS = (PROTOCOL = TCP)(HOST = localhost.localdomain)(PORT = 1521))
      )
    )
  )

Now Oracle's listener is configured to listen for both IPC and inet socket connections. Restart Oracle

SQL Relay looks up the ORACLE_SID in the $ORACLE_HOME/network/admin/tnsnames.ora file and uses the configuration parameters there to connect to the database. By default, the connection to the database is configured to use an inet socket. To set up an ORACLE_SID to use IPC instead, edit $ORACLE_HOME/network/admin/tnsnames.ora and look for a section for the ORACLE_SID like this:

ORA1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = localhost.localdomain)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ora1)
    )
  )

Modify it as follows:

ORA1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = ora1)
    )
  )

SQL Relay requires no specific modifications to connect to the database using IPC instead of an inet socket. Just configure the connect string to use an SID that uses IPC to connect to the database. For example:

user=testuser;password=testpassword;oracle_sid=ora1;

SQLite

SQLite is different from other databases. SQL Relay's sqlite connection uses the sqlite library to access the database file directly. Effectively, the database connection is always local.

IBM DB2

As far as I can tell, DB2 supports local connections over IPC using shared memory, semaphores and message queues (though I can't find any documentation that specifically says that).

By default, clients connect to DB2 databases using local connections. In fact, to access a remote instance, you have to set up a stub local instance which relays to the remote instance and connect to the local instance.

SQL Relay requires no specific modifications to connect to the database. Just configure the connect string to use the local database. For example:

db=testdb

Informix

Informix databases support connections over a wide range of protocols, including remote connections using TCP/IP sockets and local connections using shared memory.

Databases are identified in the INFORMIXDIR/etc/sqlhosts file. The second parameter for each database entry identifies the protocol to use to connect to it. The onsoctcp protocol tells client programs to connect to the database using a TCP/IP socket and the onipcshm protocol tells client programs to connect to the database using shared memory. For a local connection, use the onipcshm protocol.

Once sqlhosts is configured, just configure SQL Relay to use that database. For example:

db=testdb

SAP/Sybase

As far as I can tell, SAP/Sybase only supports client connections over inet sockets. If anyone knows differently, send mail to dev@firstworks.com.

Firebird

As far as I can tell, Firebird only supports client connections over inet sockets. If anyone knows differently, send mail to dev@firstworks.com.

Connect To SQL Relay Using a Unix Socket

SQL Relay supports inet and unix sockets. If you are runing the SQL Relay client (your program) and the SQL Relay server on the same machine, make sure that they are talking to each other over the unix socket.

First make sure that SQL Relay is configured to listen on a unix socket. In the configuration file, the instance that is talking to mysql should have a socket attribute:

<instance id="mysqltest" port="8006" socket="/tmp/mysqltest.socket" ... >

In the above example, the SQL Relay server will listen on inet port 8006 and unix socket /tmp/mysqltest.socket.

Now make sure that your app is using that socket as well. When you create a new sqlrconnection, make sure you tell it to use the unix socket:

sqlrconnection *con=new sqlrconnection("localhost",8006,"/tmp/mysqltest.socket","user","password",0,1);

as opposed to:

sqlrconnection *con=new sqlrconnection("localhost",8006,NULL,"user","password",0,1);

(for non-C++ api's the syntax is similar)

The client will first try the unix socket, then if it can't connect, it will try the host/port.

Make sure to type the filename of the unix socket correctly. I've seen several cases where people were really confused why they were getting bad performance even though they were using unix sockets. They had just typed the unix socket file name wrong and it was trying it, failing, then falling back to host/port.

Use a Platform That Supports File Descriptor Passing

When a client connects to the SQL Relay listener process, it waits for an available database connection, then either hands the client off to that process via file descriptor passing, or proxies the incoming client, ferrying data back and forth between it and the database connecton process.

Whether to pass or proxy can be specified by the "handoff" parameter in ths configuration file. If you omit the handoff parameter or set handoff="pass" then file descriptor passing will be used. If you set handoff="proxy" then proxying will be used.

File descriptor passing is faster and lighter than proxying, and is the default, but it isn't supported on all platforms. For example, Linux kernels prior to 2.2 and the Cygwin environment under Windows, do not support file descriptor passing and it is likely that other platforms don't as well, especially older platforms.

If you want the best performance, you must use a platform that supports file descriptor passing.

Use a Platform That Supports Threads

When a client connects to the SQL Relay listener process, it forks off either a thread or child process and waits for an available database connection. Support for forking a thread, rather than a process was added in version 0.54.

Whether to fork a thread or process can be specified by the "sessionhandler" parameter in ths configuration file. If you omit this parameter or set sessionhandler="thread" then a thread will be used. If you set sessionhandler="process" then a child process will be used. Note that sessionhandler="thread" became the default in version 0.58. Previously it defaulted to "process".

Forking a thread is faster and lighter than forking a child process, and is the default (as of version 0.58), but it isn't supported on all platforms, especially older platforms. Also, if the Rudiments library that SQL Relay is built on was intentionally compiled without thread support then SQL Relay will also not be able to use threads.

If SQL Relay is configured to use threads but can't, then it falls back to forking a child process and prints a warning on the screen when the sqlr-listener process starts.

If you want the best performance, you must use a platform that supports threads.

Tweak Various System Parameters

Certain system parameters can be tuned to get better performance out of SQL Relay.

TIME_WAIT timeout

The first paramater that comes to mind is the TIME_WAIT timeout. When a TCP client disconnects from a server, the socket that the client was connected on goes into a TIME_WAIT state for typically between a minute and 4 minutes.

For servers serving data over the the unreliable internet, this is probably reasonable. For internal servers, dedicated to serving other internal servers on a reliable network, reducing the length of the timeout is probably OK.

Here's why it helps...

The kernel keeps a list of sockets in the TIME_WAIT state. When the list is full, failures start to occur.

If your server is getting new client connections faster than it can bleed off sockets in the TIME_WAIT state, the list will ultimately get full. Decreasing the timeout increases the bleed-off rate.

The following instructions illustrate how to change the timeout rate for Linux and Solaris. Note that I got these instructions off of the web and have not tried all of them myself.

For Linux, set the timeout by executing the following command. In this example, the timeout is set to 30 seconds. You should put this command in a system startup file so it will be executed at boot time.

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

You might also try enabling tcp_tw_recycle and tcp_tw_reuse as follows:

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse
echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle

Those parameters enable fast recycling and reuse of sockets in the TIME_WAIT state. I can't find any documentation about them other than the following though:

tcp_tw_recycle - BOOLEAN
 Enable fast recycling TIME-WAIT sockets. Default value is 0.
 It should not be changed without advice/request of technical
 experts.

tcp_tw_reuse - BOOLEAN
 Allow to reuse TIME-WAIT sockets for new connections when it is
 safe from protocol viewpoint. Default value is 0.
 It should not be changed without advice/request of technical
 experts.

For Solaris, the parameter can be modified while the system is running using the ndd command to set the number of milliseconds to wait. These examples set the timeout to 30 seconds. You should put these commands in a system startup file so they'll be executed at boot time.

For Solaris 7 and later:

ndd -set /dev/tcp tcp_time_wait_interval 30000

For Solaris 2.6 and earlier:

ndd -set /dev/tcp tcp_close_wait_interval 30000

Port range

Another paramter that you may want to tweak is the range of available ports. On Linux, you can display the range by running:

/sbin/sysctl net.ipv4.ip_local_port_range

You can increase this to range from 1024 to 65535 by running the following command:

/sbin/sysctl -w net.ipv4.ip_local_port_range="1024 65535"

You should put this command in a system startup file so it'll be executed at boot time.

I'm not sure what the default port range is or how to change it on other operating systems.

Don't Get Unnecessary Column Info

If you run select queries but don't need to know the names/types/sizes of the columns, rather you can just refer to them by number, then you can call dontGetColumnInfo() before running your query and much less data will be transmitted from the SQL Relay server to the client. This is really useful for speeding up 1 column queries like "select count(*) from mytable".

Use Result Set Buffer Size

By default, SQL Relay fetches the entire result set of a query into memory. SQL Relay was originally written for use with web-based apps, and most web pages display small result sets so generally it's faster if the entire result set is buffered. But, if you run select queries that return really long result sets, you should use setResultSetBufferSize() to fetch rows in groups instead of all at once or 1 at a time. The value to use tends to be query-specific though so you usually have to do some benchmarking to get it optimized. If you fetch too many rows at once, memory has to be allocated which slows things down (and in really bad cases can cause swapping) but if you fetch too few at once, it causes too many round trips to the database. Each query has a sweet spot, finding it may dramatically improve that query's performance.

Memory Usage

Many database API's allocate memory internally to store result sets. However, Oracle, DB2, Informix, SAP/Sybase, FreeTDS, MySQL, Firebird and ODBC allow you to allocate buffers yourself and fetch result sets into these buffers. When using these databases, SQL Relay preallocates a result set buffer and re-uses it over and over rather than dynamically allocating memory for each result set. This improves performance at the cost of memory. There are parameters for tuning the size of the result set buffer though.

The size of the result set buffer is determined by 3 parameters:

The third parameter might be a little confusing. Some databases only support single-row fetches; you run a query, then fetch 1 row at a time from the database. Oracle, DB2 and SAP/Sybase support array fetches; you can run a query and then fetch any number of rows from the database in a single round-trip. This can improve performance significantly.

The default result set buffer size for Oracle, DB2, Informix, and SAP/Sybase is 80mb (256 fields per row * 32768 bytes per field * 10 rows to fetch at once). Informix, FreeTDS, MySQL, Firebird, and ODBC don't support multi-row fetches so its default buffer size is about 8mb (256*32768*1 bytes).

It is important to note that each cursor contains a result set buffer and each connection opens a configurable number of cursors.

For example, if you open 10 connections with 2 cursors each, then since each cursor will consume 80mb, each connection will consume about 160mb, and all 10 connections will consume about 1.6g.

If that seems like more memory than you'd like to allocate to SQL Relay then you can tune it down.

For the best performance/memory-usage trade-off, the "maximum number of columns in the result set" should be set to the largest number of columns that a query run by your apps could return. However, this parameter can also be set to -1, causing column buffers to be allocated dynamically for each result set. This won't perform as well but will conserve memory.

The "maximum size of an individual non-LOB field" parameter should be set to the longest non-LOB field that your app could fetch.

There is no rule of thumb for the "number of rows of the result set to fetch at once" parameter. More tends to be better, but not always. You may have to tune this by trial and error.

For Oracle, DB2, and SAP/Sybase, the maxselectlistsize, maxitembuffersize and fetchatonce parameters of the string attribute of the connection tag in the configuration file may be set to control these values at run time. For Informix, MySQL, Firebird, and FreeTDS, the maxselectlistsize and maxitembuffersize parameters are available but the fetchatonce parameter is not. When routing queries or sessions, the fetchatonce parameter is available.

See the SQL Relay Configuration Reference for details.

Possibly more important than the parameters controlling the result set buffer size though, are the cursors and connections parameters as they are both multipliers. See tips below for each.

How Many Connections Should I Run?

A good rule of thumb is to run as many connections as your database can handle.

The best way is to determine how many connections to run is to make an educated guess, let your applications run, monitor the performance of the SQL Relay server and database and adjust the number of connections accordingly.

If you are implementing SQL Relay because your database had been getting overloaded with too many connections, you may have a pretty good idea of how many connections your database can handle, and you can use that number as a starting point.

When just letting your applications run isn't feasable, you can use a load-testing program like apachebench or LoadRunner to do automated testing. You'll have to set up scripts and pages to simulate the kind of load that your applications will place on SQL Relay and the database, but you can get an accurate idea of how your application will perform end-to-end. You can let the load-testing program run, monitor the performance of the SQL Relay server and database and adjust the number of connections accordingly.

There is also a program in the test/stress directory of the SQL Relay distribution called querytest that logs into the SQL Relay server, runs a series of queries and logs out, over and over, as fast as possible. You can run many instances of querytest simultaneously to simulate the kind of load your applications might place on SQL Relay and the database. querytest is OK as a starting point, but to do any serious testing, you should modify the queries that it runs to more accurately simulate your applications.

If you are concerned about memory usage, consider allowing connections to scale up and down as necessary by using the connections, maxconnections, maxqueuelength and growby parameters in the configuration file. See the SQL Relay Configuration Reference for detailed information on these parameters.

How Many Cursors Should I Allocate?

Usually not very many. Most often, only one. If your app requires lots of cursors, then it's as likely as not that it isn't closing or freeing them properly.

By default, unless setResultSetBufferSize() is called, SQL Relay clients use a server-side cursor to fetch the entire result set into memory and then release the cursor. This way, only one server-side cursor is required, independent of how many client-side cursors are used.

However, if you configure the client to fetch say, 10 rows at a time, and run nested queries - eg. run a select, and then run an insert for each row of the select, then two server side cursors would be required. One for the outer select and another for the inner inserts. This is a legitimate use of more than 1 cursor.

Sometimes though, an app will allocate 10 cursors, set each to fetch 5 rows at a time, run a query using each cursor and then free all 10 cursors. This would require 10 server-side cursors. Only one server-side cursor would be required if: one cursor were allocated and reused, each cursor was closed when the app was done with it, or each cursor was freed when the app was done with it. This is a common problem and, unfortunately, sometimes architectural constraints make it unavoidable.

SQL Relay has parameters that allow cursors to scale up and down as needed. If you are concerned about memory usage, consider using the cursors, maxcursors and cursors_growby parameters in the configuration file. See the SQL Relay Configuration Reference for detailed information on these parameters.