W3cubDocs

/MariaDB

CONNECT Data Types

MariaDB starting with 10.0

The CONNECT handler was introduced in MariaDB 10.0.

Many data types make no or little sense when applied to plain files. This why CONNECT supports only a restricted set of data types. However, ODBC, JDBC or MYSQL source tables may contain data types not supported by CONNECT. In this case, CONNECT makes an automatic conversion to a similar supported type when it is possible.

The data types currently supported by CONNECT are:

Type name Description Used for
TYPE_STRING Zero ended string char, varchar, text
TYPE_INT 4 bytes integer int, mediumint, integer
TYPE_SHORT 2 bytes integer smallint
TYPE_TINY 1 byte integer tinyint
TYPE_BIGINT 8 bytes integer bigint, longlong
TYPE_DOUBLE 8 bytes floating point double, float, real
TYPE_DECIM Numeric value decimal, numeric, number
TYPE_DATE 4 bytes integer date, datetime, time, timestamp, year

TYPE_STRING

This type corresponds to what is generally known as CHAR or VARCHAR by database users, or as strings by programmers. Columns containing characters have a maximum length but the character string is of fixed or variable length depending on the file format.

The DATA_CHARSET option must be used to specify the character set used in the data source or file. Note that, unlike usually with MariaDB, when a multi-byte character set is used, the column size represents the number of bytes the column value can contain, not the number of characters.

TYPE_INT

The ]INTEGER type contains signed integer numeric 4-byte values (the int/ of the C language) ranging from –2,147,483,648 to 2,147,483,647 for signed type and 0 to 4,294,967,295 for unsigned type.

TYPE_SHORT

The SHORT data type contains signed integer numeric 2-byte values (the short integer of the C language) ranging from –32,768 to 32,767 for signed type and 0 to 65,535 for unsigned type.

TYPE_TINY

The TINY data type contains integer numeric 1-byte values (the char of the C language) ranging from –128 to 127 for signed type and 0 to 255 for unsigned type. For some table types, TYPE_TINY is used to represent Boolean values (0 is false, anything else is true).

TYPE_BIGINT

The BIGINT data type contains signed integer 8-byte values (the long long of the C language) ranging from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 for signed type and from 0 to 18,446,744,073,709,551,615 for unsigned type.

Inside tables, the coding of all integer values depends on the table type. In tables represented by text files, the number is written in characters, while in tables represented by binary files (BIN or VEC) the number is directly stored in the binary representation corresponding to the platform.

The length (or precision) specification corresponds to the length of the table field in which the value is stored for text files only. It is used to set the output field length for all table types.

TYPE_DOUBLE

The DOUBLE data type corresponds to the C language double type, a floating-point double precision value coded with 8 bytes. Like for integers, the internal coding in tables depends on the table type, characters for text files, and platform binary representation for binary files.

The length specification corresponds to the length of the table field in which the value is stored for text files only. The scale (was precision) is the number of decimal digits written into text files. For binary table types (BIN and VEC) this does not apply. The length and scale specifications are used to set the output field length and number of decimals for all types of tables.

TYPE_DECIM

The DECIMAL data type corresponds to what MariaDB or ODBC data sources call NUMBER, NUMERIC, or DECIMAL: a numeric value with a maximum number of digits (the precision) some of them eventually being decimal digits (the scale). The internal coding in CONNECT is a character representation of the number. For instance:

colname decimal(14,6)

This defines a column colname as a number having a precision of 14 and a scale of 6. Supposing it is populated by:

insert into xxx values (-2658.74);

The internal representation of it will be the character string -2658.740000. The way it is stored in a file table depends on the table type. The length field specification corresponds to the length of the table field in which the value is stored and is calculated by CONNECT from the precision and the scale values. This length is precision plus 1 if scale is not 0 (for the decimal point) plus 1 if this column is not unsigned (for the eventual minus sign). In fix formatted tables the number is right justified in the field of width length, for variable formatted tables, such as CSV, the field is the representing character string.

Because this type is mainly used by CONNECT to handle numeric or decimal fields of ODBC, JDBC and MySQL table types, CONNECT does not provide decimal calculations or comparison by itself. This is why decimal columns of CONNECT tables cannot be indexed.

DATE Data type

Internally, date/time values are stored by CONNECT as a signed 4-byte integer. The value 0 corresponds to 01 January 1970 12:00:00 am coordinated universal time (UTC). All other date/time values are represented by the number of seconds elapsed since or before midnight (00:00:00), 1 January 1970, to that date/time value. Date/time values before midnight 1 January 1970 are represented by a negative number of seconds.

CONNECT handles dates from 13 December 1901, 20:45:52 to 18 January 2038, 19:14:07.

Although date and time information can be represented in both CHAR and INTEGER data types, the DATE data type has special associated properties. For each DATE value, CONNECT can store all or only some of the following information: century, year, month, day, hour, minute, and second.

Date Format in Text Tables

Internally, date/time values are handled as a signed 4-byte integer. But in text tables (type DOS, FIX, CSV, FMT, and DBF) dates are most of the time stored as a formatted character string (although they also can be stored as a numeric string representing their internal value). Because there are infinite ways to format a date, the format to use for decoding dates, as well as the field length in the file, must be associated to date columns (except when they are stored as the internal numeric value).

Note that this associated format is used only to describe the way the temporal value is stored internally. This format is used both for output to decode the date in a SELECT statement as well as for input to encode the date in INSERT or UPDATE statements. However, what is kept in this value depends on the data type used in the column definition (all the MariaDB temporal values can be specified). When creating a table, the format is associated to a date column using the DATE_FORMAT option in the column definition, for instance:

create table birthday (
  Name varchar(17),
  Bday date field_length=10 date_format='MM/DD/YYYY',
  Btime time field_length=8 date_format='hh:mm tt')
engine=CONNECT table_type=CSV;

insert into birthday values ('Charlie','2012-11-12','15:30:00');

select * from birthday;

The SELECT query returns:

Name Bday Btime
Charlie 2012-11-12 15:30:00

The values of the INSERT statement must be specified using the standard MariaDB syntax and these values are displayed as MariaDB temporal values. Sure enough, the column formats apply only to the way these values are represented inside the CSV files. Here, the inserted record will be:

Charlie,11/12/2012,03:30 PM

Note: The field_length option exists because the MariaDB syntax does not allow specifying the field length between parentheses for temporal column types. If not specified, the field length is calculated from the date format (sometimes as a max value) or made equal to the default length value if there is no date format. In the above example it could have been removed as the calculated values are the ones specified. However, if the table type would have been DOS or FIX, these values could be adjusted to fit the actual field length within the file.

A CONNECT format string consists of a series of elements that represent a particular piece of information and define its format. The elements will be recognized in the order they appear in the format string. Date and time format elements will be replaced by the actual date and time as they appear in the source string. They are defined by the following groups of characters:

Element Description
YY The last two digits of the year (that is, 1996 would be coded as "96").
YYYY The full year (that is, 1996 could be entered as "96" but displayed as “1996”).
MM The one or two-digit month number.
MMM The three-character month abbreviation.
MMMM The full month name.
DD The one or two-digit month day.
DDD The three-character weekday abbreviation.
DDDD The full weekday name.
hh The one or two-digit hour in 12-hour or 24-hour format.
mm The one or two-digit minute.
ss The one or two-digit second.
t The one-letter AM/PM abbreviation (that is, AM is entered as "A").
tt The two-letter AM/PM abbreviation (that is, AM is entered as "AM").

Usage Notes

  • To match the source string, you can add body text to the format string, enclosing it in single quotes or double quotes if it would be ambiguous. Punctuation marks do not need to be quoted.
  • The hour information is regarded as 12-hour format if a “t” or “tt” element follows the “hh” element in the format or as 24-hour format otherwise.
  • The "MM", "DD", "hh", "mm", "ss" elements can be specified with one or two letters (e.g. "MM" or "M") making no difference on input, but placing a leading zero to one-digit values on output[1] for two-letter elements.
  • If the format contains elements DDD or DDDD, the day of week name is skipped on input and ignored to calculate the internal date value. On output, the correct day of week name is generated and displayed.
  • Temporal values are always stored as numeric in BIN and VEC tables.

Handling dates that are out of the range of supported CONNECT dates

If you want to make a table containing, for instance, historical dates not being convertible into CONNECT dates, make your column CHAR or VARCHAR and store the dates in the MariaDB format. All date functions applied to these strings will convert them to MariaDB dates and will work as if they were real dates. Of course they must be inserted and will be displayed using the MariaDB format.

NULL handling

CONNECT handles null values for data sources able to produce nulls. Currently this concerns mainly the ODBC, JDBC, MONGO, MYSQL, XML, JSON and INI table types. For INI, JSON, MONGO or XML types, null values are returned when the key is missing in the section (INI) or when the corresponding node does not exist in a row (XML, JSON, MONGO).

For other file tables, the issue is to define what a null value is. In a numeric column, 0 can sometimes be a valid value but, in some other cases, it can make no sense. The same for character columns; is a blank field a valid value or not?

A special case is DATE columns with a DATE_FORMAT specified. Any value not matching the format can be regarded as NULL.

CONNECT leaves the decision to you. When declaring a column in the CREATE TABLE statement, if it is declared NOT NULL, blank or zero values will be considered as valid values. Otherwise they will be considered as NULL values. In all cases, nulls are replaced on insert or update by pseudo null values, a zero-length character string for text types or a zero value for numeric types. Once converted to pseudo null values, they will be recognized as NULL only for columns declared as nullable.

For instance:

create table t1 (a int, b char(10)) engine=connect;
insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');
select * from t1 where a is null;

The select query replies:

a b
NULL zero
NULL ???

Sure enough, the value 0 entered on the first row is regarded as NULL for a nullable column. However, if we execute the query:

select * from t1 where a = 0;

This will return no line because a NULL is not equal to 0 in an SQL where clause.

Now let us see what happens with not null columns:

create table t1 (a int not null, b char(10) not null) engine=connect;
insert into t1 values (0,'zero'),(1,'one'),(2,'two'),(null,'???');

The insert statement will produce a warning saying:

Level Code Message
Warning 1048 Column 'a' cannot be null

It is replaced by a pseudo null 0 on the fourth row. Let us see the result:

select * from t1 where a is null;
select * from t1 where a = 0;

The first query returns no rows, 0 are valid values and not NULL. The second query replies:

a b
0 zero
0 ???

It shows that the NULL inserted value was replaced by a valid 0 value.

Unsigned numeric types

They are supported by CONNECT since version 1.01.0010 for fixed numeric types (TINY, SHORT, INTEGER, and BITINT).

Data type conversion

CONNECT is able to convert data from one type to another in most cases. These conversions are done without warning even when this leads to truncation or loss of precision. This is true, in particular, for tables of type ODBC, JDBC, MYSQL and PROXY (via MySQL) because the source table may contain some data types not supported by CONNECT. They are converted when possible to CONNECT types.

When converted, MariaDB types are converted as:

MariaDB Types CONNECT Type Remark
integer, medium integer TYPE_INT 4 byte integer
small integer TYPE_SHORT 2 byte integer
tiny integer TYPE_TINY 1 byte integer
char, varchar TYPE_STRING Same length
double, float, real TYPE_DOUBLE 8 byte floating point
decimal, numeric TYPE_DECIM Length depends on precision and scale
all date related types TYPE_DATE Date format can be set accordingly
bigint, longlong TYPE_BIGINT 8 byte integer
SQL_GUID TYPE_STRING Len = 36
SQL_BINARY, SQL_VARBINARY, SQL_LONGVARBINARY TYPE_STRING len = min(abs(len), connect_conv_size) (Only if the value of connect_type_conv is FORCE. The column should use the binary charset).
Other types TYPE_ERROR Not supported, no conversion provided.

In the case of TEXT columns, the handling depends on the values given to the connect_type_conv and connect_conv_size system variables. By default no conversion is permitted.

Note: BLOB is currently not converted by default until a TYPE_BIN type is added to CONNECT. However, the FORCE option (from Connect 1.06.006) can be specified for blob columns containing text and the SKIP option also applies to ODBC BLOB columns.

ODBC SQL types are converted as:

SQL Types Connect Type Remark
SQL_CHAR, SQL_VARCHAR TYPE_STRING
SQL_LONGVARCHAR TYPE_STRING len = min(abs(len), 255) If the column is generated by discovery (columns not specified) its length is connect_conv_size.
SQL_NUMERIC, SQL_DECIMAL TYPE_DECIM
SQL_INTEGER TYPE_INT
SQL_SMALLINT TYPE_SHORT
SQL_TINYINT, SQL_BIT TYPE_TINY
SQL_FLOAT, SQL_REAL, SQL_DOUBLE TYPE_DOUBLE
SQL_DATETIME TYPE_DATE len = 10
SQL_INTERVAL TYPE_STRING len = 8 + ((scale) ? (scale+1) : 0)
SQL_TIMESTAMP TYPE_DATE len = 19 + ((scale) ? (scale +1) : 0)
SQL_BIGINT TYPE_BIGINT
Other types TYPE_ERROR Not supported.

JDBC SQL types are converted as:

JDBC Types Connect Type Remark
(N)CHAR, (N)VARCHAR TYPE_STRING
LONG(N)VARCHAR TYPE_STRING len = min(abs(len), connect_conv_size) If the column is generated by discovery (columns not specified), its length is connect_conv_size
NUMERIC, DECIMAL, VARBINARY TYPE_DECIM
INTEGER TYPE_INT
SMALLINT TYPE_SHORT
TINYINT, BIT TYPE_TINY
FLOAT, REAL, DOUBLE TYPE_DOUBLE
DATE TYPE_DATE len = 10
TIME TYPE_DATE len = 8 + ((scale) ? (scale+1) : 0)
TIMESTAMP TYPE_DATE len = 19 + ((scale) ? (scale +1) : 0)
BIGINT TYPE_BIGINT
Other types TYPE_ERROR Not supported.

Note: The connect_type_conv SKIP option also applies to ODBC and JDBC tables.


  1. Here input and output are used to specify respectively decoding the date to get its numeric value from the data file and encoding a date to write it in the table file. Input is performed within SELECT queries; output is performed in UPDATE or INSERT queries.
Content reproduced on this site is the property of its respective owners, and this content is not reviewed in advance by MariaDB. The views, information and opinions expressed by this content do not necessarily represent those of MariaDB or any other party.

© 2019 MariaDB
Licensed under the Creative Commons Attribution 3.0 Unported License and the GNU Free Documentation License.
https://mariadb.com/kb/en/connect-data-types/