Hazaar DBI Data Synchronisation
Hazaar DBI allows developers to define static data that can be imported into a database using a powerful data synchronisation process. This process is designed to ensure that required data exists in the database, whether for initial installation, ongoing maintenance, or business operations. The synchronisation engine is highly flexible, supporting custom SQL, staged operations, variable substitution, macros for foreign key resolution, and advanced row differencing to minimise unnecessary updates.
Typical use cases include:
- Initialising a newly created database after software installation.
- Ensuring critical data is maintained, such as administration or system user accounts.
- Managing business operation data and workflows, including types, reference tables, and workflow states.
The data sync process offers multiple methods of synchronising data into a database, from custom SQL queries to an advanced row differencing engine that ensures only necessary changes are made. It also supports staged operations, variable and macro resolution, and remote data sources.
Data Sync File Format
Datasync files are text files containing JSON-formatted objects. The main container is a JSON array, where each element represents a synchronisation item. Items are processed sequentially, allowing for complex workflows and dependencies, such as linking records across tables. This item-based approach is especially useful in relational databases where referential integrity and ordering matter.
An example layout could be:
[ //Sync Container Array
{ //Sync Item Object
"message": "Your custom message" //Sync log message output
},
{ //Sync Item Object
"exec": "DELETE FROM tablename" //Custom SQL execution
},
{
"vars": {
"foo": "bar" //Global variables are defined when there is no "table" property. These variables will be saved across sync items.
}
},
{ //Sync Item Object - Row Sync Object
"table": "{{tablename}}", //The name of the table this element will work with.
"insertonly": false, //Will only insert new records. Useful for system initialisation.
"truncate": true, //Boolean indicating that the table should be truncated and rebuilt on execution of the sync. This resets any sequences on the table.
"purge": "SQL CRITERIA", //Purges specific rows from the table before the sync. This is similar to truncate except only rows that match the SQL WHERE criteria specific will be removed and the sequences are not reset. Note: Do NOT include the WHERE, just the criteria. ie: `module = 'foobar'`.
"keys": [ "optional" ], //Optional list of fields to use as the unique identifier keys (see Unique Identifier Keys).
"vars": [ "optional" ], //Defines variables for use in macros
"refs": [ "optional" ], //Defines data references that apply to every row that will be sync'd in this item.
"rows": [ //An array of row object elements that will be sync'd into the database.
{
"field1": "value1", //String fields are supported.
"field2": 2, //Numeric fields, including floats/doubles, are supported.
"field3": true, //Boolean fields are supported.
"field4": [ "a", "b", "c" ], //Native [ARRAY] columns are also supported.
"field5": { //JSON Objects are also supported if the target column has a JSON data type. These fields also support
"key1": "element1", //Array objects which are defined the same as the above [ARRAY] data type fields. The data sync engine
"key2": "element2" //will detect that target column type and convert the field value as needed.
}
}
],
"source": { //An object that contains remote data sync properties
"config": "string/object", //Optional. The DBI config property. This can be a string that names a DBI
//configuration, or an object that contains DBI config properties. Required if hostURL is not set.
"hostURL": "string", //Optional. A remote host running Hazaar DBI that will allow data sync access. Required if config is not set.
"syncKey": "string", //WARNING: This is only for testing. Sync files are not secure and
//should not contain sensitive keys like this. This should be set on
//the DBI configuration object stored in an encrypted DBI config file.
"select": {}, //Optional. Sent to the DBI\Table::select() function to define a fieldset.
"criteria": {}, //Optional. Sent to the DBI\Table::find() function to set query criteris.
"map": {}, //Optional. Data map object sent to DBI\Datamapper::map() to map fields.
"ignoreErrors": false //If true then if the remote data source is not available the sync will log the error and continue. Default: false
}
}
]

Datasync files ensure that required data exists in the database. This can be for an initial installation that creates records automatically as part of system initialisation, or to guarantee that any data referenced in code actually exists in the database. A common use is defining reference types whose identifiers are used in code but linked to other entities in the database.
Sync Container Array
This array simply contains sync item objects, which are executed in the order they are defined. The data sync process can be broken up into separate items, each performing a single operation. Normally a single sync item defines rows that will be synchronised into a single table.
See below for how to define Sync Item objects.
Sync Item Objects
Each sync item object is a JSON object describing the actions to be performed during that item. The supported actions are:
- message — Outputs a log message. Useful for documenting progress or context.
- exec — Executes custom SQL. Can be a string or an array of SQL statements. Variables and macros are resolved before execution.
- rows — Synchronises rows in a table. Requires the table property. Supports row differencing, macros, and variable substitution.
- update — Updates existing rows in a table. Requires table.
- delete — Deletes rows from a table. Requires table.
Actions can be combined in a single item, but best practice is to keep exec separate from rows, update, or delete for clarity and maintainability. If combined, execution order is: message, exec, rows, update, delete.
Combining message with other actions is encouraged for user-friendly log output.
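The fixed execution order described above can be sketched as a simple dispatch loop. This is an illustrative Python sketch only, not the actual Hazaar DBI implementation; the handler callables are assumptions of the sketch.

```python
# Illustrative sketch only: actions present in a sync item are applied in a
# fixed order, regardless of the order the JSON properties appear in.
ACTION_ORDER = ["message", "exec", "rows", "update", "delete"]

def process_item(item: dict, handlers: dict) -> list:
    """Run each supported action present in `item` in the documented order."""
    executed = []
    for action in ACTION_ORDER:
        if action in item:
            handlers[action](item[action])
            executed.append(action)
    return executed
```

For example, an item defining rows, exec, and message (in any JSON order) would always run message first, then exec, then rows.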
Log Message Output
The simplest action in a sync item is to output a log message using the message property. This can be any text, and is useful for documenting the purpose or progress of each item. Log messages are processed before other actions in the item.
Example
{
"message": "This is a log message"
}

Execute Custom SQL
Custom SQL can be executed during a sync item using the exec property. SQL is executed "as is" and its return value is not processed. You can define a single SQL string or an array of SQL statements, which will be executed sequentially. If an error occurs during execution, the sync process will halt and log the error.
Example - Single
{
"exec": "DELETE FROM my_table"
}

Example - Multiple
{
"exec": [
"DELETE FROM my_table",
"DELETE FROM you_table"
]
}

Using Variables
Variables can be included in SQL using mustache tags (e.g., {{VAR_NAME}}). Variables are defined in the vars property of the sync item. Before SQL execution, all variables are resolved and substituted. This allows for dynamic and reusable SQL statements.
Example:
[
{
"vars": {
"TEST_DATA": 1234
},
"exec": "UPDATE test_table SET value={{TEST_DATA}} WHERE value IS NULL;"
}
]

This will replace {{TEST_DATA}} with 1234, resulting in UPDATE test_table SET value=1234 WHERE value IS NULL;.
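What this substitution amounts to can be sketched in a few lines of Python. This is illustrative only; the real engine's handling of whitespace inside tags or of undefined variables may differ.

```python
import re

def substitute_vars(text: str, variables: dict) -> str:
    """Replace {{NAME}} mustache tags in `text` with values from `variables`."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined sync variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, text)

sql = substitute_vars(
    "UPDATE test_table SET value={{TEST_DATA}} WHERE value IS NULL;",
    {"TEST_DATA": 1234},
)
```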
Row Sync Objects
Row sync objects are used when the rows property is defined in a sync item. The rows property is an array of objects, each representing a database row to be synchronised. Each object contains key/value pairs where the key is the column name and the value is the desired value.
The synchronisation engine detects whether each row already exists based on defined criteria, then compares the existing row to the sync definition. Only columns defined in the sync file are updated, minimising unnecessary changes. This process supports macros for foreign key resolution and variable substitution.
Existing Row Detection
The sync engine ensures that each row defined in the sync file exists in the database. Existing rows are identified using one of three modes, prioritised as follows:
- Primary Key Mode — If the row object defines the primary key column, its value is used to find the existing record. This is the fastest and most reliable method.
- Key List Mode — If the sync item defines a keys property (an array of column names), those columns are used to find existing records. This is useful for composite keys or when primary keys are not available.
- Object Mode — If neither the primary key nor keys are defined, the entire row object is used as criteria to find existing records. This is the simplest but least efficient method, and it does not support updates (only inserts).
Mixing modes is possible, but not recommended. For best performance and reliability, use primary key or key list mode.
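The three detection modes boil down to how the lookup criteria for each row are built. The following is an illustrative sketch of that priority order, not the engine's actual code; the primary key column name id is an assumption here.

```python
# Illustrative sketch: build the lookup criteria for a row according to the
# three existing-row detection modes, in priority order.
def build_criteria(row: dict, keys: list = None, pk: str = "id") -> dict:
    if pk in row:
        # 1. Primary Key Mode: fastest and most reliable.
        return {pk: row[pk]}
    if keys:
        # 2. Key List Mode: match on the configured key columns.
        return {k: row[k] for k in keys}
    # 3. Object Mode: match on the whole row (insert-only).
    return dict(row)
```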
Primary Key Mode
Primary key mode is the fastest and most reliable method. If the row object includes the primary key column, its value is used to find the existing record. If no record exists, a new one is inserted. If a record exists, only the columns defined in the sync file are updated if differences are detected.
Note: Columns not defined in the row object are not changed. To explicitly clear a column, set its value to null.
Caveat: Managing primary keys across many sync files and developers can be challenging. For large systems, consider using key list mode and macros for linking records.
Example
In the example below, id is the primary key field, so it should be defined in each row object.
[
{
"table": "test",
"rows": [
{
"id": 1,
"name": "one",
"label": "Row Number #1"
},
{
"id": 2,
"name": "two",
"label": "Row Number #2"
}
]
}
]

In the above example, the sync engine will look for each record where the id column contains the defined value. If one doesn't exist, a new record will be inserted. If one does exist, it will ensure that the name and label columns contain their defined values.
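The "only columns defined in the sync file are updated" behaviour is a simple row diff. A minimal illustrative sketch (not the engine's actual implementation):

```python
def diff_row(existing: dict, desired: dict) -> dict:
    """Return only the sync-file columns whose values differ from the
    existing database row; an empty result means no UPDATE is issued.
    Columns present in the database but absent from the sync file are
    left untouched."""
    return {
        col: value
        for col, value in desired.items()
        if col not in existing or existing[col] != value
    }
```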
Object Mode
Object mode uses all defined column values to find an existing record. If no record exists, a new one is inserted. Updates are not possible in this mode, as the criteria matches the entire object.
Warning: Object mode is slow and only supports inserts. Use only for simple cases or when no keys are available.
Example
[
{
"table": "test",
"rows": [
{
"name": "one",
"label": "Row Number #1"
},
{
"name": "two",
"label": "Row Number #2"
}
]
}
]

In this example, the sync engine will make sure that the defined rows exist in the database. If they already exist, then nothing will be changed.
Key List Mode
Key list mode is a hybrid approach. Instead of using the primary key, the sync item defines a keys property (array of column names) used to find existing records. If no record exists, an insert is performed. If a record exists, only the columns defined in the sync file are updated if differences are detected.
Tip: Define database indexes for columns used in keys to improve performance.
Example
[
{
"table": "test",
"keys": [ "name" ],
"rows": [
{
"name": "one",
"label": "Row Number #1"
},
{
"name": "two",
"label": "Row Number #2"
}
]
}
]

In the above example, the name column is used to find an existing record. If one does not exist, an insert is performed using the defined column values. If a record does exist, it will ensure that the label field contains the defined value.
::: notice It is a good idea to make sure a database index is defined for the columns used in the keys attribute. This will greatly improve performance during data synchronisation. :::
Row Object Macros
Row object macros allow for dynamic lookup of values during synchronisation, typically for foreign key resolution. Macros are string values prefixed with :: and follow the pattern:
::table(field):criteria
Where:
- table is the source table to query
- field is the column to return
- criteria is a comma-separated list of field=value pairs (only AND logic is supported)
Macros are evaluated per row and can reference variables using mustache tags. If the referenced value is already known from the sync file, it is used directly; otherwise, a database query is performed and cached for efficiency.
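The macro pattern can be illustrated with a small parser. This is a sketch of the documented syntax only, not Hazaar DBI's actual parser; quoting and escaping rules in criteria values are not covered here.

```python
import re

# Illustrative parser for the ::table(field):criteria macro pattern.
MACRO_RE = re.compile(r"^::(\w+)\((\w+)\):(.+)$")

def parse_macro(value: str):
    """Split a macro string into (table, field, criteria dict), or return
    None if the value is not a macro and should be used as-is."""
    match = MACRO_RE.match(value)
    if match is None:
        return None
    table, field, criteria = match.groups()
    pairs = dict(pair.split("=", 1) for pair in criteria.split(","))
    return table, field, pairs
```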
Example
In the example below we have data for two tables: test_type, and test, which has a column named type_id that references the test_type table's id column. In test_type, the id column is a serial primary key.
Each type_id field value uses a simple macro that looks up the id column of the internal record in the test_type table.
[
{
"table": "test_type",
"keys": [ "name" ],
"rows": [
{
"name": "internal",
"label": "Internal Test Type"
}
]
},
{
"table": "test",
"keys": [ "name" ],
"rows": [
{
"name": "one",
"label": "Row Number #1",
"type_id": "::test_type(id):name=internal"
},
{
"name": "two",
"label": "Row Number #2",
"type_id": "::test_type(id):name=internal"
}
]
}
]

Macro Optimisation
Macro queries are optimised for speed. If the referenced value is already defined in the loaded sync files, it is used directly. Otherwise, a database query is performed and the result is cached for the duration of the sync process. Subsequent identical macro lookups use the cached value, improving performance.
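The caching behaviour can be sketched as a memoised lookup. This is illustrative only; query_db below stands in for the real DBI query and is an assumption of the sketch.

```python
# Illustrative sketch of macro lookup caching: identical macro lookups
# within one sync run hit the database at most once.
def make_macro_resolver(query_db):
    cache = {}

    def resolve(table, field, criteria):
        # Criteria are normalised into a hashable, order-independent key.
        key = (table, field, tuple(sorted(criteria.items())))
        if key not in cache:
            cache[key] = query_db(table, field, criteria)
        return cache[key]

    return resolve
```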
Table References
Table references allow you to define values that are populated in every row synchronised into a table. This is useful for setting common values shared across all rows in an item. Table references are defined as JSON objects, where each property is a column name and its value is used in every row.
References can use macros, which are resolved at the start of table processing and remain constant throughout the item. Table references can also be used in the keys property for composite key lookups.
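Conceptually, the resolved references are merged into each row before it is synchronised. A minimal sketch follows; the choice that a row's own value wins over a reference value is an assumption of this sketch, not documented behaviour.

```python
def apply_refs(refs: dict, rows: list) -> list:
    """Merge resolved table references into every row; a value defined on
    the row itself wins over the reference value (an assumption)."""
    return [{**refs, **row} for row in rows]
```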
Example - Basic
An example could be to group contact item types such as email addresses, physical addresses or phone numbers.
[
{
"table": "contact_types",
"keys": [ "group_name", "type_name" ],
"refs": {
"group_name": "email"
},
"rows": [
{
"type_name": "Personal Email"
},
{
"type_name": "Business Email"
}
]
},
{
"table": "contact_types",
"keys": [ "group_name", "type_name" ],
"refs": {
"group_name": "phone"
},
"rows": [
{
"type_name": "Home Phone"
},
{
"type_name": "Work Phone"
},
{
"type_name": "Mobile Phone"
}
]
},
{
"table": "contact_types",
"keys": [ "group_name", "type_name" ],
"refs": {
"group_name": "email"
},
"rows": [
{
"type_name": "Street Address"
},
{
"type_name": "Postal Address"
}
]
}
]

Example - Using Macros
In this example we set the reference value type_id to the id value from the item_types table where group_name=email and name=home.
[
{
"table": "items",
"keys": [ "type_id", "type_name" ],
"refs": {
"type_id": "::item_types(id):group_name=email,name=home"
},
"rows": [
{
"item_name": "Test Item #1"
},
{
"item_name": "Test Item #2"
}
]
}
]

Stage Variables
Variables can be defined in a sync item to simplify data manipulation. Define common values once in the vars property and reference them throughout the item using mustache tags (e.g., {{my_variable}}). This makes updates easier and reduces duplication.
Example
[
{
"table": "my_table",
"vars": {
"TEST_DATA": 1234
},
"rows": [
{
"name": "Test Row #1",
"value": "{{TEST_DATA}}"
},
{
"name": "Test Row #2",
"value": "{{TEST_DATA}}"
},
{
"name": "Test Row #3",
"value": "{{TEST_DATA}}"
}
]
}
]

Macro Variables
Variables can also be used within macros, allowing for multi-item lookups. For example, you can use one macro to obtain a value, store it in a variable, and use that variable in another macro to simulate a join. This enables complex reference resolution without direct SQL joins.
Example
In this example we create a variable that references the main_type table and gets the id for a row that has the name default and version 2. This variable is then used to find a sub_type of the main type with the name init.
::: notice This example is the reason this feature was created. The use-case was there were multiple main types which had a list of sub-types that could have the same names, making it impossible to use macros to look up these row ids. :::
[
{
"table": "my_table",
"vars": {
"main_type_id": "main_type(id):name=default,version=2"
},
"refs": {
"sub_type_id": "sub_type(id):main_type_id={{main_type_id}},name=init"
},
"rows": [
{
"name": "Row #1"
},
{
"name": "Row #2"
},
{
"name": "Row #3",
"other_field": "other_type(id):main_type_id={{main_type_id}},name=test"
}
]
}
]

Remote Data Sync
Remote data sync allows you to synchronise data from external sources, either directly from another database server or via another Hazaar/DBI application configured with a secure sync key. This is useful for distributed systems or when direct database access is not possible.
Direct Database Sync
Direct database sync is used when the source database is accessible by the application. You can define the DBI configuration directly in the sync file or reference a named configuration stored securely. For best security, use named configurations and avoid embedding credentials in sync files.
Example
[
{
"table": "local_table", //Local target table
"source": {
"config": "remote_database", //DBI named configuration
"table": "remote_table", //Source table on remote
"criteria": { //Query criteria
"active": true
},
"ignoreErrors": true //Continue if there's an error
}
}
]

DBI Database Sync
DBI database sync allows routing synchronisation via another Hazaar/DBI application, which is useful when direct database access is not possible or safe. The source application must have a syncKey configured in its DBI configuration. For security, avoid setting the syncKey directly in the sync file; use configuration files instead.
Example
[
{
"table": "local_table", //Local target table
"source": {
"hostURL": "http://remote.server.com", //Remote host URL. This MUST be a Hazaar/DBI application base path.
"config": "remote_database", //A named configuration directive that must exist on both sides and contain a syncKey.
"table": "remote_table", //Source table on remote
"criteria": { //Query criteria
"active": true
}
}
}
]