© Copyright 2023-2024 the original authors.

Abstract

This is the manual for the official Neo4j JDBC Driver.

The Neo4j JDBC Driver is in Early Access Preview (EAP) and might break unexpectedly. Being in EAP means that you have the opportunity to provide feedback on the implementation of the driver. The functionality and behaviour released in the GA release may differ from those available in the EAP.

This driver is officially supported and endorsed by Neo4j. It is a standalone driver, independent of and not built on top of the common Neo4j Java Driver. While the latter provides a Neo4j-idiomatic way to access Neo4j from Java, the JDBC driver adheres to JDBC 4.3.

1. Introduction

The JDBC acronym stands for "Java Database Connectivity" and as such is not bound exclusively to relational databases. Nevertheless, JDBC is highly influenced by the SQL standard and existing, relational databases, in regard to terms, definitions and behaviour defined. Neo4j is a graph database with quite a different paradigm than relational and a non-standardized behaviour in some areas. There might be some details that don’t map 100% in each place, and we make sure to educate you about these in this documentation

Inside this documentation we will refer to this driver as the Neo4j JDBC Driver and to the idiomatic Neo4j driver as the common Neo4j Java Driver.

The Neo4j JDBC Driver requires JDK 17 on the client side and a minimum version of Neo4j 5.5 on the server side. To use it against a Neo4j cluster, server-side routing must be enabled on the cluster.

1.1. Features

  • JDK 17 baseline

  • Fully supports the Java module system

  • Adheres to JDBC 4.3

  • Can run any Cypher statement

  • Implements DatabaseMetaData and ResultSetMetaData as fully as possible with a nearly schemaless database and general very flexible result sets, allowing for automatic metadata retrieval from ETL and ELT tools

  • Provides an SPI to hook in translators from SQL to Cypher

  • Provides an optional default implementation to translate many SQL statements into semantically similar Cypher statements

  • Can be safely used with JDBC connection pools as opposed to the common Neo4j Java Driver or any JDBC driver based on that, as it doesn’t do internal connection pooling and transaction management otherwise than dictated by the JDBC Spec

The absence of any connection pooling and transaction management is actually an advantage of the Neo4j JDBC Driver over the common Neo4j Java Driver. It allows to pick and choose any database connection pooling system such as HikariCP and transaction management such as Jakarta Transactions.

The default SQL to Cypher translation implementation is based on jOOQ by Datageekery. We are a long-time fans of how Lukas Eder—inventor of jOOQ—has bridged the gap between Java and database querying. It even inspired the Cypher-DSL, providing the other half of our translation layer. We are grateful for kick-starting the original Sql2Cypher project together in early 2023, on which we can build now.

1.2. Limitations

  • The database metadata is retrieved on a best effort base, using existing schema methods of Neo4j, such as db.labels, db.schema.nodeTypeProperties()

  • While single label nodes map naturally to table names, Nodes with multiple labels don’t

  • There is no reliable way to always determine the datatype for properties on nodes without reading all of them (which this driver does not do)

  • Some JDBC features are not yet supported (such as the CallableStatement), some feature won’t ever be supported

  • The SQL to Cypher translator does only support a limited subset of clauses and SQL constructs that can be semantically equivalent translated to Cypher (See Section 7.3.3)

  • There is no "right" way to map JOIN statements to relations, so your mileage may vary

1.3. When to use the Neo4j JDBC Driver?

This driver has been developed with the following use-cases in mind:

  • Integration with ETL and ELT tools that don’t offer an integration based on the common Neo4j Java driver

  • An easier on-ramp towards Neo4j for teams that are familiar with JDBC and want to keep on using that API, but with Cypher and Neo4j

  • Integration for ecosystems like Jakarta EE whose transaction management will directly support any compliant JDBC driver

  • Integration with database migration tools such as Flyway

There is no need to redesign an application that is build on the common Neo4j Java Driver to use this driver. If your ecosystem already provides a higher-level integration based on the common Neo4j Java Driver, such as Spring Data Neo4j (SDN) for Spring, there is no need to switch to something else. In case of Quarkus the Neo4j JDBC Driver is an option to consider: While we do provide an integration for the common Neo4j Java Driver, this integration does not support Quarkus' transaction systems in contrast to this driver.

While there is little incentive to use this driver with Hibernate (Neo4j-OGM or SDN are the better alternatives for Neo4j), it might be worth giving Spring Data JDBC a try.

1.4. Differences to the previous versions of this driver and other JDBC drivers for Neo4j

Several other JDBC drivers exists for Neo4j, most notably the previous versions 4 and 5 of this driver, originally developed by Larus BA, Italy for Neo4j. Most—if not all of them—do wrap the common Neo4j Java Driver and implement the JDBC spec on top of that. This comes with a bunch of problems:

  • The common Neo4j Java Driver manages a connection pool; JDBC drivers on the other hand delegate this task to dedicated pooling solutions: If you take the above-mentioned driver into a standard container, you will eventually end up with a pool of pools

  • The transaction management of the common Neo4j Java Driver is not exactly aligned with the way JDBC thinks about transactions, it’s usually hard to get this exactly right

  • Additionally, the original JDBC driver from Larus shades a couple of dependencies, such as Jackson as well as additional logging frameworks which takes a toll on the classpath and in case of logging, does actually lead to runtime problems

  • Existing drivers with a SQL to Cypher translation layer are "read-only" and don’t support write statements

There are some drivers available that provide a SQL to Cypher translation layer as well. Those however are read-only and cannot be used for ETL use-cases aiming to ingest data into Neo4j.

One feature that this driver does not provide is automatic reshaping or flattening of the result-sets, as the previous incarnation does: If you query for objects such as nodes, relationships, paths or maps you can and should use getObject on the result-sets and cast to the appropriate type (you find all of them inside the package org.neo4j.jdbc.values). However, the default SQL to Cypher translator will—when connected to a database—figure out what properties labels have and turn the asterisk (*) into individual columns of nodes and relationships, just like what you would expect when running a SELECT * statement.

2. About this documentation

In this documentation we will focus on getting and configuring the Neo4j JDBC Driver and enabling optional features. We will discuss the design choices made in the metadata retrieval and how we map Neo4j labels to tables as well as our opinionated choices with regards the automatic SQL to Cypher translation.

While we do provide runnable examples that due to their nature will show how to use JDBC in general, this is not a documentation about how to correctly use JDBC as an API.

3. Quickstart

Add the JDBC driver to your application, for example as a Gradle dependency:

Listing 1. Using the full bundle as a runtime dependency inside a Gradle based project
dependencies {
    runtimeOnly(org.neo4j:neo4j-jdbc-full-bundle:6.0.0-M04)
}

With that in place, you can use the JDBC driver for Neo4j as you would do with any other JDBC driver.

Listing 2. Acquire a connection using the JDBC driver and execute a query
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;

public final class Quickstart {

    static void queryWithCypher() {
        var query = """
                MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
                RETURN m.title AS title, collect(p.name) AS actors
                ORDER BY m.title
                """;

        var url = "jdbc:neo4j://localhost:7687";
        var username = "neo4j";
        var password = "verysecret";

        try (var con = DriverManager.getConnection(url, username, password); (1)
                var stmt = con.createStatement(); (2)
                var result = stmt.executeQuery(query)) { (3)

            while (result.next()) { (4)
                var movie = result.getString(1); (5)
                var actors = (List<String>) result.getObject("actors"); (6)
                System.out.printf("%s%n", movie);
                actors.forEach(actor -> System.out.printf("\t * %s%n", actor));
            }
        }
        catch (SQLException ex) {
            throw new RuntimeException(ex);
        }
    }

}
1 Get a JDBC connection, no need to do any class loading beforehand, the driver will be automatically registered
2 Create a reusable statement
3 Immediate execute a query on it
4 Iterate the results like you would do with any other JDBC result set
5 JDBC is index 1 based
6 JDBC also allows retrieval of result columns by name; the Neo4j JDBC driver also supports complexe objects such as lists

In the example above we used Neo4j’s lingua franca, Cypher, to query Neo4j. The Neo4j JDBC Driver has limited support for using SQL as well. It will do so automatically or on a case by case basis. For the latter you can use java.sql.Connection#nativeSQL(String) and use the result in your queries For automatic translation instantiate the driver using an additional URL parameter, sql2cypher set to true. The following example shows how:

Listing 3. Configure the JDBC driver to automatically translate SQL to cypher.
var query = """
        SELECT m.title AS title, collect(p.name) AS actors
        FROM Person as p
        JOIN Movie as m ON (m.id = p.ACTED_IN)
        ORDER BY m.title
        """; (1)

var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";

try (var con = DriverManager.getConnection(url, username, password);
        var stmt = con.createStatement();
        var result = stmt.executeQuery(query)) {

    while (result.next()) {
        var movie = result.getString(1);
        var actors = (List<String>) result.getObject("actors");
        System.out.printf("%s%n", movie);
        actors.forEach(actor -> System.out.printf("\t * %s%n", actor));
    }
}
1 This SQL query will be translated into the same Cypher query shown before, the remainder of the method is identical to before.
In case any tooling asks you for the name of the concrete driver class, it is: org.neo4j.jdbc.Neo4jDriver.

We will cover the SQL to Cypher translation in detail later in this manual.

The JDBC Spec does not support named parameters, only index based parameters. Indexes start at 1 in SQL. So for all PreparedStatement instances you would want to specify your parameters like this:

Listing 4. Using parameters with a PreparedStatement
var cypher = "CREATE (m:Movie {title: $1})";
try (var con = DriverManager.getConnection(url, username, password);
        PreparedStatement stmt = con.prepareStatement(cypher);) {
    stmt.setString(1, "Test");
    stmt.executeUpdate();
}

This is independent of the SQL to Cypher translation mechanism:

Listing 5. Using parameters with a PreparedStatement (SQL variant)
var sql = "INSERT INTO Movie (title) VALUES (?)";
try (var con = DriverManager.getConnection(url + "?enableSQLTranslation=true", username, password);
        PreparedStatement stmt = con.prepareStatement(sql);) {
    stmt.setString(1, "Test");
    stmt.executeUpdate();
}

You can downcast the PreparedStatement to Neo4jPreparedStatement. This JDBC extension will let you use named parameters:

Listing 6. Using named parameters with the Neo4jPreparedStatement
var match = "MATCH (n:Movie {title: $title}) RETURN n.title AS title";
try (var con = DriverManager.getConnection(url, username, password);
        Neo4jPreparedStatement stmt = (Neo4jPreparedStatement) con.prepareStatement(match);) {
    stmt.setString("title", "Test");
    try (var resultSet = stmt.executeQuery()) {
        while (resultSet.next()) {
            LOGGER.info(resultSet.getString("title"));
        }
    }
}

3.1. Getting a connection via environment variables

If you are happy to depend directly on org.neo4j.jdbc.Neo4jDriver and want to get a connection as easy as possible, you might want to use fromEnv:

Listing 7. Get a connection from the environment.
// import org.neo4j.jdbc.Neo4jDriver;

try (var con = Neo4jDriver.fromEnv().orElseThrow(); (1)
        var stmt = con.createStatement();
        var result = stmt.executeQuery(query)) {

    // Same loop as earlier
}
catch (SQLException ex) {
    throw new RuntimeException(ex);
}
1 Notice how we directly use the concrete driver class here and how the methods returns an optional: If we don’t find the required properties, we can’t create a connection.

The fromEnv looks for a couple of specific system environment variables and it will be true to the principles of the 12 factor app while doing so:

  • It will look in the System environment first

  • It will then look for a file named .env in the current working directory

There are overloads that let you configure the directory and the filename to look for. The properties we support are as follows:

NEO4J_URI

The address or URI of the instance to connect to</li>

NEO4J_USERNAME

Optional username

NEO4J_PASSWORD

Optional password

NEO4J_SQL_TRANSLATION_ENABLED

Optional flag to enable full SQL to Cypher translation, defaults to false

Information from both the System environment and the .env files are combined. If for example NEO4J_SQL_TRANSLATION_ENABLED is in the System environment but not in the .env file, it will still be picked up.

This feature is especially useful with Neo4j AuraDB. When creating a new AuraDB instance you are required to download an .env file and you can directly use it with the Neo4j JDBC Driver like this:

Listing 8. Using a .env file from AuraDB
try (
    var con = Neo4jDriver.fromEnv("Neo4j-cb3d8b2d-Created-2024-02-14.txt")
        .orElseThrow();
    var stmt = con.createStatement();
    var movies = stmt.executeQuery("MATCH (n:Movie) RETURN n.title AS title")
) {
    while (movies.next()) {
        System.out.println(movies.getString("title"));
    }
}

4. Distribution

4.1. Driver modules and dependencies

The driver consists of 3 modules:

neo4j-jdbc

This is the actual JDBC implementation for Neo4j

neo4j-jdbc-translator-spi

This is the SPI for the default SQL to Cypher translation implementation and for any further or custom implementation.

neo4j-jdbc-translator-impl

This is the optional default implementation of a SQL to Cypher translator.

If you just want to use the Neo4j JDBC Driver for running Cypher statements, the only module you need to think about is org.neo4j:neo4j-jdbc, and in case you are developing in an environment with proper dependency management, this is all you need to declare.

While the translator SPI is a required dependency and pulled in via Maven or Gradle, the actual implementation is not. This allows for

  • The possibility to opt-out of having additional dependencies in all cases you are not using SQL to Cypher translation

  • Requiring different JDK baselines or licensing modules for our implementations

  • Allowing you to create alternative ways of creating a translator

4.1.1. Dependencies

We offer two "dependency free" modules, that shade everything into one binary artifact. Those bundles should cover a lot of tooling, and we discuss those bundles below in Available bundles. In case you are interested in the actual dependencies of the driver, those are the direct, compile-time dependencies of the driver:

Listing 9. Dependency tree of the core driver.
org.neo4j:neo4j-jdbc
+- io.github.cdimascio:dotenv-java
+- io.netty:netty-handler
|  +- io.netty:netty-common
|  +- io.netty:netty-resolver
|  +- io.netty:netty-buffer
|  +- io.netty:netty-transport
|  +- io.netty:netty-transport-native-unix-common
|  \- io.netty:netty-codec
+- org.neo4j:neo4j-cypher-dsl-schema-name-support
+- org.neo4j:neo4j-jdbc-translator-spi

4.2. Available bundles

All bundles of the Neo4j JDBC Driver are distributed on Maven Central. The bundles have different characteristics that we will explain the following sections. Depending on your use-case or your environment you pick one or the other bundle, but not several at once.

4.2.1. Individual components

Pick this distribution if you have an application that uses dependencies management in which you don’t have to care about dealing with binary artifacts yourself.

We offer a Maven BOM project, which you should import when using the individual modules, so that you have always a consistent set of versions. The coordinates are org.neo4j:neo4j-jdbc-bom.

For Maven you will want to use this dependency declaration:

Listing 10. Maven dependency for the core Neo4j JDBC Driver artifact
<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-jdbc</artifactId>
    <version>6.0.0-M04</version>
</dependency>

And for Gradle you would want to declare the following runtime dependency:

Listing 11. Neo4j JDBC Driver as Gradle runtime dependency
dependencies {
    runtimeOnly 'org.neo4j:neo4j-jdbc:6.0.0-M04'
}

If you want to use the SQL to Cypher translation from Neo4j, you need to add the following dependency in your Maven build:

Listing 12. Maven dependency for the default SQL to Cypher translator
<dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-jdbc-translator-impl</artifactId>
    <version>6.0.0-M04</version>
</dependency>

and the same coordinates, but for Gradle:

Listing 13. Gradle dependency for the default SQL to Cypher translator
dependencies {
    runtimeOnly 'org.neo4j:neo4j-jdbc-translator-impl:6.0.0-M04'
}

4.2.2. Small bundle

Pick this distribution if you work with ETL tools or tooling for relational databases that allow adding JDBC driver only as single artifacts or that make it unnecessary hard to add additional jars. This bundle does not contain the default SQL to Cypher translator! The coordinates of this bundle are org.neo4j:neo4j-jdbc-bundle and you can download it from Maven central:

All bundles can certainly be used as a normal project dependency as well. This might be useful for example if your project depends on another, potentially conflicting Netty version.

4.2.3. Full bundle

This bundle contains the default SQL to Cypher translator. Otherwise, it is identical to the small bundle. Its coordinates are org.neo4j:neo4j-jdbc-full-bundle and you can download it from central here:

4.2.4. text2cypher bundle

We also ship a very experimental bundle that we call text2cypher:

See text2cypher for more information about the text2cypher translator.

5. Configuration

5.1. Driver class name

The Neo4j JDBC Driver is org.neo4j.jdbc.Neo4jDriver. In modern Java tools you should not have to touch this class directly, but there are some connection pools and front-ends that will ask you for this. The class is public API.

We also provide org.neo4j.jdbc.Neo4jDataSource as javax.sql.DataSource.

5.2. Causal clustering and bookmarks

The Neo4j JDBC Driver uses bookmarks by default to provide causal consistency when running against a Neo4j cluster. Bookmarks are managed on the driver level itself, not on the connections spawned by an instance of the driver. Thus, all connections that are spawned by one instance, will partake in the same causal chain of transactions. Connections from different instances of the drivers will not use the same set of bookmarks and there is no built-in machinery that would enable this. If you want or need this, you can directly access the Neo4jDriver type to retrieve the current set of known bookmarks and pass them to another driver instance.

5.3. URL and connection properties

The canonical URL format for the Neo4j JDBC Driver is

jdbc:neo4j://<host>:<port>/<database>?param1=value1&param2=value2

The database name and any query parameters are optional and can be omitted. All configuration arguments, can be passed either as query parameters or via a java.util.Properties object. The latter is sometimes defined by tooling for you. With regard to authentication it’s highly recommended to follow the JDBC spec, which discourages using any form of URL authentication. All query parameters must be percent-encoded if they contain special characters, e.g., …​?param1=space%20separated.

There is no need to specify bolt or other additional prefixes in the URL, as a matter of fact: The driver does not support this. It does however support adding details about the transport being used:
neo4j+s enables encryption and only accepts SSL certificates from the server that are signed by a known certificate authority. neo4j+ssc enables encryption and accepts self-signed certificates (which must be added to the certificate store).

The driver accepts the following configuration arguments, either properties or as URL query parameters:

Table 1. Configuration arguments
Name Type Meaning Default

timeout

Integer

Timeout for connection acquisition in milliseconds

1000

agent

String

User agent

neo4j-jdbc

enableSQLTranslation

Boolean

Flag that enables automatic translation from SQL to Cypher (requires a translator on the classpath)

false

cacheSQLTranslations

Boolean

Flag that enables caching of translations. SQL translations are not "free", parsing of SQL costs a bit of time, and so does Cypher rendering. In addition, we might up look up metadata to be able to project individual properties. If this takes to long, any translation might be cached.

false

rewritePlaceholders

Boolean

Flag that allows you to use ? as placeholders in Cypher statements (as required by JDBC). These will automatically be rewritten into $1, $2$n, starting at 1, so that the numbering matches the 1-based JDBC index.

Defaults to true when enableSQLTranslation is false, false otherwise

ssl

Boolean

An optional flag that is an alternative to neo4j+s. It can be used for example to programmatically enable the full SSL chain.

null

sslMode

Enum<SSLMode>

An optional configuration for fine-grained control over SSL configuration. Allowed values are disable, require, verify-full.

null

user

String

The username (principal) to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons.

neo4j

password

String

The password (credentials) to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons.

password

authRealm

String

The realm to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons.

null

authScheme

String

The authentication scheme to use. NOT RECOMMENDED as URL query parameter for security reasons. Currently supported values are:

  • basic (default) for basic authentication.

  • none for no authentication. The properties user, password and authRealm have no effect.

  • bearer for bearer authentication (SSO). password should be set to the bearer token; user and authRealm have no effect.

  • kerberos for kerberos authentication. Requires password to be set to the kerberos ticket; user and authRealm have no effect.

basic

useBookmarks

boolean

Enables bookmark management for full causal cluster support. This is enabled by default and the recommended setting for all scenarions that use a connection pool. If you disable it, it will only be disabled for the specific connection. Other connections retrieved from the driver instance are not affected, and the driver will still manage their bookmarks.

true

5.4. Getting a driver or a connection instance

This section most likely only applies if you use the Neo4j JDBC Driver as part of application development in contrast to using it as part of front-end tool such as DBeaver, DataGrip or UI-based ETL tools.

The easiest way to acquire a connection is directly through the java.sql.DriverManager.

Listing 14. Acquiring a JDBC connection towards Neo4j
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

class Configuration {

    void obtainConnection() throws SQLException {

        var url = "jdbc:neo4j://localhost:7687";
        var username = "neo4j";
        var password = "verysecret";
        var connection = DriverManager.getConnection(url, username, password);
    }

}

While our connection implementation is thread-safe, it does—as dictated by the JDBC Spec—only allow one concurrent transaction per connection. For a multi-thread application you want to use a connection pool. There’s HikariCP, but usually application server and containers or frameworks bring their own. It’s safe to use any of them, as the Neo4j JDBC Driver does not do internal pooling.

If you need access to an instance of the Neo4j driver itself, you can use the following approach:

Listing 15. Acquiring an instance of the Neo4j JDBC Driver
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

class Configuration {

    void obtainDriverAndConnection() throws SQLException {

        var url = "jdbc:neo4j://localhost:7687";
        var driver = DriverManager.getDriver(url);

        var properties = new Properties();
        properties.put("username", "neo4j");
        properties.put("password", "verysecret");
        var connection = driver.connect(url, properties);
    }

}

5.5. Securing your connection by using SSL

The Neo4j JDBC Driver supports the same SSL option as the common Java driver does, with the same URL protocols, using +s or +ssc as additional indicators for the level of security that is required.

The same configuration can also be achieved with a URL query parameter or an entry in the properties passed to the DriverManager or driver instance when asking for a connection. As long as you don’t specify contradicting values, it’s fine to combine both approaches.

5.5.1. Understanding the SSL mode

The possible settings are best explained using the available SSL modes. The following list is ordered by ascending security:

  • disable default, "I don’t care about security and don’t want to pay the overhead for encryption"

  • require "I want my data to be encrypted, and I accept the overhead. I trust that the network will make sure I always connect to the server I want." (Server must support encryption, no hostname / CA validation is done, this is only secure on private networks, without going through the "hassle" of proper certificates and should not really be used over public internet)

  • verify-full "I want my data encrypted, and I accept the overhead. I want to be sure that I connect to a server I trust, and that it’s the one I specify."

The Neo4j JDBC Driver does not include revocation checks.

The most secure option can also be enabled by just using ssl=true either as query parameter or as property entry passed to the DriverManager. This option corresponds to neo4j+s. require on the other hand corresponds to neo4j+ssc.

You might wonder why using the additional enum: We might support additional modes in the future, such as letting the service decide about SSL, or being able to express a preference towards SSL without requiring it.

Neo4j cluster can offer both plain bolt connections and encrypted SSL connection or just one of them. So just because you can connect using neo4j+s does not mean that you cannot connect using just neo4j or vice versa. This is dependent on the setup of your database. Neo4js managed offering, Neo4j AuraDB, only supports encrypted connection, so you must use +s, ssl=true or sslMode=verify-full.

5.5.2. Valid URLs

The following URLs are all valid

neo4j+s://xyz.databases.neo4j.io

Use full verification with the xzy instance at AuraDB

neo4j://xyz.databases.neo4j.io?ssl=true

The same, but using the shorthand URL parameter

neo4j://xyz.databases.neo4j.io?sslMode=verfiy-full

The same, but using the explicit mode

neo4j+s://xyz.databases.neo4j.io?ssl=true&sslMode=verify-full

Not more secure, but does not fail on you

neo4j+ssc://this.is.a.trustworthy.instance.for.sure.com

Trust whatever certificate and hostname there is, but do use SSL

neo4j://my-testing-instance.local

Use a plain connection.

We only refuse contradicting values when you use several configuration mechanism:

  • +s with ssl=false or sslMode set to disable

  • +ssc with ssl=false or any sslmode not equal to require

In essence, you cannot express to use SSL and not use it at the same time. The reason to offer several mechanism is that we want you to be able to use a fixed URL with dynamic query parameters or dynamic URLs or whatever way of configuring you prefer in a programmatic way.

5.5.3. Using .dotenv files

When you sign up for Neo4j AuraDB and create a database, you will be asked to download a text-file named similar to Neo4j-9df57663-Created-2023-06-12.txt. This is essentially a .dotenv file containing all necessary information required to connect to an aura Database.

These files can be directly used via Neo4jDriver.fromEnv(). This method exists in several overloads, which let you configure both filename and directory. Additionally, the builder behind it let’s you configure options that are not usually contained in the files from AuraDB.

6. Metadata

6.1. Catalog and schema

All the methods on connection level dealing with metadata and information about the available content provide information in terms that are defined in the SQL standard, including catalogs and schemas.

From the SQL 1992 standard (find an archived copy here):

(4.12) Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog con- tains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema.

We decided in 55 to not support catalogs at the beginning, so any metadata result set will return literal null when asked for the catalog of a database object. No metadata method does support filtering on a non-null catalog parameter and no catalog specifier can be used in a query. Future developments might use catalogs to describe composite databases, in essence listing the constituents of the composite database defined in the connection.

The same standard defines schemas as follows:

(4.11) An SQL-schema is a persistent descriptor that includes:

[…] the descriptor of every component of the SQL-schema.

In this International Standard, the term "schema" is used only in the sense of SQL-schema. Each component descriptor is either a domain descriptor, a base table descriptor, a view descriptor, an assertion descriptor, a privilege descriptor, a character set descriptor, a collation descriptor, or a translation descriptor. The persistent objects described by the descriptors are said to be owned by or to have been created by the <authorization identifier> of the schema.

We report the literal name public as schema for any result component of a metadata result set. We support querying for objects in any schema, however only literal null or public will potentially produce a result.

Labels will be reported as table objects with the TABLE_TYPE being literal TABLE.

6.1.1. Summary

  • Catalog: Always null, trying to filter on anything non-null does not yield results

  • Schema: Always public, filtering on public and literal will yield result, anything else won’t.

  • Table descriptors: Reported as TABLE in the TABLE_TYPE column.

6.2. Labels to tables

The CLG and Langstar groups speak about "Node type combinations" and gravitates towards "open node type semantics" in the GQL standard:

Node type combinations is a less permissive form of open node type semantics. The idea of node type combinations is that nodes also conform to a graph type if they are not of one of the node types in the node type set of the graph type, but of a node type that is an intersection of (a subset of) the node type in a node type set of the graph type.

An example for their proposal can be seen here.

We therefor compute node types in a similar way:

  • Single label nodes will map naturally to a table name, the single label will become the table name

    • The label name will be taken as is and will be case-sensitive. So if you have three labels Movie, movie, MOVIE you will see three tables in the meta-data

    • This is in-line with the default SQL to Cypher translation

  • Node type combinations will map to table names composed as label1_label2, sorting the labels alphabetically, so that it is independent of the way Neo4j will return those

  • Property sets for these Node type combinations will then be computed

7. SQL to Cypher translation

7.1. Introduction

The translation from SQL queries to Cypher is an optional feature of this driver that consists of two parts:

  • The translator SPI, which you find in the module org.neo4j:neo4j-jdbc-translator-spi. It consists of two interfaces, SqlTranslatorFactory and the actual SqlTranslator.

  • A concrete implementation of this SPI, published as org.neo4j:neo4j-jdbc-translator-impl.

The latter is covered in "Using the default translator" and bundled as the "full bundle", described in Section 4.2. The former is provided for two reasons: It allows us to distribute the driver with and without the bundled, default translator and can be an option for you to run your custom translator.

Translators can be chained, and you can have as many translators on the classpath as you want. They will be ordered by a configurable precedence with our default implementation having the lowest precedence. Thus, you can have for example a custom translator that takes care of a fixed set of queries, and if it cannot translate another, it will just be passed down to our implementation.

Translating arbitrary SQL queries to Cypher is an opinionated task as there is no right way to map table names to objects in the graph: A table name can be used as is as a label, you might want to transform it to a singular form etc. And then we haven’t even started how to map relationships: Do you want to have relationship types derived from a join table, a join column (in that case, which one?) or the name of a foreign key?

We made some assumptions that we find to match various use cases and instead of providing configuration and more code to cater for all scenarios, we offer the possibility to write your own translation layer. The driver will use the standard Java service loader mechanism to find an implementation of the SPI on the module- or classpath.

Some tools like Tableau use a class-loader that won’t let the driver use the standard Java service loader mechanism. For these scenarios we provide an additional configuration property named translatorFactory. Set this to DEFAULT for directly loading our default implementation or to a fully-qualified classname for any other factory. Be aware that either our default implementation or your custom one needs to be on the classpath nevertheless.

7.2. Translating SQL to Cypher

There’s only one requirement to enable SQL to Cypher translation: You have to have one module implementing the SPI on the classpath. This is automatically the case if you use the full-bundle under those coordinates: org.neo4j:neo4j-jdbc-full-bundle. In that case, you don’t need to add any other dependencies. If you use the individual distribution or the "small" bundle org.neo4j:neo4j-jdbc-bundle you must add the following artifact: org.neo4j:neo4j-jdbc-translator-impl.

The implementation will be automatically loaded. If you use the translation on a case-by-case basis, it will be lazily loaded, that is: No additional classes are touched or loaded into memory. If you configure automatic translation for all statements, the implementation will be eagerly loaded. There are no further configuration option with regard to loading the implementation.

7.2.1. On a case-by-case basis

The translator can be used on a case-by-case basis through the official JDBC API nativeSQL, which you find on the java.sql.Connection class. With the following imports

import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;

You just pass your SQL statement to nativeSQL and you will get Cypher back:

try (var connection = DriverManager.getConnection(url, username, password)) {
    var sql = connection.nativeSQL("SELECT * FROM Movie n");
    assert """
            MATCH (n:Movie)
            RETURN *""".equals(sql);
}

7.2.2. For all queries

If you open the connection to your Neo4j instance using enableSQLTranslation=true either as URL parameter or configuration property, all statements will be translated from SQL to Cypher on your behalf. If you configure the driver in that way, the translator will be eagerly loaded

var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var connection = DriverManager.getConnection(url, username, password);
        var stmnt = connection.createStatement();
        var result = stmnt.executeQuery("SELECT n.title FROM Movie n")) {
    while (result.next()) {
        LOGGER.info(result.getString("n.title"));
    }
}

Sometimes it maybe is necessary to fall back to Cypher for some statements. You might want to use some constructs that you cannot express with SQL or our default translator cannot handle the SQL necessary. We offer a special comment that you can use as a hint in your statement that will stop automatic translation: /*+ NEO4J FORCE_CYPHER */. Use it like this:

var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
var query = """
        /*+ NEO4J FORCE_CYPHER */
        MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
            ((:Stop)-[:NEXT]->(:Stop)){1,3}
            (a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
        RETURN localtime(d.departs) AS departureTime,
            localtime(a.arrives) AS arrivalTime
        """;
try (var connection = DriverManager.getConnection(url, username, password);
        var stmnt = connection.createStatement();
        var result = stmnt.executeQuery(query)) {
    while (result.next()) {
        LOGGER.info(result.getTime("departureTime").toString());
    }
}

7.2.3. Possible error scenarios

A NoSuchElementException with a message of No SQL translators available will be thrown when there is no implementation of the SQL to Cypher translator available, and you either use java.sql.Connection.nativeSQL or enable automatic translation. The exception will be thrown when you access the method or eagerly on opening a connection in the latter case.

7.3. Using the default translator

7.3.1. Supported SQL dialects

Our default translator uses the OSS parser from jOOQ, which supports a broad spectrum of SQL dialects already. We picked the generic, default dialect of jOOQ as our default dialect, but you can overwrite this in the SQL to Cypher configuration using the parameter s2c.sqlDialect with one of the supported dialects listed in configuration below. POSTGRES might be a good choice for several integrations.

Bear in mind though that any shortcomings in the translation are probably not due to the lack of parser support, but due to the lack of an obvious, semantically equivalent Cypher construct. That means we might be able to parse a certain piece of SQL, but are unable to translate in into something meaningful that Neo4j can understand without additional, contextual information.

7.3.2. Configuration

The default implementation provides a number of configuration settings. They must be prefixed with s2c in the URL or config options:

Name Meaning Default

parseNameCase

Whether to parse table names as is or not

As is

tableToLabelMappings

A map from table names to labels

An empty map

joinColumnsToTypeMappings

A map from column names to relationship type names

An empty map

prettyPrint

Whether to format the generated Cypher or not

true

alwaysEscapeNames

Whether to always escape names

Unless explicitly configured false when pretty printing is on, otherwise true

sqlDialect

Which dialect to use when parsing, supported values are POSTGRES, SQLITE, MYSQL, H2, HSQLDB, DERBY and DEFAULT

DEFAULT

Here are a couple of examples (Note that we are using the properties config to avoid terrible long URLs in this documentation, however, all the attributes can be specified via URL, too):

Listing 16. Disable pretty printing, only escape if necessary, configure dedicated table mappings
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.prettyPrint", "false");
properties.put("s2c.alwaysEscapeNames", "false");
properties.put("s2c.tableToLabelMappings", "people:Person;movies:Movie;movie_actors:ACTED_IN");

var url = "jdbc:neo4j://localhost:7687";
var query = """
        SELECT p.name, m.title
        FROM people p
        JOIN movie_actors r ON r.person_id = p.id
        JOIN movies m ON m.id = r.person_id""";
try (var connection = DriverManager.getConnection(url, properties)) {
    var sql = connection.nativeSQL(query);
    assert "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name, m.title".equals(sql);
}

In the next example we parse the table names into upper case, which might be helpful in some situations:

Listing 17. Upper case parsing
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNameCase", "UPPER");

var url = "jdbc:neo4j://localhost:7687";
var query = "SELECT * FROM people";
try (var connection = DriverManager.getConnection(url, properties)) {
    var sql = connection.nativeSQL(query);
    assert """
            MATCH (people:PEOPLE)
            RETURN *""".equals(sql);
}

Here we change the prefix of parameters (they are still going only by index in SQL) and add mappings for join columns:

Listing 18. Disable pretty printing, only escape if necessary, configure dedicated table mappings
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNamedParamPrefix", "$");
properties.put("s2c.joinColumnsToTypeMappings", "people.movie_id:DIRECTED");

var url = "jdbc:neo4j://localhost:7687";
var query = """
        SELECT *
        FROM people p
        JOIN movies m ON m.id = p.movie_id
        WHERE p.name = $1
        """;
try (var connection = DriverManager.getConnection(url, properties)) {
    var sql = connection.nativeSQL(query);
    assert """
            MATCH (p:people)-[:DIRECTED]->(m:movies)
            WHERE p.name = $1
            RETURN *""".equals(sql);
}

7.3.3. Supported statements

The following statements are all under tests and describe what you can expect from the default translation layer:

Translation concepts
Table names to labels

The most simple select statement to translate is a statement without the FROM clause, such as:

SELECT 1

It is equivalent without loss to the following Cypher:

RETURN 1

SELECT statements without further JOIN clauses are pretty straight forward to translate. The hardest challenge here is how to map the table name to labels:

  • We parse the SQL Statement case-sensitive by default

  • Table names will be mapped to node labels

  • Table aliases will be used as identifiable symbolic names

SELECT t.a, t.b
FROM My_Table (1)
  AS t (2)
WHERE t.a = 1
1 Will be used as the label to match, as-is, i.e. My_Table
2 The table alias will become the node-alias

The whole construct will be translated to

MATCH (t:My_Table)
WHERE t.a = 1
RETURN t.a, t.b

We recommend using table aliases, but the translations works without them as well.

Star-Selects

A star- or * select comes in different forms:

Unqualified

SELECT * FROM table

Qualified

SELECT t.* FROM table t

And a variant, selecting the relation itself: SELECT t FROM table t.

We make use of this fact to give users a way to decide whether they want to return Neo4j nodes and relationships as entities, maps or flattened to individual columns. The latter however requires our translator to have access to the schema of the underlying Neo4j database. The following sections will describe the use-cases:

Projecting individual properties

Don’t use a star-select but enumerate the properties:

SELECT m.title FROM Movie m

The table alias will be used as a symbolic name

MATCH (m:Movie)
RETURN m.title;

You can omit the table alias:

SELECT title FROM Movie

The lower-cased table name will be the symbolic name:

MATCH (movie:Movie)
RETURN movie.title;

If you access your JDBC columns by name, this might lead to code that is hard to maintain You might want to alias the column, then:

SELECT title AS title FROM Movie

So that it has a stable, well-known name:

MATCH (movie:Movie)
RETURN movie.title AS title;
Projecting all properties

If you run a

SELECT * FROM Movie m

you will get the following Cypher statement in case you run the translation offline:

MATCH (m:Movie) RETURN *

The above query will return one column ("m"), which is a Neo4j node. This is usually not what is expected in the relational world. If you run the translation online and Neo4j metadata can be retrieved, it will generate a statement that flattens the properties of each node and relationship plus their element ids:

In case the Person node has properties born and name

SELECT * FROM Person p

you will get this Cypher statement

MATCH (p:Person)
RETURN elementId(p) AS element_id,
       p.born AS born, p.name AS name

This works well with multiple tables as well (Movie has properties title and released)

SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in

Properties will be delivered as requested:

MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id, p.born AS born, p.name AS name,
       elementId(m) AS element_id1, m.title AS title, m.released AS released

We add increasing numbers to column names if they clash (we duplicated the name property and added a remark to the metadata):

SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in

Note the increasing numbers per duplicate name:

MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id,
       p.born AS born, p.name AS name, p.remark AS remark,
       elementId(m) AS element_id1,
       m.name AS name1, m.released AS released, m.remark AS remark1

The following example uses a join-table to access relationships (we explain this later in this manual), but the flattening of properties works there as well:

SELECT *
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id,
       p.born AS born, p.name AS name,
       elementId(r) AS element_id1, r.role AS role,
       elementId(m) AS element_id2,
       m.title AS title, m.released AS released

Ordering without specifying a table alias does work as expected:

SELECT * FROM Person p ORDER BY name ASC

you will get this Cypher statement

MATCH (p:Person)
RETURN elementId(p) AS element_id,
       p.born AS born, p.name AS name
ORDER BY p.name

A qualified alias can be used as well. In case without meta-data it will return a map of properties of the node or relationship in question:

SELECT m.*, p.*
FROM Person p
JOIN Movie m ON m.id = p.acted_in

The corresponding columns must be downcast to a Map in JDBC:

MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN m{.*} AS m, p{.*} AS p

If we add meta-data (here born and name to Person), the qualified star will project all those:

SELECT p.*, m.title AS title
FROM Person p
JOIN Movie m ON m.id = p.acted_in

Note how we also projected one single, known property from the Movie node:

MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id, p.born AS born, p.name AS name, m.title AS title
Returning nodes and relationships

A statement that projects a table alias such as

SELECT m FROM Movie m

will result in a Cypher statement returning the matched node as node.

MATCH (m:Movie)
RETURN m;

A node can be alias as well:

SELECT m AS node FROM Movie m

will result in a Cypher statement returning the matched node as node.

MATCH (m:Movie)
RETURN m AS node;

Unaliased tables can be used as well:

SELECT movie FROM Movie
MATCH (movie:Movie)
RETURN movie;

And multiple entities are supported, too:

SELECT p, r, m FROM Person p
JOIN ACTED_IN r ON r.person_id = p.id
JOIN Movie m ON m.id = r.movie_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
Comparing SQL with Cypher examples

Sources of the following examples are from Comparing SQL with Cypher.

Find all Products
Select and Return Records

Easy in SQL, just select everything from the products table.

SELECT p.*
FROM products as p

Similarly, in Cypher, you just match a simple pattern: all nodes with the label :Product and RETURN them.

MATCH (p:Product)
RETURN p{.*} AS p

The above query will project all properties of the matched node. If you want to select the node itself, just select it without using the asterisk:

SELECT p
FROM products as p

It will be translated into a query that returns nodes:

MATCH (p:Product)
RETURN p
Field Access, Ordering and Paging

More efficient is to return only a subset of attributes, like ProductName and UnitPrice. And while we’re on it, let’s also order by price and only return the 10 most expensive items.

SELECT p.`productName`, p.`unitPrice`
FROM products as p
ORDER BY p.`unitPrice` DESC
LIMIT 10

You can copy and paste the changes from SQL to Cypher, it’s thankfully unsurprising. But remember that labels, relationship-types and property-names are case sensitive in Neo4j.

MATCH (p:Product)
RETURN p.productName, p.unitPrice ORDER BY p.unitPrice DESC LIMIT 10

Default order direction will be translated as is:

SELECT * FROM Movies m ORDER BY m.title
MATCH (m:Movies)
RETURN * ORDER BY m.title
Expressions

Most SQL expressions will have a corresponding Cypher expression and can be translated straigt forward:

Literal Values

Literal values are usually 1:1 translations:

SELECT
    1, TRUE, FALSE, NULL, 'a'

become

RETURN 1, TRUE, FALSE, NULL, 'a'
Arithmetic expressions

So are the arithmetic expressions:

SELECT
    1 + 2,
    1 - 2,
    1 * 2,
    1 / 2,
    square(2)

Note that the underlying tech of the default translator uses Cypher-DSL internally, which will wrap arithmetic (and logical expressions) with parentheses:

RETURN
    (1 + 2),
    (1 - 2),
    (1 * 2),
    (1 / 2),
    (2 * 2)
Functions
Numeric functions

We can translate all numeric functions that are supported by Neo4j’s Cypher implementation: Mathematical functions - numeric:

SELECT
    abs(1),
    ceil(1),
    floor(1),
    round(1),
    round(1, 1),
    sign(1)

will be translated to

RETURN
    abs(1),
    ceil(1),
    floor(1),
    round(1),
    round(1, 1),
    sign(1)
Logarithmic functions

Neo4j supports a brought range of logarithmic functions and the input

SELECT
    exp(1),
    ln(1),
    log(2, 1),
    log10(1),
    sqrt(1)

will be translated to

RETURN
    exp(1),
    log(1),
    (log(1) / log(2)),
    log10(1),
    sqrt(1)
Trigonometric functions

Calls to trigonometric functions

SELECT
    acos(1),
    asin(1),
    atan(1),
    atan2(1, 2),
    cos(1),
    cot(1),
    degrees(1),
    pi(),
    radians(1),
    sin(1),
    tan(1)

will be translated to the corresponding Neo4j functions:

RETURN
    acos(1),
    asin(1),
    atan(1),
    atan2(1, 2),
    cos(1),
    cot(1),
    degrees(1),
    pi(),
    radians(1),
    sin(1),
    tan(1)
String functions

The following string manipulations are guaranteed to work:

SELECT
    lower('abc'),
    cast(3 as varchar),
    trim(' abc '),
    length('abc'),
    left('abc', 2),
    ltrim(' abc '),
    replace('abc', 'b'),
    replace('abc', 'b', 'x'),
    reverse('abc'),
    right('abc', 2),
    rtrim(' abc '),
    substring('abc', 2 - 1),
    substring('abc', 2 - 1, 2),
    upper('abc')

and will be translated to Neo4j’s versions:

RETURN
    toLower('abc'),
    toString(3),
    trim(' abc '),
    size('abc'),
    left('abc', 2),
    ltrim(' abc '),
    replace('abc', 'b', NULL),
    replace('abc', 'b', 'x'),
    reverse('abc'),
    right('abc', 2),
    rtrim(' abc '),
    substring('abc', (2 - 1)),
    substring('abc', (2 - 1), 2),
    toUpper('abc')
Scalar functions

The input

SELECT
    coalesce(1, 2),
    coalesce(1, 2, 3),
    nvl(1, 2),
    cast('1' as boolean),
    cast(1 as float),
    cast(1 as double precision),
    cast(1 as real),
    cast(1 as tinyint),
    cast(1 as smallint),
    cast(1 as int),
    cast(1 as bigint)

will be translated to (Compare Scalar functions):

RETURN
    coalesce(1, 2),
    coalesce(1, 2, 3),
    coalesce(1, 2),
    toBoolean('1'),
    toFloat(1),
    toFloat(1),
    toFloat(1),
    toInteger(1),
    toInteger(1),
    toInteger(1),
    toInteger(1)
Query expressions

Several advanced SQL expressions are supported as well, such as

CASE simple

The simple CASE expressions

SELECT
    CASE 1 WHEN 2 THEN 3 END,
    CASE 1 WHEN 2 THEN 3 ELSE 4 END,
    CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END,
    CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END

which will be translated to

RETURN CASE 1 WHEN 2 THEN 3 END, CASE 1 WHEN 2 THEN 3 ELSE 4 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
CASE advanced

And CASE statement using a search (See expressions for more information):

SELECT
    CASE WHEN 1 = 2 THEN 3 END,
    CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END

will be translated to

RETURN
    CASE WHEN 1 = 2 THEN 3 END,
    CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
    CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
CASE abbreviations (which aren’t COALESCE or NVL)

The input

SELECT
    nullif(1, 2),
    nvl2(1, 2, 3)

will be translated to

RETURN
    CASE WHEN 1 = 2 THEN NULL ELSE 1 END,
    CASE WHEN 1 IS NOT NULL THEN 2 ELSE 3 END
Predicates

As with expressions a lot of logical SQL expressions and conditions used as predicates can be translated into straight forward into Cypher predicates.

Conjunctions and disjunctions.

Logical conjunctions and disjunctions are all supported:

SELECT 1 FROM p WHERE 1 = 1 AND 2 = 2 OR 3 = 3

will be translated to

MATCH (p:p)
WHERE ((1 = 1
    AND 2 = 2)
  OR 3 = 3)
RETURN 1

The input

SELECT 1 FROM p WHERE NOT 1 = 1 XOR 2 = 2

will be translated to

MATCH (p:p)
WHERE (NOT (1 = 1)
  XOR 2 = 2)
RETURN 1
Operators
Arithmetic operators

The input

SELECT 1 FROM p WHERE 1 = 1 AND 2 > 1 AND 1 < 2 AND 1 <= 2 AND 2 >= 1 AND 1 != 2

will be translated to

MATCH (p:p)
WHERE (1 = 1
  AND 2 > 1
  AND 1 < 2
  AND 1 <= 2
  AND 2 >= 1
  AND 1 <> 2)
RETURN 1
Between

Between in SQL is inclusive

SELECT 1 FROM p WHERE 2 BETWEEN 1 AND 3

and will be translated to (we can’t generate the shorter form (1 ⇐ 2 ⇐ 3) due to restrictions in the underlying generator):

MATCH (p:p)
WHERE (1 <= 2) AND (2 <= 3)
RETURN 1

SQL has a SYMMETRIC keyword for the BETWEEN clause, to indicate that you do not care whouch bound of the range is larger than the other:

SELECT 1 FROM p WHERE 2 BETWEEN SYMMETRIC 3 AND 1

We translate this into a disjunction:

MATCH (p:p)
WHERE (3 <= 2) AND (2 <= 1) OR (1 <= 2) AND (2 <= 3)
RETURN 1
Logical row value expressions

The above examples are based on scalar expressions. Row value expressions will be translated as well:

SELECT 1
FROM p
WHERE (1, 2) = (3, 4)
OR (1, 2) < (3, 4)
OR (1, 2) <= (3, 4)
OR (1, 2, 3) <> (4, 5, 6)
OR (1, 2, 3) > (4, 5, 6)
OR (1, 2, 3) >= (4, 5, 6)

Leads to a semantically equivalent cypher:

MATCH (p:p)
WHERE 1 = 3 AND 2 = 4
OR (1 < 3 OR 1 = 3 AND 2 < 4)
OR (1 < 3 OR 1 = 3 AND 2 <= 4)
OR (1 != 4 AND 2 != 5 AND 3 != 6)
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 > 6))
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 >= 6))
RETURN 1
Null Handling
For scalar expressions

The input

SELECT 1 FROM p WHERE 1 IS NULL AND 2 IS NOT NULL

will be translated to

MATCH (p:p)
WHERE (1 IS NULL
  AND 2 IS NOT NULL)
RETURN 1
For row value expressions

The input

SELECT 1 FROM p WHERE (1, 2) IS NULL OR (3, 4) IS NOT NULL

will be translated to

MATCH (p:p)
WHERE
  (1 IS NULL AND 2 IS NULL)
  OR (3 IS NOT NULL AND 4 IS NOT NULL)
RETURN 1
LIKE operator

The LIKE operator

SELECT * FROM movies m WHERE m.title LIKE '%Matrix%'

will be translated into a regular expressions, replacing the % with .*:

MATCH (m:`movies`) WHERE m.title =~ '.*Matrix.*'
RETURN *
Using joins to map relationships

On the surface joins are relationships materialized in SQL (Foreign keys are not). Sadly, it’s not as straight forward to map. There are several options and possibilities to handle things:

  • When joining two tables on a column, taking the left hand table column, use its name as relationship type and treat it as outgoing from left to right

  • When joining two tables with an intersection table (which you usually model in SQL for m:n relationships with attributes), use the name of that intersection table as relationship type

We implemented some variants thereof, however we don’t claim their absolute usefulness in all cases

1:n joins
Natural joins

SQL NATURAL joins are the easiest way to denote relationship names without having todo any mapping.

SELECT p, m FROM Person p
NATURAL JOIN Movie m

A one-hope NATURAL JOIN will resolve to an anonymous, wildcard relationship:

MATCH (p:Person)-->(m:Movie) RETURN p, m

NATURAL joins can be chained like this, the connecting join table does not need to exist:

SELECT p.name, r.role, m.* FROM Person p
NATURAL JOIN ACTED_IN r
NATURAL JOIN Movie m

It will be turned into a Neo4j relation:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.role,
       elementId(m) AS element_id, m.title AS title, m.released AS released
Simple join

We configured the translator use the following table mapping:

  • people mapped to label People

  • movies mapped to label Movie

With that in place, we translate

SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.directed

to

MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title

DIRECTED is the uppercase version of the join column in the left table (p.directed).

We can add a join column mapping in the form of people.movie_id:DIRECTED if we have different column names:

SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.movie_id

to

MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
Using the ON clause

We used backticks here for the table- and column names and no mapping.

SELECT p.name, m.title
FROM `Person` as p
JOIN `Movie` as m ON (m.id = p.`DIRECTED`)

The translation yields in the same result as before

MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
m:n joins

An intersection table is a table that contains references to two other tables in the form of at least two columns—hopefully with foreign keys and therefor indexes defined on them. This construct is usually required in the relational model to create an m:n relationship. Sometimes the intersection table also spots additional columns. Of course, such an auxiliary construct is not necessary in Neo4j. We can model as many outgoing and incoming relationships from one label to another as we desire and they can also have properties. We can hover use that construct for our translator

The following example uses a configured mapping as follows:

  • people mapped to label People

  • movies mapped to label Movie

  • movie_actors mapped to ACTED_IN

SELECT p.name, m.title
FROM people p (1)
JOIN movie_actors r ON r.person_id = p.id (2)
JOIN movies m ON m.id = r.person_id (3)
1 The driving table from which we map outgoing relationships
2 An intersection table, that is used again in the next JOIN clause
3 The final join clause

We don’t do semantic analysis, the order of the joins matter and will lead to the following query:

MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title

Multiple joins will result in a chain of relationships:

SELECT p.name AS actor, d.name AS director, m.title
FROM people p
 JOIN movie_actors r ON r.person_id = p.id
 JOIN movies m ON m.id = r.person_id
 JOIN movie_directors r2 ON r2.movie_id = m.id
 JOIN people d ON r2.person_id = d.id

as demonstrated by

MATCH (p:`Person`)-[r:`ACTED_IN`]->(m:`Movie`)<-[r2:`DIRECTED`]-(d:`Person`)
RETURN p.name AS actor, d.name AS director, m.title

Notice how the direction of the DIRECTED relationship is defined by the order of the join columns.

DML statements

In this section we list the supported Data-Manipulation-Language (DML) statements. While a SELECT statement is technically DML as well, as it is used to project existing relations into new relations, it has been covered previously.

Deleting nodes

Nodes can be deleted via the SQL DELETE statement. This can happen unconditionally:

DELETE FROM person

which will delete all person nodes:

MATCH (person:person)
DELETE person

A WHERE clause can be added to prevent this and all conditions can be used:

DELETE FROM person
WHERE person.id = 1

so that only the person node with a matching property is deleted.

MATCH (person:person)
WHERE person.id = 1
DELETE person

If you want to delete everything, but your tooling is complaining, just add a conditions that is always true:

DELETE FROM person
WHERE true

Your data is gone, either way:

MATCH (person:person)
WHERE true
DELETE person

This is safer, but also pointless:

DELETE FROM person
WHERE false

Your data is gone, either way:

MATCH (person:person)
WHERE false
DELETE person

Tables can be aliased

DELETE FROM person p

and the alias will be used in Cypher, too:

MATCH (p:person)
DELETE p

Alias tables is also support in combination with specifying the label to which the table name is mapped. Using the same query with table_mappings=person:Person configured,

DELETE FROM person p

will be translated to

MATCH (p:Person)
DELETE p

You can use SQL TRUNCATE to detach delete nodes.

TRUNCATE TABLE people

will be translated to

MATCH (people:Person)
DETACH DELETE people
Inserting data

A single list of values with explicit columns and constant values can be inserted with a simple INSERT statement:

INSERT INTO People (first_name, last_name, born) VALUES ('Helge', 'Schneider', 1955)

which will be translated to:

CREATE (people:`Person` {first_name: 'Helge', last_name: 'Schneider', born: 1955})

All expressions, including parameters, are supported:

INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)

Parameters will be named from 0 on upwards in Cypher:

CREATE (people:`Person` {first_name: $1, last_name: $2, born: $3})

If you omit the columns names on the insertion target, we generate names:

INSERT INTO People VALUES ('Helge', 'Schneider', 1955)

Note the unknown field xxx graph properties created:

CREATE (people:`Person` {`unknown field 0`: 'Helge', `unknown field 1`: 'Schneider', `unknown field 2`: 1955})

The SQL VALUES clause actually supports list of values:

INSERT INTO People (first_name, last_name, born) VALUES
    ('Helge', 'Schneider', 1955),
    ('Bela', 'B', 1962)

Those values will be translated into a Cypher array to be unwind in the Cypher statement. This is a great solution for batching inserts:

UNWIND [
  {first_name: 'Helge', last_name: 'Schneider', born: 1955},
  {first_name: 'Bela', last_name: 'B', born: 1962}]
AS properties
CREATE (people:`Person`)
SET people = properties

A returning clause is supported as well, so that

INSERT INTO People p (name) VALUES (?) RETURNING elementId(p)

is translated into

CREATE (p:Person {name: $1}) RETURN elementId(p)
Upserts

We support a restricted range of "upserts" via the non-standard but pretty common ON DUPLICATE and ON CONFLICT SQL clauses. Upserts will generally be translated to MERGE statements. While they tend to work without constraints, you really should but unique-constraints on the node properties you merge on. Otherwise, Neo4j create duplicates due to locking issues. Read me more about the latter here.

Two options are possible to merge on all columns inserted via ON DUPLICATE KEY IGNORE and ON CONFLICT IGNORE.

INSERT INTO Movie(title, released) VALUES(?, ?) ON DUPLICATE KEY IGNORE

will be translated to:

MERGE (movie:`Movie` {title: $1, released: $2})

The same goes for ON CONFLICT DO NOTHING. In the example we configured a table mapping:

INSERT INTO actors(name, firstname) VALUES(?, ?) ON CONFLICT DO NOTHING

will be translated to:

MERGE (actors:`Actor` {name: $1, firstname: $2})

If you want to define an action, you must use ON CONFLICT specifying the key you want to merge on. While ON DUPLICATE KEY does offer upgrade options, it assumes the primary (or unique) key being violated to be known. This is most certainly the case in a relational system, but not in this translation layer, that does run without a database connection:

INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET j = 0, k = 2 * EXCLUDED.k

Take note how the special reference EXCLUDED can be used to refer to the values of columns that have not been part of the key. They will be reused with their values in the ON MATCH SET clause.

MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
ON MATCH SET tbl.j = 0, tbl.k = (2 * 700)

This works with parameters, too:

INSERT INTO tbl(i, j, k) VALUES (1, 2, ?)
ON CONFLICT (i) DO UPDATE SET j = EXCLUDED.k

Same result, but referring to the parameter:

MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 2, tbl.k = $1
ON MATCH SET tbl.j = $1

If you just want to specify a concrete merge column instead of merging on all columns, this possible too:

INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO NOTHING

will be using ON CREATE only:

MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700

Using ON CONFLICT and specifying a key is the only way to insert multiple rows with a MERGE statement:

INSERT INTO People (first_name, last_name, born) VALUES
    ('Helge', 'Schneider', 1955),
    ('Bela', 'B', 1962)
ON CONFLICT(last_name) DO UPDATE SET born = EXCLUDED.born
UNWIND [{first_name: 'Helge', last_name: 'Schneider', born: 1955}, {first_name: 'Bela', last_name: 'B', born: 1962}] AS properties
MERGE (people:`People` {last_name: properties['last_name']})
ON CREATE SET
  people.first_name = properties.first_name,
  people.born = properties.born
ON MATCH SET people.born = properties['born']

8. text2cypher

8.1. Introduction

This is an experimental translator inspired by the Neo4j Labs project text2cypher.

If you add this translator to the classpath or use the text2cypher bundle, all queries that start with

🤖,

will be treated as natural language queries written in plain english. The driver will strip the prefix, and use OpenAI for translating the input into a Cypher statement. The driver will augment the generation of the query by passing the current graphs schema along with the input question.

Keep in mind that the following data will be sent to an external API:

  • Your database schema, including label names

  • Any natural language question that you or one of your user write

Don’t use this translator if you don’t want the above or are not a llowed to do so.

This module requires one additional configuration, the OpenAI API Key. You can use either a URL parameter, JDBC Properties entry or a environment variable:

  • URL parameter or properties name is: openAIApiKey

  • Name of the environment variable is: OPEN_AI_API_KEY

One example of a valid URL is:

jdbc:neo4j://localhost:7687?openAIApiKey=sk-xxx-your-key

With that in place, a query such as the following should work just fine:

🤖, How was The Da Vinci Code rated?
Once a natural language query was translated to Cypher, the result will be cached and further invocations of that query will use the cached result.

All that statements that do not start with 🤖 will be used as is and treated as Cypher.

Appendix A: Neo4j specific conversions

Neo4j does not offer all types used in the relational world. For some of them we offer conversions that used to be helpful in frameworks such as Spring Data Neo4j (SDN) and we implement here in the exact same way, so you could use both SDN and this driver interchangeable.

A.1. Data types for Fixed-point arithmetic

Neo4j does not support BigInteger and BigDecimal. The only way to store them is as String and read them back into the corresponding type. This is inline with SDN and OGM.

So any parameter of those types passed to PreparedStatement or CallableStatement will be stored as String, but can be equally read back through corresponding methods on the resultsets.

A.2. SQL Date, Time and Timestamps

java.sql.Date

Maps to Cypher DATE

java.sql.Time

Maps to Cypher LOCAL TIME

java.sql.Timestamp

Maps to Cypher LOCAL DATETIME

For more precise mapping use a Neo4j Value instance with the appropriate type and setObject respectively getObject.

Appendix B: Neo4j specific syntax for statements

This section deals with conventions for indexing and naming parameters in all types of statements (standard statements, prepared statements and callable statements) and other Neo4j specific syntax.

B.1. Callable statements

B.1.1. General syntax

You can invoke stored procedures as follows

Common JDBC syntax

{? = call db.index.fulltext.queryNodes(?, ?)}

Standard Neo4j syntax

call db.index.fulltext.queryNodes(?, ?) yield *

Enumerating yielded columns

{$propertyName = call db.schema.nodeTypeProperties()}

Return (only applicable for functions)

RETURN sin(?)

B.1.2. Named parameters

Our callable statement implementation (org.neo4j.jdbc.Neo4jCallableStatement) does support named parameters. As per JDBC spec those are not named placeholders in a query- or statement-string, but the actual, formal parameter for the stored procedures to be called. We support both the $ and the colon syntax, that is you might use either $name or :name.

The assigment {? = call xxx()} will be rewritten into call xxx() yield *, and {$x = call xxx()} will be rewritten as call xxx() yield x.

B.1.3. Result sets of callable statements

When you execute a callable statement via ´executeQuery`, you must use the result set returned. If you just use execute, we assume that the underlying procedure does only return one row and you use the getters on the statement itself.

Appendix C: Migrating from older versions or other JDBC drivers for Neo4j

There are some other JDBC drivers for Neo4j, under various licenses and with varying features and capabilities. In the following we outline possible migration processes. The basic usage patterns for all JDBC drivers for Neo4j are very similar, in the end it’s "just" using a JDBC compliant driver, and you would use it as described in the original Java tutorial about JDBC.

C.1. Migrating from version 4 or 5 of this driver

Version 5 and 4 of the Neo4j JDBC Driver have been mainly developed by Larus BA, Italy, a certified consulting and integration solutions partner for Neo4j. Thank you so much for all your work.

The most important change that you need to make is removing the dependency on org.neo4j:neo4j-jdbc-bolt. You need to replace it with org.neo4j:neo4j-jdbc or one of the bundles we provide, see available bundles.

C.1.1. URL format, parameters and main driver class

The previous version mapped the URLs from the common Java driver onto jdbc:, i.e. using jdbc:neo4j:neo4j+s://foobar.io:7687/ for connecting against a database running on host foobar.io on port 7687.

The following URLs (direct bolt connection) are not supported:

  • jdbc:neo4j:bolt://<host>:<port>/

  • jdbc:neo4j:bolt+s://<host>:<port>/

  • jdbc:neo4j:bolt+ssc://<host>:<port>/

The following URLs behave the same but must be rewritten:

  • jdbc:neo4j:neo4j://<host>:<port>/ becomes jdbc:neo4j://<host>:<port>/

  • jdbc:neo4j:neo4j+s://<host>:<port>/ becomes jdbc:neo4j+s://<host>:<port>/

  • jdbc:neo4j:neo4j+ssc://<host>:<port>/ becomes jdbc:neo4j+ssc://<host>:<port>/

The following configuration properties are not supported and have no other replacement:

  • leaked.sessions.logging

  • readonly

  • usebookmarks

We don’t offer any build-in retry mechanism, so the corresponding setting don’t have an effect:

  • max.transaction.retry.time

As with any persistent database connection you want to cater for failed transactions. We made good experience with resilience4j which does fit in well with common Java frameworks, such as Spring Boot.

The following properties can be achieved with standardized JDBC settings:

  • encryption: Use the appropriate transport scheme (neo4j, neo4j+s or neo4j+ssc)

  • autocommit: Use java.sql.Connection.setAutoCommit

Connection pooling can be achieved with any JDBC compliant connection pool, the following properties don’t have an effect:

  • max.connection.lifetime

  • max.connection.poolsize

The following properties just have different names:

  • connection.acquisition.timeout is timeout (as query-parameter to the URL or inside the configuration properties)

The following properties can be achieved using a different URL:

  • database is now part of the URL, instead of specifying jdbc:neo4j:neo4j+s://foobar.io:7687?database=abc you would use the database name as path segment in the url, such as: jdbc:neo4j+s://foobar.io:7687/abc

In case your tooling requires to use a concrete driver class: This JDBC driver has only org.neo4j.jdbc.Neo4jDriver. If you depend on a javax.sql.DataSource, we provide org.neo4j.jdbc.Neo4jDataSource.

C.1.2. Flattening

While the Neo4j JDBC Driver does not support flatten option, it can emulate its effect. flatten did unnest returned nodes and relationships by providing all their properties as individual columns.

If you enable automatic SQL to Cypher translation (See Section 7.1), any *-select will inspect whether it affects nodes or relationship and will unnest their properties, so that a SELECT * FROM Movie m will effectively become MATCH (m:Movie) RETURN m.title AS title, m.released AS released. Read more about this topic in Section 7.3.3.1.2).

In case you want to access the actual node, return the whole table alias or just use Cypher. The Neo4j JDBC Driver does support complex object types as return types.