© Copyright 2023-2024 the original authors.
Abstract
This is the manual for the official Neo4j JDBC Driver.
The Neo4j JDBC Driver is in Early Access Preview (EAP) and might break unexpectedly. Being in EAP means that you have the opportunity to provide feedback on the implementation of the driver. The functionality and behaviour released in the GA release may differ from those available in the EAP.
This driver is officially supported and endorsed by Neo4j. It is a standalone driver, independent of and not built on top of the common Neo4j Java Driver. While the latter provides a Neo4j-idiomatic way to access Neo4j from Java, the JDBC driver adheres to JDBC 4.3.
1. Introduction
The JDBC acronym stands for "Java Database Connectivity" and as such is not bound exclusively to relational databases. Nevertheless, JDBC is heavily influenced by the SQL standard and existing relational databases with regard to terms, definitions and behaviour. Neo4j is a graph database with quite a different paradigm than relational databases and non-standardized behaviour in some areas. There might be some details that don’t map 100% in each place, and we make sure to point these out throughout this documentation.
Inside this documentation we will refer to this driver as the Neo4j JDBC Driver and to the idiomatic Neo4j driver as the common Neo4j Java Driver.
The Neo4j JDBC Driver requires JDK 17 on the client side and a minimum version of Neo4j 5.5 on the server side. To use it against a Neo4j cluster, server-side routing must be enabled on the cluster.
1.1. Features
-
JDK 17 baseline
-
Fully supports the Java module system
-
Adheres to JDBC 4.3
-
Can run any Cypher statement
-
Implements DatabaseMetaData and ResultSetMetaData as fully as possible for a nearly schemaless database with generally very flexible result sets, allowing for automatic metadata retrieval from ETL and ELT tools
-
Provides an SPI to hook in translators from SQL to Cypher
-
Provides an optional default implementation to translate many SQL statements into semantically similar Cypher statements
-
Can be safely used with JDBC connection pools, in contrast to the common Neo4j Java Driver or any JDBC driver based on it, as it does no internal connection pooling and no transaction management other than what is dictated by the JDBC Spec
The absence of any connection pooling and transaction management is actually an advantage of the Neo4j JDBC Driver over the common Neo4j Java Driver: it allows you to pick and choose any database connection pooling system, such as HikariCP, and any transaction management, such as Jakarta Transactions.
The default SQL to Cypher translation implementation is based on jOOQ by Datageekery. We are long-time fans of how Lukas Eder—inventor of jOOQ—has bridged the gap between Java and database querying. It even inspired the Cypher-DSL, providing the other half of our translation layer. We are grateful for kick-starting the original Sql2Cypher project together in early 2023, on which we can build now.
1.2. Limitations
-
The database metadata is retrieved on a best-effort basis, using existing schema methods of Neo4j such as db.labels and db.schema.nodeTypeProperties()
-
While single-label nodes map naturally to table names, nodes with multiple labels don’t
-
There is no reliable way to always determine the datatype for properties on nodes without reading all of them (which this driver does not do)
-
Some JDBC features are not yet supported (such as CallableStatement), and some features won’t ever be supported
-
The SQL to Cypher translator only supports a limited subset of clauses and SQL constructs that can be translated into semantically equivalent Cypher (see Section 7.3.3)
-
There is no "right" way to map JOIN statements to relationships, so your mileage may vary
1.3. When to use the Neo4j JDBC Driver?
This driver has been developed with the following use-cases in mind:
-
Integration with ETL and ELT tools that don’t offer an integration based on the common Neo4j Java driver
-
An easier on-ramp towards Neo4j for teams that are familiar with JDBC and want to keep on using that API, but with Cypher and Neo4j
-
Integration for ecosystems like Jakarta EE whose transaction management will directly support any compliant JDBC driver
-
Integration with database migration tools such as Flyway
There is no need to redesign an application that is built on the common Neo4j Java Driver to use this driver. If your ecosystem already provides a higher-level integration based on the common Neo4j Java Driver, such as Spring Data Neo4j (SDN) for Spring, there is no need to switch to something else. In the case of Quarkus, the Neo4j JDBC Driver is an option to consider: while we do provide an integration for the common Neo4j Java Driver, that integration does not support Quarkus' transaction systems, in contrast to this driver.
While there is little incentive to use this driver with Hibernate (Neo4j-OGM or SDN are the better alternatives for Neo4j), it might be worth giving Spring Data JDBC a try.
1.4. Differences to the previous versions of this driver and other JDBC drivers for Neo4j
Several other JDBC drivers exist for Neo4j, most notably the previous versions 4 and 5 of this driver, originally developed by Larus BA, Italy for Neo4j. Most—if not all—of them wrap the common Neo4j Java Driver and implement the JDBC spec on top of that. This comes with a bunch of problems:
-
The common Neo4j Java Driver manages a connection pool; JDBC drivers on the other hand delegate this task to dedicated pooling solutions: If you take the above-mentioned driver into a standard container, you will eventually end up with a pool of pools
-
The transaction management of the common Neo4j Java Driver is not exactly aligned with the way JDBC thinks about transactions; it’s usually hard to get this exactly right
-
Additionally, the original JDBC driver from Larus shades a couple of dependencies, such as Jackson as well as additional logging frameworks, which takes a toll on the classpath and, in the case of logging, actually leads to runtime problems
-
Existing drivers with a SQL to Cypher translation layer are "read-only" and don’t support write statements
There are some drivers available that provide a SQL to Cypher translation layer as well. Those, however, are read-only and cannot be used for ETL use-cases aiming to ingest data into Neo4j.
One feature that this driver does not provide is automatic reshaping or flattening of the result sets, as the previous incarnation did:
if you query for objects such as nodes, relationships, paths or maps, you can and should use getObject on the result sets and cast to the appropriate type (you find all of them inside the package org.neo4j.jdbc.values).
However, the default SQL to Cypher translator will—when connected to a database—figure out which properties labels have and turn the asterisk (*) into individual columns of nodes and relationships, just like what you would expect when running a SELECT * statement.
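As a quick illustration of the getObject approach, here is a minimal sketch; the concrete value type used in the cast (Node) is an assumption on our part, check the org.neo4j.jdbc.values package for the actual class names:
// Minimal sketch; url, username and password as in the quickstart.
// The Node type below is assumed to live in org.neo4j.jdbc.values.
try (var con = DriverManager.getConnection(url, username, password);
     var stmt = con.createStatement();
     var result = stmt.executeQuery("MATCH (m:Movie) RETURN m LIMIT 1")) {
    while (result.next()) {
        // getObject returns the driver's value type; cast it to the type you expect
        var node = (org.neo4j.jdbc.values.Node) result.getObject("m");
        System.out.println(node);
    }
}
catch (SQLException ex) {
    throw new RuntimeException(ex);
}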
2. About this documentation
In this documentation we will focus on getting and configuring the Neo4j JDBC Driver and enabling optional features. We will discuss the design choices made in the metadata retrieval, how we map Neo4j labels to tables, as well as our opinionated choices with regard to the automatic SQL to Cypher translation.
While we do provide runnable examples that due to their nature will show how to use JDBC in general, this is not a documentation about how to correctly use JDBC as an API.
3. Quickstart
Add the JDBC driver to your application, for example as a Gradle dependency:
dependencies {
    runtimeOnly("org.neo4j:neo4j-jdbc-full-bundle:6.0.0-M04")
}
With that in place, you can use the JDBC driver for Neo4j as you would do with any other JDBC driver.
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;
public final class Quickstart {
static void queryWithCypher() {
var query = """
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
RETURN m.title AS title, collect(p.name) AS actors
ORDER BY m.title
""";
var url = "jdbc:neo4j://localhost:7687";
var username = "neo4j";
var password = "verysecret";
try (var con = DriverManager.getConnection(url, username, password); (1)
var stmt = con.createStatement(); (2)
var result = stmt.executeQuery(query)) { (3)
while (result.next()) { (4)
var movie = result.getString(1); (5)
var actors = (List<String>) result.getObject("actors"); (6)
System.out.printf("%s%n", movie);
actors.forEach(actor -> System.out.printf("\t * %s%n", actor));
}
}
catch (SQLException ex) {
throw new RuntimeException(ex);
}
}
}
1 | Get a JDBC connection; no need to do any class loading beforehand, the driver will be registered automatically |
2 | Create a reusable statement |
3 | Immediately execute a query on it |
4 | Iterate the results like you would do with any other JDBC result set |
5 | JDBC indexes are 1-based |
6 | JDBC also allows retrieval of result columns by name; the Neo4j JDBC driver also supports complex objects such as lists |
In the example above we used Neo4j’s lingua franca, Cypher, to query Neo4j.
The Neo4j JDBC Driver has limited support for using SQL as well.
It will do so automatically or on a case-by-case basis.
For the latter you can use java.sql.Connection#nativeSQL(String) and use the result in your queries.
For automatic translation, instantiate the driver with the additional URL parameter enableSQLTranslation set to true.
The following example shows how:
var query = """
SELECT m.title AS title, collect(p.name) AS actors
FROM Person as p
JOIN Movie as m ON (m.id = p.ACTED_IN)
ORDER BY m.title
"""; (1)
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var con = DriverManager.getConnection(url, username, password);
var stmt = con.createStatement();
var result = stmt.executeQuery(query)) {
while (result.next()) {
var movie = result.getString(1);
var actors = (List<String>) result.getObject("actors");
System.out.printf("%s%n", movie);
actors.forEach(actor -> System.out.printf("\t * %s%n", actor));
}
}
1 | This SQL query will be translated into the same Cypher query shown before, the remainder of the method is identical to before. |
In case any tooling asks you for the name of the concrete driver class, it is org.neo4j.jdbc.Neo4jDriver.
We will cover the SQL to Cypher translation in detail later in this manual.
The JDBC Spec does not support named parameters, only index-based parameters, and indexes start at 1 in SQL.
So for all PreparedStatement instances you would want to specify your parameters like this:
PreparedStatement
var cypher = "CREATE (m:Movie {title: $1})";
try (var con = DriverManager.getConnection(url, username, password);
PreparedStatement stmt = con.prepareStatement(cypher);) {
stmt.setString(1, "Test");
stmt.executeUpdate();
}
This is independent of the SQL to Cypher translation mechanism:
PreparedStatement (SQL variant)
var sql = "INSERT INTO Movie (title) VALUES (?)";
try (var con = DriverManager.getConnection(url + "?enableSQLTranslation=true", username, password);
PreparedStatement stmt = con.prepareStatement(sql);) {
stmt.setString(1, "Test");
stmt.executeUpdate();
}
You can downcast the PreparedStatement to Neo4jPreparedStatement.
This JDBC extension will let you use named parameters:
Neo4jPreparedStatement
var match = "MATCH (n:Movie {title: $title}) RETURN n.title AS title";
try (var con = DriverManager.getConnection(url, username, password);
Neo4jPreparedStatement stmt = (Neo4jPreparedStatement) con.prepareStatement(match);) {
stmt.setString("title", "Test");
try (var resultSet = stmt.executeQuery()) {
while (resultSet.next()) {
LOGGER.info(resultSet.getString("title"));
}
}
}
3.1. Getting a connection via environment variables
If you are happy to depend directly on org.neo4j.jdbc.Neo4jDriver and want to get a connection as easily as possible, you might want to use fromEnv:
// import org.neo4j.jdbc.Neo4jDriver;
try (var con = Neo4jDriver.fromEnv().orElseThrow(); (1)
var stmt = con.createStatement();
var result = stmt.executeQuery(query)) {
// Same loop as earlier
}
catch (SQLException ex) {
throw new RuntimeException(ex);
}
1 | Notice how we directly use the concrete driver class here and how the method returns an Optional: if we don’t find the required properties, we can’t create a connection. |
The fromEnv method looks for a couple of specific system environment variables, and it will be true to the principles of the 12-factor app while doing so:
-
It will look in the System environment first
-
It will then look for a file named .env in the current working directory
There are overloads that let you configure the directory and the filename to look for. The properties we support are as follows:
NEO4J_URI
-
The address or URI of the instance to connect to
NEO4J_USERNAME
-
Optional username
NEO4J_PASSWORD
-
Optional password
NEO4J_SQL_TRANSLATION_ENABLED
-
Optional flag to enable full SQL to Cypher translation, defaults to
false
Information from both the System environment and the .env file is combined. If, for example, NEO4J_SQL_TRANSLATION_ENABLED is in the System environment but not in the .env file, it will still be picked up.
This feature is especially useful with Neo4j AuraDB.
When creating a new AuraDB instance you are required to download a .env file, and you can use it directly with the Neo4j JDBC Driver like this:
try (
var con = Neo4jDriver.fromEnv("Neo4j-cb3d8b2d-Created-2024-02-14.txt")
.orElseThrow();
var stmt = con.createStatement();
var movies = stmt.executeQuery("MATCH (n:Movie) RETURN n.title AS title")
) {
while (movies.next()) {
System.out.println(movies.getString("title"));
}
}
4. Distribution
4.1. Driver modules and dependencies
The driver consists of 3 modules:
- neo4j-jdbc
-
This is the actual JDBC implementation for Neo4j
- neo4j-jdbc-translator-spi
-
This is the SPI for the default SQL to Cypher translation implementation and for any further or custom implementation.
- neo4j-jdbc-translator-impl
-
This is the optional default implementation of a SQL to Cypher translator.
If you just want to use the Neo4j JDBC Driver for running Cypher statements, the only module you need to think about is org.neo4j:neo4j-jdbc, and in case you are developing in an environment with proper dependency management, this is all you need to declare.
While the translator SPI is a required dependency and pulled in via Maven or Gradle, the actual implementation is not. This allows for:
-
The possibility to opt out of additional dependencies in all cases where you are not using the SQL to Cypher translation
-
Requiring different JDK baselines or licensing modules for our implementations
-
Allowing you to create alternative ways of creating a translator
4.1.1. Dependencies
We offer two "dependency-free" bundles that shade everything into one binary artifact. Those bundles should cover a lot of tooling, and we discuss them below in Available bundles. In case you are interested in the actual dependencies of the driver, these are its direct, compile-time dependencies:
org.neo4j:neo4j-jdbc
+- io.github.cdimascio:dotenv-java
+- io.netty:netty-handler
| +- io.netty:netty-common
| +- io.netty:netty-resolver
| +- io.netty:netty-buffer
| +- io.netty:netty-transport
| +- io.netty:netty-transport-native-unix-common
| \- io.netty:netty-codec
+- org.neo4j:neo4j-cypher-dsl-schema-name-support
+- org.neo4j:neo4j-jdbc-translator-spi
4.2. Available bundles
All bundles of the Neo4j JDBC Driver are distributed on Maven Central. The bundles have different characteristics that we will explain in the following sections. Depending on your use-case or your environment, you pick one or the other bundle, but never several at once.
4.2.1. Individual components
Pick this distribution if you have an application that uses dependency management, in which case you don’t have to deal with binary artifacts yourself.
We offer a Maven BOM project, which you should import when using the individual modules, so that you always have a consistent set of versions. The coordinates are org.neo4j:neo4j-jdbc-bom.
For Maven you will want to use this dependency declaration:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-jdbc</artifactId>
<version>6.0.0-M04</version>
</dependency>
And for Gradle you would want to declare the following runtime dependency:
dependencies {
runtimeOnly 'org.neo4j:neo4j-jdbc:6.0.0-M04'
}
If you want to use the SQL to Cypher translation from Neo4j, you need to add the following dependency in your Maven build:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-jdbc-translator-impl</artifactId>
<version>6.0.0-M04</version>
</dependency>
and the same coordinates, but for Gradle:
dependencies {
runtimeOnly 'org.neo4j:neo4j-jdbc-translator-impl:6.0.0-M04'
}
4.2.2. Small bundle
Pick this distribution if you work with ETL tools or tooling for relational databases that allow adding the JDBC driver only as a single artifact, or that make it unnecessarily hard to add additional jars.
This bundle does not contain the default SQL to Cypher translator!
The coordinates of this bundle are org.neo4j:neo4j-jdbc-bundle and you can download it from Maven Central:
All bundles can certainly be used as a normal project dependency as well. This might be useful for example if your project depends on another, potentially conflicting Netty version.
4.2.3. Full bundle
This bundle contains the default SQL to Cypher translator.
Otherwise, it is identical to the small bundle.
Its coordinates are org.neo4j:neo4j-jdbc-full-bundle and you can download it from Maven Central here:
4.2.4. text2cypher bundle
We also ship a very experimental bundle that we call text2cypher
:
See text2cypher for more information about the text2cypher translator.
5. Configuration
5.1. Driver class name
The Neo4j JDBC driver class is org.neo4j.jdbc.Neo4jDriver.
In modern Java tools you should not have to touch this class directly, but there are some connection pools and front-ends that will ask you for this.
The class is public API.
We also provide org.neo4j.jdbc.Neo4jDataSource as a javax.sql.DataSource.
5.2. Causal clustering and bookmarks
The Neo4j JDBC Driver uses bookmarks by default to provide causal consistency when running against a Neo4j cluster.
Bookmarks are managed on the driver level itself, not on the connections spawned by an instance of the driver.
Thus, all connections spawned by one instance will partake in the same causal chain of transactions.
Connections from different instances of the driver will not use the same set of bookmarks, and there is no built-in machinery that would enable this.
If you want or need this, you can directly access the Neo4jDriver type to retrieve the current set of known bookmarks and pass them to another driver instance.
5.3. URL and connection properties
The canonical URL format for the Neo4j JDBC Driver is
jdbc:neo4j://<host>:<port>/<database>?param1=value1&param2=value2
The database name and any query parameters are optional and can be omitted.
All configuration arguments can be passed either as query parameters or via a java.util.Properties object.
The latter is sometimes defined by tooling for you.
With regard to authentication it’s highly recommended to follow the JDBC spec, which discourages using any form of URL authentication.
All query parameters must be percent-encoded if they contain special characters, e.g., …?param1=space%20separated
.
There is no need to specify bolt or any other additional prefixes in the URL; as a matter of fact, the driver does not support them.
It does however support adding details about the transport being used:
neo4j+s enables encryption and only accepts SSL certificates from the server that are signed by a known certificate authority.
neo4j+ssc enables encryption and accepts self-signed certificates (which must be added to the certificate store).
The driver accepts the following configuration arguments, either as properties or as URL query parameters:
Name | Type | Meaning | Default |
---|---|---|---|
|
|
Timeout for connection acquisition in milliseconds |
|
|
|
User agent |
|
|
|
Flag that enables automatic translation from SQL to Cypher (requires a translator on the classpath) |
|
|
|
Flag that enables caching of translations. SQL translations are not "free": parsing SQL costs a bit of time, and so does rendering Cypher. In addition, we might look up metadata to be able to project individual properties. If this takes too long, caching translations might help. |
|
|
|
Flag that allows you to use |
Defaults to |
|
|
An optional flag that is an alternative to |
|
|
|
An optional configuration for fine-grained control over SSL configuration. Allowed values are |
|
|
|
The username (principal) to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons. |
|
|
|
The password (credentials) to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons. |
|
|
|
The realm to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons. |
|
|
|
The authentication scheme to use. NOT RECOMMENDED as URL query parameter for security reasons. Currently supported values are:
|
|
|
|
Enables bookmark management for full causal cluster support. This is enabled by default and is the recommended setting for all scenarios that use a connection pool. If you disable it, it will only be disabled for the specific connection. Other connections retrieved from the driver instance are not affected, and the driver will still manage their bookmarks. |
|
5.4. Getting a driver or a connection instance
This section most likely only applies if you use the Neo4j JDBC Driver as part of application development, in contrast to using it as part of a front-end tool such as DBeaver, DataGrip or UI-based ETL tools.
The easiest way to acquire a connection is directly through the java.sql.DriverManager
.
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
class Configuration {
void obtainConnection() throws SQLException {
var url = "jdbc:neo4j://localhost:7687";
var username = "neo4j";
var password = "verysecret";
var connection = DriverManager.getConnection(url, username, password);
}
}
While our connection implementation is thread-safe, it does—as dictated by the JDBC Spec—only allow one concurrent transaction per connection. For a multi-threaded application you want to use a connection pool. There’s HikariCP, but usually application servers, containers, or frameworks bring their own. It’s safe to use any of them, as the Neo4j JDBC Driver does not do internal pooling.
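As an illustration, here is a minimal sketch wiring the driver into HikariCP; the pool and its settings are an example of ours, any JDBC-compliant pool is configured the same way:
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

var config = new HikariConfig();
config.setJdbcUrl("jdbc:neo4j://localhost:7687"); // the pool hands out connections created by the Neo4j JDBC Driver
config.setUsername("neo4j");
config.setPassword("verysecret");
config.setMaximumPoolSize(8); // sizing is entirely up to you

try (var dataSource = new HikariDataSource(config);
        var con = dataSource.getConnection();
        var stmt = con.createStatement();
        var result = stmt.executeQuery("MATCH (m:Movie) RETURN count(m) AS movies")) {
    while (result.next()) {
        System.out.println(result.getLong("movies"));
    }
}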
If you need access to an instance of the Neo4j driver itself, you can use the following approach:
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
class Configuration {
void obtainDriverAndConnection() throws SQLException {
var url = "jdbc:neo4j://localhost:7687";
var driver = DriverManager.getDriver(url);
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
var connection = driver.connect(url, properties);
}
}
5.5. Securing your connection by using SSL
The Neo4j JDBC Driver supports the same SSL options as the common Neo4j Java Driver, with the same URL protocols, using +s or +ssc as additional indicators for the level of security that is required.
The same configuration can also be achieved with a URL query parameter or an entry in the properties passed to the DriverManager or driver instance when asking for a connection.
As long as you don’t specify contradicting values, it’s fine to combine both approaches.
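For example, a minimal sketch passing the SSL setting as a property instead of a URL parameter (property keys as used elsewhere in this manual):
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("ssl", "true"); // same effect as appending ?ssl=true to the URL, i.e. neo4j+s
try (var con = DriverManager.getConnection("jdbc:neo4j://xyz.databases.neo4j.io", properties)) {
    // use the connection as usual
}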
5.5.1. Understanding the SSL mode
The possible settings are best explained using the available SSL modes. The following list is ordered by ascending security:
-
disable: the default. "I don’t care about security and don’t want to pay the overhead for encryption."
-
require: "I want my data to be encrypted, and I accept the overhead. I trust that the network will make sure I always connect to the server I want." (The server must support encryption; no hostname / CA validation is done. This is only secure on private networks, without going through the "hassle" of proper certificates, and should not really be used over the public internet.)
-
verify-full: "I want my data encrypted, and I accept the overhead. I want to be sure that I connect to a server I trust, and that it’s the one I specify."
The Neo4j JDBC Driver does not include revocation checks.
The most secure option can also be enabled by just using ssl=true, either as a query parameter or as a property entry passed to the DriverManager.
This option corresponds to neo4j+s.
require, on the other hand, corresponds to neo4j+ssc.
You might wonder why we use an additional enum: we might support additional modes in the future, such as letting the service decide about SSL, or being able to express a preference towards SSL without requiring it.
A Neo4j cluster can offer both plain Bolt connections and encrypted SSL connections, or just one of them. So just because you can connect using neo4j+s does not mean that you cannot connect using just neo4j, or vice versa. This depends on the setup of your database. Neo4j’s managed offering, Neo4j AuraDB, only supports encrypted connections, so you must use +s, ssl=true or sslMode=verify-full.
5.5.2. Valid URLs
The following URLs are all valid:
neo4j+s://xyz.databases.neo4j.io
-
Use full verification with the xyz instance at AuraDB
neo4j://xyz.databases.neo4j.io?ssl=true
-
The same, but using the shorthand URL parameter
neo4j://xyz.databases.neo4j.io?sslMode=verify-full
-
The same, but using the explicit mode
neo4j+s://xyz.databases.neo4j.io?ssl=true&sslMode=verify-full
-
Not more secure, but does not fail on you
neo4j+ssc://this.is.a.trustworthy.instance.for.sure.com
-
Trust whatever certificate and hostname there is, but do use SSL
neo4j://my-testing-instance.local
-
Use a plain connection.
We only refuse contradicting values when you use several configuration mechanisms:
-
+s with ssl=false or sslMode set to disable
-
+ssc with ssl=false or any sslMode not equal to require
In essence, you cannot express to use SSL and not use it at the same time. The reason for offering several mechanisms is that we want you to be able to use a fixed URL with dynamic query parameters, or dynamic URLs, or whatever way of configuring things you prefer programmatically.
5.5.3. Using .dotenv files
When you sign up for Neo4j AuraDB and create a database, you will be asked to download a text file named similar to Neo4j-9df57663-Created-2023-06-12.txt. This is essentially a .dotenv file containing all the information required to connect to an Aura database.
These files can be used directly via Neo4jDriver.fromEnv(). This method exists in several overloads, which let you configure both filename and directory. Additionally, the builder behind it lets you configure options that are not usually contained in the files from AuraDB.
6. Metadata
6.1. Catalog and schema
All the methods on connection level dealing with metadata and information about the available content provide information in terms that are defined in the SQL standard, including catalogs and schemas.
From the SQL 1992 standard (find an archived copy here):
(4.12) Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog contains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema.
We decided not to support catalogs for the time being, so any metadata result set will return literal null when asked for the catalog of a database object.
No metadata method supports filtering on a non-null catalog parameter, and no catalog specifier can be used in a query.
Future developments might use catalogs to describe composite databases, in essence listing the constituents of the composite database defined in the connection.
The same standard defines schemas as follows:
(4.11) An SQL-schema is a persistent descriptor that includes:
[…] the descriptor of every component of the SQL-schema.
In this International Standard, the term "schema" is used only in the sense of SQL-schema. Each component descriptor is either a domain descriptor, a base table descriptor, a view descriptor, an assertion descriptor, a privilege descriptor, a character set descriptor, a collation descriptor, or a translation descriptor. The persistent objects described by the descriptors are said to be owned by or to have been created by the <authorization identifier> of the schema.
We report the literal name public as the schema for any result component of a metadata result set.
We support querying for objects in any schema, however only literal null or public will potentially produce a result.
Labels will be reported as table objects with the TABLE_TYPE being literal TABLE.
6.1.1. Summary
-
Catalog: always null; trying to filter on anything non-null does not yield results
-
Schema: always public; filtering on public and literal null will yield results, anything else won't
-
Table descriptors: reported as TABLE in the TABLE_TYPE column
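The following minimal sketch shows how this surfaces through plain JDBC metadata calls; it uses only the standard java.sql API, no driver-specific types:
try (var con = DriverManager.getConnection(url, username, password)) {
    var databaseMetaData = con.getMetaData();
    // Catalog is always null, the schema is always "public", labels show up as TABLE entries
    try (var tables = databaseMetaData.getTables(null, "public", null, new String[] {"TABLE"})) {
        while (tables.next()) {
            System.out.printf("%s (schema=%s, catalog=%s)%n",
                tables.getString("TABLE_NAME"),
                tables.getString("TABLE_SCHEM"),
                tables.getString("TABLE_CAT"));
        }
    }
}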
6.2. Labels to tables
The CLG and Langstar groups speak about "node type combinations" and gravitate towards "open node type semantics" in the GQL standard:
Node type combinations is a less permissive form of open node type semantics. The idea of node type combinations is that nodes also conform to a graph type if they are not of one of the node types in the node type set of the graph type, but of a node type that is an intersection of (a subset of) the node type in a node type set of the graph type.
An example for their proposal can be seen here.
We therefore compute node types in a similar way:
-
Single-label nodes will map naturally to a table name; the single label will become the table name
-
The label name will be taken as-is and will be case-sensitive, so if you have the three labels Movie, movie and MOVIE, you will see three tables in the metadata
-
This is in line with the default SQL to Cypher translation
-
Node type combinations will map to table names composed as label1_label2, sorting the labels alphabetically so that the result is independent of the order in which Neo4j returns them
-
Property sets for these node type combinations will then be computed
7. SQL to Cypher translation
7.1. Introduction
The translation from SQL queries to Cypher is an optional feature of this driver that consists of two parts:
-
The translator SPI, which you find in the module org.neo4j:neo4j-jdbc-translator-spi. It consists of two interfaces, SqlTranslatorFactory and the actual SqlTranslator.
-
A concrete implementation of this SPI, published as org.neo4j:neo4j-jdbc-translator-impl.
The latter is covered in "Using the default translator" and bundled as the "full bundle", described in Section 4.2. The former is provided for two reasons: it allows us to distribute the driver with and without the bundled default translator, and it gives you the option to run your own custom translator.
Translators can be chained, and you can have as many translators on the classpath as you want. They will be ordered by a configurable precedence, with our default implementation having the lowest precedence. Thus, you can for example have a custom translator that takes care of a fixed set of queries; if it cannot translate a query, the query will just be passed down to our implementation.
Translating arbitrary SQL queries to Cypher is an opinionated task, as there is no single right way to map table names to objects in the graph: a table name can be used as-is as a label, or you might want to transform it to a singular form, etc. And we haven’t even started on how to map relationships: do you want relationship types derived from a join table, a join column (in that case, which one?) or the name of a foreign key?
We made some assumptions that we find match various use cases, and instead of providing configuration and more code to cater for all scenarios, we offer the possibility to write your own translation layer. The driver will use the standard Java service loader mechanism to find an implementation of the SPI on the module path or classpath.
Some tools like Tableau use a class loader that won’t let the driver use the standard Java service loader mechanism.
For these scenarios we provide an additional configuration property named translatorFactory.
Set this to DEFAULT to load our default implementation directly, or to a fully qualified class name for any other factory.
Be aware that either our default implementation or your custom one still needs to be on the classpath.
7.2. Translating SQL to Cypher
There’s only one requirement to enable SQL to Cypher translation:
you have to have one module implementing the SPI on the classpath.
This is automatically the case if you use the full bundle under the coordinates org.neo4j:neo4j-jdbc-full-bundle.
In that case, you don’t need to add any other dependencies.
If you use the individual distribution or the "small" bundle org.neo4j:neo4j-jdbc-bundle, you must add the following artifact: org.neo4j:neo4j-jdbc-translator-impl.
The implementation will be loaded automatically. If you use the translation on a case-by-case basis, it will be loaded lazily, that is: no additional classes are touched or loaded into memory. If you configure automatic translation for all statements, the implementation will be loaded eagerly. There are no further configuration options with regard to loading the implementation.
7.2.1. On a case-by-case basis
The translator can be used on a case-by-case basis through the official JDBC API nativeSQL
, which you find on the java.sql.Connection
class.
With the following imports
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;
You just pass your SQL statement to nativeSQL
and you will get Cypher back:
try (var connection = DriverManager.getConnection(url, username, password)) {
var sql = connection.nativeSQL("SELECT * FROM Movie n");
assert """
MATCH (n:Movie)
RETURN *""".equals(sql);
}
7.2.2. For all queries
If you open the connection to your Neo4j instance using enableSQLTranslation=true, either as a URL parameter or a configuration property, all statements will be translated from SQL to Cypher on your behalf.
If you configure the driver in that way, the translator will be loaded eagerly:
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var connection = DriverManager.getConnection(url, username, password);
var stmnt = connection.createStatement();
var result = stmnt.executeQuery("SELECT n.title FROM Movie n")) {
while (result.next()) {
LOGGER.info(result.getString("n.title"));
}
}
Sometimes it may be necessary to fall back to Cypher for some statements:
you might want to use some construct that you cannot express with SQL, or our default translator cannot handle the necessary SQL.
We offer a special comment that you can use as a hint in your statement to stop automatic translation: /*+ NEO4J FORCE_CYPHER */.
Use it like this:
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
var query = """
/*+ NEO4J FORCE_CYPHER */
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,3}
(a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
RETURN localtime(d.departs) AS departureTime,
localtime(a.arrives) AS arrivalTime
""";
try (var connection = DriverManager.getConnection(url, username, password);
var stmnt = connection.createStatement();
var result = stmnt.executeQuery(query)) {
while (result.next()) {
LOGGER.info(result.getTime("departureTime").toString());
}
}
7.2.3. Possible error scenarios
A NoSuchElementException with the message No SQL translators available will be thrown when there is no implementation of the SQL to Cypher translator available and you either use java.sql.Connection.nativeSQL or enable automatic translation.
The exception will be thrown when you access the method in the former case, or eagerly on opening a connection in the latter case.
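A minimal sketch of how this surfaces in code, for example when the small bundle is used without a translator implementation on the classpath:
// NoSuchElementException is java.util.NoSuchElementException
try (var con = DriverManager.getConnection(url, username, password)) {
    var cypher = con.nativeSQL("SELECT * FROM Movie"); // fails if no translator is present
    LOGGER.info(cypher);
}
catch (NoSuchElementException ex) {
    // "No SQL translators available": add org.neo4j:neo4j-jdbc-translator-impl
    // (or switch to the full bundle) to make the translation available
}
catch (SQLException ex) {
    throw new RuntimeException(ex);
}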
7.3. Using the default translator
7.3.1. Supported SQL dialects
Our default translator uses the OSS parser from jOOQ, which already supports a broad spectrum of SQL dialects.
We picked the generic, default dialect of jOOQ as our default dialect, but you can override this in the SQL to Cypher configuration using the parameter s2c.sqlDialect with one of the supported dialects listed in the configuration below.
POSTGRES might be a good choice for several integrations.
Bear in mind though that any shortcomings in the translation are probably not due to a lack of parser support, but due to the lack of an obvious, semantically equivalent Cypher construct. That means we might be able to parse a certain piece of SQL but be unable to translate it into something meaningful that Neo4j can understand without additional, contextual information.
7.3.2. Configuration
The default implementation provides a number of configuration settings.
They must be prefixed with s2c
in the URL or config options:
Name | Meaning | Default |
---|---|---|
parseNameCase | Whether to parse table names as is or not | As is |
tableToLabelMappings | A map from table names to labels | An empty map |
joinColumnsToTypeMappings | A map from column names to relationship type names | An empty map |
prettyPrint | Whether to format the generated Cypher or not | |
alwaysEscapeNames | Whether to always escape names | Unless explicitly configured |
sqlDialect | Which dialect to use when parsing, supported values are | |
Here are a couple of examples (note that we are using the properties config to avoid terribly long URLs in this documentation; however, all the attributes can be specified via the URL, too):
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.prettyPrint", "false");
properties.put("s2c.alwaysEscapeNames", "false");
properties.put("s2c.tableToLabelMappings", "people:Person;movies:Movie;movie_actors:ACTED_IN");
var url = "jdbc:neo4j://localhost:7687";
var query = """
SELECT p.name, m.title
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id""";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name, m.title".equals(sql);
}
In the next example we parse the table names into upper case, which might be helpful in some situations:
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNameCase", "UPPER");
var url = "jdbc:neo4j://localhost:7687";
var query = "SELECT * FROM people";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert """
MATCH (people:PEOPLE)
RETURN *""".equals(sql);
}
Here we change the prefix of parameters (they are still going only by index in SQL) and add mappings for join columns:
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNamedParamPrefix", "$");
properties.put("s2c.joinColumnsToTypeMappings", "people.movie_id:DIRECTED");
var url = "jdbc:neo4j://localhost:7687";
var query = """
SELECT *
FROM people p
JOIN movies m ON m.id = p.movie_id
WHERE p.name = $1
""";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert """
MATCH (p:people)-[:DIRECTED]->(m:movies)
WHERE p.name = $1
RETURN *""".equals(sql);
}
7.3.3. Supported statements
The following statements are all under test and describe what you can expect from the default translation layer:
Translation concepts
Table names to labels
The most simple select statement to translate is a statement without the FROM
clause, such as:
SELECT 1
It is equivalent without loss to the following Cypher:
RETURN 1
SELECT statements without further JOIN clauses are pretty straightforward to translate.
The hardest challenge here is how to map the table name to labels:
-
We parse the SQL Statement case-sensitive by default
-
Table names will be mapped to node labels
-
Table aliases will be used as identifiable symbolic names
SELECT t.a, t.b
FROM My_Table (1)
AS t (2)
WHERE t.a = 1
1 | Will be used as the label to match, as-is, i.e. My_Table |
2 | The table alias will become the node-alias |
The whole construct will be translated to
MATCH (t:My_Table)
WHERE t.a = 1
RETURN t.a, t.b
We recommend using table aliases, but the translation works without them as well.
Star-Selects
A star- or *
select comes in different forms:
- Unqualified
-
SELECT * FROM table
- Qualified
-
SELECT t.* FROM table t
And a variant, selecting the relation itself: SELECT t FROM table t
.
We make use of this fact to give users a way to decide whether they want to return Neo4j nodes and relationships as entities, as maps, or flattened into individual columns. The latter, however, requires our translator to have access to the schema of the underlying Neo4j database. The following sections describe the use-cases:
Don’t use a star-select but enumerate the properties:
SELECT m.title FROM Movie m
The table alias will be used as a symbolic name
MATCH (m:Movie)
RETURN m.title;
You can omit the table alias:
SELECT title FROM Movie
The lower-cased table name will be the symbolic name:
MATCH (movie:Movie)
RETURN movie.title;
If you access your JDBC columns by name, this might lead to code that is hard to maintain. You might want to alias the column then:
SELECT title AS title FROM Movie
So that it has a stable, well-known name:
MATCH (movie:Movie)
RETURN movie.title AS title;
If you run a
SELECT * FROM Movie m
you will get the following Cypher statement in case you run the translation offline:
MATCH (m:Movie) RETURN *
The above query will return one column ("m"), which is a Neo4j node. This is usually not what is expected in the relational world. If you run the translation online and Neo4j metadata can be retrieved, it will generate a statement that flattens the properties of each node and relationship plus their element ids:
In case the Person
node has properties born
and name
SELECT * FROM Person p
you will get this Cypher statement
MATCH (p:Person)
RETURN elementId(p) AS element_id,
p.born AS born, p.name AS name
This works well with multiple tables as well (Movie
has properties title
and released
)
SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
Properties will be delivered as requested:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id, p.born AS born, p.name AS name,
elementId(m) AS element_id1, m.title AS title, m.released AS released
We add increasing numbers to column names if they clash (we duplicated the name
property and added a remark
to the metadata):
SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
Note the increasing numbers per duplicate name:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id,
p.born AS born, p.name AS name, p.remark AS remark,
elementId(m) AS element_id1,
m.name AS name1, m.released AS released, m.remark AS remark1
The following example uses a join-table to access relationships (we explain this later in this manual), but the flattening of properties works there as well:
SELECT *
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id,
p.born AS born, p.name AS name,
elementId(r) AS element_id1, r.role AS role,
elementId(m) AS element_id2,
m.title AS title, m.released AS released
Ordering without specifying a table alias does work as expected:
SELECT * FROM Person p ORDER BY name ASC
you will get this Cypher statement
MATCH (p:Person)
RETURN elementId(p) AS element_id,
p.born AS born, p.name AS name
ORDER BY p.name
A qualified alias can be used as well. Without metadata, it will return a map of the properties of the node or relationship in question:
SELECT m.*, p.*
FROM Person p
JOIN Movie m ON m.id = p.acted_in
The corresponding columns must be downcast to a Map in JDBC:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN m{.*} AS m, p{.*} AS p
If we add meta-data (here born
and name
to Person
), the qualified star will project all those:
SELECT p.*, m.title AS title
FROM Person p
JOIN Movie m ON m.id = p.acted_in
Note how we also projected one single, known property from the Movie
node:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS element_id, p.born AS born, p.name AS name, m.title AS title
A statement that projects a table alias such as
SELECT m FROM Movie m
will result in a Cypher statement returning the matched node as node.
MATCH (m:Movie)
RETURN m;
A node can be aliased as well:
SELECT m AS node FROM Movie m
will result in a Cypher statement returning the matched node as node.
MATCH (m:Movie)
RETURN m AS node;
Unaliased tables can be used as well:
SELECT movie FROM Movie
MATCH (movie:Movie)
RETURN movie;
And multiple entities are supported, too:
SELECT p, r, m FROM Person p
JOIN ACTED_IN r ON r.person_id = p.id
JOIN Movie m ON m.id = r.movie_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
Comparing SQL with Cypher examples
Sources of the following examples are from Comparing SQL with Cypher.
Easy in SQL, just select everything from the products
table.
SELECT p.*
FROM products as p
Similarly, in Cypher, you just match a simple pattern: all nodes with the label :Product
and RETURN
them.
MATCH (p:Product)
RETURN p{.*} AS p
The above query will project all properties of the matched node. If you want to select the node itself, just select it without using the asterisk:
SELECT p
FROM products as p
It will be translated into a query that returns nodes:
MATCH (p:Product)
RETURN p
More efficient is to return only a subset of attributes, like ProductName
and UnitPrice
.
And while we’re on it, let’s also order by price and only return the 10 most expensive items.
SELECT p.`productName`, p.`unitPrice`
FROM products as p
ORDER BY p.`unitPrice` DESC
LIMIT 10
You can copy and paste the changes from SQL to Cypher, it’s thankfully unsurprising. But remember that labels, relationship-types and property-names are case sensitive in Neo4j.
MATCH (p:Product)
RETURN p.productName, p.unitPrice ORDER BY p.unitPrice DESC LIMIT 10
Default order direction will be translated as is:
SELECT * FROM Movies m ORDER BY m.title
MATCH (m:Movies)
RETURN * ORDER BY m.title
Expressions
Most SQL expressions have a corresponding Cypher expression and can be translated straightforwardly:
Literal Values
Literal values are usually 1:1 translations:
SELECT
1, TRUE, FALSE, NULL, 'a'
become
RETURN 1, TRUE, FALSE, NULL, 'a'
Arithmetic expressions
So are the arithmetic expressions:
SELECT
1 + 2,
1 - 2,
1 * 2,
1 / 2,
square(2)
Note that the underlying tech of the default translator uses Cypher-DSL internally, which will wrap arithmetic (and logical expressions) with parentheses:
RETURN
(1 + 2),
(1 - 2),
(1 * 2),
(1 / 2),
(2 * 2)
Functions
We can translate all numeric functions that are supported by Neo4j’s Cypher implementation: Mathematical functions - numeric:
SELECT
abs(1),
ceil(1),
floor(1),
round(1),
round(1, 1),
sign(1)
will be translated to
RETURN
abs(1),
ceil(1),
floor(1),
round(1),
round(1, 1),
sign(1)
Neo4j supports a broad range of logarithmic functions, and the input
SELECT
exp(1),
ln(1),
log(2, 1),
log10(1),
sqrt(1)
will be translated to
RETURN
exp(1),
log(1),
(log(1) / log(2)),
log10(1),
sqrt(1)
Calls to trigonometric functions
SELECT
acos(1),
asin(1),
atan(1),
atan2(1, 2),
cos(1),
cot(1),
degrees(1),
pi(),
radians(1),
sin(1),
tan(1)
will be translated to the corresponding Neo4j functions:
RETURN
acos(1),
asin(1),
atan(1),
atan2(1, 2),
cos(1),
cot(1),
degrees(1),
pi(),
radians(1),
sin(1),
tan(1)
The following string manipulations are guaranteed to work:
SELECT
lower('abc'),
cast(3 as varchar),
trim(' abc '),
length('abc'),
left('abc', 2),
ltrim(' abc '),
replace('abc', 'b'),
replace('abc', 'b', 'x'),
reverse('abc'),
right('abc', 2),
rtrim(' abc '),
substring('abc', 2 - 1),
substring('abc', 2 - 1, 2),
upper('abc')
and will be translated to Neo4j’s versions:
RETURN
toLower('abc'),
toString(3),
trim(' abc '),
size('abc'),
left('abc', 2),
ltrim(' abc '),
replace('abc', 'b', NULL),
replace('abc', 'b', 'x'),
reverse('abc'),
right('abc', 2),
rtrim(' abc '),
substring('abc', (2 - 1)),
substring('abc', (2 - 1), 2),
toUpper('abc')
The input
SELECT
coalesce(1, 2),
coalesce(1, 2, 3),
nvl(1, 2),
cast('1' as boolean),
cast(1 as float),
cast(1 as double precision),
cast(1 as real),
cast(1 as tinyint),
cast(1 as smallint),
cast(1 as int),
cast(1 as bigint)
will be translated to (Compare Scalar functions):
RETURN
coalesce(1, 2),
coalesce(1, 2, 3),
coalesce(1, 2),
toBoolean('1'),
toFloat(1),
toFloat(1),
toFloat(1),
toInteger(1),
toInteger(1),
toInteger(1),
toInteger(1)
Query expressions
Several advanced SQL expressions are supported as well, such as
CASE simple
The simple CASE expressions
SELECT
CASE 1 WHEN 2 THEN 3 END,
CASE 1 WHEN 2 THEN 3 ELSE 4 END,
CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END,
CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
which will be translated to
RETURN CASE 1 WHEN 2 THEN 3 END, CASE 1 WHEN 2 THEN 3 ELSE 4 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
CASE advanced
And the CASE statement using a search (see expressions for more information):
SELECT
CASE WHEN 1 = 2 THEN 3 END,
CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
will be translated to
RETURN
CASE WHEN 1 = 2 THEN 3 END,
CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
CASE abbreviations (which aren't COALESCE or NVL)
The input
SELECT
nullif(1, 2),
nvl2(1, 2, 3)
will be translated to
RETURN
CASE WHEN 1 = 2 THEN NULL ELSE 1 END,
CASE WHEN 1 IS NOT NULL THEN 2 ELSE 3 END
Predicates
As with expressions, a lot of logical SQL expressions and conditions used as predicates can be translated straightforwardly into Cypher predicates.
Conjunctions and disjunctions.
Logical conjunctions and disjunctions are all supported:
SELECT 1 FROM p WHERE 1 = 1 AND 2 = 2 OR 3 = 3
will be translated to
MATCH (p:p)
WHERE ((1 = 1
AND 2 = 2)
OR 3 = 3)
RETURN 1
The input
SELECT 1 FROM p WHERE NOT 1 = 1 XOR 2 = 2
will be translated to
MATCH (p:p)
WHERE (NOT (1 = 1)
XOR 2 = 2)
RETURN 1
Operators
The input
SELECT 1 FROM p WHERE 1 = 1 AND 2 > 1 AND 1 < 2 AND 1 <= 2 AND 2 >= 1 AND 1 != 2
will be translated to
MATCH (p:p)
WHERE (1 = 1
AND 2 > 1
AND 1 < 2
AND 1 <= 2
AND 2 >= 1
AND 1 <> 2)
RETURN 1
Between in SQL is inclusive
SELECT 1 FROM p WHERE 2 BETWEEN 1 AND 3
and will be translated to (we can’t generate the shorter form 1 <= 2 <= 3 due to restrictions in the underlying generator):
MATCH (p:p)
WHERE (1 <= 2) AND (2 <= 3)
RETURN 1
SQL has a SYMMETRIC keyword for the BETWEEN clause, to indicate that you do not care which bound of the range is larger than the other:
SELECT 1 FROM p WHERE 2 BETWEEN SYMMETRIC 3 AND 1
We translate this into a disjunction:
MATCH (p:p)
WHERE (3 <= 2) AND (2 <= 1) OR (1 <= 2) AND (2 <= 3)
RETURN 1
Logical row value expressions
The above examples are based on scalar expressions. Row value expressions will be translated as well:
SELECT 1
FROM p
WHERE (1, 2) = (3, 4)
OR (1, 2) < (3, 4)
OR (1, 2) <= (3, 4)
OR (1, 2, 3) <> (4, 5, 6)
OR (1, 2, 3) > (4, 5, 6)
OR (1, 2, 3) >= (4, 5, 6)
Leads to a semantically equivalent cypher:
MATCH (p:p)
WHERE 1 = 3 AND 2 = 4
OR (1 < 3 OR 1 = 3 AND 2 < 4)
OR (1 < 3 OR 1 = 3 AND 2 <= 4)
OR (1 != 4 AND 2 != 5 AND 3 != 6)
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 > 6))
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 >= 6))
RETURN 1
Null Handling
The input
SELECT 1 FROM p WHERE 1 IS NULL AND 2 IS NOT NULL
will be translated to
MATCH (p:p)
WHERE (1 IS NULL
AND 2 IS NOT NULL)
RETURN 1
The input
SELECT 1 FROM p WHERE (1, 2) IS NULL OR (3, 4) IS NOT NULL
will be translated to
MATCH (p:p)
WHERE
(1 IS NULL AND 2 IS NULL)
OR (3 IS NOT NULL AND 4 IS NOT NULL)
RETURN 1
LIKE
operator
The LIKE
operator
SELECT * FROM movies m WHERE m.title LIKE '%Matrix%'
will be translated into a regular expression, replacing the % with .*:
MATCH (m:`movies`) WHERE m.title =~ '.*Matrix.*'
RETURN *
Using joins to map relationships
On the surface, joins are relationships materialized in SQL (foreign keys are not). Sadly, it’s not as straightforward to map them. There are several options and possibilities to handle this:
-
When joining two tables on a column, taking the left hand table column, use its name as relationship type and treat it as outgoing from left to right
-
When joining two tables with an intersection table (which you usually model in SQL for m:n relationships with attributes), use the name of that intersection table as relationship type
We implemented some variants thereof; however, we don’t claim their absolute usefulness in all cases.
1:n joins
SQL NATURAL joins are the easiest way to denote relationship names without having to do any mapping.
SELECT p, m FROM Person p
NATURAL JOIN Movie m
A one-hop NATURAL JOIN will resolve to an anonymous, wildcard relationship:
MATCH (p:Person)-->(m:Movie) RETURN p, m
NATURAL joins can be chained like this; the connecting join table does not need to exist:
SELECT p.name, r.role, m.* FROM Person p
NATURAL JOIN ACTED_IN r
NATURAL JOIN Movie m
It will be turned into a Neo4j relation:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.role,
elementId(m) AS element_id, m.title AS title, m.released AS released
We configured the translator to use the following table mapping:
-
people mapped to label People
-
movies mapped to label Movie
With that in place, we translate
SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.directed
to
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
DIRECTED is the uppercase version of the join column in the left table (p.directed).
We can add a join column mapping in the form of people.movie_id:DIRECTED if we have different column names:
SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.movie_id
to
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
ON clause
We used backticks here for the table and column names and no mapping.
SELECT p.name, m.title
FROM `Person` as p
JOIN `Movie` as m ON (m.id = p.`DIRECTED`)
The translation yields the same result as before:
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
m:n joins
An intersection table is a table that contains references to two other tables in the form of at least two columns—hopefully with foreign keys and therefore indexes defined on them. This construct is usually required in the relational model to create an m:n relationship. Sometimes the intersection table also sports additional columns. Of course, such an auxiliary construct is not necessary in Neo4j: we can model as many outgoing and incoming relationships from one label to another as we desire, and they can also have properties. We can however use that construct for our translator.
The following example uses a configured mapping as follows:
-
people mapped to label People
-
movies mapped to label Movie
-
movie_actors mapped to ACTED_IN
SELECT p.name, m.title
FROM people p (1)
JOIN movie_actors r ON r.person_id = p.id (2)
JOIN movies m ON m.id = r.person_id (3)
1 | The driving table from which we map outgoing relationships |
2 | An intersection table, that is used again in the next JOIN clause |
3 | The final join clause |
We don’t do semantic analysis; the order of the joins matters and will lead to the following query:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title
Multiple joins will result in a chain of relationships:
SELECT p.name AS actor, d.name AS director, m.title
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
JOIN movie_directors r2 ON r2.movie_id = m.id
JOIN people d ON r2.person_id = d.id
as demonstrated by
MATCH (p:`Person`)-[r:`ACTED_IN`]->(m:`Movie`)<-[r2:`DIRECTED`]-(d:`Person`)
RETURN p.name AS actor, d.name AS director, m.title
Notice how the direction of the DIRECTED
relationship is defined by the order of the join columns.
DML statements
In this section we list the supported Data-Manipulation-Language (DML) statements.
While a SELECT
statement is technically DML as well, as it is used to project existing relations into new relations, it has been covered previously.
Deleting nodes
Nodes can be deleted via the SQL DELETE
statement.
This can happen unconditionally:
DELETE FROM person
which will delete all person
nodes:
MATCH (person:person)
DELETE person
A WHERE
clause can be added to prevent this and all conditions can be used:
DELETE FROM person
WHERE person.id = 1
so that only the person
node with a matching property is deleted.
MATCH (person:person)
WHERE person.id = 1
DELETE person
If you want to delete everything, but your tooling is complaining, just add a condition that is always true:
DELETE FROM person
WHERE true
Your data is gone, either way:
MATCH (person:person)
WHERE true
DELETE person
This is safer, but also pointless:
DELETE FROM person
WHERE false
Nothing will be deleted this time:
MATCH (person:person)
WHERE false
DELETE person
Tables can be aliased:
DELETE FROM person p
and the alias will be used in Cypher, too:
MATCH (p:person)
DELETE p
Aliasing tables is also supported in combination with specifying the label to which the table name is mapped.
Using the same query with table_mappings=person:Person
configured,
DELETE FROM person p
will be translated to
MATCH (p:Person)
DELETE p
Deleting nodes and their related nodes
You can use SQL TRUNCATE
to detach delete nodes.
TRUNCATE TABLE people
will be translated to
MATCH (people:Person)
DETACH DELETE people
Inserting data
A single list of values with explicit columns and constant values can be inserted with a simple INSERT
statement:
INSERT INTO People (first_name, last_name, born) VALUES ('Helge', 'Schneider', 1955)
which will be translated to:
CREATE (people:`Person` {first_name: 'Helge', last_name: 'Schneider', born: 1955})
All expressions, including parameters, are supported:
INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)
Parameters will be numbered from 1 upwards in Cypher:
CREATE (people:`Person` {first_name: $1, last_name: $2, born: $3})
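The following is a minimal Java sketch of binding those parameters through a PreparedStatement; the URL (including the assumed enableSQLTranslation parameter) and credentials are placeholders, see Section 7.1 for the actual configuration.
import java.sql.DriverManager;
public class InsertWithParameters {
    public static void main(String... args) throws Exception {
        var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true"; // assumed parameter name
        var sql = "INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)";
        try (var connection = DriverManager.getConnection(url, "neo4j", "verysecret");
             var statement = connection.prepareStatement(sql)) {
            // JDBC parameter indexes are 1-based and end up as $1, $2 and $3 in the generated Cypher
            statement.setString(1, "Helge");
            statement.setString(2, "Schneider");
            statement.setInt(3, 1955);
            statement.executeUpdate();
        }
    }
}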
If you omit the column names on the insertion target, we generate names:
INSERT INTO People VALUES ('Helge', 'Schneider', 1955)
Note the unknown field xxx property names in the created node:
CREATE (people:`Person` {`unknown field 0`: 'Helge', `unknown field 1`: 'Schneider', `unknown field 2`: 1955})
The SQL VALUES clause actually supports lists of values:
INSERT INTO People (first_name, last_name, born) VALUES
('Helge', 'Schneider', 1955),
('Bela', 'B', 1962)
Those values will be translated into a Cypher array that is unwound in the Cypher statement. This is a great solution for batching inserts:
UNWIND [
{first_name: 'Helge', last_name: 'Schneider', born: 1955},
{first_name: 'Bela', last_name: 'B', born: 1962}]
AS properties
CREATE (people:`Person`)
SET people = properties
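If you prefer to drive such batches from Java, standard JDBC batching is one possible approach; the sketch below uses placeholder connection details and an assumed translation parameter, and makes no claim about how the driver executes the batch internally.
import java.sql.DriverManager;
import java.util.List;
public class BatchedInsert {
    record NewPerson(String firstName, String lastName, int born) { }
    public static void main(String... args) throws Exception {
        var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true"; // assumed parameter name
        var people = List.of(new NewPerson("Helge", "Schneider", 1955), new NewPerson("Bela", "B", 1962));
        try (var connection = DriverManager.getConnection(url, "neo4j", "verysecret");
             var statement = connection.prepareStatement(
                     "INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)")) {
            for (var person : people) {
                statement.setString(1, person.firstName());
                statement.setString(2, person.lastName());
                statement.setInt(3, person.born());
                statement.addBatch(); // queue one set of parameters
            }
            statement.executeBatch();
        }
    }
}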
A RETURNING clause is supported as well, so that
INSERT INTO People p (name) VALUES (?) RETURNING elementId(p)
is translated into
CREATE (p:Person {name: $1}) RETURN elementId(p)
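Because a returning insert produces a result set, it is executed via executeQuery from Java. A minimal sketch, again with placeholder connection details and an assumed translation parameter:
import java.sql.DriverManager;
public class InsertReturningElementId {
    public static void main(String... args) throws Exception {
        var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true"; // assumed parameter name
        try (var connection = DriverManager.getConnection(url, "neo4j", "verysecret");
             var statement = connection.prepareStatement(
                     "INSERT INTO People p (name) VALUES (?) RETURNING elementId(p)")) {
            statement.setString(1, "Helge Schneider");
            try (var resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    System.out.println("Created node " + resultSet.getString(1)); // the element id
                }
            }
        }
    }
}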
Upserts
We support a restricted range of "upserts" via the non-standard but pretty common ON DUPLICATE
and ON CONFLICT
SQL clauses.
Upserts will generally be translated to MERGE
statements.
While they tend to work without constraints, you really should put unique constraints on the node properties you merge on.
Otherwise, Neo4j might create duplicates due to locking issues.
Read more about the latter here.
There are two options for merging on all inserted columns: ON DUPLICATE KEY IGNORE and ON CONFLICT IGNORE.
INSERT INTO Movie(title, released) VALUES(?, ?) ON DUPLICATE KEY IGNORE
will be translated to:
MERGE (movie:`Movie` {title: $1, released: $2})
The same goes for ON CONFLICT DO NOTHING
.
In the following example we configured a table mapping from actors to the label Actor:
INSERT INTO actors(name, firstname) VALUES(?, ?) ON CONFLICT DO NOTHING
will be translated to:
MERGE (actors:`Actor` {name: $1, firstname: $2})
If you want to define an action, you must use ON CONFLICT, specifying the key you want to merge on.
While ON DUPLICATE KEY does offer update options, it assumes the primary (or unique) key being violated to be known.
This is most certainly the case in a relational system, but not in this translation layer, which runs without a database connection:
INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET j = 0, k = 2 * EXCLUDED.k
Take note of how the special reference EXCLUDED can be used to refer to the values of columns that are not part of the key.
They will be reused with their values in the ON MATCH SET clause.
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
ON MATCH SET tbl.j = 0, tbl.k = (2 * 700)
This works with parameters, too:
INSERT INTO tbl(i, j, k) VALUES (1, 2, ?)
ON CONFLICT (i) DO UPDATE SET j = EXCLUDED.k
Same result, but referring to the parameter:
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 2, tbl.k = $1
ON MATCH SET tbl.j = $1
If you just want to specify a concrete merge column instead of merging on all columns, this is possible too:
INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO NOTHING
will be using ON CREATE
only:
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
Using ON CONFLICT
and specifying a key is the only way to insert multiple rows with a MERGE
statement:
INSERT INTO People (first_name, last_name, born) VALUES
('Helge', 'Schneider', 1955),
('Bela', 'B', 1962)
ON CONFLICT(last_name) DO UPDATE SET born = EXCLUDED.born
UNWIND [{first_name: 'Helge', last_name: 'Schneider', born: 1955}, {first_name: 'Bela', last_name: 'B', born: 1962}] AS properties
MERGE (people:`People` {last_name: properties['last_name']})
ON CREATE SET
people.first_name = properties.first_name,
people.born = properties.born
ON MATCH SET people.born = properties['born']
8. text2cypher
8.1. Introduction
This is an experimental translator inspired by the Neo4j Labs project text2cypher.
If you add this translator to the classpath or use the text2cypher bundle, all queries that start with
🤖,
will be treated as natural language queries written in plain English. The driver will strip the prefix and use OpenAI to translate the input into a Cypher statement. The driver will augment the generation of the query by passing the current graph’s schema along with the input question.
Keep in mind that the graph’s schema and your natural language question will be sent to an external API.
Don’t use this translator if you don’t want that or are not allowed to do so. |
This module requires one additional configuration setting: the OpenAI API key. You can use either a URL parameter, a JDBC properties entry or an environment variable:
-
URL parameter or properties name is:
openAIApiKey
-
Name of the environment variable is:
OPEN_AI_API_KEY
One example of a valid URL is:
jdbc:neo4j://localhost:7687?openAIApiKey=sk-xxx-your-key
With that in place, a query such as the following should work just fine:
🤖, How was The Da Vinci Code rated?
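A complete round trip could look like the following Java sketch; host and credentials are placeholders, and the API key is assumed to be provided through the documented environment variable (alternatively, append the openAIApiKey URL parameter as shown above).
import java.sql.DriverManager;
public class Text2CypherExample {
    public static void main(String... args) throws Exception {
        // Assumes OPEN_AI_API_KEY is set in the environment; host and credentials are placeholders.
        var url = "jdbc:neo4j://localhost:7687";
        try (var connection = DriverManager.getConnection(url, "neo4j", "verysecret");
             var statement = connection.createStatement();
             var resultSet = statement.executeQuery("🤖, How was The Da Vinci Code rated?")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getObject(1));
            }
        }
    }
}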
Once a natural language query has been translated to Cypher, the result will be cached, and further invocations of that query will use the cached result. |
All statements that do not start with 🤖 will be used as-is and treated as Cypher.
Get the full, ready to use bundle here: https://repo.maven.apache.org/maven2/org/neo4j/neo4j-jdbc-text2cypher-bundle/6.0.0-M04/neo4j-jdbc-text2cypher-bundle-6.0.0-M04.jar
Appendix A: Neo4j specific conversions
Neo4j does not offer all types used in the relational world. For some of them we offer conversions that used to be helpful in frameworks such as Spring Data Neo4j (SDN); we implement them here in the exact same way, so you can use both SDN and this driver interchangeably.
A.1. Data types for Fixed-point arithmetic
Neo4j does not support BigInteger
and BigDecimal
.
The only way to store them is as a String, and to read them back into the corresponding type.
This is in line with SDN and OGM.
So any parameter of those types passed to PreparedStatement
or CallableStatement
will be stored as a String, but can be read back equally well through the corresponding methods on the result sets.
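As a minimal sketch (label, property name and connection details are made up for illustration), a BigDecimal can be written through setBigDecimal and read back via getBigDecimal:
import java.math.BigDecimal;
import java.sql.DriverManager;
public class FixedPointExample {
    public static void main(String... args) throws Exception {
        try (var connection = DriverManager.getConnection("jdbc:neo4j://localhost:7687", "neo4j", "verysecret");
             var statement = connection.prepareStatement(
                     "CREATE (a:Account {balance: ?}) RETURN a.balance AS balance")) {
            statement.setBigDecimal(1, new BigDecimal("1234.56")); // stored as a String property
            try (var resultSet = statement.executeQuery()) {
                resultSet.next();
                BigDecimal balance = resultSet.getBigDecimal("balance"); // converted back on read
                System.out.println(balance);
            }
        }
    }
}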
A.2. SQL Date, Time and Timestamps
java.sql.Date
-
Maps to Cypher
DATE
java.sql.Time
-
Maps to Cypher
LOCAL TIME
java.sql.Timestamp
-
Maps to Cypher
LOCAL DATETIME
For a more precise mapping, use a Neo4j Value instance with the appropriate type and setObject or getObject, respectively.
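A small sketch of the java.sql.Date mapping (label, property name and connection details are made up for illustration):
import java.sql.Date;
import java.sql.DriverManager;
import java.time.LocalDate;
public class DateMappingExample {
    public static void main(String... args) throws Exception {
        try (var connection = DriverManager.getConnection("jdbc:neo4j://localhost:7687", "neo4j", "verysecret");
             var statement = connection.prepareStatement(
                     "CREATE (e:Event {happenedOn: ?}) RETURN e.happenedOn AS happenedOn")) {
            statement.setDate(1, Date.valueOf(LocalDate.of(2024, 1, 31))); // stored as Cypher DATE
            try (var resultSet = statement.executeQuery()) {
                resultSet.next();
                System.out.println(resultSet.getDate("happenedOn")); // read back as java.sql.Date
            }
        }
    }
}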
Appendix B: Neo4j specific syntax for statements
This section deals with conventions for indexing and naming parameters in all types of statements (standard statements, prepared statements and callable statements) and other Neo4j specific syntax.
B.1. Callable statements
B.1.1. General syntax
You can invoke stored procedures as follows:
- Common JDBC syntax
-
{? = call db.index.fulltext.queryNodes(?, ?)}
- Standard Neo4j syntax
-
call db.index.fulltext.queryNodes(?, ?) yield *
- Enumerating yielded columns
-
{$propertyName = call db.schema.nodeTypeProperties()}
- Return (only applicable for functions)
-
RETURN sin(?)
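As a Java usage sketch for the standard Neo4j syntax above (the full-text index name, host and credentials are illustrative assumptions):
import java.sql.DriverManager;
public class CallFulltextProcedure {
    public static void main(String... args) throws Exception {
        try (var connection = DriverManager.getConnection("jdbc:neo4j://localhost:7687", "neo4j", "verysecret");
             var callable = connection.prepareCall("call db.index.fulltext.queryNodes(?, ?) yield *")) {
            callable.setString(1, "movieTitles"); // hypothetical full-text index name
            callable.setString(2, "Da Vinci");
            try (var resultSet = callable.executeQuery()) {
                while (resultSet.next()) {
                    System.out.println(resultSet.getObject("node") + " scored " + resultSet.getDouble("score"));
                }
            }
        }
    }
}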
B.1.2. Named parameters
Our callable statement implementation (org.neo4j.jdbc.Neo4jCallableStatement
) does support named parameters. As per the JDBC spec, those are not named placeholders in a query or statement string, but the actual, formal parameters of the stored procedure to be called.
We support both the $
and the colon syntax, that is you might use either $name
or :name
.
The assignment {? = call xxx()}
will be rewritten into call xxx() yield *
, and {$x = call xxx()}
will be rewritten as call xxx() yield x
.
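A sketch of calling the same procedure with named parameters; this assumes the procedure’s formal parameters are named indexName and queryString, and uses placeholder connection details and index name:
import java.sql.DriverManager;
public class NamedParameterCall {
    public static void main(String... args) throws Exception {
        try (var connection = DriverManager.getConnection("jdbc:neo4j://localhost:7687", "neo4j", "verysecret");
             var callable = connection.prepareCall(
                     "call db.index.fulltext.queryNodes($indexName, $queryString) yield *")) {
            // The names refer to the formal parameters of the procedure, not to positional placeholders
            callable.setString("indexName", "movieTitles"); // hypothetical index name
            callable.setString("queryString", "Da Vinci");
            try (var resultSet = callable.executeQuery()) {
                while (resultSet.next()) {
                    System.out.println(resultSet.getObject("node"));
                }
            }
        }
    }
}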
B.1.3. Result sets of callable statements
When you execute a callable statement via executeQuery, you must use the result set returned.
If you just use execute, we assume that the underlying procedure returns only one row and that you use the getters on the statement itself.
Appendix C: Migrating from older versions or other JDBC drivers for Neo4j
There are some other JDBC drivers for Neo4j, under various licenses and with varying features and capabilities. In the following we outline possible migration processes. The basic usage patterns for all JDBC drivers for Neo4j are very similar; in the end it’s "just" using a JDBC-compliant driver, and you would use it as described in the original Java tutorial about JDBC.
C.1. Migrating from version 4 or 5 of this driver
Version 5 and 4 of the Neo4j JDBC Driver have been mainly developed by Larus BA, Italy, a certified consulting and integration solutions partner for Neo4j. Thank you so much for all your work.
The most important change that you need to make is removing the dependency on org.neo4j:neo4j-jdbc-bolt. You need to replace it with org.neo4j:neo4j-jdbc or one of the bundles we provide (see available bundles). |
C.1.1. URL format, parameters and main driver class
The previous version mapped the URLs from the common Java driver onto jdbc:
, i.e. using jdbc:neo4j:neo4j+s://foobar.io:7687/
for connecting against a database running on host foobar.io
on port 7687.
The following URLs (direct bolt connection) are not supported:
-
jdbc:neo4j:bolt://<host>:<port>/
-
jdbc:neo4j:bolt+s://<host>:<port>/
-
jdbc:neo4j:bolt+ssc://<host>:<port>/
The following URLs behave the same but must be rewritten:
-
jdbc:neo4j:neo4j://<host>:<port>/
becomes jdbc:neo4j://<host>:<port>/
-
jdbc:neo4j:neo4j+s://<host>:<port>/
becomes jdbc:neo4j+s://<host>:<port>/
-
jdbc:neo4j:neo4j+ssc://<host>:<port>/
becomes jdbc:neo4j+ssc://<host>:<port>/
The following configuration properties are not supported and have no other replacement:
-
leaked.sessions.logging
-
readonly
-
usebookmarks
We don’t offer any built-in retry mechanism, so the corresponding setting doesn’t have an effect:
-
max.transaction.retry.time
As with any persistent database connection, you want to cater for failed transactions. We have made good experiences with resilience4j, which fits in well with common Java frameworks such as Spring Boot.
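A minimal sketch of such a retry, assuming resilience4j’s Retry module is on the classpath; the configuration values, URL and credentials are illustrative only:
import java.sql.DriverManager;
import java.time.Duration;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
public class RetryingWrites {
    public static void main(String... args) throws Exception {
        var retry = Retry.of("neo4j-writes", RetryConfig.custom()
                .maxAttempts(3)
                .waitDuration(Duration.ofMillis(500))
                .build());
        // Re-run the whole unit of work on failure; each attempt uses a fresh connection.
        int created = retry.executeCallable(() -> {
            try (var connection = DriverManager.getConnection("jdbc:neo4j://localhost:7687", "neo4j", "verysecret");
                 var statement = connection.prepareStatement("CREATE (m:Movie {title: ?})")) {
                statement.setString(1, "The Da Vinci Code");
                return statement.executeUpdate();
            }
        });
        System.out.println(created + " row(s) affected");
    }
}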
The following properties can be achieved with standardized JDBC settings:
-
encryption: Use the appropriate transport scheme (neo4j, neo4j+s or neo4j+ssc)
-
autocommit: Use java.sql.Connection.setAutoCommit
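For the autocommit replacement, a small sketch of a manual transaction using plain JDBC (connection details are placeholders):
import java.sql.DriverManager;
public class ManualTransaction {
    public static void main(String... args) throws Exception {
        try (var connection = DriverManager.getConnection("jdbc:neo4j://localhost:7687", "neo4j", "verysecret")) {
            connection.setAutoCommit(false); // replaces the old autocommit driver setting
            try (var statement = connection.createStatement()) {
                statement.executeUpdate("CREATE (m:Movie {title: 'The Da Vinci Code'})");
                statement.executeUpdate("CREATE (m:Movie {title: 'Angels and Demons'})");
                connection.commit();
            } catch (Exception e) {
                connection.rollback();
                throw e;
            }
        }
    }
}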
Connection pooling can be achieved with any JDBC-compliant connection pool; the following properties don’t have an effect:
-
max.connection.lifetime
-
max.connection.poolsize
The following properties just have different names:
-
connection.acquisition.timeout
is timeout
(as query-parameter to the URL or inside the configuration properties)
The following properties can be achieved using a different URL:
-
database
is now part of the URL; instead of specifying jdbc:neo4j:neo4j+s://foobar.io:7687?database=abc
you would use the database name as a path segment in the URL, such as: jdbc:neo4j+s://foobar.io:7687/abc
In case your tooling requires the use of a concrete driver class: this JDBC driver has only org.neo4j.jdbc.Neo4jDriver
.
If you depend on a javax.sql.DataSource
, we provide org.neo4j.jdbc.Neo4jDataSource
.
C.1.2. Flattening
While the Neo4j JDBC Driver does not support the flatten
option, it can emulate its effect.
flatten
did unnest returned nodes and relationships by providing all their properties as individual columns.
If you enable automatic SQL to Cypher translation (See Section 7.1), any *
-select will inspect whether it affects nodes or relationships and will unnest their properties, so that a SELECT * FROM Movie m
will effectively become MATCH (m:Movie) RETURN m.title AS title, m.released AS released
. Read more about this topic in Section 7.3.3.1.2.
In case you want to access the actual node, return the whole table alias or just use Cypher. The Neo4j JDBC Driver does support complex object types as return types.
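As a sketch of the emulated flattening (the enableSQLTranslation parameter name and the connection details are assumptions; see Section 7.1 for the actual configuration):
import java.sql.DriverManager;
public class FlatteningExample {
    public static void main(String... args) throws Exception {
        var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true"; // assumed parameter name
        try (var connection = DriverManager.getConnection(url, "neo4j", "verysecret");
             var statement = connection.createStatement();
             var resultSet = statement.executeQuery("SELECT * FROM Movie m")) {
            while (resultSet.next()) {
                // The star-select has been unnested into individual property columns
                System.out.printf("%s (%d)%n", resultSet.getString("title"), resultSet.getLong("released"));
            }
        }
    }
}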