© Copyright 2023-2024 the original authors.
Abstract
This is the manual for the official Neo4j JDBC Driver.
This driver is officially supported and endorsed by Neo4j. It is a standalone driver, independent of and not built on top of the common Neo4j Java Driver. While the latter provides a Neo4j-idiomatic way to access Neo4j from Java, the JDBC driver adheres to JDBC 4.3.
This documentation refers to this driver as the Neo4j JDBC Driver and to the idiomatic Neo4j driver as the common Neo4j Java Driver. |
1. Introduction
JDBC stands for "Java Database Connectivity" and is thus not bound exclusively to relational databases. Nevertheless, JDBC’s terms, definitions, and behavior are highly influenced by SQL and relational databases. As Neo4j is a graph database with quite a different paradigm than relational and a non-standardized behaviour in some areas, there might be some details that don’t map 100% in each place, and we make sure to educate you about these in this documentation.
This documentation focuses on install, use, and configure the Neo4j JDBC Driver, as well as discussing the driver’s design choices. While we do provide runnable examples showing how to use JDBC with Neo4j, this is not a documentation about how to correctly use JDBC as an API.
The Neo4j JDBC Driver requires JDK 17 on the client side and Neo4j 5.5+ on the server side. To use it with a Neo4j cluster, server-side routing must be enabled on the cluster. |
1.1. Features
-
Fully supports the Java module system
-
Adheres to JDBC 4.3
-
Can run any Cypher statement
-
Implements
DatabaseMetaData
andResultSetMetaData
as fully as possible with a nearly schemaless database and general very flexible result sets, allowing for automatic metadata retrieval from ETL and ELT tools -
Provides an SPI to hook in translators from SQL to Cypher
-
Provides an optional default implementation to translate many SQL statements into semantically similar Cypher statements
-
Can be safely used with JDBC connection pools as opposed to the common Neo4j Java Driver or any JDBC driver based on that, as it doesn’t do internal connection pooling and transaction management otherwise than dictated by the JDBC Spec
The absence of any connection pooling and transaction management is an advantage of the Neo4j JDBC Driver over the common Neo4j Java Driver. It allows to pick and choose any database connection pooling system such as HikariCP and transaction management such as Jakarta Transactions.
1.2. Limitations
-
The database metadata is retrieved using Neo4j’s schema methods, such as
db.labels
,db.schema.nodeTypeProperties()
, which may not always be accurate -
While single label nodes map naturally to table names, nodes with multiple labels don’t
-
There is no reliable way to always determine the datatype for properties on nodes, as it would require reading all of them (which this driver does not do)
-
Some JDBC features are not supported yet (such as the
CallableStatement
); some feature will never be supported -
The SQL to Cypher translator supports only a limited subset of clauses and SQL constructs that can be equivalently translated to Cypher (See Supported statements)
-
There is no "right" way to map
JOIN
statements to relationships, so your mileage may vary
1.3. When to use the Neo4j JDBC Driver?
-
Integration with ETL and ELT tools that don’t offer an integration based on the common Neo4j Java driver
-
An easier on-ramp towards Neo4j for people familiar with JDBC, who want to keep using that API, but with Cypher and Neo4j
-
Integration for ecosystems like Jakarta EE whose transaction management directly supports any JDBC-compliant driver
-
Integration with database migration tools such as Flyway
There is no need to redesign an application that is built on the common Neo4j Java Driver to migrate to this driver. If your ecosystem already provides a higher-level integration based on the common Neo4j Java Driver, such as Spring Data Neo4j (SDN) for Spring, there is no need to switch to something else. In case of Quarkus, the Neo4j JDBC Driver is an option to consider: although we do provide an integration for the common Neo4j Java Driver, this integration does not support Quarkus' transaction systems in contrast to this driver.
As there is little incentive to use this driver with Hibernate (Neo4j-OGM or SDN are the best alternatives for Neo4j), it might be worth giving Spring Data JDBC a try.
1.4. Differences with the previous versions of this driver and other JDBC drivers for Neo4j
Several other JDBC drivers exists for Neo4j, most notably the previous versions 4 and 5 of this driver. Most (if not all) of them wrap the common Neo4j Java Driver and implement the JDBC spec on top of that. This comes with a number of issues:
-
You end up with a pool of connection pools, because the common Neo4j Java Driver manages a connection pool, whereas JDBC drivers delegate this task to dedicated pooling solutions.
-
The transaction management of the common Neo4j Java Driver is not aligned with the way JDBC manages transactions.
-
Older versions of the Neo4j JDBC driver shade a few dependencies, such as
Jackson
as well as additional logging frameworks. This takes a toll on the classpath and, in case of logging, it leads to runtime problems. -
Existing drivers with an SQL-to-Cypher translation layer are "read-only" and don’t support write statements, so they cannot be used for ETL use-cases aiming to ingest data into Neo4j.
This driver does not support automatic reshaping or flattening of the result sets, as the previous versions do.
If you query for nodes, relationships, paths, or maps, you should use getObject on the result sets and cast them to the appropriate type (you find all of them inside the package org.neo4j.jdbc.values ).
However, the default SQL-to-Cypher translator will (when connected to a database) figure out what properties nodes have and turn the asterisk (* ) into individual columns of nodes and relationships, just like what you would expect when running a SELECT * statement.
|
For information on upgrade/migration from other drivers to this one, see Neo4j JDBC Driver.
2. Usage
Add the JDBC driver to your application, for example as a Gradle dependency:
dependencies {
runtimeOnly(org.neo4j:neo4j-jdbc-full-bundle:6.1.0) (1)
}
1 | The coordinates are the same for a Maven project. |
You can then use the Neo4j JDBC driver as you would do with any other JDBC driver:
In case any tooling asks you for the name of the concrete driver class, it is org.neo4j.jdbc.Neo4jDriver .
|
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.List;
public final class Quickstart {
static void queryWithCypher() {
var query = """
MATCH (m:Movie)<-[:ACTED_IN]-(p:Person)
RETURN m.title AS title, collect(p.name) AS actors
ORDER BY m.title
""";
var url = "jdbc:neo4j://localhost:7687";
var username = "neo4j";
var password = "verysecret";
try (var con = DriverManager.getConnection(url, username, password); (1)
var stmt = con.createStatement(); (2)
var result = stmt.executeQuery(query)) { (3)
while (result.next()) { (4)
var movie = result.getString(1); (5)
var actors = (List<String>) result.getObject("actors"); (6)
System.out.printf("%s%n", movie);
actors.forEach(actor -> System.out.printf("\t * %s%n", actor));
}
}
catch (SQLException ex) {
throw new RuntimeException(ex);
}
}
}
1 | Instantiate a JDBC connection. There’s no need to do any class loading beforehand, the driver will be automatically registered |
2 | Create a (reusable) statement |
3 | Execute a query |
4 | Iterate over the results, as with any other JDBC result set |
5 | JDBC’s indexing starts at 1 |
6 | JDBC also allows retrieval of result columns by name; the Neo4j JDBC driver also supports complex objects, such as lists |
In the example above we used Neo4j’s lingua franca, Cypher, to query the database.
The Neo4j JDBC Driver has limited support for using SQL as well.
It can do so automatically or on a case-by-case basis.
To translate a single, call java.sql.Connection#nativeSQL(String)
and use the result in your queries.
For automatic translation, instantiate the driver setting the optional URL parameter sql2cypher
to true
.
The following example shows how:
var query = """
SELECT m.title AS title, collect(p.name) AS actors
FROM Person as p
JOIN Movie as m ON (m.id = p.ACTED_IN)
ORDER BY m.title
"""; (1)
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var con = DriverManager.getConnection(url, username, password);
var stmt = con.createStatement();
var result = stmt.executeQuery(query)) {
while (result.next()) {
var movie = result.getString(1);
var actors = (List<String>) result.getObject("actors");
System.out.printf("%s%n", movie);
actors.forEach(actor -> System.out.printf("\t * %s%n", actor));
}
}
1 | This SQL query will be translated into the same Cypher query of the previous example. The remainder of the method is identical to before. |
For more information, see SQL to Cypher translation.
The JDBC specification does not support named parameters, only index-based parameters, starting at 1
.
So for all PreparedStatement
instances you need to specify parameters like this:
PreparedStatement
var cypher = "CREATE (m:Movie {title: $1})";
try (var con = DriverManager.getConnection(url, username, password);
PreparedStatement stmt = con.prepareStatement(cypher);) {
stmt.setString(1, "Test");
stmt.executeUpdate();
}
This is independent of the SQL-to-Cypher translation mechanism:
PreparedStatement
(SQL variant)var sql = "INSERT INTO Movie (title) VALUES (?)";
try (var con = DriverManager.getConnection(url + "?enableSQLTranslation=true", username, password);
PreparedStatement stmt = con.prepareStatement(sql);) {
stmt.setString(1, "Test");
stmt.executeUpdate();
}
To use named parameters, downcast the PreparedStatement
to Neo4jPreparedStatement
.
Neo4jPreparedStatement
var match = "MATCH (n:Movie {title: $title}) RETURN n.title AS title";
try (var con = DriverManager.getConnection(url, username, password);
Neo4jPreparedStatement stmt = (Neo4jPreparedStatement) con.prepareStatement(match);) {
stmt.setString("title", "Test");
try (var resultSet = stmt.executeQuery()) {
while (resultSet.next()) {
LOGGER.info(resultSet.getString("title"));
}
}
}
2.1. Getting a connection via environment variables
If you are happy to depend directly on org.neo4j.jdbc.Neo4jDriver
and want to get a connection as easy as possible, you might want to use fromEnv
:
// import org.neo4j.jdbc.Neo4jDriver;
try (var con = Neo4jDriver.fromEnv().orElseThrow(); (1)
var stmt = con.createStatement();
var result = stmt.executeQuery(query)) {
// Same loop as earlier
}
catch (SQLException ex) {
throw new RuntimeException(ex);
}
1 | Notice how we directly use the concrete driver class here and how the methods return an optional: no connection can be created if the required connection variables are not found. |
The fromEnv
method looks for a few specific system environment variables and it adheres to the 12 factor app principles:
-
First, it looks in the system environment
-
Second, it looks for a file named
.env
in the current working directory. There are overloads that let you configure the directory and the filename to look for.
The supported variables are:
NEO4J_URI
-
The address or URI of the instance to connect to.
NEO4J_USERNAME
-
(Optional) Username.
NEO4J_PASSWORD
-
(Optional) Password.
NEO4J_SQL_TRANSLATION_ENABLED
-
(Optional) Whether to enable full SQL-to-Cypher translation, defaults to
false
.
Information from both the system environment and the .env files are combined.
If for example NEO4J_SQL_TRANSLATION_ENABLED is in the system environment but not in the .env file, it will still be picked up.
Given the order of priority, information in the system environment always has precedence over the .env file.
|
This feature is especially useful with Neo4j AuraDB.
When creating a new AuraDB instance, you download a .env
file that you can directly use with the Neo4j JDBC Driver:
try (
var con = Neo4jDriver.fromEnv("Neo4j-cb3d8b2d-Created-2024-02-14.txt")
.orElseThrow();
var stmt = con.createStatement();
var movies = stmt.executeQuery("MATCH (n:Movie) RETURN n.title AS title")
) {
while (movies.next()) {
System.out.println(movies.getString("title"));
}
}
3. Distribution
3.1. Driver modules and dependencies
The driver consists of 3 modules:
- neo4j-jdbc
-
This is the actual JDBC implementation for Neo4j
- neo4j-jdbc-translator-spi
-
This is the SPI for the default SQL-to-Cypher translation implementation and for any further or custom implementation.
- neo4j-jdbc-translator-impl
-
This is the default and optional implementation of an SQL-to-Cypher translator. It provides a somewhat opinionated approach of translating SQL statements into semantically equivalent Cypher statements.
If you just want to use the Neo4j JDBC Driver to run Cypher statements, you only need the module org.neo4j:neo4j-jdbc . As long as you are developing in an environment with proper dependency management, this is all you need to declare.
|
While the translator SPI is a required dependency and gets pulled in via Maven or Gradle, the actual implementation is not. This allows for:
-
Opting out of having additional dependencies if you don’t need the SQL-to-Cypher translation
-
Requiring different JDK baselines or licensing modules for our implementations
-
Allowing you to create alternative translators
3.1.1. Dependencies
There are two "dependency-free" modules, which shade everything into one binary artifact each. Those bundles cover a lot of tooling (see Available bundles).
The driver’s direct, compile-time dependencies are listed below:
org.neo4j:neo4j-jdbc
+- io.github.cdimascio:dotenv-java
+- io.netty:netty-handler
| +- io.netty:netty-common
| +- io.netty:netty-resolver
| +- io.netty:netty-buffer
| +- io.netty:netty-transport
| +- io.netty:netty-transport-native-unix-common
| \- io.netty:netty-codec
+- org.neo4j:neo4j-cypher-dsl-schema-name-support
+- org.neo4j:neo4j-jdbc-translator-spi
3.2. Available bundles
All bundles of the Neo4j JDBC Driver are distributed on Maven Central. The bundles have different characteristics: depending on your use-case or your environment, you can pick one or the other bundle, but not both at once.
3.2.1. Individual components
Pick this distribution if you have an application that uses dependencies management in which you don’t have to care about dealing with binary artifacts yourself.
We offer a Maven BOM project, which you should import when using the individual modules, so that you have always a consistent set of versions. The coordinates are org.neo4j:neo4j-jdbc-bom .
|
For Maven, use this dependency declaration:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-jdbc</artifactId>
<version>6.1.0</version>
</dependency>
And for Gradle you would want to declare the following runtime dependency:
dependencies {
runtimeOnly 'org.neo4j:neo4j-jdbc:6.1.0'
}
If you want to use the SQL-to-Cypher translation from Neo4j, you need to add the following dependency in your Maven build:
<dependency>
<groupId>org.neo4j</groupId>
<artifactId>neo4j-jdbc-translator-impl</artifactId>
<version>6.1.0</version>
</dependency>
and the same coordinates, but for Gradle:
dependencies {
runtimeOnly 'org.neo4j:neo4j-jdbc-translator-impl:6.1.0'
}
3.2.2. Small bundle
Pick this distribution if you work with ETL tools or tooling for relational databases that allow adding the JDBC driver only as single artifacts or that make it unnecessary hard to add additional jars.
This bundle does not contain the default SQL-to-Cypher translator!
The coordinates of this bundle are org.neo4j:neo4j-jdbc-bundle
and you can download it from Maven central:
All bundles can be used as normal project dependencies as well. This might be useful for example if your project depends on a different, potentially conflicting Netty version. |
3.2.3. Full bundle
This bundle contains the default SQL-to-Cypher translator.
Otherwise, it is identical to the small bundle.
Its coordinates are org.neo4j:neo4j-jdbc-full-bundle
and you can download it from Maven central here:
3.2.4. text2cypher bundle
We also ship a very experimental bundle that we call text2cypher
:
See Neo4j JDBC Driver more information.
4. Configuration
4.1. Driver class name
The Neo4j JDBC Driver is org.neo4j.jdbc.Neo4jDriver
.
With modern Java tools, you should not need to touch this class directly, but there are some connection pools and front-ends that will ask you for this.
The class is public API.
We also provide org.neo4j.jdbc.Neo4jDataSource
as javax.sql.DataSource
.
4.2. Causal clustering and bookmarks
The Neo4j JDBC Driver uses bookmarks by default to provide causal consistency in all Neo4j deployments.
Bookmarks are managed on the driver level itself, not on the connections spawned by an instance of the driver, so all connections spawned by one instance will partake in the same causal chain of transactions.
Connections from different driver instances will not use the same set of bookmarks and there is no built-in machinery that would enable this.
If you want or need this, you can directly access the Neo4jDriver
type to retrieve the current set of known bookmarks and pass them to another driver instance.
4.3. Neo4j transactional metadata
Neo4j supports attaching metadata to transactions, see SHOW TRANSACTIONS
.
As there is no explicit transaction object in the JDBC specification, the Neo4j JDBC driver needs another mechanism to make these configurable.
The JDBC driver provides the extension interface Neo4jMetadataWriter
.
Our driver, the connection implementation, and all statement variants can be unwrapped accordingly.
The configuration is additive: metadata configured for a driver instance will be used for all connections spawned from that driver, connections can add further metadata, and statements can also add their own metadata.
Metadata added on a statement has precedence over connection metadata which in turn has precedence over driver metadata:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Map;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.neo4j.jdbc.Neo4jDriver;
import org.neo4j.jdbc.Neo4jMetadataWriter;
public final class TransactionMetadata {
private static final Logger LOGGER = Logger.getLogger(TransactionMetadata.class.getPackageName());
public static void main(String... args) throws SQLException {
var url = "jdbc:neo4j://localhost:7687";
var driver = (Neo4jDriver) DriverManager.getDriver(url);
driver.withMetadata(Map.of("md_from_driver", "v1", "will_be_overwritten", "irrelevant"));
var properties = new Properties();
properties.put("user", "neo4j");
properties.put("password", "verysecret");
try (
var con = driver.connect(url, properties)
.unwrap(Neo4jMetadataWriter.class)
.withMetadata(Map.of("md_from_connection", "v2", "will_be_overwritten", "xxx"))
.unwrap(Connection.class);
var statement = con.createStatement()
.unwrap(Neo4jMetadataWriter.class)
.withMetadata(Map.of("md_from_stmt", "v3", "will_be_overwritten", "v4"))
.unwrap(Statement.class)
) {
try (var result = statement.executeQuery("SHOW TRANSACTIONS YIELD metaData")) {
while (result.next()) {
var metaData = result.getObject("metaData", Map.class);
LOGGER.log(Level.INFO, "{0}", metaData);
}
}
}
}
}
The output will be similar to this:
Juli 17, 2024 1:18:16 PM org.neo4j.jdbc.docs.TransactionMetadata main
INFORMATION: {md_from_driver=v1, md_from_connection=v2, md_from_stmt=v3, will_be_overwritten=v4}
4.4. URL and connection properties
The canonical URL format for the Neo4j JDBC Driver is
jdbc:neo4j://<host>:<port>/<database>?param1=value1¶m2=value2
The database name and all query parameters are optional and can be omitted.
All configuration arguments can be passed either as query parameters or via a java.util.Properties
object.
The latter is sometimes defined by tooling for you.
With regard to authentication, we recommend to follow the JDBC specification, which discourages using any form of URL authentication.
All query parameters must be percent-encoded if they contain special characters, e.g. …?param1=space%20separated
.
The driver supports the following URI schemes, which tweak the security configuration:
|
The driver accepts the following configuration arguments, either as properties or as URL query parameters:
Name | Type | Meaning | Default |
---|---|---|---|
|
|
Timeout for connection acquisition in milliseconds |
|
|
|
User agent |
|
|
|
Flag that enables automatic translation from SQL-to-Cypher (requires a translator on the classpath) |
|
|
|
Flag that enables caching of translations. SQL translations are not "free": parsing of SQL costs a bit of time, and so does Cypher rendering. In addition, we might up look up metadata to be able to project individual properties. If this takes too long, translations may be cached. |
|
|
|
Flag that allows you to use |
|
|
|
Optional flag, alternative to |
|
|
|
Optional configuration for fine-grained control over SSL configuration. Allowed values are |
|
|
|
The username (principal) to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons. |
|
|
|
The password (credentials) to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons. |
|
|
|
The realm to use for authentication. NOT RECOMMENDED as URL query parameter for security reasons. |
|
|
|
The authentication scheme to use. NOT RECOMMENDED as URL query parameter for security reasons. Currently supported values are:
|
|
|
|
Enables bookmark management for full causal cluster support. This is enabled by default and the recommended setting for all scenarios that use a connection pool. If you disable it, it will only be disabled for the specific connection. Other connections retrieved from the driver instance to the same or to other databases are not affected, and the individual connections will still manage their bookmarks. |
|
4.5. Getting a driver or a connection instance
This section likely only applies if you use the Neo4j JDBC Driver as part of application development in contrast to using it as part of front-end tool such as DBeaver, DataGrip or UI-based ETL tools. |
The easiest way to acquire a connection is directly through the java.sql.DriverManager
.
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
class Configuration {
void obtainConnection() throws SQLException {
var url = "jdbc:neo4j://localhost:7687";
var username = "neo4j";
var password = "verysecret";
var connection = DriverManager.getConnection(url, username, password);
}
}
While our connection implementation is thread-safe, it allows only one concurrent transaction per connection (as dictated by the JDBC specification). For a multi-thread application, use a connection pool. There’s HikariCP, but usually application server and containers/frameworks bring their own. It’s safe to use any of them, as the Neo4j JDBC Driver does no internal pooling.
If you need access to an instance of the Neo4j driver itself, you can use the following approach:
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
class Configuration {
void obtainDriverAndConnection() throws SQLException {
var url = "jdbc:neo4j://localhost:7687";
var driver = DriverManager.getDriver(url);
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
var connection = driver.connect(url, properties);
}
}
4.6. Securing your connection by using SSL
The Neo4j JDBC Driver supports the same SSL option of the common Java driver, with the same URL protocols, using +s
or +ssc
as additional indicators for the required level of security.
The same configuration can also be achieved with a URL query parameter or an entry in the properties passed to the DriverManager
or driver instance when asking for a connection.
As long as you don’t specify contradicting values, it’s fine to combine both approaches.
4.6.1. Understanding the SSL mode
The following list is ordered by ascending security:
-
disable
— (default), "I don’t care about security and don’t want to pay the overhead for encryption." -
require
— "I want my data to be encrypted, and I accept the overhead. I trust that the network will make sure I always connect to the server I want." (Server must support encryption, no hostname/CA validation is done. This saves the hassle of proper certificates and is only secure on private networks; it should not really be used over public networks.) -
verify-full
— "I want my data encrypted, and I accept the overhead. I want to be sure that I connect to a server I trust, and that it’s the one I specify."
The Neo4j JDBC Driver does not include revocation checks. |
The most secure option can also be enabled by setting ssl=true
either as query parameter or as property entry passed to the DriverManager
.
This option corresponds to neo4j+s
.
On the other hand, require
corresponds to neo4j+ssc
.
The additional enum allows us to possibly support additional modes in the future, such as letting the service decide about SSL, or being able to express a preference towards SSL without requiring it.
Neo4j servers can offer both plain Bolt connections and encrypted SSL connection, or just one of them. The fact that you can connect using neo4j+s does not mean that you cannot connect using just neo4j , or viceversa. This is dependent on the server setup. Neo4j Aura, Neo4j’s managed cloud service, only supports encrypted connections, so you must use +s , ssl=true , or sslMode=verify-full .
|
4.6.2. Valid URLs
The following URLs are all valid:
neo4j+s://xyz.databases.neo4j.io
-
Use full verification with the
xzy
instance at AuraDB neo4j://xyz.databases.neo4j.io?ssl=true
-
The same, but using the shorthand URL parameter
neo4j://xyz.databases.neo4j.io?sslMode=verfiy-full
-
The same, but using the explicit mode
neo4j+s://xyz.databases.neo4j.io?ssl=true&sslMode=verify-full
-
Not more secure, but does not fail on you
neo4j+ssc://this.is.a.trustworthy.instance.for.sure.com
-
Trust whatever certificate and hostname there is, but do use SSL
neo4j://my-testing-instance.local
-
Use a plain connection.
The driver only refuses contradicting configurations, such as:
-
+s
withssl=false
, orsslMode
set todisable
-
+ssc
withssl=false
, or anysslmode
not equal torequire
Basically, you cannot ask to use SSL and not use it at the same time. The driver offers several mechanism so that you can use a fixed URL with dynamic query parameters, or dynamic URLs, or whatever way of configuring you prefer in a programmatic way.
4.6.3. Using .dotenv files
When you create a Neo4j Aura instance, you will be asked to download a text-file named similar to Neo4j-9df57663-Created-2023-06-12.txt
.
This is essentially a .dotenv file containing the information required to connect to the database.
These files can be directly used via Neo4jDriver.fromEnv()
(see Getting a connection via environment variables).
This method exists in several overloads, which let you configure both filename and directory.
Additionally, the builder lets you configure options that are not contained in the files from Aura.
5. Metadata
5.1. Catalog and schema
All the methods on connection level dealing with metadata and information about the available content provide information in terms that are defined in the SQL standard, including catalogs and schemas.
In most relational databases, a catalog is equivalent to a specific database on a server or cluster, and the schema refers to the collection of tables, views and procedures in that catalog.
From the SQL 1992 standard (find an archived copy here):
(4.12) Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog contains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema.
This driver does not support catalogs (see discussion 55), so any metadata result set will return literal null
when asked for the catalog of a database object.
No metadata method supports filtering on a non-null catalog parameter and no catalog specifier can be used in a query.
Future developments might use catalogs to describe composite databases, in essence listing the constituents of the composite database defined in the connection.
The same standard defines schemas as follows:
(4.11) An SQL-schema is a persistent descriptor that includes:
[…] the descriptor of every component of the SQL-schema.
In this International Standard, the term "schema" is used only in the sense of SQL-schema. Each component descriptor is either a domain descriptor, a base table descriptor, a view descriptor, an assertion descriptor, a privilege descriptor, a character set descriptor, a collation descriptor, or a translation descriptor. The persistent objects described by the descriptors are said to be owned by or to have been created by the <authorization identifier> of the schema.
We report the literal name public
as schema for any result component of a metadata result set.
We support querying for objects in any schema, however only literal null
or public
will potentially produce a result.
Labels will be reported as table objects with the TABLE_TYPE
being literal TABLE
.
5.1.1. Summary
-
Catalog: Always
null
; filtering on anything non-null yields no results. -
Schema: Always
public
; filtering onpublic
and literal will yield result, anything else won’t. -
Table descriptors: Reported as
TABLE
in theTABLE_TYPE
column.
5.2. Labels to tables
The CLG and Langstar groups speak about "Node type combinations" and gravitate towards "open node type semantics" in the GQL standard:
Node type combinations is a less permissive form of open node type semantics. The idea of node type combinations is that nodes also conform to a graph type if they are not of one of the node types in the node type set of the graph type, but of a node type that is an intersection of (a subset of) the node type in a node type set of the graph type.
An example for their proposal can be found here.
This driver therefore compute node types in a similar way:
-
Single label nodes will map naturally to a table name, the single label will become the table name
-
The label name will be taken as is and will be case-sensitive. The labels
Movie
,movie
,MOVIE
will result in three tables in the metadata -
This is in line with the default SQL-to-Cypher translation
-
-
Node type combinations will map to table names composed as
label1_label2
, sorting the labels alphabetically to make them independent of the order Neo4j returns them -
Property sets for these node type combinations will then be computed
6. SQL to Cypher translation
6.1. Introduction
The translation of queries from SQL to Cypher is an optional feature of this driver and it consists of two parts:
-
The translator SPI, located in the module
org.neo4j:neo4j-jdbc-translator-spi
. It consists of two interfaces:SqlTranslatorFactory
and the actualSqlTranslator
. -
A concrete implementation of this SPI, published as
org.neo4j:neo4j-jdbc-translator-impl
.
The latter is covered in "Using the default translator" and available in the "full bundle", described in Available bundles. The former is provided for two reasons: it allows us to distribute the driver with and without the bundled, default translator and allows you to run your custom translator.
Translators can be chained, and there can be as many translators on the classpath as you want. Their precedence is configurable, with our default implementation having the lowest precedence. Thus, you can for example have a custom translator that takes care of a fixed set of queries and, if it receives a query it cannot translate, it will pass it down to our implementation.
Translating arbitrary SQL queries to Cypher is an opinionated task, as there is no right way to map table names to graph objects: a table name can be used as-is as a label, or may be transformed into a singular form, etc. Mapping relationships is even trickier: should relationship types derive from a join table, a join column (in that case, which one?), or a foreign key?
We believe our assumptions are appropriate for various use cases and instead of providing configuration to cater for all scenarios, we offer the possibility to write your own translation layer. The driver will use the standard Java service loader mechanism to find an implementation of the SPI on the module- or classpath.
Some tools (like Tableau) use a class-loader that won’t let the driver use the standard Java service loader mechanism.
For these scenarios, we provide an additional configuration property named translatorFactory .
Set this to DEFAULT to directly load our default implementation or to a fully-qualified classname for any other factory.
Be aware that either our default implementation or your custom one must be on the classpath.
|
6.2. Translating SQL to Cypher
There’s only one requirement to enable the SQL-to-Cypher translation: you have to have one module implementing the SPI on the classpath.
This is automatically the case if you use the full-bundle (org.neo4j:neo4j-jdbc-full-bundle
).
In that case, you don’t need to add any other dependency.
If you use the individual distribution or the "small" bundle org.neo4j:neo4j-jdbc-bundle
, you must add the artifact org.neo4j:neo4j-jdbc-translator-impl
.
The implementation will be automatically loaded. If you use the translation on a case-by-case basis, it will be lazily loaded (i.e no additional classes are touched or loaded into memory). If you configure automatic translation for all statements, the implementation will be eagerly loaded. There are no further configuration options with regard to loading the implementation.
6.2.1. On a case-by-case basis
The translator can be used on a case-by-case basis through the official JDBC API nativeSQL
, which you find on the java.sql.Connection
class.
With the following imports:
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;
import java.util.logging.Logger;
You just pass your SQL statement to nativeSQL
and you will get Cypher back:
try (var connection = DriverManager.getConnection(url, username, password)) {
var sql = connection.nativeSQL("SELECT * FROM Movie n");
assert """
MATCH (n:Movie)
RETURN *""".equals(sql);
}
6.2.2. For all queries
If you open the connection to your Neo4j instance using enableSQLTranslation=true
either as URL parameter or configuration property, all statements will be translated from SQL to Cypher.
If you configure the driver in this way, the translator will be eagerly loaded.
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
try (var connection = DriverManager.getConnection(url, username, password);
var stmnt = connection.createStatement();
var result = stmnt.executeQuery("SELECT n.title FROM Movie n")) {
while (result.next()) {
LOGGER.info(result.getString("n.title"));
}
}
Sometimes you may need to fall back to Cypher for some statements, either to use constructs that you cannot express with SQL, or because our default translator cannot handle your query.
We offer a special comment that you can use as a hint in your statement to stop automatic translation: /*+ NEO4J FORCE_CYPHER */
.
var url = "jdbc:neo4j://localhost:7687?enableSQLTranslation=true";
var query = """
/*+ NEO4J FORCE_CYPHER */
MATCH (:Station { name: 'Denmark Hill' })<-[:CALLS_AT]-(d:Stop)
((:Stop)-[:NEXT]->(:Stop)){1,3}
(a:Stop)-[:CALLS_AT]->(:Station { name: 'Clapham Junction' })
RETURN localtime(d.departs) AS departureTime,
localtime(a.arrives) AS arrivalTime
""";
try (var connection = DriverManager.getConnection(url, username, password);
var stmnt = connection.createStatement();
var result = stmnt.executeQuery(query)) {
while (result.next()) {
LOGGER.info(result.getTime("departureTime").toString());
}
}
6.2.3. Possible error scenarios
A NoSuchElementException
with a message of No SQL translators available
will be thrown when there is no implementation of the SQL to Cypher translator available, and you either used java.sql.Connection.nativeSQL
or enabled automatic translation. The exception will be thrown when you access the method or eagerly on opening a connection in the latter case.
6.3. Using the default translator
6.3.1. Supported SQL dialects
Our default translator uses the OSS parser from jOOQ, which supports a broad spectrum of SQL dialects already.
We picked the generic, default dialect of jOOQ as our default dialect, but you can overwrite this in the SQL to Cypher configuration using the parameter s2c.sqlDialect
with one of the supported dialects listed in configuration below.
POSTGRES
can be a good choice for several integrations.
Bear in mind though that any shortcomings in the translation are probably not due to a lack in the parser, but due to the lack of an obvious, semantically equivalent Cypher construct. That means we might be able to parse a certain piece of SQL, but are unable to translate in into something meaningful that Neo4j can understand without additional, contextual information.
6.3.2. Configuration
The default implementation provides a number of configuration settings.
They must be prefixed with s2c
in the URL or config options:
Name | Meaning | Default |
---|---|---|
|
Whether to parse table names as is or not. |
|
|
A map from table names to labels. |
An empty map |
|
A map from column names to relationship types. |
An empty map |
|
Whether to format the generated Cypher or not. |
|
|
Whether to always escape names. |
Unless explicitly configured |
|
Which dialect to use when parsing. Supported values are |
|
The next few examples use the properties
config to avoid terrible long URLs in this documentation, but all the attributes can be specified via URL as well.
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.prettyPrint", "false");
properties.put("s2c.alwaysEscapeNames", "false");
properties.put("s2c.tableToLabelMappings", "people:Person;movies:Movie;movie_actors:ACTED_IN");
var url = "jdbc:neo4j://localhost:7687";
var query = """
SELECT p.name, m.title
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id""";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert "MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p.name, m.title".equals(sql);
}
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNameCase", "UPPER");
var url = "jdbc:neo4j://localhost:7687";
var query = "SELECT * FROM people";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert """
MATCH (people:PEOPLE)
RETURN *""".equals(sql);
}
Named parameter syntax in the SQL parser defaults to :name
(such as supported by Oracle, JPA, Spring, a colon followed by a name).
The following example changes that prefix to $
(the same prefix that Cypher uses):
var properties = new Properties();
properties.put("username", "neo4j");
properties.put("password", "verysecret");
properties.put("enableSQLTranslation", "true");
properties.put("s2c.parseNamedParamPrefix", "$");
properties.put("s2c.joinColumnsToTypeMappings", "people.movie_id:DIRECTED");
var url = "jdbc:neo4j://localhost:7687";
var query = """
SELECT *
FROM people p
JOIN movies m ON m.id = p.movie_id
WHERE p.name = $1
""";
try (var connection = DriverManager.getConnection(url, properties)) {
var sql = connection.nativeSQL(query);
assert """
MATCH (p:people)-[:DIRECTED]->(m:movies)
WHERE p.name = $1
RETURN *""".equals(sql);
}
This is helpful when a tool generates names like that and does not allow customization.
6.3.3. Supported statements
The following statements are all under tests and describe what you can expect from the default translation layer:
6.3.4. Translation concepts
Table names to labels
The most simple SELECT
statement to translate is one without FROM
clause, such as:
SELECT 1
It is equivalent to a Cypher RETURN
:
RETURN 1
SELECT
statements without JOIN
clauses are pretty straightforward to translate.
The challenge here is how to map the table name to labels:
-
We parse the SQL statement case-sensitive by default
-
Table names are mapped to node labels
-
Table aliases are used as identifiable symbolic names
SELECT t.a, t.b
FROM My_Table (1)
AS t (2)
WHERE t.a = 1
1 | Will be used as the label to match, as-is, i.e. My_Table |
2 | The table alias will become the node alias |
The whole construct will be translated to
MATCH (t:My_Table)
WHERE t.a = 1
RETURN t.a, t.b
Table aliases are optional, if you omit them, we derive aliases from the labels and types. If you inspect the translated queries, we recommend using aliases, as this makes the queries better readable.
Star-Selects
A star- or *
- select comes in different forms:
- Unqualified
-
SELECT * FROM table
- Qualified
-
SELECT t.* FROM table t
And a variant, selecting the relation itself: SELECT t FROM table t
.
We make use of this to let you decide whether you want to return Neo4j nodes and relationships as entities, maps, or flattened to individual columns. The latter requires our translator to have access to the schema of the underlying Neo4j database. The following sections describe the use-cases.
Projecting individual properties
Don’t use a star-select but enumerate the properties:
SELECT m.title FROM Movie m
The table alias will be used as a symbolic name:
MATCH (m:Movie)
RETURN m.title;
You can omit the table alias:
SELECT title FROM Movie
The lower-cased table name will be the symbolic name:
MATCH (movie:Movie)
RETURN movie.title;
Accessing JDBC columns by name leads to code that is hard to maintain, as column renamings impact code as well. To avoid this, alias the column:
SELECT title AS title FROM Movie
So that it has a stable, well-known name:
MATCH (movie:Movie)
RETURN movie.title AS title;
Projecting all properties
A SELECT *
statement gets translated differently depending on whether the connection to the Neo4j database is available.
SELECT * FROM Person p
If you are offline, you will get the following Cypher statement:
MATCH (p:Person) RETURN *
The above query will return one column (p
) containing a Neo4j node.
This is usually not what you expect in the relational world.
If you run the translation online and Neo4j metadata can be retrieved,
you will get a statement that flattens the properties of each node and relationship, plus their element IDs:
In case the Person
node has properties born
and name
,
SELECT * FROM Person p
you will get this Cypher statement
MATCH (p:Person)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name
This works well with multiple tables as well (Movie
has properties title
and released
):
SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`, p.born AS born, p.name AS name,
elementId(m) AS `v$id1`, m.title AS title, m.released AS released
We append increasing numbers to column names to clashing ones (ex. with name
and remark
properties both in Movie
and Person
):
SELECT * FROM Person p JOIN Movie m ON m.id = p.acted_in
Note the increasing numbers per duplicate name:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name, p.remark AS remark,
elementId(m) AS `v$id1`,
m.name AS name1, m.released AS released, m.remark AS remark1
The following example uses a join-table to access relationships (we explain this later in this manual when discussing joins), and the flattening of properties works here as well:
SELECT *
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name,
elementId(p) AS `v$person_id`,
elementId(r) AS `v$id1`, r.role AS role,
elementId(m) AS `v$movie_id`,
elementId(m) AS `v$id2`,
m.title AS title, m.released AS released
SELECT * FROM Person p ORDER BY name ASC
MATCH (p:Person)
RETURN elementId(p) AS `v$id`,
p.born AS born, p.name AS name
ORDER BY p.name
A qualified alias can be used as well. If no Neo4j metadata is available, you will get a map of properties of the node/relationship:
SELECT m.*, p.*
FROM Person p
JOIN Movie m ON m.id = p.acted_in
The corresponding columns must be downcast to a map in JDBC:
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN m{.*} AS m, p{.*} AS p
If we add more data (ex. born
and name
to Person
), the qualified star will project all of them (note how we also project one single, known column from the Movie
table):
SELECT p.*, m.title AS title
FROM Person p
JOIN Movie m ON m.id = p.acted_in
MATCH (p:Person)-[acted_in:ACTED_IN]->(m:Movie)
RETURN elementId(p) AS `v$id`, p.born AS born, p.name AS name, m.title AS title
Returning nodes and relationships
A statement that projects a table alias such as
SELECT m FROM Movie m
will result in a Cypher statement returning the matched node as node.
MATCH (m:Movie)
RETURN m;
A node can be aliased as well:
SELECT m AS node FROM Movie m
MATCH (m:Movie)
RETURN m AS node;
Un-aliased tables can be used as well:
SELECT movie FROM Movie
MATCH (movie:Movie)
RETURN movie;
Multiple entities are supported, too:
SELECT p, r, m FROM Person p
JOIN ACTED_IN r ON r.person_id = p.id
JOIN Movie m ON m.id = r.movie_id
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie) RETURN p, r, m
Comparing SQL with Cypher examples
The source of the following examples is: Comparing SQL with Cypher.
Find all Products
Select and Return Records
Select everything from the products
table.
SELECT p.*
FROM products as p
Similarly, in Cypher, you just MATCH
a simple pattern: all nodes with the label Product
and RETURN
them.
MATCH (p:Product)
RETURN p{.*} AS p
The above query will project all properties of the matched node. If you want to return the node itself, select it without using the asterisk:
SELECT p
FROM products as p
MATCH (p:Product)
RETURN p
Field Access, Ordering and Paging
It is more efficient to return only a subset of attributes, like ProductName
and UnitPrice
.
And while we are at it, let’s also order by price and only return the 10 most expensive items.
(Remember that labels, relationship-types and property-names are case sensitive in Neo4j.)
SELECT p.`productName`, p.`unitPrice`
FROM products as p
ORDER BY p.`unitPrice` DESC
LIMIT 10
MATCH (p:Product)
RETURN p.productName, p.unitPrice ORDER BY p.unitPrice DESC LIMIT 10
The default order direction will be translated as is:
SELECT * FROM Movies m ORDER BY m.title
MATCH (m:Movies)
RETURN * ORDER BY m.title
DISTINCT
projections
The DISTINCT
keyword for projections is handled:
SELECT DISTINCT m.released FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT m.released
It works with *
projections as well:
SELECT DISTINCT m.* FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT m {.*} AS m
However, as the qualified asterisks will use metadata if available, the translation with a database connection is different:
SELECT DISTINCT m.* FROM Movies m
MATCH (m:Movies)
RETURN DISTINCT elementId(m) AS `v$id`, m.title AS title, m.released AS released
Note that each row includes the Neo4j element ID, making each row unique.
This being said, the DISCTINCT
clause is of limited use with the asterisk.
6.3.5. Expressions
Most SQL expressions have corresponding Cypher expressions and can be translated straightforward.
Literal Values
Literal values are 1:1 translations.
SELECT
1, TRUE, FALSE, NULL, 'a'
RETURN 1, TRUE, FALSE, NULL, 'a'
Arithmetic expressions
Arithmetic expressions are 1:1 translations.
SELECT
1 + 2,
1 - 2,
1 * 2,
1 / 2,
square(2)
Note that the underlying tech of the default translator uses Cypher-DSL internally, which will wrap arithmetic (and logical) expressions with parentheses:
RETURN
(1 + 2),
(1 - 2),
(1 * 2),
(1 / 2),
(2 * 2)
Functions
Numeric functions
We can translate all numeric functions that are supported by Neo4j’s Cypher implementation: Mathematical functions - Numeric:
SELECT
abs(1),
ceil(1),
floor(1),
round(1),
round(1, 1),
sign(1)
will be translated to
RETURN
abs(1),
ceil(1),
floor(1),
round(1),
round(1, 1),
sign(1)
Logarithmic functions
Neo4j supports a broad range of logarithmic functions.
SELECT
exp(1),
ln(1),
log(2, 1),
log10(1),
sqrt(1)
will be translated to
RETURN
exp(1),
log(1),
(log(1) / log(2)),
log10(1),
sqrt(1)
Trigonometric functions
Calls to trigonometric functions
SELECT
acos(1),
asin(1),
atan(1),
atan2(1, 2),
cos(1),
cot(1),
degrees(1),
pi(),
radians(1),
sin(1),
tan(1)
will be translated to the corresponding Neo4j functions:
RETURN
acos(1),
asin(1),
atan(1),
atan2(1, 2),
cos(1),
cot(1),
degrees(1),
pi(),
radians(1),
sin(1),
tan(1)
String functions
The following string manipulations are guaranteed to work:
SELECT
lower('abc'),
cast(3 as varchar),
trim(' abc '),
length('abc'),
left('abc', 2),
ltrim(' abc '),
replace('abc', 'b'),
replace('abc', 'b', 'x'),
reverse('abc'),
right('abc', 2),
rtrim(' abc '),
substring('abc', 2 - 1),
substring('abc', 2 - 1, 2),
upper('abc')
and will be translated to Neo4j’s versions:
RETURN
toLower('abc'),
toString(3),
trim(' abc '),
size('abc'),
left('abc', 2),
ltrim(' abc '),
replace('abc', 'b', NULL),
replace('abc', 'b', 'x'),
reverse('abc'),
right('abc', 2),
rtrim(' abc '),
substring('abc', (2 - 1)),
substring('abc', (2 - 1), 2),
toUpper('abc')
Scalar functions
The input
SELECT
coalesce(1, 2),
coalesce(1, 2, 3),
nvl(1, 2),
cast('1' as boolean),
cast(1 as float),
cast(1 as double precision),
cast(1 as real),
cast(1 as tinyint),
cast(1 as smallint),
cast(1 as int),
cast(1 as bigint)
will be translated to (see Scalar functions):
RETURN
coalesce(1, 2),
coalesce(1, 2, 3),
coalesce(1, 2),
toBoolean('1'),
toFloat(1),
toFloat(1),
toFloat(1),
toInteger(1),
toInteger(1),
toInteger(1),
toInteger(1)
Query expressions
Several advanced SQL expressions are supported as well.
CASE
simple
The simple CASE
expressions
SELECT
CASE 1 WHEN 2 THEN 3 END,
CASE 1 WHEN 2 THEN 3 ELSE 4 END,
CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END,
CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
RETURN CASE 1 WHEN 2 THEN 3 END, CASE 1 WHEN 2 THEN 3 ELSE 4 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 END, CASE 1 WHEN 2 THEN 3 WHEN 4 THEN 5 ELSE 6 END
See Cypher → Conditional expressions (CASE) for more information.
CASE
advanced
And CASE
statement using a search:
SELECT
CASE WHEN 1 = 2 THEN 3 END,
CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
will be translated to
RETURN
CASE WHEN 1 = 2 THEN 3 END,
CASE WHEN 1 = 2 THEN 3 ELSE 4 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 END,
CASE WHEN 1 = 2 THEN 3 WHEN 4 = 5 THEN 6 ELSE 7 END
See Cypher → Conditional expressions (CASE) for more information.
CASE
abbreviations (which are not COALESCE
or NVL
)
The input
SELECT
nullif(1, 2),
nvl2(1, 2, 3)
will be translated to
RETURN
CASE WHEN 1 = 2 THEN NULL ELSE 1 END,
CASE WHEN 1 IS NOT NULL THEN 2 ELSE 3 END
6.3.6. Predicates
As with expressions a lot of logical SQL expressions and conditions used as predicates can be translated straightforward into Cypher predicates.
Conjunctions and disjunctions
Logical conjunctions and disjunctions are all supported.
SELECT 1 FROM p WHERE 1 = 1 AND 2 = 2 OR 3 = 3
MATCH (p:p)
WHERE ((1 = 1
AND 2 = 2)
OR 3 = 3)
RETURN 1
The input
SELECT 1 FROM p WHERE NOT 1 = 1 XOR 2 = 2
will be translated to
MATCH (p:p)
WHERE (NOT (1 = 1)
XOR 2 = 2)
RETURN 1
Operators
Arithmetic operators
The input
SELECT 1 FROM p WHERE 1 = 1 AND 2 > 1 AND 1 < 2 AND 1 <= 2 AND 2 >= 1 AND 1 != 2
will be translated to
MATCH (p:p)
WHERE (1 = 1
AND 2 > 1
AND 1 < 2
AND 1 <= 2
AND 2 >= 1
AND 1 <> 2)
RETURN 1
Between
Between in SQL is inclusive
SELECT 1 FROM p WHERE 2 BETWEEN 1 AND 3
and will be translated to (we can’t generate the shorter form (1 ⇐ 2 ⇐ 3
) due to restrictions in the underlying generator):
MATCH (p:p)
WHERE (1 <= 2) AND (2 <= 3)
RETURN 1
SQL has a SYMMETRIC
keyword for the BETWEEN
clause, to indicate that you do not care which bound of the range is larger:
SELECT 1 FROM p WHERE 2 BETWEEN SYMMETRIC 3 AND 1
We translate this into a disjunction:
MATCH (p:p)
WHERE (3 <= 2) AND (2 <= 1) OR (1 <= 2) AND (2 <= 3)
RETURN 1
Logical row value expressions
The above examples are based on scalar expressions. Row value expressions will be translated as well:
SELECT 1
FROM p
WHERE (1, 2) = (3, 4)
OR (1, 2) < (3, 4)
OR (1, 2) <= (3, 4)
OR (1, 2, 3) <> (4, 5, 6)
OR (1, 2, 3) > (4, 5, 6)
OR (1, 2, 3) >= (4, 5, 6)
Leads to a semantically equivalent cypher:
MATCH (p:p)
WHERE 1 = 3 AND 2 = 4
OR (1 < 3 OR 1 = 3 AND 2 < 4)
OR (1 < 3 OR 1 = 3 AND 2 <= 4)
OR (1 != 4 AND 2 != 5 AND 3 != 6)
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 > 6))
OR (1 > 4 OR 1 = 4 AND (2 > 5 OR 2 = 5 AND 3 >= 6))
RETURN 1
Null Handling
For scalar expressions
The input
SELECT 1 FROM p WHERE 1 IS NULL AND 2 IS NOT NULL
will be translated to
MATCH (p:p)
WHERE (1 IS NULL
AND 2 IS NOT NULL)
RETURN 1
For row value expressions
The input
SELECT 1 FROM p WHERE (1, 2) IS NULL OR (3, 4) IS NOT NULL
will be translated to
MATCH (p:p)
WHERE
(1 IS NULL AND 2 IS NULL)
OR (3 IS NOT NULL AND 4 IS NOT NULL)
RETURN 1
LIKE
operator
The LIKE
operator
SELECT * FROM movies m WHERE m.title LIKE '%Matrix%'
will be translated into a regular expressions, replacing the %
with .*
:
MATCH (m:`movies`) WHERE m.title =~ '.*Matrix.*'
RETURN *
6.3.7. Using joins to map relationships
On the surface, joins are relationships materialized in SQL (foreign keys are not). Sadly, it’s not as straightforward to map to Cypher. There are several implementation options:
-
When joining two tables on a column, take the left hand table column, use its name as relationship type, and treat it as outgoing from left to right.
-
When joining two tables with an intersection table (which it’s usually modeled in SQL for
m:n
relationships with attributes), use the name of that intersection table as relationship type.
We implemented some variants thereof, however we don’t claim their absolute usefulness in all cases.
1:n joins
Natural joins
SQL NATURAL
joins are the easiest way to denote relationship names without having to do any mapping.
A one-hope NATURAL JOIN
will translate to an anonymous, wildcard relationship.
SELECT p, m FROM Person p
NATURAL JOIN Movie m
MATCH (p:Person)-->(m:Movie) RETURN p, m
NATURAL
joins can be chained, and the connecting join table does not need to exist.
This will be turned into a Neo4j relationship:
SELECT p.name, r.roles, m.* FROM Person p
NATURAL JOIN ACTED_IN r
NATURAL JOIN Movie m
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, r.roles,
elementId(m) AS `v$id`, m.title AS title, m.released AS released
Simple join
Assume we configured the translator to use the following table mapping:
-
people
mapped to labelPeople
-
movies
mapped to labelMovie
With that in place, we translate
SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.directed
to
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
DIRECTED
is the uppercase version of the join column in the left table (p.directed
).
We can add a join column mapping in the form of people.movie_id:DIRECTED
if we have different column names:
SELECT p.name, m.title
FROM people p
JOIN movies m ON m.id = p.movie_id
to
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
Using the ON
clause
We used backticks here for the table and column names and no mapping.
SELECT p.name, m.title
FROM `Person` as p
JOIN `Movie` as m ON (m.id = p.`DIRECTED`)
The translation is same as before:
MATCH (p:Person)-[directed:DIRECTED]->(m:Movie)
RETURN p.name, m.title
m:n joins
An intersection table is a table that contains references to two other tables in the form of at least two columns.
This construct is usually required in the relational model to create an m:n
relationship.
Such an auxiliary construct is not necessary in Neo4j.
We can model as many outgoing and incoming relationships from one label to another, and they can also have properties.
We can thus use that construct for our translator.
The following example uses a configured mapping as follows:
-
people
mapped to labelPeople
-
movies
mapped to labelMovie
-
movie_actors
mapped toACTED_IN
SELECT p.name, m.title
FROM people p (1)
JOIN movie_actors r ON r.person_id = p.id (2)
JOIN movies m ON m.id = r.person_id (3)
1 | The table from which to map outgoing relationships |
2 | An intersection table, that is used again in the next JOIN clause |
3 | The final JOIN clause |
We do no semantic analysis: the order of the joins matter, and will lead to the following query:
MATCH (p:Person)-[r:ACTED_IN]->(m:Movie)
RETURN p.name, m.title
Multiple joins will result in a chain of relationships:
SELECT p.name AS actor, d.name AS director, m.title
FROM people p
JOIN movie_actors r ON r.person_id = p.id
JOIN movies m ON m.id = r.person_id
JOIN movie_directors r2 ON r2.movie_id = m.id
JOIN people d ON r2.person_id = d.id
MATCH (p:`Person`)-[r:`ACTED_IN`]->(m:`Movie`)<-[r2:`DIRECTED`]-(d:`Person`)
RETURN p.name AS actor, d.name AS director, m.title
Notice how the direction of the DIRECTED
relationship is defined by the order of the join columns.
6.3.8. DML statements
This section lists the supported Data-Manipulation-Language (DML) statements.
Although a SELECT
statement is technically DML as well, it is covered in Neo4j JDBC Driver.
Deleting nodes
Nodes can be deleted via the SQL DELETE
statement.
For example, to unconditionally delete all person
nodes:
DELETE FROM person
MATCH (person:person)
DELETE person
A WHERE
clause can be added to prevent this:
DELETE FROM person
WHERE person.id = 1
MATCH (person:person)
WHERE person.id = 1
DELETE person
If you want to delete everything, but your tooling complains, just add a conditions that is always true
:
DELETE FROM person
WHERE true
MATCH (person:person)
WHERE true
DELETE person
Of, the condition can also be that always evaluates to false
, never deleting anything:
DELETE FROM person
WHERE false
MATCH (person:person)
WHERE false
DELETE person
Tables can be aliased, and the alias will be used in Cypher, too:
DELETE FROM person p
MATCH (p:person)
DELETE p
Aliasing tables is also supported in combination with specifying the label to which the table name is mapped.
Using the same query with table_mappings=person:Person
configured,
DELETE FROM person p
will be translated to
MATCH (p:Person)
DELETE p
Deleting nodes and their related nodes
You can use SQL TRUNCATE
to DETACH DELETE
nodes.
TRUNCATE TABLE people
will be translated to
MATCH (people:Person)
DETACH DELETE people
Inserting data
A single list of values with explicit columns and constant values can be inserted with a simple INSERT
statement:
INSERT INTO People (first_name, last_name, born) VALUES ('Helge', 'Schneider', 1955)
CREATE (people:`Person` {first_name: 'Helge', last_name: 'Schneider', born: 1955})
All expressions, including parameters, are supported. Parameters will be named from 1 on upwards in Cypher.
INSERT INTO People (first_name, last_name, born) VALUES (?, ?, ?)
CREATE (people:`Person` {first_name: $1, last_name: $2, born: $3})
If you omit the column names on the insertion target, we generate names:
INSERT INTO People VALUES ('Helge', 'Schneider', 1955)
Note the unknown field xxx
property names:
CREATE (people:`Person` {`unknown field 0`: 'Helge', `unknown field 1`: 'Schneider', `unknown field 2`: 1955})
The SQL VALUES
clause actually supports lists of values:
INSERT INTO People (first_name, last_name, born) VALUES
('Helge', 'Schneider', 1955),
('Bela', 'B', 1962)
Those values will be translated into a Cypher array to be unwound in the Cypher statement. This is a great solution for batching inserts:
UNWIND [
{first_name: 'Helge', last_name: 'Schneider', born: 1955},
{first_name: 'Bela', last_name: 'B', born: 1962}]
AS properties
CREATE (people:`Person`)
SET people = properties
A returning clause is supported as well:
INSERT INTO People p (name) VALUES (?) RETURNING elementId(p)
CREATE (p:Person {name: $1}) RETURN elementId(p)
Upserts
We support a restricted range of "upserts" via the non-standard but pretty common ON DUPLICATE
and ON CONFLICT
SQL clauses.
Upserts are translated into MERGE
statements.
While they work without constraints, you should really have uniqueness-constraints on the node properties you merge on, or Neo4j may create duplicates (see Understanding how merge works).
Upserts on all columns can happen via either ON DUPLICATE KEY IGNORE
or ON CONFLICT IGNORE
.
While ON DUPLICATE KEY
does offer upgrade options, it assumes the primary (or unique) key being violated to be known.
Although this is most certainly the case in a relational system, this translation layer that runs without a database connection doesn’t know.
ON DUPLICATE KEY IGNORE
INSERT INTO Movie(title, released) VALUES(?, ?) ON DUPLICATE KEY IGNORE
MERGE (movie:`Movie` {title: $1, released: $2})
ON CONFLICT IGNORE
INSERT INTO actors(name, firstname) VALUES(?, ?) ON CONFLICT DO NOTHING
MERGE (actors:`Actor` {name: $1, firstname: $2})
If you want to define an action, you must use ON CONFLICT
and specify the key you want to merge on.
INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO UPDATE SET j = 0, k = 2 * EXCLUDED.k
Note how the special reference EXCLUDED
can be used to refer to the values of columns that have not been part of the key.
They will be reused with their values in the ON MATCH SET
clause.
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
ON MATCH SET tbl.j = 0, tbl.k = (2 * 700)
This works with parameters, too:
INSERT INTO tbl(i, j, k) VALUES (1, 2, ?)
ON CONFLICT (i) DO UPDATE SET j = EXCLUDED.k
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 2, tbl.k = $1
ON MATCH SET tbl.j = $1
It’s possible to just specify a concrete merge column instead of merging on all columns as well.
It will be translated with ON CREATE
:
INSERT INTO tbl(i, j, k) VALUES (1, 40, 700)
ON CONFLICT (i) DO NOTHING
MERGE (tbl:`tbl` {i: 1})
ON CREATE SET tbl.j = 40, tbl.k = 700
Using ON CONFLICT
and specifying a key is the only way to insert multiple rows with a MERGE
statement:
INSERT INTO People (first_name, last_name, born) VALUES
('Helge', 'Schneider', 1955),
('Bela', 'B', 1962)
ON CONFLICT(last_name) DO UPDATE SET born = EXCLUDED.born
UNWIND [{first_name: 'Helge', last_name: 'Schneider', born: 1955}, {first_name: 'Bela', last_name: 'B', born: 1962}] AS properties
MERGE (people:`People` {last_name: properties['last_name']})
ON CREATE SET
people.first_name = properties.first_name,
people.born = properties.born
ON MATCH SET people.born = properties['born']
7. text2cypher
This is an experimental translator inspired by the Neo4j Labs project text2cypher.
If you add this translator to the classpath or use the text2cypher bundle, all queries that start with
🤖,
will be treated as natural language queries written in plain english. The driver will strip the prefix, and use OpenAI to translate the input into a Cypher statement. The driver will augment the generation of the query by passing the current graph schema along with the input question.
The following data will be sent to an external API:
Don’t use this translator if you don’t want the above, or are not allowed to do so. |
This module requires one additional configuration: the OpenAI API key. You can use either a URL parameter, a JDBC property entry, or an environment variable:
-
URL parameter/property name is
openAIApiKey
-
Environment variable name is
OPEN_AI_API_KEY
jdbc:neo4j://localhost:7687?openAIApiKey=sk-xxx-your-key
Additional configuration properties are
property name | default value |
---|---|
openAIBaseUrl |
https://api.openai.com/v1 (defined by langchain4j) |
openAIModelName |
gpt-4-turbo |
openAITemperature |
0.0 |
With that in place, a query such as the following can be translated into Cypher:
🤖, How was The Da Vinci Code rated?
The outcome of the LLM is not deterministic and is likely to vary.
While you can execute it directly, we strongly recommend to use Connection#nativeSQL
to retrieve the Cypher statement, inspect it, and then run it separately.
In our test runs, the above questions was most often correctly translated to
MATCH (m:`Movie` {
title: 'The Da Vinci Code'
})<-[r:`REVIEWED`]-(p:`Person`)
RETURN r.rating AS Rating, p.name AS ReviewerName
Other times the result was a syntactically correct statement, but it would only return the reviewers and the movie itself. Also note that while a human likely recognizes that you are actually thinking about the average rating, the LLM does not infer this. Making the question more explicit gives better results:
🤖, How was The Da Vinci Code rated on average?
is translated more accurately to:
MATCH (m:`Movie` {
title: 'The Da Vinci Code'
})<-[:`REVIEWED`]-(p:`Person`)
RETURN avg(p.rating) AS AverageRating
Once a natural language query gets translated into Cypher, the result will be cached and further invocations of that query will use the cached result. |
All that statements that do not start with 🤖 will be used as-is and treated as Cypher.
Get the full, ready to use bundle here: https://repo.maven.apache.org/maven2/org/neo4j/neo4j-jdbc-text2cypher-bundle/6.1.0/neo4j-jdbc-text2cypher-bundle-6.1.0.jar. More information in Available bundles.
Appendix A: Neo4j specific conversions
Neo4j does not support all types used in the relational world. For some of them we offer conversions that are also available in frameworks such as Spring Data Neo4j (SDN). Those conversions are available in this driver as well, so you could use both SDN and this driver interchangeably.
A.1. Data types for fixed-point arithmetic
Neo4j does not support BigInteger
and BigDecimal
.
The only way to store them is as String
, and to read them back into the corresponding type.
This is in line with SDN and OGM.
Any parameter of those types passed to PreparedStatement
or CallableStatement
will be stored as String
, but can be read back through corresponding methods on the result sets.
A.2. SQL Date, Time and Timestamps
java.sql.Date
-
Maps to Cypher
DATE
java.sql.Time
-
Maps to Cypher
LOCAL TIME
java.sql.Timestamp
-
Maps to Cypher
LOCAL DATETIME
For information on Cypher date types, see Temporal types.
For more precise a mapping, use a Neo4j Value
instance with the appropriate type and its methods setObject
and getObject
.
Appendix B: Neo4j specific syntax for statements
This section deals with conventions for indexing and naming parameters in all types of statements (standard statements, prepared statements and callable statements) and other Neo4j specific syntax.
B.1. Callable statements
B.1.1. General syntax
You can invoke stored procedures as follows:
- Common JDBC syntax
-
{? = call db.index.fulltext.queryNodes(?, ?)}
- Standard Neo4j syntax
-
call db.index.fulltext.queryNodes(?, ?) yield *
- Enumerating yielded columns
-
{$propertyName = call db.schema.nodeTypeProperties()}
- Return (only applicable for functions)
-
RETURN sin(?)
B.1.2. Named parameters
Our callable statement implementation (org.neo4j.jdbc.Neo4jCallableStatement
) does support named parameters. As per the JDBC specification, those are not named placeholders in a query- or statement-string, but the actual, formal parameters for the stored procedures to be called.
We support both the $
and the colon syntax (i.e. either $name
or :name
).
The assigment {? = call xxx()}
will be rewritten to call xxx() yield *
, and {$x = call xxx()}
will be rewritten to call xxx() yield x
.
B.1.3. Result sets of callable statements
When you execute a callable statement via executeQuery
, you must use the result set returned.
If you just use execute
, we assume that the underlying procedure returns only one row and that you use the getters on the statement itself.
Appendix C: Migrating from older versions or other JDBC drivers for Neo4j
There are some other JDBC drivers for Neo4j, under various licenses and with varying features and capabilities. In the following we outline possible migration processes. The basic usage patterns for all JDBC drivers for Neo4j are very similar: in the end, it’s "just" about using a JDBC compliant driver, and you would use it as described in the original Java tutorial about JDBC.
C.1. Migrating from version 4 or 5 of this driver
Version 5 and 4 of the Neo4j JDBC Driver have been mainly developed by Larus BA, Italy, a certified consulting and integration solutions partner for Neo4j.
The most important change that you need to make is removing the dependency on org.neo4j:neo4j-jdbc-bolt . You need to replace it with org.neo4j:neo4j-jdbc or one of the bundles we provide, see available bundles.
|
C.1.1. URL format, parameters and main driver class
The previous version mapped the URLs from the common Java driver onto jdbc:
, i.e. using jdbc:neo4j:neo4j+s://foobar.io:7687/
for connecting against a database running on host foobar.io
on port 7687.
The following URLs (direct bolt connection) are not supported:
-
jdbc:neo4j:bolt://<host>:<port>/
-
jdbc:neo4j:bolt+s://<host>:<port>/
-
jdbc:neo4j:bolt+ssc://<host>:<port>/
The following URLs behave the same but must be rewritten:
-
jdbc:neo4j:neo4j://<host>:<port>/
becomesjdbc:neo4j://<host>:<port>/
-
jdbc:neo4j:neo4j+s://<host>:<port>/
becomesjdbc:neo4j+s://<host>:<port>/
-
jdbc:neo4j:neo4j+ssc://<host>:<port>/
becomesjdbc:neo4j+ssc://<host>:<port>/
The following configuration properties are not supported and have no replacement:
-
leaked.sessions.logging
-
readonly
-
usebookmarks
-
max.transaction.retry.time
(this driver has no built-in retry mechanism)
As with any persistent database connection you want to cater for failed transactions. We made good experience with resilience4j which fits in well with common Java frameworks, such as Spring Boot.
The following properties can be achieved with standardized JDBC settings:
-
encryption
— Use the appropriate transport scheme (neo4j
,neo4j+s
orneo4j+ssc
) -
autocommit
— Usejava.sql.Connection.setAutoCommit
Connection pooling can be achieved with any JDBC-compliant connection pool. The following properties have no effect:
-
max.connection.lifetime
-
max.connection.poolsize
The following properties just have different names:
-
connection.acquisition.timeout
istimeout
(as query-parameter to the URL or inside the configuration properties)
The following properties can be achieved using a different URL:
-
database
is now part of the URL: instead of specifyingjdbc:neo4j:neo4j+s://foobar.io:7687?database=abc
you would use the database name as path segment in the url, such as:jdbc:neo4j+s://foobar.io:7687/abc
If your tooling requires to use a concrete driver class, this JDBC driver has only org.neo4j.jdbc.Neo4jDriver
.
If you depend on a javax.sql.DataSource
, we provide org.neo4j.jdbc.Neo4jDataSource
.
C.1.2. Flattening
While the Neo4j JDBC Driver does not support the flatten
option, it can emulate its effect.
flatten
un-nests returned nodes and relationships by providing all their properties as individual columns.
If you enable automatic SQL to Cypher translation, any *
-select will inspect whether it affects nodes or relationships and will un-nest their properties, so that a SELECT * FROM Movie m
will effectively become MATCH (m:Movie) RETURN m.title AS title, m.released AS released
.
For more information, see Star-Selects.
In case you want to access the actual node, return the whole table alias or just use Cypher.
The Neo4j JDBC Driver supports complex object types as return types.