banner by @Ria
Migrating from MySQL to Neo4j
After investigating Neo4j and MySQL in Spring Boot with Neo4j & MySQL, LogicGate has migrated 100% of our internal and production instances to Neo4j and retired our MySQL databases. Today there are a plethora of options for migrating existing MySQL databases to Neo4j. Neo4j itself has some helpful examples on how to do just that: Import Data Into Neo4j. While these tutorials give you an idea of how graph databases differ from traditional relational databases, they gloss over some of the finer details.
Every Migration is Different
One key thing to keep in mind when migrating data from one platform to another is that there will be structural differences that need to be accounted for.
1. Attribute names might change
Let’s say you have a table in MySQL that looks like this:
id | name | creation_date | last_updated | role_id
but in Neo4j we might want the attributes to be:
id | name | creationDate | lastUpdated
and remove role_id entirely. We'll get into re-organizing your data for a post-SQL world shortly.
2. Pivot / join tables become relationships
Taking a look at our user -> role_id example, many SQL databases contain a roles table and a user_roles join table that maps users to their particular roles. For each row in this join table we'll want to create a relationship instead of a node.
3. Identifiers
In many databases, if not all, we reference rows by unique identifiers — be it a sequential id (1, 2, 3, 4) or a globally unique identifier such as a GUID or UUID. While these ids were useful in SQL, they may lose meaning or become inaccurate in a graph world where objects are not being appended to a single table.
Migration Plan
At a high level, the fastest and most reliable way to ingest large amounts of data into Neo4j is via the Cypher Shell. This lets us load data directly into the database rather than sending it all over the network.
Going back to the idea of Nodes and Relationships, we separated our database tables into two categories: entity tables that become Nodes, and join tables that become Relationships.
Queries
For each table that contains an entity (such as a User or Role for example), we created a custom SQL statement and appended that to a master YAML file:
queries.yml
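The original file wasn't preserved here, but a queries.yml along these lines captures the idea — entity names, column names, and the join to user_login_details are hypothetical reconstructions from the description below:

```yaml
# Hypothetical sketch of the structure; one entry per entity,
# each with the SQL that pulls together that entity's attributes.
user:
  query: >
    SELECT u.id, u.name, u.creation_date, u.last_updated,
           d.last_login_date
    FROM users u
    LEFT JOIN user_login_details d ON d.user_id = u.id
role:
  query: >
    SELECT r.id, r.name FROM roles r
```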
This strategy let us pull together any data related to that entity that might be stored in other tables. For example, in SQL a user’s data was split across a users table as well as user_login_details . We wanted to consolidate all that information into a single unified User entity.
We then used Python and SQLAlchemy to run each statement and create a .csv file for each entity specified in our queries.yml file.
While we chose to write the glue code in Python, many (if not all) languages can generate CSV/YAML files from SQL databases. We're primarily a Java Spring shop but decided a scripting language was more appropriate here.
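The actual script isn't reproduced in the post, but the core of it looks something like this sketch. It uses the stdlib sqlite3 module as a stand-in for the real SQLAlchemy-to-MySQL connection, and the table and column names are hypothetical:

```python
# Sketch of the export step: run each entity query, dump the
# results to <entity>.csv with a header row. sqlite3 stands in
# for MySQL here; the real script used SQLAlchemy.
import csv
import sqlite3

# A queries.yml-style mapping, inlined here for illustration.
queries = {
    "user": "SELECT id, name, creation_date FROM users",
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, creation_date INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 1500000000)")

for entity, sql in queries.items():
    cursor = conn.execute(sql)
    with open(f"{entity}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # First tuple element of each description entry is the column name.
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor)  # one CSV row per SQL row
```

In the real migration the queries come from the YAML file and the connection string points at MySQL, but the shape — query, header, rows — is the same.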
This approach gave us an easily version-able migration document and an auditable way of verifying that all the data we were extracting from MySQL was correct before being imported into Neo4j.
Relationships
The second part of the migration was to map the various pivot and join tables to Relationships between Nodes. Whereas in SQL these were bona fide JPA entities, in Neo4j these rows would either become a @RelationshipEntity (with attributes) or just a regular relationship label. We took a similar approach to exporting relationships from MySQL as we did entities.
relationships.yml
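Again the original file is lost, but a relationships.yml of roughly this shape fits the description — the label, endpoint names, and query are hypothetical:

```yaml
# Hypothetical sketch: relationship label plus the SQL that
# produces the start-id / end-id pairs for it.
HAS_ROLE:
  start: user
  end: role
  query: >
    SELECT ur.user_id, ur.role_id FROM user_roles ur
```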
Using a YAML file we designate the relationship label and SQL query that generates a CSV like this:
user_roles.csv
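The generated CSV would then look something like this (ids are illustrative):

```csv
user_id,role_id
1,2
1,3
2,2
```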
This maps the SQL id of one entity to another related entity. Using the generated CSV files we can verify that the CSV representation mirrors what is in MySQL.
Import CSV into Neo4j
Now that we’ve automagically converted and mapped our SQL tables to a pile of CSV files, we can start actually importing our data into Neo4j. To accomplish this, we take our generated CSV files and create a native Cypher (*.cql) file that Neo4j understands. In this Cypher file we can write Cypher statements that look similar to SQL statements. Here is what importing user.csv looks like in Cypher:
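The original statement wasn't preserved, but based on the table described earlier it would look something like this sketch — file path, label, and property names are assumptions:

```cypher
// Load each CSV row and create one User node from it.
LOAD CSV WITH HEADERS FROM 'file:///user.csv' AS row
CREATE (u:User {
  id:           toInteger(row.id),
  name:         row.name,
  creationDate: toInteger(row.creation_date) * 1000,  // seconds -> ms
  lastUpdated:  toInteger(row.last_updated) * 1000
});
```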
Let’s break this down:
Indexes Save Lives
If nothing else, this was the biggest lesson learned during this migration. In the next phase, when creating relationships, adding an INDEX to the id attribute became a necessity once a 100 MB migration started taking several hours. Adding indexes to the ids made mapping relationships orders of magnitude faster. This shouldn’t come as a surprise, as we see similar implications in SQL when creating indexes. Just because we’re leaving SQL doesn’t mean the base concepts no longer apply.
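Creating the index is a one-liner; the labels and property are taken from the examples in this post:

```cypher
// Neo4j 3.x syntax; newer versions use
// CREATE INDEX FOR (u:User) ON (u.id) instead.
CREATE INDEX ON :User(id);
CREATE INDEX ON :Role(id);
```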
LOAD CSV
Allows for performant loading of large CSVs. Some of our CSV files were 20,000 lines long.
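For the larger files, batching the load into periodic transactions keeps memory usage in check. A sketch, with an arbitrary batch size:

```cypher
// Commit every 1000 rows rather than in one huge transaction.
// (USING PERIODIC COMMIT is the Neo4j 3.x mechanism; newer
// versions use CALL { ... } IN TRANSACTIONS.)
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///user.csv' AS row
CREATE (u:User { id: toInteger(row.id), name: row.name });
```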
Convert from INT to BOOLEAN
In MySQL, booleans are stored as either a 0 or a 1. With LOAD CSV, every field arrives as the string “0” or “1”, so we convert these to real boolean values with a Cypher CASE expression:
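For example (the enabled column is a hypothetical stand-in for any MySQL TINYINT(1) boolean):

```cypher
LOAD CSV WITH HEADERS FROM 'file:///user.csv' AS row
CREATE (u:User {
  // '1' -> true, anything else -> false
  enabled: CASE row.enabled WHEN '1' THEN true ELSE false END
});
```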
Dates
One gotcha with dates and Neo4j is that, since Neo4j is Java-based, it expects timestamps in milliseconds since the epoch. In MySQL, dates are generally stored in seconds. We can do an easy conversion with:
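Assuming the CSV column holds a seconds-since-epoch value (e.g. from MySQL's UNIX_TIMESTAMP()), the conversion is a multiplication:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///user.csv' AS row
CREATE (u:User {
  // seconds -> milliseconds for the Java-based store
  creationDate: toInteger(row.creation_date) * 1000
});
```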
Counts
To verify that all rows in our CSV were imported we can do a quick count in Neo4j with:
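For instance, counting the imported User nodes (label assumed from the examples above):

```cypher
MATCH (u:User) RETURN count(u);
```

The result should equal the CSV's line count minus the header row.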
Importing Relationships into Neo4j
Relationships turned out to be a bit tricky. Earlier we talked about how indexing ids vastly improves the performance of importing data into Neo4j. We can relate Nodes in Neo4j with a simple Cypher statement:
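Something along these lines — the ids and the HAS_ROLE label are illustrative assumptions:

```cypher
:begin
// Look up both endpoints by their (indexed) ids, then relate them.
MATCH (u:User {id: 1}), (r:Role {id: 2})
CREATE (u)-[:HAS_ROLE]->(r);
:commit
```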
Note: :begin and :commit force Neo4j to open a new transaction for this query so that in the event of a failure, we can rollback just this query.
But for queries that pull back thousands of nodes, it is far more efficient to create a CSV file and create the relationships using the same LOAD CSV command we used when importing entities:
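Using the user_roles.csv generated earlier, a bulk version looks roughly like this (label and column names assumed as before) — this is where the id indexes pay off, since every row performs two indexed lookups:

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///user_roles.csv' AS row
MATCH (u:User {id: toInteger(row.user_id)})
MATCH (r:Role {id: toInteger(row.role_id)})
CREATE (u)-[:HAS_ROLE]->(r);
```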
Tips & Tricks
Neo4j has some useful tools that may help you on your journey of migrating to Neo4j. First among them is APOC, a suite of procedures that can be called directly from Cypher, similar to SQL stored procedures:
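For example, one handy APOC procedure for sanity-checking a migration is the metadata summary:

```cypher
// Returns node/relationship counts broken down by label and type —
// useful for comparing against the source row counts.
CALL apoc.meta.stats();
```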
Running the migration
In our case, we were able to pause and migrate a particular instance from MySQL to Neo4j and not have to worry about streaming updates. This approach assumes that this is a one-time migration.
For expediency we would run this migration on the Neo4j server itself — allowing our Python script to connect to Neo4j, run the above-mentioned SQL statements, and import them directly into Neo4j on the same box using Cypher shell.
Today
As of this writing, LogicGate has finished migrating all of our internal and production instances. Hopefully this has been a helpful introduction and primer on preparing for your own migration to Neo4j.
LogicGate ❤️ 📈
~ LogicGate