
banner by @Ria

Migrating from MySQL to Neo4j

After investigating Neo4j and MySQL in Spring Boot with Neo4j & MySQL, LogicGate as a platform has migrated 100% of our internal and production instances to Neo4j and retired our MySQL databases. Today there are a plethora of options for migrating existing MySQL databases to Neo4j. Neo4j itself has some helpful examples on how to do just that: Import Data Into Neo4j. While these tutorials give you an idea of how graph databases differ from traditional relational databases, they gloss over some of the finer details.

Every Migration is Different

One key thing to keep in mind when migrating data from one platform to another is that there will be structural differences that need to be accounted for.

1. Attribute names might change

Let’s say you have a table in MySQL that looks like this:

id | name | creation_date | last_updated | role_id

but in Neo4j we might want the attributes to be:

id | name | creationDate | lastUpdated

and remove the role_id entirely. We'll get into re-organizing your data for a post-SQL world in a bit.
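For instance, the renaming can be handled right in the extraction query by aliasing columns (the table and column names below are illustrative):

    SELECT id,
           name,
           creation_date AS creationDate,
           last_updated  AS lastUpdated
    FROM users;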

2. Pivot / join tables become relationships

Taking a look at our user -> role_id example, many SQL databases contain a roles table and a user_roles table that maps a user to their particular roles. For each row in this join table we'll want to create a relationship instead of a node.

3. Identifiers

In many databases, if not all, we reference rows by unique identifiers — be it a sequential id (1, 2, 3, 4) or a globally unique identifier such as a GUID or UUID. While these ids were useful in SQL, they may lose meaning or become inaccurate in a graph world where objects are not being appended to a single table.

Migration Plan

At a high level, the fastest and most reliable way to ingest large amounts of data into Neo4j is via the Cypher Shell. This allows us to load data directly into the database, instead of relying on sending all the data over a network.
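As a rough sketch, an import script can be piped straight into cypher-shell on the database host (the file name and credentials are placeholders):

    cypher-shell -u neo4j -p <password> < import.cql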

Going back to the idea of Nodes and Relationships, we separated our database tables into two categories: entity tables that become Nodes, and pivot/join tables that become Relationships.

Queries

For each table that contains an entity (such as a User or a Role), we created a custom SQL statement and appended it to a master YAML file:

queries.yml
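The file maps each entity name to the SQL used to extract it; a minimal sketch (entity names and SQL here are illustrative) looks something like:

    # Each top-level key is an entity; its value is the SQL used to extract it
    user: >
      SELECT u.id,
             u.name,
             u.creation_date AS creationDate,
             u.last_updated  AS lastUpdated,
             d.last_login    AS lastLogin
      FROM users u
      LEFT JOIN user_login_details d ON d.user_id = u.id
    role: >
      SELECT id, name FROM roles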

This strategy let us pull together any data related to that entity that might be stored in other tables. For example, in SQL a user's data was split across a users table as well as a user_login_details table. We wanted to consolidate all that information into a single unified User entity.

We then used Python and SQLAlchemy to run each statement and create a .csv file for each entity specified in our queries.yml file.
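A minimal sketch of that glue code, assuming a queries.yml shaped like the example above (the connection string and file layout are illustrative):

    import csv
    import yaml
    from sqlalchemy import create_engine, text

    # Placeholder connection string for the MySQL instance being migrated
    engine = create_engine("mysql+pymysql://user:password@localhost/appdb")

    with open("queries.yml") as f:
        queries = yaml.safe_load(f)

    # Run each entity's SQL and dump the result set to <entity>.csv
    for entity, sql in queries.items():
        with engine.connect() as conn:
            result = conn.execute(text(sql))
            with open(f"{entity}.csv", "w", newline="") as out:
                writer = csv.writer(out)
                writer.writerow(result.keys())   # header row from column aliases
                writer.writerows(result)         # one CSV row per SQL row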

We chose to write the glue code in Python, but many (if not all) languages can generate CSV/YAML files from SQL databases. We're primarily a Java Spring shop but decided a scripting language was more appropriate here.

This approach gave us an easily version-able migration document and an auditable way of verifying that all the data we were extracting from MySQL was correct before being imported into Neo4j.

Relationships

The second part of the migration was to map the various pivot and join tables to Relationships between Nodes. Whereas in SQL these were bona fide JPA entities, in Neo4j these rows would either become a @RelationshipEntity (with attributes) or just a regular relationship label. We took a similar approach to exporting relationships from MySQL as we did entities.

relationships.yml
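A sketch of the shape of that file (the label and query are illustrative):

    # Each key is a relationship label; its value is the SQL that produces id pairs
    HAS_ROLE: >
      SELECT user_id, role_id FROM user_roles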

Using a YAML file we designate the relationship label and the SQL query that generates a CSV like this:

user_roles.csv
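The resulting CSV is just pairs of ids, roughly (values are illustrative):

    user_id,role_id
    1,2
    2,1
    3,2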

This maps the SQL id of an entity to another related entity. Using the generated CSV files we can verify that the CSV representation mirrors what is in MySQL.

Import CSV into Neo4j

Now that we’ve automagically converted and mapped our SQL tables to a pile of CSV files, we can start actually importing our data into Neo4j. To accomplish this, we take our generated CSV files and create a native Cypher (*.cql) file that Neo4j understands. In this Cypher file we can write Cypher statements that looks similar to a SQL statement. Here is what importing the user.csv looks like in Cypher:

Let’s break this down:

Indexes Save Lives

If nothing else this was the biggest lesson learned during this migration. In the next phase, when creating relationships, adding an INDEX to the id attribute became a necessity once a 100 MB migration started taking several hours. Adding indexes to the ids made mapping relationships orders of magnitude faster. This shouldn't come as a surprise, as we see similar implications in SQL when creating indexes. Just because we're leaving SQL doesn't mean the base concepts no longer apply.
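A sketch of the indexes in question (Neo4j 3.x-era syntax; the labels are illustrative):

    CREATE INDEX ON :User(id);
    CREATE INDEX ON :Role(id);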

LOAD CSV

Allows for performant loading of large CSVs. Some of our CSV files were 20,000 lines in length.

Convert from INT to BOOLEAN

In MySQL, booleans are stored as either a 0 or a 1. When loading a CSV, Neo4j reads every field as a string, so these values need to be converted into proper true/false booleans. We accomplish this mapping with a Cypher CASE expression:
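For example (assuming a hypothetical active column exported as 0/1):

    // Convert the CSV string "0"/"1" into a real boolean property
    LOAD CSV WITH HEADERS FROM 'file:///user.csv' AS row
    MERGE (u:User {id: toInteger(row.id)})
    SET u.active = CASE row.active WHEN '1' THEN true ELSE false END;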

Dates

One gotcha with dates and Neo4j is that, since Neo4j is Java-based, it expects dates to be in milliseconds since the epoch. In MySQL, dates are generally stored in seconds. We can do an easy conversion with:
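For example (again assuming a creationDate column holding a Unix timestamp in seconds):

    // Multiply seconds by 1000 to get the milliseconds Java expects
    LOAD CSV WITH HEADERS FROM 'file:///user.csv' AS row
    MERGE (u:User {id: toInteger(row.id)})
    SET u.creationDate = toInteger(row.creationDate) * 1000;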

Counts

To verify that all rows in our CSV were imported we can do a quick count in Neo4j with:
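For example:

    // Compare against SELECT COUNT(*) FROM users in MySQL
    MATCH (u:User) RETURN count(u);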

Importing Relationships into Neo4j

Relationships turned out to be a bit tricky. Earlier we talked about how indexing ids vastly improves the performance of importing data into Neo4j. We can relate Nodes in Neo4j with a simple Cypher statement:
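A sketch, using the indexed id properties from earlier (the ids, label, and relationship type are illustrative):

    :begin
    MATCH (u:User {id: 42})
    MATCH (r:Role {id: 7})
    MERGE (u)-[:HAS_ROLE]->(r);
    :commit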

Note: :begin and :commit force Neo4j to open a new transaction for this query so that, in the event of a failure, we can roll back just this query.

But for queries that pull back thousands of nodes, it is far more efficient to create a CSV file and create the relationships using the same LOAD CSV command we did when importing entities:
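A sketch of the bulk version, assuming the user_roles.csv from earlier:

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM 'file:///user_roles.csv' AS row
    MATCH (u:User {id: toInteger(row.user_id)})
    MATCH (r:Role {id: toInteger(row.role_id)})
    MERGE (u)-[:HAS_ROLE]->(r);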

Tips & Tricks

Neo4j has some useful tools that may help you on your journey of migrating to Neo4j. First among them is APOC, a suite of procedures and functions that can be called directly in Cypher, similar to SQL stored procedures:
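For example, APOC's date helpers can replace some of the hand-rolled conversions above (an illustrative call):

    // Parse a MySQL-style datetime string straight into epoch milliseconds
    RETURN apoc.date.parse('2018-06-01 12:30:00', 'ms', 'yyyy-MM-dd HH:mm:ss') AS millis;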

Running the migration

In our case, we were able to pause and migrate a particular instance from MySQL to Neo4j and not have to worry about streaming updates. This approach assumes that this is a one-time migration.

For expediency we ran this migration on the Neo4j server itself, allowing our Python script to run the above-mentioned SQL statements against MySQL and import the results directly into Neo4j on the same box using the Cypher Shell.

Today

As of this writing, LogicGate has finished migrating all of our internal and production instances. Hopefully this has been a helpful introduction and primer on preparing for your own migration to Neo4j.

LogicGate ❤️ 📈

~ LogicGate
