Neo4j vector indexes

The following resources provide hands-on tutorials for using LLMs and vector indexes in Neo4j:

Neo4j vector indexes are powered by the Apache Lucene indexing and search library. [1]

Example graph

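The examples on this page use the Neo4j movie recommendations dataset, focusing on the `plot` and `embedding` properties of `Movie` nodes. The `embedding` property consists of 1536-dimensional vector embeddings of the `plot` and `title` properties combined.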

The graph contains 28863 nodes and 332522 relationships.

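To recreate the graph, download this dump file and import it into an empty Neo4j database (running version 5.13 or later). The dump file can be imported into both Aura and local instances.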

The dump file used to load the dataset contains embeddings generated by OpenAI with the `text-embedding-ada-002` model.

Vectors and embeddings in Neo4j

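Vector indexes let you query vector embeddings in large datasets. An embedding is a numerical representation of a data object, such as a text, an image, or a document. Each word or token in a text is typically represented as a high-dimensional vector, where each dimension represents some aspect of the word's meaning.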

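Embeddings for a given data object can be created by both proprietary (e.g., Vertex AI or OpenAI) and open-source (e.g., sentence-transformers) embedding generators, which produce vector embeddings with dimensions such as 256, 768, 1536, and 3072. In Neo4j, vector embeddings are stored as `LIST<INTEGER | FLOAT>` properties on a node or relationship.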
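Because a vector index works with a fixed number of dimensions (1536 for the `embedding` property in this dataset), it can help to validate an embedding before storing it. The following is a minimal Python sketch; `to_embedding_property` and `EMBEDDING_DIMENSIONS` are hypothetical names, not part of any Neo4j API. It checks the length, rejects non-numeric or non-finite elements, and converts every element to `float`, which keeps the list homogeneous (a list mixing Python `int` and `float` values may be rejected when stored as a property).

```python
import math
import numbers
from typing import List, Sequence

# Hypothetical constant: dimension of the `embedding` property in the example dataset
EMBEDDING_DIMENSIONS = 1536


def to_embedding_property(values: Sequence[float], dimensions: int = EMBEDDING_DIMENSIONS) -> List[float]:
    """Return `values` as a list of finite floats of length `dimensions`.

    Hypothetical helper: raises ValueError instead of silently storing a
    vector whose length does not match the index configuration.
    """
    if len(values) != dimensions:
        raise ValueError(f"expected {dimensions} dimensions, got {len(values)}")
    result: List[float] = []
    for i, v in enumerate(values):
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(v, bool) or not isinstance(v, numbers.Real):
            raise ValueError(f"element {i} is not a number: {v!r}")
        f = float(v)
        if not math.isfinite(f):
            raise ValueError(f"element {i} is not finite: {v!r}")
        result.append(f)
    return result


if __name__ == "__main__":
    print(to_embedding_property([1, 2.5, 0], dimensions=3))  # [1.0, 2.5, 0.0]
```

The returned list can then be passed to Cypher as a query parameter, e.g. `SET m.embedding = $embedding`, instead of being formatted into the query text.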

For information about how to generate embeddings and store them as properties, see:

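For example, the movie "The Godfather" has the following `plot`: "The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son." Below is its 1536-dimensional `embedding` property, in which each element of the `LIST` represents some particular aspect of the plot's meaning: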

[0.005967312026768923, -0.03817005082964897, 0.0014667075593024492, -0.03868866711854935, -0.006505374796688557, 0.020900176838040352, -0.0027551413513720036, -0.0024731445591896772, ...]
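### Creating a vector database with Neo4j

Neo4j is a powerful graph database that supports a number of advanced features, including storing and indexing vector data. Neo4j 5.x can do this natively with vector indexes; for more advanced analysis, Neo4j's Graph Data Science library (GDS) additionally provides a rich set of machine learning algorithms.

#### Installing the required components

First, on CentOS, Neo4j can be installed by downloading the tarball from its URL and extracting it to the target location[^3]. Quoting the URL keeps the shell from treating `?` as a wildcard, `-L` follows redirects, and `-o` sets the output file name explicitly so that the `tar` command finds it. Note that Neo4j 3.5.6, used in this example, predates vector indexes; to follow the vector index examples on this page, install a 5.x release instead (5.13 or later for the example dump above) and adjust the version number in the commands.

```bash
curl -L -o neo4j-community-3.5.6-unix.tar.gz "https://neo4j.com/artifact.php?name=neo4j-community-3.5.6-unix.tar.gz"
tar -xf neo4j-community-3.5.6-unix.tar.gz
cd neo4j-community-3.5.6/
bin/neo4j console
```

Next, configure `neo4j.conf` so that the APOC procedures are allowed and the page cache (Neo4j's in-memory cache of the store files) is large enough for the data[^2]. For the tarball installation above, the file is `conf/neo4j.conf` inside the installation directory (`/etc/neo4j/neo4j.conf` is the location used by package installations). The APOC jar matching the Neo4j version must also be placed in the `plugins/` directory. Add the following lines:

```properties
# Allow APOC procedures to run unrestricted
dbms.security.procedures.unrestricted=apoc.*
# Page cache size (3.5/4.x setting name; Neo4j 5.x uses server.memory.pagecache.size)
dbms.memory.pagecache.size=2g
```

Restart the service for the changes to take effect.

#### Building a vector index

To build a vector database, it is important to decide which node property holds the feature vector and to create a matching index to speed up queries. Suppose a movie recommendation scenario with a ratings table in which each film is described along several dimensions; the schema can then be designed before the records are imported.

Create nodes labelled `Movie` with a float-list property `features` holding the weight of each dimension, together with a uniqueness constraint on `id` and a vector index on `features`. The statements below use recent Neo4j 5.x syntax (3.5/4.x used `CREATE CONSTRAINT ON ... ASSERT ...`, and earlier 5.x releases create vector indexes with the `db.index.vector.createNodeIndex` procedure). `vector.dimensions` must equal the length of each `features` list, which is 2 in the CSV example below:

```cypher
CREATE CONSTRAINT movie_id IF NOT EXISTS FOR (m:Movie) REQUIRE m.id IS UNIQUE;
CREATE VECTOR INDEX movie_features IF NOT EXISTS FOR (m:Movie) ON (m.features)
OPTIONS {indexConfig: {`vector.dimensions`: 2, `vector.similarity_function`: 'cosine'}};
```

Two points are worth noting: the uniqueness constraint guarantees that each movie `id` exists only once, and the vector index, configured with a fixed dimension and a similarity function, is the dedicated index for multi-dimensional numeric values that speeds up later similarity queries.

#### Loading and preprocessing data

Next comes the actual data loading. Records can be written with batched insert statements, or an external tool such as a Python script can read a CSV file and turn the raw data into Cypher queries. Because some values may be missing, clean the data before sending it to the server.

For example, if a file named `movies.csv` contains `id` and `title` fields plus several columns describing different features (such as `genre` and `year`), a short script can convert it into a form Neo4j accepts. The version below passes values as query parameters instead of formatting them into the query string (the old `{param}` placeholder syntax is no longer supported, and titles containing quotes would break a formatted query), converts every feature to `float` so that the list is homogeneous, and assumes that `genre` is already numerically encoded:

```python
import csv

from neo4j import GraphDatabase  # official Neo4j Python driver, 5.x (pip install neo4j)

# Values are sent as $parameters, never spliced into the query string
QUERY = """
MERGE (m:Movie {id: $id})
SET m.title = $title, m.features = $features
"""

# Placeholder connection details: adjust to your instance
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

with open("movies.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Basic cleaning: skip rows with missing feature values
        if not row.get("genre") or not row.get("year"):
            continue
        # Assumes genre is already encoded as a number; all elements are floats
        feature_vector = [float(row["genre"]), float(row["year"])]
        driver.execute_query(
            QUERY,
            parameters_={"id": row["id"], "title": row["title"], "features": feature_vector},
        )

driver.close()
```

The details of this step depend on your programming environment and personal preference.

#### Application example

Finally, a simple application: measuring how closely two films are related by the cosine similarity of their feature vectors. This shows how existing vector representations can feed more complex reasoning tasks. Given any two `Movie` entities u and v, the following query returns the cosine of the angle between their vectors, which reflects the strength of their potential relationship:

```cypher
// $sourceId and $targetId are query parameters holding Movie.id values
MATCH (u:Movie {id: $sourceId}), (v:Movie {id: $targetId})
RETURN apoc.algo.cosineSimilarity(u.features, v.features) AS similarity;
```

The query calls a function from the APOC plugin to compare the two vectors efficiently in the vector space model[^1].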
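To make the number returned by `apoc.algo.cosineSimilarity` concrete, here is a pure-Python sketch of the same formula, cos(u, v) = (u · v) / (|u| |v|). The helper `top_k_similar` (a hypothetical name) performs an exact, brute-force search that scores every stored vector against a query vector; this is the kind of top-k lookup a vector index is built to answer without scanning every node. The movie keys and vectors in the usage example are made up for illustration.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exact brute-force search: score every vector, keep the k most similar."""
    scored = ((key, cosine_similarity(query, vec)) for key, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up 2-dimensional feature vectors for illustration only
    movies = {"A": [2.0, 0.0], "B": [3.0, 4.0], "C": [0.0, 1.0]}
    print(top_k_similar([1.0, 0.0], movies, k=2))  # [('A', 1.0), ('B', 0.6)]
```

Brute force costs one similarity computation per stored vector for every query, which is fine for small data; avoiding that full scan is what an index is for.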