一、原始需求分析
DuckDB发布1.0版本后,貌似性能很强悍的样子,它是一个单机的嵌入式数据路,支持SQL语言操作数据,所以现在的需求是:基于DuckDB实现一个数据关联碰撞分析的功能。
1、定义一套标准的JSON格式规范,让使用方按规范传入JSON实体,方便后续提供API接口让前端页面调用;
2、根据JSON格式规范,采用JAVA语言开发解释代码,生成最终的SQL查询语句;
3、在DuckDB执行最终的SQL查询语句,返回结果;
根据需要内容,首先找一下有没现成的JSON转SQL的规范和开源框架,发现还没有符合要求的,所以需要定义JSON格式规范。
总体流程为:
二、JSON格式规范设计
以上,大模型已经按提示要求,完成了JSON格式规范的定义,其实它还把规范的必填、可选、描述、每个结构的示例都输出了,只是这里没有粘贴上来。结果如下:
JSON格式规范文档
1、顶层结构
{` `"select": [ ... ],` `"from": "string",` `"joins": [ ... ],` `"where": { ... },` `"group_by": [ ... ],` `"having": { ... },` `"order_by": [ ... ],` `"limit": integer,` `"offset": integer``}
2、示例JSON数据
{` `"select": ["users.name", "users.email", {"function": "COUNT", "field": "*", "alias": "post_count"}],` `"from": "users",` `"joins": [` `{` `"type": "INNER JOIN",` `"table": "posts",` `"on": {` `"conditions": [` `{"left": "users.id", "operator": "=", "right": "posts.user_id"},` `{"left": "users.status", "operator": "=", "right": "posts.status"}` `],` `"logic": "AND"` `}` `}` `],` `"where": {` `"conditions": [` `{` `"logic": "AND",` `"conditions": [` `{"field": "users.age", "operator": ">", "value": 21},` `{"field": "users.status", "operator": "=", "value": "active"}` `]` `},` `{` `"logic": "OR",` `"conditions": [` `{"field": "users.name", "operator": "=", "value": "Alice"},` `{"field": "users.name", "operator": "=", "value": "Carol"}` `]` `}` `],` `"logic": "AND"` `},` `"group_by": ["users.name", "users.email"],` `"having": {` `"conditions": [` `{"field": "post_count", "operator": ">", "value": 1}` `],` `"logic": "AND"` `},` `"order_by": [` `{"field": "users.name", "direction": "ASC"}` `],` `"limit": 10,` `"offset": 0``}
三、JAVA代码实现JSON转SQL
一开始大模型使用org.json库来解释JSON,它可以快速按我的指示,使用fastjson库来实现。
四、JAVA代码实现DuckDB读写
1、完整的Java代码
import com.alibaba.fastjson.JSONArray;``import com.alibaba.fastjson.JSONObject;`` ``import java.sql.Connection;``import java.sql.DriverManager;``import java.sql.ResultSet;``import java.sql.Statement;`` ``public class JSONToSQL {`` ` `public static void main(String[] args) {` `try {` `// 示例JSON数据` `String jsonString = "{\n" +` `" \"select\": [\"users.name\", \"users.email\", {\"function\": \"COUNT\", \"field\": \"*\", \"alias\": \"post_count\"}],\n" +` `" \"from\": \"users\",\n" +` `" \"joins\": [\n" +` `" {\n" +` `" \"type\": \"INNER JOIN\",\n" +` `" \"table\": \"posts\",\n" +` `" \"on\": {\n" +` `" \"conditions\": [\n" +` `" {\"left\": \"users.id\", \"operator\": \"=\", \"right\": \"posts.user_id\"},\n" +` `" {\"left\": \"users.status\", \"operator\": \"=\", \"right\": \"posts.status\"}\n" +` `" ],\n" +` `" \"logic\": \"AND\"\n" +` `" }\n" +` `" }\n" +` `" ],\n" +` `" \"where\": {\n" +` `" \"conditions\": [\n" +` `" {\n" +` `" \"logic\": \"AND\",\n" +` `" \"conditions\": [\n" +` `" {\"field\": \"users.age\", \"operator\": \">\", \"value\": 21},\n" +` `" {\"field\": \"users.status\", \"operator\": \"=\", \"value\": \"active\"}\n" +` `" ]\n" +` `" },\n" +` `" {\n" +` `" \"logic\": \"OR\",\n" +` `" \"conditions\": [\n" +` `" {\"field\": \"users.name\", \"operator\": \"=\", \"value\": \"Alice\"},\n" +` `" {\"field\": \"users.name\", \"operator\": \"=\", \"value\": \"Carol\"}\n" +` `" ]\n" +` `" }\n" +` `" ],\n" +` `" \"logic\": \"AND\"\n" +` `" },\n" +` `" \"group_by\": [\"users.name\", \"users.email\"],\n" +` `" \"having\": {\n" +` `" \"conditions\": [\n" +` `" {\"field\": \"post_count\", \"operator\": \">\", \"value\": 1}\n" +` `" ],\n" +` `" \"logic\": \"AND\"\n" +` `" },\n" +` `" \"order_by\": [\n" +` `" {\"field\": \"users.name\", \"direction\": \"ASC\"}\n" +` `" ],\n" +` `" \"limit\": 10,\n" +` `" \"offset\": 0\n" +` `"}";`` ` `// 解析JSON并生成SQL查询` `JSONObject json = JSONObject.parseObject(jsonString);` `String sqlQuery = jsonToSQL(json);` `System.out.println("Generated SQL Query: " + sqlQuery);`` ` `// 创建DuckDB连接` `Connection conn = DriverManager.getConnection("jdbc:duckdb:");` `Statement stmt = conn.createStatement();`` ` `// 创建示例数据表` `stmt.execute("CREATE TABLE users (id INTEGER, name VARCHAR, email VARCHAR, age INTEGER, status VARCHAR);");` `stmt.execute("CREATE TABLE posts (id INTEGER, user_id INTEGER, title VARCHAR, content VARCHAR, status VARCHAR);");`` ` `// 插入示例数据` `stmt.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com', 25, 'active');");` `stmt.execute("INSERT INTO users VALUES (2, 'Bob', 'bob@example.com', 30, 'inactive');");` `stmt.execute("INSERT INTO users VALUES (3, 'Carol', 'carol@example.com', 22, 'active');");`` ` `stmt.execute("INSERT INTO posts VALUES (1, 1, 'Post 1', 'Content 1', 'active');");` `stmt.execute("INSERT INTO posts VALUES (2, 1, 'Post 2', 'Content 2', 'active');");` `stmt.execute("INSERT INTO posts VALUES (3, 2, 'Post 3', 'Content 3', 'inactive');");` `stmt.execute("INSERT INTO posts VALUES (4, 3, 'Post 4', 'Content 4', 'active');");` `stmt.execute("INSERT INTO posts VALUES (5, 3, 'Post 5', 'Content 5', 'active');");`` ` `// 执行生成的SQL查询` `ResultSet rs = stmt.executeQuery(sqlQuery);`` ` `// 显示查询结果` `while (rs.next()) {` `System.out.println("Name: " + rs.getString("name"));` `System.out.println("Email: " + rs.getString("email"));` `System.out.println("Post Count: " + rs.getInt("post_count"));` `System.out.println("----");` `}`` ` `// 关闭连接` `stmt.close();` `conn.close();` `} catch (Exception e) {` `e.printStackTrace();` `}` `}`` ` `public static String jsonToSQL(JSONObject json) {` `StringBuilder sql = new StringBuilder();`` ` `// SELECT clause` `sql.append("SELECT ");` `JSONArray selectArray = json.getJSONArray("select");` `for (int i = 0; i < selectArray.size(); i++) {` `Object item = selectArray.get(i);` `if (item instanceof String) {` `sql.append(item);` `} else if (item instanceof JSONObject) {` `JSONObject functionObj = (JSONObject) item;` `sql.append(functionObj.getString("function"))` `.append("(")` `.append(functionObj.getString("field"))` `.append(") AS ")` `.append(functionObj.getString("alias"));` `}` `if (i < selectArray.size() - 1) {` `sql.append(", ");` `}` `}`` ` `// FROM clause` `sql.append(" FROM ").append(json.getString("from"));`` ` `// JOINS clause` `if (json.containsKey("joins")) {` `JSONArray joinsArray = json.getJSONArray("joins");` `for (int i = 0; i < joinsArray.size(); i++) {` `JSONObject joinObj = joinsArray.getJSONObject(i);` `sql.append(" ").append(joinObj.getString("type"))` `.append(" ").append(joinObj.getString("table"))` `.append(" ON ");` `JSONObject onObj = joinObj.getJSONObject("on");` `JSONArray conditionsArray = onObj.getJSONArray("conditions");` `String logic = onObj.getString("logic");` `for (int j = 0; j < conditionsArray.size(); j++) {` `JSONObject condition = conditionsArray.getJSONObject(j);` `sql.append(condition.getString("left"))` `.append(" ").append(condition.getString("operator"))` `.append(" ").append(condition.getString("right"));` `if (j < conditionsArray.size() - 1) {` `sql.append(" ").append(logic).append(" ");` `}` `}` `}` `}`` ` `// WHERE clause` `if (json.containsKey("where")) {` `sql.append(" WHERE ");` `appendConditions(sql, json.getJSONObject("where"));` `}`` ` `// GROUP BY clause` `if (json.containsKey("group_by")) {` `sql.append(" GROUP BY ");` `JSONArray groupByArray = json.getJSONArray("group_by");` `for (int i = 0; i < groupByArray.size(); i++) {` `sql.append(groupByArray.getString(i));` `if (i < groupByArray.size() - 1) {` `sql.append(", ");` `}` `}` `}`` ` `// HAVING clause` `if (json.containsKey("having")) {` `sql.append(" HAVING ");` `appendConditions(sql, json.getJSONObject("having"));` `}`` ` `// ORDER BY clause` `if (json.containsKey("order_by")) {` `sql.append(" ORDER BY ");``JSONArray orderByArray = json.getJSONArray("order_by");` `for (int i = 0; i < orderByArray.size(); i++) {` `JSONObject orderBy = orderByArray.getJSONObject(i);` `sql.append(orderBy.getString("field"))` `.append(" ").append(orderBy.getString("direction"));` `if (i < orderByArray.size() - 1) {` `sql.append(", ");` `}` `}` `}`` ` `// LIMIT clause` `if (json.containsKey("limit")) {` `sql.append(" LIMIT ").append(json.getIntValue("limit"));` `}`` ` `// OFFSET clause` `if (json.containsKey("offset")) {` `sql.append(" OFFSET ").append(json.getIntValue("offset"));` `}`` ` `sql.append(";");` `return sql.toString();` `}`` ` `private static void appendConditions(StringBuilder sql, JSONObject conditionObj) {` `JSONArray conditionsArray = conditionObj.getJSONArray("conditions");` `String logic = conditionObj.getString("logic");`` ` `for (int i = 0; i < conditionsArray.size(); i++) {` `Object item = conditionsArray.get(i);` `if (item instanceof JSONObject && ((JSONObject) item).containsKey("conditions")) {` `sql.append("(");` `appendConditions(sql, (JSONObject) item);` `sql.append(")");` `} else {` `JSONObject condition = (JSONObject) item;` `sql.append(condition.getString("field"))` `.append(" ").append(condition.getString("operator"))` `.append(" ");` `Object value = condition.get("value");` `if (value instanceof String) {` `sql.append("'").append(value).append("'");` `} else {` `sql.append(value);` `}` `}` `if (i < conditionsArray.size() - 1) {` `sql.append(" ").append(logic).append(" ");` `}` `}` `}``}
五、单元测试
1、单元测试代码
import com.alibaba.fastjson.JSONObject;``import org.junit.After;``import org.junit.Before;``import org.junit.Test;`` ``import java.sql.Connection;``import java.sql.DriverManager;``import java.sql.ResultSet;``import java.sql.Statement;`` ``import static org.junit.Assert.*;`` ``public class JSONToSQLTest {`` ` `private Connection conn;` `private Statement stmt;`` ` `@Before` `public void setUp() throws Exception {` `// 创建DuckDB连接` `conn = DriverManager.getConnection("jdbc:duckdb:");` `stmt = conn.createStatement();`` ` `// 创建示例数据表` `stmt.execute("CREATE TABLE users (id INTEGER, name VARCHAR, email VARCHAR, age INTEGER, status VARCHAR);");` `stmt.execute("CREATE TABLE posts (id INTEGER, user_id INTEGER, title VARCHAR, content VARCHAR, status VARCHAR);");`` ` `// 插入示例数据` `stmt.execute("INSERT INTO users VALUES (1, 'Alice', 'alice@example.com', 25, 'active');");` `stmt.execute("INSERT INTO users VALUES (2, 'Bob', 'bob@example.com', 30, 'inactive');");` `stmt.execute("INSERT INTO users VALUES (3, 'Carol', 'carol@example.com', 22, 'active');");`` ` `stmt.execute("INSERT INTO posts VALUES (1, 1, 'Post 1', 'Content 1', 'active');");` `stmt.execute("INSERT INTO posts VALUES (2, 1, 'Post 2', 'Content 2', 'active');");` `stmt.execute("INSERT INTO posts VALUES (3, 2, 'Post 3', 'Content 3', 'inactive');");` `stmt.execute("INSERT INTO posts VALUES (4, 3, 'Post 4', 'Content 4', 'active');");` `stmt.execute("INSERT INTO posts VALUES (5, 3, 'Post 5', 'Content 5', 'active');");` `}`` ` `@After` `public void tearDown() throws Exception {` `stmt.execute("DROP TABLE users;");` `stmt.execute("DROP TABLE posts;");` `stmt.close();` `conn.close();` `}`` ` `@Test` `public void testJsonToSQL() throws Exception {` `// 示例JSON数据` `String jsonString = "{\n" +` `" \"select\": [\"users.name\", \"users.email\", {\"function\": \"COUNT\", \"field\": \"*\", \"alias\": \"post_count\"}],\n" +` `" \"from\": \"users\",\n" +` `" \"joins\": [\n" +` `" {\n" +` `" \"type\": \"INNER JOIN\",\n" +` `" \"table\": \"posts\",\n" +` `" \"on\": {\n" +` `" \"conditions\": [\n" +` `" {\"left\": \"users.id\", \"operator\": \"=\", \"right\": \"posts.user_id\"},\n" +` `" {\"left\": \"users.status\", \"operator\": \"=\", \"right\": \"posts.status\"}\n" +` `" ],\n" +` `" \"logic\": \"AND\"\n" +` `" }\n" +` `" }\n" +` `" ],\n" +` `" \"where\": {\n" +` `" \"conditions\": [\n" +` `" {\n" +` `" \"logic\": \"AND\",\n" +` `" \"conditions\": [\n" +` `" {\"field\": \"users.age\", \"operator\": \">\", \"value\": 21},\n" +` `" {\"field\": \"users.status\", \"operator\": \"=\", \"value\": \"active\"}\n" +` `" ]\n" +` `" },\n" +` `" {\n" +` `" \"logic\": \"OR\",\n" +` `" \"conditions\": [\n" +` `" {\"field\": \"users.name\", \"operator\": \"=\", \"value\": \"Alice\"},\n" +` `" {\"field\": \"users.name\", \"operator\": \"=\", \"value\": \"Carol\"}\n" +` `" ]\n" +` `" }\n" +` `" ],\n" +` `" \"logic\": \"AND\"\n" +` `" },\n" +` `" \"group_by\": [\"users.name\", \"users.email\"],\n" +` `" \"having\": {\n" +` `" \"conditions\": [\n" +` `" {\"field\": \"post_count\", \"operator\": \">\", \"value\": 1}\n" +` `" ],\n" +` `" \"logic\": \"AND\"\n" +` `" },\n" +` `" \"order_by\": [\n" +` `" {\"field\": \"users.name\", \"direction\": \"ASC\"}\n" +` `" ],\n" +` `" \"limit\": 10,\n" +` `" \"offset\": 0\n" +` `"}";`` ` `JSONObject json = JSONObject.parseObject(jsonString);` `String sqlQuery = JSONToSQL.jsonToSQL(json);` `System.out.println("Generated SQL Query: " + sqlQuery);`` ` `// 验证生成的SQL语句是否正确` `String expectedSQL = "SELECT users.name, users.email, COUNT(*) AS post_count FROM users " +` `"INNER JOIN posts ON users.id = posts.user_id AND users.status = posts.status " +` `"WHERE (users.age > 21 AND users.status = 'active') " +` `"AND (users.name = 'Alice' OR users.name = 'Carol') " +` `"GROUP BY users.name, users.email " +` `"HAVING post_count > 1 " +` `"ORDER BY users.name ASC " +` `"LIMIT 10 OFFSET 0;";` `assertEquals(expectedSQL, sqlQuery);`` ` `// 执行生成的SQL查询` `ResultSet rs = stmt.executeQuery(sqlQuery);`` ` `// 验证查询结果` `assertTrue(rs.next());` `assertEquals("Alice", rs.getString("name"));` `assertEquals("alice@example.com", rs.getString("email"));` `assertEquals(2, rs.getInt("post_count"));`` ` `assertTrue(rs.next());` `assertEquals("Carol", rs.getString("name"));` `assertEquals("carol@example.com", rs.getString("email"));` `assertEquals(2, rs.getInt("post_count"));`` ` `assertFalse(rs.next());` `}``}
六、在Idea工具运行代码
将代码复制到Idea,直接运行成功。绝大部分程序员应该是做不到的。
我们也可以进一步让大模型优化一些细节,比如校验JSON参数是否符合规范、生成更大的数据集进行测试、生成更复杂的测试用例等等;
读者福利:如果大家对大模型感兴趣,这套大模型学习资料一定对你有用
对于0基础小白入门:
如果你是零基础小白,想快速入门大模型是可以考虑的。
一方面是学习时间相对较短,学习内容更全面更集中。
二方面是可以根据这些资料规划好学习计划和方向。
包括:大模型学习线路汇总、学习阶段,大模型实战案例,大模型学习视频,人工智能、机器学习、大模型书籍PDF。带你从零基础系统性的学好大模型!
😝有需要的小伙伴,可以保存图片到wx扫描二v码免费领取【保证100%免费
】🆓
👉AI大模型学习路线汇总👈
大模型学习路线图,整体分为7个大的阶段:(全套教程文末领取哈)
第一阶段: 从大模型系统设计入手,讲解大模型的主要方法;
第二阶段: 在通过大模型提示词工程从Prompts角度入手更好发挥模型的作用;
第三阶段: 大模型平台应用开发借助阿里云PAI平台构建电商领域虚拟试衣系统;
第四阶段: 大模型知识库应用开发以LangChain框架为例,构建物流行业咨询智能问答系统;
第五阶段: 大模型微调开发借助以大健康、新零售、新媒体领域构建适合当前领域大模型;
第六阶段: 以SD多模态大模型为主,搭建了文生图小程序案例;
第七阶段: 以大模型平台应用与开发为主,通过星火大模型,文心大模型等成熟大模型构建大模型行业应用。
👉大模型实战案例👈
光学理论是没用的,要学会跟着一起做,要动手实操,才能将自己的所学运用到实际当中去,这时候可以搞点实战案例来学习。
👉大模型视频和PDF合集👈
观看零基础学习书籍和视频,看书籍和视频学习是最快捷也是最有效果的方式,跟着视频中老师的思路,从基础到深入,还是很容易入门的。
👉学会后的收获:👈
• 基于大模型全栈工程实现(前端、后端、产品经理、设计、数据分析等),通过这门课可获得不同能力;
• 能够利用大模型解决相关实际项目需求: 大数据时代,越来越多的企业和机构需要处理海量数据,利用大模型技术可以更好地处理这些数据,提高数据分析和决策的准确性。因此,掌握大模型应用开发技能,可以让程序员更好地应对实际项目需求;
• 基于大模型和企业数据AI应用开发,实现大模型理论、掌握GPU算力、硬件、LangChain开发框架和项目实战技能, 学会Fine-tuning垂直训练大模型(数据准备、数据蒸馏、大模型部署)一站式掌握;
• 能够完成时下热门大模型垂直领域模型训练能力,提高程序员的编码能力: 大模型应用开发需要掌握机器学习算法、深度学习框架等技术,这些技术的掌握可以提高程序员的编码能力和分析能力,让程序员更加熟练地编写高质量的代码。
👉获取方式:
😝有需要的小伙伴,可以保存图片到wx扫描二v码免费领取【保证100%免费
】🆓