ACM SIGMOD Blog | The LLM+KG Workshop (VLDB'24) Panel Report

Large Language Models, Knowledge Graphs, and Vector Databases: Synergy and Opportunities for Data Management

(A Report on the LLM+KG@VLDB24 Workshop’s Panel Discussion)

Xi Chen, Wei Hu, Arijit Khan, Shreya Shankar,
Haofen Wang, Jianguo Wang, and Tianxing Wu

NOVEMBER 22, 2024

(https://wp.sigmod.org/?p=3813)

INTRODUCTION

Large language models (LLMs) and vector databases (Vector DBs) are becoming two vital enablers of generative AI (GenAI), a form of artificial intelligence that learns from massive datasets to generate new data, showcasing human-like creativity across text, images, code, speech, and video. In particular, LLMs are currently revolutionizing the field of natural language processing with their remarkable capacity to understand and generate human-like text [1]. On the other hand, the transformation of heterogeneous data forms – such as images, documents, and videos – into dense vectors with deep learning models sets the foundation for multimodal data querying and retrieval systems based on semantic similarity. To manage these dense, high-dimensional, billion-scale vectors, Vector DBs have recently emerged as a hot topic [2]. Last but not least, knowledge graphs (KGs) can support a holistic integration solution for multi-modal data from heterogeneous sources. For instance, KGs use graph structures to describe relationships between entities, where nodes and edges can carry features of different types, e.g., tabular data, key-value pairs, text, images, and multimedia. Therefore, KGs are increasingly being used to model cross-domain and diverse data, and they play a central role in AI systems such as semantic search, intelligent question answering (QA), and recommendation [3].

The LLM+KG@VLDB24 panel's goal was to bring together several leading experts in these three booming technologies (LLMs, Vector DBs, and KGs) to share their insights and experiences from various perspectives [4]. What are the synergies among LLMs, Vector DBs, and KGs? What is genuinely new, what are the pain points, how can they be tackled, and what does the future hold? The panel also aimed to discuss what interesting opportunities await data management researchers in this domain, as interdisciplinary fields such as data-centric AI/ML and data science gain prominence. The panelists were Wei Hu (Nanjing University, China), Shreya Shankar (UC Berkeley, USA), Haofen Wang (Tongji University, China), and Jianguo Wang (Purdue University, USA). The panel was moderated by the LLM+KG@VLDB24 workshop co-chairs Arijit Khan (Aalborg University, Denmark), Tianxing Wu (Southeast University, China), and Xi Chen (Tencent, China). The panel was well-attended, with a peak audience of over 150 people.

OVERVIEW OF PANEL DISCUSSION

The moderators organized the discussion into five broad themes as follows.

Theme-1: Unifying LLMs + KGs + Vector Databases

Context/Background and Questions: While recent years have witnessed the rapid development of LLMs and Vector DBs, KGs have been mainstream and among the most popular technologies in the AI community since Google introduced its Knowledge Graph over a decade ago. Are LLMs and Vector DBs new kids on the block or noise from the hype cycle? What are the synergies among LLMs, vector databases, and graph data management, including knowledge graphs? How can they benefit each other? What are the killer applications/domains that require their unification?

Panel Discussion Summary:

Wei Hu. LLMs (a.k.a. foundation models) are currently on the center stage and are a key step toward artificial general intelligence (AGI). KGs and Vector DBs can support LLMs. For instance, KGs assist in knowledge enhancement and knowledge updating of LLMs via parameter-efficient fine-tuning (PEFT) or retrieval-augmented generation (RAG), and help mitigate their hallucination problem. On the other hand, one of the most critical use cases of Vector DBs for LLMs is RAG. Vector DBs such as Weaviate, Chroma, FAISS, Milvus, Qdrant, Vespa, and Pinecone act as external knowledge bases or external memory for LLMs. They enable low-latency similarity search using approximate nearest neighbor search (ANNS) algorithms over the vector space, providing the retrieved knowledge efficiently to the LLMs for high-quality generation.
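The RAG retrieval step described above can be sketched as follows. This is a minimal, self-contained illustration only: the toy `embed()` stands in for a real embedding model, and the exact brute-force cosine search stands in for the ANNS indexes (e.g., HNSW or IVF) that systems like Milvus or Qdrant actually use.

```python
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic toy embedding (byte-sum seeded); NOT a real model."""
    rng = np.random.default_rng(sum(text.encode()) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

class ToyVectorStore:
    """In-memory store with exact cosine search (real Vector DBs use ANNS)."""
    def __init__(self):
        self.texts, self.vecs = [], []

    def add(self, text: str):
        self.texts.append(text)
        self.vecs.append(embed(text))

    def search(self, query: str, k: int = 2):
        sims = np.array(self.vecs) @ embed(query)  # dot = cosine (unit vectors)
        return [self.texts[i] for i in np.argsort(-sims)[:k]]

store = ToyVectorStore()
for doc in ["KGs model entities and relations.",
            "Vector DBs index dense embeddings.",
            "LLMs generate text from prompts."]:
    store.add(doc)

# Retrieved passages are prepended to the prompt for grounded generation.
context = store.search("How are embeddings indexed?", k=2)
prompt = "Answer using the context:\n" + "\n".join(context) + "\nQ: ..."
```

In a production RAG stack, `embed()` would be a learned sentence encoder and `search()` an ANNS query against the Vector DB; only the overall shape of the loop carries over.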

Shreya Shankar. Many old DB ideas are relevant around LLMs' self-consistency, thinking step-by-step, etc. [5]. Additionally, LLM pipelines are much faster to build than traditional ML lifecycles. This is because data preparation for traditional ML is challenging and expensive, whereas LLMs circumvent it entirely: simple prompt-based interactions require no model training, making it easy to build AI pipelines around LLMs. Therefore, LLMs expand the domain of downstream applications and what is feasible. For instance, LLMs can help automate knowledge graph creation in domain-specific settings. Journalists at the Berkeley Institute for Data Science have created knowledge graphs from civic data by supervising teams of annotators [6]. Can we replace part of that workflow with LLMs?

Haofen Wang. LLMs, Vector DBs, and KGs benefit each other by combining structured data for accuracy (knowledge graphs), efficient data retrieval (vector databases), and contextual understanding (LLMs), ensuring robust querying, reasoning, and interpretability.

The synergies have the potential to revolutionize data management and usage in various domains. The synergy is particularly transformative in domains like personalized healthcare and financial analytics. In healthcare, integrating patient data in knowledge graphs with LLMs can provide precise, personalized treatment recommendations [7]. In finance, it enables real-time risk assessments and portfolio optimization through a better understanding of complex relationships and data trends [8].

While some aspects may seem part of the hype cycle, the foundational ideas behind the integration of LLMs, vector databases, and knowledge graphs are well-grounded in addressing real-world data challenges. They are not merely “new kids on the block”, but rather evolving technologies that offer substantial benefits as they mature. As the underlying technology and methodologies improve, their combined application promises not only to address current inefficiencies, but also to unlock new, innovative capabilities in data management and AI.

Jianguo Wang. While vector-based RAG is useful for LLMs, graph-based applications of LLMs need both vector- and graph-based RAG, e.g., consider queries like "What do others say about my papers?" or "Find competitors with similar products to mine and analyze their pricing strategies for different products". In particular, GraphRAG [9] is emerging as a viable alternative to conventional RAG for achieving more precise search results.
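The multi-hop queries above hint at why graph-based retrieval helps: the answer is assembled from relationships rather than from one semantically similar text chunk. A hedged sketch of that idea, with an illustrative toy triple store and naive substring entity matching (not the actual GraphRAG [9] algorithm):

```python
from collections import defaultdict

# Toy KG as (subject, predicate, object) triples; purely illustrative.
triples = [
    ("PaperA", "cited_by", "PaperB"),
    ("PaperB", "authored_by", "Alice"),
    ("PaperA", "authored_by", "Me"),
]

index = defaultdict(list)  # entity -> triples incident to it
for s, p, o in triples:
    index[s].append((s, p, o))
    index[o].append((s, p, o))

def graph_retrieve(question: str, hops: int = 2):
    """Collect triples within `hops` of any entity mentioned in the question."""
    frontier = {e for e in index if e.lower() in question.lower()}
    seen = set()
    for _ in range(hops):
        nxt = set()
        for e in frontier:
            for t in index[e]:
                if t not in seen:
                    seen.add(t)
                    nxt.update({t[0], t[2]})
        frontier = nxt
    return sorted(seen)

# A multi-hop question pulls in structure a flat chunk search might miss.
facts = graph_retrieve("What does PaperB say about PaperA?")
```

The retrieved triples would then be verbalized and placed into the LLM prompt, possibly alongside vector-retrieved passages.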

Theme-2: Data Management for LLMs, KGs, and Vector Data

Context/Background and Questions: Data plays a fundamental role in the modern machine learning ecosystem. How can state-of-the-art data curation, data management systems, and algorithms (including graph systems) improve LLMs and vector data management?

Panel Discussion Summary:

Wei Hu. Massive multi-modal data, e.g., text and code, are important for training LLMs. Therefore, data cleaning, data masking, data denoising, and other data engineering efforts are important for improving LLMs' performance and training efficiency during pre-training and fine-tuning. Considering broader machine learning systems (MLSys), effective and efficient DB implementations such as graph DBs (for KGs), Vector DBs, and even blockchains can be useful [10]. Core DB concepts such as declarative querying, query processing and optimization, indexing, and checkpointing are critical for designing effective, efficient, scalable, and robust ML systems.

Shreya Shankar. There are plenty of opportunities for data management researchers in these emerging technologies. Training-wise, one needs sophisticated techniques to filter, or even repair, bad data; e.g., see the many data filtering strategies used in Llama 3 [11]. Deployment-wise, bolt-on data quality constraints for LLM-generated data are required [12]. Analogously, for RAG pipelines, LLM agents must check whether the retrieved data is of high quality and fix it if it is not. For incorporating dynamic updates to data and models, data lifecycle management – versioning, provenance tracking, incremental updates to vector indexes, etc. – is also crucial, especially in the context of RAG-based LLMs.
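Such bolt-on quality constraints can be as simple as declarative assertions applied to every LLM-generated record, in the spirit of assertion-synthesis systems like SPADE [12]. The record schema and the specific checks below are illustrative assumptions, not SPADE's actual interface:

```python
def check_record(rec: dict) -> list[str]:
    """Return the violated constraints (empty list means the record passes)."""
    violations = []
    if not rec.get("summary", "").strip():
        violations.append("summary must be non-empty")
    if len(rec.get("summary", "")) > 500:
        violations.append("summary exceeds 500 characters")
    if rec.get("confidence", 0.0) < 0.5:
        violations.append("confidence below threshold; route to human review")
    return violations

good = {"summary": "Patient reported nausea after drug X.", "confidence": 0.9}
bad = {"summary": "", "confidence": 0.2}
assert check_record(good) == []
assert len(check_record(bad)) == 2
```

Failing records can then be retried with a repair prompt or escalated to a human, keeping the quality gate outside the model itself.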

Haofen Wang. State-of-the-art data curation and management systems, including graph systems, can significantly enhance LLMs and vector data management by ensuring high-quality, well-structured data input and efficient handling of complex data relationships. Advanced curation techniques reduce noise and bias, providing LLMs with cleaner, more relevant data, improving their accuracy and contextual understanding. Graph systems facilitate the management of intricate relationships within data, enriching vector representations and enabling better semantic understanding and retrieval capabilities. This integration supports more precise applications in search, recommendations, and decision-making processes.

Jianguo Wang. Relational DBs may support efficient vector data management. A recent study [13] identified the underlying root causes of the performance gap between generalized vector databases and specialized vector databases, and suggested directions for building future generalized vector databases that achieve performance comparable to high-performance specialized ones. Moreover, SingleStore, a modern distributed relational database system, already supports vector search efficiently [28].

Theme-3: LLMs, KGs, and Vector Data for Data Science

Context/Background and Questions: Data science [14] is an interdisciplinary field of scientific methods, tools, algorithms, and systems, involving data engineering, analytics, regulation, and ethics, studying each phase of the data pipeline, but ultimately for data use, to extract its full potential in an application-driven manner. How can LLMs and vector databases enhance data management, data science, and data pipelines? What roles can LLMs and vector databases play in the management of relational data, graphs, multi-modal data, and data lakes? How will they reshape databases, knowledge bases, and data science?

Panel Discussion Summary:

Wei Hu. Potential areas or success stories of LLMs for data management include NLIDB (natural language interfaces for databases)/Text2SQL [15], query optimization [16], data curation [17], self-driving DBs [18], data education [19], etc. There are also many downstream applications of LLMs for data science, such as generation tasks (e.g., protein sequence generation) and analysis tasks (e.g., summarization and statistics for end users). However, transparency and explainability remain key challenges, where KGs can potentially help, e.g., via fact-checking to justify LLMs' decisions.

Shreya Shankar. LLMs and vector databases make complex, intelligent data processing pipelines much more feasible; e.g., a hospital employee can write a pipeline to extract all instances of medications from unstructured doctor notes via simple prompt-based interactions, without training any models. Researchers are already working on unstructured-to-structured data extraction with LLMs [20]. But how can one support more complex views beyond just data extraction? More "reasoning" tasks? E.g., a hospital employee needs to write a pipeline that strips all personally identifiable information (PII) from unstructured doctor notes and then summarizes any poor reactions to medications. This would require domain-specific understanding, domain-specific validators/guardrails, declarative interfaces, and more. Some interesting systems in this space are emerging, e.g., [21, 22, 23].
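The strip-PII-then-summarize pipeline in this example can be sketched against a pluggable `llm` callable, so any model can be dropped in. The regex-based scrubber and the echoing stub below are placeholders only; real pipelines would use validated PII detectors and an actual model API:

```python
import re

def strip_pii(note: str) -> str:
    """Very rough scrub: US-style phone numbers and capitalized name pairs."""
    note = re.sub(r"\b\d{3}-\d{3}-\d{4}\b", "[PHONE]", note)
    note = re.sub(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", "[NAME]", note)
    return note

def pipeline(notes, llm):
    """Step 1: scrub each note. Step 2: ask the model to summarize reactions."""
    cleaned = [strip_pii(n) for n in notes]
    prompt = ("Summarize any poor reactions to medications:\n"
              + "\n".join(cleaned))
    return llm(prompt)

# Stand-in for a real model call: just echoes the last prompt line.
stub_llm = lambda p: p.splitlines()[-1]
out = pipeline(["John Smith (555-123-4567) had a rash after drug Y."], stub_llm)
assert "[NAME]" in out and "[PHONE]" in out
```

The point of the pluggable design is that the domain-specific validators and guardrails the text calls for can wrap `llm` without changing the pipeline code.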

Haofen Wang. LLMs and Vector DBs enhance data management, data science, and data pipelines by providing improved data retrieval, semantic search capabilities, and insightful data analytics. LLMs can analyze and generate insights from relational data, graphs (e.g., OpenKG + Semantic-enhanced Programmable Graph framework (SPG) [24]), and multi-modal data by understanding context and relationships, while vector databases enable efficient storage and retrieval of high-dimensional data across diverse data types, including data lakes.

They will reshape databases and knowledge bases by enabling more dynamic and responsive search and query responses, facilitating richer interactions with data. In data science, they enhance model training and insights extraction by allowing for deeper, more interconnected data analyses, leading to more predictive and generalized models.

Jianguo Wang. LLMs, Vector DBs, and KGs have expanded the horizon of data management from traditional relational data, graph data, key-value pairs, etc. to embracing multi-modal data such as images and unstructured text. The future of data management is to unlock the full potential of an organization’s multi-modal data, and the benefits are evident across diverse domains such as e-commerce, healthcare, and finance.

Theme-4: Human-in-the-Loop for LLMs, KGs, and Vector Data Management

Context/Background and Questions: LLMs learn parametric knowledge from large training corpora, and may not explicitly store consistent representations of knowledge, hence they can output unreliable and incoherent responses, and often hallucinate by generating factually incorrect statements. How can human-in-the-loop and KGs facilitate the alignment of LLMs? LLMs are “black-box” systems and can reveal private or sensitive data. Analogously, vector data are difficult to interpret. What is the significance of explainability, fairness, privacy, and responsible AI in LLM systems and Vector DBs?

Panel Discussion Summary:

Wei Hu. Fine-tuning LLMs with human feedback can be done via reinforcement learning from human feedback (RLHF); hence, the design of human feedback tasks is important. Users rank the model's responses based on their preferences, and this ranking provides a reward signal to align the model and enhance its outputs. More recently, reinforcement learning from AI feedback (RLAIF) improves on RLHF by obtaining feedback from an AI model or directly from the environment. KGs can also provide domain-specific or up-to-date knowledge to LLMs via pre-training, fine-tuning, and retrieval-augmented generation. Crowdsourcing and active learning can help in KG integration tasks such as entity alignment [25, 26].
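The preference rankings described above are commonly converted into a reward-model training signal via a pairwise (Bradley-Terry style) loss: the loss is small when the preferred response receives the higher reward. A minimal numpy sketch, with placeholder scores standing in for a reward model's outputs:

```python
import numpy as np

def pairwise_loss(r_preferred: float, r_rejected: float) -> float:
    """-log sigmoid(r_w - r_l): small when the preferred response scores higher."""
    margin = r_preferred - r_rejected
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))

# Correct ordering of rewards yields a lower loss than the reversed ordering.
assert pairwise_loss(2.0, 0.0) < pairwise_loss(0.0, 2.0)
```

During RLHF, this loss trains the reward model on ranked response pairs; the resulting reward then guides policy optimization of the LLM.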

Shreya Shankar. LLMs make mistakes and require guardrails. Many of these guardrails are powered by LLMs, e.g., collaborative LLMs, LLM-as-a-Judge. Can we augment the LLM guardrail prompts with knowledge graphs? This requires retrieving the relevant subgraphs from KGs via graph reasoning or subgraph retrieval methods.

Haofen Wang. Significance of human-in-the-loop and other aspects: 1) Human-in-the-Loop: In LLM systems and vector databases, having a human-in-the-loop ensures that AI outputs are reviewed and refined by human expertise, improving accuracy, contextual relevance, and reducing errors. 2) Explainability: It is crucial for interpreting and understanding AI decisions, allowing users to trust and verify model predictions and data representations, especially in complex systems. 3) Fairness: Ensuring fairness in AI helps prevent biases that can arise from skewed data or model training, promoting equitable outcomes across different demographics. 4) Privacy: Respecting data privacy is critical, as LLMs and vector databases often handle sensitive information. Strategies to maintain privacy, such as data anonymization and secure data handling protocols, are essential. 5) Responsible AI: This involves developing AI systems that adhere to ethical guidelines, emphasizing safety, accountability, and transparency in AI operations.

Role of knowledge graphs in LLM Alignment: knowledge graphs can align LLMs by providing structured and interconnected information to enhance the contextual understanding of language models. They serve as a rich source of semantic relationships and factual data, helping LLMs to disambiguate meanings, answer queries more accurately, and maintain a comprehensive knowledge base. This integration ensures that LLMs are more grounded in real-world information, improving their logical coherence and ability to generate reliable and relevant outputs.

Theme-5: Interdisciplinary Collaboration on LLMs, KGs, and Vector Data

Context/Background and Questions: How can research in this arena benefit through interdisciplinary collaborations across DB, ML, Systems, NLP, HCI, CV, among others? What would be the roles of benchmarking, open-source models, tools, and datasets? How to foster academia + industry partnership for actual impacts?

Panel Discussion Summary:

Wei Hu. Benchmarks and open-source LLMs are very important. However, there are recent concerns about the proliferation of benchmark datasets and empirical studies: domain-specific LLMs tend to report results only on favorable ("biased") benchmarks, while general-purpose LLMs cannot achieve SOTA results on all benchmarks.

Academia + industry collaborations are really important, since industry leads LLM development and has abundant GPU resources, engineering teams, and funding. It is crucial to figure out how academia can participate, e.g., via student internship programs and industrial collaborations.

Shreya Shankar. This is a very interdisciplinary area. There are lots of working mechanisms in LLM pipelines and interfaces around them. The DB community is well-positioned to own the data preprocessing and validation parts of pipelines (see the DEEM workshop [27] at SIGMOD). The DB community also has a lot of knowledge about structured data and relational data and can potentially train custom models for these settings. LLM startups in industry are very open to collaboration in my experience.

Haofen Wang. Interdisciplinary collaborations in this domain can drive innovations, create comprehensive solutions, and encourage idea exchange by integrating diverse expertise from fields like DB, ML, NLP, HCI, and CV. It is important to identify what database systems can offer to ML such as ML-native DBs.

Benchmarking provides performance standards and comparison metrics. Open-source models/tools/data enhance accessibility, speed up innovation, and foster community collaboration. To foster effective academia + industry partnerships, it is important to align objectives, collaborate on joint projects, offer internships, and co-fund initiatives for practical impacts and knowledge exchange.

Jianguo Wang. Startups can embrace disruptive technologies. To make impacts in this domain, it is important to work on real-world problems and build real systems.

CONCLUDING REMARKS

The panel concluded by discussing the most pressing challenges, which include conducting neural-symbolic reasoning, scaling integration, ensuring data privacy and compliance, and managing complex, dynamic knowledge graphs. Off-the-shelf LLMs do not follow complex instructions well and need guardrails. Human annotators learn and maintain information across document boundaries, whereas operating across document boundaries remains difficult for LLMs. Aligning LLMs with structured data to reduce biases, and managing the computational costs of querying vectorized data while maintaining performance, also remain significant obstacles. Hopefully, this panel discussion and the open problems will inspire others to work on the emerging data management issues in this domain.

REFERENCES

[1] Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Shaochen Zhong, Bing Yin, and Xia Ben Hu, “Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond,” ACM Trans. Knowl. Discov. Data, vol. 18, no. 6, pp. 160:1-160:32, 2024.

[2] James Jie Pan, Jianguo Wang, and Guoliang Li, “Survey of Vector Database Management Systems,” VLDB J., vol. 33, no. 5, pp. 1591–1615, 2024.

[3] Vinay K. Chaudhri, Chaitanya K. Baru, Naren Chittar, Xin Luna Dong, Michael R. Genesereth, James A. Hendler, Aditya Kalyanpur, Douglas B. Lenat, Juan Sequeda, Denny Vrandecic, and Kuansan Wang, “Knowledge Graphs: Introduction, History, and Perspectives,” AI Mag., vol. 43, no. 1, pp. 17-29, 2022.

[4] Arijit Khan, Tianxing Wu, and Xi Chen, “LLM+KG@VLDB’24 Workshop Summary,” arXiv:2410.01978, 2024.

[5] Aditya G. Parameswaran, Shreya Shankar, Parth Asawa, Naman Jain, and Yujie Wang, “Revisiting Prompt Engineering via Declarative Crowdsourcing,” CIDR, 2024.

[6] “California Police Records Access Project,” https://bids.berkeley.edu/california-police-records-access-project.

[7] Yanjun Gao, Ruizhe Li, John R. Caskey, Dmitriy Dligach, Timothy A. Miller, Matthew M. Churpek, and Majid Afshar, “Leveraging A Medical Knowledge Graph into Large Language Models for Diagnosis Prediction”, arXiv:2308.14321, 2023

[8] Xiaohui Victor Li and Francesco Sanna Passino, “FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets”, arXiv:2407.10909, 2024

[9] Zhentao Xu, Mark Jerome Cruz, Matthew Guevara, Tie Wang, Manasi Deshpande, Xiaofeng Wang, and Zheng Li, “Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering,” SIGIR, pp. 2905-2909, 2024.

[10] Honghu Wu, Xiangrong Zhu, and Wei Hu, “A Blockchain System for Clustered Federated Learning with Peer-to-Peer Knowledge Transfer,” Proc. VLDB Endow., vol. 17, no. 5, pp. 966-979, 2024.

[11] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, and et al., “The Llama 3 Herd of Models,” arXiv:2407.21783, 2024.

[12] Shreya Shankar, Haotian Li, Parth Asawa, Madelon Hulsebos, Yiming Lin, J. D. Zamfirescu-Pereira, Harrison Chase, Will Fu-Hinthorn, Aditya G. Parameswaran, and Eugene Wu, “SPADE: Synthesizing Data Quality Assertions for Large Language Model Pipelines,” Proc. VLDB Endow., vol. 17, no. 12, pp. 4173-4186, 2024.

[13] Yunan Zhang, Shige Liu, and Jianguo Wang, “Are There Fundamental Limitations in Supporting Vector Data Management in Relational Databases? A Case Study of PostgreSQL,” ICDE, pp. 3640-3653, 2024.

[14] M. T. Ozsu, “Data Science – A Systematic Treatment,” Commun. ACM, vol. 66, no. 7, pp. 106–116, 2023.

[15] Dawei Gao, Haibin Wang, Yaliang Li, Xiuyu Sun, Yichen Qian, Bolin Ding, and Jingren Zhou, “Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation,” Proc. VLDB Endow., vol. 17, no. 5, pp. 1132-1145, 2024.

[16] Zhaodonghui Li, Haitao Yuan, Huiming Wang, Gao Cong, and Lidong Bing, “LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency,” arXiv:2404.12872, 2024.

[17] Chengliang Chai, Nan Tang, Ju Fan, and Yuyu Luo, “Demystifying Artificial Intelligence for Data Preparation,” SIGMOD, pp. 13-20, 2023.

[18] Immanuel Trummer, “DB-BERT: Making Database Tuning Tools “Read” the Manual,” VLDB J., vol. 33, no. 4, pp. 1085-1104, 2024.

[19] Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, and Xiaochun Yang, “From Large Language Models to Databases and Back: A Discussion on Research and Education,” SIGMOD Rec., vol. 52, no. 3, pp. 49-56, 2023.

[20] Simran Arora, Brandon Yang, Sabri Eyuboglu, Avanika Narayan, Andrew Hojel, Immanuel Trummer, and Christopher Ré, “Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes,” Proc. VLDB Endow., vol. 17, no. 2, pp. 92-105, 2023.

[21] Chunwei Liu, Matthew Russo, Michael J. Cafarella, Lei Cao, Peter Baile Chen, Zui Chen, Michael J. Franklin, Tim Kraska, Samuel Madden, and Gerardo Vitagliano, “A Declarative System for Optimizing AI Workloads,” arXiv:2405.14696, 2024.

[22] Liana Patel, Siddharth Jha, Carlos Guestrin, and Matei Zaharia, “LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data,” arXiv:2407.11418, 2024.

[23] Shreya Shankar, Aditya Parameswaran, and Eugene Wu, “Reimagining LLM-Powered Unstructured Data Analysis with DocETL,” https://data-people-group.github.io/blogs/2024/09/24/docetl/, 2024.

[24] Ant Group and OpenKG, “Semantic-enhanced Programmable Knowledge Graph (SPG) White Paper (v1.0),” https://spg.openkg.cn/en-US, 2023.

[25] Jiacheng Huang, Wei Hu, Zhifeng Bao, Qijin Chen, and Yuzhong Qu, “Deep Entity Matching with Adversarial Active Learning,” VLDB J., vol. 32, no. 1, pp. 229-255, 2023.

[26] Jiacheng Huang, Zequn Sun, Qijin Chen, Xiaozhou Xu, Weijun Ren, and Wei Hu, “Deep Active Alignment of Knowledge Graph Entities and Schemata,” Proc. ACM Manag. Data, vol. 1, no. 2, pp. 159:1-159:26, 2023.

[27] DEEM’24: Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning. Association for Computing Machinery, 2024.

[28] Cheng Chen, Chenzhe Jin, Yunan Zhang, Sasha Podolsky, Chun Wu, Szu-Po Wang, Eric Hanson, Zhou Sun, Robert Walzer, and Jianguo Wang, “SingleStore-V: An Integrated Vector Database System in SingleStore,” Proc. VLDB Endow., vol. 17, no. 12, pp. 3772-3785, 2024.

Biographies of Panelists

Wei Hu is a full professor in the School of Computer Science at Nanjing University. His main research areas include knowledge graph, data integration, and intelligent software. He has conducted visiting research at VU University Amsterdam, Stanford University, and University of Toronto. He has published over 50 papers in top-tier conferences and journals, such as SIGMOD, VLDB, ICDE, WWW, SIGIR, ICML, NeurIPS, AAAI, IJCAI, ACL, EMNLP, NAACL, ICSE, TKDE, VLDBJ, TSE, and TNNLS. He has received the Best Paper Awards at JIST, CCKS, and CHIP, and the Best Paper Nomination at ISWC. More information can be found at https://ws.nju.edu.cn/~whu.

Shreya Shankar is a PhD student in computer science at UC Berkeley, advised by Dr. Aditya Parameswaran. Her research addresses data challenges in production ML pipelines through a human-centered lens, focusing on data quality, observability, and more recently, leveraging large language models for data preprocessing. Shreya’s work has appeared in top data management and HCI venues, including SIGMOD, VLDB, CIDR, CSCW, and UIST. She is a recipient of the NDSEG Fellowship and co-organizes the DEEM workshop at SIGMOD, which focuses on data management in end-to-end machine learning. Prior to her PhD, Shreya worked as an ML engineer and completed her undergraduate degree in computer science at Stanford University. More information can be found at https://www.sh-reya.com/.

Haofen Wang is a Distinguished Researcher and Ph.D. supervisor under the “100 People Plan” at Tongji University. He is one of the initiators of OpenKG, the world’s largest alliance for Chinese open knowledge graphs. He has participated in and led several national-level AI-related projects, published over 100 high-level papers in the AI field with more than 3,900 citations and an H-index of 29. He developed the world’s first interactive virtual idol—”Amber Xuyan.” Additionally, the intelligent customer service robots he built have served over 1 billion users. Currently, he holds several social positions including Vice Chairman of the Terminology Committee of the Chinese Computer Federation (CCF), Secretary-General of the Natural Language Processing Society, Director of the Chinese Information Society of China, Executive Committee member of the Large Model Committee, Deputy Secretary-General of the Language and Knowledge Computing Committee, and Deputy Director of the Natural Language Processing Committee of the Shanghai Computer Society. More information can be found at https://scholar.google.com/citations?user=1FhdXpsAAAAJ&hl=en.

Jianguo Wang is an Assistant Professor of Computer Science at Purdue University. He obtained his Ph.D. from the University of California, San Diego. He has worked or interned at Zilliz, Amazon AWS, Microsoft Research, Oracle, and Samsung on various database systems. His current research interests include database systems for the cloud and large language models, especially disaggregated databases and vector databases. He regularly publishes and serves as a program committee member at premier database conferences such as SIGMOD, VLDB, and ICDE. He also served as a panel moderator for the VLDB’24 panel on vector databases. He is a recipient of the NSF CAREER Award. More information can be found at https://cs.purdue.edu/homes/csjgwang/.

Biographies of Moderators

Arijit Khan is an IEEE senior member, an ACM distinguished speaker, and an associate professor in the Department of Computer Science, Aalborg University, Denmark. He earned his Ph.D. from UC Santa Barbara, USA and did a postdoc at ETH Zurich, Switzerland. He has been an assistant professor at NTU Singapore. Arijit is the recipient of the IBM Ph.D. Fellowship (2012-13), a VLDB Distinguished Reviewer award (2022), and a SIGMOD Distinguished PC award (2024). He published over 80 papers in premier data management and mining venues, e.g., SIGMOD, VLDB, TKDE, ICDE, WWW, SDM, EDBT, CIKM, WSDM, and TKDD. Arijit co-presented tutorials on graph queries, systems, applications, and machine learning at VLDB, ICDE, CIKM, and DSAA; and is serving in the program committee/ senior program committee of KDD, SIGMOD, VLDB, ICDE, ICDM, EDBT, SDM, CIKM, AAAI, WWW, and an associate editor of TKDE and TKDD. Arijit wrote a book on uncertain graphs in the Morgan & Claypool’s Synthesis Lectures on Data Management and contributed invited chapters and articles on big graphs querying and mining in the ACM SIGMOD blog and in the Springer Encyclopedia of Big Data Technologies. More information at https://homes.cs.aau.dk/~Arijit/index.html.

Tianxing Wu is an associate professor at the School of Computer Science and Engineering of Southeast University, China. He is one of the main contributors to building the Chinese large-scale encyclopedic knowledge graph Zhishi.me and the schema knowledge graph Linked Open Schema. He was awarded the 2019 Excellent Ph.D. Degree Dissertation of the Jiangsu Computer Society, the 2020 Excellent Ph.D. Degree Dissertation of Southeast University, and the CCKS 2022 Best Paper Award. His research interests include knowledge graphs, knowledge representation and reasoning, and data mining. He has published over 50 papers in top-tier conferences and journals, such as ICDE, AAAI, IJCAI, ECAI, ISWC, TKDE, TKDD, JWS, and WWWJ. He is an editorial board member of the International Journal on Semantic Web and Information Systems, Data Intelligence, etc. He has also served as a (senior) program committee member of AAAI, IJCAI, ACL, The Web Conference, EMNLP, ISWC, ECAI, etc. More information at https://tianxing-wu.github.io.

Xi Chen is the director of the algorithm team of the Platform and Content Group, Tencent. He received his Ph.D. degree from Zhejiang University and has achieved strong results in many KG and LLM competitions, such as the CCKS 2020 NER task, the CHIP 2020 relation extraction task, the SuperGLUE challenge, SemEval, and so on. He has published over 40 papers in top-tier conferences and journals, such as ACL, EMNLP, NeurIPS, WWW, AAAI, IJCAI, TKDE, and JWS. He was awarded the PAKDD 2021 Best Paper Award. More information at https://scholar.google.com/citations?user=qy0QX0MAAAAJ&hl=zh-CN.

