Kuzu
Kùzu is an embeddable, scalable, extremely fast graph database. It is released under the permissive MIT license, and its source code is available here.
Key characteristics of Kùzu:
- Performance and scalability: Implements modern, state-of-the-art join algorithms for graphs.
- Usability: Very easy to set up and get started with, as there are no servers (embedded architecture).
- Interoperability: Can conveniently scan and copy data from external columnar formats, CSV, JSON and relational databases.
- Structured property graph model: Implements the property graph model, with added structure.
- Cypher support: Allows convenient querying of the graph in Cypher, a declarative query language.
Get started with Kùzu by visiting its documentation.
Setting up
Kùzu is an embedded database (it runs in-process), so there are no servers to manage. Install the following dependencies to get started:
pip install -U langchain-kuzu langchain-openai langchain-experimental
This installs Kùzu and its LangChain integration (langchain-kuzu), the OpenAI integration (langchain-openai) so that we can use OpenAI's LLMs, and langchain-experimental, which provides the LLMGraphTransformer used below. If you want to use another LLM provider, install that provider's LangChain integration package instead.
Here's how you would first create a Kùzu database on your local machine and connect to it:
import kuzu
db = kuzu.Database("test_db")
conn = kuzu.Connection(db)
Create KuzuGraph
Kùzu's integration with LangChain makes it convenient to create and update graphs from unstructured text, and to query graphs via a Text2Cypher pipeline built on LangChain's LLM chains. To begin, we create a KuzuGraph object by passing the database object created above to the KuzuGraph constructor.
from langchain_kuzu.graphs.kuzu_graph import KuzuGraph
graph = KuzuGraph(db, allow_dangerous_requests=True)
Say we want to transform the following text into a graph:
text = "Tim Cook is the CEO of Apple. Apple has its headquarters in California."
We will use LLMGraphTransformer, which prompts an LLM to extract nodes and relationships from the text. To make the graph more useful, we define the following schema so that the LLM only extracts nodes and relationships that match it.
# Define schema
allowed_nodes = ["Person", "Company", "Location"]
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]
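To see what such a schema buys us, here is a minimal, stdlib-only sketch that uses allowed node labels and (source, relationship, target) patterns to reject off-schema triples. It is not LLMGraphTransformer's implementation. The flat tuple layout, the sample triples, and the filter_triples helper are all hypothetical and exist only for illustration.

```python
from typing import List, Tuple

# Hypothetical flat triple layout: (source_id, source_type, rel_type, target_id, target_type)
Triple = Tuple[str, str, str, str, str]

allowed_nodes = ["Person", "Company", "Location"]
allowed_relationships = [
    ("Person", "IS_CEO_OF", "Company"),
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
]


def filter_triples(triples: List[Triple]) -> List[Triple]:
    """Keep only triples whose node types and (source, rel, target) pattern are allowed."""
    node_types = set(allowed_nodes)
    patterns = set(allowed_relationships)
    kept = []
    for src_id, src_type, rel, tgt_id, tgt_type in triples:
        # Drop triples that mention a node label outside the schema
        if src_type not in node_types or tgt_type not in node_types:
            continue
        # Drop triples whose direction or relationship type is not declared
        if (src_type, rel, tgt_type) not in patterns:
            continue
        kept.append((src_id, src_type, rel, tgt_id, tgt_type))
    return kept


# Hypothetical raw extraction: the last two triples violate the schema.
raw = [
    ("Tim Cook", "Person", "IS_CEO_OF", "Apple", "Company"),
    ("Apple", "Company", "HAS_HEADQUARTERS_IN", "California", "Location"),
    ("Apple", "Company", "IS_CEO_OF", "Tim Cook", "Person"),  # wrong direction
    ("Apple", "Company", "MAKES", "iPhone", "Product"),  # undeclared label and type
]
print(filter_triples(raw))
```

Checking the full (source, relationship, target) pattern, and not only the relationship type, is what rejects the reversed IS_CEO_OF triple.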
The LLMGraphTransformer class provides a convenient way to convert the text into a list of graph documents.
from langchain_core.documents import Document
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI
# Define the LLMGraphTransformer
# (ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable)
llm_transformer = LLMGraphTransformer(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
    allowed_nodes=allowed_nodes,
    allowed_relationships=allowed_relationships,
)
# Wrap the raw text in a Document and extract nodes and relationships from it
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)
graph_documents[:2]
[GraphDocument(nodes=[Node(id='Tim Cook', type='Person', properties={}), Node(id='Apple', type='Company', properties={}), Node(id='California', type='Location', properties={})], relationships=[Relationship(source=Node(id='Tim Cook', type='Person', properties={}), target=Node(id='Apple', type='Company', properties={}), type='IS_CEO_OF', properties={}), Relationship(source=Node(id='Apple', type='Company', properties={}), target=Node(id='California', type='Location', properties={}), type='HAS_HEADQUARTERS_IN', properties={})], source=Document(metadata={}, page_content='Tim Cook is the CEO of Apple. Apple has its headquarters in California.'))]
We can then call the add_graph_documents method of the KuzuGraph object defined above to ingest the graph documents into the Kùzu database. The include_source argument is set to True so that we also create relationships between each entity node and the source document it came from.
# Add the graph documents to the graph
graph.add_graph_documents(
    graph_documents,
    include_source=True,
)
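Kùzu stores nodes and relationships in typed node and relationship tables. The schema printed later in this guide shows one node table per entity label, plus a Chunk node table and one MENTIONS_Chunk_<Label> relationship table per label when include_source=True. The stdlib sketch below approximates that layout from label triples. The expected_tables helper is hypothetical, and the table naming is copied from that printed schema rather than from the code inside add_graph_documents.

```python
from typing import Iterable, Set, Tuple


def expected_tables(
    triples: Iterable[Tuple[str, str, str]],  # (source_label, rel_type, target_label)
    include_source: bool,
) -> Tuple[Set[str], Set[str]]:
    """Return the (node_tables, rel_tables) implied by a list of label triples."""
    node_tables: Set[str] = set()
    rel_tables: Set[str] = set()
    for src_label, rel_type, tgt_label in triples:
        # Each entity label becomes a node table; each relationship type a rel table
        node_tables.update({src_label, tgt_label})
        rel_tables.add(rel_type)
    if include_source:
        entity_labels = set(node_tables)
        # A node table for the source text chunks...
        node_tables.add("Chunk")
        # ...and one MENTIONS relationship table from Chunk to each entity label
        for label in entity_labels:
            rel_tables.add(f"MENTIONS_Chunk_{label}")
    return node_tables, rel_tables


nodes, rels = expected_tables(
    [("Person", "IS_CEO_OF", "Company"), ("Company", "HAS_HEADQUARTERS_IN", "Location")],
    include_source=True,
)
print(sorted(nodes))  # ['Chunk', 'Company', 'Location', 'Person']
print(sorted(rels))
```

With include_source=False the Chunk table and the MENTIONS tables disappear. Only the extracted entities and their relationships would be stored.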
Creating KuzuQAChain
To query the graph via a Text2Cypher pipeline, we define a KuzuQAChain object. We can then invoke the chain with a question, and the chain runs its generated Cypher against the existing database stored in the test_db directory created above.
from langchain_kuzu.chains.graph_qa.kuzu import KuzuQAChain
# Create the KuzuQAChain with verbosity enabled to see the generated Cypher queries
# (ChatOpenAI reads the API key from the OPENAI_API_KEY environment variable)
chain = KuzuQAChain.from_llm(
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0.3),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)
Note that we set a temperature that's slightly higher than zero to avoid the LLM being overly concise in its response.
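Conceptually, a Text2Cypher chain wraps one database call between two LLM steps. First it generates Cypher from the question plus the graph schema. Then it executes the query, and finally it phrases an answer from the returned rows. The sketch below wires those steps together with plain callables. The function names, prompt wording, and stub components are hypothetical stand-ins for the real LLMs and the Kùzu connection used by KuzuQAChain, not its actual internals.

```python
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


def text2cypher_qa(
    question: str,
    schema: str,
    generate_cypher: Callable[[str], str],  # stands in for the Cypher-generation LLM
    run_query: Callable[[str], Rows],  # stands in for executing Cypher against Kùzu
    generate_answer: Callable[[str], str],  # stands in for the answer-generation LLM
) -> Dict[str, str]:
    # Step 1: ask for a Cypher query, giving the model the current graph schema.
    cypher_prompt = f"Schema:\n{schema}\n\nWrite a Cypher query that answers: {question}"
    cypher = generate_cypher(cypher_prompt).strip()
    # Step 2: run the query; the returned rows become the context for the answer.
    context = run_query(cypher)
    # Step 3: phrase a natural-language answer from the question and the rows.
    qa_prompt = f"Question: {question}\nContext: {context}\nAnswer:"
    return {"query": question, "result": generate_answer(qa_prompt)}


# Hypothetical stubs so the sketch runs without an API key or a database.
def fake_cypher_llm(prompt: str) -> str:
    return "MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p.id"


def fake_run_query(cypher: str) -> Rows:
    return [{"p.id": "Tim Cook"}]


def fake_qa_llm(prompt: str) -> str:
    # The stub only checks whether the name appears in the prompt's context.
    name = "Tim Cook" if "Tim Cook" in prompt else "Unknown"
    return f"{name} is the CEO of Apple."


result = text2cypher_qa(
    "Who is the CEO of Apple?",
    "(:Person)-[:IS_CEO_OF]->(:Company)",
    fake_cypher_llm,
    fake_run_query,
    fake_qa_llm,
)
print(result)  # {'query': 'Who is the CEO of Apple?', 'result': 'Tim Cook is the CEO of Apple.'}
```

The Cypher-generation step and the answer step are separate calls. Keeping them separate is what lets a chain use different models for each stage.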
Let's ask some questions using the QA chain.
chain.invoke("Who is the CEO of Apple?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p
Full Context:
[{'p': {'_id': {'offset': 0, 'table': 1}, '_label': 'Person', 'id': 'Tim Cook', 'type': 'entity'}}]
> Finished chain.
{'query': 'Who is the CEO of Apple?',
'result': 'Tim Cook is the CEO of Apple.'}
chain.invoke("Where is Apple headquartered?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (c:Company {id: 'Apple'})-[:HAS_HEADQUARTERS_IN]->(l:Location) RETURN l
Full Context:
[{'l': {'_id': {'offset': 0, 'table': 2}, '_label': 'Location', 'id': 'California', 'type': 'entity'}}]
> Finished chain.
{'query': 'Where is Apple headquartered?',
'result': 'Apple is headquartered in California.'}
Refresh graph schema
If you mutate or update the graph, you can inspect the refreshed schema information that the Text2Cypher chain uses to generate Cypher statements. You don't need to call refresh_schema() manually each time, because it's called automatically when you invoke the chain.
graph.refresh_schema()
print(graph.get_schema)
Node properties: [{'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Person'}, {'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Location'}, {'properties': [('id', 'STRING'), ('text', 'STRING'), ('type', 'STRING')], 'label': 'Chunk'}, {'properties': [('id', 'STRING'), ('type', 'STRING')], 'label': 'Company'}]
Relationships properties: [{'properties': [], 'label': 'HAS_HEADQUARTERS_IN'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Person'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Location'}, {'properties': [], 'label': 'IS_CEO_OF'}, {'properties': [('label', 'STRING'), ('triplet_source_id', 'STRING')], 'label': 'MENTIONS_Chunk_Company'}]
Relationships: ['(:Company)-[:HAS_HEADQUARTERS_IN]->(:Location)', '(:Chunk)-[:MENTIONS_Chunk_Person]->(:Person)', '(:Chunk)-[:MENTIONS_Chunk_Location]->(:Location)', '(:Person)-[:IS_CEO_OF]->(:Company)', '(:Chunk)-[:MENTIONS_Chunk_Company]->(:Company)']
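The Relationships line above renders each relationship as a (:Source)-[:TYPE]->(:Target) pattern. That compact form is part of the schema text the Text2Cypher chain works from when generating Cypher. Here is a minimal sketch that produces the same notation from (source, type, target) label triples. The format_relationships helper is hypothetical and only reproduces the string layout shown above.

```python
from typing import List, Tuple


def format_relationships(patterns: List[Tuple[str, str, str]]) -> List[str]:
    """Render (source_label, rel_type, target_label) triples as Cypher-style patterns."""
    return [f"(:{src})-[:{rel}]->(:{dst})" for src, rel, dst in patterns]


patterns = [
    ("Company", "HAS_HEADQUARTERS_IN", "Location"),
    ("Person", "IS_CEO_OF", "Company"),
]
print("Relationships:", format_relationships(patterns))
# Relationships: ['(:Company)-[:HAS_HEADQUARTERS_IN]->(:Location)', '(:Person)-[:IS_CEO_OF]->(:Company)']
```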
Use separate LLMs for Cypher and answer generation
You can specify cypher_llm and qa_llm separately to use different LLMs for Cypher generation and answer generation.
chain = KuzuQAChain.from_llm(
    cypher_llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),
    qa_llm=ChatOpenAI(temperature=0, model="gpt-4"),
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,
)
chain.invoke("Who is the CEO of Apple?")
> Entering new KuzuQAChain chain...
Generated Cypher:
MATCH (p:Person)-[:IS_CEO_OF]->(c:Company {id: 'Apple'}) RETURN p.id, p.type
Full Context:
[{'p.id': 'Tim Cook', 'p.type': 'entity'}]
> Finished chain.
{'query': 'Who is the CEO of Apple?',
'result': 'Tim Cook is the CEO of Apple.'}