Large language models (LLMs) have revolutionized the way people interact with software applications. You can use natural language commands to instruct AI agents to perform specific tasks.

One of the most useful LLM applications is interacting with a database using natural language, as it allows non-technical users to issue commands and retrieve data from databases, which would otherwise require some database expertise.

In this tutorial, you’ll see how to develop a MySQL AI agent using the OpenAI GPT-4o model that allows you to interact with a MySQL database using natural language. We will utilize the LangGraph orchestration framework to develop the AI agent.

Installing and Importing Required Libraries

You need to install the following libraries to run scripts in this tutorial.

#LangGraph + LangChain + OpenAI + MySQL driver + SQLAlchemy
%pip install -U langgraph langgraph-prebuilt langchain_community "langchain[openai]" pymysql sqlalchemy

The script below imports the required libraries.

from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langgraph.prebuilt import create_react_agent
from IPython.display import Image, display
from langchain_core.runnables.graph import CurveStyle, MermaidDrawMethod, NodeStyles

Creating a MySQL Connection and Retrieving DB Information

To execute sample codes, we store the dummy employees database on a local MySQL server.

The following script uses the LangChain SQLDatabase utility to retrieve information from the MySQL database.

We will pass this information on to our AI agent, which will then generate accurate MySQL queries based on natural language commands.

##database link
##https://github.com/datacharmer/test_db

db_uri = "mysql+pymysql://root:abcd123@localhost:3306/employees"

db = SQLDatabase.from_uri(db_uri)
print("Dialect:", db.dialect)
print("Tables :", db.get_usable_table_names()[:6], "...")

Output:

Dialect: mysql
Tables : ['departments', 'dept_emp', 'dept_manager', 'employees', 'salaries', 'titles'] ...

Code More, Distract Less: Support Our Ad-Free Site

You might have noticed we removed ads from our site - we hope this enhances your learning experience. To help sustain this, please take a look at our Python Developer Kit and our comprehensive cheat sheets. Each purchase directly supports this site, ensuring we can continue to offer you quality, distraction-free tutorials.


Importing LangChain Database Tools

LangChain’s SQLDatabaseToolkit contains tools that allow you to generate a SQL query from natural language, verify the query’s accuracy, retrieve the database schema, and list database tables. These tools are used by LangGraph AI agents while retrieving information from a database.

The following script imports these tools into your application and prints their descriptions. Notice the placeholder below where you’ll need to enter your own OpenAI API key.

openai_api_key = "YOUR_OPENAI_API_KEY"
llm = ChatOpenAI(api_key = openai_api_key,
             temperature = 0,
            model_name = "gpt-4o")
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

tools = toolkit.get_tools()

for tool in tools:
    print(f"{tool.name}: {tool.description}\n")

Output:

langchain SQL tools description

Creating MySQL Agent using LangGraph Default ReAct Agent

Creating an agent in LangGraph is straightforward. You can use LangGraph’s default ReAct agent, which, given an input and a prompt, automatically selects and executes various tools to generate the best possible response.

The following script creates a MySQL ReAct agent in LangGraph. The prompt informs the Agent of the type of input it will receive and the steps it should take to generate a response.

from langgraph.prebuilt import create_react_agent

system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

You MUST double check your query before executing it. If you get an error while
executing a query, rewrite the query and try again.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the
database.

To start you should ALWAYS look at the tables in the database to see what you
can query. Do NOT skip this step.

Then you should query the schema of the most relevant tables.
""".format(
    dialect=db.dialect,
    top_k=5,
)

agent = create_react_agent(
    llm,
    tools,
    prompt=system_prompt,
)

display(Image(agent.get_graph().draw_mermaid_png()))

Output:

MySQL langgraph default react agent

Let’s test the AI agent by asking a question. The typo (employes) in the question below is intentional. The output will show that our MySQL agent is capable of comprehending natural language queries, even if they contain minor errors.

question = "Give me the names of the top three employes with highest salaries"

for step in Agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

Output:

The following output displays the flow of messages and corresponding tool calls executed by the MySQL agent to generate the final response.

The agent first called the sql_db_list_tables tools as mentioned in the prompt, which returned all the table names. It then called the sql_db_schema tool to retrieve the scheme of the relevant tables, which are employees, and salaries tables in our case. The agent generates a query and then calls the sql_db_query_checker tool to see if the query is correct. Finally, it calls the sql_db_query tool to execute the query. You can see the names of the top three employees with highest salaries.

================================ Human Message =================================

Give me the names of the top three employes with highest salaries
================================== Ai Message ==================================
Tool Calls:
  sql_db_list_tables (call_bzgBY6HeV69vEhUWyRdRjHHU)
 Call ID: call_bzgBY6HeV69vEhUWyRdRjHHU
  Args:
================================= Tool Message =================================
Name: sql_db_list_tables

departments, dept_emp, dept_manager, employees, salaries, titles
================================== Ai Message ==================================
Tool Calls:
  sql_db_schema (call_TcCnyCqrR6elKtSSmm2WhsNc)
 Call ID: call_TcCnyCqrR6elKtSSmm2WhsNc
  Args:
    table_names: employees, salaries
================================= Tool Message =================================
Name: sql_db_schema


CREATE TABLE employees (
	emp_no INTEGER NOT NULL,
	birth_date DATE NOT NULL,
	first_name VARCHAR(14) NOT NULL,
	last_name VARCHAR(16) NOT NULL,
	gender ENUM('M','F') NOT NULL,
	hire_date DATE NOT NULL,
	PRIMARY KEY (emp_no)
)DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_0900_ai_ci ENGINE=InnoDB

/*
3 rows from employees table:
emp_no	birth_date	first_name	last_name	gender	hire_date
10001	1953-09-02	Georgi	Facello	M	1986-06-26
10002	1964-06-02	Bezalel	Simmel	F	1985-11-21
10003	1959-12-03	Parto	Bamford	M	1986-08-28
*/


CREATE TABLE salaries (
	emp_no INTEGER NOT NULL,
	salary INTEGER NOT NULL,
	from_date DATE NOT NULL,
	to_date DATE NOT NULL,
	PRIMARY KEY (emp_no, from_date),
	CONSTRAINT salaries_ibfk_1 FOREIGN KEY(emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE
)DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_0900_ai_ci ENGINE=InnoDB

/*
3 rows from salaries table:
emp_no	salary	from_date	to_date
10001	60117	1986-06-26	1987-06-26
10001	62102	1987-06-26	1988-06-25
10001	66074	1988-06-25	1989-06-25
*/
================================== Ai Message ==================================
Tool Calls:
  sql_db_query_checker (call_F2w7ubGL6p8yNjyzuxAps09Z)
 Call ID: call_F2w7ubGL6p8yNjyzuxAps09Z
  Args:
    query: SELECT e.first_name, e.last_name, s.salary FROM employees e JOIN salaries s ON e.emp_no = s.emp_no ORDER BY s.salary DESC LIMIT 3;
================================= Tool Message =================================
Name: sql_db_query_checker

```sql
SELECT e.first_name, e.last_name, s.salary FROM employees e JOIN salaries s ON e.emp_no = s.emp_no ORDER BY s.salary DESC LIMIT 3;
```
================================== Ai Message ==================================
Tool Calls:
  sql_db_query (call_5GYtmevKF2TyQJXcGfEH8ZzM)
 Call ID: call_5GYtmevKF2TyQJXcGfEH8ZzM
  Args:
    query: SELECT e.first_name, e.last_name, s.salary FROM employees e JOIN salaries s ON e.emp_no = s.emp_no ORDER BY s.salary DESC LIMIT 3;
================================= Tool Message =================================
Name: sql_db_query

[('Tokuyasu', 'Pesch', 158220), ('Tokuyasu', 'Pesch', 157821), ('Honesty', 'Mukaidono', 156286)]
================================== Ai Message ==================================

The top three employees with the highest salaries are:

1. Tokuyasu Pesch with a salary of 158,220
2. Tokuyasu Pesch with a salary of 157,821
3. Honesty Mukaidono with a salary of 156,286

The LangGraph ReAct agent provides the thought-action-observation loop out of the box. However, to have more control over tool execution, you can customize a LangGraph AI agent, as shown in the following section.


Code More, Distract Less: Support Our Ad-Free Site

You might have noticed we removed ads from our site - we hope this enhances your learning experience. To help sustain this, please take a look at our Python Developer Kit and our comprehensive cheat sheets. Each purchase directly supports this site, ensuring we can continue to offer you quality, distraction-free tutorials.


Creating a Customized MySQL Agent with LangGraph

To create a customized MySQL agent, we will first define two high-level tools: get_schema_tool, which internally calls the sql_db_schema tool, and run_query_tool, which calls the sql_db_query tool from the SQLDatabaseToolkit.

from typing import Literal
from langchain_core.messages import AIMessage
from langchain_core.runnables import RunnableConfig
from langgraph.graph import END, START, MessagesState, StateGraph
from langgraph.prebuilt import ToolNode
get_schema_tool = next(tool for tool in tools if tool.name == "sql_db_schema")
get_schema_node = ToolNode([get_schema_tool], name="get_schema")

run_query_tool = next(tool for tool in tools if tool.name == "sql_db_query")
run_query_node = ToolNode([run_query_tool], name="run_query")

We want to force the agent to always call the sql_db_list_tables tool first. To achieve this, we define the list_tables method, which we will call from the first node in our graph.

def list_tables(state: MessagesState):
    tool_call = {
        "name": "sql_db_list_tables",
        "args": {},
        "id": "abc123",
        "type": "tool_call",
    }
    tool_call_message = AIMessage(content="", tool_calls=[tool_call])

    list_tables_tool = next(tool for tool in tools if tool.name == "sql_db_list_tables")
    tool_message = list_tables_tool.invoke(tool_call)
    response = AIMessage(f"Available tables: {tool_message.content}")

    return {"messages": [tool_call_message, tool_message, response]}

Next, we define the call_get_schema function, which calls the get_schema_tool that we defined earlier. This will be the second node in our graph.

def call_get_schema(state: MessagesState):

    llm_with_tools = llm.bind_tools([get_schema_tool], tool_choice="any")
    response = llm_with_tools.invoke(state["messages"])

    return {"messages": [response]}

Finally we define the prompts for nodes that generate a query and check if the query is valid.

generate_query_system_prompt = """
You are an agent designed to interact with a SQL database.
Given an input question, create a syntactically correct {dialect} query to run,
then look at the results of the query and return the answer. Unless the user
specifies a specific number of examples they wish to obtain, always limit your
query to at most {top_k} results.

You can order the results by a relevant column to return the most interesting
examples in the database. Never query for all the columns from a specific table,
only ask for the relevant columns given the question.

DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
""".format(
    dialect=db.dialect,
    top_k=5,
)


def generate_query(state: MessagesState):
    system_message = {
        "role": "system",
        "content": generate_query_system_prompt,
    }

    llm_with_tools = llm.bind_tools([run_query_tool])
    response = llm_with_tools.invoke([system_message] + state["messages"])

    return {"messages": [response]}
check_query_system_prompt = """
You are a SQL expert with a strong attention to detail.
Double check the {dialect} query for common mistakes, including:
- Using NOT IN with NULL values
- Using UNION when UNION ALL should have been used
- Using BETWEEN for exclusive ranges
- Data type mismatch in predicates
- Properly quoting identifiers
- Using the correct number of arguments for functions
- Casting to the correct data type
- Using the proper columns for joins

If there are any of the above mistakes, rewrite the query. If there are no mistakes,
just reproduce the original query.

You will call the appropriate tool to execute the query after running this check.
""".format(dialect=db.dialect)

def check_query(state: MessagesState):
    system_message = {
        "role": "system",
        "content": check_query_system_prompt,
    }

    # Generate an artificial user message to check
    tool_call = state["messages"][-1].tool_calls[0]
    user_message = {"role": "user", "content": tool_call["args"]["query"]}
    llm_with_tools = llm.bind_tools([run_query_tool], tool_choice="any")
    response = llm_with_tools.invoke([system_message, user_message])
    response.id = state["messages"][-1].id

    return {"messages": [response]}

We also define the should_continue function, which checks if a query is accurate or not. If it is accurate, it is sent to the run_query node; otherwise, the query is generated again.

def should_continue(state: MessagesState) -> Literal[END, "check_query"]:
    messages = state["messages"]
    last_message = messages[-1]
    if not last_message.tool_calls:
        return END
    else:
        return "check_query"

The next step is to define the LangGraph with the node functions we defined. The graph executes in sequence, following the order of the nodes in the graph.

builder = StateGraph(MessagesState)
builder.add_node(list_tables)
builder.add_node(call_get_schema)
builder.add_node(get_schema_node, "get_schema")
builder.add_node(generate_query)
builder.add_node(check_query)
builder.add_node(run_query_node, "run_query")

builder.add_edge(START, "list_tables")
builder.add_edge("list_tables", "call_get_schema")
builder.add_edge("call_get_schema", "get_schema")
builder.add_edge("get_schema", "generate_query")
builder.add_conditional_edges(
    "generate_query",
    should_continue,
)
builder.add_edge("check_query", "run_query")
builder.add_edge("run_query", "generate_query")

agent = builder.compile()

display(Image(agent.get_graph().draw_mermaid_png()))

Output:

MySQL langgraph custom react agent

The above script shows our final AI agent graph.

Let’s ask a question and see the outputs.

question = "Give me the names of the top three employes with highest salaries"

for step in Agent.stream(
    {"messages": [{"role": "user", "content": question}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()

Output:

The output below shows that the graph executes the nodes in the same order as they are added in the graph.

================================ Human Message =================================

Give me the names of the top three employes with highest salaries
================================== Ai Message ==================================

Available tables: departments, dept_emp, dept_manager, employees, salaries, titles
================================== Ai Message ==================================
Tool Calls:
  sql_db_schema (call_Q9g0WWat9Ti4frZXnNVYFBpq)
 Call ID: call_Q9g0WWat9Ti4frZXnNVYFBpq
  Args:
    table_names: employees, salaries
================================= Tool Message =================================
Name: sql_db_schema


CREATE TABLE employees (
	emp_no INTEGER NOT NULL,
	birth_date DATE NOT NULL,
	first_name VARCHAR(14) NOT NULL,
	last_name VARCHAR(16) NOT NULL,
	gender ENUM('M','F') NOT NULL,
	hire_date DATE NOT NULL,
	PRIMARY KEY (emp_no)
)DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_0900_ai_ci ENGINE=InnoDB

/*
3 rows from employees table:
emp_no	birth_date	first_name	last_name	gender	hire_date
10001	1953-09-02	Georgi	Facello	M	1986-06-26
10002	1964-06-02	Bezalel	Simmel	F	1985-11-21
10003	1959-12-03	Parto	Bamford	M	1986-08-28
*/


CREATE TABLE salaries (
	emp_no INTEGER NOT NULL,
	salary INTEGER NOT NULL,
	from_date DATE NOT NULL,
	to_date DATE NOT NULL,
	PRIMARY KEY (emp_no, from_date),
	CONSTRAINT salaries_ibfk_1 FOREIGN KEY(emp_no) REFERENCES employees (emp_no) ON DELETE CASCADE
)DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_0900_ai_ci ENGINE=InnoDB

/*
3 rows from salaries table:
emp_no	salary	from_date	to_date
10001	60117	1986-06-26	1987-06-26
10001	62102	1987-06-26	1988-06-25
10001	66074	1988-06-25	1989-06-25
*/
================================== Ai Message ==================================
Tool Calls:
  sql_db_query (call_OEqSCKEeVU5NRM7VXdhXkVVp)
 Call ID: call_OEqSCKEeVU5NRM7VXdhXkVVp
  Args:
    query: SELECT e.first_name, e.last_name, s.salary
FROM employees e
JOIN salaries s ON e.emp_no = s.emp_no
ORDER BY s.salary DESC
LIMIT 3;
================================== Ai Message ==================================
Tool Calls:
  sql_db_query (call_v0sSVGHXcF81WbRCFTxmGRND)
 Call ID: call_v0sSVGHXcF81WbRCFTxmGRND
  Args:
    query: SELECT e.first_name, e.last_name, s.salary
FROM employees e
JOIN salaries s ON e.emp_no = s.emp_no
ORDER BY s.salary DESC
LIMIT 3;
================================= Tool Message =================================
Name: sql_db_query

[('Tokuyasu', 'Pesch', 158220), ('Tokuyasu', 'Pesch', 157821), ('Honesty', 'Mukaidono', 156286)]
================================== Ai Message ==================================

The top three employees with the highest salaries are:

1. Tokuyasu Pesch with a salary of 158,220
2. Tokuyasu Pesch with a salary of 157,821
3. Honesty Mukaidono with a salary of 156,286

The final output displays the same response as was observed with the ReAct agent. However, in this case, you have much more control over the execution of steps in the Graph.


Code More, Distract Less: Support Our Ad-Free Site

You might have noticed we removed ads from our site - we hope this enhances your learning experience. To help sustain this, please take a look at our Python Developer Kit and our comprehensive cheat sheets. Each purchase directly supports this site, ensuring we can continue to offer you quality, distraction-free tutorials.