Tomáš Repčík - 7. 9. 2025

Simple yet Powerful Local AI Setup with Pydantic AI

A guide to setting up a local AI environment using Pydantic AI

Pydantic AI setup

I program a lot of mobile apps, and most of them are in languages like Kotlin, Swift, or Dart.

However, I often find myself needing to create backend services or scripts to support these apps.

For these tasks, I prefer Python because of its simplicity and its vast ecosystem of libraries, which covers basically everything.

But since I rely heavily on type safety in my mobile development, I want to maintain that same level of type safety in my Python code.

That is when I discovered Pydantic, a library that allows me to define data models with type annotations, similar to what I use in Dart or Kotlin.

I just enjoy working with it because a lot of dynamic nonsense is taken care of.

I can:

- define data models with plain type annotations and get validation for free
- serialize and deserialize JSON without boilerplate
- catch bad data early, with clear validation errors
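
To make that concrete, here is a tiny sketch of my own (the User model is just an illustration, not part of this article's code):

from pydantic import BaseModel, ValidationError

class User(BaseModel):
    name: str
    age: int

print(User(name="Ada", age="36"))  # the string "36" is coerced to the int 36

try:
    User(name="Ada", age="not a number")
except ValidationError as e:
    print(e)  # a clear error pointing at the "age" field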

Now the Pydantic team has released Pydantic AI, which has reached version 1.x.x.

And guess what? It works with all the Pydantic features I enjoy using!

Suddenly, it all clicks together.

You can send Pydantic models to your FastAPI backend (FastAPI supports Pydantic natively), pass them to the AI model, and get the response back as a Pydantic model.

I want to show you how to set up the simplest local AI environment using Pydantic AI.

This is a very basic example, but you can build on top of it and create something more complex. The only limitations are your creativity and, if you run models locally, your computer's resources.

Setting up the environment

To install the required packages, you can use uv:

uv add fastapi uvicorn pydantic-ai pydantic pydantic-settings

If you do not use uv, you can use pip directly (but definitely consider switching to uv; it is much better):

pip install fastapi uvicorn pydantic-ai pydantic pydantic-settings

Also, install Ollama from ollama.com.

Browse available models on Ollama Models.

Then pull and run a model from the terminal:

ollama pull <model-name>
ollama run <model-name>
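
For example, for the small model this guide uses later:

ollama pull gemma3:270m
ollama run gemma3:270m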

Once everything is running, let’s move on to the code.

Settings

First, let’s create a configuration system using Pydantic Settings. This will manage all our application configuration with type safety and environment variable support.

Why? Managing configuration in a structured way helps prevent errors and makes it easier to understand the application’s behavior. Moreover, you can then set the configuration at runtime via environment variables or .env files at deployment.

Create a config.py file:

from pydantic import BaseModel, Field
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    ollama_base_url: str = Field(
        default="http://localhost:11434", description="Ollama API base URL"
    )
    ollama_model: str = Field(default="gemma3:270m", description="Ollama model to use")

    api_host: str = Field(default="0.0.0.0", description="API host")
    api_port: int = Field(default=8000, description="API port")
    api_reload: bool = Field(
        default=True, description="Enable auto-reload in development"
    )

    log_level: str = Field(default="info", description="Logging level")

    model_config = SettingsConfigDict(env_file=".env", env_file_encoding="utf-8")


settings = Settings()

I like to use the gemma3:270m model because it is one of the smallest models available and is fast for testing.

This configuration setup provides:

- type-checked settings with sensible defaults
- a human-readable description for every field
- automatic loading of environment variables and an optional .env file

You can override any setting by creating a .env file:

OLLAMA_MODEL=llama3.2
API_PORT=8080
LOG_LEVEL=debug

The magic is done by pydantic-settings, which loads the environment variables automatically. Pay attention to the SettingsConfigDict in the model_config.
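
To see the precedence for yourself (a small sketch of my own): environment variables win over the .env file, which wins over the defaults in the class:

import os

from config import Settings

os.environ["OLLAMA_MODEL"] = "llama3.2"
print(Settings().ollama_model)  # -> "llama3.2" instead of the default "gemma3:270m"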

Pydantic Models

For simplicity, the app will use only two models, one for input and one for output.

from pydantic import BaseModel, Field
from typing import Optional

class AIRequest(BaseModel):
    """Request body for the AI endpoint."""

    message: str = Field(
        ..., description="The user message to send to the AI", min_length=1
    )
    instructions: Optional[str] = Field(
        default=None, description="Optional instructions to guide the AI's behavior"
    )


class AIResponse(BaseModel):
    """Response body returned by the AI endpoint."""

    response: str = Field(..., description="The AI's response")
    model: str = Field(..., description="The model used to generate the response")

BaseModel is the core of Pydantic, and it provides all the features you need for data validation and serialization.
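
For example (a small sketch using the models above), validation and JSON round-trips come for free:

from pydantic import ValidationError

from models import AIRequest, AIResponse

req = AIRequest.model_validate_json('{"message": "Hi"}')
print(req.instructions)  # None, since the field is optional

try:
    AIRequest(message="")  # violates min_length=1
except ValidationError as e:
    print(e)

resp = AIResponse(response="Hello!", model="gemma3:270m")
print(resp.model_dump_json())  # ready to send over the wire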

Pydantic AI

Now, let’s create a simple AI service using Pydantic AI. This service will handle interactions with the Ollama model.

import logging
from typing import Optional

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.ollama import OllamaProvider

from config import settings

logger = logging.getLogger(__name__)

class AIService:
    def __init__(self):
        # 1. initialization
        self.model = OpenAIChatModel(
            model_name=settings.ollama_model,
            provider=OllamaProvider(base_url=f"{settings.ollama_base_url}/v1"),
        )

        # 2. create agent with instructions
        self.agent = Agent(
            model=self.model,
            instructions="You are a helpful AI assistant. Provide clear, concise, and accurate responses.",
        )

        logger.info(f"AI Service initialized with model: {settings.ollama_model}")

    async def generate_response(
        self, message: str, system_prompt: Optional[str] = None
    ) -> str:
        try:
            # 3. generate response
            agent = self.agent
            if system_prompt:
                agent = Agent(model=self.model, instructions=system_prompt)

            result = await agent.run(message)
            return result.output

        except Exception as e:
            logger.error(f"Failed to generate AI response: {str(e)}")
            raise Exception(f"AI service error: {str(e)}")

# 4. instantiate the service
ai_service = AIService()

The AI service with Pydantic AI is quite straightforward (as I like to say, it just works):

1. Initialization: OpenAIChatModel points at Ollama’s OpenAI-compatible /v1 endpoint.
2. Agent creation: the agent wraps the model with default instructions.
3. Response generation: run() sends the message; per-request instructions spin up a one-off agent.
4. A single service instance is created at import time and shared by the whole app.
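
You can try the service on its own before wiring up the API (this assumes Ollama is running and the configured model is pulled):

import asyncio

from ai_service import ai_service

async def main():
    reply = await ai_service.generate_response("Say hello in one sentence.")
    print(reply)

asyncio.run(main())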

Putting it together with FastAPI

With these two components in place, we can merge them together in a FastAPI application.

import logging
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException

from ai_service import ai_service
from config import settings
from models import AIRequest, AIResponse

# 1. logging config
logging.basicConfig(level=settings.log_level.upper())
logger = logging.getLogger(__name__)


# 2. FastAPI app with lifespan
@asynccontextmanager
async def lifespan(app: FastAPI):
    logger.info("Starting FastAPI application with Pydantic AI")
    logger.info(f"Using Ollama at: {settings.ollama_base_url}")
    logger.info(f"Using model: {settings.ollama_model}")
    yield
    logger.info("Shutting down FastAPI application")

# 3. create FastAPI app
app = FastAPI(
    title="FastAPI Pydantic AI Example",
    description="A simple FastAPI application with Pydantic AI integration",
    version="0.1.0",
    lifespan=lifespan,
)

# 4. basic root endpoint
@app.get("/")
async def root():
    return {
        "message": "FastAPI Pydantic AI Example",
        "version": "0.1.0",
        "docs": "/docs",
    }

# 5. AI endpoint
@app.post("/ai", response_model=AIResponse)
async def ai_response(request: AIRequest):
    try:
        response = await ai_service.generate_response(
            message=request.message, system_prompt=request.instructions
        )

        return AIResponse(response=response, model=settings.ollama_model)
    except Exception as e:
        logger.error(f"AI request failed: {str(e)}")
        raise HTTPException(
            status_code=500, detail=f"Failed to generate AI response: {str(e)}"
        )

# 6. run the app
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(
        "main:app",
        host=settings.api_host,
        port=settings.api_port,
        reload=settings.api_reload,
        log_level=settings.log_level,
    )

This FastAPI application includes:

1. Logging configured from the settings.
2. A lifespan handler that logs startup and shutdown.
3. The FastAPI app itself, with title, description, and version.
4. A basic root endpoint pointing to the interactive docs.
5. The /ai endpoint, which validates the request, calls the AI service, and returns an AIResponse.
6. A uvicorn entry point driven entirely by the settings.
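
To start the server, run the entry point (assuming the file is named main.py, as the uvicorn.run call suggests):

uv run python main.py
# or without uv:
python main.py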

Testing the API

You can test the API using curl or any API client like Postman or Bruno.

The example repository also includes a Bruno collection, which you can import and use to test the API directly.

It is enough to send a POST request to http://localhost:8000/ai with a JSON body like:

{
  "message": "Hello, how are you?",
  "instructions": "Respond in a friendly manner."
}
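
With curl, the same request looks like this (assuming the default host and port):

curl -X POST http://localhost:8000/ai \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello, how are you?", "instructions": "Respond in a friendly manner."}'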

Conclusion

This setup provides a simple yet powerful way to integrate AI capabilities into your FastAPI applications using Pydantic for type safety and data validation.

I will definitely explore more features of Pydantic AI and see how it can enhance my projects.

What to explore:

- structured outputs, so the agent returns your own Pydantic models directly
- tool calling, letting the agent invoke your Python functions
- streaming responses for a more interactive feel
- larger local models, or hosted providers behind the same Agent API
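
As a taste of the first point, here is a sketch of my own (not from the example repository) of structured output via output_type; note that a model as small as gemma3:270m may struggle with this, so a larger model might be needed:

from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIChatModel
from pydantic_ai.providers.ollama import OllamaProvider

class CityInfo(BaseModel):
    name: str
    country: str

model = OpenAIChatModel(
    model_name="gemma3:270m",
    provider=OllamaProvider(base_url="http://localhost:11434/v1"),
)

# the agent validates the model's answer against CityInfo before returning it
agent = Agent(model=model, output_type=CityInfo)

result = agent.run_sync("Tell me about Paris.")
print(result.output)  # a validated CityInfo instance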

Whole repository with code: GitHub - tomasrepcik/fastapi-pydantic-ai-example

Socials

Thanks for reading this article!

For more content like this, follow me here or on X or LinkedIn.
