Data validation protects the integrity of the data your software depends on. This guide explores data validation in Python using Cerberus, a lightweight, extensible library that validates data against rules and schemas you define. We’ll start with the basics: installation, schemas, and error handling. Then we’ll walk through its rule set with examples. Finally, we’ll look at Cerberus customization and show you how to extend its functionality. By the end, you’ll be able to validate data in your own Python projects.

Cerberus Basics

Cerberus is a Python library for data validation and normalization. It offers a straightforward way to define and enforce validation rules on data structures such as dictionaries or parsed JSON documents. In this section, we’ll explore the basics of Cerberus, including installation and how to get started.

Installation

You can install Cerberus using pip, Python’s package manager. Open your terminal or command prompt and run the following command:

pip install cerberus


Getting Started with Cerberus

Now that you have Cerberus installed, let’s dive into how to use it. There are three steps to validating your data with Cerberus:

  1. Define a validation schema and create a validator from it.
  2. Validate your data against the schema.
  3. Handle errors to correct your data.

Creating a Validator and Schema

To begin validating data with Cerberus, you’ll need to create a validator object and define a validation schema. The schema defines the rules for validating your data. Here’s a quick example:

from cerberus import Validator

# Define a validation schema
schema = {
    'name': {'type': 'string', 'minlength': 3, 'maxlength': 20},
    'age': {'type': 'integer', 'min': 18},
    'email': {'type': 'string', 'regex': r'^\S+@\S+\.\S+$'},
}

# Create a Validator instance with the schema
validator = Validator(schema)

The above code defines a validation schema in the form of a dictionary that includes rules for three fields: name, age, and email. For example, it specifies that name should be a string with a minimum length of 3 characters and a maximum length of 20 characters, age should be an integer greater than or equal to 18, and email should be a string that matches a specific regular expression pattern for email addresses.

After defining our schema, we create a validator instance called validator by passing the schema to it.

Validating Data

With the validator and schema in place, you can now use the validate() method to check if a data dictionary conforms to the schema:

data = {
    'name': 'Usman Malik',
    'age': 33,
    'email': 'johndoe@example.com'
}

# Validate the data
if validator.validate(data):
    print("Data is valid!")
else:
    print("Validation errors:")
    print(validator.errors)

Output:

Data is valid!

The validate() method returns True because every field in the dictionary satisfies its rules, so the script prints the success message.

Handling Validation Errors

If validation fails, validate() returns False and Cerberus stores a detailed report in the validator’s errors property, a dictionary that maps each failing field to a list of error messages. You can inspect it programmatically to give users meaningful feedback or to take other action based on the validation results. For example, the email value in the following dictionary is missing the @ symbol, so it fails the regex rule and the validation error is printed to the console.

data = {
    'name': 'Usman Malik',
    'age': 33,
    'email': 'johndoeexample.com'
}

# Validate the data
if validator.validate(data):
    print("Data is valid!")
else:
    print("Validation errors:")
    print(validator.errors)

Output:

Validation errors:
{'email': ["value does not match regex '^\\S+@\\S+\\.\\S+$'"]}

Cerberus Rules

In the previous section, we got acquainted with the basics of Cerberus, setting up a validator and defining a validation schema. Now, let’s dive deeper into some of the rules that Cerberus supports for validating and shaping your data.

Types of Rules

Cerberus offers a rich set of rules that you can apply to your validation schema. Here are some of the key rules you can use:

Type Rule

The type rule specifies the expected data type for a field. For example:

'score': {'type': 'integer'}

You have already seen the type rule in action: our first schema used it to restrict age to integers, while the min rule enforced the lower bound of 18.

Required Rule

The required rule makes a field mandatory. By default, every field in a schema is optional. For example:

'email': {'type': 'string', 'required': True}

Empty Rule

The empty rule specifies whether a field may hold an empty value, such as an empty string. Set it to False to reject empty values; they are allowed by default. For example:

'comments': {'type': 'string', 'empty': False}

Min and Max Rules

The min and max rules set inclusive lower and upper bounds on a value. For example:

'age': {'type': 'integer', 'min': 18, 'max': 99}

Regex Rule

The regex rule lets you validate a field using a regular expression pattern. For example:

'zipcode': {'type': 'string', 'regex': r'^\d{5}$'}


Examples Using Cerberus Rules

Let’s see an example that creates a schema with the above rules.

schema = {
    'name': {'type': 'string', 'minlength': 3, 'maxlength': 20, 'required': True},
    'age': {'type': 'integer', 'min': 18},
    'email': {'type': 'string', 'regex': r'^\S+@\S+\.\S+$', 'required': True},
    'score': {'type': 'integer', 'min': 0, 'max': 100},
    'comments': {'type': 'string', 'empty': False},
    'zipcode': {'type': 'string', 'regex': r'^\d{5}$'}
}

validator = Validator(schema)

The following script creates a dictionary that omits the required name and email fields and leaves comments empty, then validates it against the schema we just defined:

data = {
    'age': 25,
    'score': 85,
    'comments': '',
    'zipcode': '12345'
}

# Validate the data
if validator.validate(data):
    print("Data is valid!")
else:
    print("Validation errors:")
    print(validator.errors)

Output:

Validation errors:
{'comments': ['empty values not allowed'], 'email': ['required field'], 'name': ['required field']}

Let’s fix these problems by adding the missing name and email fields and supplying a non-empty comment.

data = {
    'name': 'Alice Johnson',
    'age': 25,
    'email': 'alice@example.com',
    'score': 85,
    'comments': 'Great job!',
    'zipcode': '12345'
}

# Validate the data
if validator.validate(data):
    print("Data is valid!")
else:
    print("Validation errors:")
    print(validator.errors)

Output:

Data is valid!

By now, you’re probably starting to see how this could be useful for checking user input in an application. Next, let’s look at how to customize Cerberus.

Cerberus Customization

Cerberus allows you to extend its capabilities with custom rules, types, methods, default setters, and more. Let’s see how to define custom rules and data types.

Custom Validation Rules

One of the most powerful features of Cerberus is the ability to define custom validation rules. Here’s an example of how to define and use a custom rule:

from cerberus import Validator

class MyValidator(Validator):
    def _validate_is_even(self, is_even, field, value):
        """Check that the value is even when the rule is set to True.

        The rule's arguments are validated against this schema:
        {'type': 'boolean'}
        """
        if is_even and value % 2 != 0:
            self._error(field, "Value must be even")

schema = {'number': {'type': 'integer', 'is_even': True}}
v = MyValidator(schema)

document = {'number': 3}
print(v.validate(document))
print(v.errors)

Output:

False
{'number': ['Value must be even']}

In the above code, we define a custom validator class MyValidator that extends the Validator class. Its _validate_is_even method defines a custom rule named is_even; Cerberus takes the rule name from the part of the method name that follows the _validate_ prefix. The docstring declares that the rule’s argument must be a boolean, which lets Cerberus check the schema itself. If is_even is set to True in the schema for a field, the rule checks that the field’s value is even during validation.

To test the custom validator, we create an instance of MyValidator and pass in the schema. We then validate a document that contains the field number with a value of 3. Since 3 is not even, validate() returns False and the error message is stored in v.errors.

Custom Types

Just like rules, Cerberus allows you to create custom types. Let’s see an example:

from decimal import Decimal
import cerberus

decimal_type = cerberus.TypeDefinition('decimal', (Decimal,), ())

cerberus.Validator.types_mapping['decimal'] = decimal_type

schema = {
    'price': {
        'type': 'decimal'
    }
}

data = {
    'price': Decimal('10.99')  # build from a string so the value is exactly 10.99
}

v = cerberus.Validator(schema)
print(v.validate(data))

Output:

True

The script above imports the Decimal class from the decimal module and then the cerberus module. It defines a custom data type named decimal with cerberus.TypeDefinition, which takes the type name, a tuple of Python types that count as a match, and a tuple of types to exclude. The new type is registered by adding it to the cerberus.Validator.types_mapping dictionary. To test it, we define a simple schema whose price field has the decimal type and validate a document containing a Decimal price of 10.99. Validation succeeds, so the script prints True.

If the price value is not a Decimal, validation fails. For instance, the following script passes a plain float, 10.99, which the decimal type rejects.

data = {
    'price': 10.99
}

v = cerberus.Validator(schema)
print(v.validate(data))

Output:

False

Conclusion

Cerberus is a useful Python package for data validation. In this tutorial, you learned how to define schemas, apply built-in rules, handle validation errors, and extend Cerberus with custom rules and types. Whether you’re validating user input, checking financial records, or maintaining data consistency, Cerberus offers a reliable and extensible way to ensure data quality in your Python applications.

If you want to explore Cerberus further, see the official documentation.

