Create Scalar and Aggregate Functions for Python SQLite

Scalar and aggregate functions allow you to perform operations on a set of values from a database without using external functions to handle the data. Though SQLite has some useful predefined functions, they sometimes don’t meet our needs. In this tutorial we’ll explain the differences between a scalar and aggregate function, and we’ll learn how to define our own scalar and aggregate functions and register them to a Python SQLite connection using create_function and create_aggregate. The source code for this tutorial can be downloaded from this link.

Scalar Functions
Aggregate Functions
Other Remarks About Scalar and Aggregate Functions
Closing Thoughts

Scalar Functions

Scalar functions are called once per record in the selected set and return a value for each call. In this section we will describe how to define new scalar functions, and how to register them in a database connection.

Defining New Scalar Functions

Unlike adapters and converters, scalar functions are registered for each connection using the create_function(name, nargs, func) method of an sqlite3.Connection Python object, where:

A scalar function is invoked in a SQL statement using the name provided by the name parameter. The names of scalar functions are case insensitive.
The nargs parameter specifies the number of arguments of the scalar function. We will discuss this parameter in depth throughout this section.
The func parameter is a callable object, i.e. a class with a __call__() method. Every Python function is a callable object. func is called each time we use its name in a SQL statement.

Let’s see a simple example:

import sqlite3
import datetime

def test_age():
    """ The `age()` function computes the age from the birth date. """
    with sqlite3.connect(':memory:') as conn:
        conn.create_function('age', 1, 
            lambda date: datetime.date.today().year \
                - int(date.split('-')[0]))
        
        conn.execute('CREATE TABLE artists(' \
            'name TEXT NOT NULL, ' \
            'birth_date TEXT NOT NULL, '\
            'country CHAR(3));')
        
        conn.executemany(
            'INSERT INTO artists VALUES(?, ?, ?);',
            ( ('Lionel Richie', '1949-06-20', 'USA'),
              ('Alison Moyet', '1961-06-18', 'GBR'),
              ('Róisín Murphy', '1973-07-05', 'IRL'),
              ('Alissa White-Gluz', '1985-07-31', 'CAN'),
              ('Ralf Hütter', '1946-08-20', 'DEU')
            ))
        
        for name, age in conn.execute(
            'SELECT name, age(birth_date) FROM artists;'):
            print('%s is %d years old.' % (name, age))

test_age()

The test_age() function creates a new scalar function named age, which computes the age of a person from his YYYY-MM-DD birth date. We defined it using a lambda function, which gets the current year with datetime.date.today().year, then splits the birth date, and, finally, subtracts the year of birth from the current year. This function has just one argument, date, so we set the nargs parameter of create_function() to 1. Then, we create the artists table, and insert 5 records in it. The test_age() function will print the age of each artist:

Lionel Richie is 70 years old.
Alison Moyet is 58 years old.
Róisín Murphy is 46 years old.
Alissa White-Gluz is 34 years old.
Ralf Hütter is 73 years old.

Get Our Python Developer Kit for Free

I put together a Python Developer Kit with over 100 pre-built Python scripts covering data structures, Pandas, NumPy, Seaborn, machine learning, file processing, web scraping and a whole lot more - and I want you to have it for free. Enter your email address below and I'll send a copy your way.

We can also use a class with a __call__() method as a scalar function. Let’s take a look at a sample class mapping color names to RGB colors:

import sqlite3

def test_color_map():
    """ Using a Python class as scalar function. """
    
    class ColorMap:
        """ A class mapping color names into RGB colors. """
        
        cmap = {
            'black' : (0  , 0  , 0  ),
            'white' : (255, 255, 255),
            'red'   : (255, 0  , 0  ),
            'blue'  : (0  , 255, 0  ),
            'green' : (0  , 0  , 255)
            }    

        def __call__(self, name):
            try:
                return str(self.cmap[name.lower()])
            except KeyError:
                return None

    with sqlite3.connect(':memory:') as conn:
        conn.create_function('rgb', 1, ColorMap())

        conn.execute('CREATE TABLE drawing(' \
            'item TEXT NOT NULL, ' \
            'x INTEGER NOT NULL, ' \
            'y INTEGER NOT NULL, ' \
            'stroke TEXT NOT NULL, ' \
            'fill TEXT NOT NULL);')

        conn.executemany(
            'INSERT INTO drawing VALUES(?, ?, ?, ?, ?);',
            ( ('line-01'  , 12 , 23, 'red'  , 'red'  ),
              ('circle-03', 34 , 5 , 'black', 'blue' ),
              ('line-02'  , 129, 23, 'black', 'black'),
              ('text-01'  , 12 , 3 , 'black', 'white'),
              ('rect-33'  , 80 , 54, 'black', 'green')
            ) )
            
        for item, stroke, fill in conn.execute(
            'SELECT item, rgb(stroke), rgb(fill) FROM drawing;'):
            print(';'.join((item, stroke, fill)))

test_color_map()

The ColorMap class stores all color names and their respective RGB values in a dictionary. Its __calls__() method tries to retrieve a name from the color map in a case-insensitive way. It returns the RGB value, or None if name isn’t in the dictionary. The __call__() method has just one argument, so we will set the nargs parameter of create_function() to 1. The test_color_map() function creates a drawing table, and fills it with a bunch of graphics primitives, represented by their name, coordinates, stroke color, and fill color. Finally, the SELECT statement uses the rgb() scalar function to retrieve all colors from the database in RGB format:

line-01;(255, 0, 0);(255, 0, 0)
circle-03;(0, 0, 0);(0, 255, 0)
line-02;(0, 0, 0);(0, 0, 0)
text-01;(0, 0, 0);(255, 255, 255)
rect-33;(0, 0, 0);(0, 0, 255)

Scalar Functions with the Same Name

You can register multiple scalar functions to the same name, as long as they have a different number of arguments. SQLite will match the function call to the right implementation according to the number of parameters.

import sqlite3
import random

def test_scalar_signature():
    """ Registering multiple functions to the same name. """
    with sqlite3.connect(':memory:') as conn:
        conn.create_function(
            'randint', 1, lambda u: random.randint(0, u))
        conn.create_function(
            'randint', 2, lambda l, u: random.randint(l, u))
        
        for i in range(5):
            for n in conn.execute('SELECT randint(1, 100);'):
                print('Pick a random number: %d' % n)
        
        for i in range(5):
            for n in conn.execute('SELECT randint(50);'):
                print('Pick a random number [0-50]: %d' % n)
                        
test_scalar_signature()

The test_scalar_signature() registers two randint scalar functions:

The first function picks a random integer between 0 and u. The randint(50) function call matches this function, since it has just one parameter.
The second function chooses a random integer between l and u. The randint(1, 100) function call matches this function, since it has two parameters.

The test_scalar_signature() has this typical output:

Pick a random number: 40
Pick a random number: 93
Pick a random number: 87
Pick a random number: 6
Pick a random number: 28
Pick a random number [0-50]: 26
Pick a random number [0-50]: 18
Pick a random number [0-50]: 0
Pick a random number [0-50]: 50
Pick a random number [0-50]: 30

Scalar Functions with Default Parameters

Sometimes we wish to provide default values to parameters of a scalar function. But, as we have seen above, create_function() requires the number of arguments of the registered function. How many arguments have functions like func(a, b=1, c=2)? A function like this may have 1, 2, or 3 arguments, depending on how we call it. To register such a function, we must set nargs to -1, like this:

conn.create_function('f', -1, func)

Setting the nargs parameter to any other negative value will raise an exception.

import sqlite3

def test_default_parameters():
    """ Scalar functions with default parameters. """
    with sqlite3.connect(':memory:') as conn:
        conn.create_function('concat', -1, 
            lambda x, y, sep=',' : "%s%s%s" % (x, sep, y))
        
        conn.execute('CREATE TABLE songs(' \
            'artist TEXT NOT NULL, ' \
            'title TEXT NOT NULL);')
        
        conn.executemany('INSERT INTO songs VALUES (?, ?);',
            ( ('Muse', 'Sunburn'), 
              ('Epica', 'Storm the Sorrow'),
              ('Leo Sayer', 'Living in a Fantasy'),
              ('Spandau Ballet', 'Gold'),
              ('Roy Orbison', 'Uptown')
            ))
        
        print('Using the `,` default separator:')
        for word in conn.execute(
            'SELECT concat(artist, title) FROM songs;'):
            print(word[0])
            
        print('\nSpecifying `;` as separator:')
        for word in conn.execute(
            "SELECT concat(artist, title, ';') FROM songs;"):
            print(word[0])               

test_default_parameters()

The test_default_parameters() registers a concat(x, y, sep=',') function, where x and y are concatenated using sep as separator. Then, it creates a songs table, holding 5 (artist, title) records. Finally, it executes two SELECT statements, one using the default separator, the other using ; as separator. The output of this functions is:

Using the `,` default separator:
Muse,Sunburn
Epica,Storm the Sorrow
Leo Sayer,Living in a Fantasy
Spandau Ballet,Gold
Roy Orbison,Uptown

Specifying `;` as separator:
Muse;Sunburn
Epica;Storm the Sorrow
Leo Sayer;Living in a Fantasy
Spandau Ballet;Gold
Roy Orbison;Uptown

Scalar Functions with Variable Number of Arguments

Occasionally we don’t know the number of parameters of a function in advance, so we want to define functions capable of handling a variable number of parameters. Functions like this are sometimes referred to as variadic functions. Just like the case of functions with default parameters, variadic functions are registered with nargs parameter of the create_function method set to -1.

import sqlite3

def test_variadic_scalar():
    """ Scalar function with variable number of arguments. """

    with sqlite3.connect(':memory:') as conn:
        conn.create_function('csv', -1,
            lambda *args : ','.join(map(repr, args)) )
        
        conn.execute('CREATE TABLE songs(' \
            'artist TEXT NOT NULL, ' \
            'song TEXT NOT NULL, ' \
            'album TEXT NOT NULL, ' \
            'year INTEGER NOT NULL);')
        
        conn.executemany(
            'INSERT INTO songs VALUES (?, ?, ?, ?);',
            ( ('Muse', 'Cave', 'Showbiz', 1999),
              ('America', 'Sandman', 'America', 1971),
              ('Pet Shop Boys', 'Happiness', 'Super', 2016),
              ('Sleeping Romance', 'Alba', 'Alba', 2017),
              ('Depeche Mode', 'Home', 'Ultra', 1997) 
            ))
        
        print('artist,song,album,year')
        for record in conn.execute(
            'SELECT csv(artist, song, album, year) FROM songs;'):
            print(record[0])
            
        print('\nartist,song,year')
        for record in conn.execute(
            'SELECT csv(artist, song, year) FROM songs;'):
            print(record[0])

test_variadic_scalar()

The test_variadic_scalar() function registers a csv scalar function, which concatenates all its parameters using a comma as the separator. The result looks almost like a CSV record, though double quote characters within a field are not escaped. Then, it creates a songs table, holding 4 columns and 5 songs. Finally, it uses the csv function twice, in order to print, respectively, 4 and 3 columns from the table. You can see that the number of arguments passed to the custom csv function are different each time it’s called.

The output of the test_variadic_scalar() function will be:

artist,song,album,year
'Muse','Cave','Showbiz',1999
'America','Sandman','America',1971
'Pet Shop Boys','Happiness','Super',2016
'Sleeping Romance','Alba','Alba',2017
'Depeche Mode','Home','Ultra',1997

artist,song,year
'Muse','Cave',1999
'America','Sandman',1971
'Pet Shop Boys','Happiness',2016
'Sleeping Romance','Alba',2017
'Depeche Mode','Home',1997

Get Our Python Developer Kit for Free

Aggregate Functions

Aggregate functions are called for each record in the table but return just once, after all records have been processed. In this section we will learn how to define and register our own aggregate functions.

Defining New Aggregate Functions

In sqlite3 an aggregate function is a class providing three methods:

You initialize a new instance of the class with the __init__() method. It only has a self parameter, that is, the instance of the class. You can use it to initialize some variable, like a counter.
The step() method is called once for any record of the table. You can define it with as many arguments as you want. Handling NULL values is your responsibility.
After the last row has been processed, the finalize() method is called to compute the result of the aggregate function. It also only has the self parameter.

Let’s take a look at an example. We’ll define a Reduce class, which “sums” all kinds of native SQLite types. It acts like an extension of the sum() function, which tries to convert any value to a numeric type. You define an aggregate function using the create_aggregate(name, nargs, aggregate_class) method of an sqlite3.Connection Python object, much like the create_function method was used to define scalar functions. This is best illustrated with an example:

import sqlite3
    
def test_reduce():
    """ Defining a new aggregate function. """
    
    class Reduce:
        """ Adds all values. """

        def __init__(self):
            self.total = None

        def step(self, item):
            if self.total is None:
                self.total = item
            else:
                self.total += item

        def finalize(self):
            return self.total
    
    with sqlite3.connect(':memory:') as conn:
        conn.create_aggregate('reduce', 1, Reduce)
        
        conn.execute('CREATE TABLE data(' \
            'word TEXT, num INTEGER, bin BLOB);')
        
        conn.executemany(
            'INSERT INTO data VALUES(?, ?, ?);',
            ( ('Guido' , 1, b'\x00\x11'),
              ('Van'   , 2, b'\x22\x33'),
              ('Rossum', 3, b'\x44\x55') ))
            
        for record in conn.execute(
            'SELECT word, num, quote(bin) FROM data;'):
            print(','.join(map(str, record)))
            
        for word, num, bin in conn.execute(
            'SELECT reduce(word), reduce(num), ' \
            'quote(reduce(bin)) FROM data;'):
            print('reduce(word) = %s' % word)
            print('reduce(num)  = %d' % num)
            print('reduce(bin)  = %s' % bin)
                
test_reduce()

Let’s break down this example:

The Reduce class represents our aggregate function. Its __init__() method sets total to None. We will use it to hold the partial “sum” of all values. Its step() method handles a single value from the table. If total hasn’t been set yet, then it sets it to item. Otherwise, it “adds” it to total. Addition behaves according to the data type of total:
- If it is an int, then the function performs integer addition.
- If it is a float, then it performs float addition.
- If it is a str, then it concatenates total to item.
- If it is a blob, then it concatenates total to item.
test_reduce() creates a conn connection to an in-memory database. Then, it registers the Reduce aggregate function as 'reduce'. It creates a data table, holding three distinct data types, and fills it with 3 records. We print out all records using a SELECT statement. The quote(bin) function will return the hexadecimal representation of the BLOB, in order to make it more readable. Finally, we apply the reduce() function on each column, and we get the following output:

Guido,1,X'0011'
Van,2,X'2233'
Rossum,3,X'4455'
reduce(word) = GuidoVanRossum
reduce(num)  = 6
reduce(bin)  = X'001122334455'

**The DISTINCT Filter**

Every aggregate function with a single parameter can prefix its argument with ALL or DISTINCT. The ALL keyword selects all rows in the results set (which is the default behavior), while DISTINCT selects just one instance for each value in the column. In the following example the DISTINCT keyword helps the choice() function pick a value from distinct samples.

import sqlite3
import random
        
def test_choice():
    """ Test the `Chooser` aggregate function. """
    
    class Chooser:
        """ Picks an item from a set of choices. """

        def __init__(self):
            self.choices = []

        def step(self, item):
            """ Adds `values` to the choices. """
            print('inserting %d' % item)
            self.choices.append(item)

        def finalize(self):
            """ Picks an item from the choices. """
            if self.choices:
                return random.choice(self.choices)
    
    with sqlite3.connect(':memory:') as conn:
        conn.create_aggregate('choice', 1, Chooser)
        
        conn.execute(
            'CREATE TABLE samples(sample INTEGER);')
            
        conn.executemany(
            'INSERT INTO samples VALUES(?);',
            ( (1, ), (3, ), (7, ), (1, ), (3, ), 
              (4, ), (5, ), (1, ) ))
 
        print('Available samples:')
        for n in conn.execute('SELECT * FROM samples;'):
            print(n[0])
           
        print('\nChoosing from all samples:')
        for n in conn.execute(
            'SELECT choice(sample) FROM samples;'):
            print('Pick a sample: %d' % n) 
        
        print('\nOnly one instance per sample:')
        for n in conn.execute(
            'SELECT choice(DISTINCT sample) FROM samples;'):
            print('Pick a sample: %d' % n) 
                    
test_choice()

Let’s have a closer look at this example:

The Chooser class is an aggregate function which collects all values from a column, and returns a random value from it, or None if there is no value available. Its __init__() method creates an empty list. Its step() method prints item, then adds it to the list. It doesn’t check if that value is already on the list. Finally, its finalize() method chooses a value from the list using random.choice().
The test_choice() function opens a conn connection to an in-memory database, and registers the Chooser function as 'choice'. Then, it creates a samples table, and inserts some integer values in it. As you can see, samples are not unique, e.g. there are more than one instance of 1. Finally, we try to use the choice() function twice, with and without the DISTINCT keyword, getting the following results:

Available samples:
1
3
7
1
3
4
5
1

Choosing from all samples:
inserting 1
inserting 3
inserting 7
inserting 1
inserting 3
inserting 4
inserting 5
inserting 1
Pick a sample: 1

Only one instance per sample:
inserting 1
inserting 3
inserting 7
inserting 4
inserting 5
Pick a sample: 3

As we can see, in the first case all values from the sample column are inserted into the list. Since this column contains duplicates, not all samples have the same probability of being picked from the list. In the second case, using the DISTINCT keyword prevents duplicate values from being inserted into the list. Using DISTINCT can be more efficient than testing for duplicates in the step() method, using something like if item not in self.choices.

Aggregate Functions with Variable Number of Arguments

Everything we said earlier about the number of arguments of scalar functions still apply to the step() method of classes representing aggregate functions. That means we can have:

Multiple aggregate functions with the same name, but different number of parameters.
An aggregate function with default parameters.
An aggregate function with variable number of parameters.

The only difference is you call the create_aggregate method instead of the create_function method. Let’s see an example of variadic aggregate function:

import sqlite3

def test_variadic_aggregate():
    """ Defining a variadic aggregate function. """
    
    class RangeMax:
        """ Maximum value in a range of cells. """
        
        def __init__(self):
            self.maxval = None
            
        def step(self, *values):
            if values:
                if self.maxval is None:
                    self.maxval = max(values)
                else:
                    self.maxval = max(
                        self.maxval, max(values))
                        
        def finalize(self):
            return self.maxval
                        
    with sqlite3.connect(':memory:') as conn:
        conn.create_aggregate('rangemax', -1, RangeMax)
                            
        conn.execute('CREATE TABLE review(' \
            'item TEXT NOT NULL, ' \
            'USA INTEGER NOT NULL, ' \
            'CAN INTEGER NOT NULL, ' \
            'MEX INTEGER NOT NULL);')
                            
        conn.executemany(
            'INSERT INTO review VALUES(?, ?, ?, ?);',
            ( ('belt'    , 723, 134, 89),
              ('tie'     , 567, 378, 78),
              ('necklace', 893, 456, 46),
              ('earrings', 744, 393, 122),
              ('bracelet', 612, 334, 103)
            ) )
                            
        for m in conn.execute(
            'SELECT rangemax(USA, CAN, MEX) FROM review;'):
            print('Maximum number of items sold: %d' % m)

test_variadic_aggregate()

The RangeMax class represents an aggregate function which returns the maximum value within a rectangular range of values, or None if the range is empty. Its step() method uses the Python built-in max() function to calculate the maximum value within a row of data, then it compares it to the current maximum value. This aggregate function is registered with nargs set to -1, since it has a variable number of arguments.

The test_variadic_aggregate() function creates a review table, and fills it with 5 records, representing the number of items sold per country. The SELECT statement uses the rangemax() function on a rectangular range, including all items and countries, to determine the maximum number of items sold, which is:

Maximum number of items sold: 893

Get Our Python Developer Kit for Free

Remarks About Scalar and Aggregate Functions

In this section we have collected a series of noteworthy remarks which apply both to scalar and aggregate functions. Read them carefully before trying to define your own functions.

Beware of NULL Values

When defining a new scalar or aggregate function, remember that the values which you are performing operations on can be NULL. If that’s the case, then check for None values and handle them appropriately; e.g. an aggregate function counting or summing values should skip NULL values.

import sqlite3

def test_checked_mean():
    """ Calculating the mean value with `CheckedMean`. """
    
    class CheckedMean:
        """ Computes the mean of a set of values. """
    
        def __init__(self):
            self.values = []
            
        def step(self, item):
            if item is not None:
                self.values.append(item)
            
        def finalize(self):
            return sum(self.values) / len(self.values) \
                if self.values else 0
    
    with sqlite3.connect(':memory:') as conn:
        conn.create_aggregate('mean', 1, CheckedMean)
        
        conn.execute(
            'CREATE TABLE samples(i INTEGER);')
        
        conn.executemany(
            'INSERT INTO samples VALUES(?);',
            ( (23, ), (None, ), (12, ), (50, ),
              (76, ), (None, ), (32, ), (63, ) ))
        
        for tot in conn.execute(
            'SELECT mean(i) FROM samples;'):
            print('The mean value is: %d' % tot)
            
test_checked_mean()

In this example we define a CheckedMean aggregate function, which computes the arithmetic mean of a set of values. Its step() methods prevents None values from being inserted in the values list. The test_checked_mean() function creates a samples table, which holds both numeric and NULL values. Finally, the SELECT statement uses the mean() aggregate function to calculate the arithmetic mean on the i column, which is:

The mean value is: 42

When possible, rely on the WHERE or HAVING clauses of the SELECT statement for filtering NULL values:

import sqlite3

def test_unchecked_mean():
    """ Calculating the mean value with `UncheckedMean`. """
    
    class UncheckedMean:
        """ Computes the mean of a set of values. """
        
        def __init__(self):
            self.values = []
            
        def step(self, item):
            self.values.append(item)
            
        def finalize(self):
            return sum(self.values) / len(self.values) \
                if self.values else 0
        
    with sqlite3.connect(':memory:') as conn:
        conn.create_aggregate('mean', 1, UncheckedMean)
        
        conn.execute(
            'CREATE TABLE samples(i INTEGER);')
        
        conn.executemany(
            'INSERT INTO samples VALUES(?);',
            ( (23, ), (None, ), (12, ), (50, ),
              (76, ), (None, ), (32, ), (63, ) ))
        
        for tot in conn.execute(
            'SELECT mean(i) FROM samples ' \
            'WHERE i IS NOT NULL;'):
            print('The mean value is: %d' % tot)
            
test_unchecked_mean()

The UncheckedMean class looks like CheckedMean, but its step() method won’t check for NULL values. The test_unchecked_mean() function adds a WHERE clause to the SELECT statement to make sure that no NULL value will be passed to the mean function. The output of test_unchecked_mean() is the same as its checked version.

New Functions Can Silently Overwrite Others

If you try to register a scalar or aggregate function with the same name and the same number of arguments as another function, the latter will replace the former, and the SQLite core won’t complain (e.g. by raising an exception or a warning):

import sqlite3

def test_overwrite():
    """ Silently overwriting a scalar function. """
    
    def sloc(file):
        try:
            with open(file, 'rt', encoding='utf-8') as src:
                n = 0
                for line in src:
                    line = line.strip()
                    if line:
                        n += 1
        except Exception:
            return 0
        return n

    def sloc_comment(file):
        try:
            with open(file, 'rt', encoding='utf-8') as src:
                n = 0
                for line in src:
                    line = line.rstrip()
                    if line and not any(
                        map(line.startswith, ('/*', ' *'))):
                        n += 1
        except Exception:
            return 0
        return n
                                    
    with sqlite3.connect(':memory:') as conn:
        conn.create_function('sloc', 1, sloc)
        
        conn.execute('CREATE TABLE files(' \
            'name TEXT NOT NULL, ' \
            'lang TEXT NOT NULL);')
                                        
        conn.executemany(
            'INSERT INTO files VALUES(?, ?);',
            ( ('SQLiteGroupConcat.sql', 'sql'),
              ('SQLiteTrim.sql', 'sql') 
            ))
        
        print('Using the `sloc()` function:')
        for name, sloc in conn.execute(
            'SELECT name, sloc(name) FROM files;'):
            print('File `%s` is %d SLOC.' % (name, sloc))
            
        conn.create_function('sloc', 1, sloc_comment)
        print('\nUsing the `sloc_comment()` function:')
        for name, sloc in conn.execute(
            'SELECT name, sloc(name) FROM files;'):
            print('File `%s` is %d SLOC.' % (name, sloc))                                
test_overwrite()

In this example, we define two functions for computing the number of source lines of code (SLOC): the sloc() function just skips the empty lines, the sloc_comment() function also skips SQL comments. Both functions will return 0 if an exception occurs while reading the file (e.g. the file is not available).

The test_overwrite() function creates a files table, holding the name of the files whose SLOC will be calculated. The first SELECT statement will call sloc(), while the second will use sloc_comment(). Even though they use different functions, notice how both functions have their name argument set to sloc. Let’s take a look at the output of the two statements:

Using the `sloc()` function:
File `SQLiteGroupConcat.sql` is 15 SLOC.
File `SQLiteTrim.sql` is 9 SLOC.

Using the `sloc_comment()` function:
File `SQLiteGroupConcat.sql` is 8 SLOC.
File `SQLiteTrim.sql` is 2 SLOC.

Beware that this holds for predefined functions, too:

import sqlite3
import random

def test_overwrite_predefined_function():
    """ Overwriting the `total()` function. """
    
    class new_total:
        """ New version of the total function. """
        
        def __init__(self):
            self.tot = 0
            
        def step(self, i):
            print('adding %d' % i)
            self.tot += i
            
        def finalize(self):
            return self.tot
            
    
    with sqlite3.connect(':memory:') as conn:
        conn.execute('CREATE TABLE test(c);')
        
        conn.executemany('INSERT INTO test VALUES(?);',
            ( (1, ), (2, ), (3, ), (4, ) ))
            
        # Using predefined `total()` function
        for i in conn.execute(
            'SELECT total(c) FROM test;'):
            print('The total of column `c` is %d' % i[0])
        
        # Using `new_total()` function
        conn.create_aggregate('total', 1, new_total)
        for i in conn.execute(
            'SELECT total(c) FROM test;'):
            print('The total of column `c` is %d' % i[0])

test_overwrite_predefined_function()

The total of column `c` is 10
adding 1
adding 2
adding 3
adding 4
The total of column `c` is 10

In other words, you must be careful when choosing the name for your new functions in order to avoid overwriting existing default scalar and aggregate functions.

Aggregate and Scalar Functions Share the Same Namespace

Earlier we explained that we can define multiple scalar functions with the same name but with a different number of arguments. SQLite will choose the correct alternative according to the number of arguments in the function’s call. Despite the fact that they are registered with different functions, both scalar and aggregate functions share the same namespace. In other words, if we register a scalar function with the same name and number of arguments as an aggregate function, the function registered last will overwrite the other.

import sqlite3
import random
                
def test_namespace():
    """ Aggregate and scalar functions share the same namespace. """ 
    
    class RandomPair:
        """ Selects a random pair of items. """
        
        def __init__(self):
            self.choices = []
            
        def step(self, x, y):
            self.choices.append("(%s;%s)" % (x, y))
            
        def finalize(self):
            return random.choice(self.choices) \
                if self.choices else None
    
    with sqlite3.connect(':memory:') as conn:
        conn.execute('CREATE TABLE pairs(' \
            'x INTEGER NOT NULL, y INTEGER NOT NULL);')
            
        conn.executemany('INSERT INTO pairs VALUES(?, ?);',
            ( (12, 35), (97, 72), (12, 34), 
              (54, 23), (11, 90), (23, 44) ))
        
        print('Available samples:')
        for x, y in conn.execute(
            'SELECT * FROM pairs;'):
            print('(%d, %d)' % (x, y))
        
        conn.create_function('random', 2, 
            lambda l, u: random.randint(
                min(l, u), max(l, u)))
        
        print('\nUsing the scalar function:')
        for i in conn.execute(
            'SELECT random(x, y) FROM pairs;'):
            print(i[0])
        
        conn.create_aggregate('random', 2, RandomPair)
        
        print('\nUsing the aggregate function:')
        for i in conn.execute(
            'SELECT random(x, y) FROM pairs;'):
            print(i[0])

test_namespace()

The RandomPair aggregate function chooses a random pair from two columns of values. The test_namespace() function creates a connection to an in-memory database, holding a single pairs table with two columns containing integers. Then, we insert 6 pairs of integers in the table. We execute the same SELECT statement twice:

The first time, we register the scalar function random(l, u), which picks a random number between u and l. The SELECT statement returns 6 records.
The second time, we register the aggregate function random(x, y), which choose a random pair between all records in the pairs table. The SELECT statement returns just one record.

The typical output of the test_namespace() function will be:

Available samples:
(12, 35)
(97, 72)
(12, 34)
(54, 23)
(11, 90)
(23, 44)

Using the scalar function:
13
93
21
46
25
25

Using the aggregate function:
(12;35)

SQLite features a max(column) aggregate function, which picks the maximum value in a column, and a max(x, y, *others) scalar function, which selects the maximum value between x, y, and the values in others. SQLite can tell them apart because the aggregate function requires exactly one parameter, while the scalar function requires at least two parameters.

Get Our Python Developer Kit for Free

Ambiguous Function Calls

Be really careful when defining functions with default parameters and variadic functions. Let’s consider three functions with the following signatures: func(x), func(x, y=0), and func(x, *args). We can associate either the second or the third to the same identifier, since they both require the nargs parameter of create_function() to be set to -1. If we execute a SELECT func(a) FROM tab; statement, which function will func(a) refer to? All three functions would be suitable candidates. Let’s see an example:

import sqlite3
import random
import string

def test_ambiguous_call():
    """ An ambiguous function call. """
    
    def generate_words(n=10, minlen=5, maxlen=10):
        for i in range(n):
            yield ''.join(
                ( random.choice(string.ascii_letters) \
                    for i in range(minlen, maxlen) ))
    
    with sqlite3.connect(':memory:') as conn:
        conn.create_function('concat', 2,
            lambda x, y : '%s,%s' % (x, y))
        conn.create_function('concat', -1,
            lambda x, y, sep=',' : '%s%s%s' % (x, sep, y))
        
        conn.execute('CREATE TABLE words(' \
            'prefix TEXT, suffix TEXT);')
        
        conn.executemany('INSERT INTO words VALUES (?, ?);',
            ( zip(generate_words(5), generate_words(5)) ))
        
        print('`concat(x, y, sep)` matches the second function:')
        for word in conn.execute(
            "SELECT concat(prefix, suffix, '@') FROM words;"):
            print(word[0])
        
        print('\nAmbiguous call:')
        for word in conn.execute(
            'SELECT concat(prefix, suffix) FROM words;'):
            print(word[0])

test_ambiguous_call()

The test_ambiguous_call() registers two 'concat' scalar function, one with two arguments, another one with two arguments and one optional argument. That’s perfectly fine, since they have different number of parameters. Both functions concatenate the first value to the second value. The sep parameter of the second function specifies the separator between the two values. We’ll call the concat() function twice:

The first call uses three parameters, so only the second scalar function can match it.
The second call uses two parameters, so both functions are suitable candidates, but the first function will be applied.

Let’s take a look at a typical output of the test_ambiguous_call() function:

`concat(x, y, sep)` matches the second function:
bRGIB@TrVbN
pBxeU@omJsa
cnhCw@kUpcW
QyJAM@ENDcj
ClnjO@KDWAC

Ambiguous call:
bRGIB,TrVbN
pBxeU,omJsa
cnhCw,kUpcW
QyJAM,ENDcj
ClnjO,KDWAC

As you can see from this example, the function with fixed number of argument is matched first, though the function with variable number of arguments was a good candidate, too. This behavior is hardly predictable, so be really careful when defining functions with the same name.

Error Handling

Scalar and aggregate functions can raise exceptions; e.g. a NULL value is passed to a function that can’t handle NULL values. In this case the exception propagates to the SQLite core and will raise another exception, which is a subclass of the sqlite3.Error exception. This feature makes error handling easier since functions can raise any kind of exception, but we need to be able to handle those from within the sqlite3 module.

import sqlite3

def test_exceptions():
    """ Handling exceptions from Scalar and Aggregate Functions. """
    
    def faulty_scalar(a):
        """ A scalar function raising `ValueError`. """
        raise ValueError('Faulty scalar function.')
        
    class faulty_aggregate:
        """ An aggregate function raising `ValueError`. """
            
        def step(self, a):
            raise ValueError('Faulty aggregate function.')
                
        def finalize(self):
            pass
                
    with sqlite3.connect(':memory:') as conn:
        conn.create_function(
            'faulty_scalar', 1, faulty_scalar)
        conn.create_aggregate(
            'faulty_aggregate', 1, faulty_aggregate)
                                    
        try:
            conn.execute('SELECT faulty_scalar(1);')
        except sqlite3.Error as error:
            print(error)
        
        try:
            conn.execute('SELECT faulty_aggregate(1);')
        except sqlite3.Error as error:  
            print(error)

test_exceptions()

In our example we define a faulty_scalar scalar function and a faulty_aggregate aggregate function, both raising a ValueError exception. To protect our code from propagating exceptions, we need an except clause to catch sqlite3.Error exceptions. This is the base class for all exceptions raised by the sqlite3 module, as we can see from test_exceptions()’s output:

user-defined function raised exception
user-defined aggregate's 'step' method raised error

Closing Thoughts

In this tutorial we learned how to define custom scalar and aggregate functions. They are both registered on a per-connection basis. Scalar functions are represented by callable objects with a fixed (or variable) number of arguments. On the other hand, aggregate functions are represented by classes with step() and finalize() methods.

Functions are matched by name and number of arguments, so we can register more than one function to the same name, but we must always keep in mind that aggregate and scalar functions share the same namespace, and that registering both fixed-number and variable-number functions can lead to ambiguous function calls, which are hard to detect.

Did you find this free tutorial helpful? Share this article with your friends, classmates, and coworkers on Facebook and Twitter! When you spread the word on social media, you’re helping us grow so we can continue to provide free tutorials like this one for years to come. After that, please subscribe to our email list for more Python tutorials.

Get Our Python Developer Kit for Free

List of All Examples in This Tutorial

For convenience, here’s a list of all the Python SQLite examples we presented in this tutorial. Hopefully you’ll find this inspirational when creating your own SQLite functions in Python.

Defining a New Scalar Function age()
Using a Python Class as Scalar Function
Multiple Functions with the Same Name
Scalar Function with a Default Parameter
A Scalar Function with a Variable Number of Parameters
Defining the Reduce Aggregate Function
Using the DISTINCT Keyword
A Variadic Aggregate Function
Checked Mean
Unchecked Mean, Relying on the WHERE Clause
Overwriting a Custom Function
Overwriting a Predefined Function
Aggregate and Scalar Functions Share the Same Namespace
Ambiguous Function Calls
Handling Errors from Scalar and Aggregate Functions

Create Scalar and Aggregate Functions for Python SQLite

The Python Tutorials Blog

Scalar Functions

Defining New Scalar Functions

Scalar Functions with the Same Name

Scalar Functions with Default Parameters

Scalar Functions with Variable Number of Arguments

Aggregate Functions

Defining New Aggregate Functions

**The DISTINCT Filter**

Aggregate Functions with Variable Number of Arguments

Remarks About Scalar and Aggregate Functions

Beware of NULL Values

New Functions Can Silently Overwrite Others

Aggregate and Scalar Functions Share the Same Namespace

Ambiguous Function Calls

Error Handling

Closing Thoughts

List of All Examples in This Tutorial

About The Python Tutorials Blog