Using C Bindings from Python
Introduction
While reading about RocksDB, I encountered this interesting line: Written in C++, RocksDB provides support for embedding in applications written in various languages like C, C++, Rust, Go, and Java through C bindings.
On the surface, this doesn’t seem too complicated: you compile C code into a shared library and have Python (or any other language) call into it. I had never done that in practice, though, so instead of just reading about it, I decided to get hands-on experience with C bindings myself. Large language models did most of the heavy lifting.
Building a Simple Key-Value Store in C
Let’s start by creating a simple in-memory key-value store in C that we can later use from Python.
```c
// kv_store.c
#include <stdlib.h>
#include <string.h>

#define MAX_ITEMS 100
#define KEY_SIZE 64
#define VALUE_SIZE 256

typedef struct {
    char key[KEY_SIZE];
    char value[VALUE_SIZE];
} kv_item;

static kv_item store[MAX_ITEMS];
static int store_size = 0;

// Compare at most KEY_SIZE - 1 bytes, the same limit put() truncates keys to,
// so an over-long key matches its stored (truncated) form instead of being
// inserted a second time as a duplicate.
static int key_matches(const kv_item* item, const char* key) {
    return strncmp(item->key, key, KEY_SIZE - 1) == 0;
}

// Return a heap copy of a stored field, or NULL if malloc fails
static char* copy_field(const char* src, size_t size) {
    char* result = malloc(size);
    if (result != NULL) {
        strncpy(result, src, size);  // src is always NUL-terminated within size
    }
    return result;
}

// Put key-value (overwrite if exists; new keys are ignored when the store is full)
void put(const char* key, const char* value) {
    for (int i = 0; i < store_size; i++) {
        if (key_matches(&store[i], key)) {
            strncpy(store[i].value, value, VALUE_SIZE - 1);
            store[i].value[VALUE_SIZE - 1] = '\0';
            return;
        }
    }
    if (store_size < MAX_ITEMS) {
        strncpy(store[store_size].key, key, KEY_SIZE - 1);
        store[store_size].key[KEY_SIZE - 1] = '\0';
        strncpy(store[store_size].value, value, VALUE_SIZE - 1);
        store[store_size].value[VALUE_SIZE - 1] = '\0';
        store_size++;
    }
}

// Get a copy of the value (caller must free with free_string)
char* get(const char* key) {
    for (int i = 0; i < store_size; i++) {
        if (key_matches(&store[i], key)) {
            return copy_field(store[i].value, VALUE_SIZE);
        }
    }
    return NULL;
}

// Delete a key-value pair, shifting later items down to keep insertion order
void delete(const char* key) {
    for (int i = 0; i < store_size; i++) {
        if (key_matches(&store[i], key)) {
            for (int j = i; j < store_size - 1; j++) {
                store[j] = store[j + 1];
            }
            store_size--;
            return;
        }
    }
}

// Get the number of items in the store
int get_size(void) {
    return store_size;
}

// Get key at index (caller must free with free_string)
char* get_key_at(int index) {
    if (index < 0 || index >= store_size) {
        return NULL;
    }
    return copy_field(store[index].key, KEY_SIZE);
}

// Get value at index (caller must free with free_string)
char* get_value_at(int index) {
    if (index < 0 || index >= store_size) {
        return NULL;
    }
    return copy_field(store[index].value, VALUE_SIZE);
}

// Free strings returned by get(), get_key_at(), and get_value_at()
void free_string(char* ptr) {
    free(ptr);
}
```
The code above implements a simple key-value store with the following features:
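- Fixed-size storage: Limited to 100 items, with keys of up to 63 bytes and values of up to 255 bytes (each buffer reserves one byte for the terminating NUL)
- put() function: Adds or updates a key-value pair with proper null termination; new keys are silently ignored once the store is full
- get() function: Returns a dynamically allocated copy of the value, or NULL if the key is missing (caller must free)
- delete() function: Removes a key-value pair, shifting later items down so insertion order is preserved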
- get_size() function: Returns the number of items currently in the store
- get_key_at() and get_value_at() functions: Get key or value at a specific index (caller must free)
- free_string() function: Properly frees memory allocated by get(), get_key_at(), and get_value_at()
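Because the limits are counted in bytes, not characters, non-ASCII text hits them sooner: "é" takes two bytes in UTF-8. Worse, strncpy() can cut a multi-byte character in half, and the .decode() call in the Python wrapper we build later would then raise UnicodeDecodeError on the stored value. One option is to validate lengths on the Python side before calling into C, instead of letting the C code truncate silently. The sketch below assumes a hypothetical encode_checked() helper that is not part of the original wrapper; the constants are copied from kv_store.c and must be kept in sync by hand.

```python
KEY_SIZE = 64     # must match the #define values in kv_store.c
VALUE_SIZE = 256


def encode_checked(text: str, size: int) -> bytes:
    """Encode text for a char[size] field, refusing input the C side would truncate.

    Hypothetical helper, not part of the original wrapper.
    """
    data = text.encode("utf-8")
    # The C buffer holds size - 1 bytes of data plus the terminating NUL.
    if len(data) > size - 1:
        raise ValueError(f"{len(data)} bytes exceeds the {size - 1}-byte limit")
    # An embedded NUL would end the string early on the C side.
    if b"\0" in data:
        raise ValueError("text must not contain NUL characters")
    return data


# 31 "é" characters encode to 62 bytes and fit; 32 of them (64 bytes) would not.
print(len(encode_checked("é" * 31, KEY_SIZE)))  # 62
```

In a wrapper, this would replace the bare key.encode() and value.encode() calls, for example encode_checked(key, KEY_SIZE) and encode_checked(value, VALUE_SIZE) in put().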
Compiling the C Code into a Shared Library
To use this C code from Python, we need to compile it into a shared library:
gcc -fPIC -shared kv_store.c -o libkvstore.so
The flags used:
- -fPIC: Generate position-independent code, which can run from any memory address. This is essential for shared libraries, which may be loaded at a different address in every process, and for security features like ASLR.
- -shared: Create a shared library instead of an executable.

If you have ever seen an error like "somelib.so: cannot open shared object file: No such file or directory", you can be sure that the program was unable to find a shared C library it needed.
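From Python, that failure surfaces as an OSError raised by ctypes.CDLL. A common cause is a relative path: "./libkvstore.so" is resolved against the current working directory, not the directory containing the script, so the same script can work from one directory and fail from another. The sketch below uses a hypothetical load_library() helper that resolves the path next to the script by default and adds a build hint to the error message.

```python
from __future__ import annotations

import ctypes
from pathlib import Path


def load_library(name: str, base_dir: Path | None = None) -> ctypes.CDLL:
    """Load a shared library from base_dir (default: this script's directory).

    Hypothetical helper, not part of the original wrapper.
    """
    # Resolve relative to the script instead of the current working directory.
    base = base_dir if base_dir is not None else Path(__file__).resolve().parent
    path = (base / name).resolve()
    try:
        return ctypes.CDLL(str(path))
    except OSError as exc:
        # Keep the loader's message, but add the full path and a likely fix.
        raise OSError(
            f"could not load {path} (was it built with gcc -fPIC -shared?): {exc}"
        ) from exc
```

With this in place, the wrapper's constructor could call load_library("libkvstore.so") instead of ctypes.CDLL("./libkvstore.so").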
Using the C Library from Python
Now we can use Python’s ctypes module to interface with our shared library. Notice how we create a Python class to wrap the C functions and handle memory management properly:
```python
import ctypes
from ctypes import c_char_p, c_void_p, c_int


class KVStore:
    def __init__(self, lib_path="./libkvstore.so"):
        self.lib = ctypes.CDLL(lib_path)

        # void put(const char* key, const char* value)
        self.lib.put.argtypes = [c_char_p, c_char_p]
        self.lib.put.restype = None
        # char* get(const char* key): declared as c_void_p rather than c_char_p,
        # because c_char_p would return a bytes copy and lose the pointer we must free
        self.lib.get.argtypes = [c_char_p]
        self.lib.get.restype = c_void_p
        # void delete(const char* key)
        self.lib.delete.argtypes = [c_char_p]
        self.lib.delete.restype = None
        # void free_string(char* ptr)
        self.lib.free_string.argtypes = [c_void_p]
        self.lib.free_string.restype = None

        # Functions for getting all items
        self.lib.get_size.argtypes = []
        self.lib.get_size.restype = c_int
        self.lib.get_key_at.argtypes = [c_int]
        self.lib.get_key_at.restype = c_void_p
        self.lib.get_value_at.argtypes = [c_int]
        self.lib.get_value_at.restype = c_void_p

    def _take_string(self, ptr) -> str | None:
        """Copy a C string into Python, then release it with free_string()."""
        if not ptr:  # a NULL return arrives as None
            return None
        try:
            return ctypes.string_at(ptr).decode()
        finally:
            self.lib.free_string(ptr)  # runs even if decode() fails

    def put(self, key: str, value: str):
        self.lib.put(key.encode(), value.encode())

    def get(self, key: str) -> str | None:
        return self._take_string(self.lib.get(key.encode()))

    def delete(self, key: str):
        self.lib.delete(key.encode())

    def get_all(self) -> dict[str, str]:
        """Get all key-value pairs as a dictionary"""
        result = {}
        for i in range(self.lib.get_size()):
            key = self._take_string(self.lib.get_key_at(i))
            value = self._take_string(self.lib.get_value_at(i))
            # Skip the entry if either copy failed, instead of reusing a stale key or value
            if key is not None and value is not None:
                result[key] = value
        return result

    def size(self) -> int:
        """Get the number of items in the store"""
        return self.lib.get_size()


# Example usage
if __name__ == "__main__":
    store = KVStore()
    store.put("language", "Python")
    store.put("db", "Memory")
    store.put("framework", "Flask")
    print("language:", store.get("language"))
    print("db:", store.get("db"))
    print("All items:", store.get_all())
    print("Store size:", store.size())
    store.delete("language")
    print("language (after delete):", store.get("language"))
    print("All items after delete:", store.get_all())
```
Key Points About C Bindings
When working with C bindings in Python, there are several important considerations:
Type Definitions
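- We explicitly declare the argument and return types using argtypes and restype
- This lets ctypes convert between Python and C data types and reject arguments of the wrong type. Without restype, ctypes assumes every function returns a C int, which would truncate the 64-bit pointers returned by get(), get_key_at(), and get_value_at() on most modern systems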
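To see what argtypes buys us, here is a small sketch that applies it to strlen() from the C standard library instead of our own library, so it runs without building anything. It assumes Linux or macOS, where ctypes can locate libc. With c_char_p declared, bytes pass straight through and a str is rejected with ctypes.ArgumentError; without argtypes, ctypes would instead pass a str as a wchar_t* pointer, which C code expecting a char* would misread.

```python
import ctypes
import ctypes.util

# Assumes Linux or macOS. If find_library() returns None, CDLL(None) exposes
# the symbols already loaded into the Python process, libc included.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]  # size_t strlen(const char *s)
strlen.restype = ctypes.c_size_t

print(strlen(b"hello"))  # 5: bytes map directly onto char*

try:
    strlen("hello")  # a str is refused instead of being silently converted
except ctypes.ArgumentError as exc:
    print("rejected:", exc)
```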
String Handling
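- Python strings are not converted automatically: the wrapper encodes each str to UTF-8 bytes with .encode() before passing it to a c_char_p parameter
- C strings returned from functions are copied into Python bytes and decoded back to Python strings with .decode()
- ctypes.string_at() reads bytes from the returned pointer up to the first NUL byte. It is only safe on a valid, NUL-terminated pointer, which is why the wrapper checks for NULL first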
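The choice of c_void_p over c_char_p as the restype of get() matters here. A function whose restype is c_char_p returns a Python bytes copy, so the original address, which free_string() needs, is gone. The sketch below imitates both return types by casting the address of a ctypes buffer, which stands in for a string that get() would have allocated with malloc(); no shared library is involved.

```python
import ctypes

# A buffer standing in for a malloc'd C string returned by get().
buf = ctypes.create_string_buffer(b"Python")
address = ctypes.addressof(buf)

# Viewed as c_char_p, the value is a bytes copy: the address is not kept,
# so there would be nothing left to pass to free_string().
as_char_p = ctypes.cast(address, ctypes.c_char_p)
print(as_char_p.value)  # b'Python'

# Viewed as c_void_p, the value is the raw address (an int) that can be freed later.
as_void_p = ctypes.cast(address, ctypes.c_void_p)
print(as_void_p.value == address)  # True

# string_at() copies bytes up to the first NUL; decode() turns them into a str.
print(ctypes.string_at(as_void_p.value).decode())  # Python
```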
Memory Management
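- get(), get_key_at(), and get_value_at() return strings allocated with malloc() inside the library, so Python's garbage collector cannot reclaim them
- The Python wrapper frees this memory by calling the library's own free_string() function once it has copied the string. In practice this bookkeeping is usually handled for you by a library or its existing bindings, but it was worth doing by hand here. ctypes also lets us work with these pointers more directly from Python.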
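ctypes also offers a hook for making this cleanup automatic: a foreign function's errcheck attribute is called with the raw result, the function, and its arguments, and whatever it returns becomes the call's return value. Here is a minimal sketch that uses libc's strdup() and free() as stand-ins for get() and free_string(), because they follow the same caller-must-free contract. It assumes Linux or macOS, and take_c_string is a hypothetical name.

```python
import ctypes
import ctypes.util

# Assumes a POSIX libc (Linux or macOS); strdup() returns a malloc'd copy
# that the caller must free(), the same contract as get() and free_string().
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def take_c_string(result, func, arguments):
    """errcheck hook: copy the C string into Python, then free the C copy."""
    if not result:  # NULL arrives as None with a c_void_p restype
        return None
    try:
        return ctypes.string_at(result).decode()
    finally:
        libc.free(result)  # runs even if decode() raises


strdup = libc.strdup
strdup.argtypes = [ctypes.c_char_p]
strdup.restype = ctypes.c_void_p
strdup.errcheck = take_c_string

print(strdup(b"Memory"))  # Memory (the malloc'd copy has already been freed)
```

Applied to the wrapper, self.lib.get.errcheck would be set to an equivalent function that calls self.lib.free_string() instead of libc.free(). Memory should always go back to the allocator that produced it, which is why the library exports its own free function.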
Expected Output
When you run the Python code, you should see:
```
language: Python
db: Memory
All items: {'language': 'Python', 'db': 'Memory', 'framework': 'Flask'}
Store size: 3
language (after delete): None
All items after delete: {'db': 'Memory', 'framework': 'Flask'}
```
Conclusion
C bindings provide a powerful way to leverage existing C libraries from Python.
While this example is simple, the same principles apply to more complex libraries like RocksDB. The ctypes module makes it relatively straightforward to interface with C code, and with proper wrapper design, you can hide the complexity of memory management from the end user.
This hands-on approach really helped me understand how higher-level languages can benefit from the performance and extensive ecosystem of C libraries while keeping them easy to use. Safety, however, is only as good as the wrapper: a wrong argtypes or restype declaration can crash the interpreter, and a missed free_string() call leaks memory.