Data mutability and ownership: Terality vs pandas

Unlike pandas, Terality always copies data and never shares mutable data between structures. This page documents how Terality differs from pandas on this topic.

Data ownership and mutability in pandas

With pandas, indexing operations can return either a copy or a view. Consider the following code:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df['a']

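Here, df['a'] may be either a view or a copy of column a, depending on the pandas version and on whether Copy-on-Write is enabled. As a result, "chained indexing" is not guaranteed to mutate df:

df['a'][0] = 42
print(df)

may print df unchanged. With Copy-on-Write enabled (opt-in in pandas 2.x, and the planned default for pandas 3.0), df['a'] behaves as a copy and df is left untouched. Older versions without Copy-on-Write typically update df in place for this particular example.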

Whether an indexing operation returns a copy (as above) or a view depends on the underlying memory layout, and is hard to predict as a user. Pandas has an entire documentation section dedicated to this issue.

Additionally, pandas exposes functions that share Python objects between two dataframes: functions like pd.DataFrame.copy (for example, when called with deep=False) can result in two dataframes where mutating one will also mutate the other, leading to error-prone code.
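
To illustrate the kind of sharing described above, here is a minimal sketch using plain pandas. The variable names are illustrative. Whether a write through the shallow copy is visible in df depends on the pandas version and on whether Copy-on-Write is enabled, so no output is assumed for that case.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# A deep copy (the default) owns its data: it is independent from df.
deep = df.copy(deep=True)

# A shallow copy may share its underlying data with df.
shallow = df.copy(deep=False)

# Write through the shallow copy.
shallow.loc[0, "a"] = 42

# The deep copy, taken before the write, is not affected.
deep_value = deep.loc[0, "a"]
print(deep_value)  # 1

# Whether df sees the write depends on the pandas version and settings.
shared_write_visible = bool(df.loc[0, "a"] == 42)
print(shared_write_visible)
```

If the last line prints True, the two dataframes were sharing data, which is the situation the Terality model rules out.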

In order to both simplify the API surface and enable performance optimizations, Terality implements a simpler model than the pandas API.

The Terality model: only copies

With Terality, all operations (such as indexing functions) always return copies of the data. Terality never returns views.

Data in a Terality structure (whether a Series or a DataFrame) is never shared with any other structure. Mutating a structure is guaranteed not to mutate any other structure.

Terality does support in-place operations, but they don't offer any performance advantage over operations that return a new DataFrame or Series. Behind the scenes, Terality applies optimizations that make copies as performant as using views or in-place operations.

With Terality, chained indexing (as described in the first section) will never mutate the original DataFrame. Use pd.DataFrame.loc (and the iloc variant) instead of chained indexing when you want to assign to the result of an indexing operation.
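
As a sketch of the recommended pattern, the following uses plain pandas. The same loc and iloc calls are assumed to be available on Terality structures, since the text above recommends them; the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Single indexing call: assign by row label and column label.
df.loc[0, "a"] = 42

# Positional variant: row position 1, column position 1 (column "b").
df.iloc[1, 1] = 99

# Column "a" is now [42, 2] and column "b" is now [3, 99].
print(df)
```

Because the assignment goes through a single indexing call on df itself, it does not depend on whether an intermediate result is a view or a copy.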

Consequence: differences with the pandas API

As a result, some operations will always raise an error in Terality. This should help avoid common mistakes with pandas code.

  • pd.DataFrame.copy and pd.Series.copy always perform a recursive deep copy. Passing deep=False is an error.

  • pd.Series.view is not supported: data cannot be shared between two Series.

  • Some pandas functions (such as the Index constructor or pd.DataFrame.rename) accept a copy argument that determines whether data should be copied during the operation. With Terality, only copy=True is accepted. Setting copy=False will raise an error.
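
Under the rules listed above, code that relies only on default copies should not hit these errors. Here is a minimal sketch with plain pandas, assuming the equivalent Terality calls accept the same arguments; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Deep copy: rely on the default rather than passing deep=False.
backup = df.copy()

# rename without copy=False: the result can be mutated independently of df.
renamed = df.rename(columns={"a": "x"})

# Mutating the results leaves the original untouched.
backup.loc[0, "a"] = 42
renamed.loc[0, "x"] = 99

print(df.loc[0, "a"])  # 1
```

Written this way, the code never asks for shared data, so it does not depend on the copy or deep arguments that Terality rejects.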
