Data mutability and ownership: Terality vs pandas
Unlike pandas, Terality always copies data and avoids shared mutable data. Differences with pandas on this topic are documented here.
With pandas, indexing operations can return either a copy or a view. Consider the following code:
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
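# Chained indexing: the boolean-mask selection returns a copy, so the assignment targets a temporary object
df[df['a'] > 1]['b'] = 42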
print(df)
This prints df unchanged: the assignment is applied to a temporary copy, not to df itself.
Whether an indexing operation returns a copy (as above) or a view depends on the underlying memory layout, and is hard for a user to predict. Pandas has an entire documentation section dedicated to this issue.
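As a point of comparison, here is a minimal sketch, in plain pandas, of an assignment done in a single indexing step rather than through chained indexing. The column names and values are illustrative only. Because the row mask and the column label go into one .loc (or .iloc) call, there is no intermediate object that could be a copy.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# One .loc call selects rows (boolean mask) and column (label) together,
# so the assignment is applied to df itself.
df.loc[df["a"] > 1, "b"] = 42

# Positional equivalent with .iloc: first row, first column.
df.iloc[0, 0] = 10

print(df)
```

After these two assignments, column a holds 10 and 2, and column b holds 3 and 42.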
Additionally, pandas exposes functions that share Python objects between two dataframes: functions like pd.DataFrame.copy can result in two dataframes where mutating one will also mutate the other, leading to error-prone code.
In order to both simplify the API surface and enable performance optimizations, Terality implements a simpler model than the pandas API.
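The shared-object pitfall mentioned above can be reproduced in plain pandas. The sketch below is one illustration, assuming an object-dtype column whose cells hold Python lists (the column name and values are made up). It relies on the behavior noted in the pandas documentation for copy: even with deep=True, Python objects stored in the data are not copied recursively.

```python
import pandas as pd

# Object-dtype column whose cells hold mutable Python lists.
df = pd.DataFrame({"tags": [["x"], ["y"]]})

# deep=True copies the array of references, not the lists they point to.
clone = df.copy(deep=True)

# Mutate, in place, the list stored in the clone's first cell.
clone.loc[0, "tags"].append("added")

# Both dataframes reference the same list object, so df sees the change too.
shared = df.loc[0, "tags"] is clone.loc[0, "tags"]
print(df.loc[0, "tags"], shared)
```

Here the original dataframe's first cell also contains the appended element, even though only the clone was touched.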
With Terality, all operations (such as indexing functions) always return copies of the data. Terality never returns views.
Data in a Terality structure (whether a Series or a DataFrame) is never shared with any other structure. Mutating a structure is guaranteed not to mutate any other structure.
Terality does support in-place operations, but they don't offer any performance advantage over operations that return a new DataFrame or Series. Behind the scenes, Terality applies optimizations that make copies as performant as using views or in-place operations.
With Terality, chained indexing (as described in the first section) will never mutate the original DataFrame. Use pd.DataFrame.loc (and the iloc variant) instead of chained indexing when you want to assign to the result of an indexing operation.
As a result, some operations will always raise an error in Terality. This should help avoid common mistakes in pandas code.
- pd.DataFrame.copy and pd.Series.copy always perform a recursive deep copy. Passing deep=False is an error.
- Some pandas functions (such as the Index constructor or pd.DataFrame.rename) accept a copy argument that determines whether data should be copied during the operation. With Terality, only copy=True is accepted. Setting copy=False will raise an error.
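To show what a recursive deep copy means in practice, the sketch below gets comparable semantics in plain pandas through a pickle round trip. This is not how Terality implements its copies; recursive_copy is a hypothetical helper introduced only for this example, and the column name and values are made up.

```python
import pickle

import pandas as pd


def recursive_copy(frame):
    """Hypothetical helper: rebuild the frame from its pickled form so that
    nested Python objects are duplicated as well, not just referenced."""
    return pickle.loads(pickle.dumps(frame))


df = pd.DataFrame({"tags": [["x"], ["y"]]})
independent = recursive_copy(df)

# Mutating a nested list in the copy leaves the original untouched.
independent.loc[0, "tags"].append("added")

print(df.loc[0, "tags"])
print(independent.loc[0, "tags"])
```

With this kind of copy, no object is shared between the two structures, which is the guarantee described for Terality's copy functions.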