Data mutability and ownership: Terality vs pandas
Unlike pandas, Terality always copies data and never shares mutable data between structures. This page documents the differences with pandas on this topic.
Data ownership and mutability in pandas
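With pandas, indexing operations can return either a copy or a view. Consider chained indexing on a DataFrame df, where df['a'] selects the column a and a second indexing operation assigns into the result. If df['a'] is a copy of the column a, the assignment lands in that temporary copy, so chained indexing does not mutate df: printing df afterwards shows it unchanged.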
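A minimal pandas sketch of chained indexing that leaves the original DataFrame unchanged. It uses a boolean mask as the first indexing step rather than df['a'], because boolean indexing returns a copy in every pandas version; the column names and values are illustrative assumptions.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained indexing: df[df["a"] > 1] builds a new DataFrame (boolean indexing
# returns a copy), and ["b"] = 0 then assigns into that temporary copy.
with warnings.catch_warnings():
    # Depending on the pandas version, this emits a SettingWithCopyWarning
    # or a ChainedAssignmentError warning; silenced here for brevity.
    warnings.simplefilter("ignore")
    df[df["a"] > 1]["b"] = 0

# df itself was never written to, so column "b" still holds 10, 20, 30.
print(df)
```

Pandas warns about this pattern precisely because the result depends on whether the first step returned a copy or a view.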
Whether an indexing operation returns a copy (as above) or a view depends on the underlying memory layout, and is hard for users to predict. The pandas documentation has an entire section dedicated to this issue.
Additionally, some pandas functions share Python objects between two DataFrames: functions such as pd.DataFrame.copy can produce two DataFrames where mutating one also mutates the other, which makes code error-prone.
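The sketch below shows one way this happens in pandas. Even with deep=True, pandas copies an object column's array of references but not the Python objects it points to, so a mutable cell is shared between the original and the copy. The column name and list contents are illustrative assumptions.

```python
import pandas as pd

# An object-dtype column whose cells are mutable Python lists.
df = pd.DataFrame({"tags": [["x"], ["y"]]})

# A "deep" copy duplicates the column's references, not the lists themselves.
df_copy = df.copy(deep=True)

# Mutating a list through the copy is therefore visible through the original.
df_copy.at[0, "tags"].append("z")

print(df.at[0, "tags"])  # ['x', 'z']
```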
To simplify the API surface and enable performance optimizations, Terality implements a simpler model than pandas.
The Terality model: only copies
With Terality, all operations (including indexing) return copies of the data. Terality never returns views.
Data in a Terality structure (whether a Series or a DataFrame) is never shared with any other structure. Mutating a structure is guaranteed not to mutate any other structure.
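To make the copy-only guarantee concrete, here is a toy model in plain Python. CopyOnlyColumn is a hypothetical class written for illustration; it is not Terality's implementation and says nothing about how Terality stores or optimizes data. It only shows the semantics: construction, indexing, and export all copy, so no two objects ever share storage.

```python
import copy
from typing import Any


class CopyOnlyColumn:
    """Hypothetical toy column in which no data is ever shared."""

    def __init__(self, values: list[Any]) -> None:
        # Recursive copy: later changes to the caller's list are not seen here.
        self._values = copy.deepcopy(values)

    def __getitem__(self, key: Any) -> Any:
        if isinstance(key, slice):
            # Slicing returns a new, independent column, never a view.
            return CopyOnlyColumn(self._values[key])
        # Scalar access returns a copy of the element.
        return copy.deepcopy(self._values[key])

    def __setitem__(self, index: int, value: Any) -> None:
        # Mutation only touches this object's private storage.
        self._values[index] = copy.deepcopy(value)

    def to_list(self) -> list[Any]:
        # Export a copy so callers cannot reach the internal list.
        return copy.deepcopy(self._values)


col = CopyOnlyColumn([1, 2, 3])
col[0:2][0] = 99  # chained indexing: only the temporary slice changes
print(col.to_list())  # [1, 2, 3]
```

Under this model, chained indexing can never write back to the original, which matches the behavior the text describes for Terality.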
Terality does support in-place operations, but they don't offer any performance advantage over operations that return a new DataFrame or Series. Behind the scenes, Terality applies optimizations that make copies as performant as using views or in-place operations.
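The two styles below are written with pandas and produce the same data. Assuming Terality accepts the same fillna signature (an assumption; check its API reference), the text above implies the second, returning style costs nothing extra there and avoids mutating shared names. The column values are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, None, 3.0]})

# In-place style: mutates df_inplace and returns None.
df_inplace = df.copy()
df_inplace.fillna(0, inplace=True)

# Returning style: builds a new DataFrame and binds it to a new name.
df_returned = df.fillna(0)

# Same result either way; df itself still contains its missing value.
print(df_inplace.equals(df_returned))  # True
```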
With Terality, chained indexing (as described in the first section) will never mutate the original DataFrame. Use pd.DataFrame.loc (and the iloc variant) instead of chained indexing when you want to assign to the result of an indexing operation.
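A sketch of the recommended pattern, written and runnable with pandas; it assumes Terality's loc and iloc accept the same row/column arguments. Because row and column selection happen in a single indexing call on df, the assignment writes into df itself rather than into a temporary copy.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Label-based: boolean row mask and column label in one .loc call.
df.loc[df["a"] > 1, "b"] = 0

# Position-based variant: first row, second column.
df.iloc[0, 1] = -1

print(df["b"].tolist())  # [-1, 0, 0]
```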
Consequence: differences with the pandas API
As a result, some operations always raise an error in Terality. This should help avoid common mistakes with pandas code.
- pd.DataFrame.copy and pd.Series.copy always perform a recursive deep copy. Passing deep=False is an error.
- pd.Series.view is not supported: data cannot be shared between two Series.
- Some pandas functions (such as the Index constructor or pd.DataFrame.rename) accept a copy argument that determines whether data should be copied during the operation. With Terality, only copy=True is accepted. Setting copy=False raises an error.
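The pandas sketch below sticks to calls that fit the rules above: an explicit copy=True on the Index constructor, deep=True on copy, and no Series.view. It runs with pandas; that it runs unchanged with Terality is an assumption based on the rules listed, not something verified here. Note that pandas' deep copy is still not recursive for Python objects, unlike the Terality behavior described above.

```python
import pandas as pd

# copy=True is the only value of the copy argument that Terality accepts.
index = pd.Index(["x", "y", "z"], copy=True)
df = pd.DataFrame({"a": [1, 2, 3]}, index=index)

# deep=True is pandas' default; spelled out to make the intent explicit.
df_copy = df.copy(deep=True)
df_copy.loc["x", "a"] = 99

# Mutating the copy leaves the original untouched.
print(df["a"].tolist())  # [1, 2, 3]
```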