Data mutability and ownership: Terality vs pandas

Unlike pandas, Terality always copies data and avoids shared mutable data. Differences with pandas on this topic are documented here.


Last updated 3 years ago


Data ownership and mutability in pandas

With pandas, indexing operations can return either a copy or a view. Consider the following code:

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
df['a']

Here, df['a'] may be a copy of column a. When it is, "chained indexing" will not mutate df:

df['a'][0] = 42
print(df)

will print df unchanged.

Whether an indexing operation returns a copy (as above) or a view depends on the underlying memory layout, and is hard for a user to predict. Pandas has an entire documentation section dedicated to this issue.

Additionally, pandas exposes functions that share Python objects between two dataframes: functions like pd.DataFrame.copy (with deep=False) can produce two dataframes where mutating one also mutates the other, leading to error-prone code.
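The sharing described above can be sketched in plain pandas as follows. This is a minimal illustration, not a statement about every pandas release: whether the write through the shallow copy reaches df depends on the pandas version and on whether Copy-on-Write is enabled, so the code reports the outcome instead of assuming it. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

shallow = df.copy(deep=False)  # may share its underlying data with df
deep = df.copy(deep=True)      # owns an independent copy of the data

# Write through the shallow copy.
shallow.loc[0, "a"] = 42

# Depending on the pandas version and the Copy-on-Write setting,
# df may or may not observe the write made through `shallow`.
shared = bool(df.loc[0, "a"] == 42)
print("write through shallow copy visible in df:", shared)

# The deep copy was taken before the write and is never affected by it.
print("value in deep copy:", deep.loc[0, "a"])
```

The fact that the first result cannot be stated without knowing the environment is precisely the ambiguity that the Terality model removes.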

In order to both simplify the API surface and enable performance optimizations, Terality implements a simpler model than the pandas API.

The Terality model: only copies

With Terality, all operations (such as indexing functions) always return copies of the data. Terality never returns views.

Data in a Terality structure (whether a Series or a DataFrame) is never shared with any other structure. Mutating a structure is guaranteed not to mutate any other structure.

Terality does support in-place operations, but they don't offer any performance advantage over operations that return a new DataFrame or Series. Behind the scenes, Terality applies optimizations that make copies as performant as using views or in-place operations.

Consequence: differences with the pandas API

As a result, some operations will always raise an error in Terality. This should help avoid common mistakes with pandas code.

With Terality, chained indexing (as described in the first section) will never mutate the original DataFrame. Use pd.DataFrame.loc (and the iloc variant) instead of chained indexing when you want to assign to the result of an indexing operation.
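As a short sketch of that recommendation, the snippet below assigns through a single .loc or .iloc call instead of chaining two indexing steps. It is written against plain pandas so that it can be run anywhere. It assumes, as the text above states, that Terality exposes the same loc and iloc accessors.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Chained indexing would be written as: df["a"][0] = 42
# A single .loc call selects row label 0 and column "a",
# and assigns to that cell in one step.
df.loc[0, "a"] = 42

# Positional variant: row position 1, column position 1 (column "b").
df.iloc[1, 1] = 99

print(df)
```

Because the selection and the assignment happen in one call, there is no intermediate object whose copy-or-view status could matter.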

pd.DataFrame.copy and pd.Series.copy always perform a recursive deep copy. Passing deep=False is an error.

pd.Series.view is not supported: data cannot be shared between two Series.

Some pandas functions (such as the Index constructor or pd.DataFrame.rename) accept a copy argument that determines whether data should be copied during the operation. With Terality, only copy=True is accepted. Setting copy=False will raise an error.
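The copying forms that remain valid under these rules can be sketched in plain pandas as below. This is an illustration under the assumption that the Terality calls take the same arguments as their pandas counterparts. The error raised by Terality for deep=False or copy=False is not shown, since its exact type is not given here.

```python
import numpy as np
import pandas as pd

# Series.copy() defaults to deep=True: the copy owns its own data.
s = pd.Series([1, 2, 3])
s_copy = s.copy()
s_copy.iloc[0] = 100
print(s.iloc[0], s_copy.iloc[0])

# Index constructor with an explicit copy=True: the index does not
# share memory with the array it was built from.
values = np.array([10, 20, 30])
idx = pd.Index(values, copy=True)
values[0] = -1
print(idx[0], values[0])
```

In both cases, later mutation of one object leaves the other untouched, which is the only behavior Terality offers.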
