Best practices and anti-patterns

Avoid using .apply()

Don't use df.apply() if it's possible to obtain the same result by applying a built-in function.

apply is slow both with pandas and with Terality. It calls a Python function once per row or column, so it can't be optimized as much as specialized built-in functions.
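As a rough illustration, the sketch below computes the same result twice, once with apply and once with built-in column arithmetic. It uses plain pandas as a stand-in, and the column names price and quantity are made up for this example.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Discouraged: apply calls the lambda once per row.
total_slow = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Preferred: built-in column arithmetic processes the whole column at once.
total_fast = df["price"] * df["quantity"]
```

Both expressions produce the same values, but only the second one leaves the work to the library's optimized implementation.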

Avoid iterating over structures

Iterating over rows of a structure (for x in struct loops) is inefficient and is strongly discouraged both with pandas and Terality. For example, don't do:

sum([x**2 for x in series])

A loop like this does not take advantage of the vectorization of pandas computations.

When you run this code with Terality, the iteration happens on your computer instead of running in parallel on the Terality cluster. In this scenario, not only do you lose the benefit of pandas vectorization, you also miss out on Terality's parallelized computations.

Use built-in functions instead. In this example, you could use Series.pow followed by Series.sum: series.pow(2).sum().
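A minimal sketch of the difference, written against plain pandas with made-up sample values:

```python
import pandas as pd

# Illustrative data only.
series = pd.Series([1, 2, 3, 4])

# Discouraged: a Python-level loop over every element.
loop_result = sum([x**2 for x in series])

# Preferred: Series.pow and Series.sum operate on the whole Series at once.
vectorized_result = series.pow(2).sum()
```

Both variables hold the same sum of squares; only the second form lets the library do the work.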

Avoid iterating manually over structures

Don't access the elements of a structure in a loop. For instance:

# Don't do this! It is even slower than the previous example.
total = 0  # named "total" to avoid shadowing the built-in sum()
for i in range(len(series)):
    total += series[i]

This code makes an API request to the Terality cluster on every iteration. It will either be extremely slow or get you temporarily blocked from the Terality API for sending too many requests. Avoid it at all costs!

Best practice: Using Terality's structures API

Instead of iterating over the rows of Terality structures, use their methods as much as possible. For instance, the for loop above, which computes the sum of a Series, should be replaced with the Series.sum method, which runs significantly faster.

total = series.sum()
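The same idea extends to loops that contain a condition. As a sketch in plain pandas with illustrative values, a boolean mask can replace an if statement inside a loop:

```python
import pandas as pd

# Illustrative data only.
series = pd.Series([3, -1, 4, -1, 5])

# Sum of the positive values, without a loop or an if statement.
positive_total = series[series > 0].sum()

# Number of positive values: True counts as 1 when summed.
positive_count = (series > 0).sum()
```

The comparison series > 0 produces a boolean Series, so both the filtering and the reduction stay inside the library's built-in methods.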
