Best practices and anti-patterns
Avoid using .apply()
Don't use df.apply() if it's possible to obtain the same result with a built-in function. apply is a slow operation in both pandas and Terality, because it can't be optimized as much as specialized built-in functions.
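As a minimal sketch of the difference, assuming plain pandas and a made-up DataFrame (the column names price and quantity are illustrative and not from the original page), the same result can be computed with apply or with built-in column arithmetic:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Anti-pattern: apply calls a Python function once per row.
total_slow = df.apply(lambda row: row["price"] * row["quantity"], axis=1)

# Preferred: built-in arithmetic operates on whole columns at once.
total_fast = df["price"] * df["quantity"]

print(total_fast.tolist())
```

Both variables hold the same values; only the second form lets the library handle the whole column in one operation.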
Avoid iterating over structures
Iterating over the rows of a structure (for x in struct loops) is inefficient and is strongly discouraged with both pandas and Terality. For example, don't raise each element of a Series to a power inside a Python loop.
By doing so, you're not taking advantage of the vectorization of pandas computations, which process a whole structure at once instead of one element at a time.
When you run such a loop with Terality, the iteration happens on your computer instead of running in parallel on the Terality cluster. In this scenario, not only are you not benefiting from pandas vectorization, you are also missing out on Terality's parallelized computations.
Use a built-in function instead. In this example, you could use Series.pow.
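The original snippet is not reproduced on this page, so the following is only a sketch of the pattern being described, assuming plain pandas and illustrative values. It contrasts an element-by-element loop with the equivalent Series.pow call:

```python
import pandas as pd

# Hypothetical data, used only for illustration.
series = pd.Series([1, 2, 3, 4])

# Anti-pattern: a Python-level loop handles one element at a time.
squares_loop = []
for x in series:
    squares_loop.append(x ** 2)

# Preferred: Series.pow computes every element in one vectorized call.
squares_vectorized = series.pow(2)

print(squares_vectorized.tolist())
```

The vectorized call also returns a Series directly, so the result can be chained into further built-in operations without leaving the library.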
Avoid iterating manually over structures
Don't access the elements of a structure one at a time in a loop, for instance by indexing into a Series on every iteration of a for-loop to compute its sum.
Such code makes an API request to the Terality cluster on each iteration. It will either be extremely slow or get you temporarily blocked from the Terality API for sending too many requests, so please avoid this at all costs!
Best practice: Using Terality's structures API
Instead of iterating over the rows of Terality structures, use their methods as much as possible. For instance, the above for-loop used to compute the sum of a Series should be replaced by the Series.sum method, which will run significantly faster.
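A minimal sketch of this replacement, assuming plain pandas and illustrative values (the original loop is not reproduced on this page, so the indexing style shown here is a guess at the kind of code being discouraged):

```python
import pandas as pd

# Hypothetical data, used only for illustration.
series = pd.Series([1.5, 2.5, 3.0])

# Anti-pattern: one element access per iteration. With a remote
# structure, each access could translate into a separate request.
total_loop = 0.0
for i in range(len(series)):
    total_loop += series.iloc[i]

# Preferred: a single method call computes the sum in one operation.
total_builtin = series.sum()

print(total_builtin)
```

The loop performs as many element accesses as the Series has rows, while Series.sum is a single call regardless of length.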