Reducing memory with batching
With batching plus server-side cursors, you can process arbitrarily large SQL results as a series of DataFrames without running out of memory. Whether the query returns 1,000 rows or 10,000,000,000, memory usage stays bounded so long as only one batch is stored in memory at a time.
It’s true that you won’t be able to load all the data at once. But batched processing is often sufficient: if not for all of your processing, then at least for an initial pass that summarizes the data enough that the full summary fits in memory.
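Here’s a minimal sketch of that pattern using pandas’ `chunksize` parameter, which turns `read_sql_query()` into an iterator of DataFrames. The example table and column names are invented for illustration, and it uses the stdlib `sqlite3` module so it’s self-contained; with a client-server database like PostgreSQL you would also enable a server-side cursor (for example, SQLAlchemy’s `execution_options(stream_results=True)`) so the database doesn’t send the whole result set up front.

```python
import sqlite3

import pandas as pd

# Hypothetical example data; in practice you'd connect to your real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(i, i * 0.5) for i in range(10_000)],
)

# chunksize=1000 makes read_sql_query() return an iterator that yields
# one 1000-row DataFrame at a time, so only one batch is in memory at once.
rows = 0
total = 0.0
for chunk in pd.read_sql_query(
    "SELECT id, value FROM measurements", conn, chunksize=1_000
):
    # Summarize each batch, then let it be garbage-collected.
    rows += len(chunk)
    total += chunk["value"].sum()

print(rows, total)
```

The loop body is where the initial summarizing pass happens: each batch contributes to a running aggregate that stays small no matter how many rows the query returns.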