Can Spark apps support throughputs of over 1 GBps?
- Zak Goichman
- Dec 11, 2022
- 1 min read
Updated: Dec 13, 2022
"Past behavior doesn't indicate future performance" is a trite saying often heard in connection with the stock market, yet many companies nowadays are taking a shot at the holy grail: the ultimate trading bot.
For this purpose, deep learning algorithms are combined with a plethora of data collected from the most varied sources to try to provide that crystal-ball experience. Data collection would seem like the easier of the two tasks, but given the extent and variety of sources, it too requires meticulous planning to provide for redundancy, consistency, and data integrity.
Spark is the platform of choice for such work: Snowflake and other managed platforms provide less capability at a higher cost, and when the stream consists of hundreds of megabytes of data per second, costs can climb rapidly.
The system we built on top of Spark was a sprawling app capable of loading plugins at runtime and downloading any kind of data, be it from FTP, HTTP, the file system, or a messaging-queue source such as Kafka. That data could also be processed in any format for which a plugin was provided, and even a convolution of formats, e.g. protobuf zipped in gzip.
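The post doesn't show the plugin API itself, but the "convolution of formats" idea maps naturally onto composable decoders. Here is a minimal sketch in Scala; `FormatPlugin`, `RawBytes`, and `GzipPlugin` are hypothetical names, not the actual interfaces of our system:

```scala
import java.io.ByteArrayInputStream
import java.util.zip.GZIPInputStream

// A format plugin turns raw downloaded bytes into a decoded payload of type T.
trait FormatPlugin[T] {
  def decode(bytes: Array[Byte]): T
}

// Identity plugin: passes raw bytes through unchanged.
object RawBytes extends FormatPlugin[Array[Byte]] {
  def decode(bytes: Array[Byte]): Array[Byte] = bytes
}

// Wrapper plugin: gunzips the payload, then delegates to an inner plugin.
// Stacking wrappers like this handles convolutions such as protobuf-in-gzip.
class GzipPlugin[T](inner: FormatPlugin[T]) extends FormatPlugin[T] {
  def decode(bytes: Array[Byte]): T = {
    val in = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try inner.decode(in.readAllBytes()) finally in.close()
  }
}

// Usage: a gzip-wrapped protobuf stream would plug a generated-protobuf
// decoder in as `inner`; here we stop at the raw bytes for brevity.
val gzippedRaw: FormatPlugin[Array[Byte]] = new GzipPlugin(RawBytes)
```

Because wrapper plugins stack, supporting a new convolution of formats costs a composition, not a new code path.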
The download would occur in a separate system, and the data would then be passed back to the Spark app for further processing.
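The post doesn't name the handoff channel between the downloader and Spark. Assuming Kafka (already listed among the sources) plays that role, the Spark side of the handoff might look like this minimal Structured Streaming sketch, with the broker addresses and topic name as placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("ingest")
  .getOrCreate()

// The downloader side publishes raw payloads to Kafka; the Spark side
// subscribes and applies the appropriate format plugin per record.
val raw = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker-1:9092,broker-2:9092") // placeholder brokers
  .option("subscribe", "downloads.raw")                             // placeholder topic
  .option("maxOffsetsPerTrigger", "5000000")                        // throttle per micro-batch
  .load()
  .selectExpr("CAST(key AS STRING)", "value")                       // value stays binary
```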
All in all, we managed to reach throughputs of up to 1 GBps from multiple sources, guarded by multiple alerts and a redundancy mechanism.
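The post doesn't describe the alerting mechanism; one way to wire such alerts in Spark is a `StreamingQueryListener` that watches query progress. A sketch, with the threshold and the println-based alert sink as stand-ins for a real monitoring hook:

```scala
import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener._

// Fires when throughput drops below a floor or a query dies unexpectedly.
class ThroughputAlert(minRowsPerSec: Double) extends StreamingQueryListener {
  override def onQueryStarted(e: QueryStartedEvent): Unit = ()
  override def onQueryTerminated(e: QueryTerminatedEvent): Unit =
    println(s"ALERT: query ${e.id} terminated: ${e.exception.getOrElse("clean stop")}")
  override def onQueryProgress(e: QueryProgressEvent): Unit =
    if (e.progress.processedRowsPerSecond < minRowsPerSec)
      println(s"ALERT: throughput ${e.progress.processedRowsPerSecond} rows/s below floor")
}

// Assumes the SparkSession from the earlier sketch; the floor is illustrative.
spark.streams.addListener(new ThroughputAlert(minRowsPerSec = 100000))
```

Row throughput is only a proxy for byte throughput here, but the same listener catches both slowdowns and unexpected query termination.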