Elasticsearch for big geotemporal data

O.S. Zhyrenkov, A.Yu. Doroshenko

Abstract


The exponential growth in the volume and complexity of geospatial data, driven by advances in GPS technology, mobile devices, and Internet of Things (IoT) sensors, has created an urgent need for scalable and efficient storage and query processing [1]. This paper proposes a way to improve query response times in a scalable solution built on Elasticsearch, an open-source NoSQL document-oriented DBMS [3], by using hierarchical spatial indexes based on the nested H3 hexagonal grid [16]. An overview of Elasticsearch's distributed architecture is provided, along with practical recommendations for optimizing storage and response times, focusing on sharding, replication, and specialized data types (geo_point, geo_shape) for large spatiotemporal datasets. Modern indexing methods are presented (H3 hexagonal grids for uniform space partitioning, BKD trees for point indexing, and R-trees for complex geospatial objects), with details on how each contributes to performance. The proposed approach is evaluated experimentally on the public CityTrek-14K dataset, which contains automotive trajectory data. The tests compare DBMS response times for classic polygon-based searches with those for searches at different H3 index resolutions. The results confirm that high-resolution indexing significantly reduces query times while balancing accuracy and resource usage. Furthermore, response times were more consistent with H3 indexes, whereas classic polygon-based searches showed greater variability. These findings demonstrate that the proposed approach complements Elasticsearch's scalable and flexible architecture, making it a powerful and adaptable platform for complex spatiotemporal workloads, with potential for real-time machine learning and deeper data analytics.
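
The two search styles compared in the abstract can be illustrated with a minimal Python sketch that only builds request bodies as plain dictionaries. It is not the authors' implementation: the field names (location, timestamp, h3_cell), the shard and replica counts, and the sample cell identifier are hypothetical, and the polygon variant assumes an Elasticsearch version in which the geo_shape query can be applied to geo_point fields. The H3 cells covering the search area are assumed to be computed beforehand by an H3 library and stored with each document as a keyword.

```python
import json


def build_index_body(shards=3, replicas=1):
    """Index settings and mapping; all field names are illustrative assumptions."""
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {
            "properties": {
                "location": {"type": "geo_point"},   # point geometry (BKD-tree indexed)
                "timestamp": {"type": "date"},
                "h3_cell": {"type": "keyword"},      # precomputed H3 cell id (assumed)
            }
        },
    }


def time_filter(start, end):
    """Range filter on the (assumed) timestamp field."""
    return {"range": {"timestamp": {"gte": start, "lt": end}}}


def polygon_query(ring, start, end):
    """Classic search: points inside a polygon given as [lon, lat] pairs."""
    ring = [list(p) for p in ring]
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # GeoJSON rings must be closed
    shape = {"type": "polygon", "coordinates": [ring]}
    geo = {"geo_shape": {"location": {"shape": shape, "relation": "intersects"}}}
    return {"query": {"bool": {"filter": [geo, time_filter(start, end)]}}}


def h3_query(cells, start, end):
    """H3 search: exact-match lookup on precomputed cell ids covering the area."""
    terms = {"terms": {"h3_cell": sorted(set(cells))}}
    return {"query": {"bool": {"filter": [terms, time_filter(start, end)]}}}


if __name__ == "__main__":
    ring = [(30.4, 50.4), (30.6, 50.4), (30.6, 50.5), (30.4, 50.5)]
    cells = ["8928308280fffff"]  # placeholder id, not taken from the dataset
    print(json.dumps(build_index_body(), indent=2))
    print(json.dumps(polygon_query(ring, "2024-01-01", "2024-01-02"), indent=2))
    print(json.dumps(h3_query(cells, "2024-01-01", "2024-01-02"), indent=2))
```

In this sketch the H3 variant replaces a geometric test with a term lookup on a keyword field, which is one plausible reason for the steadier response times the abstract reports; the accuracy of the match then depends on the chosen H3 resolution.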

Problems in programming 2025; 1: 55-62


Keywords


Elasticsearch; geospatial data; distributed architecture; H3 indexing; BKD tree; R-tree; performance optimization; geotemporal data; trajectories


References


O. Alqahtani and T. Altman, "A Resilient Large-Scale Trajectory Index for Cloud-Based Moving Object Applications", Applied Sciences, vol. 10, no. 20, p. 7220, 2020, doi: 10.3390/app10207220

M. M. Alam, L. Torgo, and A. Bifet, "A Survey on Spatio-temporal Data Analytics Systems", Mar. 17, 2021, arXiv: arXiv:2103.09883. doi: 10.48550/arXiv.2103.09883

C. Gormley and Z. J. Tong, Elasticsearch: The Definitive Guide. 2015 [Online]. Available: https://www.amazon.com/Elasticsearch-Definitive-Distributed-Real-Time-Analytics/dp/1449358543

T. T. T. Ngo, D. Sarramia, M.-A. Kang, and F. Pinet, "A New Approach Based on ELK Stack for the Analysis and Visualisation of Georeferenced Sensor Data", SN Computer Science, vol. 4, no. 3, pp. 1–21, Mar. 2023, doi: 10.1007/s42979-022-01628-6

J. Ding, V. Nathan, M. Alizadeh, and T. Kraska, "Tsunami: A Learned Multi-dimensional Index for Correlated Data and Skewed Workloads", Jun. 23, 2020, arXiv: arXiv:2006.13282, doi: 10.48550/arXiv.2006.13282

F. García-García, A. Corral, L. Iribarne, M. Vassilakopoulos, and Y. Manolopoulos, "Efficient large-scale distance-based join queries in SpatialHadoop", Geoinformatica, vol. 22, no. 2, pp. 171–209, Apr. 2018, doi: 10.1007/s10707-017-0309-y

J.-H. Shen, C. T. Lu, M. Y. Chen, and N. Y. Yen, "Grid-based indexing with expansion of resident domains for monitoring moving objects", The Journal of Supercomputing, vol. 76, no. 3, pp. 1482–1501, Mar. 2020, doi: 10.1007/S11227-017-2224-2

A. Abhishek and S. Senthilnathan, "Bucket based distributed search system", Jan. 17, 2019.

R. Li et al., "TrajMesa: A Distributed NoSQL Storage Engine for Big Trajectory Data", in 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA: IEEE, Apr. 2020, pp. 2002–2005. doi: 10.1109/ICDE48307.2020.00224

F. García-García, A. Corral, L. Iribarne, M. Vassilakopoulos, and Y. Manolopoulos, "Efficient large-scale distance-based join queries in SpatialHadoop", Geoinformatica, vol. 22, no. 2, pp. 171–209, Apr. 2018, doi: 10.1007/s10707-017-0309-y

T. Gu, K. Feng, G. Cong, C. Long, Z. Wang, and S. Wang, "A Reinforcement Learning Based R-Tree for Spatial Data Indexing in Dynamic Environments", Oct. 11, 2021, arXiv: arXiv:2103.04541. doi: 10.48550/arXiv.2103.04541. Pandey et al., "The Case for Learned Spatial Indexes", 2020.

H. Zhang et al., "Construction and Application of Place Name and Address Management System Based on Elasticsearch", The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLVIII-4-2024, pp. 571–576, Oct. 2024, doi: 10.5194/isprs-archives-XLVIII-4-2024-571-2024

X. Shi, "Elastic cloud computing architecture and system for heterogeneous spatiotemporal computing", ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp. 115–119, Oct. 2017, doi: 10.5194/ISPRS-ANNALS-IV-4-W2-115-2017

P. M. Dhulavvagol, V. H. Bhajantri, and S. G. Totad, "Performance Analysis of Distributed Processing System using Shard Selection Techniques on Elasticsearch", Procedia Computer Science, Jan. 2020, doi: 10.1016/J.PROCS.2020.03.373

M. R. Vieira, P. Bakalov, E. Hoel, and V. J. Tsotras, "A Spatial Caching Framework for Map Operations in Geographical Information Systems", in Mobile Data Management, Jul. 2012. doi: 10.1109/MDM.2012.12

Agarwal et al. "H3: A Hexagonal Hierarchical Geospatial Indexing System". Proceedings of the ACM SIGSPATIAL doi: 10.1145/12345

