The Top Factors that Contribute to Search Database Latency

Search databases are data repositories that allow efficient and fast search and are typically used by search engines to provide relevant results, based on search queries.

Over the past few years, the usage of search databases has grown considerably due to the rising popularity of web services and advancements in search technology. This growth has introduced new technological challenges, driven by the need to serve ever-higher query volumes.

Search latency, the amount of time it takes for a database to process a request and return a response, is one of the main challenges faced by search databases. Low latency allows a better user experience and leaves time for additional logic that can improve search accuracy, whereas high latency can have negative effects, such as poor user experience and incorrect decisions in real-time applications (e.g., in e-commerce and financial trading).

This article outlines some of the primary factors that contribute to search database latency and offers methods for enhancing search engine performance and improving user experience.

Internal and External Latency

In its broadest sense, database latency refers to the amount of time it takes for a user to receive a response after submitting a request. Measured this way, latency is affected directly by the database performance (internal latency), but also by network-related factors such as the quality of service of the client's ISP, network congestion, bandwidth, and the physical distance between the client and the database. Consequently, this interpretation of latency is often called "external latency" and comprises both the internal latency of the database and the contributions stemming from the network.

When analyzing database performance, it is more relevant to focus on the internal latency, which is the amount of time it takes for a request to be processed and for a response to be returned within the database itself. This latency is affected only by the properties of the database. Although it describes the end user's experience only partially, it is a more faithful description of the database performance and will be our focus here.

Contributors to Database Search Latency

Search latency is caused by a range of factors, some of which are challenging to control, while others are relatively straightforward to address. We review some of these factors and, where possible, propose common practices to mitigate them.

Data Partitioning Strategy

Data partitioning, also known as sharding, is a popular strategy used in search database management to improve performance and scalability. The basic concept is to divide large databases into smaller, more manageable subsets called shards. Each shard contains a subset of the data and is stored on a separate server. When a user submits a search query, the system routes the request to the shard which stores the relevant data. By distributing the data across multiple servers, data partitioning reduces the workload on individual servers, which in turn improves the overall performance and response time of the system. In addition, data partitioning allows easy horizontal scalability as new servers can be added to the system to accommodate the growing volume of data and user requests. Overall, data partitioning is a powerful technique that allows search database systems to handle large amounts of data while maintaining optimal performance and scalability.
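
The routing step described above can be illustrated with a minimal sketch. It assumes a simple hash-modulo placement scheme; the function names `shard_for` and `build_shards` are hypothetical and do not belong to any particular search database, which will usually have its own routing logic.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document ID to a shard index using a stable hash.

    hashlib is used instead of the built-in hash(), because the built-in
    hash of a string changes between Python processes.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(doc_ids, num_shards):
    """Place every document ID into the shard chosen by shard_for."""
    shards = [[] for _ in range(num_shards)]
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(1000)]
    for index, shard in enumerate(build_shards(docs, 4)):
        print(f"shard {index}: {len(shard)} documents")
```

A lookup by document ID only needs to contact the single shard returned by `shard_for`, whereas a free-text query generally has to be sent to every shard.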

The data volume in each shard and the distribution of shards across nodes strongly affect the time it takes to scan the database and retrieve results. This can happen in multiple ways. For one, when a query retrieves data from multiple shards, it must await the response from all of them before returning the results, so the slowest shard determines the latency of the query. If some of the shards lack the proper resources, this can lead to increased latency. Moreover, if the data is not evenly distributed across the shards, some shards may become overloaded with queries, while others remain idle, resulting in even greater query latency and slower search times. Latency due to these factors can be reduced by proper resource allocation and load balancing. A few common practices in this context are:

Optimizing the software infrastructure and minimizing the number of round trips required to process each query. One such approach is to use a load balancer that routes queries to shards based on their geographic location or availability. Rewriting the query to minimize the number of shards it touches can also help.
Distributing the data evenly across shards, which helps equalize the response times of the shards and reduces the overall latency of each search query. This can be done with a consistent hashing algorithm that spreads the load evenly, combined with periodic rebalancing of the shards.
Applying auto-scaling techniques that dynamically allocate resources based on the current load, so that each query has sufficient resources available even when traffic increases. One can also optimize the query execution plan to minimize the resources required to process each query.
Using caching mechanisms to reduce the load on the database and accelerate the search. For example, pre-computing results, storing frequently accessed data in memory, and increasing the volume of the index can significantly reduce the latency per query.
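
To make the consistent hashing practice more concrete, the following is a minimal sketch of a hash ring with virtual nodes. The class name `ConsistentHashRing`, the shard names, and the choice of 100 virtual nodes per shard are assumptions made for illustration; MD5 is used here only to spread keys over the ring, not for security.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Hypothetical hash ring; each shard owns several points (virtual nodes)."""

    def __init__(self, shards, vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        """Insert the shard's virtual nodes into the ring."""
        for i in range(self._vnodes):
            point = _position(f"{shard}#{i}")
            self._owner[point] = shard
            bisect.insort(self._points, point)

    def shard_for(self, key):
        """Return the shard owning the first ring point after the key."""
        index = bisect.bisect_right(self._points, _position(key))
        index %= len(self._points)  # wrap around the end of the ring
        return self._owner[self._points[index]]


if __name__ == "__main__":
    ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
    keys = [f"doc-{i}" for i in range(1000)]
    before = {k: ring.shard_for(k) for k in keys}
    ring.add_shard("shard-d")
    moved = sum(1 for k in keys if ring.shard_for(k) != before[k])
    print(f"{moved} of {len(keys)} keys moved after adding a shard")
```

The point of this design is that adding a shard relocates only the keys that the new shard takes over, whereas a plain hash-modulo scheme would reshuffle most keys and trigger a much more expensive rebalancing.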

Geographical Distribution

The geographical distribution of data has a significant impact on search latency, due to the distance that the request has to travel between the user and the database server.

If the database is located far from the user, data has to pass through multiple routers and network nodes before reaching its destination, and each hop adds a slight delay.

On the other hand, if the database server is located closer to the query origin, the request will have to travel a shorter distance, resulting in lower latency.

By optimizing the geographical distribution of data and placing database servers closer to their users, organizations can improve the performance of their databases and provide a better user experience. This can be particularly important for applications that require low latency, such as real-time data processing or interactive web applications. This optimization can be achieved by any of the following:

Determining the geographic location of the search users and selecting a cloud provider that has data centers in these locations can improve latency.
Replicating the database across multiple regions can ensure that users can access the data from the nearest data center, reducing the latency of the search requests.
Implementing a content delivery network (CDN) that caches data at edge locations close to the users can improve the speed of data transfer.
Cloud service providers (e.g., AWS) have servers in different regions, each of which is divided into availability zones. Providers typically design for low network latency between instances in different availability zones within the same region. Thus, optimizing the data partitioning according to availability zones can improve latency.
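
The effect of distance can be estimated with simple arithmetic. The sketch below assumes that signals travel through optical fiber at roughly 200,000 km/s (about two thirds of the speed of light in vacuum) and computes a lower bound on the round-trip time from the great-circle distance. The region list and its approximate city coordinates are illustrative assumptions and do not represent any provider's actual data center locations; real latency is higher because of routing, queuing, and processing at each hop.

```python
import math

EARTH_RADIUS_KM = 6371.0
FIBER_SPEED_KM_PER_S = 200_000.0  # assumed signal speed in optical fiber

# Hypothetical regions, identified by approximate city coordinates (lat, lon).
REGIONS = {
    "Frankfurt": (50.11, 8.68),
    "New York": (40.71, -74.01),
    "Singapore": (1.35, 103.82),
}


def great_circle_km(a, b):
    """Haversine distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def min_rtt_ms(a, b):
    """Lower bound on round-trip time in milliseconds, propagation only."""
    return 2 * great_circle_km(a, b) / FIBER_SPEED_KM_PER_S * 1000


def nearest_region(user, regions=REGIONS):
    """Name of the region with the smallest great-circle distance to the user."""
    return min(regions, key=lambda name: great_circle_km(user, regions[name]))


if __name__ == "__main__":
    london = (51.51, -0.13)
    for name, location in REGIONS.items():
        print(f"{name}: at least {min_rtt_ms(london, location):.1f} ms round trip")
    print("nearest region:", nearest_region(london))
```

Under these assumptions, a user in London is a few milliseconds away from Frankfurt but more than a hundred milliseconds away from Singapore before any database work is done, which is why replicating data close to the users matters.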

Java Virtual Machine (JVM) Performance

Several of the currently leading search databases (e.g., Elasticsearch, Apache Solr, and Cassandra) are written in Java and run on the Java Virtual Machine (JVM). As a result, their latency is strongly affected by the JVM performance. For example, if the JVM is experiencing high CPU usage or garbage collection pauses, response times grow and the latency increases accordingly. The JVM latency is also strongly affected by the size of the heap, the memory space where Java objects are stored. If the heap is too small, the JVM will spend more time on garbage collection, which may increase latency. On the other hand, if the heap is too large, individual garbage collection pauses can become longer and memory fragmentation can reduce performance. Several practices can reduce the latency caused by the JVM:

Updating to the latest Java version, which often includes performance improvements, can improve the latency.
Tuning the JVM heap size can balance the time spent on garbage collection against the drawbacks of an oversized heap, such as memory fragmentation.
Using a low-pause garbage collector can have a significant impact on JVM latency by reducing the length of each garbage collection pause.
Optimizing the code can help reduce the number of logic operations and improve latency. This can include techniques such as reducing the number of object allocations, minimizing the use of synchronized blocks, and using efficient algorithms.
A just-in-time (JIT) compiler can compile frequently executed code into native machine code, reducing the time it takes to execute the search query code and improving latency.

Hardware Limitations

The hardware used to run the search query and store the data can impact the speed and responsiveness of the service, as well as the ability to scale to meet demand. If the hardware is underpowered or outdated, it may struggle to keep up with the demands of the database, resulting in slow search times or even service outages.

This hardware limitation on latency can be mitigated in the following ways:

Use cloud-based services that offer scalable and flexible infrastructure. Cloud providers commonly offer a variety of services, such as auto-scaling and load balancing, that can help ensure that hardware limitations do not impact search performance. Additionally, cloud providers typically use modern hardware and infrastructure, which can help ensure optimal performance and reliability.
Optimize the database, for example by using in-memory databases or distributed database architectures that spread the load across multiple nodes or clusters.
Invest in more powerful hardware or infrastructure to support high-volume or high-complexity search queries.

Application Design

The design of the search application can have a significant impact on latency. If an application is not optimized to use the database efficiently or generates inefficient queries, it can result in slow search times and decreased performance. Some of the approaches to mitigate this effect are:

Implement query optimization techniques, such as indexing or caching.
Design the application to take advantage of database connection pooling. A connection pool keeps a set of open database connections that are reused across requests, which reduces the overhead of establishing new connections and improves performance.
Use asynchronous programming to reduce the time spent waiting for database queries to complete, by allowing other tasks to run while a query is in flight.
Use distributed database architectures and horizontal scaling techniques to distribute the load across multiple servers.
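
The benefit of asynchronous programming can be illustrated with a short sketch. It assumes a hypothetical `run_query` coroutine that stands in for a real asynchronous database client and simply sleeps for the duration of the simulated query; the query names and delays are made up for the example.

```python
import asyncio
import time


async def run_query(name, delay_s):
    """Stand-in for an asynchronous database call; sleeps instead of doing I/O."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def run_all(queries):
    """Issue all queries concurrently and collect results in input order."""
    return await asyncio.gather(*(run_query(n, d) for n, d in queries))


def main():
    queries = [(f"q{i}", 0.1) for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(run_all(queries))
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = main()
    print(results)
    print(f"elapsed: {elapsed:.2f} s")
```

Because the five simulated queries wait concurrently, the total time is close to that of the slowest single query rather than the sum of all five, which is the effect the application is after when it overlaps independent database calls.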

Summary

In this article, we discussed the importance of low search database latency and its main contributors. We offered some optimization strategies for reducing latency and achieving faster, more efficient application performance.

Some of the key optimization strategies, such as improved indexing, saving multiple copies of the data, and hardware upgrades, involve a trade-off: lower latency comes at the price of larger storage volume and higher maintenance costs.

If the database is correctly designed, each of the proposed methods is likely to have only a modest effect in most use cases. Taken together, however, the contribution of the factors discussed here can be quite significant if they are not properly handled.