vRLI can be deployed in a variety of configurations, but in its most basic form it is just a single virtual appliance. This limits the number of supported agent connections, events per second, event storage capacity and overall system performance. In fact, all deployment types have limits; those limits simply rise as more resources are added.
The recommendation for a production installation is to start with a medium-sized node (see VMware docs https://docs.vmware.com/en/vRealize-Log-Insight/4.6/com.vmware.log-insight.getting-started.doc/GUID-284FC5F4-B832-47A7-912E-D407A760CAE4.html) and then scale up or out (or both), remembering that a cluster is only supported with a minimum of 3 nodes. But what about when we need more from an existing deployment?
Scale Up or Out
This conundrum is never straightforward. As with a greenfield install, the starting point should always be the numbers:
- How much data
- How many events
- How long to retain for
Once you have the data you can start to work out how the requirements can be satisfied. The “System Monitoring” view within vRLI will show you current ingestion rates on a balanced per node basis so you can get an idea of what the platform is trying to handle.
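As a rough illustration of how those three numbers combine, here is a minimal Python sketch. The function names, the average event size and the per-node limits are all assumptions made up for the example, not values from vRLI; take real per-node figures from the VMware sizing documentation. The calculation also ignores compression and indexing overhead, so treat the result as a starting point only.

```python
import math

SECONDS_PER_DAY = 86_400

def estimate_storage_gb(events_per_second, avg_event_bytes, retention_days):
    """Raw storage (in GB, 10^9 bytes) needed to retain the given event stream."""
    bytes_per_day = events_per_second * avg_event_bytes * SECONDS_PER_DAY
    return bytes_per_day * retention_days / 1e9

def nodes_required(events_per_second, storage_gb,
                   node_eps_limit, node_storage_gb, min_cluster_nodes=3):
    """Nodes needed to satisfy both ingestion rate and storage.

    node_eps_limit and node_storage_gb are hypothetical per-node limits;
    look up the real values for your chosen appliance size.
    """
    by_eps = math.ceil(events_per_second / node_eps_limit)
    by_storage = math.ceil(storage_gb / node_storage_gb)
    needed = max(by_eps, by_storage, 1)
    # A single appliance is fine on its own; anything larger must be a
    # cluster, and a cluster needs at least min_cluster_nodes nodes.
    return 1 if needed == 1 else max(needed, min_cluster_nodes)

# Example with made-up inputs: 1,000 events/s, 200 bytes each, 30 days.
storage = estimate_storage_gb(1000, 200, 30)
print(f"{storage:.1f} GB")
print(nodes_required(1000, storage, node_eps_limit=2000, node_storage_gb=400))
```

In this example the ingestion rate fits on one node but the storage does not, so the answer jumps straight to the minimum cluster size rather than to two nodes.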
In addition to the above, the individual node numbers within the vRLI documentation (https://docs.vmware.com/en/vRealize-Log-Insight/4.6/com.vmware.log-insight.getting-started.doc/GUID-284FC5F4-B832-47A7-912E-D407A760CAE4.html?hWord=N4IghgNiBcIM4EsBeCB2BzEBfIA) combined with Steve Flanders' sizing calculator spreadsheet (https://www.vmware.com/go/loginsight/calculator) can help give direction to scaling. Both give you a rough starting place; however, they rely on approximations and estimates. All environments are different!
External factors, such as NUMA node configuration and resource availability, may also influence your decision. For example, if your host configuration means that increasing an appliance from medium to large would force it to span NUMA nodes, it might be more appropriate to scale out using medium appliances (each restricted to running within a single NUMA node) rather than scale up, so that no cross-node performance overhead is incurred. Both configurations support the same number of objects (based on a minimum cluster size of 3 nodes); however, medium might fit better into the vSphere deployment being leveraged.
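The NUMA check can be reduced to a simple comparison. The sketch below is only an illustration: the host layout (12 cores and 128 GB per NUMA node) is hypothetical, and the appliance sizes (medium as 8 vCPU / 16 GB, large as 16 vCPU / 32 GB) are assumed values that should be confirmed against the VMware sizing documentation for your version.

```python
def fits_in_numa_node(vcpus, mem_gb, cores_per_numa_node, mem_gb_per_numa_node):
    """True if a VM of this size can be scheduled within a single NUMA node."""
    return vcpus <= cores_per_numa_node and mem_gb <= mem_gb_per_numa_node

# Hypothetical host: 12 physical cores and 128 GB of RAM per NUMA node.
HOST = {"cores_per_numa_node": 12, "mem_gb_per_numa_node": 128}

# Assumed appliance sizes; verify against the VMware docs.
APPLIANCES = {
    "medium": {"vcpus": 8, "mem_gb": 16},
    "large": {"vcpus": 16, "mem_gb": 32},
}

for name, size in APPLIANCES.items():
    ok = fits_in_numa_node(**size, **HOST)
    print(f"{name}: {'fits in one NUMA node' if ok else 'spans NUMA nodes'}")
```

On this made-up host the medium appliance stays within one NUMA node while the large one does not, which is the situation where scaling out with medium nodes becomes the more attractive option.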