This event has ended. View the official site or create your own event → Check it out
This event has ended. Create your own
View analytic
Tuesday, March 28 • 15:10 - 15:50
Using LLVM in a scalable, high-available, in-memory database server

Sign up or log in to save this to your schedule and see who's attending!

In this presentation we would like to show you how we at SAP are using LLVM within our HANA database. We will show the benefits we have from using LLVM as well as the specific challenges of working in an in-memory database server. Thereby we will explain the changes we have to do in the LLVM source and why we have a significant delay until we can move to the latest LLVM version.

A key differentiator of a compiler integrated into a server compared to a standalone compiler is that within the server you may not crash whatever input you get. Even in out-of-memory situation you have to stop and cleanup your current work and return back to your starting state. This is doable but requires to immediately assign all resource allocations to an owner and to take special care when working at the edge of C++ memory handling e.g. when overloading operator new. About two thirds of the changes to LLVM we are doing on our version of the LLVM source are related to out-of-memory situations.

Within the HANA database we use LLVM to compile stored procedures and query plans. For stored procedures several domain specific languages are available which are translated to LLVM IR via an intermediate language. The domain specific languages have powerful features and through the layered code generation the resulting LLVM IR code can become rather large. Furthermore, within our domain specific languages all code is often put into one function which results in having one large function in the LLVM IR. Since the runtime of many optimizer passes and of the register allocator increases non-linear with the size of the functions our compile times exploded up to many hours. To reduce the compile time we are now trying to split large functions automatically into smaller pieces.

In contrast when compiling query plans to machine code the resulting functions typically have small to medium size. The overall response time of the query is determined by the compile time of the query plan plus the execution time of the resulting machine code. So in this scenario the compile time for small and medium sized functions becomes important, sometimes it exceeds the actual execution time. If the time to execute a query without compilation is X microseconds per data row and the time to compile the execution plan of the query is Y microseconds then you need to process Y/X data rows to amortize the cost of compilation. We made several tries to speed up the compilation by reducing the number of optimization passes but are currently stuck at the actual machine code generation. Currently our break-even point between interpreted execution and compiled execution is at about 10.000 data rows.

The key factor why we are happy to use LLVM is the excellent quality we experienced. We use LLVM for 6 years and we had less than a handful issues which were caused by bugs in LLVM. Also when upgrading from one LLVM version to another we did not experience new bugs (besides handling of out-of-memory situations). Further we like the available traces and supportability features to track down problems that occur, the easy to consume APIs and we are very pleased that it is possible to generate debug info for the compiled code so debugging with GDB and profiling is possible even when we have a mixture of C++ and LLVM stack frames.


Markus Eble

Product Owner LLVM Dev, SAP SE
I am working in the technology area of SAP since 18 years, the last 6 of it on compiling database operations into machine code using LLVM within the SAP HANA database. | Interested in working on compiler technology at SAP? | Contact me or take a look at http://www.sap.com/germany/careers-thecore.

Tuesday March 28, 2017 15:10 - 15:50
HS002, E1 3

Attendees (15)