Building LinkedIn's Next Generation Architecture with OSGi by Yan Pujante
This week, I'm attending the Colorado Software Summit in Keystone, Colorado. Below are my notes from an OSGi at LinkedIn presentation I attended by Yan Pujante.
LinkedIn was created in March of 2003. Today there's close to 30M members. In the first 6 months, there was 60,000 members that signed up. Now, 1 million sign up every 2-3 weeks. LinkedIn is profitable with 5 revenue lines and there's 400 employees in Mountain View.
Technologies: 2 datacenters (~600 machines). SOA with Java, Tomcat, Spring Framework, Oracle, MySQL, Servlets, JSP, Cloud/Graph, OSGi. Development is done on Mac OS X, production is on Solaris.
The biggest challenges for LinkedIn are:
- Growing Engineering Team on a monolithic code base (it's modular, but only one source tree)
- Growing Product Team wanting more and more features faster
- Growing Operations Team deploying more and more servers
Front-end has many BL services in one webapp (in Tomcat). The backend is many wars in 5 containers (in Jetty) with 1 service per WAR. Production and Development environments are very different. Total services in backend is close to 100, front-end has 30-40.
Container Challenges
1 WAR with N services does not scale for developers (conflicts, monolithic). N wars with 1 service does not scale for containers (no shared JARs). You can add containers, but there's only 12GB of RAM available.
Upgrading back-end service to new version requires downtime (hardware load-balancer does not account for version). Upgrading front-end service to new version requires redeploy. Adding new backend services is also painful because there's lots of configuration (load-balancer, IPs, etc.).
Is there a solution to all these issues? Yan believes that OSGi is a good solution. OSGi stands for Open Services Gateway initiative. Today that term doesn't really mean anything. Today it's a spec with several implementations: Equinox (Eclipse), Knoplerfish and Felix (Apache).
OSGi has some really, really good features. These include smart class loading (multiple versions of JARs is OK), it's highly dynamic (deploy/undeploy built-in), it has a service registry and is highly extensible/configurable. An OSGi bundle is simply a JAR file with an OSGi manifest.
In LinkedIn's current architecture, services are exported with Spring/RPC and services in same WAR can see each other. The problem with this architecture comes to light when you want to move services to a 2nd web container. You cannot share JARs and can't talk directly to the other web app. With OSGi, the bundles (JARs) are shared, services are shared and bundles can be dynamically replaced. OSGi solves the container challenge.
One thing missing from OSGi is allowing services to live on multiple containers. To solve this, LinkedIn has developed a way to have Distributed OSGi. Multicast is used to see what's going on in other containers. Remote servers use the OSGi lifecycle and create dynamic proxies to export services using Spring RPC over HTTP. Then this service is registered in the service registry on the local server.
With Distributed OSGi, there's no more N-1 / 1-N problem. Libraries and services can be shared in one container (memory footprint is much smaller). Services can be shared across containers. The location of the services is transparent to the clients. There's no more configuration to change when adding/removing/moving services. This architecture allows the software to be the load balancer instead of using a hardware load balancer.
Unfortunately, everything is not perfect. OSGi has quite a few problems. OSGi is great, but the tooling is not quite there yet. Not every library is a bundle and many JARs doesn't have OSGi manifests. OSGi was designed for embedded devices and using it for the server-side is very recent (but very active).
OSGi is pretty low-level, but there is some work being done to hide the complexity. Spring DM helps, as do vendor containers. SpringSource has Spring dm Server, Sun has GlassFish, and Paremus has Infiniflow. OSGi is container centric, but next version will add distributed OSGi, but will have no support for load-balancing.
Another big OSGi issue is version management. If you specify version=1.0.0
, it means [1.0.0, ∞]
. You should at least use version=[1.0.0,2.0.0]
. When using OSGi, you have to be careful and follow something similar to Apache's APR Project Versioning Guidelines so that you can easily identify release compatibility.
At LinkedIn, the OSGi implementation is progressing, but there's still a lot of work to do. First of all, a bundle repository needs to be created. Ivy and bnd is used to generate bundles. Containers are being evaluated and Infiniflow is most likely the one that will be used. LinkedIn Spring (an enhanced version of Spring) and SCA will be used to deploy composites. Work on load-balancing and distribution is in progress as is work on tooling and build integration (Sigil from Paremus).
In conclusion, LinkedIn will definitely use OSGi but they'll do their best to hide the complexity from the build system and from developers. For more information on OSGi at LinkedIn, stay tuned to the LinkedIn Engineering Blog. Yan has promised to blog about some of the challenges LinkedIn has faced and how he's fixed them.
Update: Yan has posted his presentation on the LinkedIn Engineering Blog.