Building scalable, reliable, and secure RESTful services with Dan Diephouse
This afternoon, I attended Dan Diephouse's talk on RESTful services. Below are my notes from his presentation.
Roy Fielding coined the term REST (REpresentational State Transfer) in his 2000 doctoral dissertation. Everything is a resource, and every resource is addressable via a URI. Resources are self-descriptive and are manipulated with verbs through a uniform interface. We don't want keys, we want links! Resources are hypertext, and hypertext is just data with links to other resources. The data model refers to other application states via links. This is possible because of the uniform interface!
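To make "links, not keys" concrete, here's a minimal sketch of an order representation that points at its customer and items by URI instead of exposing bare database IDs. The `example.com` base URI, the `/orders`, `/customers` and `/items` paths, and the `order_representation` function are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical base URI, used only for illustration.
BASE = "http://example.com"


def order_representation(order_id, customer_id, item_ids):
    """Build an order representation whose related resources are
    reachable by following links rather than by knowing their keys."""
    return {
        "id": f"{BASE}/orders/{order_id}",
        "links": {
            "self": f"{BASE}/orders/{order_id}",
            "customer": f"{BASE}/customers/{customer_id}",
            "items": [f"{BASE}/items/{i}" for i in item_ids],
        },
    }


print(json.dumps(order_representation(42, 7, [1, 2]), indent=2))
```

A client that receives this never has to know how to turn `7` into a customer URL. It just follows the `customer` link.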
REST and HTTP
There are five main HTTP methods: GET, POST, PUT, DELETE and HEAD. GET is cacheable and safe (it has no side effects). POST is unsafe and not idempotent, so it can't be safely repeated. HEAD is used to retrieve a resource's metadata without getting the response body.
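The "can't be repeated" point is really about idempotency. Per the HTTP spec, GET and HEAD are safe and idempotent, PUT and DELETE are idempotent but not safe, and POST is neither. The sketch below encodes that table and a hypothetical client retry policy (`may_retry_automatically` is my own name, not part of any library): only resend after a network failure when a duplicate request can't cause a duplicate side effect.

```python
# Safe: no side effects. Idempotent: sending the request twice has the
# same effect on the server as sending it once.
METHODS = {
    "GET":    {"safe": True,  "idempotent": True},
    "HEAD":   {"safe": True,  "idempotent": True},
    "PUT":    {"safe": False, "idempotent": True},
    "DELETE": {"safe": False, "idempotent": True},
    "POST":   {"safe": False, "idempotent": False},
}


def may_retry_automatically(method):
    """Hypothetical client policy: retry only idempotent requests."""
    return METHODS[method.upper()]["idempotent"]


for m in METHODS:
    print(m, "retry" if may_retry_automatically(m) else "do not retry")
```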
To create a new resource, you use POST. The server returns an HTTP 201 (Created) with a Location header pointing at the new resource. After that, you do a PUT to that location and get back an HTTP 200. Another option is to have the client generate a unique ID and PUT to it straight away, instead of doing a POST (which has the server generate the unique URL) followed by a PUT.
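Here's a toy in-memory server that shows both creation styles side by side. The `ResourceStore` class and its `/entries` base path are hypothetical. POST returning 201 with a Location header and PUT-to-existing returning 200 follow the talk; returning 201 when a PUT creates a brand-new resource is my assumption, consistent with HTTP but not something Dan covered.

```python
import uuid


class ResourceStore:
    """Hypothetical in-memory server illustrating POST-then-PUT versus
    client-generated IDs."""

    def __init__(self, base="/entries"):
        self.base = base
        self.resources = {}

    def post(self, body):
        # The server chooses the new resource's URI and reports it.
        uri = f"{self.base}/{uuid.uuid4()}"
        self.resources[uri] = body
        return 201, {"Location": uri}

    def put(self, uri, body):
        # Create or replace the resource at a URI the client already knows.
        created = uri not in self.resources
        self.resources[uri] = body
        return (201 if created else 200), {}


store = ResourceStore()
status, headers = store.post("draft")              # server picks the URI
print(status, store.put(headers["Location"], "final")[0])

client_uri = f"/entries/{uuid.uuid4()}"            # client picks the URI
print(store.put(client_uri, "hello")[0])
```

The client-generated-ID style saves a round trip and makes creation idempotent: if the PUT times out, the client can safely send it again to the same URI.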
The biggest problem with REST is firewalls: many firewalls don't allow PUT or DELETE. Google works around this by having clients send a POST with a header that specifies which method to use instead (a method override).
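On the server side, honoring a method override can be as small as the sketch below. I'm assuming the header is named `X-HTTP-Method-Override`, which is the name commonly used for this (GData included), and that only POST may be overridden, and only to PUT or DELETE. `effective_method` is a hypothetical helper, not a framework API.

```python
# Assumption: the override header is X-HTTP-Method-Override.
OVERRIDE_HEADER = "x-http-method-override"
ALLOWED_OVERRIDES = {"PUT", "DELETE"}


def effective_method(method, headers):
    """Return the method the server should dispatch on.

    Only a POST may be overridden, and only to a method that firewalls
    are likely to block; anything else is left unchanged."""
    lowered = {k.lower(): v for k, v in headers.items()}
    override = lowered.get(OVERRIDE_HEADER, "").strip().upper()
    if method.upper() == "POST" and override in ALLOWED_OVERRIDES:
        return override
    return method.upper()


print(effective_method("POST", {"X-HTTP-Method-Override": "DELETE"}))
```

Restricting overrides to POST matters: letting a GET be turned into a DELETE would allow a plain link or image tag to delete data.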
One of the constraints of REST is that all communication is stateless. Session state is kept on the client, and the client is responsible for transitioning to new states, which are represented by URIs. This improves visibility, reliability and scalability: you don't need to replicate session state across your services in a cluster.
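Paging is a simple place to see "states are represented by URIs" in action. In this minimal sketch (the `/items` collection, `ITEMS` data, page size and `offset` query parameter are all hypothetical), everything the server needs to produce the next page is carried in the `next` link, so any server in a cluster can answer the follow-up request without a session.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ITEMS = [f"item-{n}" for n in range(1, 8)]  # hypothetical data set


def get_page(uri, page_size=3):
    """Serve one page; the offset lives in the URI, not in a session."""
    query = parse_qs(urlparse(uri).query)
    offset = int(query.get("offset", ["0"])[0])
    page = ITEMS[offset:offset + page_size]
    links = {}
    if offset + page_size < len(ITEMS):
        links["next"] = "/items?" + urlencode({"offset": offset + page_size})
    return {"items": page, "links": links}


# The client drives the state transitions by following links.
uri = "/items"
while uri:
    page = get_page(uri)
    print(page["items"])
    uri = page["links"].get("next")
```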
ETag header
A server may return an ETag header when a resource is accessed. On a subsequent request for the same resource, the client sends that value back in an "If-None-Match" header. If nothing has changed, the server responds with a 304 (Not Modified) and no body. Last-Modified is a similar header that servers send back; the client returns its value in an "If-Modified-Since" header and gets the same result.
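Here's a minimal server-side sketch of that conditional GET flow. Building the ETag from a SHA-1 hash of the body is an assumption (any value that changes when the representation changes will do), and `conditional_get` is a hypothetical helper rather than a framework API. As in HTTP, If-None-Match takes precedence when both validators are sent.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def make_etag(body):
    # Assumption: a quoted SHA-1 of the body serves as a strong ETag.
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def conditional_get(body, last_modified, request_headers):
    """Return (status, headers, body) for a GET, honoring validators.

    last_modified must be a timezone-aware UTC datetime."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
    }
    req = {k.lower(): v for k, v in request_headers.items()}

    if "if-none-match" in req:
        # If-None-Match wins over If-Modified-Since when both are sent.
        tags = [t.strip() for t in req["if-none-match"].split(",")]
        if etag in tags or "*" in tags:
            return 304, headers, b""
    elif "if-modified-since" in req:
        try:
            since = parsedate_to_datetime(req["if-modified-since"])
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""
        except (TypeError, ValueError):
            pass  # an unparseable date is ignored, as if it were absent

    return 200, headers, body


lm = datetime(2007, 11, 16, 17, 59, tzinfo=timezone.utc)
status, headers, _ = conditional_get(b"<entry/>", lm, {})
revalidated = conditional_get(b"<entry/>", lm, {"If-None-Match": headers["ETag"]})
print(status, revalidated[0])  # 200 304
```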
REST allows scalability through caching, a.k.a. "cache the hell out of it". There are three types of cache:
- Browser
- Proxy
- Gateway
How does caching actually work? A resource is eligible for caching if:
- The HTTP response headers don't say not to cache it
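- For shared caches (proxies and gateways), the request is not authenticated or sent over HTTPS
- A validator (an ETag or Last-Modified header) or explicit freshness information is present; a response with neither is usually treated as uncacheable
- The cached representation is fresh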
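The storage-related conditions above can be sketched as a single check. This is a simplification under my own assumptions: header names arrive as lower-case keys, only a few Cache-Control directives are considered, and directives such as `public` that can relax the authentication rule are ignored. Under HTTP caching rules, a response with neither a validator nor explicit freshness information is usually not stored. Freshness, the last condition, is a separate check made when a cached copy is about to be served. `cache_may_store` is a hypothetical name.

```python
def cache_may_store(request_headers, response_headers, *, is_https, shared_cache):
    """Rough storage check. Header names are assumed to be lower-case."""
    cc = response_headers.get("cache-control", "").lower()
    directives = {d.strip().split("=")[0] for d in cc.split(",") if d.strip()}

    # The response headers don't say not to cache it.
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False

    # Shared caches stay away from authenticated or HTTPS traffic.
    if shared_cache and ("authorization" in request_headers or is_https):
        return False

    # Need a validator or explicit freshness information.
    has_validator = "etag" in response_headers or "last-modified" in response_headers
    has_freshness = ("expires" in response_headers
                     or "max-age" in directives or "s-maxage" in directives)
    return has_validator or has_freshness


print(cache_may_store({}, {"etag": '"abc"'}, is_https=False, shared_cache=True))
```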
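Is your cached copy fresh? Yes, if its expiry time has not been exceeded or, when there's no expiry time, if the cache saw the representation recently and it was last modified a relatively long time ago. If it's stale, the cache asks the origin server to validate it: the server either confirms the cached copy is still good (304 Not Modified) or sends back a new copy.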
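The freshness rule can be sketched as follows. An explicit expiry time wins; otherwise the cache falls back to a heuristic lifetime based on how long ago the resource was last modified. The 10% fraction is the "typical setting" RFC 7234 mentions for this heuristic, and I'm using it here as an assumption; real caches also consider `max-age`, the `Age` header and more, which this sketch ignores. `is_fresh` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumption: heuristic lifetime = 10% of the time since Last-Modified.
HEURISTIC_FRACTION = 0.10


def is_fresh(now, date, expires=None, last_modified=None):
    """date is when the origin generated the response (its Date header)."""
    age = now - date
    if expires is not None:
        lifetime = expires - date                              # explicit expiry
    elif last_modified is not None:
        lifetime = (date - last_modified) * HEURISTIC_FRACTION  # heuristic
    else:
        return False  # nothing to go on: treat as stale and revalidate
    return age < lifetime


date = datetime(2007, 11, 16, 12, 0, tzinfo=timezone.utc)
lm = date - timedelta(days=100)  # heuristic lifetime: 10 days
print(is_fresh(date + timedelta(days=5), date, last_modified=lm))   # True
print(is_fresh(date + timedelta(days=11), date, last_modified=lm))  # False
```

This is why "modified a long time ago" matters: a resource that hasn't changed in 100 days is guessed to be safe to reuse for about 10 more.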
HEAD allows you to get metadata about a resource without getting the resource itself. You can use it to test that a resource exists, that a link is valid or to check when a resource was last modified.
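A simple way to keep HEAD honest on the server is to run it through the same code path as GET and just drop the body, so the metadata always matches. This is a minimal sketch; the `handle` function and the in-memory `resources` mapping of URI to (body, Last-Modified) are hypothetical.

```python
def handle(method, resources, uri):
    """Serve GET and HEAD from one code path; HEAD omits the body."""
    if uri not in resources:
        return 404, {}, b""
    body, last_modified = resources[uri]
    headers = {"Content-Length": str(len(body)), "Last-Modified": last_modified}
    if method == "HEAD":
        return 200, headers, b""
    if method == "GET":
        return 200, headers, body
    return 405, {"Allow": "GET, HEAD"}, b""


resources = {"/logo.png": (b"PNGDATA", "Fri, 16 Nov 2007 17:59:00 GMT")}

# A link checker only needs the status code, never the bytes.
for uri in ["/logo.png", "/missing.png"]:
    print(uri, handle("HEAD", resources, uri)[0])
```

On the client side, Python's `urllib.request.Request(url, method="HEAD")` will issue the equivalent request against a real server.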
There's an "Expect: 100-continue" header you can use to ask the server whether it's willing to accept a request before you send the body. The nice thing about this is that the client and server communicate before the message body is sent. Dan gave the example of uploading an image from your cell phone: you don't want to start sending the body if authentication is required. Your phone can wait for a "100 Continue" response to decide whether it should start uploading the file.
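The server's side of that handshake is a decision made from the headers alone. In this sketch (a hypothetical `interim_response` helper with an assumed 5 MB upload limit), the server answers 100 to let the phone start uploading, or sends a final 401 or 413 right away so no bandwidth is wasted on a body that would be rejected.

```python
MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # hypothetical server limit


def interim_response(headers):
    """Decide, before any body arrives, whether the client should send it."""
    h = {k.lower(): v for k, v in headers.items()}
    if h.get("expect", "").lower() != "100-continue":
        return None   # no handshake: the client sends the body immediately
    if "authorization" not in h:
        return 401    # final response: authenticate first, don't upload yet
    if int(h.get("content-length", "0")) > MAX_UPLOAD_BYTES:
        return 413    # final response: the body is too large
    return 100        # go ahead and send the body


print(interim_response({"Expect": "100-continue", "Content-Length": "2048"}))
```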
For batch operations with REST, you can use HTTP connection pipelining. Unfortunately, it's broken by some firewalls. Another option is to POST a whole set of data at once. GData (an extension to the Atom Publishing Protocol) does this by allowing you to post a whole bunch of entries in a single request. Unfortunately, this approach has received a very cold reception from the community.
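The "POST a whole set of data" idea boils down to wrapping many entries in one request body. The sketch below builds a plain Atom feed containing several new entries. It is not GData's exact batch format (which adds its own batch-specific elements); it only illustrates sending N entries in one round trip instead of N POSTs. `batch_feed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def batch_feed(titles):
    """Wrap several new entries in one Atom feed for a single POST."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    for title in titles:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    return ET.tostring(feed, encoding="utf-8")


print(batch_feed(["first", "second", "third"]).decode("utf-8"))
```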
The Atom Publishing Protocol (APP) is a RESTful protocol for building services. You can use it to create, edit, and delete entries in a collection. It's an extensible protocol; examples of extensions include paging, GData and OpenSearch.
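Creating an entry in APP is a POST of an Atom entry document to the collection's URI. Here's a minimal sketch using only the standard library. It builds the request without sending it; the collection URL and the `new_entry_request` helper are hypothetical, and the entry carries the `id`, `title`, `updated` and `author` elements Atom requires.

```python
import urllib.request
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def atom(tag):
    return f"{{{ATOM_NS}}}{tag}"


def new_entry_request(collection_uri, title, author, content):
    """Build (but don't send) an APP create request for one entry."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(entry, atom("title")).text = title
    ET.SubElement(entry, atom("updated")).text = (
        datetime.now(timezone.utc).isoformat(timespec="seconds"))
    author_el = ET.SubElement(entry, atom("author"))
    ET.SubElement(author_el, atom("name")).text = author
    ET.SubElement(entry, atom("content")).text = content
    return urllib.request.Request(
        collection_uri,
        data=ET.tostring(entry, encoding="utf-8"),
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",
    )


req = new_entry_request("http://example.com/collection", "Hello", "Dan", "First post")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` against a real APP server should yield a 201 (Created) with a Location header for the new entry, matching the POST flow described earlier.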
Why should you use APP for your application? Because it provides ubiquitous elements that have meaning across all contexts. You can leverage existing solutions for security (HTTP Auth, WSSE, Google Login, XML Signature and XML Encryption). It also eliminates the need for you to write much of the client and server code. Alternatives to APP include HTTPD, Java (Servlets, Restlet, Spring MVC), Ruby on Rails and many others.
One limitation of REST is that HTTP is NOT an RPC or message-passing system. Its security story is also weaker than that of other solutions: a lot of the time, folks just use SSL and HTTP Basic authentication, which isn't the most secure setup.
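Part of why SSL is doing all the work in that combination: HTTP Basic credentials are only base64-encoded, not encrypted, so anyone who can see the header can read the password. A quick sketch with the standard library (the helper names are mine) using the well-known "Aladdin" example from the HTTP authentication RFC:

```python
import base64


def basic_auth_header(username, password):
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def decode_basic_auth(header_value):
    # No secret key is involved: anyone who sees the header can do this.
    _scheme, _, token = header_value.partition(" ")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)                     # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(decode_basic_auth(header))  # ('Aladdin', 'open sesame')
```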
Dan posted his presentation on his blog if you'd like to download it.
Posted by Stephan on November 16, 2007 at 10:59 AM MST #