Monday, March 2, 2015

Best practices for CDN friendly applications


Fast content delivery is essential for any application. Numerous studies analyze in depth how web page response time affects users.
Response time can be improved by caching content, but keeping that content fresh then becomes a challenge. If the application is not designed and planned correctly, the problem can become a nightmare for content editors, users, and everyone else.
The application architect should carefully design the application to strike a balance between content caching and freshness. A handful of guidelines can immensely reduce the confusion and frustration of content editors and developers alike :)
It is imperative to understand the different systems that can cache the content.
1.      Content is cached in multiple layers by different software
The picture below shows where the majority of content caching takes place.
2.      How is the content cached?
The TTL (time to live) in the headers determines how long the content may be cached. Please note that TTL is a concept, not a single header. The effective value is derived from several headers, which have priorities as defined in the RFC. The primary one is Cache-Control: max-age=<number>; Expires, Last-Modified, etc. also play a role in determining the validity of the cache.
Here is one method to calculate the current age of a cached response…don’t get overwhelmed :)

 
current_age = max(max(0, response_time - date_value), age_value) + now - request_time;
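To make the formula concrete, here is a minimal sketch in Python. The `CachedResponse` class and its field names are assumptions made for illustration; they simply mirror the variables in the formula, with all times expressed as Unix timestamps in seconds. Freshness is checked only against max-age here, which ignores the other headers mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CachedResponse:
    # Hypothetical container; field names mirror the formula's variables.
    date_value: float     # Date header of the response (Unix timestamp)
    age_value: float      # Age header in seconds (0 if the header is absent)
    request_time: float   # local clock when the request was sent
    response_time: float  # local clock when the response was received
    max_age: float        # Cache-Control: max-age, in seconds


def current_age(resp: CachedResponse, now: float) -> float:
    """Age of the cached response, following the formula above."""
    apparent_age = max(0.0, resp.response_time - resp.date_value)
    corrected_received_age = max(apparent_age, resp.age_value)
    return corrected_received_age + now - resp.request_time


def is_fresh(resp: CachedResponse, now: float) -> bool:
    """Simplified check: fresh while the age is below max-age."""
    return current_age(resp, now) < resp.max_age
```

For example, with `CachedResponse(date_value=1000, age_value=5, request_time=1001, response_time=1002, max_age=60)`, `current_age(resp, 1030)` is 34 seconds, so the response is still fresh; at `now=1060` the age is 64 seconds and the cache has to go upstream again.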
Primarily, there are two classes of validation rules:
a.      Should the software (browser, proxy, CDN) make a call to the upstream software for fresh content?
                                                    i.     Expires, Cache-Control, etc. are used to determine this.
b.      Should the software (CDN, web/app server) serve fresh content?
                                                    i.     If the cached copy is still valid, the upstream software (CDN, web/app servers) returns 304 (Not Modified) instead of the full content.
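The second rule can be sketched from the upstream side as follows. This is a simplified illustration, not a full implementation of HTTP conditional requests: `status_for` is a hypothetical helper, it only looks at the If-None-Match header, and it compares ETag strings exactly.

```python
def status_for(request_headers: dict, current_etag: str) -> int:
    """Return 304 if the client's cached copy matches, otherwise 200."""
    raw = request_headers.get("If-None-Match", "")
    # The header may carry several comma-separated ETags, or "*".
    candidates = [tag.strip() for tag in raw.split(",") if tag.strip()]
    if current_etag in candidates or "*" in candidates:
        return 304  # cached copy is still valid; no body needs to be sent
    return 200      # send the fresh content
```

A request carrying the ETag that the server currently holds gets a 304 and reuses its cached copy; a request without the header, or with a stale ETag, gets a 200 with the fresh content.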
Please note that once the content is downloaded and cached by a browser, there is nothing you can do to flush or purge it, because until the TTL expires the browser won’t even request fresher content. This is the biggest pain point: there are ways to purge content in all the layers except the various proxy servers and the browser.
3.      Who sets the headers?
In a typical web app, the headers can be set by the application code, by the application server, by the web server, or by the CDN. Usually the software in an upper layer respects the headers set by the lower layers. Consequently, headers set by the code are respected by the layers above it. Setting the headers at the code level also gives you the highest level of flexibility: you have total control over each and every asset’s headers. However, it is hard to visualize everything in a single pane, and if it is not adequately documented it can become a maintenance nightmare and can impact the freshness and cacheability of different sections of the application.
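As a rough sketch of how an upper layer can respect what a lower layer has already decided, the function below adds a default only when the response carries no caching headers. `apply_layer_default` is a hypothetical name; real web servers and CDNs express this through their own configuration rather than code like this.

```python
def apply_layer_default(upstream_headers: dict, default_max_age: int) -> dict:
    """Add a default Cache-Control only if the lower layer set nothing."""
    headers = dict(upstream_headers)  # do not mutate the caller's dict
    names = {name.lower() for name in headers}  # header names are case-insensitive
    if "cache-control" not in names and "expires" not in names:
        headers["Cache-Control"] = f"max-age={default_max_age}"
    return headers
```

If the application code already sent `Cache-Control: max-age=300`, the value passes through untouched; a response with no caching headers picks up the layer's default.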
4.      Guidelines to set the content TTL
a.      Have a default TTL. Usually 7 days is the default.
b.      You could have TTL based on file extensions.
c.      You could have TTL based on other attributes, but this is where different vendors provide different types of features or rules. You could have very sophisticated rules :) but I would advise against complex rules. They are difficult to maintain and comprehend, and they could also impact your performance, because depending on the configuration the rules fire on each request.
d.      Classifying websites (or any HTTP assets, such as REST calls) by their content freshness requirements helps set expectations with developers (DevOps) and content editors.
e.      Application code and release process
                                                    i.     Dynamic and static content segregation
1.      Separate out different sections of the page content by the nature of the content.
a.      JSON: As an illustration, if a home page has a feed, announcements, events, and some images, then have at least four different JSON calls and render the content on the client side. Have each call follow a unique URI pattern, and apply the TTL to those unique URL patterns, e.g. */five/*, */30/*, */1day/*, etc. This way different parts of the website can have different caching strategies.
b.      Query string: Use a query string to bypass caching. Note that if the TTL is not properly set, a URI with a query string will also be cached, even though it is good for one-time use only.
                                                   ii.     Asset naming convention
1.      Create a unique name:
a.      appending query string (e.g. mycss.css?v=1)
b.      a unique identifier in the name (e.g. mycss.v1.css). This format is preferred, as it gives you an easy way to access, download, and save content.
c.      Handle the naming convention in either the build or the CI tool. Don’t try to save each version in source control, because if you do, you are not using its revision feature :)
                                                  iii.     Align code release management process with the TTL.
                                                  iv.     Plan to reduce TTL before any planned maintenance.
Avoid unplanned maintenance, because you WILL run into content caching problems if the content is cached on the client side (browser).
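The TTL and naming guidelines above can be sketched in a few lines of Python. Everything here is illustrative: the extension values are made up, the */five/* and */30/* patterns are assumed to mean minutes, and `versioned_name` uses a content hash as the unique identifier instead of a hand-maintained v1, v2 counter, which is a swapped-in choice that suits a build or CI step.

```python
import hashlib
import posixpath

DEFAULT_TTL = 7 * 24 * 3600  # guideline (a): 7-day default, in seconds

# Guideline (b): hypothetical per-extension TTLs, values are examples only.
EXTENSION_TTL = {".css": 30 * 24 * 3600, ".js": 30 * 24 * 3600, ".html": 300}

# URI-pattern TTLs for */five/*, */30/*, */1day/* (minutes assumed for the first two).
SEGMENT_TTL = {"five": 5 * 60, "30": 30 * 60, "1day": 24 * 3600}


def ttl_for(path: str) -> int:
    """Pick a TTL: URI pattern first, then file extension, then the default."""
    segments = path.split("?")[0].strip("/").split("/")
    for segment in segments[:-1]:  # directory segments only
        if segment in SEGMENT_TTL:
            return SEGMENT_TTL[segment]
    _, ext = posixpath.splitext(segments[-1])
    return EXTENSION_TTL.get(ext.lower(), DEFAULT_TTL)


def versioned_name(filename: str, content: bytes) -> str:
    """Build a unique asset name such as mycss.<hash>.css from the file content."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, ext = posixpath.splitext(filename)
    return f"{stem}.{digest}{ext}"
```

With these assumed rules, `ttl_for("/api/five/feed.json")` gives 300 seconds, `ttl_for("/css/mycss.v1.css")` gives 30 days, and anything unmatched falls back to the 7-day default. Because the asset name changes whenever the content changes, a new release is fetched under a new URL and never collides with a copy the browser has already cached.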
