You’d think this would be quite an easy topic; after all, it was Harry Gordon Selfridge, the founder of the Selfridges department store in London, who said around 1909 that “the customer is always right”.

Technical Contact: Ben Mawhinney – ben@dronelab.io
Our main stack is MERN (MongoDB, Express, React, Node). Other languages, such as Python, are used for parts of the system where appropriate. MySQL and MongoDB are used for our main data stores, with Redis used for caching. Everything is hosted on Amazon Web Services, with Cloudflare as the CDN and Argo where required. As every website and application is different, we look at the requirements of each project before placing it on the most appropriate service. We will use Lightsail when the requirements call for a relatively lightweight deployment, all the way up to dedicated servers for high-bandwidth, high-traffic applications.
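As a rough illustration of this stack, the sketch below shows an Express route backed by Redis caching in TypeScript; the route path, cache key, TTL, and the loadProjectFromMongo helper are hypothetical rather than taken from our codebase:

    import express from "express";
    import { createClient } from "redis";

    const app = express();
    const redis = createClient({ url: process.env.REDIS_URL });

    // Hypothetical data-access helper standing in for a real MongoDB query.
    async function loadProjectFromMongo(id: string) {
      return { id, name: "example" };
    }

    app.get("/api/projects/:id", async (req, res) => {
      const cacheKey = `project:${req.params.id}`;
      const cached = await redis.get(cacheKey); // serve from Redis when possible
      if (cached) return res.json(JSON.parse(cached));

      const project = await loadProjectFromMongo(req.params.id);
      await redis.set(cacheKey, JSON.stringify(project), { EX: 60 }); // cache for 60 seconds
      return res.json(project);
    });

    redis.connect().then(() => app.listen(3000));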
The application and data can be replicated across a range of locations where the requirement is for performance and redundancy. The main data and application are usually hosted in either London, UK or Dublin, Ireland, with replication across a number of availability zones if required. Each availability zone runs on its own physically distinct, independent infrastructure and is engineered to be highly reliable.
We offer 99.98% uptime.
TAC configure automated daily backups on a per-site, per-environment basis.
Typically, we schedule a rolling database backup for each site every 5 minutes; these backups are encrypted and stored securely within Amazon S3.
These backups ensure we can restore a website to any point within the past seven days in the event of data corruption.
This means that for a typical Drupal deployment, for example, the servers are backed up on a daily basis and can be rebuilt from GitHub using our CircleCI deployment mechanism.
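A minimal sketch of what the rolling, encrypted backup described above might look like is shown below; the bucket name, key prefix, and mongodump invocation are assumptions for illustration:

    import { execFileSync } from "node:child_process";
    import { readFileSync } from "node:fs";
    import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

    const s3 = new S3Client({ region: "eu-west-2" }); // London

    async function backupOnce(): Promise<void> {
      const archive = "/tmp/backup.gz";
      // Dump the database to a compressed archive (mongodump, from the MongoDB tools).
      execFileSync("mongodump", ["--archive=" + archive, "--gzip"]);

      await s3.send(new PutObjectCommand({
        Bucket: "example-backups", // hypothetical bucket name
        Key: `backups/site-a/${new Date().toISOString()}.gz`,
        Body: readFileSync(archive),
        ServerSideEncryption: "AES256", // encrypted at rest in S3
      }));
    }

    setInterval(backupOnce, 5 * 60 * 1000); // rolling backup every 5 minutes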
In order to meet our 99.98% availability target, our databases are usually located in multiple availability zones for added stability and redundancy. Again, this is dependent on the requirements set out by the client and the business needs.
This typical configuration means that we are able to operate a highly available stack. In the event that one server goes down, the ‘spare’ server will pick up the load. This also allows for seamless upgrades of the environment.
We use alerts against the load balancer to identify availability issues as they occur; these alerts are sent to the team via Slack and/or SMS.
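As a sketch of how such alerting might be wired up, the Lambda handler below relays a CloudWatch alarm delivered via SNS to a Slack incoming webhook; the SLACK_WEBHOOK_URL environment variable is a placeholder:

    import type { SNSEvent } from "aws-lambda";

    export async function handler(event: SNSEvent): Promise<void> {
      for (const record of event.Records) {
        const alarm = JSON.parse(record.Sns.Message); // CloudWatch alarm payload
        await fetch(process.env.SLACK_WEBHOOK_URL!, { // placeholder webhook URL
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            text: `Availability alert: ${alarm.AlarmName} is ${alarm.NewStateValue}`,
          }),
        });
      }
    }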
A monthly uptime report is issued for each site as part of the ongoing support deliverables.
Within AWS we operate within our own Virtual Private Cloud (VPC), which keeps our infrastructure separated from that of other AWS customers. We operate rate limiting on our API to ensure we are protected against DDoS attacks.
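A common way to add this kind of rate limiting in an Express stack is the express-rate-limit middleware; the window and request cap below are example values, not our production limits:

    import express from "express";
    import rateLimit from "express-rate-limit";

    const api = express();

    api.use(rateLimit({
      windowMs: 60 * 1000, // 1-minute window
      max: 100,            // at most 100 requests per IP per window
    }));

    api.get("/api/status", (_req, res) => res.json({ ok: true }));
    api.listen(3000);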
To ensure no one gains access to this infrastructure, we operate a no-SSH, no-FTP, no-SFTP policy for our servers and have no way to directly access them or their data. We gracefully strip down machines and reprovision them if a change is required.
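The sketch below illustrates the tear-down-and-reprovision approach, assuming the instance belongs to an Auto Scaling group that launches a fresh, fully provisioned replacement:

    import { EC2Client, TerminateInstancesCommand } from "@aws-sdk/client-ec2";

    const ec2 = new EC2Client({ region: "eu-west-2" });

    // Terminating is safe because no state lives on the box itself; the
    // instance's Auto Scaling group (assumed to exist) replaces it.
    async function replaceInstance(instanceId: string): Promise<void> {
      await ec2.send(new TerminateInstancesCommand({ InstanceIds: [instanceId] }));
    }

    await replaceInstance("i-0123456789abcdef0"); // placeholder instance ID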
Within AWS we use IAM roles to restrict staff members from accessing parts of the system they do not need, and to keep a log of all access activity.
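As an illustration, a narrowly scoped IAM policy of this kind might be created as follows; the policy name and bucket ARN are placeholders:

    import { IAMClient, CreatePolicyCommand } from "@aws-sdk/client-iam";

    const iam = new IAMClient({ region: "eu-west-2" });

    // Grant read-only access to a single backup bucket and nothing else.
    await iam.send(new CreatePolicyCommand({
      PolicyName: "backups-read-only", // placeholder policy name
      PolicyDocument: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
          Effect: "Allow",
          Action: ["s3:GetObject", "s3:ListBucket"],
          Resource: [
            "arn:aws:s3:::example-backups",
            "arn:aws:s3:::example-backups/*",
          ],
        }],
      }),
    }));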
Within the company we operate a number of data-security measures:
All requests to our API are made via SSL. Dashboard logins and requests are passed securely using HTTPS. Dashboard passwords are hashed using bcrypt. Our data sources are inaccessible from the public internet and are stored securely within AWS. Whether messages sent through the platform are encrypted depends on the messaging platform used.
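A minimal sketch of bcrypt password handling for dashboard logins is shown below; bcrypt applies a salted, one-way hash, and the cost factor of 12 is illustrative:

    import bcrypt from "bcrypt";

    // Store a salted one-way hash rather than the password itself.
    async function storePassword(plain: string): Promise<string> {
      return bcrypt.hash(plain, 12); // the salt is generated and embedded in the hash
    }

    async function checkPassword(plain: string, stored: string): Promise<boolean> {
      return bcrypt.compare(plain, stored); // re-hashes the input and compares
    }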
We use test data in our dev and staging environments.
Internal access to the databases is disabled by default; access is only granted when a problem occurs that requires it. In that event, only internal employees would be given access; we explicitly do not allow external contractors access at any point.
As a matter of course we do not require reviews of our user accounts, as staff are not given access to any part of the system they do not need for their current task. At the start of each assignment we ensure they have access to what they need to complete it; at the end, that access is revoked.
IAM roles within our hosting infrastructure are removed when an employee leaves and outside contractors never receive access to infrastructure or codebase.
All account passwords require at least one uppercase letter, one lowercase letter, one number, and one non-alphanumeric character, with a minimum length of 12 characters. Passwords are changed every 30 days. Two-factor authentication is also required alongside the password.
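A simple validator for this policy might look like the following sketch:

    // Enforce the password policy stated above: at least one uppercase letter,
    // one lowercase letter, one digit, one non-alphanumeric character, and a
    // minimum length of 12 characters.
    function meetsPasswordPolicy(password: string): boolean {
      return (
        password.length >= 12 &&
        /[A-Z]/.test(password) &&
        /[a-z]/.test(password) &&
        /[0-9]/.test(password) &&
        /[^A-Za-z0-9]/.test(password)
      );
    }

    console.log(meetsPasswordPolicy("Tr1cky-Passw0rd!")); // true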
Mark Middleton is our Managing Director and Chief Technology Officer, and is the only member of the team with complete access to all parts of the systems to which we allow access.
The majority of our engineering team has access to a large portion of the administrative systems where their work requires it. Access beyond that is limited to specific individuals using IAM roles, and general staff have no access to the internals of the system at all.
All accounts we provide to our team are individual; nothing is shared, and all access to our infrastructure is logged.
Every login and every infrastructure or code change is logged. Direct server or database access is not allowed.
We have a policy of zero remote access to live systems and data; in the event of an issue with our servers, we gracefully tear them down and rebuild them on demand.
As our platform is hosted in the cloud, the only party with direct physical access to our servers or data is Amazon; however, even if they were to attempt to access the virtual machines within their physical hosts, all elements are locked down.
Data is backed up to our secure cloud. We also have plans in motion for extra redundancy by storing multiple regular copies of data across multiple locations. For example, we will use Amazon's S3 Glacier storage for data that is older than three months and infrequently accessed.
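One way to implement the Glacier plan is an S3 lifecycle rule, sketched below with three months approximated as 90 days; the bucket name and prefix are placeholders:

    import { S3Client, PutBucketLifecycleConfigurationCommand } from "@aws-sdk/client-s3";

    const s3 = new S3Client({ region: "eu-west-2" });

    // Transition objects older than 90 days to Glacier storage.
    await s3.send(new PutBucketLifecycleConfigurationCommand({
      Bucket: "example-backups", // placeholder bucket name
      LifecycleConfiguration: {
        Rules: [{
          ID: "archive-old-data",
          Status: "Enabled",
          Filter: { Prefix: "backups/" },
          Transitions: [{ Days: 90, StorageClass: "GLACIER" }],
        }],
      },
    }));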
We perform code reviews daily and run monthly internal audits of our cloud systems.
Data is backed up daily and kept for three months. Again, we have plans in motion to retain more granular backups for a longer period.
We offer no public services for access to raw data other than the dashboard and the export functions available there, but we are open to agreements on a client-by-client basis.
While this has never happened, we perform quarterly tests to ensure our response to such incidents, if and when they happen, is well rehearsed. Our protocol is as follows:
We routinely test our applications for vulnerabilities using external auditors, who use a hybrid method of penetration testing and infrastructure monitoring to determine whether a breach could occur and to ensure that any loopholes are closed before a system reaches production.
We categorise reported issues into three classes of severity, and all reports are sent directly to an operations engineer who assesses and assigns the report. Outside of work hours, alerts rotate between team members on a weekly basis, and we operate systems to notify them of such incidents.
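The sketch below illustrates this three-class triage and weekly rotation; the severity labels, rota, and notification step are illustrative:

    type Severity = "critical" | "major" | "minor";

    const onCallRota = ["engineer-a", "engineer-b", "engineer-c"]; // weekly rotation

    // Pick the on-call engineer for the week containing the given date.
    function currentOnCall(date: Date = new Date()): string {
      const week = Math.floor(date.getTime() / (7 * 24 * 60 * 60 * 1000));
      return onCallRota[week % onCallRota.length];
    }

    function triage(report: { title: string; severity: Severity }): void {
      const engineer = currentOnCall();
      // In production this would page via Slack/SMS, as described above.
      console.log(`[${report.severity}] "${report.title}" assigned to ${engineer}`);
    }

    triage({ title: "API latency spike", severity: "major" });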
As production issues are the most common cause of downtime for any company, only operations engineers who are able to rectify these problems are assigned to out-of-hours support.
An example of our SLA structure is as follows; it can be tailored on a customer-by-customer basis:
We update our platforms frequently, and updates occur with zero downtime, but we occasionally schedule maintenance periods for major updates. These are timed to minimise impact and will only happen when absolutely necessary. You will be notified in advance.