Ultimate Guide to Website Monitoring

What is Website Monitoring?

Website Monitoring is the use of software and automation to verify that a website is performing correctly. Website Monitoring encompasses many specific monitors that check different functions or statuses, from the most vital availability monitors (is a site up or down?) to monitors that are configured specifically to check unique site functions. Monitors can be used individually or in combination for a more comprehensive monitoring system.

Performance and functionality of both frontend and backend features can be monitored with different sets of tools. Most active monitoring systems follow the same pattern, performing two key actions when an issue is detected:

The first is logging the time, date & nature of the issues. This allows site owners to pinpoint the time of issues, analyze historic data for trends, and generate reports on site performance.

The second crucial function is alerting the appropriate individual(s) of issues in real time, allowing them to address critical problems as soon as possible.

 

Why Care?

Have you ever arrived at a Best Buy during business hours, only to find it closed? If so, not only did you probably end up heading to another store to make your purchase, but the experience probably left a bad taste in your mouth. The negative effects of this hypothetical situation apply similarly when a website goes down and users are unable to access it.

Although your website users don’t have to transport themselves physically to your website as in the Best Buy scenario, their online alternatives are likely greater in number and easily accessible.

Businesses, in particular online businesses, work extremely hard to obtain customers’ attention and trust. A lot of things have to go right just to get the user to the point of deciding to visit your site. So, to have them arrive at the online equivalent of a burned down Disneyland is a very costly event, both in terms of lost sales and wasted marketing efforts. Additionally, the chances that they will return to your site in the future are significantly reduced.

To illustrate the impact of downtime: when Amazon went down in 2013, the outage cost an estimated $66,000 per minute in company revenue. Keep in mind that this doesn’t factor in brand damage and other externalities. Moreover, based on historical trajectory, today’s cost per minute of downtime is likely significantly higher. (Forbes Article)

A website doesn’t even have to be fully offline to take a big hit. A drop in site speed has a significant negative effect on user interactions and conversions (more). Google themselves ran a study to prove this point, showing that increasing page load time by less than 0.5 seconds (400 ms) reduced user interaction (searches) by 0.76% after 6 weeks (experiment page). Some may consider a 0.76% drop to be an inherent cost of the variable nature of networks. However, keep in mind that 0.76% can become a significant amount of money when your website brings in high revenue from sales or leads. Furthermore, in the case of many smaller websites, server performance can vary to a significant degree, meaning that page loads will often slow down by many seconds (for all intents and purposes, the site is down), making the impact that much more significant.

Most websites (in fact, 99.99% of them) don’t have anywhere near the name recognition and market domination of Amazon and Google, whose repeat customers are high in number and whose retention is more durable after such negative instances. For the majority of online businesses, huge effort goes into bringing in users for the first time, and even more effort into achieving a small percentage of repeat customers.

This all boils down to one takeaway: website performance is crucial to business performance. The negative effects of lost interactions during downtime are costly enough without figuring in the secondary effects. When the repercussions for branding, loyalty, and customer satisfaction are factored in, website monitoring becomes more than just a luxury for those who have extra resources to spare. In fact, it’s an essential tool for everyone whose business depends on a website.

Internal vs. External Monitoring

Monitoring can be run from two locations, internally and externally:

  • Internal Monitoring – Monitoring run from within a company’s server network or specific environment. Internal monitoring is typically run for hardware, network and server-specific data (RAM, CPU, etc.).
  • External Monitoring – Monitoring run from outside a company’s network and server environment. External Monitoring is designed to emulate the end user’s experience of a website and will often run from multiple geographic locations to give a more accurate picture of website availability.

Self Hosted vs. Cloud Solutions

There are many web monitoring solutions available. They typically fall into two groups:

  • Self Hosted Solutions – Installed on internal servers to test web resources and websites. These can be either internally developed programs or pre-made solutions (open source or proprietary licensed software).
  • Cloud Solutions – Often paid subscriptions, Cloud Solutions are external monitoring systems that are designed to monitor key aspects of site performance with minimal setup (Circid being one example). In addition to “plug-and-play” monitoring, these services can be configured to monitor unique site functions by adding or modifying a small amount of code.

Types of Monitoring

Most web monitoring can be separated into 5 major categories:

1) Active / Synthetic Monitoring

Active / Synthetic monitoring performs automated tests to simulate website visits and website actions in order to check whether or not a site or application is performing. Active / Synthetic monitoring is designed to give an accurate picture of an end user’s experience of a site and is typically run by browser emulation or scripting to call website pages & website actions.

The types of checks include uptime/availability, reliability and performance.

What are we actually monitoring?

Web monitoring, as the name suggests, monitors web properties. Any type of web service can be monitored.

Most Popular Types of Web Monitoring Checks:

  • HTTP Monitoring – The monitoring system requests a web page’s HTTP headers, then determines whether the page is available based on the HTTP status code received in response.
  • HTTPS Monitoring – The monitoring system requests a web page over HTTPS and:

Verifies that the SSL certificate is valid and that page elements (CSS, JS and images) are all served over HTTPS

Verifies that the page is available, based on the HTTP status code received

  • HTTP Content Monitoring – A more advanced version of the HTTP Monitoring above, a content check requests a user-provided URL and:

Verifies that the page is up by checking for HTTP status 200

Verifies that a user-provided string (text) exists on the page

Content monitoring is useful because a URL can return HTTP status 200 (OK) yet still display incorrectly for the user. For example, on a CMS platform such as WordPress, editing a script can introduce errors that cause the page to go blank while still technically being up.

  • HTTP Visual Monitoring – This form of monitoring retains an image of a specific URL. With each check, it captures an image of the page as it’s currently displaying and compares it to the previous or original version. If the page looks different, as a whole or in part, the monitoring system triggers an alert based on the monitor’s settings.
  • Ping Monitoring – One of the simplest forms of availability monitoring, Ping Monitoring sends an ICMP echo request to a specific interface / server. A response demonstrates that the server is operating. This differs from HTTP Monitoring in that Ping’s only criterion for success is a response from the targeted server, regardless of whether its web server would return an error code. For this reason, Ping Monitoring can be a helpful addition, but it is not a reliable measure of whether a site is available to users because ping does not check webpages.
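
To make the HTTP and content checks above more concrete, here is a minimal sketch in JavaScript. It assumes Node.js 18 or later (for the built-in fetch and AbortSignal.timeout); the function names, the 10-second timeout and the example URL are our own assumptions for illustration, not how any particular monitoring product works.

```javascript
// Decide whether a response counts as "up": the status must be 200 and,
// if an expected string was provided, it must appear in the page body.
function evaluateResponse(status, body, expectedText) {
  if (status !== 200) {
    return { up: false, reason: `unexpected status ${status}` };
  }
  if (expectedText && !body.includes(expectedText)) {
    return { up: false, reason: "expected text not found" };
  }
  return { up: true, reason: "ok" };
}

// Request the page and evaluate it. Network errors and timeouts count as down.
// fetch follows redirects by default, so the final response is evaluated.
async function checkPage(url, expectedText, timeoutMs = 10000) {
  const started = Date.now();
  try {
    const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
    const body = await res.text();
    const verdict = evaluateResponse(res.status, body, expectedText);
    return { ...verdict, ms: Date.now() - started };
  } catch (err) {
    return { up: false, reason: `request failed: ${err.message}`, ms: Date.now() - started };
  }
}

// Example usage (hypothetical URL and text):
// checkPage("https://example.com/", "Example Domain").then(console.log);
```

Note how a blank page that still returns status 200 is reported as down once an expected string is supplied, which is exactly the case a plain status check would miss.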

 

Other Popular Active Web Monitoring Services:

  • FTP/SFTP/SSH/Telnet – Monitoring file transfer and remote access protocols for availability and performance.
  • Mail Servers (POP3, IMAP, SMTP) – Monitoring mail servers for availability and performance.
  • DNS – Monitoring DNS for availability and performance, and watching for domain name expiry.
  • Various Server Ports & Services (TCP, UDP) – Monitoring services over TCP, UDP and ICMP for availability and performance on specific ports or with specific options.
  • Database – Monitoring MySQL and other database servers/services for availability and performance.

 

2) Real User Monitoring (RUM)

This type of passive monitoring provides insight into real users’ experience of your website from the browser side.

The website listens to and monitors users’ actions, both recording and classifying interactions with the site (similar to Google Analytics). Data from real users is used to report specifics and offer insights into how people are actually using your site.

Very different from Active Monitoring, RUM’s insights into actual usage make it a very powerful monitoring tool. A major plus is that, unlike active monitoring, the different variables on a user’s end can highlight availability issues that may not be caught even by extensive testing. For example, a specific browser or operating system may be blocking certain resources from loading on a page. Such an issue is very difficult to identify without software due to the number of OS & browser combinations in use around the world.

Real User Monitoring (RUM) is also an excellent tool for development of new features, helping to identify areas for improvement and QA any changes that have been implemented.

What are we actually monitoring?

Every single user’s page visit and action on the site is monitored. Real User Monitoring typically monitors the following:

  • Specifics about users (location, browsers, screen size, device, internet connection, etc.)
  • Page Views and Load Time of each of the Web Assets (HTML, TTFB, JS, CSS, Images, Network, DNS, DOM Loading)
  • Actions on Site (playing of videos, login, add to cart, etc.)
  • Errors in Scripts and Frontend Coding (JavaScript, Menus, etc.)

Once user data is being received, RUM can segment reports based on combinations of user-specific data. For example, you may want to show page load times by country or by device, or show the load time distribution across the assets on the site (50% of load time is image loading, 10% HTML, 20% CSS and 20% JS). The ability to segment is incredibly powerful, allowing you to drill down into very specific combinations of end user factors.
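
As a rough illustration of this kind of segmentation, the sketch below groups RUM samples by one field and averages the load time per group. The sample shape (country, device, loadMs) and the function name are hypothetical; a real RUM system would collect far more fields.

```javascript
// Group RUM samples by one field and report the average load time per group.
function averageLoadTimeBy(samples, field) {
  const totals = new Map();
  for (const sample of samples) {
    const key = sample[field];
    const entry = totals.get(key) || { sum: 0, count: 0 };
    entry.sum += sample.loadMs;
    entry.count += 1;
    totals.set(key, entry);
  }
  const result = {};
  for (const [key, { sum, count }] of totals) {
    result[key] = Math.round(sum / count);
  }
  return result;
}

// Made-up sample data for illustration only.
const samples = [
  { country: "US", device: "mobile", loadMs: 1800 },
  { country: "US", device: "desktop", loadMs: 1200 },
  { country: "DE", device: "mobile", loadMs: 2400 },
  { country: "DE", device: "mobile", loadMs: 2000 },
];

console.log(averageLoadTimeBy(samples, "country")); // { US: 1500, DE: 2200 }
console.log(averageLoadTimeBy(samples, "device")); // { mobile: 2067, desktop: 1200 }
```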

How does Real User Monitoring Work?

RUM requires that your website include additional code which submits user data to the monitoring system. This is most commonly done by placing a JavaScript file on each and every page, giving site owners/operators a complete picture of user-site interaction. Google Analytics is probably the best-known example of RUM that follows this process.
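
For a feel of what such a script might do, here is a minimal sketch that uses the browser’s Navigation Timing data and sendBeacon. It is not the Google Analytics script or any vendor’s snippet: the "/rum-collect" endpoint, the payload fields and the function name are assumptions made for illustration.

```javascript
// Turn a PerformanceNavigationTiming-like entry into a small payload.
// All timings are in milliseconds relative to the start of navigation.
function buildRumPayload(nav, pageUrl, userAgent) {
  return {
    page: pageUrl,
    userAgent,
    dnsMs: Math.round(nav.domainLookupEnd - nav.domainLookupStart),
    ttfbMs: Math.round(nav.responseStart - nav.startTime),
    domReadyMs: Math.round(nav.domContentLoadedEventEnd - nav.startTime),
    loadMs: Math.round(nav.loadEventEnd - nav.startTime),
  };
}

// Browser-only wiring: report once the page has finished loading.
// "/rum-collect" is a hypothetical endpoint on your own monitoring backend.
if (typeof window !== "undefined") {
  window.addEventListener("load", () => {
    // Wait one tick so that loadEventEnd has been recorded.
    setTimeout(() => {
      const [nav] = performance.getEntriesByType("navigation");
      if (!nav) return;
      const payload = buildRumPayload(nav, location.href, navigator.userAgent);
      navigator.sendBeacon("/rum-collect", JSON.stringify(payload));
    }, 0);
  });
}
```

Keeping the payload builder separate from the browser wiring means the calculation can be exercised outside a browser with a plain object.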

3) Server Monitoring

Server monitoring tracks server hardware and software attributes including CPU, RAM, I/O, disk usage, networks, software and services on an operating system.

What are we actually monitoring?

Each of the server elements below is monitored, and alerts are triggered based on defined thresholds.

  • Disk Capacity / Usage
  • CPU
  • RAM
  • I/O utilization
  • Network & Traffic
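
Threshold-based alerting on these elements can be sketched in a few lines. The threshold values and metric names below are hypothetical, and the metrics object is assumed to have been collected already by an agent running on the server.

```javascript
// Hypothetical alert thresholds, expressed as percent used.
const thresholds = { cpu: 90, ram: 85, disk: 80 };

// Return one human-readable message per metric at or above its threshold.
function findBreaches(metrics, limits) {
  return Object.keys(limits)
    .filter((name) => metrics[name] !== undefined && metrics[name] >= limits[name])
    .map((name) => `${name} at ${metrics[name]}% (threshold ${limits[name]}%)`);
}

// With these made-up readings, RAM and disk are reported but CPU is not.
console.log(findBreaches({ cpu: 42, ram: 91, disk: 80 }, thresholds));
```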

4) Application / Service / API Monitoring

This type of monitoring tests web integrations, APIs and external services for uptime/availability, reliability and performance. It is used for websites that have additional functionality and integrations with external systems.

Common Integrations

  • ERP – Availability and Performance of Enterprise resource planning (ERP) Integrations
  • CRM – Availability and Performance of CRM Integrations
  • Inventory – Availability and Performance of Inventory Integrations
  • Shipping & Logistics – Availability and Performance of Logistics Integrations
  • APIs (REST and SOAP) – Availability and Performance of APIs

What are we actually monitoring?

In each case, a monitor is checking a very specific function or set of functions (see above). After checking for availability, the monitoring system performs a check unique to the function in order to test that specific integration’s performance.
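
The two-stage idea (availability first, then a function-specific check) might look something like the sketch below. The response shape, the 2000 ms budget and the inventory fields (sku, quantity) are all hypothetical; a real integration would define its own validation.

```javascript
// Evaluate one API check in stages: availability, speed, then a validation
// that is specific to the integration being monitored.
function evaluateApiCheck(response, expectation) {
  if (response.status < 200 || response.status >= 300) {
    return { ok: false, stage: "availability", detail: `status ${response.status}` };
  }
  if (response.elapsedMs > expectation.maxMs) {
    return { ok: false, stage: "performance", detail: `${response.elapsedMs} ms exceeds ${expectation.maxMs} ms` };
  }
  let data;
  try {
    data = JSON.parse(response.body);
  } catch (err) {
    return { ok: false, stage: "function", detail: "body is not valid JSON" };
  }
  if (!expectation.validate(data)) {
    return { ok: false, stage: "function", detail: "validation failed" };
  }
  return { ok: true, stage: "complete", detail: "ok" };
}

// Hypothetical expectation for an inventory integration.
const inventoryExpectation = {
  maxMs: 2000,
  validate: (data) =>
    data !== null && typeof data === "object" &&
    typeof data.sku === "string" && Number.isInteger(data.quantity),
};

const sampleResponse = { status: 200, elapsedMs: 350, body: '{"sku":"A-100","quantity":12}' };
console.log(evaluateApiCheck(sampleResponse, inventoryExpectation).stage); // complete
```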

5) Transaction/Action Monitoring

Transaction / Action Monitoring is actually an expanded form of Active Monitoring that is capable of monitoring specific user-site interactions and action sequences. Transaction monitoring tests specific flows and transactions on site as set up by the site owner/operator.

This ability to monitor a site’s most critical flows, including multiple pages and site functionalities, makes transaction monitoring one of the most comprehensive checks that a site owner can set up.

What are we actually monitoring?

Transaction monitoring actively tests any number of interactions or sequences of actions that a user can make on a website. The site owner/operator determines which ones are important to their site’s performance, and configures the monitors accordingly.

Examples of Single Action Web Monitoring:

  • Fill out a form
  • Add a product to cart
  • Search using an on-site search form
  • Load a page a second time

Examples of Multi-Action Flow Web Monitoring:

  • Visit homepage, then browse to the contact page and fill out the contact form.
  • Internal Search – use the internal search engine and check the results for common searches.
  • Checkout – View products, add to cart and proceed to checkout
  • Login / Logout – Visit login page, register and/or login
  • Visit and read 5 popular blog articles
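
A multi-action flow like the ones above can be thought of as an ordered list of steps, where a failure stops the flow and is reported with the step’s name. The sketch below shows only that skeleton: the step functions are empty stand-ins (a real monitor would drive a browser or send HTTP requests), and all names are hypothetical.

```javascript
// Run steps in order, timing each one. Stop at the first failure, because
// later steps (e.g. checkout) depend on earlier ones (e.g. add to cart).
async function runFlow(steps) {
  const results = [];
  for (const step of steps) {
    const started = Date.now();
    try {
      await step.run();
      results.push({ name: step.name, ok: true, ms: Date.now() - started });
    } catch (err) {
      results.push({ name: step.name, ok: false, ms: Date.now() - started, error: err.message });
      break;
    }
  }
  const ok = results.length === steps.length && results.every((r) => r.ok);
  return { ok, results };
}

// Hypothetical checkout flow with stand-in steps; the last one simulates a failure.
const checkoutFlow = [
  { name: "view product", run: async () => {} },
  { name: "add to cart", run: async () => {} },
  { name: "proceed to checkout", run: async () => { throw new Error("checkout button not found"); } },
];

runFlow(checkoutFlow).then((report) => {
  for (const r of report.results) {
    console.log(`${r.name}: ${r.ok ? "ok" : "FAILED (" + r.error + ")"}`);
  }
});
```

Reporting the failing step by name is what makes the alert actionable: the person receiving it knows which part of the flow to look at.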

 

Monitoring Alerts and Notifications

As mentioned above, one of the two core functions of site monitoring is alerting the right people to problems when they occur. For each type of monitoring, an alert is triggered depending on different settings and thresholds. Furthermore, when configured correctly, monitors should send specific issues to different people. For example, if a site is fully offline, an alert might be sent directly to the site administrator/manager, whereas trouble with a site-specific function might trigger an alert to the developer responsible for building it.

Setting up the right alerts is a key part of getting the most out of a monitoring system. Not only do you need to make sure that the correct people are getting the right notifications, but it’s also important to consider how you want to prioritize your monitor alerts and how sensitive their triggers are. If you’ve only got one alert contact and they’re getting blasted with 5 emails per hour saying that the website is taking over 400 ms to load, they’re a lot less likely to see the email saying that the site is down.
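
One way to picture both ideas (routing issues to the right person, and not flooding them with repeats) is the sketch below. The routing table, contact names and cooldown value are hypothetical choices for illustration; real monitoring tools expose these as settings rather than code.

```javascript
// Hypothetical routing table: who gets which kind of issue, and how.
const routes = {
  site_down: { contact: "site-admin", channel: "voice" },
  function_down: { contact: "developer", channel: "sms" },
  slow_page: { contact: "developer", channel: "email" },
};

// Create a router that suppresses repeats of the same alert (same issue type
// and target) inside a cooldown window, so low-priority noise stays limited.
function createAlertRouter(routingTable, cooldownMs) {
  const lastSent = new Map();
  return function route(issueType, target, now) {
    const rule = routingTable[issueType];
    if (!rule) return null; // unknown issue type: nothing to send
    const key = `${issueType}:${target}`;
    const previous = lastSent.get(key);
    if (previous !== undefined && now - previous < cooldownMs) return null;
    lastSent.set(key, now);
    return { ...rule, issueType, target };
  };
}

const route = createAlertRouter(routes, 60 * 60 * 1000); // one alert per hour per issue
console.log(route("slow_page", "/checkout", Date.now())); // sent by email to the developer
console.log(route("slow_page", "/checkout", Date.now())); // null: suppressed as a repeat
console.log(route("site_down", "/", Date.now())); // still goes out, by voice call
```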

 

The Most Common Forms of Alert Include:

  • Email – low to medium priority. No matter how plugged in we are, nobody checks their email 24 hours a day. In fact, email overload can desensitize us, making the alert less effective.
  • SMS – medium priority. SMS is a solid alert system. Not everyone checks every SMS immediately, but most of us do. Furthermore, this alert will be received regardless of which email account is synced with the phone.
  • Voice Call – high priority. Phone calls are very hard to ignore. Both your unconscious mind and your phone do their best to ensure that you take calls. Most of us only turn off our phones while sleeping (if we turn them off at all), making voice calls a solid alert method for severe issues.
  • Push Notifications – high priority. Able to notify you whenever you’re near your phone, Push Notifications are effective alerts. Of course, they depend on being supported by an app or an integration.
  • App notifications – high priority. With the ability to wake your phone and alert you, App notifications are similar to phone calls in their ability to grab your attention even if you’re outside of the office.
  • Webhooks – variable priority. Webhooks allow the monitoring software to notify external applications (e.g. your organization’s internal communication app) of issues. This allows you to integrate notifications into whichever application/software you or your company spend the most time with.
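
To illustrate the webhook option, here is a minimal sketch of what a monitor might send: an HTTP POST with a JSON body describing the issue. The payload fields are hypothetical (each receiving application defines its own format), and sendWebhook assumes Node.js 18 or later for the built-in fetch.

```javascript
// Build the HTTP request options for a webhook notification.
function buildWebhookRequest(alert) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      text: `[${alert.severity.toUpperCase()}] ${alert.monitor}: ${alert.message}`,
      monitor: alert.monitor,
      severity: alert.severity,
      detectedAt: alert.detectedAt,
    }),
  };
}

// Deliver the notification. The webhook URL comes from the receiving
// application's settings; a non-2xx reply is treated as a delivery failure.
async function sendWebhook(webhookUrl, alert) {
  const res = await fetch(webhookUrl, buildWebhookRequest(alert));
  if (!res.ok) {
    throw new Error(`webhook returned status ${res.status}`);
  }
}
```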

 

The Most Common Alerts Include:

  • Downtime of a page or service
  • Slow load speed of a page or specific resources
  • Real User error (frontend or functionality)
  • Specific function down

Data Consistency

Due to the number of systems or elements involved, web monitoring can encounter issues that result in false positives and other erroneous reporting. For example, availability is dependent on the location of the end user and the site’s servers/CDNs. For most web monitoring types, data consistency can be improved by adding server locations and incorporating independent redundant checks to confirm that the monitor is giving the most accurate picture possible.
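
A simple way to picture the redundant-check idea: only declare a site down when a minimum number of independent locations agree. The location names, the three-way status and the threshold below are assumptions chosen for illustration.

```javascript
// Combine results from several check locations into one status.
// "unconfirmed" means some locations failed, but too few to call it down,
// which often points to a local network issue rather than a site outage.
function confirmStatus(results, minFailures) {
  const failures = results.filter((r) => !r.up).length;
  if (failures === 0) return "up";
  return failures >= minFailures ? "down" : "unconfirmed";
}

// Made-up results: one of three locations cannot reach the site.
const results = [
  { location: "us-east", up: false },
  { location: "eu-west", up: true },
  { location: "ap-south", up: true },
];

console.log(confirmStatus(results, 2)); // unconfirmed
```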

 

Points of Interest and Random Facts

  1. Quick review of HTTP Status Codes:
  • 1xx – Informational (e.g. 100 Continue)
  • 2xx – Successful (e.g. 200 OK)
  • 3xx – Redirects (e.g. 301 Moved Permanently)
  • 4xx – Client Error (e.g. 403 Forbidden)
  • 5xx – Server Error (e.g. 500 Internal Server Error)
  • HTTP web checks typically look for status code 200 OK.
  • A basic problem that web monitoring systems must overcome is separating their own servers’ downtime from the recorded downtime of checked servers/sites.
  • Status pages for hosting companies (e.g. the GoDaddy status page) are examples of web monitoring tools with public data.
  • For URL and web checks, a common issue is 3xx redirect statuses. In order to report whether the URL entered is functional for the end user, the monitor needs to perform the extra step of checking the redirect URL.
  • Ensuring web uptime and high performance on-site also helps with SEO (link)
  • Web monitoring can also spot security hacks with options such as visual checks and string checks (e.g. alert if spammy words appear).
  • Web monitoring systems only monitor and report on issues. A recovery plan and dedicated team should always be prepared for different types of response.
  • HTTP string (text) check monitoring should always check for a phrase that’s unique to that page.
  • False positives may be caused by CDN services (e.g. Cloudflare) or by web attack protection systems or software.
  • No system is ever 100% online. Even the Amazon EC2 and Google Cloud SLAs guarantee only 99.95% uptime.
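
To put the 99.95% figure in perspective, a little arithmetic helps. The sketch below converts an uptime percentage into the downtime it still allows; the function name and the 30-day and 365-day periods are our own choices for illustration.

```javascript
// Convert an uptime guarantee into the downtime it still permits.
function allowedDowntimeMinutes(uptimePercent, periodDays) {
  const totalMinutes = periodDays * 24 * 60;
  return totalMinutes * (1 - uptimePercent / 100);
}

console.log(allowedDowntimeMinutes(99.95, 30).toFixed(1)); // 21.6 (minutes per 30 days)
console.log(allowedDowntimeMinutes(99.95, 365).toFixed(1)); // 262.8 (minutes per year)
```

In other words, a provider can be down for over four hours a year and still be within a 99.95% guarantee.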

 Any thoughts, additions, corrections or questions? We’d love to hear from you!

Leave a comment, Send us an email, tweet to @circid or find us on Facebook

Why an HTTP Downtime Monitor is Necessary

With our goal of creating products that offer simple solutions to the most common problems, we quickly decided on an HTTP monitor (is my site up or down?) as our first tool.

Why did we prioritize the HTTP check?

As businesses move more and more of their operations online, the potential losses resulting from site downtime keep growing. These losses are not limited to lost sales on B2C e-commerce sites: studies show that downtime has significant negative effects on branding and other aspects of a business.

Despite growing awareness of the significance of downtime, the fact is that even large organizations with the resources for extra safeguards and redundancy go down occasionally, and a lot of smaller to mid size sites are significantly damaged by frequent downtime.

Obviously, the optimal solution is to prevent the causes of downtime before it actually happens. However, because there are so many different issues that can cause a site to go down, this is just not always possible – even for the biggest organizations with deep resources and safeguards. Site hosting, coding conflicts with the CMS, DoS attacks, and any number of simple human errors can cause a site to go offline. For this reason, an accurate site monitor is a critically important tool for anyone who has a website.

In our eyes, a useful HTTP monitor tool will perform two functions:

  1. Alert the correct person(s) as soon as the site is down so that they can take steps to get the site back online as quickly as possible. For some sites or URLs, we feel that an SMS or a voice call alert is needed to ensure that the alert is received immediately.
  2. Maintain a log of downtime so that webmasters and developers can see trends and identify causes.

With these functions, site owners can reduce future downtime and minimize its adverse effects by dealing with it in real time.
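
As an illustration of the second function, a downtime log can be derived from the raw sequence of check results. The sketch below is not how our tool is implemented; it assumes a hypothetical list of checks sorted by time, each with a timestamp in milliseconds and an up/down flag.

```javascript
// Collapse a time-ordered series of check results into outage records.
// An outage starts at the first failed check and ends at the next successful one.
function findOutages(checks) {
  const outages = [];
  let start = null;
  for (const check of checks) {
    if (!check.up && start === null) {
      start = check.time;
    } else if (check.up && start !== null) {
      outages.push({ start, end: check.time, minutes: (check.time - start) / 60000 });
      start = null;
    }
  }
  if (start !== null) {
    // The site was still down at the last check, so the outage is ongoing.
    outages.push({ start, end: null, minutes: null });
  }
  return outages;
}
```

With records like these, spotting trends (for example, outages clustering at the same hour each night) becomes a matter of grouping by start time.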

In keeping with our mission, our HTTP check is a simple tool with a big payoff, addressing a problem that affects a huge number of people who’re invested in website performance.

We’ll be releasing our HTTP check tool in the next month. If you’d like to try it out be sure to sign up for our newsletter  for release details!

Getting More Accurate Page Speed from Google Analytics

[Image: small sample. By default, the Analytics sample size for speed reports is not sufficient.]

As we’ve mentioned before, page load speed is one of the most important metrics when you’re assessing how well your site is performing. Google Analytics understands this and reports on it accordingly.

However, Analytics uses a default sample rate of 1% on all accounts. This means that if you’re running a smaller site with relatively low page views, the sample is far too small to make the reports statistically significant.

Luckily, there’s a simple solution to this. Google’s suggestion is to increase your sample rate if your site has 100,000 or fewer daily visitors. Google’s explanation for how to set a new site sample size can be found here, but the actual steps are quite simple:

All that’s required is adding the first line of code below where the value (5) represents the percentage of page views that will be included in your sample. As shown below, the site speed line is added directly above the page view.

_gaq.push(['_setSiteSpeedSampleRate', 5]);
_gaq.push(['_trackPageview']);
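
To see why the sample rate matters on a smaller site, it helps to do the arithmetic. This sketch simply multiplies page views by the sample rate; the traffic figure of 2,000 daily page views is a made-up example, and the function name is ours.

```javascript
// Estimate how many page views per day end up in the site speed sample.
function expectedSpeedSamples(dailyPageviews, sampleRatePercent) {
  return Math.round((dailyPageviews * sampleRatePercent) / 100);
}

console.log(expectedSpeedSamples(2000, 1)); // 20
console.log(expectedSpeedSamples(2000, 5)); // 100
```

At the default rate, a site of that size would base its speed reports on roughly 20 page views a day; at 5%, on roughly 100.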

To see what Google has to say about it, you can check out the below links:

Google Developers Guide

Google Analytics Site Speed Explanation

 

Simple, Effective Image Compression Tool

When working with websites, page load speed is one of the most important aspects of performance. When dealing with load speed, image files (jpg, png, etc.) are often the easiest place to shave off some milliseconds (or seconds).

If you’re a Mac user looking for a quick and simple way to reduce image file size, ImageOptim is a great free tool.

[Screenshot: ImageOptim Lossless compression tool. Choose the tools you want to run on your images.]

Using ImageOptim allows you to use Lossless compression on your images, meaning that the file size is reduced while the image quality remains the same. Using a variety of image optimization tools, you also have the option to use Lossy compression which does reduce image quality in order to further reduce the size of your images.

As long as you know where your site’s images are, compressing them is as easy as:

  • Opening ImageOptim
  • Downloading your images or image folders from your website
  • Dropping images or image folders into the ImageOptim window
  • Replacing the files or folders on your site.

Having used this tool (in Lossless mode) on quite a few websites, I’ve seen overall file size reductions of anywhere from 8 to 30%, with some images being reduced by as much as 70%. With that kind of reduction in file size, it’s well worth the 10 minutes it takes to run your images through the application.

A Mac application as well as an API is available at https://imageoptim.com/

New PayPal Compliance Standards – June 2016 through 2017

For everyone using PayPal on their websites, PayPal’s security upgrades will require that sites comply with its new standards. Several of these upgrades will go into effect from June of 2016 through 2017. Make sure that your site avoids any interruptions in service by addressing the points below (contact your hosting company, webmaster, or IT department if you’re not sure what they mean):

SSL Certificate Upgrade – June 17, 2016

PayPal will only work with sites that have an SSL certificate that uses the SHA-256 algorithm and 2048-bit encryption. In other words, you need to update your SSL certificate if it uses SHA-1 signing and the VeriSign G2 Root Certificate.

See PayPal for more information.

TLS 1.2 & HTTP/1.1 Upgrade – June 30, 2017

PayPal will start requiring TLS 1.2 for all HTTPS communications with your site. At the same time, they will also require HTTP/1.1 for all connections.

See PayPal for more information.

IPN Verification Postback in HTTPS – June 30, 2017

For IPN users, PayPal will require that verification messages be posted back to PayPal through an HTTPS URL.

This is a very simple check/fix:

  1. Log in to PayPal and go to Account > My Selling Tools > Instant Payment Notifications
  2. Check that the URL listed on this page uses HTTPS. If it doesn’t, change the URL.
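
If you manage several sites or endpoints, the same check is easy to script. The sketch below only inspects the URL’s scheme; the function name and example URLs are hypothetical, and it assumes an environment with the standard URL class (browsers or Node.js 10+).

```javascript
// Return true only for valid absolute URLs that use the https scheme.
function usesHttps(rawUrl) {
  try {
    return new URL(rawUrl).protocol === "https:";
  } catch (err) {
    return false; // not a valid absolute URL
  }
}

console.log(usesHttps("https://example.com/paypal/ipn")); // true
console.log(usesHttps("http://example.com/paypal/ipn")); // false
```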

See PayPal for more information.

Discontinue use of GET Method for Classic NVP/SOAP APIs

PayPal will stop supporting the GET HTTP request method for its classic NVP/SOAP APIs. If you use these APIs, you’ll need to ensure that your API requests use only the POST HTTP request method.
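
In practice, the change means moving the name-value pairs out of the URL’s query string and into the body of a POST request. The sketch below shows the general shape only: the parameter values are placeholders rather than a working PayPal call, the endpoint is whatever your integration already uses, and callNvp assumes Node.js 18 or later for the built-in fetch.

```javascript
// Build POST request options with the name-value pairs form-encoded in the body
// (instead of appending them to the URL, as a GET request would).
function buildNvpPostRequest(params) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams(params).toString(),
  };
}

// Send the request to the endpoint supplied by the caller and return the raw reply.
async function callNvp(endpoint, params) {
  const res = await fetch(endpoint, buildNvpPostRequest(params));
  return res.text();
}

// Placeholder values for illustration; see PayPal's documentation for real fields.
console.log(buildNvpPostRequest({ METHOD: "ExampleMethod", VERSION: "0.0" }).body);
// METHOD=ExampleMethod&VERSION=0.0
```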

See PayPal for more information.

Hello World – Circid’s Blog

Hello World, Welcome to the Circid blog!

Who We Are:

We’re a group of developers, webmasters, and marketers who’re looking to find solutions to the most common problems that are faced by anybody who depends on a website. Our goal is to make these solutions accessible to everyone, including people who don’t touch code or settings, but are nonetheless heavily invested in a website’s performance.

Our experience has shown us that failing to stay on top of the small things is what can hurt online performance. Whether it’s the inevitable human error, server downtime, or simply that there’s a better way to do things, sales, messages, and users’ relationship with a brand are all heavily affected by your website’s quality and performance.

Why the Blog?

Our diverse team means that we’ve got a lot of different ways of looking at websites. Some of us are purely concerned with server performance, others are designers, and still others don’t care what the website’s doing as long as the numbers are doing the right thing each quarter. In this blog we want to answer questions for people like us: everyone. We hope to provide answers, tutorials, and new products that will address the issues that people run into. Nothing’s too big, nothing’s too small – a problem solved is a problem solved.

Also, we don’t like to admit it, but we don’t actually know everything. We hope that you’ll share any solutions to common problems with everyone here (we’ll be happy to credit you!). Or, feel free to shoot us questions and we’ll try to answer you directly or in a blog post. In fact, if there’s anything you want to talk about, just do it – get in touch!

-The Circid Team