What is SawBones?

SawBones is a small web application for analysing the contents of web pages and collating statistics about them. The collected statistics are returned to the user and also stored in a "snapshot" of the page at that particular point in time. Once a URL is provided, SawBones collects a new set of results for the page every month from then on, building up a history of statistics for the page.

Why is this useful?

The statistical history collected for a page can show how a website, and indeed the Internet as a whole, has developed over time. It lets users see which page styling and scripting methods are being used, how much they're being used, and how their implementation has changed over time.

What kind of data is collected by SawBones?

SawBones currently collects data in two* main categories of information: Page HTML Information and Page Media Information.

Page HTML Information

  1. Page doctype
  2. Total HTML elements
  3. Total unique elements
  4. Total common elements
  5. Total page links
  6. Common elements breakdown in graph form
  7. Page links list
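
As a rough illustration of how statistics like these can be gathered, here is a minimal sketch in Python using the standard library's html.parser. SawBones itself is written in PHP, so this is not its actual code. The names PageStats and analyse are hypothetical, and the sketch assumes that "unique elements" means distinct tag names and that "common elements" means the most frequently used tags.

```python
from collections import Counter
from html.parser import HTMLParser


class PageStats(HTMLParser):
    """Hypothetical collector for the HTML statistics listed above."""

    def __init__(self):
        super().__init__()
        self.doctype = None        # text of the <!DOCTYPE ...> declaration
        self.elements = Counter()  # tag name -> number of occurrences
        self.links = []            # href values found on <a> elements

    def handle_decl(self, decl):
        # Called for declarations such as <!DOCTYPE html>
        if decl.lower().startswith("doctype"):
            self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.elements[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def analyse(html, top=5):
    """Return a dict of statistics for one page of markup."""
    parser = PageStats()
    parser.feed(html)
    parser.close()
    return {
        "doctype": parser.doctype,
        "total_elements": sum(parser.elements.values()),
        # Assumption: "unique" means distinct tag names.
        "unique_elements": len(parser.elements),
        # Assumption: "common" means the most frequent tags.
        "common_elements": parser.elements.most_common(top),
        "total_links": len(parser.links),
        "links": parser.links,
    }
```

Calling analyse() on a page's markup returns a plain dictionary, which is a convenient shape for storing as one month's snapshot. The common_elements list of (tag, count) pairs could feed the breakdown graph.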

Page Media Information

  1. Total page images
  2. Total HTML images
  3. Total CSS images
  4. HTML images list
  5. CSS images list
  6. CSS file list
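
The media statistics can be sketched in a similar way. The Python below is an illustration only, not SawBones' PHP implementation. MediaCollector, css_images and media_summary are hypothetical names. The sketch assumes that HTML images are img elements with a src attribute, that CSS images are url(...) references ending in a common image extension, and that the text of each stylesheet has already been fetched by some other means.

```python
import re
from html.parser import HTMLParser

# Matches url(...) references in CSS, with or without quotes.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+?)['\"]?\s*\)", re.IGNORECASE)

# Assumption: a CSS url() counts as an image if it ends in one of these.
IMAGE_EXT = (".png", ".gif", ".jpg", ".jpeg", ".svg", ".webp")


class MediaCollector(HTMLParser):
    """Hypothetical collector for <img> sources and linked stylesheets."""

    def __init__(self):
        super().__init__()
        self.html_images = []
        self.css_files = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.html_images.append(attrs["src"])
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel and attrs.get("href"):
                self.css_files.append(attrs["href"])


def css_images(css_text):
    """Return the image URLs referenced by url(...) in a stylesheet."""
    return [u for u in CSS_URL.findall(css_text)
            if u.lower().endswith(IMAGE_EXT)]


def media_summary(html, stylesheets):
    """Summarise media for a page, given its markup and its CSS texts."""
    collector = MediaCollector()
    collector.feed(html)
    collector.close()
    from_css = [u for css in stylesheets for u in css_images(css)]
    return {
        "total_images": len(collector.html_images) + len(from_css),
        "total_html_images": len(collector.html_images),
        "total_css_images": len(from_css),
        "html_images": collector.html_images,
        "css_images": from_css,
        "css_files": collector.css_files,
    }
```

Filtering by file extension is a simplification: it skips fonts and other non-image url() references, but it will also miss images served from URLs without an extension.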

* (SawBones is in continued development, and as such will include further information categories in the future)

Development Notes

21/10/09 - Alpha Launchy Thing

Tonight I'm putting SawBones up as a hosted project on PixelBag. Please keep in mind that SawBones is very much an ongoing development, and because of that it may be buggy or may not return quite the results you'd expect from a particular page. SawBones represents the culmination of my own experiences with HTTP and recursive algorithm design in PHP, although I've learned a great deal about the obscure ways in which people write their HTML & CSS too!

21/10/09 - Where do we go from here?

Development of SawBones from here on in involves a heavy focus on improving the existing algorithms and program structure, for quick, accurate analysis of pages. However, I'll also be adding support for new bits and pieces, including (in order of priority):

Support for actually displaying collected results will be added in time, and with that development I'll be close to completing my use case implementation for the site. Stay tuned!