Backup at Scale – Part 1 – Linear is badness

In a number of recent technologies, performance by design grows linearly as building blocks are added.  In clustered systems a building block includes CPU, memory and disk, so compute performance and capacity grow linearly together.  In the backup world, linear just doesn't cut the mustard.

Who cuts mustard anyway?

Don't get sidetracked with silly questions like that, use Google!  What I am trying to say is that for backup systems, the "work done to achieve backups" must grow significantly slower than the data being protected.

Imagine a world where 1TB of protected data requires 10% of a building block of "work done", where "work done" is a combination of admin time, compute, backup storage and so on.  If our backup processes and technologies required linear growth in work done, much badness would occur.  Diagrammatically…


No one would ever get to the situation described in the diagram above, as they would soon realise that "this just ain't workin'" and rethink their systems.  The question, however, is what the "work done" growth should look like.  It needs to be a shallower curve than that of the data protected, and it needs to flatten as capacities increase.  So we can imagine that we would want to achieve something like this:

slow growth

But how… How… HOW!?!

A number of methodologies can be employed to work towards this goal.  The first and most obvious step is to A-U-T-O-M-A-T-E (sounds better if you say it in a robotty way).

Phase 1 – Take the drudge processes (and believe me there are plenty) and automate them:

  1. Checking backup logs for failures
  2. Restarting backups that have failed
  3. Generating reports

Phase 2 – Take some of the more difficult but boring jobs and automate them too!

  1. Restore testing
  2. New backup client requests
  3. Restore requests
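Restore testing, the first Phase 2 item above, is usually automated around a "canary" file: restore a known object on a schedule and compare its checksum against the value recorded at backup time. The sketch below assumes a `restore_file()` stand-in for whatever restore mechanism your product actually exposes.

```python
import hashlib

# Hypothetical sketch of an automated restore test: restore a known canary
# file and compare its checksum against the value recorded at backup time.

def restore_file(path: str) -> bytes:
    # Placeholder: a real version would call the backup product's restore API.
    return b"canary contents v1"

def verify_restore(path: str, expected_sha256: str) -> bool:
    data = restore_file(path)
    return hashlib.sha256(data).hexdigest() == expected_sha256

recorded = hashlib.sha256(b"canary contents v1").hexdigest()
result = verify_restore("/backups/canary.txt", recorded)
print("restore test passed:", result)
```

Run nightly, a test like this turns "we think restores work" into a dated pass/fail record.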

If your environment is at Google scale you may want to automate crazy things like the purchasing, receipt and labelling of new backup media.  This is an extreme case, but you get the principle: break down the tasks in the backup process and see what you can get machines to do better and more accurately than humans.

There are plenty of people that have already done all this and many products to look at for help. Start Googling…

Is that it? – No, we will return with other methods to help with backup at scale.


Introducing… The Backup Storage Admin

The good old world of storage…

Typically in organisations there are two distinct roles assigned in the storage department.

1. Storage Admin – the person who provisions and supports primary storage to application or server admins

2. Backup Admin – the person who administers the backup software, tape solution and/or VTL

So where does the role of Backup Storage Admin come in?
Firstly we need to describe what is changing to open up the field for this new role. Traditionally all backup tasks and operations have been dragged through some kind of backup application, which consists of a number of components:

1. The Backup server – tracks and catalogs all backups, manages schedules and retention, etc.
2. Media or Storage node – performs the movement of the data to be protected
3. Backup client – gathers the data and informs the server which files need backing up

All of these components make up the backup product, and they need specialist skills and training to implement. The application guy doesn't know the details of how NetBackup, NetWorker or TSM work; he understands how his application needs protecting.

So what needs to change to the way backups are done to allow for this new role?

In our last post we discussed the DD Boost for RMAN feature of the EMC Data Domain. This allows the application guy to use his own backup tools to speak directly to the back end Data Domain storage. This puts the application admin in control of scheduling, retention and management of backups.

Scale this up and think about how this might develop. Imagine you had a VMware admin who wanted to manage his own backups to a central backup repository, or a NAS administrator who wanted to point his NAS device directly at the backup storage device. Repeat the same question for any number of apps and databases.

No need for dedicated backup software?
What we have described is each application using its own native data protection method to send backups to a central backup storage device. The backup admin is no longer managing all the database and file modules of a backup product, but is provisioning storage to application teams to be used for backups.  In fact, the provisioning of the backup storage could be automated too, so that when a service is brought online its backup storage is provisioned automatically. I will avoid using the word cloud at this point ;-)
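That automatic provisioning step can be sketched very simply. This is a hypothetical illustration: the in-memory dict stands in for a real storage array's provisioning API, and the `/backup/<service>` naming convention is an invented assumption.

```python
# Hypothetical sketch: automatically provision a backup storage area when a
# new service is registered. The dict stands in for a real array's API.

pools = {}

def provision_backup_storage(service: str, quota_gb: int) -> dict:
    """Create (or return) a dedicated backup area for a service."""
    if service not in pools:
        pools[service] = {
            "path": "/backup/" + service,   # assumed naming convention
            "quota_gb": quota_gb,
            "used_gb": 0,
        }
    return pools[service]

area = provision_backup_storage("payroll-db", quota_gb=500)
print(area["path"], area["quota_gb"])
```

Hook a call like this into your service-onboarding workflow and the backup target exists before the first backup is ever attempted.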

Who will know what is backed up though!?
Good question! There would have to be a way to catalog all these disparate types of backups into a common format that could then be used to report on and review backup success. Easier said than done, you may think, but not an impossibility by any means.
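The common-catalog idea boils down to one normalising function per backup source. The record shapes below are invented examples of what an RMAN job and a NAS snapshot report might emit; the point is the shared output schema, not the field names.

```python
# Hypothetical sketch of normalising disparate backup records into one common
# catalog format. Input shapes are invented examples.

def from_rman(rec: dict) -> dict:
    return {
        "source": "oracle-rman",
        "object": rec["db_name"],
        "completed": rec["end_time"],
        "ok": rec["status"] == "COMPLETED",
    }

def from_nas_snapshot(rec: dict) -> dict:
    return {
        "source": "nas-snapshot",
        "object": rec["volume"],
        "completed": rec["created"],
        "ok": True,   # a snapshot either exists or it doesn't
    }

catalog = [
    from_rman({"db_name": "SALES", "end_time": "2012-05-01T02:10",
               "status": "COMPLETED"}),
    from_nas_snapshot({"volume": "vol_home", "created": "2012-05-01T01:00"}),
]
successes = sum(1 for e in catalog if e["ok"])
print(successes, "successful backups catalogued")
```

Once everything lands in one schema, "who backed up what, and did it work?" becomes a single query rather than a tour of five consoles.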

So in summary then, what would the responsibilities of a Backup Storage Admin be?
- Manage a pot of storage specifically as a backup target
- Provision backup storage based on requests from application or server teams
- Manage the backup catalog
- Report on capacity trending
- Chargeback storage usage to the application or server teams
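The chargeback item in the list above is just arithmetic once usage is tracked per team. A minimal sketch, with an invented flat rate and invented usage figures:

```python
# Hypothetical sketch: charge backup storage usage back to application teams
# at a flat monthly rate per GB. Rate and usage figures are invented.

RATE_PER_GB = 0.05   # assumed monthly rate

usage_gb = {"vmware-team": 1200, "oracle-team": 800, "nas-team": 2500}

def chargeback(usage: dict, rate: float) -> dict:
    """Return the monthly charge for each team."""
    return {team: round(gb * rate, 2) for team, gb in usage.items()}

bills = chargeback(usage_gb, RATE_PER_GB)
print(bills)
```

Real schemes are usually tiered, but the shape is the same: meter, multiply, report.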

So surely this is all a pipe dream? Well, keep your eye on some of the stuff at EMC World 2012 and what Stephen Manley has to say here.


TSMagic Unleashed!

The event at IBM Bedfont went well today, thanks to all for supporting us.  The day was essentially to lay out Silverstring's RAP framework for Storage Management Automation (SMA).

RAP stands for Reduce, Automate and Protect.  We figure these are the three results people look for from a successful SMA engagement.  For instance, under the "Reduce" element we put TPC data audits for identifying archivable data, and dedupe for reducing primary and backup storage.  For "Automate" we discussed the PREDATAR suite for automating TSM management.  Finally, for "Protect" we discussed our automated TSM recovery tool (PREDATAR Recovery Tracker) and consulting offerings such as TSM health checks and recovery audits.  So enough of the ramble; basically we based the day around the RAP theme.

We used the "Reduce" section to cover the official launch of the TSMagic product (we didn't get the same cheers and clapping Steve Jobs gets, how does he do it!?).  It was nice letting the product out into the open for general inspection and prodding, and we got some good interest, which is very encouraging! (Phew, we didn't waste months of development.)

Ronnie De Giorgio covered the “Automation” section of RAP by giving us a comprehensive run through of the PREDATAR product set.  He gave an insight into the future of PREDATAR wearing his hat as PREDATAR Development Manager.

The final section of RAP, "Protect", was covered with one of the products from the security and encryption giant Thales.  Simon Taylor presented on the TEMS key management appliance, which allows a fully secured, off-site method of implementing key management for tape and disk products.

If anyone wants the materials that we used or other marketing bumf on the products then email me –

All in all a tiring but enjoyable day that will hopefully lead to some business…


TSMagic – save 20% of space in TSM

Wow it has been a while since I last posted!!

Life, the Universe and TSMagic!

Life has been crazy busy over the past few months, in a good way though.  I have been working with the software guys here at Silverstring (the PREDATAR team) on our new product, TSMagic.  This new PREDATAR module is really rather fun!

Firstly I will tell you what the software does and then go onto explain the consultancy spin that we have put on it.

The Software

So imagine being asked the question:

"how many versions do we keep of the file quantum_bananas.pdf in TSM?"

or:

"TSM person, tell me the percentage of data in TSM that hasn't been read in the last year"

These specific questions may not crop up regularly, but you will almost certainly get questions about what data is in TSM, how many versions are being kept and how much space a particular application is occupying.  TSMagic uses the TSM database to give a visual representation of the contents and usage of TSM storage.  For example, the screenshot below is one of the high-level views showing the breakdown of the backup data within a TSM system:
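For a taste of the kind of query this analysis rests on, TSM's administrative client (`dsmadmc`) exposes a SQL interface over the server database. The sketch below is hedged: the node name, credentials and exact column usage are assumptions for illustration, and the call is left commented out since it needs a reachable TSM server.

```python
import subprocess

# Hedged sketch: ask the TSM server's SQL interface how many versions of a
# file it holds. Node name, credentials and columns are illustrative only.

def tsm_select(sql: str) -> str:
    cmd = [
        "dsmadmc", "-id=admin", "-password=secret",
        "-dataonly=yes", sql,
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

versions_sql = (
    "select count(*) from backups "
    "where node_name='FILESERVER1' and ll_name='QUANTUM_BANANAS.PDF'"
)
# print(tsm_select(versions_sql))   # requires a reachable TSM server
```

TSMagic's value is doing this kind of digging across the whole database and turning the raw rows into the pictures below.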


Breakdown of the file types using TSMagic


The particular designation of data types comes with a number of defaults that can be altered and added to using a nice, easy, GUI-driven wizard.

Another couple of Screenies:


Breakdown of space occupied by the different application types


Split of application data types (SQL, Oracle, Exchange etc)

 Analysis of data by Created Date

A useful Storage Resource Management type report showing the age of the data that is resident in TSM.


The Consultancy Spin! 

It is all very well having access to a load of pretty pictures, but what are you going to do about it!?  That is where the work that I have been doing comes into play.  Rather than deliver TSMagic as a software product, we have built it into a consultancy offering that uses the data to give a view of the following:

  • Capacity – data and facts about the current capacity and storage use of TSM, and where savings can be made. Our experience from the tests we have performed is that massive savings can be made by identifying unnecessary backups occupying space that would never usually get spotted. In one instance we saw a 25% storage saving, all from unnecessary backups such as backing up the disk-based TSM DB backups back into TSM (eh, good idea!!).
  • Compliance – we use TSMagic to check the required versions and retention against what is really happening. This is one better than the usual test, which is simply to check that TSM is configured with the correct policies; the usual tests completely ignore whether the right data is bound to those policies.
  • Archiving – using the SRM-type reporting we can give a view on the age and profile of the types of data in TSM, which gives a very good indication of the types of data on primary storage. Who cares!? I hear you cry! Well, it gives you a very quick and accurate picture of what primary storage can be freed up by archiving unused data!
  • TSM 6 readiness – using the data type breakdown we can give a view of the level of dedupe that can be achieved in your TSM 6 environment. Usually this figure is based on a "finger in the air" estimate and not actual data from your own environment.
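The TSM 6 readiness estimate above is, at heart, a weighted sum: take the per-type storage breakdown and multiply by a dedupe ratio per type. The figures below are invented placeholders; the whole point of measuring from your own environment is to replace guesses like these with real numbers.

```python
# Hypothetical sketch of a dedupe estimate: weight the per-type storage
# breakdown by assumed stored/original ratios. All figures are invented.

breakdown_gb = {"database": 4000, "office-docs": 1500, "media": 2000}
assumed_ratio = {"database": 0.5, "office-docs": 0.4, "media": 0.95}

def estimated_post_dedupe(breakdown: dict, ratios: dict) -> float:
    """Estimate total GB stored after dedupe, type by type."""
    return sum(gb * ratios[ftype] for ftype, gb in breakdown.items())

before = sum(breakdown_gb.values())
after = estimated_post_dedupe(breakdown_gb, assumed_ratio)
print("%d GB -> %.0f GB estimated after dedupe" % (before, after))
```

Note how the answer is dominated by which types hold the most data: that is exactly why a real breakdown beats a finger-in-the-air figure.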

So the plan with the TSMagic Consultancy is to present, in an innovative way, the reductions in cost by managing TSM storage wisely and the reductions in compliance penalty payments by being able to rapidly identify and fix compliance issues.

The launch is on the 10th June…

…to be continued