CMS Blog Series – Part 1: Auto Discovery
My first venture into implementing a CMDB was nearly five years ago, for a very large multi-national firm, at a time when the term ITIL was barely even spoken in North America. I was challenged by the Chief Technology Officer to look into Configuration Management and come up with a strategy for addressing it. It did not take long for me to identify my first batch of major obstacles:
- Volume of data
- Quality of data
- Constant Changes
I found that in my particular situation the problem was centered not on creating more data but on deciphering the volumes that already existed in the environment and determining what was accurate and at what point in time it was accurate. Accuracy and timeliness of the data in an MDR are vital, and a manually populated CMS is fraught with pitfalls. Some manual data is unavoidable, but it needs to be kept to a minimum and validated far more often than its discovered counterpart.
Now that I’ve provided some background for my position on Auto Discovery, let’s make sure that everyone is on the same page with regard to what I mean by “Auto Discovery” or “Discovery Tools”. When I use these terms, I’m talking about technologies that have the ability to maintain an accurate portrayal of the IT environment. “These tools will tell you the truth about your environment because by their very name, they are discovering the existing elements and their attributes.”
‘The CMDB Imperative: How to realize the dream and avoid the nightmares’ Page 64
With the exception of situations where some form of Service Catalog is already in place and available for integration, Auto Discovery will not be able to leverage service definitions and will be limited to IT component data only. Auto Discovery is not an end-all be-all solution; it is merely one part of the overall solution, and it must be leveraged to reduce the burden that would otherwise be placed on your already overburdened staff.
The following quotes from Ronni Colville, in a press release by BMC Software regarding its acquisition of Tideway Systems Limited (Tideway), state the importance of discovery in a CMDB solution. Hopefully Ronni will start referring to it as a CMS in the future, but for now I guess we have to accept her reference to it as a CMDB.
“The CMDB is most powerful when it has a clear understanding of infrastructure components, application dependencies and service relationships” “Essential to that are solutions that automatically discover, model and maintain those relationships despite ongoing infrastructure changes.”
Ronni Colville, Vice President and Distinguished Analyst, Gartner, as quoted in the October 19, 2009 BMC Software press release.
There are various elements to the Auto Discovery topic so I’ll touch on a few of them below to get you going.
Should I use it?
In my opinion everyone who can use it should, and because of virtualization it will become a must. The extent to which you use it will vary based on the size of your firm and its budget. If you’re a small firm, university, or agency that has a minimal number of IT components in its environment, or that has access to inexpensive labor to perform the function manually, then installing and maintaining an additional technology might not yet be a justifiable decision. If you are a medium sized company or larger, you need to perform a cost comparison of implementing the technology versus using staffing resources, and assess whether or not you can legitimately meet the demand of a constantly changing environment with human resources versus a technology solution. You must factor into your equation the impact on services if you try to do it manually but cannot keep pace with the change in your environment.
So, ask yourself these questions:
- Can I get budget to hire/retain resources for this continual task that will only become more demanding as the company grows?
- Is it realistic to expect human resources to work 24x7x365 manually monitoring and documenting modifications that happen in my environment and manually cross-referencing them against RFCs?
- Can I count on the data being gathered by these people to be accurate and up-to-date around the clock?
- What is the cost of implementing a technology solution?
- What is the cost of using a basic WMI-based network administrator utility versus maintaining a staff to do it manually?
- Are there processes in place to ensure the accuracy and authenticity of the discovered data?
- Are there other sources for the same data which can be used for comparison?
- Are there regulatory requirements (e.g., SOX) which the Discovery Tool does not comply with?
- What is the historical track record for the quality of data generated by the tool?
- What would be the impact to my processes if my company decides to change tool vendors?
How could I use it?
This is the topic that will most likely generate a heated debate, because it touches on some foundational and philosophical beliefs that people have about leveraging data from a discovery tool. My rationale is a very simple one: use the tool, or any other mechanism for that matter, in any way that you can as long as you are deriving value from it for your company.
Let’s touch on the two main elements in how you might use Auto Discovered data with regard to a CMS.
Initial Setup and CMS population
The main questions here are whether you should use a discovery tool to ‘seed’ a CMS and what the risks of doing so are. The short answer is yes, you should use it as long as the value of leveraging that data outweighs the risk of having unverified data as part of your CMS. Ideally, you can flag the data with a quality metric so that its consumers understand its reliability.
Ask yourself this: is it better to have no system in place than to have one that is partially populated with data that is only 70% accurate? Before answering the question, however, remember that your service desk folks and other operations personnel are already using that same data but are aggregating it manually. So, are you making their jobs better or worse by providing them a consolidated view of data they ALREADY use EVEN IF its quality is questionable? I am not by any means advocating the promotion of poor quality data; however, I am advocating the exposure of data to more people as part of a data cleansing and quality improvement effort. If you reduce your staff’s research time by 20% simply by bringing all the data together into one portal, have you not provided value to them?
Some basic items to address:
- Identify all your potential data sources
- Categorize the sources by the data they provide
- Create a Data Source Matrix identifying where sources have overlapping data
- Assign a Reliability Factor to each source
- Determine which source should be the “reference” mark as your starting point for quality measures
- Determine what data can be used “as-is” and what needs cleansing and consolidation
- Communicate the findings
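To make the Data Source Matrix and Reliability Factor items a little more concrete, here is a minimal sketch in Python. The source names, attribute names and reliability values are all hypothetical and exist only for illustration; your own sources and scoring scheme will differ.

```python
from collections import defaultdict

# Hypothetical sources: the attributes each one supplies and an assumed
# reliability factor between 0.0 (untrusted) and 1.0 (fully trusted).
SOURCES = {
    "discovery_tool": {
        "attributes": {"hostname", "ip_address", "os_version"},
        "reliability": 0.9,
    },
    "asset_spreadsheet": {
        "attributes": {"hostname", "owner", "location"},
        "reliability": 0.6,
    },
    "monitoring_system": {
        "attributes": {"hostname", "ip_address"},
        "reliability": 0.8,
    },
}


def build_source_matrix(sources):
    """Map each attribute to the sorted list of sources that provide it."""
    matrix = defaultdict(list)
    for name, info in sources.items():
        for attribute in info["attributes"]:
            matrix[attribute].append(name)
    return {attribute: sorted(names) for attribute, names in matrix.items()}


def overlapping_attributes(matrix):
    """Keep only the attributes supplied by more than one source."""
    return {attr: names for attr, names in matrix.items() if len(names) > 1}


def reference_source(matrix, sources, attribute):
    """Pick the most reliable source for an attribute as the reference mark."""
    return max(matrix[attribute], key=lambda name: sources[name]["reliability"])


matrix = build_source_matrix(SOURCES)
overlaps = overlapping_attributes(matrix)

for attribute in sorted(overlaps):
    best = reference_source(matrix, SOURCES, attribute)
    print(f"{attribute}: provided by {overlaps[attribute]}, reference = {best}")
```

Attributes that appear in `overlaps` are the ones where sources can be compared against each other, and the reference source is simply the starting point for quality measures, not a guarantee that its data is correct.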
Continued Use
With regard to regular use, the question is: “How should the auto discovered data be used for the day-to-day operation of a CMS?” Although many feel that it should only be used for initial startup, I would argue that the tools can be used as part of your ongoing verification and audit process for all the data that makes up the CMS.
In both cases, you need to put safeguards in place. You should look at putting very rigid policies and standards in place to clearly communicate how the data can and should be used. You should also communicate what the expected reliability is for all elements of data. This is true not only for discovered data but also for manually generated data. As much as we would like to believe that only data which is 100% guaranteed will be used, that is not realistic. No matter what your environment is, anomalies and issues will arise that impact the quality and comprehensiveness of the data. If you know what they are, be sure to communicate them so that those consuming the data are not surprised and can plan accordingly.
Some basic items to address:
- Identify which sources are “retirement candidates”
- Execute manual audits to ensure the data quality metrics are being achieved
- Gather feedback from Change, Incident, Problem, Release, Deployment processes regarding data quality
- Review Problem and Incident management reports to determine where data gaps might exist
- Assess TCO of implementation and adjust implementation as needed
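As one way to picture the manual audit item, the sketch below compares discovered values against a manually audited sample and checks the result against a quality target. The CI names, attributes and the 90% target are assumptions made up for this example, not recommended values.

```python
def audit_accuracy(discovered, audited):
    """Return the fraction of audited attribute values that discovery matched."""
    checked = 0
    matched = 0
    for ci_name, audited_attrs in audited.items():
        discovered_attrs = discovered.get(ci_name, {})
        for attribute, true_value in audited_attrs.items():
            checked += 1
            if discovered_attrs.get(attribute) == true_value:
                matched += 1
    # With nothing audited there is no evidence of accuracy, so report 0.0.
    return matched / checked if checked else 0.0


def meets_threshold(accuracy, threshold=0.9):
    """True when the measured accuracy reaches the agreed quality target."""
    return accuracy >= threshold


# Hypothetical data: what the tool reported versus what the auditors found.
discovered = {
    "web01": {"os": "Linux", "ip": "10.0.0.5"},
    "db01": {"os": "Windows", "ip": "10.0.0.9"},
}
audited = {
    "web01": {"os": "Linux", "ip": "10.0.0.5"},
    "db01": {"os": "Linux", "ip": "10.0.0.9"},
}

accuracy = audit_accuracy(discovered, audited)
print(f"accuracy = {accuracy:.0%}, meets target: {meets_threshold(accuracy)}")
```

Tracking this number per source over time is also one way to spot which sources are becoming “retirement candidates”.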
What’s the difference between Agent and Agent-less?
Discovery technologies tend to fall into one of two categories: Agent and Agent-less. The Agent based approach is one whereby some piece of technology, typically software, is installed on a networked computing device and monitors the device for changes to its settings. This is sometimes also referred to as a passive approach, since it waits for something to occur before acting on it. The agent-less approach is based more on an active polling style, whereby the tool looks out into the environment at some regular interval to determine what has changed since the last time it checked. Think about it in terms of how you aggregate data from the Internet. There are some websites that you go out to for information on, let’s say, a daily basis. However, there are other sites where you have signed up to get an email notification or RSS feed update whenever something gets modified on that site. The agent-less approach does leverage on-board technology in somewhat the same fashion as the agent based approach; however, it differs in that it does not require you to load a third-party piece of software. Two examples of the instrumentation required to execute an agent-less solution are SNMP and WMI, since they are included as a standard part of a typical operating system installation.
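The polling style described above boils down to comparing what was seen on the last check with what is seen now. Here is a minimal sketch of that comparison; the snapshots are hypothetical dictionaries, and the code deliberately does not show how the values would actually be collected over SNMP or WMI.

```python
def diff_snapshots(previous, current):
    """Compare two polls and report added, removed and changed settings."""
    added = {key: current[key] for key in current.keys() - previous.keys()}
    removed = {key: previous[key] for key in previous.keys() - current.keys()}
    changed = {
        key: (previous[key], current[key])
        for key in previous.keys() & current.keys()
        if previous[key] != current[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical results of two consecutive polls of the same device.
last_poll = {"os_version": "10.1", "memory_gb": 16, "nic_count": 2}
this_poll = {"os_version": "10.2", "memory_gb": 16, "disk_count": 4}

changes = diff_snapshots(last_poll, this_poll)
print(changes)
```

An agent would typically report each of these changes as it happens, whereas the polling approach only sees the net difference between two checks, so anything that changed and changed back between polls goes unnoticed.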
The key here is that in order to populate and maintain the CMS, you need to leverage both. Each approach has positives and negatives that you will need to consider, and you will need to determine what mix of the two meets your requirements. Below is a graphic from “The CMDB Imperative” which demonstrates the difference between Manual and Automated CI Discovery.
The bottom line on what to do or not do with auto discovered data is a question of value and communication. Leverage the auto-discovery technologies as best as you can and inform the consumers of its reliability and comprehensiveness. Determine a threshold mark where you decide the data from a particular discovery technology will no longer be reliable or complete enough to use. Then go forward with it and offer your consumers some value.
