Over at the TSMADmin blog, Chad Small recently posted “DB2 Doesn’t Make a Difference.” Chad was concerned that he was not seeing a performance boost in his database queries after upgrading from TSM 5.5 to 6.x.
As you know, one of the biggest changes in TSM 6.x is that it is now backed by DB2. Previous releases of TSM used a proprietary database that worked well, but had certain limitations, especially with scaling.
In any case, Chad was using the following query to find old Oracle database backups that had not been properly cleaned:
select \ cast(sum(b.file_size/1073741824) as decimal(18,2)) AS GB_SIZE \ from \ backups a, \ contents b \ where \ a.node_name in ('DEV01_ORA','DEV02_ORA','DEV03_ORA','PRD01_ORA','PROD02_ORA') and \ a.backup_date < '2011-11-01 00:00:00' and \ a.object_id=b.object_id
Chad’s problem was that the query was slow. Really, really slow. It sounded like an interesting problem (I know, I’m weird), and since one of my many hats is doing Oracle support, I have a seen a fair amount of poorly-performing database queries. And usually, the culprit is the query itself, and not the database. I’m not anyone’s idea of a SQL tuning expert, but it seemed like a good opportunity to learn.
I confirmed what Chad was seeing by issuing a select count(*) against
contents, and the
contents query in particular took very long to execute. Additionally, it returned tens of millions of rows, so there was a lot of data to process.
Since the CLI for DB2,
db2, is included with TSM, I used it to query the database directly. Once connected, the first order of business was to find out what the query was hitting: what are
db2 => set schema tsmdb1 DB20000I The SQL command completed successfully. db2 => list tables for schema tsmdb1 Table/View Schema Type Creation time ------------------------------- --------------- ----- -------------------------- . . . BACKUPS TSMDB1 V 2011-09-14-19.05.58.424225 . . . CONTENTS TSMDB1 V 2011-09-14-19.06.04.636090 . . .
Well. We can see from the ‘V’ in the Type column that the two objects are views. A view is essentially a SQL code block that you can query and update just like you would a table. This is great for abstracting a complicated database, but whenever you execute a statement against it, it gets executed itself. So there may be a whole lot more processing involved than you expect or need.
In order to examine the SQL code for the views, DB2 provides the
db2look command-line utility. Here are the commands to dump the code for the two views. Note that the output is being placed into a text file.
db2look -d TSMDB1 -e -v backups -o backups.sql
db2look -d TSMDB1 -e -v contents -o contents.sql
After looking at the source code, I found that most of what Chad was looking for seemed to be in the backup_objects table. Here’s my rewrite of Chad’s query:
select \ sum(cast(bk.bfsize/1024/1024 as decimal(14,4))) AS MB \ from \ backup_objects bk \ where \ bk.nodeid in (select nodeid from nodesview where node_name in ('DEV01_ORA','DEV02_ORA','DEV03_ORA','PRD01_ORA','PROD02_ORA')) and \ bk.insdate<'2011-11-01 00:00:00'
Chad pointed out that you can’t query the
nodes table from the
dsmadmc client (why??), however you are able to query against the
nodesview view. You’ll need to change the column from
node_name when you do.
With the new query, Chad is able to get results within 10 minutes. Still not blazing fast, but it sounds like the query is usable now.