Friday, November 7, 2008

Oracle SQL group by on a portion of a clob field in the database

I work on an application which writes its internal errors in processing to a DB table. The input data that caused the failure due to a parsing error or missing configuration error is put in a clob column in the database. This input data is an XML message and one of the tags identifies each application that sends data to my application. We had one of the architects who requested us to give a split up on the number of errors that have occurred for each application.

This was significant from Oracle perspective because you cannot do a group by on a clob. In order to do a group by, you need to be able to sort and clob fields do not have a natural sort order. The workaround is to do a to_char function on the clob field but you can get a max of 4000 chars. For me, this suited the requirement because the substring I needed was a small string. But the issue is that I needed to sort a portion of an XML tag inside the clob field and group them based on that. Not only that I needed to find a specific type of error in the XML message and count them.

After a little food for thought, here is the final query that fetched the sweet results


SELECT k.application_name, COUNT (*)
FROM (SELECT TO_CHAR
(SUBSTR (clobfield,
( INSTR (clobfield,
'ns0:appName',
1,
1
)
+ 21
),
( INSTR (clobfield,
'lt;/ns0:appName',
1,
1
)
- INSTR (clobfield,
'ns0:appName',
1,
1
)
- 22
)
)
) AS application_name
FROM tableName
WHERE appID = 1
AND INSTR (clobfield, 'ns0:appName', 1, 1) <> 0
AND INSTR (clobfield, 'ErrorMsg', 1, 1) <> 0
AND INSTR (clobfield, 'Application Not Configured', 1, 1) <> 0) k
GROUP BY k.application_name


The INSTR function selects the index of the occurrence of a string inside another string, while substr gets the substring from a string. The numbers 21 and 22 account for the tag name lengths and the additional characters in the tag name like > etc.

If you need to count based a specific portion of a clob, just modify the above query and you are good to go..

No comments: