Thursday, February 7, 2013

Cisco Prime Infrastructure NCS 1.2 and the Oracle issue

Continuing my story about the NCS disk space increase.

For a few weeks everything went fine, until one day neither the web server nor the shell was accessible anymore.
In the VMware server console we saw that all of the memory (8 GB) was in use, so we added some extra memory and rebooted the system. (At that time I hadn't made the connection with increasing the disk space a few weeks earlier.)

This didn't solve the problem.

But I could log in again, and memory usage seemed normal. First I went looking in /var/log. You can skip this step; nothing of value can be found there.

The interesting directory is /opt/CSCOlumos/logs/

The last file modified was hm-0-0.log, containing:

ERROR [system] [HealthMonitorServer] HealthMonitorServer.initHealthMonitor: initHealthMonitor(): can not start DB
INFO  [database] [Thread-17] OracleSchemaUtil dbServerUp(): wcs errorCode = 1034

(Other log files were also giving Oracle errors.)
Closing in on the Oracle DB.
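
For anyone retracing this step, here is a small shell sketch of how such a search could go. The function name scan_for_db_errors is made up for illustration, and the patterns are simply the ones that show up in the excerpts above. The errorCode = 1034 most likely corresponds to ORA-01034 (ORACLE not available), which fits a database that refuses to start.

```shell
# Print lines that look like database trouble from the given log files.
# Output format is file:line-number:matching-line.
# The patterns are assumptions based on the log excerpts in this post.
scan_for_db_errors() {
  grep -H -n -E 'ERROR|errorCode|ORA-[0-9]+' "$@"
}

# Usage on the NCS box (paths as described in this post):
#   ls -lt /opt/CSCOlumos/logs | head        # most recently modified first
#   scan_for_db_errors /opt/CSCOlumos/logs/hm-0-0.log
```

Listing by modification time first narrows things down to the logs that were written while the system was failing.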



Next target: /opt/oracle (disclaimer: I know nothing about Oracle). After some searching I discovered the log file /opt/oracle/diag/rdbms/wcs/wcs/trace/alert_wcs.log, which seemed interesting.

ORA-19815: WARNING: db_recovery_file_dest_size of 107374182400 bytes is 100.00% used, and has 0 remaining bytes available.
ORA-19809: limit exceeded for recovery files
ORA-16038: log 1 sequence# 1018 cannot be archived
ORACLE Instance wcs - Archival Error
ARCH: Archival stopped, error occurred. Will continue retrying

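This didn't look good. Apparently Oracle has its own internal idea of how big a disk is and how much free space is available, because we still had about 130 GB available on /opt. (db_recovery_file_dest_size is a quota that Oracle enforces on its recovery area, regardless of what the filesystem still has free.)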

/dev/mapper/smosvg-optvol
                     249204396  97436636 138909244  42% /opt
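
The two numbers are in different units, which makes them hard to compare at a glance: Oracle reports bytes, while df reports 1K blocks. A quick sketch to put both in GiB (the helper names are hypothetical, and the integer division rounds down):

```shell
# Convert a byte count to whole GiB (integer division, rounds down).
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# Convert a count of 1K blocks, as printed by df, to whole GiB.
kib_to_gib() {
  echo $(( $1 / 1024 / 1024 ))
}

bytes_to_gib 107374182400   # recovery-area limit from ORA-19815, prints 100
kib_to_gib 138909244        # "Available" column of df for /opt, prints 132
```

So Oracle's 100 GiB limit was exhausted while the filesystem underneath still had roughly 132 GiB to spare.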

With the help of Google and a colleague with the necessary Oracle knowledge, this was the solution (increasing db_recovery_file_dest_size, the size limit of the recovery area):

[root@ncs oracle]# su - oracle
[oracle@ncs ~]$ ls
base  coracleenv  dbPasswd.pwd  oracleenv  oraInventory  templates  utils
[oracle@ncs ~]$ . oracleenv
[oracle@ncs ~]$ sqlplus '/as sysdba'
SQL*Plus: Release 11.2.0.2.0 Production 
Copyright (c) 1982, 2010, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> startup nomount
ORACLE instance started.

Total System Global Area 4275781632 bytes
Fixed Size                  2233336 bytes
Variable Size            2986347528 bytes
Database Buffers         1275068416 bytes
Redo Buffers               12132352 bytes
SQL> show parameter db_recovery_file_dest_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size           big integer 100G
SQL> alter system set db_recovery_file_dest_size=120G scope=both;

System altered.

SQL> show parameter db_recovery_file_dest_size;

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size           big integer 120G
SQL> shutdown
ORA-01507: database not mounted


ORACLE instance shut down.
SQL> quit
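
I did not do this at the time, but on an instance that is up and mounted it should be possible to ask Oracle what is filling the recovery area. A sketch, assuming the standard v$recovery_file_dest and v$flash_recovery_area_usage views (newer releases also offer the latter as v$recovery_area_usage). fra_usage_sql is a hypothetical helper that only prints the SQL, so nothing touches the database until you pipe it into sqlplus yourself.

```shell
# Print a short SQL*Plus script that reports recovery-area usage.
# The heredoc delimiter is quoted so the shell does not try to expand
# the "$recovery_file_dest" part of the view names as a variable.
fra_usage_sql() {
  cat <<'SQL'
set linesize 150
select name, space_limit, space_used, space_reclaimable, number_of_files
  from v$recovery_file_dest;
select file_type, percent_space_used, percent_space_reclaimable, number_of_files
  from v$flash_recovery_area_usage;
exit
SQL
}

# Usage, as the oracle user after sourcing oracleenv (assumption: same
# environment as in the session above):
#   fra_usage_sql | sqlplus -S '/ as sysdba'
```

The second query breaks the usage down per file type, which should show whether archived logs are the part that keeps growing.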

Still the error, but now there was some free space to work with:

ORA-19815: WARNING: db_recovery_file_dest_size of 128849018880 bytes is 85.07% used, and has 19241095680 remaining bytes available.
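
To keep an eye on these numbers without reading the whole alert log, the ORA-19815 line can be picked apart with sed. A minimal sketch, assuming the message format stays exactly as shown above; parse_ora19815 is a made-up name:

```shell
# Read alert-log lines on stdin and, for each ORA-19815 warning, print
#   <limit in bytes> <percent used> <remaining bytes>
parse_ora19815() {
  sed -n -E 's/.*db_recovery_file_dest_size of ([0-9]+) bytes is ([0-9.]+)% used, and has ([0-9]+) remaining bytes available.*/\1 \2 \3/p'
}

line='ORA-19815: WARNING: db_recovery_file_dest_size of 128849018880 bytes is 85.07% used, and has 19241095680 remaining bytes available.'

printf '%s\n' "$line" | parse_ora19815 | while read -r limit pct remaining; do
  echo "limit=$limit used_pct=$pct remaining=$remaining used_bytes=$(( limit - remaining ))"
done

# Against the real log (path from this post):
#   grep 'ORA-19815' /opt/oracle/diag/rdbms/wcs/wcs/trace/alert_wcs.log | tail -n 1 | parse_ora19815
```

For the line above, used_bytes comes out at 109607923200, which is about 102 GiB of the new 120 GiB limit.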

Next I ran ncs cleanup on the NCS CLI:

ncs/admin# ncs cleanup
===================================================
Starting Cleanup: 
===================================================
Removing all files in backup staging directory
Removing all Matlab core related files
Removing all older log files
Cleaning older archive logs
Cleaning database backup and all archive logs
Cleaning database
Stopping database
Starting database
Starting database clean
Completed database clean
Stopping database
===================================================
Completed Cleanup
===================================================
ncs/admin#

Then I rebooted the system.

The Oracle DB came up normally and NCS was happily running again.

10 comments:

  1. Great post, ran into this issue now myself and not exactly easy to diagnose unless you explicitly go look for this error.

  2. Hi
    I had exactly the same problem today, and thanks to your post I was able to fix it.
    Thanks for sharing!

  3. thank you , these two posts were most excellent. now made my ncs a happy bunny

  4. Thanks for posting this. Ran into this error, but for a slightly different reason. A sysadmin for our VM environment skinny'd down the memory on our Prime host for some unknown reason, and that memory change manifested in not having enough shared memory available for Oracle to want to startup nicely. This post at least pointed me in the right direction to locate the log files. (Nice quote by Bones, too.)

  5. Thank you for this. It solved my issue and made my day much easier. Like you said, Dammit Jim, I'm a network Engineer

  6. Thanks a lot for your post !!!

  7. Thanks a lot, I ran into the same issue, and this solved it as well.
    Cheers!

  8. We just ran into this issue and your post helped us get back up and running, so thanks!!

  9. Thanks for posting this. I had the same problem and this fixed it for me. I guess I probably need to do the maintenance every once in a while. I echo Andreas, not easy to find unless you know where to look.
