For a few weeks everything went fine, until one day neither the web server nor the shell was accessible anymore.
In the VMware server console we saw that all of the memory (8 GB) was in use, so we added some extra memory and rebooted the system. (At that time I didn't make the connection with increasing the disk space a few weeks earlier.)
This didn't solve the problem.
But I could log in and memory usage seemed normal. First I went looking in /var/log; you can skip this step, nothing of value can be found there.
The interesting directory is /opt/CSCOlumos/logs/.
The last modified file was hm-0-0.log, containing:
ERROR [system] [HealthMonitorServer] HealthMonitorServer.initHealthMonitor: initHealthMonitor(): can not start DB
INFO [database] [Thread-17] OracleSchemaUtil dbServerUp(): wcs errorCode = 1034
(Other log files were also showing Oracle errors.)
So I was closing in on the Oracle DB.
Next target: /opt/oracle (disclaimer: I know nothing about Oracle). After some searching I discovered the log file /opt/oracle/diag/rdbms/wcs/wcs/trace/alert_wcs.log, which seemed interesting:
ORA-19815: WARNING: db_recovery_file_dest_size of 107374182400 bytes is 100.00% used, and has 0 remaining bytes available.
ORA-19809: limit exceeded for recovery files
ORA-16038: log 1 sequence# 1018 cannot be archived
ORACLE Instance wcs - Archival Error
ARCH: Archival stopped, error occurred. Will continue retrying
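The ORA-19815 line carries everything needed to see how full the recovery area is. A minimal Python sketch for pulling those numbers out of the warning follows; the function name parse_fra_warning and the regular expression are my own illustration, written against the exact wording of the line above, and are not part of NCS or Oracle.

```python
import re

# Pattern written against the ORA-19815 wording shown in alert_wcs.log above.
# Other Oracle versions may word the line differently (assumption).
ORA_19815 = re.compile(
    r"ORA-19815: WARNING: db_recovery_file_dest_size of (\d+) bytes "
    r"is ([\d.]+)% used, and has (\d+) remaining bytes available"
)


def parse_fra_warning(line):
    """Return (limit_bytes, percent_used, remaining_bytes), or None if no match."""
    match = ORA_19815.search(line)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2)), int(match.group(3))


line = (
    "ORA-19815: WARNING: db_recovery_file_dest_size of 107374182400 bytes "
    "is 100.00% used, and has 0 remaining bytes available."
)
limit, pct_used, remaining = parse_fra_warning(line)

# 107374182400 bytes is exactly 100 GiB, matching the 100G parameter value.
print(limit // 2**30, "GiB limit,", pct_used, "% used,", remaining, "bytes left")
```

Running the same function over every line of the alert log would show when the warning first appeared, which can help date the problem.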
The disk itself was not full; df showed the /opt volume at only 42% used:
/dev/mapper/smosvg-optvol
249204396 97436636 138909244 42% /opt
With the help of Google and a colleague with the necessary Oracle knowledge, this was the solution (increasing db_recovery_file_dest_size):
[root@ncs oracle]# su - oracle
[oracle@ncs ~]$ ls
base coracleenv dbPasswd.pwd oracleenv oraInventory templates utils
[oracle@ncs ~]$ . oracleenv
[oracle@ncs ~]$ sqlplus '/as sysdba'
SQL*Plus: Release 11.2.0.2.0 Production
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2233336 bytes
Variable Size 2986347528 bytes
Database Buffers 1275068416 bytes
Redo Buffers 12132352 bytes
SQL> show parameter db_recovery_file_dest_size
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size big integer 100G
SQL> alter system set db_recovery_file_dest_size=120G scope=both;
System altered.
SQL> show parameter db_recovery_file_dest_size;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size big integer 120G
SQL> shutdown
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> quit
The error was still there, but now there was some free space to work with:
ORA-19815: WARNING: db_recovery_file_dest_size of 128849018880 bytes is 85.07% used, and has 19241095680 remaining bytes available.
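As a sanity check on the numbers in that warning, the arithmetic can be redone by hand. The short Python sketch below only uses the two byte counts from the line above; the variable names are mine.

```python
GIB = 2**30  # Oracle's "G" suffix here means 1024**3 bytes

limit = 128849018880      # db_recovery_file_dest_size from the warning
remaining = 19241095680   # remaining bytes from the warning

used = limit - remaining
pct_used = round(100 * used / limit, 2)

print(limit / GIB)   # the new limit expressed in GiB
print(pct_used)      # percentage used, to compare with the log line
```

The limit works out to exactly 120 GiB and the usage to 85.07%, so the new parameter value was picked up as intended.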
Next I ran ncs cleanup on the NCS CLI:
ncs/admin# ncs cleanup
===================================================
Starting Cleanup:
===================================================
Removing all files in backup staging directory
Removing all Matlab core related files
Removing all older log files
Cleaning older archive logs
Cleaning database backup and all archive logs
Cleaning database
Stopping database
Starting database
Starting database clean
Completed database clean
Stopping database
===================================================
Completed Cleanup
===================================================
ncs/admin#
Then I rebooted the system.
The Oracle DB came up normally and NCS was happily running again.
Great post, ran into this issue now myself and not exactly easy to diagnose unless you explicitly go look for this error.
Hi
I had exactly the same problem today, and thanks to your post I was able to fix it.
Thanks for sharing!
thank you , these two posts were most excellent. now made my ncs a happy bunny
Thanks for posting this. Ran into this error, but for a slightly different reason. A sysadmin for our VM environment skinny'd down the memory on our Prime host for some unknown reason, and that memory change manifested in not having enough shared memory available for Oracle to want to startup nicely. This post at least pointed me in the right direction to locate the log files. (Nice quote by Bones, too.)
Thank you for this. It solved my issue and made my day much easier. Like you said, Dammit Jim, I'm a network Engineer
Thanks a lot for your post !!!
Thanks a lot, I ran into the same issue, and this solved it as well.
Cheers!
awesome. saved my day!
We just ran into this issue and your post helped us get back up and running, so thanks!!
Thanks for posting this. I had the same problem and this fixed it for me. I guess I probably need to do the maintenance every once in a while. I echo Andreas, not easy to find unless you know where to look.
Thank you for this post, it was very helpfull, and solved my problem
I had some notes from a past problem where Oracle in Prime was the cause. The notes led me to this post which completely solved the problem. I am grateful. Thank you.
Thanks for this info. I am being asked for a oracle pw. It's not my root pw. Hopefully just a cleanup will do.