For a few weeks everything went fine, until one day neither the web server nor the shell was accessible anymore.
In the VMware server console we saw that all of the memory (8 GB) was in use, so we added some extra memory and rebooted the system. (At the time I didn't make the connection with increasing the disk space a few weeks earlier.)
This didn't solve the problem.
But I could log in, and memory usage seemed normal. First I went looking in /var/log; you can skip this step, as nothing of value can be found there.
The interesting directory is /opt/CSCOlumos/logs/
The most recently modified file was hm-0-0.log, containing:
ERROR [system] [HealthMonitorServer] HealthMonitorServer.initHealthMonitor: initHealthMonitor(): can not start DB
INFO [database] [Thread-17] OracleSchemaUtil dbServerUp(): wcs errorCode = 1034
(Other logfiles were also giving oracle errors)
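The two manual steps here (finding the most recently modified log, then spotting the Oracle error code in it) can be sketched in a few lines of Python. This is only an illustration, not part of NCS: the helper names `newest_log` and `error_codes` are made up for this example, the `*.log` pattern is an assumption about how the files in the directory are named, and turning `errorCode = 1034` into ORA-01034 ("ORACLE not available") assumes the number is a plain Oracle error code.

```python
import re
from pathlib import Path

# Matches fragments such as "wcs errorCode = 1034".
ERROR_CODE = re.compile(r"errorCode\s*=\s*(\d+)")


def newest_log(log_dir):
    """Return the most recently modified *.log file in log_dir, or None."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return None
    logs = [p for p in directory.glob("*.log") if p.is_file()]
    return max(logs, key=lambda p: p.stat().st_mtime, default=None)


def error_codes(text):
    """Extract 'errorCode = N' values, formatted as ORA-NNNNN (assumed mapping)."""
    return [f"ORA-{int(code):05d}" for code in ERROR_CODE.findall(text)]


if __name__ == "__main__":
    log = newest_log("/opt/CSCOlumos/logs")
    if log is not None:
        print(log.name, error_codes(log.read_text(errors="replace")))
```

Run on the server itself, this should point at the same file and error code found by hand above.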
Closing in on the oracle DB.
Next target: /opt/oracle (disclaimer: I know nothing about Oracle). After some searching I discovered the log file /opt/oracle/diag/rdbms/wcs/wcs/trace/alert_wcs.log, which seemed interesting:
ORA-19815: WARNING: db_recovery_file_dest_size of 107374182400 bytes is 100.00% used, and has 0 remaining bytes available.
ORA-19809: limit exceeded for recovery files
ORA-16038: log 1 sequence# 1018 cannot be archived
ORACLE Instance wcs - Archival Error
ARCH: Archival stopped, error occurred. Will continue retrying
/dev/mapper/smosvg-optvol 249204396 97436636 138909244 42% /opt
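The mismatch between these two outputs is the heart of the problem: Oracle's own limit on the recovery area was exhausted even though the filesystem still had plenty of room. A minimal sketch that puts both numbers side by side, assuming the df line uses the default 1K-block columns and that the recovery area lives under /opt (neither is stated explicitly above); the function names are hypothetical:

```python
import re

# Pattern for the ORA-19815 warning as it appears in alert_wcs.log.
ORA_19815 = re.compile(
    r"db_recovery_file_dest_size of (\d+) bytes is ([\d.]+)% used, "
    r"and has (\d+) remaining bytes available"
)

GIB = 2**30


def recovery_limit_gib(alert_line):
    """Return the configured recovery-area limit in GiB, or None if no match."""
    match = ORA_19815.search(alert_line)
    return int(match.group(1)) / GIB if match else None


def df_available_gib(df_line):
    """Return the 'Available' column of a df line in GiB.

    Assumes the default df layout with sizes in 1K blocks:
    device, total, used, available, use%, mount point.
    """
    _device, _total, _used, available, _pct, _mount = df_line.split()
    return int(available) * 1024 / GIB


alert = ("ORA-19815: WARNING: db_recovery_file_dest_size of 107374182400 bytes "
         "is 100.00% used, and has 0 remaining bytes available.")
df = "/dev/mapper/smosvg-optvol 249204396 97436636 138909244 42% /opt"

print(f"Oracle limit: {recovery_limit_gib(alert):.1f} GiB")
print(f"Free on /opt: {df_available_gib(df):.1f} GiB")
```

With the values from this post it reports a 100.0 GiB limit against roughly 132.5 GiB still free on /opt, which is why extending the disk alone changed nothing: the Oracle parameter has to be raised as well.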
With the help of Google and a colleague with the necessary Oracle knowledge, this was the solution (increasing db_recovery_file_dest_size, the size limit of the recovery area):
[root@ncs oracle]# su - oracle
[oracle@ncs ~]$ ls
base coracleenv dbPasswd.pwd oracleenv oraInventory templates utils
[oracle@ncs ~]$ . oracleenv
[oracle@ncs ~]$ sqlplus '/as sysdba'
SQL*Plus: Release 18.104.22.168.0 Production
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup nomount
ORACLE instance started.
Total System Global Area 4275781632 bytes
Fixed Size 2233336 bytes
Variable Size 2986347528 bytes
Database Buffers 1275068416 bytes
Redo Buffers 12132352 bytes
SQL> show parameter db_recovery_file_dest_size
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size big integer 100G
SQL> alter system set db_recovery_file_dest_size=120G scope=both;
System altered.
SQL> show parameter db_recovery_file_dest_size;
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest_size big integer 120G
SQL> shutdown
ORA-01507: database not mounted
ORACLE instance shut down.
SQL> quit
The error was still there, but now there was some free space to work with:
ORA-19815: WARNING: db_recovery_file_dest_size of 128849018880 bytes is 85.07% used, and has 19241095680 remaining bytes available.
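As a quick sanity check on these numbers, the new limit and the reported percentage can be reproduced by hand. The sketch below assumes that the K, M and G suffixes are binary multiples (which matches the byte counts in the log lines above); `oracle_size_to_bytes` and `percent_used` are hypothetical helper names, not Oracle functions.

```python
# Binary multiples for size suffixes (an assumption that matches the
# byte counts seen in the alert log).
UNITS = {"K": 2**10, "M": 2**20, "G": 2**30}


def oracle_size_to_bytes(value):
    """Convert a size such as '120G' to a byte count."""
    value = value.strip().upper()
    if value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def percent_used(limit_bytes, remaining_bytes):
    """Percentage of the limit that is in use, rounded to two decimals."""
    return round(100 * (limit_bytes - remaining_bytes) / limit_bytes, 2)


limit = oracle_size_to_bytes("120G")
print(limit)                             # 128849018880
print(percent_used(limit, 19241095680))  # 85.07
```

Both values agree with the ORA-19815 line, so the new 120G setting was indeed picked up.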
Next I ran ncs cleanup on the NCS CLI:
ncs/admin# ncs cleanup
===================================================
Starting Cleanup:
===================================================
Removing all files in backup staging directory
Removing all Matlab core related files
Removing all older log files
Cleaning older archive logs
Cleaning database backup and all archive logs
Cleaning database
Stopping database
Starting database
Starting database clean
Completed database clean
Stopping database
===================================================
Completed Cleanup
===================================================
ncs/admin#
And rebooted the system.
Oracle DB came up normally and NCS was happily running again.
Great post, ran into this issue now myself and not exactly easy to diagnose unless you explicitly go look for this error.
I had exactly the same problem today, and thanks to your post I was able to fix it.
Thanks for sharing!
thank you , these two posts were most excellent. now made my ncs a happy bunny
Thanks for posting this. Ran into this error, but for a slightly different reason. A sysadmin for our VM environment skinny'd down the memory on our Prime host for some unknown reason, and that memory change manifested in not having enough shared memory available for Oracle to want to startup nicely. This post at least pointed me in the right direction to locate the log files. (Nice quote by Bones, too.)
Thank you for this. It solved my issue and made my day much easier. Like you said, Dammit Jim, I'm a network Engineer
Thanks a lot for your post !!!
Thanks a lot, I ran into the same issue, and this solved it as well.
awesome. saved my day!
We just ran into this issue and your post helped us get back up and running, so thanks!!
Thanks for posting this. I had the same problem and this fixed it for me. I guess I probably need to do the maintenance every once in a while. I echo Andreas, not easy to find unless you know where to look.
Thank you for this post, it was very helpful, and solved my problem
I had some notes from a past problem where Oracle in Prime was the cause. The notes led me to this post which completely solved the problem. I am grateful. Thank you.
Thanks for this info. I am being asked for a oracle pw. It's not my root pw. Hopefully just a cleanup will do.