When a MongoDB server crashes, you will usually find what is called a “stack trace” in its log file. But what is it, and what purpose does it serve? Let’s simulate a simple crash so we can dig into it.
Crashing a test server
In a test setup with a freshly installed MongoDB server, we connect to it and create some test data:
```shell
$ mongo
MongoDB shell version v3.6.12
(...)
> use test
switched to db test
> db.albums.insert({ name: "The Wall" })
WriteResult({ "nInserted" : 1 })
> db.albums.find()
{ "_id" : ObjectId("5d237cef9affce6d7e4e8345"), "name" : "The Wall" }
```
In a separate session on the server, we change the ownership of the MongoDB data files, so the mongod user will no longer have access to them:
```shell
$ sudo chown root:root /var/lib/mongo/*
```
Going back to the mongo session, we try to add a new record and it fails, as expected:
```shell
> db.albums.insert({ name: "The Division Bell" })
2019-07-08T17:27:40.275+0000 E QUERY    [thread1] Error: error doing query: failed: network error while attempting to run command 'insert' on host '127.0.0.1:27017' :
DB.prototype.runCommand@src/mongo/shell/db.js:168:1
DBCollection.prototype._dbCommand@src/mongo/shell/collection.js:173:1
Bulk/executeBatch@src/mongo/shell/bulk_api.js:903:22
Bulk/this.execute@src/mongo/shell/bulk_api.js:1154:21
DBCollection.prototype.insert@src/mongo/shell/collection.js:317:22
@(shell):1:1
2019-07-08T17:27:40.284+0000 I NETWORK  [thread1] trying reconnect to 127.0.0.1:27017 (127.0.0.1) failed
2019-07-08T17:27:40.284+0000 W NETWORK  [thread1] Failed to connect to 127.0.0.1:27017, in(checking socket for error after poll), reason: Connection refused
2019-07-08T17:27:40.284+0000 I NETWORK  [thread1] reconnect 127.0.0.1:27017 (127.0.0.1) failed failed
```
Looking at the error log we confirm the server has crashed, leaving a stack trace (also called a “backtrace”) behind:
```shell
$ sudo cat /var/log/mongodb/mongod.log
(...)
2019-07-08T17:27:39.666+0000 E STORAGE  [thread2] WiredTiger error (13) [1562606859:666004][24742:0x7f70a3501700], log-server: __directory_list_worker, 48: /home/vagrant/db/journal: directory-list: opendir: Permission denied
(...)
2019-07-08T17:27:39.666+0000 E STORAGE  [thread2] WiredTiger error (-31804) [1562606859:666313][24742:0x7f70a3501700], log-server: __wt_panic, 523: the process must exit and restart: WT_PANIC: WiredTiger library panic
(...)
----- BEGIN BACKTRACE -----
(...)
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5618a3ab92c1]
 mongod(+0x22744D9) [0x5618a3ab84d9]
 mongod(+0x22749BD) [0x5618a3ab89bd]
 libpthread.so.0(+0xF6D0) [0x7f70a6cff6d0]
 libc.so.6(gsignal+0x37) [0x7f70a6959277]
 libc.so.6(abort+0x148) [0x7f70a695a968]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5618a21e064c]
 mongod(+0xA6D9EE) [0x5618a22b19ee]
 mongod(+0xADEEF1) [0x5618a2322ef1]
 mongod(__wt_err_func+0x90) [0x5618a217b742]
 mongod(__wt_panic+0x3F) [0x5618a217bb62]
 mongod(+0xB3DFB2) [0x5618a2381fb2]
 libpthread.so.0(+0x7E25) [0x7f70a6cf7e25]
 libc.so.6(clone+0x6D) [0x7f70a6a21bad]
----- END BACKTRACE -----
Aborted
```
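On a busy production server, the backtrace may be buried among thousands of log lines. One quick way to isolate just that section is to print only the lines between the two markers. Here is a minimal sketch; the short sample file created below is just a stand-in for the real /var/log/mongodb/mongod.log, so the command has something to run against:

```shell
# Tiny sample log standing in for the real /var/log/mongodb/mongod.log,
# created here only so the command below has something to work on:
cat > /tmp/mongod-sample.log <<'EOF'
2019-07-08T17:27:39.666+0000 E STORAGE [thread2] WiredTiger error (13) ...
----- BEGIN BACKTRACE -----
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5618a3ab92c1]
libc.so.6(abort+0x148) [0x7f70a695a968]
----- END BACKTRACE -----
Aborted
EOF

# Print only the lines between the two markers, inclusive:
sed -n '/BEGIN BACKTRACE/,/END BACKTRACE/p' /tmp/mongod-sample.log
```

Against the real log, the same sed command (run under sudo, pointed at /var/log/mongodb/mongod.log) extracts just the backtrace block.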
But what can we infer from these somewhat cryptic lines full of hexadecimal content?
Inspecting the MongoDB stack trace
At the bottom of the stack trace, we can see a list of function names and addresses. Note that most function names resolved reasonably well in the example above: the mongod binary used by our test server is not stripped of symbols (if yours is, you will need to install the respective debug symbols/debuginfo package and use the mongod binary it provides to resolve the stack trace):
```shell
$ file `which mongod`
/usr/bin/mongod: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=1d0fd59529274e06c35e6dc4c74e0ef08caf931c, not stripped
```
This means we can actually extract the symbols from the mongod binary with the help of a tool such as nm, from the GNU Development Tools:
```shell
$ nm -n /usr/bin/mongod > mongod.symbols
```
The function names appear mangled, though:
```shell
$ tail mongod.symbols
0000000003125f88 u _ZNSt9money_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE2idE
0000000003125f90 u _ZNSt10moneypunctIwLb1EE2idE
0000000003125f98 u _ZNSt10moneypunctIwLb0EE2idE
0000000003125fa0 b _ZZN9__gnu_cxx27__verbose_terminate_handlerEvE11terminating
0000000003125fc0 b _ZZN12_GLOBAL__N_112get_catalogsEvE10__catalogs
0000000003126008 b _ZGVZN12_GLOBAL__N_112get_catalogsEvE10__catalogs
0000000003126020 b _ZZN12_GLOBAL__N_112get_catalogsEvE10__catalogs
0000000003126068 b _ZGVZN12_GLOBAL__N_112get_catalogsEvE10__catalogs
0000000003126080 B __wt_process
00000000031260e8 A _end
```
We can use another tool from the same toolkit, c++filt, to demangle them. For example:
```shell
$ echo _ZNSt9money_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE2idE | c++filt
std::money_get<wchar_t, std::istreambuf_iterator<wchar_t, std::char_traits<wchar_t> > >::id
```
In fact, we can process the whole stack trace with c++filt all at once…
```shell
$ cat <<EOT | c++filt
> mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x5618a3ab92c1]
> mongod(+0x22744D9) [0x5618a3ab84d9]
> mongod(+0x22749BD) [0x5618a3ab89bd]
> libpthread.so.0(+0xF6D0) [0x7f70a6cff6d0]
> libc.so.6(gsignal+0x37) [0x7f70a6959277]
> libc.so.6(abort+0x148) [0x7f70a695a968]
> mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x5618a21e064c]
> mongod(+0xA6D9EE) [0x5618a22b19ee]
> mongod(+0xADEEF1) [0x5618a2322ef1]
> mongod(__wt_err_func+0x90) [0x5618a217b742]
> mongod(__wt_panic+0x3F) [0x5618a217bb62]
> mongod(+0xB3DFB2) [0x5618a2381fb2]
> libpthread.so.0(+0x7E25) [0x7f70a6cf7e25]
> libc.so.6(clone+0x6D) [0x7f70a6a21bad]
> EOT
```
… and get it fully demangled, with C++ function and method names easily recognizable now:
```
mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x41) [0x5618a3ab92c1]
mongod(+0x22744D9) [0x5618a3ab84d9]
mongod(+0x22749BD) [0x5618a3ab89bd]
libpthread.so.0(+0xF6D0) [0x7f70a6cff6d0]
libc.so.6(gsignal+0x37) [0x7f70a6959277]
libc.so.6(abort+0x148) [0x7f70a695a968]
mongod(mongo::fassertFailedNoTraceWithLocation(int, char const*, unsigned int)+0x0) [0x5618a21e064c]
mongod(+0xA6D9EE) [0x5618a22b19ee]
mongod(+0xADEEF1) [0x5618a2322ef1]
mongod(__wt_err_func+0x90) [0x5618a217b742]
mongod(__wt_panic+0x3F) [0x5618a217bb62]
mongod(+0xB3DFB2) [0x5618a2381fb2]
libpthread.so.0(+0x7E25) [0x7f70a6cf7e25]
libc.so.6(clone+0x6D) [0x7f70a6a21bad]
```
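The frames that still show only an offset, like mongod(+0xA6D9EE), can be mapped back to a function with the mongod.symbols file we generated earlier: since nm -n sorts symbols by address, the enclosing function is the last symbol at or below the frame's offset (for a position-independent binary such as this one, it is the relative offset in parentheses, not the absolute bracketed address, that should line up with nm's addresses). Here is a minimal sketch; the three-line symbols file and the lookup target are entirely made up for illustration:

```shell
# A tiny, made-up symbols file standing in for the real output of
# "nm -n /usr/bin/mongod > mongod.symbols" (addresses are hypothetical):
cat > /tmp/mongod.symbols <<'EOF'
0000000000a00000 T _ZN5mongo13fassertFailedEi
0000000000a6d000 T _ZN5mongo11S2NearStage11nextAnnulusEv
0000000000b00000 T _ZN5mongo12PlanExecutor7getNextE
EOF

# nm -n sorts by address and zero-pads to a fixed width, so a plain string
# comparison is enough to find the last symbol at or below the frame offset
# (the hypothetical 0xa6d9ee here, zero-padded to match); pipe the result
# through c++filt to demangle it:
awk -v target=0000000000a6d9ee \
    '$1 <= target { last = $3 } END { print last }' /tmp/mongod.symbols
```

In this made-up example the lookup prints _ZN5mongo11S2NearStage11nextAnnulusEv, the symbol just below the target offset.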
While easily reproducible, this was not a very interesting example: the change in the database files’ ownership caused WiredTiger to crash upon the insert without leaving much of a trail behind. Let’s have a look at another one.
A more realistic example
Despite being somewhat old, bug SERVER-13751 (“mongod crash on geo nearSphere query”) provides a realistic yet easy-to-reproduce example of a simple routine that crashed MongoDB 2.6.0. (This bug is, in fact, a duplicate of SERVER-13666, but it provides a simpler test case.) Here’s how to reproduce it.
1) First, we download these old binaries and start a MongoDB server:
```shell
$ wget http://downloads.mongodb.org/linux/mongodb-linux-x86_64-2.6.0.tgz
$ tar zxvf mongodb-linux-x86_64-2.6.0.tgz
$ cd mongodb-linux-x86_64-2.6.0/bin
$ mkdir /home/vagrant/db
$ ./mongod --dbpath /home/vagrant/db
```
2) In a second terminal window, we connect to the MongoDB server we just started and run a simplified version of the routine described in the bug, which consists of creating a 2dsphere index and querying for a point described with invalid coordinates:
```shell
$ cd mongodb-linux-x86_64-2.6.0/bin
$ ./mongo
> db.places.ensureIndex({loc:"2dsphere"})
> db.places.find({loc:{$nearSphere: [200.4905, 300.2646]}})
```
Now when we look back at the first terminal we find the server has crashed, leaving the following stack trace:
```
./mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11bd301]
./mongod() [0x11bc6de]
/lib64/libc.so.6(+0x36340) [0x7f11cd866340]
/lib64/libc.so.6(gsignal+0x37) [0x7f11cd8662c7]
/lib64/libc.so.6(abort+0x148) [0x7f11cd8679b8]
./mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x11421ea]
./mongod(_ZN15LogMessageFatalD1Ev+0x1d) [0x125d58d]
./mongod(_ZN5S2Cap13FromAxisAngleERK7Vector3IdERK7S1Angle+0x169) [0x1267699]
./mongod(_ZN5mongo11S2NearStage11nextAnnulusEv+0xd2) [0xabd142]
./mongod(_ZN5mongo11S2NearStage4workEPm+0x1fb) [0xabf2cb]
./mongod(_ZN5mongo12PlanExecutor7getNextEPNS_7BSONObjEPNS_7DiskLocE+0xef) [0xd66a7f]
./mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x958) [0xd4acf8]
./mongod() [0xb96382]
./mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x442) [0xb98962]
./mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x76b76f]
./mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4fb) [0x117367b]
/lib64/libpthread.so.0(+0x7dd5) [0x7f11ce62bdd5]
/lib64/libc.so.6(clone+0x6d) [0x7f11cd92e02d]
```
Processing the stack trace with c++filt we get:
```
./mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x21) [0x11bd301]
./mongod() [0x11bc6de]
/lib64/libc.so.6(+0x36340) [0x7f11cd866340]
/lib64/libc.so.6(gsignal+0x37) [0x7f11cd8662c7]
/lib64/libc.so.6(abort+0x148) [0x7f11cd8679b8]
./mongod(mongo::fassertFailed(int)+0x13a) [0x11421ea]
./mongod(LogMessageFatal::~LogMessageFatal()+0x1d) [0x125d58d]
./mongod(S2Cap::FromAxisAngle(Vector3<double> const&, S1Angle const&)+0x169) [0x1267699]
./mongod(mongo::S2NearStage::nextAnnulus()+0xd2) [0xabd142]
./mongod(mongo::S2NearStage::work(unsigned long*)+0x1fb) [0xabf2cb]
./mongod(mongo::PlanExecutor::getNext(mongo::BSONObj*, mongo::DiskLoc*)+0xef) [0xd66a7f]
./mongod(mongo::newRunQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&)+0x958) [0xd4acf8]
./mongod() [0xb96382]
./mongod(mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)+0x442) [0xb98962]
./mongod(mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)+0x9f) [0x76b76f]
./mongod(mongo::PortMessageServer::handleIncomingMsg(void*)+0x4fb) [0x117367b]
/lib64/libpthread.so.0(+0x7dd5) [0x7f11ce62bdd5]
/lib64/libc.so.6(clone+0x6d) [0x7f11cd92e02d]
```
This particular mongod binary is stripped of symbols:
```shell
$ file mongod
mongod: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.9, stripped
```
So in order to resolve the stack trace, we first need to obtain a binary that is not:
```shell
$ wget http://downloads.mongodb.org/linux/mongodb-linux-x86_64-debugsymbols-2.6.0.tgz
$ tar zxvf mongodb-linux-x86_64-debugsymbols-2.6.0.tgz
$ cd mongodb-linux-x86_64-debugsymbols-2.6.0/bin
```
We can now extract the actual function names from the addresses between the brackets using addr2line: option “-f” provides the function name, adding “-i” also shows the enclosing functions when the one found was inlined, and “-C” demangles the names to some extent, similar to c++filt:
```shell
$ addr2line -e mongod -ifC 0x1267699
S2Cap::FromAxisAngle(Vector3<double> const&, S1Angle const&)
/srv/10gen/mci-exec/mci/git@github.commongodb/mongo.git/mongodb-mongo-v2.6/src/third_party/s2/s2cap.cc:35
```
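Running addr2line once per frame quickly gets tedious. Since addr2line reads addresses from standard input when none are given on the command line, we can pull every bracketed address out of a saved copy of the trace and resolve them all in one pass. A minimal sketch, using a two-line sample standing in for the full backtrace:

```shell
# A short sample standing in for the full backtrace saved from the log:
cat > /tmp/trace.txt <<'EOF'
./mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11bd301]
./mongod() [0xb96382]
EOF

# Extract every bracketed address and strip the brackets:
grep -o '\[0x[0-9a-f]*\]' /tmp/trace.txt | tr -d '[]'

# Piping that list into addr2line resolves the whole trace in one go
# (requires the unstripped mongod from the debugsymbols tarball):
# grep -o '\[0x[0-9a-f]*\]' /tmp/trace.txt | tr -d '[]' | addr2line -e mongod -ifC
```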
One of the great benefits of working with Open Source software is being able to look directly at this exact piece of code, which translates to https://github.com/mongodb/mongo/blob/v2.6/src/third_party/s2/s2cap.cc#L35 :
```cpp
S2Cap S2Cap::FromAxisAngle(S2Point const& axis, S1Angle const& angle) {
  DCHECK(S2::IsUnitLength(axis));
  DCHECK_GE(angle.radians(), 0);
  return S2Cap(axis, GetHeightForAngle(angle.radians()));
}
```
Note that the actual fix for this bug did not come from modifying this function, which is reused from a third-party library (another beauty of working with Open Source!), but from making sure the arguments passed to it (which compose the point’s coordinates) are validated beforehand.
There is a home for bugs
If you ever run into a MongoDB server crash, I hope this little set of instructions can serve as a reference to help you make sense of the stack trace that will (hopefully) have been left behind. You can then search for known bugs at https://jira.mongodb.org if you’re running a MongoDB server, or at https://jira.percona.com/projects/PSMDB if you’re running Percona Server for MongoDB. If you can’t find a bug that matches your crash, please consider filing a new one; providing a clear stack trace alongside the exact binary version you’re using is a must. If you can reproduce the problem at will and provide a test case as well, like the ones we showed above, that will not only make the life of our developers easier but also greatly increase the likelihood of the bug being fixed much, much faster.