TLDR: It is not profile level 1 that is the problem; it’s a gotcha with the optional ‘slowms’ argument that causes users to accidentally set verbose logging and fill their disk with log files.
In MongoDB, there are two ways to see, with individual detail, which operations were executed and how long they took.
- Profiling. Saves operation details to the capped collection system.profile. You access this information through a MongoDB connection.
- Log lines with the "COMMAND" component type in the mongod log files (also mongos log files in v4.0+). You have to access these files as a Unix or Windows user and work with them as text.
Profiling is low-cost, but because the collection is capped it keeps only a small snapshot. It has levels: 0 (off), 1 (slow ops only), 2 (all).
The (log file) logger also has levels of its own, but there is no 'off'. Even at level 0, it prints any slow operation in a "COMMAND" log line. 'Slow' is defined by the configured slowOpThresholdMs option (originally "slowms"). That is 100ms by default, which is a good default in my opinion.
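For reference, the 100ms default (and the profiler mode) can also be set on disk in the mongod config file. A sketch of the relevant YAML section, using the standard operationProfiling option names:

```
# mongod.conf (YAML) – a sketch of the relevant section
operationProfiling:
  mode: "off"             # off | slowOp | all  (profiler level 0 | 1 | 2)
  slowOpThresholdMs: 100  # the shared 'slow' threshold; 100ms is the default
```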
Usability problem
To the user, the log-or-not and profile-or-not decisions look like unrelated systems, but they share the same global variable for 'slow': serverGlobalParams.slowMS
The basic post-mortem will go:
- Someone used db.setProfilingLevel(1 /*slowOp*/, 0 /*ms*/) to start profiling all operations. (DON'T DO THIS – use db.setProfilingLevel(2 /*all*/) instead.)
- The logger starts writing every command to the log files.
- They executed db.setProfilingLevel(0/*off*/) to turn the profiler off.
- The logger continues writing every command to the log files because slowMS is still 0ms.
- The DBA gets paged after hours because the disk filled up with log files, and thinks 'oh, I should have set up log rotation on that server; I'll do it in the morning'.
- The DBA gets woken up in the small hours of the morning because the new primary node has also crashed due to a full disk.
So that’s the advice: Until MongoDB is enhanced to have separate slow-op threshold options for the profiler, never use db.setProfilingLevel(1, …). Even if you know the gotcha, someone learning over your shoulder won’t see it.
What to do instead:
- Use only db.setProfilingLevel(0 /*off*/) <-> db.setProfilingLevel(2 /*all*/), and don't touch slowms.
- There is still a valid use case for using the profiler at level 1, but if you are not taking on the responsibility of looking after the log file code's interdependence on the slowMS value, don't go there.
- If you want the logger to print "COMMAND" lines for every command, including fast ones, use db.setLogLevel(1, "command"), and run db.setLogLevel(-1, "command") to stop it again. 'Use log level 1, not profile level 1' could almost be the catchphrase of this article.
  (db.setLogLevel(1) + db.setLogLevel(0) is an alternative to the above, but it is an older, blunter method.)
- If you want to set the sampleRate, you can do that without changing the level (or 'slowms') with the following command: db.setProfilingLevel(db.getProfilingStatus().was, {"sampleRate": <new float value>})
- If you want a different slowMS threshold permanently, use the slowOpThresholdMs option in the config file; you can also change it dynamically, as in the sampleRate instructions above.
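Put together, the safe recipe in the mongo shell looks like this (a sketch; these are the standard shell helpers already named above, run against the node you want to affect, and 0.5 is just an example sample rate):

```javascript
// Safe profiling on/off – never pass level 1 with an ms argument:
db.setProfilingLevel(2)        // profile everything...
db.setProfilingLevel(0)        // ...and off again; slowms is untouched

// Log every command without involving the profiler at all:
db.setLogLevel(1, "command")   // start verbose COMMAND logging
db.setLogLevel(-1, "command")  // revert to inheriting the global level

// Change sampleRate while keeping the current level and slowms:
var st = db.getProfilingStatus()
db.setProfilingLevel(st.was, {sampleRate: 0.5, slowms: st.slowms})
```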
Some other gotchas to do with profiling and logging:
- The logger level is global, and the slowOpThresholdMs (a.k.a. slowMS) value is global, but the profiling level is per database.
- All of the above are local to the mongod (or mongos) node the commands are run on, or whose config file is set. If you run them on a primary they do not change the secondaries, and if you run them on a mongos they do not change the shard or config server replica set nodes.
- mongos nodes only provide a subset of these diagnostic features. They have no collections of their own, so for starters, they cannot make a system.profile collection.
```cpp
bool shouldDBProfile(bool shouldSample = true) {
    // Profile level 2 should override any sample rate or slowms settings.
    if (_dbprofile >= 2)
        return true;

    if (!shouldSample || _dbprofile <= 0)
        return false;

    /* Blog: and by elimination if _dbprofile == 1: */
    return elapsedTimeExcludingPauses() >= Milliseconds{serverGlobalParams.slowMS};
}

bool CurOp::completeAndLogOperation(OperationContext* opCtx,
                                    logger::LogComponent component,
                                    boost::optional<size_t> responseLength,
                                    boost::optional<long long> slowMsOverride,
                                    bool forceLog) {
    // Log the operation if it is eligible according to the current slowMS and
    // sampleRate settings.
    const bool shouldLogOp = (forceLog || shouldLog(component, logger::LogSeverity::Debug(1)));
    const long long slowMs = slowMsOverride.value_or(serverGlobalParams.slowMS);
    ...
    ...
    const bool shouldSample =
        client->getPrng().nextCanonicalDouble() < serverGlobalParams.sampleRate;

    if (shouldLogOp || (shouldSample && _debug.executionTimeMicros > slowMs * 1000LL)) {
        auto lockerInfo = opCtx->lockState()->getLockerInfo(_lockStatsBase);
        ...
        // Return 'true' if this operation should also be added to the profiler.
        return shouldDBProfile(shouldSample);
    }
}
```
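To make the interdependence concrete, here is a small JavaScript (Node) re-implementation of the decision logic above. This is my own sketch, not MongoDB code; the names simply mirror the C++ snippet:

```javascript
// Sketch of mongod's log-vs-profile decisions, modeled on the C++ above.
// The key point: slowMS is shared by BOTH decisions.
function shouldDBProfile(dbprofile, elapsedMs, slowMS, shouldSample = true) {
  if (dbprofile >= 2) return true;            // level 2: profile everything
  if (!shouldSample || dbprofile <= 0) return false;
  return elapsedMs >= slowMS;                 // level 1: only 'slow' ops
}

function shouldLogOperation(elapsedMs, slowMS, shouldSample = true, forceLog = false) {
  // forceLog stands in for shouldLog(component, Debug(1)) in the real code.
  return forceLog || (shouldSample && elapsedMs > slowMS);
}

// The gotcha: the profiler has been turned off again, but slowMS was left at 0.
const slowMS = 0;       // left behind by db.setProfilingLevel(1, 0)
const dbprofile = 0;    // set later by db.setProfilingLevel(0)
const fastOpMs = 1;     // a 1ms operation

console.log(shouldDBProfile(dbprofile, fastOpMs, slowMS)); // false – profiler off
console.log(shouldLogOperation(fastOpMs, slowMS));         // true – still logged!
```

With slowMS back at the default 100, the same 1ms operation would be neither profiled nor logged.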
Please file a feature request and list the URL in this post
See https://jira.mongodb.org/browse/SERVER-18946 (from 2015), and the related SERVER-4785 (2012).
We can hope that it might be fixed in v4.4, I guess. (I can't see any patch being backward compatible, as it would probably involve small user-interface changes.)
Thank you. Now I can file the issue on jira
Wait a minute. System.profile can be set as a capped collection, did you try it?
v4.4 update: I thought I saw https://jira.mongodb.org/browse/SERVER-4785 and/or https://jira.mongodb.org/browse/SERVER-5266 as one of the "4.3 Desired" or "4.3 Required" items several months ago, but maybe that was just a daydream. It's in Backlog now and won't be in 4.4.