For most Exchange administrators the first time they encounter the concept of “back pressure” is when they see this error:

452 4.3.1 Insufficient system resources

They might see it for the first time in a non-delivery report, an SMTP error log from an application, a telnet session, or the queue viewer on another Exchange server.

In this article:

  • An overview of Transport service resource monitoring
  • Customizing back pressure thresholds
  • Detecting back pressure
    • Monitoring Transport queues
    • Monitoring event logs
    • Monitoring protocol logs

Microsoft Exchange Transport Service Resource Monitoring

Back pressure is the condition an Edge Transport or Hub Transport server enters when it is overloaded and is actively refusing some or all further connection attempts from other systems.

The overloaded state is based on a series of resource utilization metrics:

  • Free disk space on the drive(s) that store the message queue database and logs
  • Uncommitted queue database transactions in memory
  • Memory utilization by the EdgeTransport.exe process (the Microsoft Exchange Transport service)
  • Overall memory utilization for the server

Each of those metrics is measured individually, and as such each is individually capable of causing the server to go into a back pressure state. There are two different levels of back pressure, as well as the condition where no over-utilization is occurring, so in total there are three resource utilization conditions that your Edge or Hub Transport servers can be in:

  • Normal – all is well and the server is performing its role as intended (assuming you haven’t modified the back pressure settings to mask a genuine problem – more on that later)
  • Medium – a resource is moderately over-utilized and the server begins limiting some connection types. Typically internal email flow remains functional while email from external or non-Exchange sources will be rejected.
  • High – a resource is severely over-utilized. The server ceases to accept any new connections.
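
As a rough mental model of those three conditions, the sketch below maps a utilization percentage onto a level. The function name and the 97/99 default thresholds are hypothetical values chosen for illustration, and the real resource monitor is more involved than a pair of comparisons, so treat this as a simplification rather than Exchange's implementation.

```powershell
# Simplified mental model of the three back pressure levels.
# Get-PressureLevel and its default thresholds are hypothetical; Exchange's
# own resource monitor is more involved than this.
function Get-PressureLevel {
    param(
        [double]$UtilizationPercent,
        [double]$MediumThreshold = 97,
        [double]$HighThreshold   = 99
    )

    if ($UtilizationPercent -ge $HighThreshold)   { return 'High' }    # no new connections accepted
    if ($UtilizationPercent -ge $MediumThreshold) { return 'Medium' }  # some connection types limited
    return 'Normal'
}

Get-PressureLevel -UtilizationPercent 98
```

Checking the high threshold first keeps the function to two comparisons and makes it obvious that a resource can only ever be in one condition at a time.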

For disk space metrics the back pressure condition causes messages to be rejected. However for memory utilization metrics, before rejecting connections the server will first take actions to attempt to relieve the conditions.

For example, the server will perform garbage collection (reclaiming memory from unused objects) or flush the server’s DNS cache.

If after a certain number of polling intervals (which vary depending on the metric involved) the utilization is still above threshold, then the server will begin rejecting new connections as it does with disk space utilization.

Ultimately the problems you will actually notice are delays or a total lack of message delivery.

Customizing Back Pressure Thresholds

The metrics used by Transport server resource monitoring to trigger back pressure are based on a combination of configurable settings as well as fixed algorithms.

The configurable settings are stored in the EdgeTransport.exe.config file, located in the following directories by default:

  • Exchange 2007: C:\Program Files\Microsoft\Exchange Server\Bin
  • Exchange 2010: C:\Program Files\Microsoft\Exchange Server\V14\Bin
  • Exchange 2013: C:\Program Files\Microsoft\Exchange Server\V15\Bin

If you installed Exchange Server to a different location then you will find the config file in the bin folder of your installation directory. In Exchange 2010 and 2013 this is referenced by the environment variable $env:exchangeinstallpath.

[Screenshot: EdgeTransport.exe.config]

The first two values control whether resource monitoring is enabled, and at what interval resource monitoring is performed.
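
To inspect those values without opening the file in an editor, something like the following sketch could be used. Get-TransportAppSetting is a hypothetical helper, and the two key names in the commented example (EnableResourceMonitoring and ResourceMonitoringInterval) are assumptions that should be checked against the file on your own server.

```powershell
# Reads selected appSettings values from an EdgeTransport.exe.config-style file.
# Get-TransportAppSetting is a hypothetical helper, not an Exchange cmdlet.
function Get-TransportAppSetting {
    param(
        [Parameter(Mandatory = $true)][string]$ConfigPath,
        [Parameter(Mandatory = $true)][string[]]$Key
    )

    # The config file is plain XML: <configuration><appSettings><add key=... value=... />
    [xml]$config = Get-Content -Path $ConfigPath
    foreach ($node in $config.configuration.appSettings.add) {
        if ($Key -contains $node.key) {
            New-Object PSObject -Property @{ Key = $node.key; Value = $node.value }
        }
    }
}

# Example on an Exchange 2010 or 2013 server (key names are assumptions):
# $path = Join-Path $env:exchangeinstallpath 'Bin\EdgeTransport.exe.config'
# Get-TransportAppSetting -ConfigPath $path -Key 'EnableResourceMonitoring','ResourceMonitoringInterval'
```

The sketch only reads the file. Nothing here changes a setting, which is in keeping with the advice not to tune these values.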

Though the option exists it is not recommended to actually disable resource monitoring. Instead the recommended approach is to identify the cause of resource over-utilization and correct it.

The same goes for basically all of the configurable settings for resource monitoring. Rather than spend time tuning the settings you should resolve the underlying issue, for example by adding disk or memory capacity to the server, or by adding additional servers to assist with overall email traffic load.

If you do want to dive into the specific metrics and thresholds, Microsoft's back pressure documentation for Exchange 2007 and Exchange 2010 covers them in detail.

Note: there is no Exchange 2013 documentation for this yet but my understanding at this stage is that it is largely the same as Exchange 2010 when it comes to resource monitoring.

Detecting Back Pressure

Back pressure can go undetected in your Exchange environment for quite some time if you are not monitoring for it. A Transport server can slip in and out of back pressure hundreds of times in a day and you could be completely unaware that it is happening if other servers manage to handle the email traffic load well enough that your customers don’t complain about delayed message delivery.

If you already have an Exchange-aware or event-based monitoring system running in your environment then you are probably already monitoring for the signals I am about to describe, or can do so relatively easily with the monitoring system you have. For everyone else I will make a few suggestions for ways to detect these conditions.

Note that not all of these are practical to use for real time, proactive monitoring and alerting. But used as part of your regular server capacity monitoring they can be used to detect emerging capacity problems before they begin to seriously impact your environment.

Monitoring Transport Queues

Back pressure on a Transport server in one site can cause email to queue on Transport servers in other sites. Therefore by monitoring queue lengths you can detect the signs of back pressure.

You can use the Get-Queue cmdlet to manually check Transport queues on the local Transport server or on a remote server.

[PS] C:\>Get-Queue

Identity                 DeliveryType          Status MessageCount NextHopDomain
--------                 ------------          ------ ------------ -------------
HO-EX2010-MB1\866        SmtpRelayWithinAdSite Retry  3            hub version 8
HO-EX2010-MB1\Submission Undefined             Ready  0            Submission
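
When there are many queues to look through, the output can be narrowed to the ones that look backed up. The filter below is a minimal sketch: Select-BackedUpQueue and its default threshold of 50 messages are hypothetical, and it relies only on the Status and MessageCount properties shown in the output above.

```powershell
# Filters queue objects down to those that look backed up.
# Select-BackedUpQueue and the default threshold are hypothetical; choose a
# threshold that suits your normal traffic levels.
function Select-BackedUpQueue {
    param(
        [Parameter(ValueFromPipeline = $true)]$Queue,
        [int]$MinimumMessages = 50
    )
    process {
        # Flag long queues, plus any non-empty queue that is stuck in Retry
        $inRetry = ("$($Queue.Status)" -eq 'Retry') -and ($Queue.MessageCount -gt 0)
        if (($Queue.MessageCount -ge $MinimumMessages) -or $inRetry) { $Queue }
    }
}

# Example in the Exchange Management Shell:
# Get-Queue -Server HO-EX2010-MB1 | Select-BackedUpQueue -MinimumMessages 100
```

Against the sample output above, the Retry queue holding 3 messages would be returned and the empty Submission queue would not.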

One way to automate the checking of queues is with my Test-ExchangeServerHealth.ps1 script. However it is more suited to running as a scheduled report at a specific time of day, not as a continuous monitoring and alerting script (though you could customize it to be one).

Get-Queue doesn’t always tell you the full story though. This is the Get-Queue output on an Exchange 2007 server that is in back pressure.

[PS] C:\>Get-Queue

Identity                DeliveryType Status MessageCount NextHopDomain
--------                ------------ ------ ------------ -------------
HO-EX2007-MB1\Submis... Undefined    Ready  0            Submission

Nothing appears wrong based on that. However in the event log:

Log Name: Application
Source: MSExchangeMailSubmission
Date: 25/08/2012 10:35:25 PM
Event ID: 1009
Task Category: MSExchangeMailSubmission
Level: Warning
Keywords: Classic
User: N/A
Computer: HO-EX2007-MB1.exchangeserverpro.net
Description:
The Microsoft Exchange Mail Submission Service is currently unable to contact any Hub Transport servers in the local Active Directory site. The servers may be too busy to accept new connections at this time.

The above event log entry shows the Mailbox server’s mail submission to the local Hub Transport server in the site (which happens to be the same server in this particular case) being rejected due to back pressure on the Transport server.

Another option is to use Perfmon to monitor the aggregate delivery queue on each of your Transport servers in real time.

[Screenshot: Aggregate Delivery Queues on Transport Servers (Nothing to see here)]

Monitoring Event Logs

Even if you don’t have an event-based monitoring system you can still monitor event logs using PowerShell and the Get-EventLog cmdlet.

Back pressure conditions are logged to the Application event log under a series of event IDs:

  • Event ID 15004: Increase in the utilization level for any resource (e.g. from Normal to Medium)
  • Event ID 15005: Decrease in the utilization level for any resource (e.g. from High to Medium)
  • Event ID 15006: High utilization for disk space (i.e. critically low free disk space)
  • Event ID 15007: High utilization for memory (i.e. critically low available memory)

For example to search for instances of event ID 15004 in the past 24 hours you can run the following PowerShell command:

PS C:\> Get-EventLog -ComputerName ho-ex2007-mb1 -LogName Application -After (Get-Date).AddDays(-1) | where {$_.EventID -eq "15004"}

   Index Time          EntryType   Source                 InstanceID Message
   ----- ----          ---------   ------                 ---------- -------
   93560 Aug 25 22:24  Warning     MSExchangeTransport    2147760796 Resource pressure increased from Normal to High...
   93535 Aug 25 22:09  Warning     MSExchangeTransport    2147760796 Resource pressure increased from Normal to Medi...

A more detailed report can be generated with some scripting.
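
As one possible starting point for that kind of scripting, the sketch below counts the four back pressure event IDs per day. Get-BackPressureSummary is a hypothetical helper, not an existing cmdlet. It accepts the entries returned by Get-EventLog and relies only on their EventID and TimeGenerated properties.

```powershell
# Summarizes back pressure events per day and event ID.
# Get-BackPressureSummary is a hypothetical helper; pipe Get-EventLog output into it.
function Get-BackPressureSummary {
    param([Parameter(ValueFromPipeline = $true)]$Entry)
    begin { $counts = @{} }
    process {
        # Keep only the four back pressure event IDs
        if (@(15004, 15005, 15006, 15007) -contains $Entry.EventID) {
            $key = '{0:yyyy-MM-dd} {1}' -f $Entry.TimeGenerated, $Entry.EventID
            $counts[$key] = 1 + [int]$counts[$key]
        }
    }
    end {
        $counts.GetEnumerator() | Sort-Object Key | ForEach-Object {
            $day, $id = $_.Key -split ' '
            New-Object PSObject -Property @{ Date = $day; EventID = [int]$id; Count = $_.Value }
        }
    }
}

# Example covering the past week on one server:
# Get-EventLog -ComputerName ho-ex2007-mb1 -LogName Application -Source MSExchangeTransport -After (Get-Date).AddDays(-7) | Get-BackPressureSummary
```

A daily count of 15004 events makes it easier to see whether a server is drifting in and out of back pressure occasionally or many times a day.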

Monitoring Protocol Logs

When an Edge or Hub Transport server rejects a connection due to back pressure it uses a 4.x.x SMTP status code.

Using a similar Log Parser query to the one I shared in my previous article on reporting SMTP error codes, you can scan your protocol logs for evidence of back pressure events.

This command, when run from a Command Prompt (cmd.exe, not PowerShell) in the same directory as the protocol logs, will produce a report of any 4.x.x SMTP error codes.

"C:\Program Files (x86)\Log Parser 2.2\logparser.exe" "SELECT data as [Status Code],Count(*) as Hits FROM *.log WHERE data LIKE '4%' GROUP BY data ORDER BY Hits DESC" -i:CSV -nSkipLines:4 -rtp:-1

The results will appear similar to this (if any such errors are present).

Status Code                                                                Hits
-------------------------------------------------------------------------- -----
452 4.3.1 Insufficient system resources                                    19791
421 4.7.0 Too many errors on this connection, closing transmission channel 1298
421 4.4.1 Connection timed out                                             526
454 4.7.0 Temporary authentication failure                                 289
451 4.7.0 Timeout waiting for client input                                 32
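
If Log Parser is not available, a similar tally can be approximated in PowerShell. The sketch below is not a replacement for the query above: Get-SmtpTemporaryFailureCount is a hypothetical helper, the column names are an assumption based on the usual #Fields header of the SMTP protocol logs, and it will be much slower than Log Parser on millions of log lines.

```powershell
# Counts 4xx SMTP responses in protocol log files, grouped by response text.
# Get-SmtpTemporaryFailureCount is a hypothetical helper; the column names below
# are assumed from the usual "#Fields:" header and should be checked against your logs.
function Get-SmtpTemporaryFailureCount {
    param([Parameter(Mandatory = $true)][string[]]$Path)

    $header = 'date-time', 'connector-id', 'session-id', 'sequence-number',
              'local-endpoint', 'remote-endpoint', 'event', 'data', 'context'

    Get-Content -Path $Path |
        Where-Object { $_ -and $_ -notmatch '^#' } |     # skip blank lines and '#' header lines
        ConvertFrom-Csv -Header $header |
        Where-Object { $_.data -match '^4\d\d ' } |      # keep 4xx SMTP replies only
        Group-Object -Property data |
        Sort-Object -Property Count -Descending |
        Select-Object -Property Count, Name
}

# Example, run from the directory that holds the protocol logs:
# Get-SmtpTemporaryFailureCount -Path .\*.log
```

Unlike the Log Parser command, this can be run directly from a PowerShell session.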

You may be more interested in the per-day stats, in which case this query can be used:

"C:\Program Files (x86)\Log Parser 2.2\logparser.exe" "SELECT TO_LOCALTIME(TO_TIMESTAMP(EXTRACT_PREFIX(TO_STRING([#Fields: date-time]),0,'T'), 'yyyy-MM-dd')) AS Date, COUNT(*) AS Hits from *.log where (data LIKE '4%') GROUP BY Date ORDER BY Date ASC" -i:CSV -nSkipLines:4 -rtp:-1

The results will look similar to this:

Date       Hits
---------- -----
2012-08-08 12
2012-08-09 93
2012-08-10 72
2012-08-11 7
2012-08-12 48
2012-08-13 146
2012-08-14 460
2012-08-15 36
2012-08-16 389
2012-08-17 10345
2012-08-18 105
2012-08-19 252
2012-08-20 9631
2012-08-21 43
2012-08-22 133
2012-08-23 80
2012-08-24 57
2012-08-25 21
2012-08-26 6

Statistics:
-----------
Elements processed: 13181205
Elements output:    19
Execution time:     123.13 seconds (00:02:3.13)

This highlights the importance of taking samples of your server stats on a regular basis. While the above example makes it pretty easy to spot the significant increase in errors on two particular days, if your logging data did not cover a long enough time span then you may need to rely on previous benchmarks to spot problems.
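
Spotting those days can also be done mechanically. The sketch below flags any day whose hit count is far above the median for the period. Find-ErrorSpike and its default factor of 10 are hypothetical choices rather than an established rule, and the sample data is a subset of the per-day results above.

```powershell
# Flags days whose hit count is far above the median for the period.
# Find-ErrorSpike and the default factor of 10 are hypothetical choices.
function Find-ErrorSpike {
    param(
        [Parameter(Mandatory = $true)][hashtable]$HitsByDate,
        [double]$Factor = 10
    )

    $sorted = @($HitsByDate.Values | Sort-Object)
    $median = $sorted[[int][math]::Floor(($sorted.Count - 1) / 2.0)]   # lower median

    $HitsByDate.GetEnumerator() |
        Where-Object { $_.Value -gt ($median * $Factor) } |
        Sort-Object Key |
        ForEach-Object { New-Object PSObject -Property @{ Date = $_.Key; Hits = $_.Value } }
}

$sample = @{
    '2012-08-15' = 36;  '2012-08-16' = 389; '2012-08-17' = 10345
    '2012-08-18' = 105; '2012-08-19' = 252; '2012-08-20' = 9631; '2012-08-21' = 43
}
Find-ErrorSpike -HitsByDate $sample
```

With this sample the median is 252, so only 2012-08-17 and 2012-08-20 are flagged. A median is used instead of an average so that the spike days themselves do not drag the baseline upwards.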

It also highlights the importance of having diagnostic logging such as protocol logging turned on in advance of a problem occurring, so that you can begin to troubleshoot with all data immediately available to you.

Summary

Back pressure can cause serious problems in an Exchange Server environment due to the interruptions it causes to message delivery. Be sure to check your Transport servers for signs of back pressure, and take steps to resolve the underlying issues.

About the Author

Paul Cunningham

Paul is a former Microsoft MVP for Office Apps and Services. He works as a consultant, writer, and trainer specializing in Office 365 and Exchange Server. Paul no longer writes for Practical365.com.

Comments

  1. Don Howson

    We experienced back pressure for the first time a couple of days ago. We have maybe 25 users on a good day, which hasn’t changed in years. 1TB drive, 80% free. E-mail database is 120GB. Last year, on a lark, I increased memory from 8GB to 24GB. With 8GB, memory was 90% used, with 24GB, within a few days it was 85% used. Processor usage is rarely over a few percent with the odd spike to 20% or so, if I sit a watch long enough. I’m being told that suddenly our server doesn’t have enough resources. Does this make any sense?

  2. Darya

    Error 🙁
    C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue>”C:\Pro
    gram Files (x86)\Log Parser 2.2\logparser.exe” “SELECT data as [Status Code],Cou
    nt(*) as Hits FROM *.log WHERE data LIKE ‘5%’ GROUP BY data ORDER BY Hits DESC”
    -i:CSV -nSkipLines:4 -rtp:-1
    Error: SELECT clause: Syntax Error: unknown field ‘data’

    To see valid fields for the CSV input format type:
    LogParser -h -i:CSV

  3. Shafeeque

    hi,

    TRANSPORT SERVICE FOR EXCHANGE 2016 SUDDENLY STOPPED . THERE IS NOTHING IN THE EVNT LOG OTHER ONE 9041 LOG. IS THIS BECUASE OF THE BACKPRESSURE ?

  4. Mark Levy

    Thanks for the excellent article. I just discovered “back pressure” the hard way, and I realize that you’re not supposed to change the settings, but are they for all server roles? The reason I ask is we’ve got an Exchange 2016 premises server being used as a relay server for Office 365, and there are no user mailboxes.

    Would the thresholds be different for a server that’s only being used for relay (and management?)

    Thanks!

    Mark

  5. Muthu

    HI,

    Good One it is the same in 2013 also ?

    Regards
    Muthu

  6. Olajide Akinwande

    Paul, this didn’t work for me. it gave error saying ” unexpected token”

  7. Juan

    Does Exchange 2013 still use Log ID 15004,15006 for bakpressure?

  8. AlexIz

    Hi, Paul!

    We regularly see evenID 15004 on each of our 5 MBX (Exchange 2013 CU7). Usually the lasts for 10-30 seconds only but twice we’ve got long lasting backpressure (only 15004). We even opened a case at Microsoft PSS, they gathered and checked all possible logs but could not find the root cause of backpressure (15004). The only thing the said is that harrdware resources are ok (disks latencies, cpu load, available memory and so on). they said that backpressure caused by version bucket (15004) is usually caused by disk issues… So the verdict was tune thresholds because our send connectors are set to 50 MB (non-default) limit and backpressure thresolds were left as default.
    But I’m not sure that this is a good idea tuning up these thresholds (and by the way there is no document published by Microsoft that explains how to calculate new thresholds for Exchange 2013).

    So the question is the following – what can be done to troubleshoot such an unusual case? May be we shoul turn some verbose logging?

    1. Adam Neal

      I have exactly the same issue as Alexlz – albeit on Exchange 2007 – where the backpressure event IDs are 15004 and 15005.

      I found the cause to be PF replication messages (we have several thousand folders). I’ve modified the DatabaseMaxCahceSize in EdgeTransport.exe.config, which helped, but the issues continue periodically. I’m going to try tuning some of the other settings soon, if I can’t get these events under control by clearing down the folder replicas.

      Might be worth a look?

  9. Tomek

    For all seeking remedy for emails stuck in submission queue on Exchange 2007 – not all emails, just few selected ones, mainly with attachments (unfortunately these are most important). Check event logs for ID 1050. If you see “The execution time of agent ‘Transport Rule Agent’ exceeded 300000 (milliseconds) while handling event ‘OnRoutedMessage’….” – disable all Transport Rules you can under Organization Config. – Hub Transport. I was trigger-happy at fighting spam this way, but my server couldn’t take the load of rules and exceptions. Figuring it out made me sick of stress. Once I disabled 90% of my rules, mail flow returned. Next night I’m finally going to sleep well.

  10. james

    Hi Paul,

    I believe we are experiencing some back pressure issues in exchange 2010 mainly due to disk space. Although we don’t see any Event ID 15004, 15005, 15006 or 15007.

    We do receive the following Event ID 1009 The Microsoft Exchange Mail Submission Service is currently unable to contact any Hub Transport servers in the local Active Directory site. We have a single Hub transport server which is hosted on the same box.

    When I run the Get-EventLogLevel command I see MSExchangeTransportResourceManager is set to Lowest. Should I increase this log level in order to receive the events, if so to what level, High? Are the any other diagnostic levels i should increase in order to help identify whats going wrong?

    Thanks

  11. Joel

    I’m having an odd one I can put my finger on. Getting disk space event logs about back pressure, but have plenty of free space on all drives. So it looks like I need to tweak the thresholds. I have 63.1/100 free on C, but Exchange and logs are on E where I have 290/400 free space so it should be well below the warning levels. Yet I constantly see this popping up:

    15004: 3/12/2014 11:00:47 PM  MSExchangeTransport-The resource pressure increased from Normal to High.

    The following resources are under pressure:
    Queue database logging disk space (“E:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Queue”) = 99% [High] [Normal=95% Medium=97% High=99%]
    Temporary Storage disk space (“E:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\data\Temp”) = 99% [High] [Normal=95% Medium=97% High=99%]

  12. TeeC

    Reasonable guide, except you don’t provide any specific advice on tuning the back pressure options to resolve problems.

    1. Avatar photo

      That’s because you should not try to tune the back pressure settings.

      From the article:

      “Instead the recommended approach is to identify the cause of resource over-utilization and correct it.
      The same goes for basically all of the configurable settings for resource monitoring. Rather than spend time tuning the settings you should resolve the underlying issue, for example by adding disk or memory capacity to the server, or by adding additional servers to assist with overall email traffic load.”

  13. pravin

    Hi Paul,

    I am facing problem on Exchange server 2013 that Edge transport service using high memory and cause slowness. please suggest any solution on that.

  14. Navishkar Sadheo

    Hi Paul

    found the problem, it was network issues at the destination site.

    thank you for response and your article.

    it gave me some good insight into back pressure

  15. Navishkar Sadheo

    odd thing is when we reboot the source server the emails start flowing fine for a while and then we start getting the 421 errors in the queue viewer after a while which leads me to believe that the problem is with the source server and not the destination one.

    any other ideas?

  16. Navishkar Sadheo

    Hi Paul

    I am getting this error here:

    The Microsoft Exchange Mail Submission Service is currently unable to contact any Hub Transport servers in the local Active Directory site. The servers may be too busy to accept new connections at this time.

    But no other events relating to back pressure in the event log .

    I also ran the script you uploaded that checks for back pressure activity. Came up clean.

    What else do you think it could be.

    I am having trouble transferring email from one ht to another ht in a different ad site

    I am getting 421 SMTP errors

  17. Saran

    Hi, I’ve had this since day one, usually it’s the private bytes that overflow the ram. Haven’t been able to pinpoint what is filling up the RAM. Setup a script to restart the Service when the event is triggered

  18. TM

    When I run the monitor protocol logs I get the following error.

    I am running the command in my “TransportRoles\Logs\ProtocolLog\SmtpReceive” folder.

    Unexpected token ‘SELECT data as [Status Code],Count(*) as Hits FROM *.log WHERE data LIKE ‘4%’ GROUP BY data ORDER BY
    Hits DESC’ in expression or statement.
    At line:1 char:167
    + “C:\Program Files (x86)\Log Parser 2.2\logparser.exe” “SELECT data as [Status Code],Count(*) as Hits FROM *.log WHERE
    data LIKE ‘4%’ GROUP BY data ORDER BY Hits DESC” <<<< -i:CSV -nSkipLines:4 -rtp:-1
    + CategoryInfo : ParserError: (SELECT data as …ER BY Hits DESC:String) [], ParentContainsErrorRecordExc
    eption
    + FullyQualifiedErrorId : UnexpectedToken

    1. DH

      Thanks for the script. However, I received the same error message

    2. jay c

      Make sure you’re running this in CMD and not in PowerShell.

  19. Ifiok

    Great post Paul, Thanks.
