Crash issues & Performance Tweaking

So I’m noticing that since 3.5, and now in 3.6, I’m crashing to the desktop about every 30 - 35 minutes. I can get a few short-range cargo or mining runs in, but the trouble comes on delivery runs from PO to GH to Reclamation. I pick up 2 new delivery jobs at Reclamation, and usually as I board my ship I get a crash to the desktop.

Secondly I just built a new system to replace the 8 year old horse I had that finally dropped dead from a power spike. Having built a number of systems in the past I’ve always had solid success with tweaks. This time around however I’m running into a few brick walls. So to start with here are the system specs.

System Information

  Time of this report: 7/29/2019, 23:24:14

     Operating System: Windows 10 Pro 64-bit (10.0, Build 18362) (18362.19h1_release.190318-1202)
             Language: English (Regional Setting: English)
          Motherboard: Asus RoG Strix X299-E
                 BIOS: 1704 (type: UEFI)
            Processor: Intel(R) Core(TM) i9-9980XE CPU @ 3.00GHz (36 CPUs), ~3.0GHz
         Memory Brand: Corsair Vengeance LPX 4000MHz (combined total 128GB)
               Memory: 131072MB RAM
  Available OS Memory: 130742MB RAM
            Page File: 10566MB used, 139630MB available
          Windows Dir: C:\WINDOWS
      DirectX Version: DirectX 12
  DX Setup Parameters: Not found
     User DPI Setting: 144 DPI (150 percent)
   System DPI Setting: 144 DPI (150 percent)
      DWM DPI Scaling: UnKnown
             Miracast: Available, with HDCP

Microsoft Graphics Hybrid: Not Supported
DirectX Database Version: Unknown
DxDiag Version: 10.00.18362.0267 64bit Unicode

Display Devices

       Card: 2x NVIDIA TITAN RTX running in SLI mode
       Chip type: TITAN RTX
        DAC type: Integrated RAMDAC
    Current Mode: 3840 x 2160 (32 bit) (60Hz)
    Average FPS:  30 - 47 

O/S drive is a Samsung V-Nand SSD 970 Evo Plus NVMe M.2
All other drives are Samsung Evo 860 SSD on Sata 6.0
CPU cooling is a Corsair H115i Pro, though I am considering pitching it for a Thermaltake cooling jacket system if a performance boost can be attained.

Internet Connection

Cox Cable: Download: 240 - 250 Mbps Average
Upload: 30 - 35 Mbps Average

I can’t get the memory to run at its rated 4000MHz. I tweak past 3600 and the system locks up. I’ve triple verified that the memory is compatible with the board per the manufacturer’s listing. Nor can I get the CPU to tweak past about 3.4 - 3.5 GHz when it should be able to clock out at roughly 4.4 GHz. So firstly, what am I missing on the tweaks to get the system to push over to higher limits?

Secondly when the crashes occur I find a pair of errors that are concurrently listed in the Event viewer. The first being the following:

Error PerfLib Event ID 1023
Windows cannot load the extensible counter DLL “C:\WINDOWS\system32\sysmain.dll” (Win32 error code The specified module could not be found.).

The second is as follows:

Error Perfnet Event ID 2004
Unable to open the Server service performance object. The first four bytes (DWORD) of the Data section contains the status code.

Whether these two errors have any direct correlation to the crash issues I have no idea. I’m so rusty from days working for MS that my MCSE and MS troubleshooting skills amount to needing a tetanus shot with a horse needle.

Any advice and/or suggestions are greatly appreciated.

What are you using for cooling?

I’ve found that newer hardware is much more sensitive to heat. The new Intel chips have advertised TDP that is very misleading. Not to mention two Titans. This is going to be the biggest change from a system that’s 8 years old.

I have had lots of issues with my system, went so far as to replace the PSU when in fact adding more cooling was the key. And my system isn’t going to generate as much heat as the specs here.

Currently for cooling I’m running the following:

Temps are ranging between 125 F and 135 F.
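
Since the later readings in this thread are in Celsius, here is a quick conversion sketch for comparison. It is plain arithmetic and assumes nothing about the monitoring tool that produced the numbers; the function name is just illustrative.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


# The two ends of the reported range.
for deg_f in (125, 135):
    print(f"{deg_f} F = {fahrenheit_to_celsius(deg_f):.1f} C")
```

That puts the reported range at roughly 52 to 57 C.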

So it’s water cooled, good. Is that temp at full load? What is it at idle? What’s the temp of the GPU?

One thing to try, remove one of the titans - SLI is not what it used to be.

Your CPU normally runs at 3 GHz. Boost is 3.8 GHz all-core, 4.4 GHz with Turbo Boost 2.0, and 4.5 GHz with Turbo Boost Max 3.0. For a nice discussion with an Intel Fellow on what exactly boost is, see this 5 day old article at AnandTech.

If you really want to stress test your machine and see if your CPU and memory are to blame, I like running Prime95’s torture test (it helps show whether it’s CPU/memory or GPU).

I have found bad memory using memory tests and RMAed it. On my current system all the memory tests were fine and the CPU torture tests were all fine, but it still locks up / reboots / crashes sometimes. Adding extra fans and turning up their speed makes it do that far less often. It can be very tricky to debug a computer that crashes.

You can easily start running into red herrings from the event log. I’ve been building PCs / debugging them for 25 years. This latest batch can be even more tricky.

Okay, I’ve pulled the card and tested stability, with no apparent change. As for the errors from the event manager, I never give them much credence unless I can find direct correlations with behavior, but given how rusty I am at OS work I figured throwing the info out there was a shot in the dark.
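
One rough way to check for that kind of direct correlation is to compare event timestamps against crash timestamps. The sketch below is only an illustration: the function names, the 2-minute window, and the sample timestamps are all made up, and the real times would have to be copied out of Event Viewer and the crash reports by hand.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=2)  # assumed window, adjust to taste


def events_before_crashes(crash_times, event_times, window=WINDOW):
    """Map each crash time to the events logged within `window` before it."""
    matches = {}
    for crash in crash_times:
        matches[crash] = [
            event for event in event_times
            if timedelta(0) <= crash - event <= window
        ]
    return matches


def share_of_events_near_a_crash(crash_times, event_times, window=WINDOW):
    """Fraction of events that occur within `window` before any crash."""
    if not event_times:
        return 0.0
    near = sum(
        1 for event in event_times
        if any(timedelta(0) <= crash - event <= window for crash in crash_times)
    )
    return near / len(event_times)


# Hypothetical timestamps, for illustration only.
crashes = [datetime(2019, 7, 29, 21, 5), datetime(2019, 7, 29, 21, 40)]
events = [
    datetime(2019, 7, 29, 20, 0),
    datetime(2019, 7, 29, 21, 4),
    datetime(2019, 7, 29, 22, 15),
]
print(events_before_crashes(crashes, events))
print(share_of_events_near_a_crash(crashes, events))
```

If most of the logged errors show up with no crash nearby, or most crashes have no error just before them, the errors are more likely a red herring than a cause.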

I’ll be setting up Prime95 to run in a bit here to see what happens.

As an off note, the only program I get crashes from is SC. Anything else I’m running doesn’t hiccup in the slightest, for whatever relevance that might have.

Any other highly taxing games? SC is pushing the envelope; not a lot of other games will stress hardware like that. It will also scale to multiple cores, while many other games are single threaded. So when you run Prime95, make sure you let it tax all cores so the load is like SC.

Lastly when running prime95 watch the heat of the CPU and if possible the case / gpu.

The only programs that might meet that requirement are Adobe Premiere, when I’m doing video editing with heavy graphics & audio overlays, and my control program that is networked to a trio of CNC mills and 1 CNC lathe. None of them has presented any sort of crash or lock up issues to date.

I probably just jinxed myself for saying that. :rofl:

The test is currently in progress; I’ll let it run until midnight my time, so roughly 6+ hours. Heat at this point hasn’t kicked the Corsair cooler over to high gear either. As a side note, only on rare occasion have I heard it kick in when in SC, and even then it’s only for periods of 30 seconds at most.

Would running the resolution at 4k have any relevant effect potentially?

4k tends to be GPU limited vs CPU limited at lower resolutions.

You could try turning SC down to 1080p (an extreme to remove most of the heavy lifting of the GPU) just to see if it would affect how long it runs.
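
To put a number on how much lighter 1080p is, here is a small sketch comparing pixel counts per frame. Pixel count is only a rough proxy for GPU load (the assumption here is that shading work scales roughly with the number of pixels), so treat the ratio as a ballpark.

```python
def pixel_count(width: int, height: int) -> int:
    """Number of pixels rendered per frame at a given resolution."""
    return width * height


uhd = pixel_count(3840, 2160)  # current mode from the DxDiag report
fhd = pixel_count(1920, 1080)  # suggested test resolution

print(f"4K:    {uhd} pixels")
print(f"1080p: {fhd} pixels")
print(f"4K renders {uhd / fhd:.0f}x as many pixels per frame")
```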

Premiere is theoretically written to be much more stable than any game since it’s a much bigger deal if it crashes mid task. I only have problems with games, no other applications on my end.

@Alxa Well, after a full 6 hour run I got no errors or hardware notifications. Heat never got above 71 degrees Celsius. I haven’t pulled the second Titan card but have disabled SLI. FPS is hanging around 29-30. Currently running a timed test to see how long it takes for a dump to desktop.

That’s good - CPU sounds stable as I’d hope for a CPU of that caliber / cost.

What about heat at the GPU when stressing it - will need something other than Prime95 for that.
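
For watching GPU heat while something else applies the load, one option is to poll `nvidia-smi`, which ships with the NVIDIA driver. This is a minimal sketch, assuming `nvidia-smi` is on the PATH; the function names, the 5 second interval, and the sample count are arbitrary choices, and it only records temperatures, it does not stress the GPU itself.

```python
import subprocess
import time

# Query index, temperature (C) and utilization (%) for every GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV rows into (index, temp_c, util_percent) tuples."""
    readings = []
    for line in text.strip().splitlines():
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            index, temp_c, util = (int(part) for part in parts)
        except ValueError:
            continue  # skip rows with non-numeric fields such as "[N/A]"
        readings.append((index, temp_c, util))
    return readings


def poll_gpus(interval_s=5.0, samples=12):
    """Print readings every `interval_s` seconds and return peak temp per GPU."""
    peak = {}
    for _ in range(samples):
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        for index, temp_c, util in parse_gpu_csv(result.stdout):
            peak[index] = max(peak.get(index, temp_c), temp_c)
            print(f"GPU {index}: {temp_c} C at {util}% load")
        time.sleep(interval_s)
    return peak
```

Calling `poll_gpus()` in a second window while the game or a GPU benchmark runs gives the peak temperature for each card, which is handy with two cards since the one with less airflow usually runs hotter.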

GPU temp never rose above 77C

Well, I’ve shut down SLI and dropped from Ultra to High graphics. So far I’ve avoided a dump to desktop via crash error report and have had only 2 disconnects. Longest run thus far is just over 84 minutes. If I can get up to three hours, I’ll see about upping the graphics back to Ultra and see if it holds steady.

SLI could be the issue, it’s not been a good idea for a while.

The game itself is not 100% stable so even a perfect system will still disconnect / crash occasionally. But hard lockups should be rare. That’s why it’s good they have the crash data collector.

Not sure if you fixed your issue or not, but be careful with that cooler. I have the same one on the spare parts shelf. The water pump took a dump and the system would auto shut down after a little usage. (It died after 3 months of use.)

@Alxa At this point things seem to be holding. Thankfully I haven’t had any hard lock ups on the system. Once I get enough accumulated hours to feel the stability is relative, I’ll bump the Ultra graphics back on. As for the SLI, who knows at this point anymore.

Most of my 3D machining work during real time cut & display holds solid, and it’s a serious system hog. The only other program I have that has any use for the SLI is X4: Foundations, and I haven’t had any issues. I suppose to some degree we just have to hope that CIG decides to do some code work to kick in multi-core & multi-GPU usage and make it stable for the high performing systems out there.

@jaranamoe I have read of a few people having issues with the H115i cooler, which is why I’m intending to get a new case and go with a full cooling system. I just need to settle on a case style, and I’ve already been digging through Thermaltake’s site to see what might fit the bill. Case wise, my 18 yr old son is diligently trying to get me to buy a DeepCool Quadstellar since it goes with the whole SC involvement. I suspect it’s really his sneaky way of going “hey dad, if you got one can I have one for Christmas?” :rofl::rofl:

