Ryzen 7000 Testing

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
An intermediate observation:
  • After disabling on-board WiFi and BT, I was able to boot Ventura
  • Currently trying to create a new snapshot with one of three different build of IOPCIFamily:
    • First build (built with xcodebuild on command line)
      • kmutil succeeds in adding this to kernel cache
    • Second build -- standard Ventura, but built with Xcode App
      • kmutil fails due to a symbol problem
    • Third build -- IOPCIMessageInterruptController replaced with that from Big Sur -- built with Xcode App
      • kmutil fails due to a symbol problem
  • I'll boot back into FrankenMon and redo the 3rd experiment with xcodebuild on command line
The symbol problem is as follows:
Code:
    One or more binaries has an error which prevented linking.  See other errors.
    Could not use 'com.apple.iokit.IOPCIFamily' because: Failed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xa in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0x520aa in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0x7713a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xa317a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xa38aa in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xef37a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xef39a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xf345a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xf421a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xffa6a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0x11aafa in __TEXT, __text) as could not find a kext which exports this symbol
    com.apple.iokit.IOPCIFamily:
        Failed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xa in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0x520aa in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0x7713a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xa317a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xa38aa in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xef37a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xef39a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xf345a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xf421a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0xffa6a in __TEXT, __text) as could not find a kext which exports this symbolFailed to bind '___asan_version_mismatch_check_apple_clang_1400' in 'com.apple.iokit.IOPCIFamily' (at offset 0x11aafa in __TEXT, __text) as could not find a kext which exports this symbol
Error Domain=KMErrorDomain Code=31 "Error occurred while building a collection:
    1: One or more binaries has an error which prevented linking.  See other errors.
 

alyxferrari

Guru
Guru
Joined
Jul 19, 2021
Messages
25
I had a nearly identical issue trying to build KCs a week or so ago, except it cited a different kext. You're not alone :p
I never figured out how to fix it, I'd been at it for so long at that point that I just called it a night.
 

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #7:
  • On a different Hack (how many do I have? don't ask ;)) running Monterey, I decided to install the previous version of Xcode that ships with Monterey 12.3 SDK.
  • XNU builds fully (needs a couple minor code tweaks)
  • IOPCIFamily also builds fully (also needs a handful of minor code tweaks)
  • All components of IOPCIFamily can be built through Xcode:
    • IOPCIFamily.kext
    • ApplePIODMA.kext
    • PCIDriverKit.framework
  • Perhaps we need to install all of these components together, not just IOPCIFamily.kext alone...
  • Will test soon on Monterey because it's less constrained than Ventura
Screen Shot 2022-11-06 at 5.49.30 AM.png
 
Last edited:

ExtremeXT

Donator
Donator
Joined
Aug 7, 2022
Messages
843
Interim Update #7:
  • On a different Hack (how many do I have? don't ask ;)) running Monterey, I decided to install the previous version of Xcode that ships with Monterey 12.3 SDK.
  • XNU builds fully (needs a couple minor code tweaks)
  • IOPCIFamily also builds fully (also needs a handful of minor code tweaks)
  • All components of IOPCIFamily can be built through Xcode:
    • IOPCIFamily.kext
    • ApplePIODMA.kext
    • PCIDriverKit.framework
  • Perhaps we need to install all of these components together, not just IOPCIFamily.kext alone...
  • Will test soon on Monterey because it's less constrained than Ventura
View attachment 8119
I don't think the framework or PIODMA have to be replaced since we're not modifying the code, not sure though.
 

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
There were no kmutil (kernel cache) issues when integrating the new IOPCIFamily.kext into Monterey and creating a new snapshot. After blessing the new snapshot (so it boots on next startup), the system booted up fully, but screens are blank. This does not matter because everything we're interested in happens before screens initialize.

Incidentally, the new blessed snapshot did not boot in Ventura. It just produced the dreaded [EXITBS]. Hence I'll focus on Monterey today.

IMG_7687.jpg

Going to replace IOPCIMessagedInterruptController now with Big Sur version.
 
Last edited:

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #8:
  • Good thing we switched to Monterey...
  • The system boots properly with screen output -- I changed my previous post accordingly (no blank screen problem)
  • The new IOPCIFamily with IOPCIMessageInterruptController from Big Sur compiled and linked
  • The new IOPCIFamily subsequently integrated into the Monterey kernel cache with kmutil (no errors)
  • A new system snapshot was created, and it booted up fully
  • But now the however comes in... ;)
  • However, PCI devices still do not work. We can see in screenshot below that:
    • System Snapshot has been booted
    • /System/Library/Extensions folder clearly shows our new IOPCIFamily in effect
    • IORegistryExplorer shows no interrupts attached to i225-V
  • At least we know that our debugging methodology (i.e., the ability to modify and install IOPCIFamily) works
  • This also means we have plenty of options for further instrumentation / experimentation
  • I'll upload this to my GitHub repo momentarily -- done
Screen Shot 2022-11-06 at 6.57.26 AM.png
 
Last edited:

ExtremeXT

Donator
Donator
Joined
Aug 7, 2022
Messages
843
Interim Update #8:
  • Good thing we switched to Monterey...
  • The system boots properly with screen output -- I changed my previous post accordingly (no blank screen problem)
  • The new IOPCIFamily with IOPCIMessageInterruptController from Big Sur compiled and linked
  • The new IOPCIFamily subsequently integrated into the Monterey kernel cache with kmutil (no errors)
  • A new system snapshot was created, and it booted up fully
  • But now the however comes in... ;)
  • However, PCI devices still do not work. We can see in screenshot below that:
    • System Snapshot has been booted
    • /System/Library/Extensions folder clearly shows our new IOPCIFamily in effect
    • IORegistryExplorer shows no interrupts attached to i225-V
  • At least we know that our debugging methodology (i.e., the ability to modify and install IOPCIFamily) works
  • This also means we have plenty of options for further instrumentation / experimentation
  • I'll upload this to my GitHub repo momentarily -- done
View attachment 8121
Can you try with my PR?
 

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Can you try with my PR?
Just merged the pull request and installed the new IOPCIFamily. Alas, the result is same as before:

Screen Shot 2022-11-06 at 8.04.28 AM.png

My suspicion is that PCI devices other than GPU and USB are being marked InActive earlier on in the startup sequence. Their configuration registers are therefore not read because every configRead() function first checks for inactive state.
 
Last edited:

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #9:
  • I've added trace logs that report the state of devices throughout the initialization process.
  • Here's a partial look at the trace logs for I225 (device 0x15f3):
Code:
[    0.467413]: bridgeAllocateResources(bridge [i22]8:0:0(9:9), state 0xb7c, type 0x7)
[    0.467415]: Casey bridgeAllocateResources child 1 [i28]9:0:0(0x8086:0x15f3) state 0x808
[    0.467418]: Casey bridgeAllocateResources child 1 [i28]9:0:0(0x8086:0x15f3) state 0x808
[    0.467419]:   MEM: 0xfc200000:0x0,0x100000-0x100000,0x0:0x100000 (at [i28]9:0:0(0x8086:0x15f3)) arsmbv
[    0.467422]:   MEM: 0xfc200000:0x0,0x100000-0x100000,0x0:0x100000 (at [i28]9:0:0(0x8086:0x15f3)) arsmbv
[    0.467424]:   MEM: 0xfc300000:0x0,0x4000-0x4000,0x0:0x4000 (at [i28]9:0:0(0x8086:0x15f3)) arsmbv
[    0.467426]:   MEM: 0xfc300000:0x0,0x4000-0x4000,0x0:0x4000 (at [i28]9:0:0(0x8086:0x15f3)) arsmbv
[    0.467428]: Casey bridgeAllocateResources child 3 [i28]9:0:0(0x8086:0x15f3) state 0x808
[    0.467430]: Casey bridgeAllocateResources child 3 [i28]9:0:0(0x8086:0x15f3) state 0x808
[    0.467489]:   MEM: 0xfc200000:0x100000,0x100000-0x100000,0x0:0x100000 (at [i28]9:0:0(0x8086:0x15f3)) Arsmbv   MEM: 0xfc200000:0x100000,0x100000-0x100000,0x0:0x100000 (at [i28]9:0:0(0x8086:0x15f3)) Arsmbv  ok allocated
[    0.467496]:  ok allocated
[    0.467498]:   MEM: 0xfc300000:0x4000,0x4000-0x4000,0x0:0x4000 (at [i28]9:0:0(0x8086:0x15f3)) Arsmbv   MEM: 0xfc300000:0x4000,0x4000-0x4000,0x0:0x4000 (at [i28]9:0:0(0x8086:0x15f3)) Arsmbv  ok allocated
[    0.467505]:  ok allocated
[    0.467508]: Casey applyConfiguration : device [i28]9:0:0(0x8086:0x15f3) state 0x808
[    0.467510]: Casey applyConfiguration : device [i28]9:0:0(0x8086:0x15f3) state 0x808
[    0.467512]: Applying config (bm 0x9, sm 0x0) for device [i28]9:0:0(0x8086:0x15f3)
[    0.467514]: Applying config (bm 0x9, sm 0x0) for device [i28]9:0:0(0x8086:0x15f3)
[    0.467530]:   [0x10 MEM] 0xfc200000, read 0xfc200000
[    0.467535]:   [0x10 MEM] 0xfc200000, read 0xfc200000
[    0.467540]:   [0x1c MEM] 0xfc300000, read 0xfc300000
[    0.467544]:   [0x1c MEM] 0xfc300000, read 0xfc300000
[    0.467550]:   Device Command = 0x00100006
[    0.467555]:   Device Command = 0x00100006

  • Log entries beginning with Casey are obviously mine; this allows me to filter the logs more easily
  • Later in the log we see that device I225 is readable because when configRead is called, its state is 0x1e instead of 0x0 (it starts off at 0x0)
Code:
[   21.830728]: Casey configAccess I225 state 0x1e
[   22.054239]: 0, 0  AppleIntelI210::start - Allocate interrupt event source failed
[   22.056777]: /System/Library/DriverExtensions/com.apple.DriverKit-AppleEthernetIXGBE.dext/com.apple.DriverKit-AppleEthernetIXGBE[336] ==> com.apple.dextDK: DriverKit_AppleEthernetIXGBE-0x100000533 waiting for server com.apple.DriverKit-AppleEthernetIXGBE-100000533

  • Here we see that Interrupts were not registered; so my goal is to add trace logs to the steps leading up to this
  • The full log is attached in case any one is interested
  • When looking at the state reported by configAccess, the flags are as follows:
C:
// masks for getState()
enum {
    kIOServiceInactiveState = 0x00000001,
    kIOServiceRegisteredState   = 0x00000002,
    kIOServiceMatchedState  = 0x00000004,
    kIOServiceFirstPublishState = 0x00000008,
    kIOServiceFirstMatchState   = 0x00000010,
    kIOServiceReservedMatchState   = 0x80000000,
};

  • When looking at all other state values, the flags are as follows:
C:
enum {
//    kPCIDeviceStateResourceAssigned  = 0x00000001,
    kPCIDeviceStatePropertiesDone    = 0x00000002,
    kPCIDeviceStateTreeConnected     = 0x00000004,
    kPCIDeviceStateConfigurationDone = 0x00000008,
    kPCIDeviceStateScanned          = 0x00000010,
    kPCIDeviceStateAllocatedBus     = 0x00000020,
    kPCIDeviceStateTotalled         = 0x00000040,
    kPCIDeviceStateAllocated        = 0x00000080,
    kPCIDeviceStateChildChanged     = 0x00000100,
    kPCIDeviceStateChildAdded       = 0x00000200,
    kPCIDeviceStateNoLink           = 0x00000400,
    kPCIDeviceStateRangesProbed     = 0x00000800,

    kPCIDeviceStateConfigProtectShift = 15,
    kPCIDeviceStateConfigRProtect    = (VM_PROT_READ  << kPCIDeviceStateConfigProtectShift),
    kPCIDeviceStateConfigWProtect    = (VM_PROT_WRITE << kPCIDeviceStateConfigProtectShift),

    kPCIDeviceStateDead             = 0x80000000,
    kPCIDeviceStateEjected          = 0x40000000,
    kPCIDeviceStateToKill           = 0x20000000,
    kPCIDeviceStatePaused           = 0x10000000,
    kPCIDeviceStateRequestPause     = 0x08000000,
    kPCIDeviceStateHidden           = 0x04000000,

    kPCIDeviceStateDeadOrHidden     = kPCIDeviceStateDead | kPCIDeviceStateHidden
};
 

Attachments

  • dmesg4.txt.zip
    58.2 KB · Views: 0

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #10:
  • After compiling several more versions of IOPCIFamily I am getting a better sense of the order in which its member functions are invoked
  • Functions such as probe and open (handleOpen) are called when the downstream driver instantiates its derived class and invokes the parent’s superclass (super :: probe for instance)
  • When probe is called, IOPCIFamily attempts to resolve interrupts, but fails
  • However, probe also makes a platform function call that I have yet to trace-log
  • IOPCIFamily implements a new IOPCIHostBridge class that doesn’t exist in Big Sur
My earlier assumption of PCI devices being marked inactive is incorrect. Instead, the probe/open sequence is not succeeding because of some other failure in these steps.
 
Last edited:

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
In the previous post I forgot to attach the latest dmesg log, so it's attached below. This log is from a modified IOPCIFamily in which I've disabled a lot of my trace logs that aren't needed any longer. Instead, trace logs were added to functions related to interrupts and AppleVTD. Boot argument pci=0x07 was used instead of 0x03 because 0x07 also enables VT-d logs.
C:
enum {
    kIOPCIConfiguratorIOLog          = 0x00000001,
    kIOPCIConfiguratorKPrintf        = 0x00000002,
    kIOPCIConfiguratorVTLog          = 0x00000004,
    ...
}
The interesting part of this log occurs below, where we see that interrupts on the I225's parent bridge DP00 are failing.
Code:
[   20.494796]:  start: NVM Checksum incorrect Casey resolveLegacyInterrupts provider DP00 pin=255
[   20.795815]: Casey resolveLegacyInterrupts provider DP00 pin=255
[   20.798853]: Casey resolveInterrupts DP00 calling resolveMSIInterrupts
[   20.801900]: Casey resolveInterrupts DP00 calling resolveMSIInterrupts
[   20.804933]: Casey resolveInterrupts DP00 ret=-536870201 lret=-536870201
[   20.807943]: Casey resolveInterrupts DP00 ret=-536870201 lret=-536870201
[   20.810973]: 0, 0  AppleIntelI210::start - Allocate interrupt event source failed
[   20.813611]: /System/Library/DriverExtensions/com.apple.DriverKit-AppleEthernetIXGBE.dext/com.apple.DriverKit-AppleEthernetIXGBE[334] ==> com.apple.dextDK: DriverKit_AppleEthernetIXGBE-0x100000540 waiting for server com.apple.DriverKit-AppleEthernetIXGBE-100000540
 

Attachments

  • dmesg12.txt.zip
    54.3 KB · Views: 1

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #11:
  • After several more iterations of IOPCIFamily, the logs are more clearly showing what the problem is.
  • Let's look at the following segment from attached dmesg:
Code:
[   20.909711]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xd valueRead=0xff
[   21.133250]: apfs_stop_bg_work:1065: disk2s11 Volume FrankenMon2 is unmounting, stop any bg work
[   21.350489]:  start: NVM Checksum incorrectCasey extendedConfigRead8 I225 for 9:0:0 offset=0x3d valueRead=0xff
[   21.576431]: apfs_log_mount_unmount:1889: disk2s11 unmounting volume FrankenMon2, requested by: com.apple.MobileSoftwareUpdate. (pid 320); parent: launchd (pid 1)
[   21.576435]: apfs_vfsop_unmount:2740: disk2s11 waiting for purgatory cleaner to finish
[   21.578661]: authapfs_seal_restore:462: disk2s11 broken in xid: 445692, fsroot tree xid: 524514
[   21.579963]: apfs_vfsop_unmount:3074: disk2 nx_num_vols_mounted is 11
[   21.579965]: apfs: total mem allocated: 40680328 (39 mb);
[   21.579966]: apfs_vfsop_unmount:3087: all done.  going home.  (numMountedAPFSVolumes 11)
[   21.582949]: apfs_mount:27041: disk2s11 EALREADY; rooted from snap, apfs <ptr>
[   21.582953]: apfs_log_mount_unmount:1889: disk2s11 mounting volume FrankenMon2, requested by: mount_apfs (pid 337); parent: com.apple.Mobile (pid 320)
[   21.582958]: apfs_mount_livefs:26758: disk2 mount no firmlinks
[   21.582985]: revert_to_snapshot:1629: disk2s11 Reverting to snapshot w/xid 524492 and old sblock oid 1291450.
[   21.583008]: revert_extents_to_snapshot:1453: disk2s11 free'ing extents in main extentref tree 916911
[   21.583017]: free_allocated_snapshot_extents:1368: disk2s11 processed 2 extents and free'd 2 blocks
[   21.597061]: pci1022,43f6: match category IODefaultMatchCategory exists
[   21.621690]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0x3d valueRead=0xff
[   21.624977]: Casey resolveLegacyInterrupts provider DP00 pin=255
[   21.628236]: Casey resolveLegacyInterrupts provider DP00 pin=255
[   21.631500]: Casey resolveInterrupts DP00 calling resolveMSIInterrupts
[   21.634735]: Casey resolveInterrupts DP00 calling resolveMSIInterrupts
[   21.638173]: Casey resolveMSIInterrupts DP00 - callPlatformFunction(gIOPlatformGetMessagedInterruptControllerKey) - returns 0xe00002c7
[   21.640932]: Sandbox: accessoryupdater(96) allow mach-lookup com.apple.commcenter.coretelephony.xpcCasey resolveMSIInterrupts DP00 - callPlatformFunction(gIOPlatformGetMessagedInterruptControllerKey) - returns 0xe00002c7
[   21.644877]: Casey resolveInterrupts DP00 ret=0xe00002c7 lret=0xe00002c7
[   21.648036]: Casey resolveInterrupts DP00 ret=0xe00002c7 lret=0xe00002c7
[   21.651203]: 0, 0  AppleIntelI210::start - Allocate interrupt event source failed
[   21.653696]: /System/Library/DriverExtensions/com.apple.DriverKit-AppleEthernetIXGBE.dext/com.apple.DriverKit-AppleEthernetIXGBE[338] ==> com.apple.dextDK: DriverKit_AppleEthernetIXGBE-0x10000053b waiting for server com.apple.DriverKit-AppleEthernetIXGBE-10000053b

  • We see several lines like this: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xd valueRead=0xff.
  • This is reading the device's PCI configuration register at offset 0xd, which is the LatencyTimer.
  • Here's a full list of register offsets:
C:
/* Definitions of PCI Config Registers */
enum
{
    kIOPCIConfigurationOffsetVendorID          = 0x00,
    kIOPCIConfigurationOffsetDeviceID          = 0x02,
    kIOPCIConfigurationOffsetCommand           = 0x04,
    kIOPCIConfigurationOffsetStatus            = 0x06,
    kIOPCIConfigurationOffsetRevisionID        = 0x08,
    kIOPCIConfigurationOffsetClassCode         = 0x09,
    kIOPCIConfigurationOffsetCacheLineSize     = 0x0C,
    kIOPCIConfigurationOffsetLatencyTimer      = 0x0D,
    kIOPCIConfigurationOffsetHeaderType        = 0x0E,
    kIOPCIConfigurationOffsetBIST              = 0x0F,
    kIOPCIConfigurationOffsetBaseAddress0      = 0x10,
    kIOPCIConfigurationOffsetBaseAddress1      = 0x14,
    kIOPCIConfigurationOffsetBaseAddress2      = 0x18,
    kIOPCIConfigurationOffsetBaseAddress3      = 0x1C,
    kIOPCIConfigurationOffsetBaseAddress4      = 0x20,
    kIOPCIConfigurationOffsetBaseAddress5      = 0x24,
    kIOPCIConfigurationOffsetCardBusCISPtr     = 0x28,
    kIOPCIConfigurationOffsetSubSystemVendorID = 0x2C,
    kIOPCIConfigurationOffsetSubSystemID       = 0x2E,
    kIOPCIConfigurationOffsetExpansionROMBase  = 0x30,
    kIOPCIConfigurationOffsetCapabilitiesPtr   = 0x34,
    kIOPCIConfigurationOffsetInterruptLine     = 0x3C,
    kIOPCIConfigurationOffsetInterruptPin      = 0x3D,
    kIOPCIConfigurationOffsetMinimumGrant      = 0x3E,
    kIOPCIConfigurationOffsetMaximumLatency    = 0x3F
};

  • Later in the log we see an attempt to read offset 0x3d, which is InterruptPin.
  • The call succeeds, but the resulting value is 0xFF (255).
  • Hence we see the following (by the way, pin cannot be greater than 4):
Code:
[   21.718639]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0x3d valueRead=0xff
[   21.942471]: Casey resolveLegacyInterrupts provider DP00 pin=255

  • This means, in effect, that the device's configuration registers are no longer active. (They were active before because IORegistryExplorer contains valid entries.)
  • Because of this, the system cannot assign interrupts -- or do anything else.
  • Possible reasons for inactive configuration registers:
    • Address of register is wrong
    • Device has been deactivated
    • Other
 

Attachments

  • dmesg19.txt.zip
    57.9 KB · Views: 0
Last edited:

svan71

Donator
Donator
AMD OS X Member
Joined
Oct 24, 2020
Messages
123
Thank you all for your persistence.
 

craighazan

Donator
Donator
Joined
Nov 22, 2021
Messages
346
I'm absolutely fascinated by all this!. Amazing work, I'm sure I speak for a lot of people, we really appreciate the updates. And just learning about the guts of a system.
 

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
The next update will likely take a couple of days. Weekdays are busy, and the next step in troubleshooting is tricky -- to determine when and where device configuration registers become disabled. It's likely that the base address is getting changed, so I'll add trace logs to monitor base addresses. Everything works fine during initial PCI configuration where all buses and bridges are scanned, and IOReg is populated. Sometime after IOPCIConfigurator has completed the configuration, all PCI devices other than GPU and USB become inaccessible even though their "state" parameters are still active. In other words, IOPCIFamily thinks the devices are still alive when in fact something has happened to them -- such as incorrect base address -- but this still needs to be verified via trace logs.
 
Last edited:

ManuelB

Donator
Donator
AMD OS X Member
Joined
Jul 23, 2021
Messages
204
@CaseySJ vielen Dank für deine Arbeit...
 

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #12:
  • I was able to spend a few hours here and there over the past couple of days
  • Found a way to track base address register, but I placed the trace log in callPlatformFunction() and this shows some interesting results
  • Now I need to copy-and-paste the trace log into another function, most likely configRead8() -- will do this tomorrow as time permits
  • Here's the code for the trace log itself:
C:
    if (gIOPCIFlags & (kIOPCIConfiguratorIOLog | kIOPCIConfiguratorKPrintf))
    {
        IOPhysicalAddress bar0Addr = 0;
        // IOPhysicalAddress bar1Addr = 0;
        if (this->reserved)
        {
            if (this->reserved->deviceMemory[kIOPCIRangeBAR0])
            {
                bar0Addr = this->reserved->deviceMemory[kIOPCIRangeBAR0]->getPhysicalAddress();
            }
            else if (this->reserved->deviceMemoryMap[kIOPCIRangeBAR0])
            {
                bar0Addr = this->reserved->deviceMemoryMap[kIOPCIRangeBAR0]->getPhysicalAddress();
            }
        }
        DLOG("Casey callPlatformFunction %s : %s, bar0=0x%llx result=0x%x \n", this->getName(), functionName->getCStringNoCopy(), bar0Addr, result);
    }

  • I am only logging the address of BAR0. There are also BAR1, BAR2, BAR3, BAR4, BAR5, and ExpansionROM. These are not being logged at the moment.
  • This shows something interesting. All of the BAR0 addresses are 0 except for GPU and USB. We can see this by running the following command on the attached dmesg log file:
Bash:
cat dmesg31.txt | grep bar0= | grep -v bar0=0x0

  • This will show all the calls to callPlatformFunction() where the Base Address Register (BAR) is non-zero. The result is:
Bash:
[    0.319646]: Casey callPlatformFunction XHC0 : GetMessagedInterruptAddress, bar0=0xfc900000 result=0x0
[    0.319648]: Casey callPlatformFunction XHC0 : GetMessagedInterruptAddress, bar0=0xfc900000 result=0x0
[    0.319697]: Casey callPlatformFunction XHC1 : GetMessagedInterruptAddress, bar0=0xfc800000 result=0x0
[    0.319698]: Casey callPlatformFunction XHC1 : GetMessagedInterruptAddress, bar0=0xfc800000 result=0x0
[    0.319791]: Casey callPlatformFunction XHC2 : GetMessagedInterruptAddress, bar0=0xfcf00000 result=0x0
[    0.319793]: Casey callPlatformFunction XHC2 : GetMessagedInterruptAddress, bar0=0xfcf00000 result=0x0
[    0.319987]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.319989]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.319994]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.319996]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320065]: Casey callPlatformFunction SWUS : IOPCIDeviceChanged, bar0=0xfce00000 result=0xe00002c7
[    0.320066]: Casey callPlatformFunction SWUS : IOPCIDeviceChanged, bar0=0xfce00000 result=0xe00002c7
[    0.320416]: Casey callPlatformFunction XH00 : GetMessagedInterruptAddress, bar0=0xfc700000 result=0x0
[    0.320418]: Casey callPlatformFunction XH00 : GetMessagedInterruptAddress, bar0=0xfc700000 result=0x0
[    0.320435]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320436]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320442]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320444]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320518]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320519]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320525]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320526]: Casey callPlatformFunction SWUS : IOPlatformDeviceMessage, bar0=0xfce00000 result=0x0
[    0.320531]: Casey callPlatformFunction SWUS : IOPCIDeviceChanged, bar0=0xfce00000 result=0xe00002c7
[    0.320533]: Casey callPlatformFunction SWUS : IOPCIDeviceChanged, bar0=0xfce00000 result=0xe00002c7
[    0.320592]: Casey callPlatformFunction SWUS : IOPCIDeviceChanged, bar0=0xfce00000 result=0xe00002c7
[    0.320594]: Casey callPlatformFunction SWUS : IOPCIDeviceChanged, bar0=0xfce00000 result=0xe00002c7
[    1.645923]: Casey callPlatformFunction pci1022,43f6 : GetMessagedInterruptAddress, bar0=0xfc680000 result=0x0
[    1.645924]: Casey callPlatformFunction pci1022,43f6 : GetMessagedInterruptAddress, bar0=0xfc680000 result=0x0
[   15.093438]: Casey callPlatformFunction pci1022,43f6 : GetMessagedInterruptAddress, bar0=0xfc380000 result=0x0
[   15.093439]: Casey callPlatformFunction pci1022,43f6 : GetMessagedInterruptAddress, bar0=0xfc380000 result=0x0
[   19.942839]: Casey callPlatformFunction XH00 : GetMessagedInterruptAddress, bar0=0xfc400000 result=0x0
[   19.942840]: Casey callPlatformFunction XH00 : GetMessagedInterruptAddress, bar0=0xfc400000 result=0x0
[   26.531445]: Casey callPlatformFunction SWUS : ResolvePCIInterrupt, bar0=0xfce00000 result=0x0
[   26.531446]: Casey callPlatformFunction SWUS : ResolvePCIInterrupt, bar0=0xfce00000 result=0x0
[   26.531461]: Casey callPlatformFunction SWUS : SetDeviceInterrupts, bar0=0xfce00000 result=0x0
[   26.531463]: Casey callPlatformFunction SWUS : SetDeviceInterrupts, bar0=0xfce00000 result=0x0
[   26.531474]: Casey callPlatformFunction SWUS : GetMessagedInterruptController, bar0=0xfce00000 result=0xe00002c7
[   26.531475]: Casey callPlatformFunction SWUS : GetMessagedInterruptController, bar0=0xfce00000 result=0xe00002c7
[   26.531497]: Casey callPlatformFunction SWUS : GetMessagedInterruptAddress, bar0=0xfce00000 result=0x0
[   26.531498]: Casey callPlatformFunction SWUS : GetMessagedInterruptAddress, bar0=0xfce00000 result=0x0
[   26.531503]: Casey callPlatformFunction HDAU : GetMessagedInterruptAddress, bar0=0xfcd20000 result=0x0
[   26.531504]: Casey callPlatformFunction HDAU : GetMessagedInterruptAddress, bar0=0xfcd20000 result=0x0
[   32.031254]: Casey callPlatformFunction SWUS : ResolvePCIInterrupt, bar0=0xfce00000 result=0x0
[   32.031255]: Casey callPlatformFunction SWUS : ResolvePCIInterrupt, bar0=0xfce00000 result=0x0
[   32.031266]: Casey callPlatformFunction SWUS : SetDeviceInterrupts, bar0=0xfce00000 result=0x0
[   32.031267]: Casey callPlatformFunction SWUS : SetDeviceInterrupts, bar0=0xfce00000 result=0x0
[   32.031281]: Casey callPlatformFunction SWUS : GetMessagedInterruptAddress, bar0=0xfce00000 result=0x0
[   32.031281]: Casey callPlatformFunction SWUS : GetMessagedInterruptAddress, bar0=0xfce00000 result=0x0
[   32.031284]: Casey callPlatformFunction GFX0 : GetMessagedInterruptAddress, bar0=0x880000000 result=0x0
[   32.031285]: Casey callPlatformFunction GFX0 : GetMessagedInterruptAddress, bar0=0x880000000 result=0x0

...continued in next post...
 
Last edited:

CaseySJ

Guru
Guru
Donator
Joined
May 10, 2020
Messages
1,269
Interim Update #12 Continued...

  • We see that BAR0 is non-zero only for the GPU, USB, and Audio (from AMD GPU) controllers -- and their parent PCI bridges. USB controllers are XHC0, XHC1, XHC2, and XH00. The GPU is GFX0. SWUS is the GPU's parent PCI bridge.
  • Let's drill down to one of these devices. Let's pick USB controller XH00. There are actually two of these devices (see IOReg below). The log above shows that the BAR0 addresses are bar0=0xfc700000 for one of them, and bar0=0xfc400000 for the other.
  • Now let's see what IORegistryExplorer shows:
Screen Shot 2022-11-09 at 7.30.16 PM.pngScreen Shot 2022-11-09 at 7.02.26 PM.png
  • The values are identical. This confirms that the trace log is correct (i.e. my code is not buggy).
  • Now let's see what IOReg shows for I225 and WIFI controllers:
Screen Shot 2022-11-09 at 7.04.24 PM.pngScreen Shot 2022-11-09 at 7.05.17 PM.png
  • Sure enough, both I225 and WIFIhave correct register addresses
    • I225 is the Intel i225-V controller with addresses 0xfc100000 and 0xfc200000
    • WIFI is the on-board Intel WiFi 6E controller with address 0xfc500000
  • But when we look at the trace log for I225, we see the following:
Bash:
% cat dmesg31.txt| grep I225

[    0.544217]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xe valueRead=0xff
[    0.544219]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xe valueRead=0xff
[    9.582596]: Casey callPlatformFunction I225 : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[    9.582597]: Casey callPlatformFunction I225 : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[    9.582609]: Casey callPlatformFunction I225 : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[    9.582610]: Casey callPlatformFunction I225 : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[    9.803133]: Casey callPlatformFunction I225 : IOPCIDeviceChanged, bar0=0x0 result=0xe00002c7
[    9.803135]: Casey callPlatformFunction I225 : IOPCIDeviceChanged, bar0=0x0 result=0xe00002c7
[   12.448923]: I225(0x10000028a): matching deferred by AppleIntelI210
[   28.571813]: Sandbox: softwareupdated(96) deny(1) file-read-metadata /private/var/mobileCasey extendedConfigRead8 I225 for 9:0:0 offset=0x8 valueRead=0xff
[   28.791622]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0x8 valueRead=0xff
[   29.012093]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xc valueRead=0xff
[   29.012096]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xc valueRead=0xff
[   29.232301]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xd valueRead=0xff
[   29.232302]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0xd valueRead=0xff
[   29.672986]:  start: NVM Checksum incorrectCasey extendedConfigRead8 I225 for 9:0:0 offset=0x3d valueRead=0xff
[   29.893459]: Casey extendedConfigRead8 I225 for 9:0:0 offset=0x3d valueRead=0xff

  • The BAR0 address for I225 is 0. Hmm! What happened here?
  • When we look at the trace log for WIFI, we see the same story:
Bash:
% cat dmesg31.txt| grep WIFI

[   10.023767]: Casey extendedConfigRead8 WIFI for 10:0:0 offset=0xe valueRead=0xff
[   10.023769]: Casey extendedConfigRead8 WIFI for 10:0:0 offset=0xe valueRead=0xff
[   19.060406]: Casey callPlatformFunction WIFI : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[   19.060407]: Casey callPlatformFunction WIFI : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[   19.060418]: Casey callPlatformFunction WIFI : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[   19.060419]: Casey callPlatformFunction WIFI : IOPlatformDeviceMessage, bar0=0x0 result=0x0
[   19.280705]: Casey callPlatformFunction WIFI : IOPCIDeviceChanged, bar0=0x0 result=0xe00002c7
[   19.280706]: Casey callPlatformFunction WIFI : IOPCIDeviceChanged, bar0=0x0 result=0xe00002c7
  • The BAR0 address for WIFI is also 0
  • In fact, the BAR0 addresses (IOPCIMemory or IOPCIMemoryMap) for all of our missing PCI devices are 0, even though IORegistryExplorer contains the correct values
  • This indicates that the IOPCIDevice data structure, which is independent of IOReg tree, is getting changed, but only for certain devices
dmesg31.txt and IORegistryExplorer dump are attached.
 

Attachments

  • dmesg31.txt.zip
    68.9 KB · Views: 1
  • Casey’s Mac Pro.zip
    11.8 MB · Views: 3
Last edited:

Edhawk

Guru
Guru
Joined
May 2, 2020
Messages
2,393
Find a better case and take the plastic bag of the GPU for starters!
 
Back
Top Bottom
  AdBlock Detected
Sure, ad-blocking software does a great job at blocking ads, but it also blocks some useful and important features of our website. For the best possible site experience please take a moment to disable your AdBlocker.