Help with GPU Unit Failures

Mitchell Tuckness

Well-Known Member
USA team member
Hi,

I am trying to run the BOINC client on a server I use to run ZoneMinder for security cameras around the house. The CPU works fine for the projects, but I wanted the client to use the GPUs of some spare cards I had around the house, since it is a headless (no video / desktop) server and the GPUs were doing nothing.

So I have three ATI cards in the system, here is some system information below.

I have a few problems, concerns, and questions. I am not a Linux guru, still learning a lot, but pretty good at Google. My problem is that I am not sure whether these cards are compatible and functioning. I have tried to get the latest drivers installed, but it seems something changed in Ubuntu 16.04 regarding AMD / ATI cards and their driver support. What I am trying to find out is whether these cards' GPUs will work and whether I have the correct / best drivers. Hoping a real guru (or a few) can help out :)

Another issue is my results:

Here is a link to a typical failed work unit for SETI. https://setiathome.berkeley.edu/result.php?resultid=5377275627

A link to the computer and its work units: https://setiathome.berkeley.edu/results.php?hostid=8122337

I am not sure if these are coming from one of the cards, all of the cards, driver or what. Any ideas? Thanks!

Here is some info of the drivers the cards are using:

sudo lshw -c video
*-display
description: VGA compatible controller
product: Curacao PRO [Radeon R7 370 / R9 270/370 OEM]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:03:00.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:36 memory:e0000000-efffffff memory:f7800000-f783ffff ioport:b000(size=256) memory:f7840000-f785ffff
*-display
description: VGA compatible controller
product: Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:04:00.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:37 memory:d0000000-dfffffff memory:f7700000-f773ffff ioport:a000(size=256) memory:f7740000-f775ffff
*-display
description: VGA compatible controller
product: Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:05:00.0
version: 00
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list rom
configuration: driver=amdgpu latency=0
resources: irq:38 memory:c0000000-cfffffff memory:f7600000-f763ffff ioport:9000(size=256) memory:f7640000-f765ffff




Distributor ID: Ubuntu
Description: Ubuntu 16.04.1 LTS
Release: 16.04
Codename: xenial

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao PRO [Radeon R7 370 / R9 270/370 OEM]
04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP]
05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP]



03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Curacao PRO [Radeon R7 370 / R9 270/370 OEM] (prog-if 00 [VGA controller])
Subsystem: PC Partner Limited / Sapphire Technology Curacao PRO [Radeon R7 370 / R9 270/370 OEM]
Flags: bus master, fast devsel, latency 0, IRQ 36
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f7800000 (64-bit, non-prefetchable) [size=256K]
I/O ports at b000
Expansion ROM at f7840000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu

03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
Subsystem: PC Partner Limited / Sapphire Technology Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
Flags: bus master, fast devsel, latency 0, IRQ 41
Memory at f7860000 (64-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

04:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP] (prog-if 00 [VGA controller])
Subsystem: Hightech Information System Ltd. Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP]
Flags: bus master, fast devsel, latency 0, IRQ 37
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at f7700000 (64-bit, non-prefetchable) [size=256K]
I/O ports at a000
Expansion ROM at f7740000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu

04:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
Subsystem: Hightech Information System Ltd. Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
Flags: bus master, fast devsel, latency 0, IRQ 42
Memory at f7760000 (64-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

05:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP] (prog-if 00 [VGA controller])
Subsystem: Hightech Information System Ltd. Pitcairn PRO [Radeon HD 7850 / R7 265 / R9 270 1024SP]
Flags: bus master, fast devsel, latency 0, IRQ 38
Memory at c0000000 (64-bit, prefetchable) [size=256M]
Memory at f7600000 (64-bit, non-prefetchable) [size=256K]
I/O ports at 9000
Expansion ROM at f7640000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: amdgpu
Kernel modules: radeon, amdgpu

05:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
Subsystem: Hightech Information System Ltd. Cape Verde/Pitcairn HDMI Audio [Radeon HD 7700/7800 Series]
Flags: bus master, fast devsel, latency 0, IRQ 43
Memory at f7660000 (64-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
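
For anyone skimming, the long listing above boils down to which kernel driver each card is bound to. Here is a minimal sketch that condenses `lspci -v` style text into one line per VGA controller. The function name gpu_drivers is made up for illustration, and the sketch only reports the driver binding; it says nothing about whether OpenCL actually works on that driver.

```shell
# Print "<slot> <driver>" for every VGA controller found in `lspci -v`
# style text read from standard input. (gpu_drivers is a hypothetical name.)
gpu_drivers() {
    awk '
        # Remember the PCI slot of each VGA controller header line.
        /VGA compatible controller/ { slot = $1; next }
        # A blank line ends the current device record.
        /^[ \t]*$/                  { slot = ""; next }
        # Report the bound driver only while inside a VGA record.
        /Kernel driver in use:/ && slot != "" {
            print slot, $NF
            slot = ""
        }
    '
}

# Typical use on a live system (lspci comes from the pciutils package).
if command -v lspci >/dev/null 2>&1; then
    lspci -v 2>/dev/null | gpu_drivers
fi
```

Fed the listing above, this should report amdgpu for all three cards and skip the HDMI audio functions, since those records never start with a VGA controller line.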
 

DrBob

Administrator
USA team member
I'm not a Linux guru either but I'm guessing this is a driver issue. As you said, apparently there are not any proprietary ATI drivers available for 16.04 and the BOINC application doesn't seem to like the way the open source amdgpu driver handles OpenCL.

Hopefully someone with more Linux/ATI knowledge will stop by with some more ideas on how to get your cards crunching work.
 

Nick Name

Administrator
USA team member
Ah Linux....the operating system that sounds like the greatest thing in the world until you actually try to do anything with it. I've been messing with it on and off for about two years and feel like I've accomplished nothing. :confused: Needless to say I've not yet been issued a guru card for Linux (or anything else for that matter, if certain parties are reading this - you know who you are ;) ) . This:
Capabilities: <access denied>
makes me wonder about the permissions although I don't know how to check or fix it. I also know there are various groups, and the boinc group has to be added to something or vice-versa, but again (unfortunately) I don't know how to check for or fix that either.
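
A rough sketch of how the group question could be checked. Two caveats first. The "Capabilities: <access denied>" line is what lspci prints when it is run without root, so `sudo lspci -v` should show the capabilities, and on its own that line does not prove BOINC lacks access. Second, the names below are assumptions not confirmed anywhere in this thread: a service user called boinc and a group called video that owns the GPU device nodes.

```shell
# Return success if user $1 is a member of group $2.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

# Assumed names, not confirmed in this thread.
BOINC_USER="boinc"
GPU_GROUP="video"

if user_in_group "$BOINC_USER" "$GPU_GROUP"; then
    echo "$BOINC_USER is in group $GPU_GROUP"
else
    echo "$BOINC_USER is not in group $GPU_GROUP (or the user does not exist)"
    # A possible fix, followed by a restart of the client:
    #   sudo usermod -a -G "$GPU_GROUP" "$BOINC_USER"
fi
```

The usermod line is left as a comment on purpose, so pasting the block only reports and changes nothing.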

There's a thread over at GPUGrid that may help although it's specifically about Nvidia.
https://www.gpugrid.net/forum_thread.php?id=4306

You may also find this useful from the folks over at Einstein.
https://einsteinathome.org/content/...ds-fglrx-catalyst-replacing-amd-gpupro#156126

And finally perhaps review the general BOINC help forums, there are some folks with good Linux knowledge there.
https://boinc.berkeley.edu/dev/forum_forum.php?id=23

Good luck and please let us know if you get it solved.

And welcome to the team and forum! :D
 

Mitchell Tuckness

Well-Known Member
USA team member
Well, apparently the problem with the ATI cards was a driver issue, so I took them out and added an old Nvidia card, a GTX 580, and it fails with an unknown error. After a while I found a post that said the card was too old to crunch with the modern CUDA driver! WTF. These cards have perfectly good GPUs and apparently I can't use any of them. ahhhhhhrrrgggg!
 

Nick Name

Administrator
USA team member
Sorry to hear that, it's things like this that make me very reluctant to move to Linux.
 

Mitchell Tuckness

Well-Known Member
USA team member
Yeah, in this case I don't really blame Linux. It was ATI's decision to supply Linux drivers only for their latest models rather than for every card they make. I have read that they did provide all the information needed for someone to write drivers, so maybe someone will, though that will take some time. The Nvidia card issue is actually the project's fault. They specified a certain version of the driver, and that driver only works on 600 series cards and above; naturally I have a 580 card.... sigh. Hopefully there is another project that will use GPUs and not specify that certain driver; I have to do some digging.

That's the one thing about Linux: you have to do a lot of digging to get stuff working, but once it is working correctly, unless there is a hardware issue or software updates, it's bulletproof for stability and it manages hardware resources much better than Windows.
 
"...once it is working correctly, unless there is a hardware issue or software updates, it's bulletproof for stability" (emphasis added)

Agreed. Of course, my two Ubuntu boxes seem to get DAILY software updates, which means I spend more time fixing them than I do a roomful of Windows and Mac boxes... If I could disconnect them from the network, they'd run forever! (And accomplish zip.) I just got bit by the same nightmare of drivers you describe above. Fell back to Ubuntu 14 from 16, then back to 16, then swapped AMD 280x GPU for Nvidia 950, STILL no luck. If you find a magic bullet, please post it here!
 

doneske

Well-Known Member
USA team member
I used to run a couple of ATI cards with the old fglrx proprietary driver, and it gave me headaches whenever I updated the kernel because it was a dynamic module added to the kernel. To be safe you had to remove the old driver first and then install the new update. AMD didn't seem to keep up well with the latest kernels, so I was always back-leveled just to use those cards. It wasn't worth all the trouble, and then in 16.x fglrx wasn't supported anymore and that took care of that.

Now AMD has been working on the AMDGPU DRM driver, but it only supports GCN 1.2 cards by default. They are working on support for the older Southern Islands (GCN 1.0) and Sea Islands (GCN 1.1) cards but are not quite there yet. Supposedly, the first kernel whose AMDGPU driver supported GCN 1.0 was 4.9, but that support was disabled by default. I believe in 4.10 GCN 1.0 is enabled by default. All my cards except maybe one are pre-GCN 1.0, so they won't ever be supported. I'm hoping the April release of Ubuntu (17.04) will have the 4.11 kernel, and if it does I will test my newest card. Unfortunately, it may be 4.12 or 4.13 before we get true support, including OpenCL, for ATI cards after GCN 1.0.

I've been running Ubuntu for the past 5 years and have had relatively few problems with it, and when I do, it usually involves a kernel update. I don't apply updates automatically; I usually just pick a time (like once a month) and apply the non-kernel updates using sudo apt-get upgrade, which applies all updates except the kernel updates. If you want to apply the kernel updates as well, simply use sudo apt-get dist-upgrade. Normally you only have to reboot if you update the kernel, but I've noticed recently that it seems to want a restart more and more often after non-kernel updates, and that was one of the reasons I stopped using Windows. I'm thinking they are messing around in libc and that is causing the need to restart. I've been tempted to open a bug ticket on it.

Admittedly, if you have a "bunch" of hardware attached to your machines, Ubuntu (Linux) may give you more problems. The problem is that a lot of vendors don't write firmware for Linux, and you have to wait for the community to reverse-engineer the blob and make it available for Linux, which might take a year or more depending on demand. More and more vendors are coming on board but it's a slow process. I don't attach a lot of hardware to my machines so I get to miss a lot of the issues. I believe NVIDIA is still providing their proprietary driver for their cards for the latest kernel. I don't have one of those cards so I'm not sure about that.
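
A small sketch of the two things described above: comparing the running kernel against the 4.9 mark mentioned for GCN 1.0, and the monthly update routine. The helper name version_ge is made up for illustration, and the comparison relies on `sort -V` from GNU coreutils. Passing the version check does not mean the cards are actually enabled, since that also depends on how the kernel was built and configured.

```shell
# Return success if dotted version $1 is greater than or equal to $2.
# Relies on `sort -V` (version sort); version_ge is a hypothetical helper.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Strip the distribution suffix, e.g. "4.4.0-59-generic" -> "4.4.0".
running_kernel="$(uname -r | cut -d- -f1)"

if version_ge "$running_kernel" "4.9"; then
    echo "Kernel $running_kernel is at or past 4.9"
else
    echo "Kernel $running_kernel predates 4.9"
fi

# The monthly routine, left commented out so that nothing is installed
# just by pasting this block:
#   sudo apt-get update          # refresh the package lists first
#   sudo apt-get upgrade         # never installs new packages, so new kernels are held back
#   sudo apt-get dist-upgrade    # may install new packages, including new kernel images
```

The reason upgrade skips kernels is that each new kernel arrives as a brand-new package, and upgrade only updates packages that are already installed.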
 

Nick Name

Administrator
USA team member
I was really surprised by all the "need to reboot to install these updates" messages when I started running Lubuntu on the laptop. And I was really surprised at the number of security updates, having heard for years how much better Linux was in that department versus Windows.
 

Scott

Member
USA team member
I have tried many flavors of Linux: Ubuntu, Mint, Zorin... all sucked. Great if you don't want to do anything with your computer, and yes they always seem to have updates. At least in Windows I can easily disable the update service to get around that problem. There is a reason all of those Linux builds are free. No one would pay a dime for them.
 