Welcome to our NEW Folding@Home forum!

Nick Name

Administrator
USA team member
Welcome to our new Folding forum for discussion about our new Folding team! I heard of FAH years ago but for whatever reason never got into it, preferring to crunch BOINC projects instead, but I always thought it was a worthwhile project. Today, it's still going strong and may be even better known than BOINC. Last month when GPUGrid was short on work I decided to give it a try. I found the software fairly easy to set up; the only difficulty I had was getting it to run on the correct GPU, which is not unexpected on a multi-GPU system. :p I also looked for our USA team, and I did find a USA team, but there was no evident link to our BOINC team.

Yesterday I took the plunge and created a new Folding (hereafter FAH) team that's directly linked to our site here, and our BOINC efforts. There were multiple reasons for that:
  1. There's no mechanism to contact whoever started the other team and coordinate efforts with our BOINC team.
  2. The FAH team page URL has some political connotations and I'd prefer to keep the focus on crunching and science.
  3. I only recognized one (!) BOINC team member's name.
  4. The time seemed right as people are interested in the coronavirus and FAH will be starting a project dedicated to that research shortly.
  5. SETI's hibernation means there will be a lot of potential crunching power available, if our team mates are interested.
In the next few days I'm going to start mentioning our team on Twitter. Hopefully this will draw some interest, grow the team and over time lead to increased BOINC participation as well.

Now for the particulars. If you're interested in exactly what this project is researching, look here. Understand that unlike BOINC, where a project is basically run by a single team, multiple entities are submitting work to FAH. You can pick what you'd prefer to crunch, but if no work for that project is available you will get work from another. I don't mind this; I do have my preference, but it's all worthwhile. Currently you can choose to prioritize Cancer, Alzheimer's, Parkinson's or Huntington's disease. There is also the Any category, which means all work is treated equally. There's an FAQ here if you'd like to learn more.

  1. To get started, download the FAH software. I've only run the Windows client so far, I will try Linux eventually (hopefully soon!). :cool:
  2. Everything is controlled from your client, there are no web preferences like BOINC has. The software comes with the FAHControl manager, similar to the BOINC manager, and a web controller which uses your browser. The web browser is simpler to use, but doesn't offer the level of control the advanced option does. FAH assumes you want to use your whole computer to fold; if you don't you'll need to use the advanced controller to set things to your liking.
  3. The web controller connects to http://client.foldingathome.org/ and establishes a local connection. Chromium-based browsers seem to work fine but I had to manually enter 127.0.0.1:7396 in my Firefox-based browser.
  4. Make sure to get your Passkey. It's similar to the CPID used by BOINC, but it also directly affects stats; without it you won't qualify for the QRB (Quick Return Bonus).
  5. Since there's no project website to join teams etc. you join via one of the controllers. That's also where you enter your name. I encourage you to use your BOINC alias if you have one. Here's how to do it via FAHControl. The team is named BOINC.USA and the number is 236370. You'll only need to enter the number.
Now, get ready to.....FOLD ON! :USA:
 

doneske

Well-Known Member
USA team member
I connected two Linux machines and ran into a few little problems. First was operator error: I blindly followed the example in the Linux install section and installed the 7.4.4 fahclient. After realizing it wasn't the latest, I went back and upgraded to 7.5.1. Second, I also tried to install FAHControl but neither release would install due to Python incompatibilities. FAH still uses Python 2, and Ubuntu has deprecated and removed some of the modules that FAHControl needs. There are a few topics on the support forum explaining where to find those old modules; install them and FAHControl will work. I'm not interested in loading a bunch of back-level code just to get a piece of software working. Supposedly there is a new client in the works that will fix these issues, but no date is known.

That means I will have to manage and configure these clients manually using the config.xml file. WebControl works but I don't think it is as robust as FAHControl. I'm going to have to go back and update the config.xml to allow me to manage all the headless machines from one desktop. The comments in the config.xml say the syntax is documented in the Client User Guide. Where is the Client User Guide? I'll probably have to hunt through the support forums for syntax that others have used.

I think I've decided to use the app_config in BOINC to free up about 25% of the processors on the other servers and then create CPU slots with a specific number of CPUs for F@H (ex: a 24-thread server will have 18 CPUs for BOINC and 6 for F@H). That's a task for another day as I still need to nail down the config.xml syntax.
 

Nick Name

Administrator
USA team member
WebControl is pretty nice, but yes very basic compared to FAHControl.

Here's a copy of my config file, maybe it will help. It's intact except for my Passkey. I assume since it's .xml it's the same for Windows and Linux.

Code:
<config>
  <!-- Folding Core -->
  <checkpoint v='3'/>
  <core-priority v='low'/>

  <!-- Folding Slot Configuration -->
  <cause v='ALZHEIMERS'/>

  <!-- Network -->
  <proxy v=':8080'/>

  <!-- User Information -->
  <passkey v='notarealpasskey'/>
  <team v='236370'/>
  <user v='NickNameUSA'/>

  <!-- Folding Slots -->
  <slot id='0' type='GPU'>
    <cuda-index v='1'/>
    <gpu-index v='1'/>
    <opencl-index v='1'/>
    <paused v='true'/>
  </slot>
</config>

cause v needs to be ANY to prioritize COVID19 although I'm not sure that's in production yet.
 

doneske

Well-Known Member
USA team member
Thanks for the sample. I was looking for the statement that assigns CPUs to a slot and I think I found it in a forum post, ex: <cpus v='N'/>. The next question was where it goes in the stanzas. If it is in the Folding Core stanza, it is a general specification (applicable to ALL slots); if it goes in the Folding Slots stanza under a slot definition, it pertains only to that one slot. I think they purposely don't publish the information because they want you to use the FAHControl facility, but if that is faulty, you are left with editing the file manually.
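To make the per-slot placement concrete, here is a minimal Python sketch that prints slot stanzas in the same style as the sample config earlier in the thread. The <cpus v='N'/> element comes from the post above; the type='CPU' attribute is an assumption (the sample only shows type='GPU'), and the helper names are hypothetical. Treat the output as something to compare against a config written by the client, not as documented syntax.

```python
def cpu_slot_stanza(slot_id, cpus):
    """Return one <slot> stanza that assigns `cpus` threads to a single slot."""
    if slot_id < 0 or cpus < 1:
        raise ValueError("slot_id must be >= 0 and cpus must be >= 1")
    return (
        f"  <slot id='{slot_id}' type='CPU'>\n"  # ASSUMPTION: 'CPU' slot type
        f"    <cpus v='{cpus}'/>\n"              # per-slot CPU count
        f"  </slot>"
    )


def folding_slots(cpu_counts):
    """Build the Folding Slots section for a list of per-slot CPU counts."""
    stanzas = [cpu_slot_stanza(i, n) for i, n in enumerate(cpu_counts)]
    return "  <!-- Folding Slots -->\n" + "\n".join(stanzas)


if __name__ == "__main__":
    # Two hypothetical 8-thread slots; paste between <config> and </config>.
    print(folding_slots([8, 8]))
```

Placing <cpus> inside each <slot> keeps the limit local to that slot, which matches the per-slot behaviour described above.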
 

Nick Name

Administrator
USA team member
Getting GPU work here, don't know how much since there's no task history like BOINC. My original plan was to run FAH on the weekend, now that the CV work is live I've put one GPU on it full time. If I have time this weekend I'll try to get a dedicated system running.
 

doneske

Well-Known Member
USA team member
Finally got the slot configuration nailed down after trial and error and most of a day. I kept going to WebControl and pressing the Stop Folding button. That only pauses the slots. It doesn't actually stop the fahclient process, which means when you click the Start Folding button it just reactivates the slots. It doesn't re-read the config.xml. I also edited the config.xml while the slots were paused; then when I issued the stop for the fahclient, it overwrote the config.xml at shutdown, so all my changes were gone. Lesson learned: if you edit the config.xml file, stop the fahclient process first and then do the editing. As a result, on the big EPYC server, I was able to carve out 26 processors and divide them into three slots of 10, 8 and 8. In WebControl they look like 3 tabs, each with its own progress bar. Another anomaly: on the EPYC server, which runs CentOS 8, the client wouldn't install because of missing SSL dependencies. I was able to install compat-openssl10 to get the back-level SSL code, and the client installed after that. Stanford likes to claim that their client will run on just about any system, and that is why they say you can use the --nodeps flag on install to bypass dependency issues. My experience is that the software won't run with those dependencies missing. I was also able to get the F@H clients configured so that I can reach each server from one web browser and not have to go to each machine individually, especially since most don't have a GUI installed. They are command-line-only servers. Better but not perfect.

I probably would never know if I executed a COVID-19 WU as I don't have a way to monitor continuously and Stanford doesn't provide a task history. I guess I could write a script to parse the client logs and look for the project # but I'm not that motivated. Last time I participated in F@H I used the HFM tool. It is very similar to BoincTasks but for F@H. The problem for me is that it is Windows-based and I don't have any Windows machines. I could run it under Wine, maybe, but I already have BoincTasks running that way and it presents a few minor problems. They are mostly inconveniences more than real problems, but they take time to fix. EX: Double-click the icon and it doesn't start. You see the icon in the task bar but it goes away after a few minutes. It also leaves a process running that has to be killed manually. Sometimes the only way to solve the start problem is to reboot the OS (wait a minute, that sounds like Windows); maybe it's a built-in Windows "feature" in Wine.
 

Nick Name

Administrator
USA team member
Thanks for sticking with it, I'd have probably thrown in the towel.

The lack of task history is the most disappointing feature to me so far. It kind of reminds me of Science United in that you just set up the client and then it selects the work you get.
 

doneske

Well-Known Member
USA team member
It's probably better for the researchers as they get to decide what the priority is. They can mark the COVID-19 work high priority and we as crunchers don't have to do anything. We still have some choice in the projects by selecting Any or Parkinson's, but I still would like to see how many COVID-19s or Alzheimer's I did... Oh well, I can live with it. I did try scanning the log and it does have the project numbers that were worked on, but to make it useful it would need to be inserted into a DB of some sort. Logs seem to get managed and old logs are put in the /var/lib/fahclient/logs directory. I wonder how long they stay there.
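For anyone who wants to try the "scan the log into a DB" idea, here is a rough Python sketch using the standard-library sqlite3 module. It assumes the log contains lines of the form "HH:MM:SS:WUxx:FSxx:...Project: NNNNN (Run R, Clone C, Gen G)"; that line shape is an assumption, so check it against your own log before relying on it. The table and function names are made up for illustration.

```python
import re
import sqlite3

# ASSUMPTION: work-unit lines carry "Project: N (Run R, Clone C, Gen G)"
# after the WUxx:FSxx prefix. Adjust the pattern to match your real log.
PROJECT_RE = re.compile(
    r":WU(\d+):FS(\d+):.*Project: (\d+) \(Run (\d+), Clone (\d+), Gen (\d+)\)"
)


def extract_work_units(lines):
    """Yield (slot, project, run, clone, gen) for each matching log line."""
    for line in lines:
        match = PROJECT_RE.search(line)
        if match:
            _wu, slot, project, run, clone, gen = map(int, match.groups())
            yield slot, project, run, clone, gen


def store_work_units(conn, lines):
    """Insert work units, ignoring repeats of the same project/run/clone/gen."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS work_units ("
        "slot INTEGER, project INTEGER, run INTEGER, clone INTEGER, "
        "gen INTEGER, PRIMARY KEY (project, run, clone, gen))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO work_units VALUES (?, ?, ?, ?, ?)",
        extract_work_units(lines),
    )
    conn.commit()


def counts_by_project(conn):
    """Return {project_number: number_of_distinct_work_units}."""
    rows = conn.execute(
        "SELECT project, COUNT(*) FROM work_units "
        "GROUP BY project ORDER BY project"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass an open log file instead.
    sample = [
        "01:00:00:WU00:FS00:0x22:Project: 11111 (Run 0, Clone 1, Gen 2)",
        "01:00:05:WU00:FS00:0x22:Project: 11111 (Run 0, Clone 1, Gen 2)",
        "05:00:00:WU01:FS00:0x22:Project: 22222 (Run 3, Clone 4, Gen 5)",
    ]
    conn = sqlite3.connect(":memory:")
    store_work_units(conn, sample)
    print(counts_by_project(conn))
```

The primary key is there because the same project line can show up more than once for one work unit; pointing sqlite3.connect at a file instead of ":memory:" would keep the history after old logs rotate away.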
 

Nick Name

Administrator
USA team member
Hit my first snag and it's kind of major. The client is stuck trying to download COVID-19 work from a server that's either down or rejecting connections. I had to switch to Cancer to get work. I'm not sure how long it was stuck before I found it. You'd think the project setting of "Any" would prevent this situation but apparently not. :confused:
 

doneske

Well-Known Member
USA team member
It's like once the assignment server assigns a slot to a work server, that's it. You never go back to the assignment server to get a new work server. Changing the type of work forces it back to the assignment server to get a new work server. I think you can stop and restart the client to fix the problem too.

One thing about dealing with all the issues is one learns a lot. I can control the slots better by using the command line versus WebControl. All of the functions in FAHControl can be done by using the command line. I was trying to find a way to tell a client to stop after completing the current work unit, and that only sort of exists in WebControl. If you click Stop Folding, a dialog box opens and you have a choice to stop (pause) immediately or stop (pause) after completing the WU and returning the results. Problem is, the pause after finish doesn't work. From a Linux command line one can enter: FAHClient --send-finish and it does what that WebControl option is supposed to do. To pause a slot, enter: FAHClient --send-pause N where N is the slot number. If you leave the number off, it pauses all slots. There are pages of subcommands. To see them, enter: FAHClient --help (NOTE: there are double hyphens before the commands).
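If you manage several headless boxes, the two commands above can be wrapped in a small Python helper. This is only a sketch: it uses nothing beyond the --send-pause and --send-finish flags mentioned in this post, the function names are hypothetical, and it assumes FAHClient is on the PATH of the machine where it runs.

```python
import subprocess


def fah_command(action, slot=None, binary="FAHClient"):
    """Build the argv list for `FAHClient --send-pause [N]` or `--send-finish`."""
    if action not in ("pause", "finish"):
        raise ValueError("only 'pause' and 'finish' are covered here")
    argv = [binary, f"--send-{action}"]
    if slot is not None:
        if action != "pause":
            # The post only describes a slot number for --send-pause.
            raise ValueError("a slot number is only used with 'pause'")
        argv.append(str(slot))
    return argv


def run_fah_command(action, slot=None):
    """Run the command and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(fah_command(action, slot), check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    print(" ".join(fah_command("pause", 1)))
    print(" ".join(fah_command("finish")))
```

Building the argument list separately from running it makes it easy to print the command for a dry run before touching a live client.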

I was also looking around for recommendations concerning slot size (how many CPUs to assign to a slot). There doesn't seem to be anything concrete. Some say the more CPUs the faster the WU runs, but some WUs fail with a large number of CPUs. It seems to depend on the project. Then there is the domain decomposition in Gromacs and prime numbers (it doesn't like large prime numbers, although no one can define large). Also some projects can use a large number of CPUs (like more than 16) but others can't. As of right now, I'm sticking to 8 CPUs per slot and going with that. That means on the 24 core and 32 core machines, there will be 3 and 4 slots respectively of 8 CPUs each.
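The slot arithmetic above is simple enough to sketch in Python. The helper below splits a thread count into equal slots and reports what is left over; the second function returns the largest prime factor of a CPU count, offered only as one possible way to eyeball the "large prime" concern, since nobody in the thread could say what counts as large. Both function names are made up for illustration.

```python
def plan_slots(total_threads, slot_size=8):
    """Split threads into equal slots; return (slot_sizes, leftover_threads)."""
    if total_threads < 0 or slot_size < 1:
        raise ValueError("total_threads must be >= 0 and slot_size >= 1")
    full_slots, leftover = divmod(total_threads, slot_size)
    return [slot_size] * full_slots, leftover


def largest_prime_factor(n):
    """Return the largest prime factor of n (n >= 2) by trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    factor, largest = 2, 1
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1
    # Whatever remains above 1 is itself prime and larger than any factor found.
    return n if n > 1 else largest


if __name__ == "__main__":
    for threads in (24, 32):
        slots, leftover = plan_slots(threads)
        print(threads, "threads ->", slots, "leftover:", leftover)
    for cpus in (8, 10, 26):
        print(cpus, "CPUs -> largest prime factor", largest_prime_factor(cpus))
```

With the default of 8, a 24-thread machine gets three slots and a 32-thread machine gets four, both with nothing left over, which is the layout described above.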
 

Nick Name

Administrator
USA team member
Stopping and restarting didn't fix it; it kept trying to connect to the same server that was down. Changing the cause preference did get work moving again, and there seemed to be plenty of COVID-19 work coming in even with Cancer or Alzheimer's selected. Unfortunately I saw the same problem today, except it looks like they just can't keep up with the demand for work, at least for GPU. I'm switching back to GPUGrid for a while.
 

Nick Name

Administrator
USA team member
I'm getting work again tonight. I've set program exclusions in BOINC and set PrimeGrid to a resource share of zero, and also am only allowing the PPSieve application. That work runs quickly and has a six day deadline, it should accommodate any FAH outage nicely without babysitting. I needed exclusions for FahCore_21.exe and 22. According to BOINC documentation the syntax is not case sensitive. I'd prefer to use GPUGrid as a backup but that work runs longer, and as new work units are created based on returned work I don't want jobs hanging in limbo. I don't really care if the PrimeGrid sieve work times out.

I had to set BOINC to run based on preferences instead of Always like I usually do. If you try a program exclusion and it's not working, that's probably why, assuming there's not a typo.
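For reference, here is a small Python sketch that writes out what such an exclusion file might look like. It assumes the exclusions are expressed as exclusive_gpu_app entries inside the options section of BOINC's cc_config.xml, and it assumes the second core's file name is FahCore_22.exe by analogy with FahCore_21.exe; check both against your own install. The function name is hypothetical.

```python
from xml.sax.saxutils import escape


def cc_config_exclusions(gpu_apps):
    """Return cc_config.xml text that suspends BOINC GPU work for these apps."""
    lines = ["<cc_config>", "  <options>"]
    for exe in gpu_apps:
        # ASSUMPTION: one <exclusive_gpu_app> entry per executable name.
        lines.append(f"    <exclusive_gpu_app>{escape(exe)}</exclusive_gpu_app>")
    lines += ["  </options>", "</cc_config>"]
    return "\n".join(lines)


if __name__ == "__main__":
    # FahCore_22.exe is inferred from "FahCore_21.exe and 22" in the post.
    print(cc_config_exclusions(["FahCore_21.exe", "FahCore_22.exe"]))
```

If you already have a cc_config.xml, merge the entries into its existing options section rather than replacing the file.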
 

BeauZaux

Well-Known Member
USA team member
Jumped in on FAH today. Installed on a Win10 sys pretty easy, but had trouble with Linux Mint, which I'm new to, but worked it out. Somehow put in the wrong team #, but corrected. Running on 6 cores on one LM sys, but it's not using NVIDIA GPU. Is there something in FAH to configure use of CPU and/or GPU?
 

Nick Name

Administrator
USA team member
Awesome, great to have you on board! (y)

If it's like it was for me on Windows, it assumes you want to use every available resource, so it should have set itself up to use every available CPU thread and every available GPU. I haven't had time to play with it in Linux yet, but here's what I did in Windows. Go to your FAHControl if you can and check Configure -> Slots. You should have CPU and GPU slots. If you don't see GPU slots, add them. My guess is that you just aren't getting work because thousands, if not millions, of machines have registered in the last few days and many of the servers are overwhelmed. You can also check the log in FAHControl, and if you see connection errors, you know that's the problem. Example:

02:45:21:WU02:FS00:Connecting to 65.254.110.245:8080
02:45:21:WU02:FS00:Assigned to work server 140.163.4.231
02:45:21:WU02:FS00:Requesting new work unit for slot 00: READY gpu:1:TU102 [GeForce RTX 2080 Ti Rev. A] M 13448 from 140.163.4.231
02:45:21:WU02:FS00:Connecting to 140.163.4.231:8080
02:45:42:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80
02:45:42:WU02:FS00:Connecting to 140.163.4.231:80
02:46:03:ERROR:WU02:FS00:Exception: Failed to connect to 140.163.4.231:80: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

The FAH client and/or assignment servers are also not very good at rolling over to a new server if one isn't responding properly; from what I've observed it will get stuck querying one or two. It took a little doing, mainly to determine which GPU was which, but I've set up exclusions in BOINC in cc_config to accommodate this. I've set PrimeGrid as a backup project, but as these outages are persistent and getting longer I may go back to GPUGrid instead.
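If you'd rather not eyeball the log, a few lines of Python can tally which servers the client keeps failing to reach. This sketch matches only the "Failed to connect to IP:PORT" error shape shown in the log excerpt above; other error wordings would need their own patterns, and the function name is made up.

```python
import re
from collections import Counter

# Matches the ERROR line shape from the excerpt above, e.g.
# "...:ERROR:WU02:FS00:Exception: Failed to connect to 140.163.4.231:80: ..."
FAILED_RE = re.compile(
    r":ERROR:WU\d+:FS\d+:Exception: Failed to connect to ([\d.]+):(\d+)"
)


def failed_servers(lines):
    """Count failed connection attempts per server IP address."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    excerpt = [
        "02:45:21:WU02:FS00:Connecting to 140.163.4.231:8080",
        "02:45:42:WARNING:WU02:FS00:WorkServer connection failed on port 8080 trying 80",
        "02:46:03:ERROR:WU02:FS00:Exception: Failed to connect to 140.163.4.231:80: "
        "A connection attempt failed because the connected party did not properly respond",
    ]
    for server, failures in failed_servers(excerpt).most_common():
        print(server, failures)
```

A server that dominates the tally is a hint that the slot is stuck on it, which is when changing the cause preference seemed to help.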
 

BeauZaux

Well-Known Member
USA team member
From what I read, I'm thinking the Coronavirus project uses CPUs only. My Win machine (no GPU) is running a different project, very slowly.
I don't see Coronavirus as a selectable project. Guess I got lucky.
Found the log and just finished recent Corona project. No new WUs.
Trying to get some friends to join in. Got one on Rosetta. Will hit him to join team.
 

Nick Name

Administrator
USA team member
There is or at least was GPU work. It may have run out by now, I don't know how the job submission works. If it's like BOINC they keep creating work until the project is finished. If they create a fixed number of CPU / GPU tasks, GPU might be gone now with so many people and even some companies signing up. Right now I'm having trouble getting any work, I've got 2 tasks stuck trying to upload and I can tell I had a few hours of downtime during the day.

Oh, you can't select specific projects, only broad categories. They recommend selecting the Any category to get COVID-19 work, but I've gotten it even with selecting Alzheimer's and Cancer.
 