If you are an ANSYS RSM (Remote Solve Manager) user, you’ll find some changes in version 18.0. Most of the changes, which are improvements to the installation and configuration process, are under the hood from a user standpoint. One key change for users, though, is how you monitor a running job. This short entry shows how to do it in version 18.0.
Rather than bringing up the RSM monitor window from the Start menu as was done in prior versions, in 18.0 we launch the RSM job monitor directly from the Workbench window, by clicking on Jobs > Open Job Monitor… as shown here:
When a job has been submitted to RSM for solution on a remote cluster or workstation, it will show up in the resulting Job Monitor window, like this:
Hopefully this saves some effort in trying to figure out where to monitor jobs you have submitted to RSM. Happy solving!
- What actually happens after I submit my job to RSM?
- Where do the files needed to run the solve go?
- How do the files get returned to the client machine, or do they?
- What if something goes wrong with my solve or in the RSM file downloading process? Is there any hope of recovery?
- Are there any recommendations out there for how best to use RSM?
If your question is "How do I set up RSM as a user?", your answers are in a post by Ted Harris. Today's post is a deeper dive into RSM.
The answers to questions 1 through 3 above are really only necessary if you would like to know the answer to question 4. My reason for giving you a greater understanding of the RSM process is so that you can do a better job of troubleshooting should your RSM job run into an issue. Also, please note that this process is specifically for an RSM job submitted for ANSYS Mechanical. I have not tested this yet for a fluid flow run.
What happens when a job gets submitted to RSM?
The following will answer questions 1-3 above.
When a job is run locally (on your machine), ANSYS uses the Solver Files Directory to store and update data. That folder can be found by right clicking on the Solution branch in the Model tree and selecting Open Solver Files Directory.
When a job gets submitted to RSM, the files that are stored in the above folder will be transferred to a series of two temporary directories: one on the client side (where you launched the job from) and one on the compute server side (where the numbers get crunched).
Next, these files get transferred from the client-side temporary directory (the _ProjectScratch directory) to a temporary directory on the compute server. The files in the _ProjectScratch directory will remain there, but the folder will not be updated again until the solve is interrupted or finished.
If you navigate to that directory on your compute server, you will see all of the necessary files needed to run. Depending on your IT structure, you may or may not have access to this directory, but it is there.
Once your run is completed, or you have interrupted it to review intermediate results, and your results have been downloaded and transferred to the solver files folder, both of the temporary directories get cleaned up and removed. That is the basic process that goes on behind the scenes when you submit a job to RSM.
What if something goes wrong with my RSM job? Can I recover my data and re-read it into Workbench?
Recently, I ran into a problem with one of my RSM jobs that resulted in me losing all of the data that had been generated during a two-day run. I haven’t determined the exact cause of this problem, but it did force me to dive into the RSM process and discover what I am sharing with you today. By pinpointing and understanding what goes on after the job is submitted to RSM, I did determine that it can be possible to recover data, but only under certain circumstances and with a certain setup.
First, if you have the “Delete Job Files in Working Directory” box checked in the compute server properties menu accessed from the RSM queue interface (see below) and RSM sees your job as being completed, the answer to the above question is no, you will not be able to recover your data. Essentially, because the compute server is cleaned up and the temporary directory gets deleted, the files are lost.
To avoid lost data and prepare for such a catastrophe, my recommendation is that you or your IT department uncheck the “Delete Job Files in Working Directory” box. That way, you have a backup copy of your files stored on the server that you can delete later when you are sure you have all of your files safely transferred to your solver files folder within your project directory structure.
The downside to having this box unchecked is that you have to manually clean up your server. Your IT department might not like this, or might not even allow it, because it could clutter your server if you do not stay on top of things. But it could be worth the safety net.
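To make the manual cleanup less of a chore, a small script can at least tell you which job folders are candidates for deletion. The following is a minimal sketch, not an ANSYS tool: it assumes each job's files sit in their own subfolder of the compute server working directory, and the path and the 14-day threshold are placeholders I made up for illustration. It only lists folders; it never deletes anything.

```python
import time
from pathlib import Path


def find_stale_job_dirs(root, max_age_days, now=None):
    """Return subdirectories of `root` not modified in the last `max_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    stale = []
    for entry in Path(root).iterdir():
        if entry.is_dir() and entry.stat().st_mtime < cutoff:
            stale.append(entry)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical location; substitute your compute server's working directory.
    root = Path("/path/to/rsm/working/directory")
    if root.is_dir():
        for job_dir in find_stale_job_dirs(root, max_age_days=14):
            print(job_dir)
```

Review the list against the projects you have already downloaded before removing anything by hand.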
As for getting your data back into Workbench, you will need to manually copy the files on the compute server to your solver files folder in your Workbench project directory structure. I explained how to access this folder at the beginning of this post. Once you have copied those files, go back to the Mechanical application and, with the Solution branch of your model tree highlighted, select Tools > Read Results Files… (see below graphic), navigate to your solver files directory, select the *.rst file and read it in.
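If you would rather script the copy step than drag files around by hand, something like the following could work. This is only a sketch under my own assumptions: both folders are reachable from the machine running the script (for example through a mapped network share), the function name is made up, and files already present in the solver files folder are left alone unless you ask to overwrite them.

```python
import shutil
from pathlib import Path


def copy_job_files(server_dir, solver_dir, overwrite=False):
    """Copy regular files from server_dir into solver_dir; return the names copied."""
    server_dir, solver_dir = Path(server_dir), Path(solver_dir)
    solver_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(server_dir.iterdir()):
        if not src.is_file():
            continue  # skip subfolders; only top-level files are copied
        dst = solver_dir / src.name
        if dst.exists() and not overwrite:
            continue  # keep whatever is already in the solver files folder
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
        copied.append(src.name)
    return copied
```

You would call it with the compute server job folder as the first argument and the folder opened by Open Solver Files Directory as the second, then read the *.rst file into Mechanical as described above.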
Though it is possible to run concurrent RSM jobs from the same project, my recommendation is to only run one RSM job at a time from the same project in order to avoid communication or licensing holdups.
Unless you are confident that you will not ever need to recover files, consider unchecking the “Delete Job Files in Working Directory” box in the compute server properties menu.
Note: if you are not allowed access to your compute server temporary directories, you should probably consult your IT department to get approval for this action.
Caution: if you uncheck this box, be sure that you stay on top of cleaning up your compute server once you have your files successfully downloaded.
Depending on your network speed, when your results files get large (>15 GB), be prepared to wait for upload and download times. There is likely activity, but you might not be able to “see” it in the progress information on the RSM output feed. Be patient or work outside of RSM using a batch MAPDL process.
Avoid hitting the “Interrupt Solution” command more than once. I have not verified this, but I believe this can cause miscommunication between the compute server and local machine temporary directories, which can cause RSM to think that there are no files associated with your run to be transferred.
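To get a feel for how long "be patient" might mean for large results files, a back-of-the-envelope estimate helps. The sketch below is plain arithmetic, not anything RSM reports: the link speed is whatever your network provides, and the 0.7 efficiency factor is purely my assumption to account for protocol overhead and shared traffic.

```python
def transfer_time_hours(size_gb, link_mbps, efficiency=0.7):
    """Rough transfer time in hours for size_gb gigabytes over a link_mbps link."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    size_megabits = size_gb * 8 * 1000  # 1 GB = 8000 megabits (decimal units)
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # A 15 GB results file over a 100 Mbps link versus a 1000 Mbps link.
    for mbps in (100, 1000):
        print(mbps, "Mbps:", round(transfer_time_hours(15, mbps) * 60, 1), "minutes")
```

With these assumed numbers, a 15 GB file takes roughly half an hour on a 100 Mbps link, so a quiet progress feed for that long is not necessarily a sign of trouble.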
If you’re not familiar with it, RSM is the ANSYS Remote Solve Manager. In short, it allows you to submit solutions from various ANSYS tools so they can be solved remotely, such as on a compute cluster, remote number cruncher, or perhaps just another computer that isn’t being used very much. Note that no additional licensing or installation is required (other than perhaps ANSYS HPC licensing to take advantage of multiple cores). RSM is installed automatically when ANSYS is installed; it just needs to be configured to be activated.
In versions 14.5 and 15.0, there is a nicely documented Setup Wizard that helps with the setup and configuration of RSM on compute servers. This setup wizard, as well as the rest of the RSM documentation in the ANSYS Help, does a great job of explaining RSM and what must be done to set up and configure it. This Focus entry assumes that your crack IT staff has installed RSM on your compute machine(s) and has decided where the Compute Server will be (it can be on your local machine, on your ‘number cruncher’, or on a different machine). So, our focus here is on what needs to be done as a user to send your solutions off to the remote solver using RSM.
As an example, we have RSM 15.0 configured with the Compute Server on a remote computer named cs3a. The first time running RSM, using Start > All Programs > ANSYS 15.0 > Remote Solve Manager > RSM 15.0, we get the window shown here:
Notice that it only shows our local machine (My Computer) and nothing about the actual remote computer on which we want to solve.
Therefore, we need to add the information on our cluster node which contains the compute server.
To do this, click on Tools > Options. This is the resulting window. Notice the Add button at lower left is grayed out:
Now that a new name has been typed in the Name field, the Add button is active. After clicking Add, we get this:
After clicking OK, we will now see that the new remote computer has been added in the RSM window:
The next step is to set your login password for accessing this computer. Right click on the new hostname in the RSM window in the tree at left, and select Set Password.
Then enter your network login and password information in the resulting window:
If your accounts are fully set up, at this point you can run a test by right clicking on the localhost item in the tree under the remote computer name and selecting Test Server:
If the test is successful, you will see that the test job completed with a green checkmark on the folder icon in the upper right portion of the RSM window:
If your login is not configured properly, you will likely get an error like the one shown below. Notice that the upper right portion now states that the job has failed and there is a red X rather than a green checkmark on the folder icon. By clicking on the job in the upper right panel, we can see the job log in the lower right panel. In this case, it says that the login failed due to an incorrect password.
The fix for the password problem is to ensure that the correct login is being accessed by RSM on the remote computer. This is done from the RSM window by right clicking on the remote computer name and selecting Accounts.
If your account and/or password are different on the remote computer than they are on your local machine, you will need to establish an alternate account so that RSM knows to use the correct login on the remote computer. Right click on your account in the Accounts pane, and select Add Alternate Account:
Enter your username and password for the remote computer in the resulting window. Next, we need to associate that login with localhost on the remote computer. This is done by checking the localhost box in the Compute Servers pane, like this:
Another problem we have seen is that the user doesn’t have permission for ANSYS to write to the default solve directory on the remote computer. In that case, the test job log will have an error like this:
The fix in this case is to establish a solve directory manually, first by creating one on the remote computer, if needed, and second by specifying that RSM use that directory rather than the default. The second step is accomplished in the RSM window by right clicking on the localhost item for the remote computer, then selecting Properties. On the General tab, you should be able to change the Working Directory Location to User Specified, then enter the desired directory location as shown in the image below. If that option is greyed out, either your password for the remote machine has not been entered correctly, or you are not part of the admin group on the remote computer. In the latter case, either your RSM administrator has to do it for you, or you have to be granted admin access.
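Before pointing RSM at a user-specified directory, it can save a failed test job to confirm that your account can actually write there. Here is a minimal sketch of such a check. It is not part of RSM, the function name is my own, and it is only meaningful if you run it on the remote computer under the same login that RSM will use.

```python
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created and written in `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as handle:
            handle.write(b"rsm write test")
        return True
    except OSError:
        # Covers a missing directory as well as a permission failure.
        return False
```

A result of False for your intended solve directory suggests sorting out permissions with your RSM administrator before rerunning Test Server.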
At this point, if the test server runs have completed successfully, you should be ready to try a real solution using RSM. We’ll use Mechanical to show how it’s done. In the Mechanical editor, click on Tools > Solve Process Settings. Here we will need to specify the remote computer and queue we’ll be using for the solution. Click on the Add Remote button:
In the resulting Rename Solve Process Settings window, type in a name for your remote solve option that makes sense to you. We called ours RemoteSolve1. This new option will now show up on the left side of the Solve Process Settings window:
The next step is to type in the name of the Solve Manager over on the right side. In our case, the Solve Manager is on computer cs3a. Any queues that are available to RSM for this Solve Manager will show in the Queue field, after a brief period of time to make the connection. In our case, the only queue is a local queue on cs3a.
We are now ready to solve our Mechanical model remotely, using RSM. Instead of clicking the Solve button in Mechanical, we will click on the drop down arrow to the right of the solve button. From the dropdown, we select the remote solve option we created, RemoteSolve1:
Assuming the solution completes with no errors, this job will show up in the RSM window with a status of Finished when it is done.
The final step in this case is to download the results from the remote computer back to the client machine. In the Mechanical editor, this is done by right clicking on the Solution branch and selecting Get Results as shown below. Also note that you can monitor a nonlinear solution via Solution Information. You’ll just need to right click during solution to have a snapshot of the nonlinear diagnostics brought back from the remote computer.
We hope this helps with the setup and utilization of RSM from a user perspective. There are other options and applications for RSM that we didn’t discuss, but hopefully this is useful for those needing to get ‘over the hump’ in using RSM.