
WIP: Propper and save evaluation of realtime capability #4132

Draft
hdiethelm wants to merge 4 commits into LinuxCNC:master from hdiethelm:halcmd_getrt

Conversation

@hdiethelm

@hdiethelm hdiethelm commented Jun 5, 2026

Contributor

Intended to be used with:
#4107

Will fix is_sim / is_rt which is broken: #4129

Two new pieces of functionality for two use cases:

  1. Not running: check whether the system is actually capable of running RT
  2. Running: query the real-time capability and type

realtime status can be used to check if realtime is running.

The realtime script is extended with the verify command, which returns 0 if the system is RT capable and 1 if not.
It is intended for use both when realtime is running and when it is not.

  • RTAI: Always returns 0
  • uspace: calls rtapi_app getrt and returns the state
    • Not running: rtapi_app performs all the checks and returns immediately
    • Running: rtapi_app calls the master for the real time capability and returns the state
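The exit-status contract described above (0 if RT capable, 1 if not) could be consumed from Python roughly as follows. This is a minimal sketch, not code from the PR: the helper name `is_rt_capable` is hypothetical, and the command is passed in as a parameter because the location of the `realtime` script depends on the install.

```python
import subprocess


def is_rt_capable(command):
    """Run `command` and map its exit status to a bool.

    Assumes the contract described in this PR: exit status 0 means
    RT capable, any other status means not RT capable.
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0
```

A caller would pass something like `["realtime", "verify"]`, assuming the script is reachable on the PATH; a packaged install may need the full path instead.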

There is a new function, hal_get_realtime_type(), which returns the type of the currently running realtime system through the HAL.

is_sim / is_rt use realtime verify at init.

rtapi_is_realtime() is deprecated: it has only worked in realtime context since #3964 and was never 100% reliable, which the doc also states.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

I do think it to be very problematic that getting the RT status is so involved. There should be a simple test that does not involve instantiating larger parts of infrastructure.

@hdiethelm

hdiethelm commented Jun 5, 2026

Contributor Author

:-D The CI has no realtime:
ERROR MOTION: no realtime detected.

It is somewhat of a chicken-and-egg problem. Not involving the RT infrastructure needs separate code, which can always get out of sync with RT. Involving RT runs a lot of stuff. Let's think about this. This is exactly the issue here: #4129

What do you think about a parameter / pin? So you can use getp / gets to ask for realtime?

Or I could add a new command path to halcmd / rtapi_app that does not start the hal if it is not yet running.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

Running non-RT is not an error per se (see CI). Therefore, you cannot and must not "simply" or "blindly" force one or the other.

There are use-cases where you want to know the RT status and that does not always mean that you will or will not be running either. Finding out what the RT status is or will be must be lightweight and may be different from where you ask. Doing it in a component or from the cmd-line may be different, depending how you ask and with what intention you ask.

@grandixximo

grandixximo commented Jun 6, 2026

Contributor

Both CI failures are small:

  • rip-and-test-clang: has_setuid_root is unused on the default uspace build (the detect_* callers are all behind USPACE_RTAI/XENOMAI ifdefs, so the stubs never call it), and -Werror=unused-function kills it. [[maybe_unused]] or moving it inside the ifdefs fixes it.
  • rip-rtai: hal.h:240 extern rtapi_realtime_status_t hal_realtime_status(); needs (void). The RTAI kernel-module build uses -Werror=strict-prototypes, so the empty () fails (clang on the uspace build let it through). Same on the definition in hal_lib.c.

Two runtime things I noticed while reading:

  • rtapi_realtime_status() returns LXRT for detect_rtai(), but makeApp() only handles RTAI, so a uspace RTAI build hits the final else and sets app = nullptr.
  • If can_set_sched_fifo() succeeds but the kernel string matches none of the markers, it falls through to NONE and runs SCHED_OTHER. The old code treated SCHED_FIFO success alone as realtime, so this is a regression on plain-PREEMPT/generic kernels; the fallback there probably wants to stay RT-capable.
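The fallback described in the second point can be sketched as a small decision function. The names below (`RealtimeType`, `MARKERS`, `classify`) and the marker string are illustrative assumptions and do not mirror the PR's actual identifiers or kernel-string checks; the sketch only shows the ordering of the checks.

```python
from enum import IntEnum


class RealtimeType(IntEnum):
    # Hypothetical values for illustration; not the PR's enum.
    NONE = 0
    PREEMPT_RT = 1
    GENERIC_FIFO = 2


# Hypothetical marker table: substrings of the kernel version string
# that identify a specific realtime flavour.
MARKERS = {"PREEMPT_RT": RealtimeType.PREEMPT_RT}


def classify(can_set_sched_fifo, kernel_version):
    """Pick a realtime type from two already-computed inputs."""
    if not can_set_sched_fifo:
        # Without SCHED_FIFO there is no realtime scheduling at all.
        return RealtimeType.NONE
    for marker, rt_type in MARKERS.items():
        if marker in kernel_version:
            return rt_type
    # SCHED_FIFO works but no marker matched: stay RT-capable instead
    # of falling through to NONE.
    return RealtimeType.GENERIC_FIFO
```

With this ordering, a plain-PREEMPT or generic kernel where SCHED_FIFO succeeds still ends up with a realtime type rather than NONE.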

Minor: REALTIME_STATUS_PREEMT_* is missing a P (PREEMPT), and detect_preempt_dynamic() tests the same string on both sides of the ||.

@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 2 times, most recently from b73daaf to 5dfd303 Compare June 6, 2026 12:22
@hdiethelm

Contributor Author

Thanks. Fixed.

One quirk: RTAI in userspace is called LXRT. I should rename this consistently.
https://www.rtai.org/userfiles/documentation/magma/html/api/group__lxrt.html#details

Comment thread src/hal/halmodule.cc Outdated
Comment thread src/hal/halmodule.cc Outdated
Comment thread src/rtapi/rtapi.h Outdated
@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 3 times, most recently from 7f6e60d to fbf2b81 Compare June 6, 2026 13:18
@hdiethelm

Contributor Author

Hmm, time for a break; too many force pushes, sorry. I will continue tomorrow.

So you are OK with the general concept? Then I will polish it up, update the doc and do some more testing in different combinations.

Open:

  • Python enum
  • Python is_sim / is_rt: I would prefer just removing them
    • Alternative: a dynamic property, but this is also half-breaking because the unknown state has to be handled. I could throw an error in this case.

Deferred:

  • rtapi_app start / stop behaviour

@grandixximo

Contributor

The CI fixes and the LXRT / fallback handling look good now.

On naming, with @BsAtHome's #4099 in mind: rtapi_get_realtime_type() is consistent with the existing rtapi_get_* getters. The hal-side hal_get_realtime_type() is the one I'd reconsider, since #4099 is standardizing hal_get_<datatype>(ref) (e.g. hal_get_si32(ref), hal_get_bool(ref)) where the suffix is a HAL data type and it takes a typed ref. A parameterless hal_get_realtime_type() reads off-pattern in that family; something like hal_realtime_type() would keep hal_get_* reserved for the value accessors. @BsAtHome owns that convention, so up to him.

For the two open how-tos:

  • Dynamic is_rt/is_sim without breaking the API: lib/python/hal.py already wraps _hal (from _hal import *), so a PEP 562 module __getattr__ there gives live values: return get_realtime_type() > 0 for is_rt and <= 0 for is_sim. Existing callers (pncconf, stepconf) keep working; the one change is that before rtapi_app runs the value is -1, so is_rt reads False / is_sim True.
  • Exposing the enum to Python: one PyModule_AddIntConstant(m, "REALTIME_TYPE_NONE", REALTIME_TYPE_NONE) per value, consistent with the existing HAL_BIT etc. constants.
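The PEP 562 suggestion in the first point can be illustrated with a stand-in module. This is a sketch under stated assumptions: `make_hal_like_module` and the module name `hal_sketch` are hypothetical, and `get_realtime_type` is injected as a parameter so the example runs without LinuxCNC. In a real `hal.py` the hook would simply be a top-level `def __getattr__(name)`.

```python
import types


def make_hal_like_module(get_realtime_type):
    """Build a stand-in module whose is_rt / is_sim are computed on access.

    `get_realtime_type` stands in for the function the real module
    would call; it is injected here to keep the sketch self-contained.
    """
    module = types.ModuleType("hal_sketch")

    def module_getattr(name):
        # PEP 562: called only when normal attribute lookup fails.
        if name == "is_rt":
            return get_realtime_type() > 0
        if name == "is_sim":
            return get_realtime_type() <= 0
        raise AttributeError(
            f"module {module.__name__!r} has no attribute {name!r}"
        )

    # Storing the hook in the module namespace enables dynamic lookup.
    module.__getattr__ = module_getattr
    return module
```

Because the values are computed on every access, a caller that reads `is_rt` before realtime is up sees `False`, and sees the live value once the reported type becomes positive.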

@BsAtHome

BsAtHome commented Jun 6, 2026

Contributor

something like hal_realtime_type() would keep hal_get_* reserved for the value accessors. @BsAtHome owns that convention, so up to him.

Well, I don't own it. However, I agree with the argument to leave the get/set moniker to the hal pin/param data access.

@hdiethelm

hdiethelm commented Jun 6, 2026

Contributor Author

The last few commits are still figuring out ways to solve all issues, not ready for code style review yet.

@grandixximo Thanks a lot for the hints. Helps a lot not to have to search everything.

  • hal_get_realtime_type: There is also hal_get_param_value_by_name and so on, so it is also consistent.
  • Python enum: Done
  • is_rt / is_sim: That one was annoying. First, I need to create a component if there is none yet, plus a lot of error handling: bdd3ef9. Then I discovered that stepconf.py / stepconf.py need is_sim before realtime is up and running. So I rolled back and just call realtime verify at init: cdbace2. However, this is now fully backwards compatible; no side effects expected.
  • I left the Python is_initialized function from ^ in; it might be useful. Many functions can only be used if there is already a component, so you can check whether there is one and create one if needed. hal.component_exists() also needs a component to succeed. I tried... ;-)

@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 2 times, most recently from 6477a97 to af73e08 Compare June 6, 2026 22:13
@hdiethelm

Contributor Author

And of course, after a Debian package install, the realtime script is no longer in the PATH. There was also no define for the path where it is located. af73e08 adds one.

For scripts, REALTIME=@REALTIME@ is used, so it is available there.

@hdiethelm

Contributor Author

Because error handling is annoying when you do not know whether rtapi_app is running:
579f7ed
Let's see how much fallout this generates...

It now works exactly the same as RTAI: realtime start is now needed before every halcmd involving realtime.
Everything else, including halrun / linuxcnc etc., works as before as far as I have tested.

And it generated a new ToDo: clean up the realtime script. It is a mess. A lot of RTAI-only parts are executed in uspace mode, like loading an empty list of modules and so on. So far, I just added the things I needed.

@grandixximo

Contributor

Two notes after the latest push.

Naming: you're right, I'll withdraw my concern. hal_get_realtime_type() matches the existing hal_get_lock(), which is already a parameterless global-state getter, so hal_get_* isn't exclusively the typed-ref family. Leave it as is.

CI / auto-start: the two failures (raster, hal-show) come from dropping the uspace auto-start of rtapi_app on first loadrt. raster runs halcmd -f raster.hal (a loadrt with no realtime start) and now gets No master found. That breaks standalone halcmd -f *.hal scripts in general, so I'd suggest not hard-breaking it: keep auto-start working but emit a one-time deprecation warning pointing at realtime start. That keeps existing scripts working and gives a migration path. Then migrate our own scripts/configs/tests to call realtime start explicitly so the tree models the new idiom (and update the expected outputs, since a stderr warning will otherwise trip output-comparison tests like hal-show).

Separately: is dropping the auto-start actually needed for the RT-status goal, or is it a cleanup that could be its own PR? Keeping them decoupled would let the getrt/realtime_type work land without the broader behavior change.

@hdiethelm

Contributor Author

If these tests fail due to a missing realtime start, they most probably also fail with RTAI; I have to test it.

It is not needed in this PR. However, as soon as you want to use hal_realtime_type() in many places, it will help a lot, because realtime needs to be started beforehand.

I will move it to a separate PR as initially planned; this one is already getting big.

@hdiethelm

Contributor Author

^Rebased to master / removed rtapi_app: start command / do not autostart

This commit lives on a new branch to create a PR later: https://github.com/hdiethelm/linuxcnc-fork/tree/rtapi_no_autostart I will wait with this until the PR at hand is merged. There are some conflicts if I rebase rtapi_no_autostart to main.

@hdiethelm

Contributor Author

So, it is slowly taking shape but still not final. I updated the doc and cleaned up some parts.

Some things I am considering:

  • hal_get_realtime_type() returns an rtapi type. A typedef rtapi_realtime_type_t hal_realtime_type_t would clean this up.
  • rtapi_get_realtime_type() should only be used in hal.c. User / rt components should all use hal_get_realtime_type(). This would need a new header file rtapi_hal_priv.h to enforce this.

Before merge: The history is a mess. Should I squash it to a single commit or more? The commits could be:

  • python is_initialized
  • rtapi_app getrt / rtapi_get_realtime_type() / deprecate rtapi_is_realtime() / python is_sim + is_rt fix
  • hal_get_realtime_type() / pass through HAL

@hdiethelm hdiethelm changed the title WIP: New halcmd getrt WIP: Propper and save evaluation of realtime capability Jun 7, 2026
@grandixximo

grandixximo commented Jun 7, 2026

Contributor

Please squash 😁

3 commits is clean

hdiethelm added 4 commits June 7, 2026 16:34
This helps to check if hal functions can be used or if a component needs
to be created first.
…_app

Python is_rt / is_sim now use "realtime verify" which uses
"rtapi_app check_rt" in uspace / returns true in rtai.

If realtime not running: rtapi_app performs the checks and returns the
state

If realtime running: rtapi_app calls master to perform the check and
returns the state
The same checks are now always performed the same way. If something is not
properly checked, makeApp() will fail instead of just choosing a
different RT implementation by itself.

New function: rtapi_realtime_type_t rtapi_get_realtime_type(void)
rtapi_is_realtime() was always unreliable. hal_get_realtime_type()
now returns the true running realtime type through the HAL for user and
realtime components.

This function is also exposed through Python hal.
@hdiethelm

hdiethelm commented Jun 7, 2026

Contributor Author

So, squashed and cleaned up. New:

  • renamed getrt to check_rt
  • hal_realtime_type_t
  • rtapi_app only shows the realtime issue report once when started in master mode
  • RTAPI_HAL_PRIV to make rtapi_is_realtime() / rtapi_get_realtime_type() unavailable except in hal / rtapi

ToDo:

  • Review
  • Test on many different combinations: setuid / setcap / rtai / uspace and so on

and update the expected outputs, since a stderr warning will otherwise trip output-comparison tests like hal-show

I will look into this. Is there a script to do this?

@grandixximo

Contributor

No dedicated rebase script, it's manual. Run runtests -n (the -n keeps the temp files instead of cleaning up), then for each test that uses an expected file cp <testdir>/result <testdir>/expected once you've confirmed the new output is correct. Tests that use a checkresult script instead of expected don't apply. For this PR it's probably moot anyway since the auto-start warning moved to the separate PR and CI is green; it'll matter there.
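The manual copy step could be scripted along these lines. This is only a sketch: `refresh_expected` is a hypothetical helper, not part of the LinuxCNC tree, and it assumes `runtests -n` has left a `result` file next to each `expected` file as described above. It should only be run after the new output has been confirmed as correct.

```python
import shutil
from pathlib import Path


def refresh_expected(tests_root):
    """Copy result over expected in every directory that has both files.

    Returns the list of updated directories. Directories without an
    `expected` file (for example tests that use a checkresult script)
    are left alone.
    """
    updated = []
    for result in sorted(Path(tests_root).rglob("result")):
        expected = result.with_name("expected")
        if result.is_file() and expected.is_file():
            shutil.copyfile(result, expected)
            updated.append(result.parent)
    return updated
```

Returning the list of touched directories makes it easy to review the change with `git diff` before committing.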

grandixximo added a commit to grandixximo/linuxcnc that referenced this pull request Jun 8, 2026
…istic

Query realtime status with 'realtime verify' (from LinuxCNC#4132) rather than
probing the setuid bit. latency-histogram asks the realtime layer
directly; latency-test relies on the existing "POSIX non-realtime" note.
grandixximo added a commit to grandixximo/linuxcnc that referenced this pull request Jun 10, 2026
…istic

Query realtime status with 'realtime verify' (from LinuxCNC#4132) rather than
probing the setuid bit. latency-histogram asks the realtime layer
directly; latency-test relies on the existing "POSIX non-realtime" note.
