Skip to content

batch wsh:run tevents by hour#3181

Open
sawka wants to merge 1 commit intomainfrom
sawka/batch-wsh-tevents
Open

batch wsh:run tevents by hour#3181
sawka wants to merge 1 commit intomainfrom
sawka/batch-wsh-tevents

Conversation

@sawka
Copy link
Copy Markdown
Member

@sawka sawka commented Apr 3, 2026

No description provided.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 3, 2026

Walkthrough

This pull request refactors WSH (Windows Shell) telemetry tracking from a boolean error indicator to count-based metrics with aggregation. The TypeScript type definitions and Go struct are updated to replace the WshHadError boolean field with WshErrorCount and WshCount numeric fields. A new aggregation function updateWshRunTEvent is introduced to consolidate wsh:run telemetry events within hourly time buckets, keyed by timestamp, event name, and command. The telemetry recording flow is updated to route WSH events to this aggregation logic, and error handling in WshActivityCommand is modified to set the error count and command name instead of just flagging an error.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive No description was provided, making it impossible to assess relevance to the changeset. Add a pull request description explaining the motivation, approach, and any migration considerations for batching telemetry events by hour.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: converting wsh:run telemetry events to be batched by hour instead of individual logging.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sawka/batch-wsh-tevents

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying waveterm with  Cloudflare Pages  Cloudflare Pages

Latest commit: 7c5eaca
Status: ✅  Deploy successful!
Preview URL: https://f8e2fc3e.waveterm.pages.dev
Branch Preview URL: https://sawka-batch-wsh-tevents.waveterm.pages.dev

View logs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/wshrpc/wshserver/wshserver.go (1)

1326-1350: ⚠️ Potential issue | 🟠 Major

Emit telemetry per validated entry; current loop can misattribute/drop events.

Line 1327-Line 1335 deletes invalid entries but still processes them, and the single post-loop emit on Line 1347 means only one nondeterministic key is recorded when multiple commands are present.

Proposed fix
 func (ws *WshServer) WshActivityCommand(ctx context.Context, data map[string]int) error {
 	if len(data) == 0 {
 		return nil
 	}
-	props := telemetrydata.TEventProps{}
+	filtered := make(map[string]int, len(data))
 	for key, value := range data {
-		if len(key) > 20 {
-			delete(data, key)
-		}
-		if !wshActivityRe.MatchString(key) {
-			delete(data, key)
-		}
-		if value != 1 {
-			delete(data, key)
-		}
+		if len(key) > 20 || !wshActivityRe.MatchString(key) || value != 1 {
+			continue
+		}
+		filtered[key] = value
+
+		props := telemetrydata.TEventProps{WshCount: value}
 		if strings.HasSuffix(key, "#error") {
 			props.WshCmd = strings.TrimSuffix(key, "#error")
-			props.WshErrorCount = 1
+			props.WshErrorCount = value
 		} else {
 			props.WshCmd = key
 		}
+		telemetry.GoRecordTEventWrap(&telemetrydata.TEvent{
+			Event: telemetry.WshRunEventName,
+			Props: props,
+		})
+	}
+	if len(filtered) == 0 {
+		return nil
 	}
 	activityUpdate := wshrpc.ActivityUpdate{
-		WshCmds: data,
+		WshCmds: filtered,
 	}
 	telemetry.GoUpdateActivityWrap(activityUpdate, "wsh-activity")
-	telemetry.GoRecordTEventWrap(&telemetrydata.TEvent{
-		Event: telemetry.WshRunEventName,
-		Props: props,
-	})
 	return nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/wshrpc/wshserver/wshserver.go` around lines 1326 - 1350, The loop over
data currently deletes invalid entries but still uses the mutated map and then
emits a single ActivityUpdate and telemetry.TEvent for props after the loop,
causing misattribution or loss when multiple valid keys exist; instead, iterate
over the original map entries and for each entry that passes all validations
(use wshActivityRe.MatchString(key), value == 1, len(key) <= 20), build
per-entry props (set props.WshCmd and props.WshErrorCount appropriately) and
call telemetry.GoUpdateActivityWrap(ActivityUpdate{WshCmds: map[string]int{key:
value}}, "wsh-activity") and
telemetry.GoRecordTEventWrap(&telemetrydata.TEvent{Event:
telemetry.WshRunEventName, Props: props}) for each validated entry so each
command produces its own telemetry and activity update (use the existing
symbols: data, wshActivityRe, props, ActivityUpdate,
telemetry.GoUpdateActivityWrap, telemetry.GoRecordTEventWrap).
🧹 Nitpick comments (1)
pkg/telemetry/telemetry.go (1)

200-200: Comment is stale after schema migration.

The comment still says (cmd, haderror) but aggregation key is now command with count/error-count fields.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/telemetry/telemetry.go` at line 200, Update the stale comment that reads
"aggregates wsh:run events per (cmd, haderror) key within the current 1-hour
bucket" to reflect the new schema: state that aggregation is keyed by command
and the bucket holds count and error_count fields (or similar names used in the
code), e.g., "aggregates wsh:run events per command into 1-hour buckets with
fields for total count and error_count"; make this change near the aggregation
logic for wsh:run so it accurately describes the current data shape.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/telemetry/telemetry.go`:
- Around line 207-233: The current read-then-insert in the wstore.WithTx block
(the SELECT uuid ... conditional insert around WshRunEventName handling) allows
a race where concurrent writers both insert duplicate hourly rows; add a
DB-level UNIQUE constraint for the bucket identity used here (the tuple for the
hourly bucket: ts, event, and the wsh:cmd identity extracted from props) and
replace the read-then-insert logic with a single atomic UPSERT that updates the
existing row's props to increment WshCount and WshErrorCount (and merges props)
on conflict; locate the branch using tx.GetString(`SELECT uuid FROM db_tevent
...`) / tx.Exec(...) and change it to an INSERT ... ON CONFLICT(...) DO UPDATE
(or equivalent upsert supported by the DB) that atomically increments
telemetrydata.TEventProps.WshCount and WshErrorCount and updates props via
dbutil.QuickJson (or use JSON functions) instead of separate SELECT/UPDATE.

---

Outside diff comments:
In `@pkg/wshrpc/wshserver/wshserver.go`:
- Around line 1326-1350: The loop over data currently deletes invalid entries
but still uses the mutated map and then emits a single ActivityUpdate and
telemetry.TEvent for props after the loop, causing misattribution or loss when
multiple valid keys exist; instead, iterate over the original map entries and
for each entry that passes all validations (use wshActivityRe.MatchString(key),
value == 1, len(key) <= 20), build per-entry props (set props.WshCmd and
props.WshErrorCount appropriately) and call
telemetry.GoUpdateActivityWrap(ActivityUpdate{WshCmds: map[string]int{key:
value}}, "wsh-activity") and
telemetry.GoRecordTEventWrap(&telemetrydata.TEvent{Event:
telemetry.WshRunEventName, Props: props}) for each validated entry so each
command produces its own telemetry and activity update (use the existing
symbols: data, wshActivityRe, props, ActivityUpdate,
telemetry.GoUpdateActivityWrap, telemetry.GoRecordTEventWrap).

---

Nitpick comments:
In `@pkg/telemetry/telemetry.go`:
- Line 200: Update the stale comment that reads "aggregates wsh:run events per
(cmd, haderror) key within the current 1-hour bucket" to reflect the new schema:
state that aggregation is keyed by command and the bucket holds count and
error_count fields (or similar names used in the code), e.g., "aggregates
wsh:run events per command into 1-hour buckets with fields for total count and
error_count"; make this change near the aggregation logic for wsh:run so it
accurately describes the current data shape.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: fb1f0bec-b14b-472a-a0c7-170827cd3ca3

📥 Commits

Reviewing files that changed from the base of the PR and between 388b4c9 and 7c5eaca.

📒 Files selected for processing (4)
  • frontend/types/gotypes.d.ts
  • pkg/telemetry/telemetry.go
  • pkg/telemetry/telemetrydata/telemetrydata.go
  • pkg/wshrpc/wshserver/wshserver.go

Comment on lines +207 to +233
return wstore.WithTx(ctx, func(tx *wstore.TxWrap) error {
uuidStr := tx.GetString(
`SELECT uuid FROM db_tevent WHERE ts = ? AND event = ? AND json_extract(props, '$."wsh:cmd"') IS ?`,
eventTs.UnixMilli(), WshRunEventName, tevent.Props.WshCmd,
)
if uuidStr != "" {
var curProps telemetrydata.TEventProps
rawProps := tx.GetString(`SELECT props FROM db_tevent WHERE uuid = ?`, uuidStr)
if rawProps != "" {
if err := json.Unmarshal([]byte(rawProps), &curProps); err != nil {
log.Printf("error unmarshalling wsh:run props: %v\n", err)
}
}
curCount := curProps.WshCount
if curCount <= 0 {
curCount = 1
}
curProps.WshCount = curCount + incomingCount
curProps.WshErrorCount += tevent.Props.WshErrorCount
tx.Exec(`UPDATE db_tevent SET props = ? WHERE uuid = ?`, dbutil.QuickJson(curProps), uuidStr)
} else {
newProps := tevent.Props
newProps.WshCount = incomingCount
tsLocal := utilfn.ConvertToWallClockPT(eventTs).Format(time.RFC3339)
tx.Exec(`INSERT INTO db_tevent (uuid, ts, tslocal, event, props) VALUES (?, ?, ?, ?, ?)`,
uuid.New().String(), eventTs.UnixMilli(), tsLocal, WshRunEventName, dbutil.QuickJson(newProps))
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Race window can create duplicate hourly rows for the same wsh:cmd.

Line 208 does a read-before-write (SELECT uuid ...) and then conditionally inserts. Concurrent writers can both miss and both insert, breaking aggregation guarantees for a (ts, event, cmd) bucket.

A robust fix is to enforce a DB-level unique key for the bucket identity and switch this path to an atomic UPSERT that increments wsh:count/wsh:errorcount.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/telemetry/telemetry.go` around lines 207 - 233, The current
read-then-insert in the wstore.WithTx block (the SELECT uuid ... conditional
insert around WshRunEventName handling) allows a race where concurrent writers
both insert duplicate hourly rows; add a DB-level UNIQUE constraint for the
bucket identity used here (the tuple for the hourly bucket: ts, event, and the
wsh:cmd identity extracted from props) and replace the read-then-insert logic
with a single atomic UPSERT that updates the existing row's props to increment
WshCount and WshErrorCount (and merges props) on conflict; locate the branch
using tx.GetString(`SELECT uuid FROM db_tevent ...`) / tx.Exec(...) and change
it to an INSERT ... ON CONFLICT(...) DO UPDATE (or equivalent upsert supported
by the DB) that atomically increments telemetrydata.TEventProps.WshCount and
WshErrorCount and updates props via dbutil.QuickJson (or use JSON functions)
instead of separate SELECT/UPDATE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant