
Predicts complete words from prefixes using vector-based word similarity.


blacklanternsecurity/wordpredictor


Word Predictor

Word Prediction Tool

This project provides a simple tool for training and testing a lightweight word prediction model. The model is built from word frequency data extracted with Word2Vec.

It is used by BBOT in the ffuf_shortnames module.

Sample models are included in the trained_models folder. These were trained on data harvested from the Common Crawl project.


Features

  • Train a Model: Train a Word2Vec model on a text corpus and extract word frequency data into a lightweight format.
  • Predict Words: Retrieve likely word completions for a given prefix using a given pre-trained model, ranked by frequency.
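The Features list says training distills the Word2Vec output into a lightweight frequency format. Here is a minimal sketch of that idea. It assumes a plain word-to-count table stored as JSON. The real .pred layout is not documented in this README, and save_frequencies and load_frequencies are hypothetical helpers, not the project's code.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


# Hypothetical on-disk layout: one JSON object mapping word -> frequency.
# The actual .pred format produced by wordpredictor is not described here.
def save_frequencies(freqs: Dict[str, int], path: Path) -> None:
    """Write a word -> frequency table as compact JSON (no extra whitespace)."""
    path.write_text(json.dumps(freqs, separators=(",", ":")), encoding="utf-8")


def load_frequencies(path: Path) -> Dict[str, int]:
    """Read a table written by save_frequencies, coercing counts to int."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return {str(word): int(count) for word, count in data.items()}


if __name__ == "__main__":
    freqs = {"prefix": 12, "preview": 7, "press": 3}
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "words.json"
        save_frequencies(freqs, out)
        print(load_frequencies(out) == freqs)  # True
```

Storing only counts, rather than full embedding vectors, is one way such a file can stay small.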

Installation

This project uses Poetry for dependency management.

Prerequisites

  • Python 3.9 or higher
  • Poetry (install with pip install poetry if not already installed)

Steps

  1. Clone the repository:

    git clone https://github.com/blacklanternsecurity/wordpredictor.git
    cd wordpredictor
  2. Install dependencies:

    poetry install
  3. Activate the virtual environment:

    poetry shell

    Alternatively, skip activating the shell and prefix each command with poetry run:

    poetry run python3 wordpredictor.py
    

Usage

The tool supports two modes: train and test.

1. Training Mode

Train a Word2Vec-based word predictor on a custom text file.

Command

poetry run word-predictor train <file_path> [--min_count <value>] [--debug]

Arguments

  • <file_path>: Path to the input text file containing words (one per line).
  • --min_count: Minimum frequency for a word to be included in the vocabulary (default: 2).
  • --debug: Enable debug mode to print tokens during training.

Example

poetry run word-predictor train words.txt --min_count 5
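Training mode reads one word per line and drops words that appear fewer than --min_count times. The Word2Vec training itself is out of scope here. The sketch below shows only the counting and thresholding step. It assumes words are stripped and lowercased, but the README does not say whether the tool normalizes case. build_frequencies is a hypothetical helper, not part of the project.

```python
from collections import Counter
from typing import Dict, Iterable


def build_frequencies(
    lines: Iterable[str], min_count: int = 2, debug: bool = False
) -> Dict[str, int]:
    """Count words (one per line), keeping those seen at least min_count times.

    Assumption: tokens are whitespace-stripped and lowercased.
    """
    counts: Counter = Counter()
    for line in lines:
        token = line.strip().lower()
        if not token:
            continue  # skip blank lines
        if debug:
            print(token)  # loosely mirrors --debug printing tokens
        counts[token] += 1
    # Apply the vocabulary threshold, as --min_count does (default 2).
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = ["Admin", "admin", "backup", "admin", "backup", "config"]
    print(build_frequencies(sample, min_count=2))  # {'admin': 3, 'backup': 2}
```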

2. Testing Mode

Test the predictor by retrieving predictions for a given prefix.

Command

poetry run word-predictor test <model_path> --prefix <prefix> --n <top_n>

Arguments

  • <model_path>: Path to the trained .pred file.
  • --prefix: Prefix to predict words for.
  • --n: Number of top predictions to retrieve.

Example

poetry run word-predictor test words.pred --prefix "pre" --n 5
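Test mode returns the top --n words for a --prefix, ranked by frequency. Here is a minimal sketch of that lookup. It assumes the model has already been loaded into a plain word-to-count dict. predict is a hypothetical function, and the sketch ignores any vector-similarity step the real tool may apply.

```python
import bisect
import heapq
from typing import Dict, List, Tuple


def predict(freqs: Dict[str, int], prefix: str, n: int) -> List[Tuple[str, int]]:
    """Return up to n (word, count) pairs starting with prefix, most frequent first."""
    # Sorted order puts all words sharing a prefix in one contiguous run.
    # (Sorting on every call keeps the sketch short; cache it for real use.)
    vocab = sorted(freqs)
    i = bisect.bisect_left(vocab, prefix)
    matches = []
    while i < len(vocab) and vocab[i].startswith(prefix):
        matches.append(vocab[i])
        i += 1
    # Highest count first; ties broken alphabetically for stable output.
    top = heapq.nsmallest(n, matches, key=lambda w: (-freqs[w], w))
    return [(w, freqs[w]) for w in top]


if __name__ == "__main__":
    model = {"prefix": 12, "preview": 7, "press": 3, "post": 20, "pre": 1}
    print(predict(model, "pre", 3))
    # [('prefix', 12), ('preview', 7), ('press', 3)]
```

Because the vocabulary is sorted, bisect locates the start of the matching run in O(log V) time. For repeated queries, sort once and reuse the list instead of sorting on each call.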
