This project provides a simple tool for training and testing a lightweight word-prediction model based on Word2Vec word-frequency analysis.
It is used by BBOT within the ffuf_shortnames module.
Sample models are included in the `trained_models` folder. These were trained on data harvested from the Common Crawl project.
- Train a Model: Train a Word2Vec model on a text corpus and extract its word-frequency data into a lightweight format.
- Predict Words: Retrieve likely word completions for a given prefix from a pre-trained model, ranked by frequency.
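The core idea behind both features can be sketched in a few lines: count how often each word appears in a corpus, then answer a prefix query with the most frequent matches. This is a minimal illustration only; the function names and the in-memory table are hypothetical, not the tool's actual `.pred` format.

```python
from collections import Counter

def build_frequency_table(words):
    # Count how often each word appears in the corpus.
    return Counter(words)

def predict(freq_table, prefix, n=5):
    # Return up to n words that start with `prefix`, most frequent first.
    matches = Counter({w: c for w, c in freq_table.items() if w.startswith(prefix)})
    return [w for w, _ in matches.most_common(n)]

corpus = ["prefix", "predict", "prefix", "apple", "press", "prefix"]
table = build_frequency_table(corpus)
print(predict(table, "pre", n=2))  # "prefix" (seen 3 times) ranks first
```

Ranking by raw frequency is what makes the format lightweight: only word counts need to be stored, not full embedding vectors.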
This project uses Poetry for dependency management.
- Python 3.9 or higher
- Poetry (install with `pip install poetry` if not already installed)
- Clone the repository:

  ```shell
  git clone https://github.com/yourusername/word-predictor.git
  cd word-predictor
  ```
- Install dependencies:

  ```shell
  poetry install
  ```
- Activate the virtual environment:

  ```shell
  poetry shell
  ```

  Or run commands with `poetry run`:

  ```shell
  poetry run python3 wordpredictor.py
  ```
The tool supports two modes: `train` and `test`.
Train a Word2Vec-based word predictor on a custom text file.
```shell
poetry run word-predictor train <file_path> [--min_count <value>] [--debug]
```

- `<file_path>`: Path to the input text file containing words (one per line).
- `--min_count`: Minimum frequency for a word to be included in the vocabulary (default: 2).
- `--debug`: Enable debug mode to print tokens during training.

```shell
poetry run word-predictor train words.txt --min_count 5
```
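The effect of `--min_count` is to drop rare words from the vocabulary before the model is built, which keeps the output small and filters out noise. A minimal sketch of that filtering step, assuming simple pre-tokenized input (the function name is hypothetical):

```python
from collections import Counter

def build_vocab(tokens, min_count=2):
    # Keep only words that occur at least `min_count` times.
    counts = Counter(tokens)
    return {w: c for w, c in counts.items() if c >= min_count}

tokens = ["admin", "admin", "admin", "login", "login", "backup"]
print(build_vocab(tokens, min_count=2))  # "backup" (seen once) is dropped
```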
Test the predictor by retrieving predictions for a given prefix.
```shell
poetry run word-predictor test <model_path> --prefix <prefix> --n <top_n>
```

- `<model_path>`: Path to the trained `.pred` file.
- `--prefix`: Prefix to predict words for.
- `--n`: Number of top predictions to retrieve.

```shell
poetry run word-predictor test words.pred --prefix "pre" --n 5
```
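The `.pred` file's internal layout is not documented here, but as a design illustration: if a model stores its vocabulary as a sorted list, a prefix query can be answered with binary search instead of scanning every word. A sketch under that assumption (the function name is hypothetical):

```python
import bisect

def prefix_range(sorted_words, prefix):
    # Binary-search the slice of words sharing `prefix`; "\uffff" acts
    # as a sentinel that sorts after any continuation of the prefix.
    lo = bisect.bisect_left(sorted_words, prefix)
    hi = bisect.bisect_right(sorted_words, prefix + "\uffff")
    return sorted_words[lo:hi]

words = sorted(["predict", "prefix", "press", "apple", "print"])
print(prefix_range(words, "pre"))  # ['predict', 'prefix', 'press']
```

This makes each lookup O(log n) in the vocabulary size, which matters when the model is queried repeatedly, as in fuzzing workflows.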