Skills Nest

Found 88 skills
X-Trend Architecture

Implementation guide for X-Trend (Cross Attentive Time-Series Trend Network), combining LSTM encoders, multi-head attention mechanisms, and few-shot learning for trend-following trading strategies with interpretable predictions.

Unsloth Full Fine-Tuning

Performs full fine-tuning (FFT) with 100% exact weight updates using Unsloth's optimized gradient checkpointing, enabling larger batch sizes and complete model modification for base model pre-training and continued pre-training tasks.

Type Hint Adder

Add comprehensive type hints to Python functions and methods with special support for PyTorch tensor types. Improves code maintainability, enables static type checking with mypy, and provides better IDE support for scientific computing code.
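As a rough illustration of what adding type hints looks like, the sketch below annotates two small helpers using only the standard library. The function names `normalize` and `first_above` are hypothetical and are not taken from the skill itself; the skill's PyTorch tensor annotations are not shown here.

```python
from typing import Optional, Sequence


def normalize(values: Sequence[float], eps: float = 1e-8) -> list[float]:
    """Scale values to zero mean and (approximately) unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = (var + eps) ** 0.5  # eps guards against division by zero
    return [(v - mean) / std for v in values]


def first_above(values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first value above threshold, or None."""
    for i, v in enumerate(values):
        if v > threshold:
            return i
    return None
```

With annotations like these, a static checker such as mypy can flag a caller that passes a string as `threshold` or forgets to handle the `None` return case.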

TorchServe Model Serving

Production-ready PyTorch model serving engine that handles MAR packaging, custom handlers for preprocessing and inference, multi-GPU worker scaling, and model version management through REST and gRPC APIs.

PyTorch Torch Compile Optimization

Optimize PyTorch models using torch.compile (TorchDynamo/Inductor) for JIT compilation into optimized kernels. Focuses on reducing Python overhead, managing compile overhead, debugging graph breaks, and proper benchmarking methodology with warmup runs.
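The warmup-based benchmarking idea can be sketched without PyTorch. The harness below is a minimal, standard-library illustration; `benchmark` and `workload` are hypothetical names, and `workload` is only a stand-in for a compiled model's forward pass.

```python
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Return mean seconds per call, excluding the warmup calls."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time costs such as compilation
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


calls: list[int] = []


def workload() -> int:
    # Hypothetical stand-in for a compiled model's forward pass.
    calls.append(1)
    return sum(i * i for i in range(1000))


mean_seconds = benchmark(workload, warmup=3, iters=10)
```

For real GPU workloads, kernel launches are asynchronous, so a call such as `torch.cuda.synchronize()` would typically be needed before each clock read; that step is omitted from this CPU-only sketch.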

PyTorch Lightning

A high-level training framework for PyTorch that automates 40+ engineering details like epoch loops, optimization, and hardware acceleration while maintaining flexibility. Supports multi-GPU/multi-node scaling, reproducibility, and decouples research code from engineering boilerplate.

PyTorch Geometric (PyG)

A library built on PyTorch for implementing Graph Neural Networks (GNNs). Provides MessagePassing layers, modular aggregation schemes, and efficient mini-batching for handling large graphs through disjoint graph representation.
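The disjoint graph representation mentioned above can be sketched in plain Python: several small graphs are merged into one large graph with no edges between them by offsetting node indices. This is an illustration of the idea only, not PyG's implementation; `batch_graphs` and its input layout are assumptions made for the example.

```python
def batch_graphs(
    graphs: list[tuple[int, list[tuple[int, int]]]],
) -> tuple[list[tuple[int, int]], list[int]]:
    """Merge graphs given as (num_nodes, edges) into one disjoint graph.

    Returns the combined edge list and a vector mapping each node
    to the index of the graph it came from.
    """
    edges: list[tuple[int, int]] = []
    batch: list[int] = []
    offset = 0
    for graph_id, (num_nodes, graph_edges) in enumerate(graphs):
        # Shift node indices so each graph occupies its own index range.
        edges.extend((src + offset, dst + offset) for src, dst in graph_edges)
        batch.extend([graph_id] * num_nodes)
        offset += num_nodes
    return edges, batch


# A 3-node path graph and a 2-node graph with a single edge.
edges, batch = batch_graphs([(3, [(0, 1), (1, 2)]), (2, [(0, 1)])])
```

Because no edge crosses between the index ranges, message passing over the merged graph never mixes information between the original graphs, and the node-to-graph vector allows per-graph pooling afterwards.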

PyTorch CUDA Environment Configuration

Configure and verify a PyTorch CUDA 13 environment, including toolkit and driver requirements and wheel compatibility. Provides guidance on CUDA setup verification, runtime checks, and accurate GPU timing for NVIDIA GPUs.

NVIDIA NeMo Framework

Enterprise AI platform for building, customizing, and deploying generative AI models and agents. Includes NeMo Retriever for RAG pipelines, NeMo Customizer for fine-tuning, NeMo Guardrails for safety, and tools for data curation, evaluation, and multi-agent orchestration.

Loop Vectorizer

Converts inefficient Python loops into fast vectorized PyTorch tensor operations, which can yield speedups of 10x to 10,000x by leveraging GPU acceleration and batch processing.

Implement Paper From Scratch

A systematic guide for implementing research papers step-by-step from scratch. Focuses on building deep understanding through methodical implementation with checkpoint questions, debugging strategies, and verification steps for each component.

GPU CLI - Cloud GPU Execution Tool

Run Python and ML code on cloud GPUs seamlessly. Prefix commands with 'gpu' to execute on remote GPUs via RunPod. Supports model training, ComfyUI, Stable Diffusion, and LLM inference with automatic provisioning, code syncing, and output management.

Gemini Image Generation

Image generation using Google's Imagen and Gemini native models. Supports text-to-image creation, image editing, iterative refinement, and multi-turn conversational generation with SynthID watermarks.

CoreML Optimizer

Expert guidance for optimizing machine learning models for Apple's CoreML framework on iOS and macOS. Covers quantization, palettization, pruning, Neural Engine targeting, compute unit selection, and performance profiling to reduce model size and improve inference latency.

Complex Tensor Handler for PyTorch

Handle complex-valued tensors in PyTorch for astronomical imaging applications, including FFT operations, phase/amplitude conversions, and complex arithmetic for neural networks.

Beam Tracking ML Pipeline Design

Design and refactor beam tracking ML/RL pipelines with CSI teacher and RSRP student models, enforce shape contracts, and produce inference-safe models for wireless communication systems.

CUA Computer Use Agent Framework

Open-source framework for building AI agents that automate desktop applications through vision-based UI control. Supports multi-platform automation (Windows/Linux/macOS), 100+ LLM providers, and autonomous task execution with screenshot analysis, mouse/keyboard control, and cloud/local deployment options.

Unsloth Quantization

Advanced quantization techniques for LLM fine-tuning using Dynamic 4-bit, FP8 training, and 8-bit optimizers to minimize VRAM usage while maintaining accuracy on memory-constrained GPUs.

Unsloth Long Context Training

Training models on extended context lengths (89K+ tokens) using optimized RoPE scaling and memory-efficient Triton kernels, enabling 4x longer context windows with 30% less memory usage than Flash Attention 2.

Unsloth Direct Preference Optimization (DPO)

Memory-efficient Direct Preference Optimization for aligning language models with human preferences using paired chosen/rejected data, without loading a separate copy of the reference model. Optimized for low-VRAM environments with FP8 support.

TRL Training on Hugging Face Jobs

Train and fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face cloud infrastructure. Supports SFT, DPO, GRPO, and reward modeling methods with automatic GPU provisioning, real-time monitoring via Trackio, and GGUF conversion for local deployment.

TorchVision Computer Vision Library

Computer vision library for PyTorch featuring pretrained models, advanced v2 image transforms, and utilities for handling complex data types like bounding boxes and masks. Supports standard CV tasks including classification, detection, and segmentation with performance-optimized augmentations.

TorchAudio

Audio signal processing library for PyTorch that enables GPU-accelerated feature extraction, waveform manipulation, and data augmentation for speech recognition and audio ML tasks.

AMD Strix Halo PyTorch Setup

Complete setup assistant for AMD Strix Halo (Ryzen AI MAX+ 395) PyTorch environments. Handles ROCm installation verification, community PyTorch builds for gfx1151, GTT memory configuration, and environment setup for running 30B parameter ML models.

scArches - Single-Cell Architecture Surgery

Deep learning library for single-cell chromatin accessibility and multi-omics data analysis. Provides model surgery techniques for scATAC-seq/scRNA-seq integration, batch correction, reference mapping, and transfer learning across datasets using scVI, trVAE, and scANVI models.

PyTorch Quantization

Model optimization techniques using INT8 quantization for size reduction and inference acceleration, supporting Post-Training Quantization (PTQ) and Quantization Aware Training (QAT) on FBGEMM and QNNPACK backends.
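The arithmetic behind INT8 affine quantization can be sketched in plain Python. This is a generic illustration of the scale and zero-point mapping, not PyTorch's quantization API or either backend's implementation; all function names here are hypothetical.

```python
def quant_params(x_min: float, x_max: float,
                 qmin: int = -128, qmax: int = 127) -> tuple[float, int]:
    """Derive a scale and zero point mapping [x_min, x_max] onto [qmin, qmax]."""
    # The representable range must include zero so that 0.0 maps exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(qmin - x_min / scale)
    return scale, max(qmin, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int,
             qmin: int = -128, qmax: int = 127) -> int:
    """Map a float to an INT8 code, clamping out-of-range values."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an INT8 code back to an approximate float."""
    return (q - zero_point) * scale


values = [-1.0, -0.25, 0.0, 0.6, 2.0]
scale, zero_point = quant_params(min(values), max(values))
restored = [dequantize(quantize(v, scale, zero_point), scale, zero_point)
            for v in values]
```

For inputs inside the calibrated range, the round-trip error is bounded by about half the scale. Roughly speaking, PTQ estimates such ranges from calibration data after training, while QAT simulates this rounding during training so the model can adapt to it.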

PyTorch Distributed Training

Distributed training strategies for PyTorch including DistributedDataParallel (DDP) and Fully Sharded Data Parallel (FSDP), enabling multi-GPU and multi-node model training with efficient process management and checkpointing.

PyTorch Core Fundamentals

Core PyTorch library for deep learning, providing tensor operations, automatic differentiation (autograd), neural network modules, and training loop orchestration with GPU acceleration and memory optimization capabilities.

PowerGraph GNN Research Pipeline

A research pipeline for topology-aware GNN representation learning on power grids, featuring physics-guided models for power flow, optimal power flow, and cascading failure prediction with self-supervised pretraining capabilities.

Enterprise Machine Learning

Enterprise ML specialist with TensorFlow 2.20, PyTorch 2.9, and scikit-learn 1.7 expertise. Provides AutoML, neural architecture search, MLOps automation, and production deployment with comprehensive monitoring and experiment tracking.