Small Models Hackaton Submission: Scrubdata | Monterrey .

Members-Only

Recent Talks & Demos are for members only

Exclusive feed

You must be an AI Tinkerers active member to view these talks and demos.

June 17, 2026 · Monterrey

ScrubData: Local Data Cleaning Plan

Discover ScrubData, a hands-off data cleaning tool using a small AI model to generate reversible, explained cleaning plans. See how it masks sensitive data and handles ambiguities, ensuring trustworthy data transformation.

Video
Overview
Links
Tech stack
  • Qwen3-4B-Instruct-2507
    An ultra-efficient 4-billion parameter language model optimized for rapid, non-thinking instruction following and 256K long-context reasoning.
    Alibaba's Qwen3-4B-Instruct-2507 delivers high-tier performance in a compact 4.0-billion parameter footprint. Operating in a dedicated non-thinking mode (bypassing slow reasoning blocks to output answers immediately), this model excels at instruction following, code generation, and multilingual tasks across 100+ languages. Its standout feature is a massive 256K token context window, allowing developers to process entire codebases or dense documents locally on consumer-grade hardware without sacrificing speed or accuracy.
  • llama
    Meta's open-weights LLM family optimized for high-performance local deployment and custom fine-tuning across 8B to 405B parameter scales.
    Llama 3.1 delivers state-of-the-art performance through a flagship 405B parameter model trained on 15 trillion tokens. It supports a 128k context window: ideal for analyzing massive datasets or long-form documentation. Developers utilize Llama for diverse tasks (multilingual translation, Python code generation, and complex reasoning) while maintaining data sovereignty via local hosting. The ecosystem includes the Llama Stack for agentic workflows and optimized weights for 8B and 70B models, ensuring high throughput on consumer hardware or enterprise clusters.
  • Ollama
    Deploy and run open-source Large Language Models (LLMs) like Llama 3 and Mistral locally on your machine: achieve private, cost-effective AI via a simple command-line interface.
    Ollama is the essential tool for running LLMs locally: consider it the Docker for AI models. It packages complex models and dependencies into a single, easy-to-use application for macOS, Linux, and Windows systems. You get immediate access to models like Gemma 2 and DeepSeek-R1 via a straightforward CLI or REST API. This local-first approach guarantees data privacy and security, eliminating cloud dependency and high API costs. Ollama also optimizes performance on consumer hardware using techniques like quantization, ensuring efficient execution even on standard desktops.
  • Gradio
    Gradio is the open-source Python library for rapidly building and sharing interactive web UIs for any machine learning model or Python function.
    Gradio is the essential tool for data scientists and ML engineers: it turns any Python function (including TensorFlow, PyTorch, and Hugging Face models) into a live, interactive web application with just a few lines of code. This open-source library eliminates the need for complex frontend development, handling all HTML, CSS, and JavaScript automatically. Developers define the function and specify inputs (e.g., 'text', 'image', 'slider') and outputs, then launch the interface locally, embed it in a notebook, or instantly generate a shareable public link. Gradio is widely adopted for quick prototyping, model demonstration, and deployment on platforms like Hugging Face Spaces, making complex models accessible to non-technical users for testing and feedback.
  • OpenTelemetry-GenAI
    OpenTelemetry-GenAI standardizes observability for generative AI applications by establishing a unified schema for tracking prompts, completions, token usage, and agent workflows.
    As engineering teams integrate large language models into production, tracking performance requires more than basic HTTP metrics. OpenTelemetry-GenAI solves this by defining standard semantic conventions (specifically starting with version 1.37) to capture critical LLM metadata. The framework instruments client libraries to record input and output token counts, model names, prompt and completion payloads, and complex agent tool calls. By standardizing this telemetry, developers can export structured traces and metrics directly to their existing observability pipelines (such as Datadog or Honeycomb) without maintaining parallel, vendor-specific SDKs.