
Vurvey Labs Datasets Uploading Best Practices

These are the best practices for maximizing your data within the Vurvey AI ecosystem.

Written by Andrew @ Vurvey
Updated over 2 months ago

Vurvey uses proprietary data embedding systems built on our SenseMake™ Model, which powers the ingestion, embedding, and knowledge graphing of complex and disparate data for AI conversational interfaces. To ensure the best results, this guide walks you through how to prepare and upload your datasets, the supported file types, and the critical role of data taxonomy. Spoiler alert: when in doubt, PDF is your golden ticket!


Supported File Types and Key Considerations

Vurvey Labs supports a variety of file formats, each with specific capabilities and limitations. Here’s a breakdown of what we accept and how to make the most of them:

  1. Videos

    • Supported Formats: .mp4, .avi, .mov

    • Max File Size: 100 MB

    • What We Extract: Transcripts only (no imagery or video content is processed).

    • Best Practice: If your video contains spoken content you want analyzed, ensure the audio is clear. Vurvey transcribes the spoken words only; it does not capture the visual details of the video.

  2. Documents

    • Supported Formats: .pdf, .json, .txt, .csv

    • Max File Size: 50 MB

    • Key Notes:

      • PDFs are the VIPs here: Vurvey can extract both text and imagery from PDFs, making them the most versatile and powerful option. If you’re debating which format to use, export your file as a PDF first—it’s hands-down the superior choice.

      • .json and .txt: Great for raw text data but no imagery support.

      • .csv: Perfect for structured data, but avoid splitting columns across pages (more on this below).

  3. MS Office Files

    • Supported Formats: .docx (Word), .xlsx (Excel), .pptx (PowerPoint)

    • Max File Size: 50 MB

    • Key Notes: These are solid options for text and basic formatting, but Vurvey can’t see imagery in these formats. For best results, convert .docx or .pptx files to PDF before uploading to capture visuals too. For .xlsx, see our CSV tip below.
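The formats and size limits above can be checked before uploading. The following is a minimal sketch, not part of Vurvey itself: the function name `check_upload` is hypothetical, and it assumes "MB" means 1024 × 1024 bytes, which this guide does not specify.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the guide does not define "MB" precisely

# Extensions and size limits as listed in this guide.
LIMITS = {
    "video": ({".mp4", ".avi", ".mov"}, 100 * MB),
    "document": ({".pdf", ".json", ".txt", ".csv"}, 50 * MB),
    "office": ({".docx", ".xlsx", ".pptx"}, 50 * MB),
}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return notes about a file; an empty list means nothing to flag."""
    ext = Path(filename).suffix.lower()
    for category, (extensions, max_bytes) in LIMITS.items():
        if ext in extensions:
            break
    else:
        # No category matched this extension.
        return [f"unsupported file type: {ext or '(none)'}"]

    notes = []
    if size_bytes > max_bytes:
        notes.append(f"{category} file exceeds {max_bytes // MB} MB")
    # Office files and CSVs are the ones this guide suggests converting.
    if category == "office" or ext == ".csv":
        notes.append("consider exporting to PDF before uploading")
    return notes
```

For example, `check_upload("deck.pptx", 60 * MB)` flags both the size limit and the PDF suggestion, while `check_upload("report.pdf", 10 * MB)` returns an empty list.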


Why PDF Reigns Supreme

Let’s be real: PDFs are the rockstars of Vurvey Labs uploads. Here’s why:

  • Versatility: They handle text, images, and formatting like champs, giving Vurvey the fullest picture of your data.

  • Consistency: Unlike other formats, PDFs preserve your content exactly as you see it—no surprises during processing.

  • Imagery Support: Want Vurvey to analyze a chart, diagram, or photo? PDFs are the only format where we can “see” those visuals.

Pro Tip: Before uploading any file, ask yourself: “Can I export this as a PDF?” If yes, do it. You’ll thank yourself later when your data shines in the embedding stage.


Special Tips for CSV and Excel Files

Structured data like spreadsheets can be tricky. AI generally performs poorly with tabular CSV data, especially large files, because the data is only meaningful when the entire dataset is available. Retrieving “chunks” (this is how generative AI RAG systems work) doesn’t make sense unless you’re asking simple questions like “what are the column headers?” The real value in tabular data comes from its structured format, which enables aggregation and analysis. That’s exactly what today’s AI systems and LLMs struggle with: they weren’t built for scenarios where context is distributed across many rows and columns.
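To make the chunking problem concrete, here is a small illustrative sketch. It uses a naive fixed-size row split, which is an assumption for demonstration and not a description of how Vurvey chunks data; the sample table and the names `chunk_rows` and `total_units` are invented.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = """region,units
north,10
south,20
east,30
west,40
"""


def chunk_rows(text: str, rows_per_chunk: int) -> list[list[list[str]]]:
    """Split CSV rows (header included) into fixed-size chunks."""
    rows = list(csv.reader(io.StringIO(text)))
    return [rows[i:i + rows_per_chunk] for i in range(0, len(rows), rows_per_chunk)]


def total_units(rows: list[list[str]]) -> int:
    """Sum the second column, skipping non-numeric cells such as the header."""
    return sum(int(r[1]) for r in rows if len(r) > 1 and r[1].isdigit())


chunks = chunk_rows(CSV_TEXT, rows_per_chunk=2)

# Total over the whole table versus over a single retrieved chunk.
full_total = total_units(list(csv.reader(io.StringIO(CSV_TEXT))))
retrieved_total = total_units(chunks[1])
```

Here `chunks[1]` holds only the `south` and `east` rows: it has no column headers, and summing it gives 50 rather than the true total of 100. A system that sees only that chunk cannot answer an aggregate question correctly.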

Our current recommendation is to export Excel and CSV files as PDFs that are a single page wide. This is the current best practice even if the text ends up too small for the human eye to read. It will not allow you to count or sum mentions or items, but it will help the system see the data with as much context as possible. When exporting to PDF, follow the best practices below before uploading to Vurvey Datasets:

  • Single-Page Layout: Ensure all columns fit on one page. If columns spill across multiple pages (e.g., due to narrow page widths in an export), Vurvey won’t be able to stitch them together, and your data will turn into gibberish.

  • How to Fix It:

    1. In Excel, adjust column widths and use “Fit to One Page” in Print Settings before saving.

    2. Export as PDF to lock in the layout.

  • When in Doubt: Convert to PDF. Yes, we’re saying it again—PDFs avoid these headaches entirely.


The Power of Data Taxonomy: Why It Matters

Read the article we wrote on this here: https://help.vurvey.com/en/articles/10874197-why-clean-data-for-ai

Uploading files is just the start. To get meaningful results from Vurvey’s processing and embedding steps, your data needs structure—enter data taxonomy. Think of taxonomy as the “filing system” for your content. It’s how you organize and label your data so the AI can understand and retrieve it effectively.

Here’s why it’s a game-changer:

  • Clarity for AI: A well-organized file (e.g., clear headings, consistent labels, logical sections) helps Vurvey “read” your intent, not just your words.

  • Searchability: Embedding success depends on the AI knowing what’s what. If your PDF has a table of contents, titled sections, or tagged visuals, retrieval becomes lightning-fast and accurate.

  • Avoiding Chaos: A jumbled mess of text or unlabeled data (think: a 50-page .txt file with no breaks) forces the AI to guess—and it’s not a mind reader.

How to Build a Solid Taxonomy:

  • Use Headings: Break up documents with descriptive titles (e.g., “Sales Data Q1” vs. random numbers).

  • Label Visuals: In PDFs, add captions to charts or images (e.g., “Revenue Growth 2024”).

  • Keep It Consistent: If you’re uploading multiple files, use the same naming conventions or section styles.

  • Test It: Before uploading, ask: “Could a stranger pick this up and understand it?” If not, tweak it.
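One part of this check can be automated for plain-text files: finding long stretches of text with no breaks. The sketch below is illustrative only. The function name `long_blocks` is hypothetical, and the 300-word threshold is an arbitrary assumption, not a Vurvey requirement.

```python
import re


def long_blocks(text: str, max_words: int = 300) -> list[tuple[int, int]]:
    """Return (block_index, word_count) for each blank-line-separated block over max_words."""
    # Blocks are runs of text separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text) if b.strip()]
    return [
        (i, len(b.split()))
        for i, b in enumerate(blocks)
        if len(b.split()) > max_words
    ]
```

Any block it reports is a candidate for a descriptive heading or a paragraph break before you upload.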


The Three-Step Process: What to Expect

Here’s how your upload journey unfolds in Vurvey Labs:

  1. Uploading: Drop your files into the system. Stick to the supported formats and size limits, and prioritize PDFs for the win.

  2. Processing: Vurvey extracts text (and imagery from PDFs), transcribes videos, and preps your data for embedding. Clean, well-structured files shine here.

  3. Success: Your data gets transformed into a searchable, AI-ready format. A strong taxonomy and PDF usage mean faster, richer insights.


Quick Recap: Your Upload Checklist

  • File Type: PDF first, always. Convert .docx, .pptx, or .xlsx when possible.

  • Size Limits: 100 MB for videos, 50 MB for documents and MS Office files.

  • Videos: Clear audio for transcripts; visuals won’t be seen.

  • CSV/Excel: Export as a PDF with all columns on one page.

  • Taxonomy: Organize with headings, labels, and consistency for AI-friendly results.

By following these best practices, you’ll set yourself up for success with Vurvey Labs. Have questions? We’re here to help—happy uploading!
