
Why Pypdex?

Excel is a fantastic tool thanks to its reactive interface, which updates in real time as you change cell values. It is so indispensable that most businesses use it for the bulk of their analyses... but working with Big Data in Excel is nearly impossible.

The Problem

The problem is that Excel has size and speed limitations.

  • Excel is limited to ~1 million rows of data per worksheet (1,048,576, to be exact)
  • Excel is limited in how much data it can reasonably handle

Note

You can find the published limitations here

  • If you use 32-bit Excel, you are limited to 2GB of RAM
  • 64-bit Excel does not impose a limit... but if you have ever had Excel using multiple GB of RAM, you know how it becomes unresponsive and may simply time out during complex calculations
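To make the row limit concrete, here is a minimal Python sketch that checks whether a CSV file would fit in a single worksheet. It is only an illustration and not part of Pypdex: the helper names `count_rows` and `fits_in_worksheet` and the file name `sales.csv` are hypothetical, and the check assumes the 1,048,576-row limit of modern `.xlsx` worksheets.

```python
import csv
import io

# Rows per worksheet in modern Excel (.xlsx); header row counts toward it.
EXCEL_MAX_ROWS = 1_048_576


def count_rows(lines):
    """Count CSV records one at a time, without loading the whole file."""
    return sum(1 for _ in csv.reader(lines))


def fits_in_worksheet(lines, max_rows=EXCEL_MAX_ROWS):
    """Return True when every record (header included) fits in one worksheet."""
    return count_rows(lines) <= max_rows


# Tiny in-memory stand-in for a real file. For a file on disk, pass
# open("sales.csv", newline="") instead (hypothetical file name).
sample = io.StringIO("region,amount\nEast,10\nWest,20\n")
print(fits_in_worksheet(sample))
```

Because the rows are counted as a stream, the check itself uses very little memory even when the file is far too large for Excel to open.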

At this point you may be asking: what about Power Query? Doesn't Power Query solve the problem of cleaning up large datasets and getting them down to a size that an Excel worksheet can handle?

You would be correct! Power Query does provide some data transformation capabilities... but it can also become unresponsive and slow when working with multi-GB files or applying many different transformations.

Note

Power Query's published limitations can be found here

All these limitations lead to workarounds as your datasets grow. Unless we spend a great deal of time learning how to program, we will often have to:

  1. Split up datasets into something more manageable for Excel to handle or
  2. Only work with a subset of the data and hope we don't miss something critical

Either way, we lose context by not seeing the whole picture of the data we are working with. This inevitably leads to either missing or duplicated information, because we may not have split up or subsetted the data correctly in the first place. The errors only compound when you start combining datasets.

Anyone who has worked in industry as an analyst knows that the data-driven questions your boss or co-workers ask are rarely simple enough to answer with one dataset. And those answers often drive important decisions. Thus, it is critical to be as accurate as possible.

This is why it is so important to be able to see the whole picture for each dataset you are working with.

Solution

Pypdex aims to make it simple to work with large datasets (even larger than RAM!) while giving you a feel similar to what you'd expect from Excel. The whole point is to make working with Big Data fast and easy.

Note

By Big Data, we mean any dataset that fits on your computer's disk, even if it is larger than your RAM

You'll get:

  • Fast feedback for all your data transformations

Below is a simple example to whet your appetite using a 5 million row sales dataset:

Who is Pypdex for?

Pypdex is geared for Excel/spreadsheet users so they can clean, transform, and combine datasets that can be used within their spreadsheet for their analyses. No coding is required.