Python PDF Parsing

GitHub Repo

A Little Bit of Background

In the summer of 2020, I interned at Front Rush as a Technical Implementation Specialist. I helped equipment managers organize their athletic inventory by implementing SQL scripts and Excel macros to automate repetitive processes. For example, clients would send over their orders/invoices from vendors such as Nike or Adidas, and we would generate spreadsheets with the information in a specific format to upload into the database.

My Python Project

During my internship, I implemented a new feature that helps upload images into inventory so that each item has a matching picture. Clients typically sent their orders as Excel spreadsheets, but some orders were only available as PDFs, so I wrote Python code to handle them. After some research online, I was able to extract the images from these orders, save them, and name them using information pulled from our database.

I built further on the idea of using Python to read PDF files by developing a script that parses a file and returns an Excel spreadsheet in the required format. The program relies on regular expressions, a useful tool for finding patterns in text.
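To give a sense of how regular expressions can drive this kind of parsing, here is a minimal sketch, not the actual script from the internship. It assumes the text has already been pulled out of the PDF (a third-party library such as pypdf or pdfplumber can do that step), and it writes CSV from the standard library as a stand-in for the Excel output. The line layout, the field names, the pattern `LINE_ITEM`, and the sample order text are all made up for illustration.

```python
import csv
import io
import re

# Hypothetical line layout: style code, description, size, quantity, unit price.
# Real vendor PDFs will differ, so the pattern would need to be adapted.
LINE_ITEM = re.compile(
    r"^(?P<style>[A-Z0-9]{6}-\d{3})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<size>XS|S|M|L|XL|XXL)\s+"
    r"(?P<quantity>\d+)\s+"
    r"\$(?P<unit_price>\d+\.\d{2})$"
)

# Column order for the output file (illustrative names).
FIELDS = ["style", "description", "size", "quantity", "unit_price"]


def parse_order_text(text):
    """Return one dict per line that looks like a line item; skip everything else."""
    rows = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            rows.append(match.groupdict())
    return rows


def rows_to_csv(rows):
    """Serialize the parsed rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Invented sample standing in for text extracted from a PDF order.
SAMPLE_TEXT = """\
Order Confirmation 12345
AB1234-010  Training Tee  M   12  $25.00
AB1234-010  Training Tee  L   8   $25.00
Subtotal  $500.00
"""

rows = parse_order_text(SAMPLE_TEXT)
print(rows_to_csv(rows))
```

Named groups keep the pattern readable and map directly onto spreadsheet columns, so header lines and totals that do not match the pattern are simply skipped.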

Impact

The programs I implemented allowed clients to send in PDF files for orders that could not be converted to an Excel spreadsheet. They also allowed images to be attached to items in our software, which helps equipment managers identify their items more quickly.
