Ben Pollins - Portfolio

As a Mechanical Engineer working in construction, I used mechanical schematics (system drawings) daily to understand the systems, identify the scope of work, and inform commissioning strategies. One particular use of these drawings is the initial take-off of all components (i.e. counting up all pumps, valves, meters, etc.) so the correct materials can be purchased in the required quantities. While this is an essential task for the estimating and procurement stages of every project, it has always been done the same way: one person is given a stack of drawings to review and count manually. Not only is this slow, but the final totals almost always contain human error, leading to costly re-orders and delays to the project.

To solve this, I created an application that removes the manual nature of this task: users import a PDF of their drawing and receive a table that summarises all components and their quantities. Written in Python, the application reads the drawing, identifies pipework components and leverages Machine Learning to classify and tally the results.
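As a rough illustration of the final tallying step only (not the application's actual code), the sketch below turns a list of predicted component labels into a summary table. The function names and the example labels are assumptions made for illustration; the detection and classification stages are taken as already done.

```python
from collections import Counter


def tally_components(predicted_labels):
    """Count how many times each component class was detected."""
    counts = Counter(predicted_labels)
    # Sort by descending quantity, then alphabetically, for a stable table.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_table(rows):
    """Render (component, quantity) rows as a plain-text table."""
    width = max([len("Component")] + [len(name) for name, _ in rows])
    lines = [f"{'Component':<{width}}  Quantity"]
    lines += [f"{name:<{width}}  {qty}" for name, qty in rows]
    return "\n".join(lines)


# Hypothetical classifier output for a single drawing.
labels = ["valve", "pump", "valve", "meter", "valve", "pump"]
summary = tally_components(labels)
print(format_table(summary))
```

Keeping the tally separate from the classifier means the same summary step works whichever model produces the labels.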

While this solves one problem, it is only the start! Training a model to read and understand schematics could have huge applications across construction, from interrogating system design in the early stages of a project to advising plant-replacement strategies following handover.

Many modern banking apps have in-built tools to assist customers with budgeting - from classifying transactions to tracking your spending at specific retailers, these tools provide interesting insight into the customer and where their money is spent.

Having banked with the same company for the last 8 years, I do not have these nifty tools at my disposal. To gain the same level of insight into my spending, I downloaded all the transaction data from my account and undertook an exploratory analysis of it using Python and the matplotlib library. Using the limited information in each transaction's description, I could identify characteristic trends in my spending, allowing me to budget effectively and save up to £100 a month.
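One simple way to extract a category from a transaction description is keyword matching. The sketch below is a minimal example under that assumption, not a record of the analysis itself: the keyword rules, the sample transactions and the function names are all hypothetical. matplotlib is imported only inside the plotting helper, so the totals can be computed without it.

```python
from collections import defaultdict

# Hypothetical keyword-to-category rules; real descriptions will differ.
CATEGORY_KEYWORDS = {
    "Groceries": ("tesco", "sainsbury", "aldi"),
    "Transport": ("tfl", "trainline", "uber"),
    "Eating out": ("pret", "nando", "deliveroo"),
}


def categorise(description):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "Other"


def spend_by_category(transactions):
    """Sum the amounts of (description, amount) pairs per category."""
    totals = defaultdict(float)
    for description, amount in transactions:
        totals[categorise(description)] += amount
    return dict(totals)


def plot_spend(totals):
    """Draw a bar chart of total spend per category, largest first."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    categories = sorted(totals, key=totals.get, reverse=True)
    fig, ax = plt.subplots()
    ax.bar(categories, [totals[c] for c in categories])
    ax.set_ylabel("Total spend (£)")
    ax.set_title("Spend by category")
    fig.tight_layout()
    return fig


# Made-up sample rows in the shape (description, amount in pounds).
transactions = [
    ("CARD PAYMENT TESCO STORES 3121", 42.5),
    ("TFL TRAVEL CHARGE", 7.5),
    ("PRET A MANGER LONDON", 6.0),
    ("CARD PAYMENT ALDI", 20.0),
    ("DIRECT DEBIT GYM", 30.0),
]
totals = spend_by_category(transactions)
```

Calling `plot_spend(totals)` then gives a quick view of where the money goes; anything that lands in "Other" is a prompt to add another keyword rule.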

Predicting loan defaults is crucial for financial institutions to minimize risk and make informed lending decisions. By identifying potential defaulters early, banks and lenders can take proactive measures to mitigate losses, adjust credit policies, and offer tailored financial solutions. Predicting these defaults is an interesting challenge, so in this project I developed a machine learning model to do just that, using data obtained from Kaggle and PySpark for data processing.

Using Python and PySpark I loaded the data and started investigating - looking for high-level patterns in the distribution of the data and exploring the correlation between variables. I then looked to clean and preprocess the data by handling missing values, encoding categorical variables, and normalising numerical features. Careful feature engineering and model selection allowed me to create new features that captured the complex relationships within the data and find the most suitable algorithms to model them, ready for an iterative process of hyperparameter tuning to optimize performance.

The result was an algorithm that could predict a loan default with an accuracy and recall of over 90% - and one that could be reliably scaled to far larger datasets with ease.
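Reporting recall alongside accuracy matters because defaults are often a small minority of loans, so a model can score high accuracy while missing every defaulter. The sketch below illustrates this with made-up labels (1 = default, 0 = repaid); it is not the project's evaluation code, and the function names are assumptions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means default."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Share of all loans that were classified correctly."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def recall(y_true, y_pred):
    """Share of actual defaults that the model caught."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up data: 1 default in 10 loans, and a model that never predicts default.
y_true = [0] * 9 + [1]
y_pred = [0] * 10

naive_accuracy = accuracy(y_true, y_pred)  # 0.9, despite being useless
naive_recall = recall(y_true, y_pred)      # 0.0, no default was caught
```

A model that clears 90% on both measures is therefore doing considerably more than the always-predict-repaid baseline.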

PORTFOLIO

Automated Schematic Reader

EXPLORATORY DATA ANALYSIS OF MY PERSONAL FINANCES

LOAN RISK PREDICTOR WITH PYSPARK