OUR WORK

We have completed projects and built solutions across an array of industries and sectors. Our collective machine learning, data science, and data engineering experience spans fintech, mobility services, retail, and telecommunications. Browse our case studies below to find out more about the work we've done in each area. If you'd like to hear more about our work, don't hesitate to get in touch.

AI Services with Advanced Data Augmentation

Our AI services lead in data augmentation and machine learning innovation, providing robust solutions to complex industry challenges. Through strategic data augmentation in machine learning, we boost system accuracy and efficiency, tailoring AI to fit the unique needs of fintech, telecom, and customer service sectors. Our approach streamlines operations and enhances decision-making, ensuring our clients stay ahead in a data-driven landscape.

Fintech Solutions
Customer Service Optimization
Model Generation & Anomaly Detection
Marketing & Advertising Insights
Data Augmentation
Optimization & Forecasting

Identity Fraud Detection
for Credit Card Users

The Challenge:

To create a model that could improve identity fraud detection among credit card users while maintaining variable explainability.

The Factored Solution:

We implemented an experimentation pipeline to find the best model for a given set of variables, and carried out a deep analysis of those variables that improved model accuracy.
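
To give a flavor of that pipeline, here is a minimal sketch of the hyperparameter search loop, assuming synthetic data in place of the client's confidential features; names and parameter ranges are illustrative.

```python
import numpy as np
import lightgbm as lgb
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the (confidential) credit card features;
# heavy class imbalance mimics a fraud-detection setting.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.97], random_state=0)

space = {
    "num_leaves": hp.quniform("num_leaves", 16, 128, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
    "min_child_samples": hp.quniform("min_child_samples", 5, 100, 1),
}

def objective(params):
    model = lgb.LGBMClassifier(
        num_leaves=int(params["num_leaves"]),
        learning_rate=params["learning_rate"],
        min_child_samples=int(params["min_child_samples"]),
        n_estimators=200,
    )
    # Average precision suits highly imbalanced fraud labels.
    score = cross_val_score(model, X, y, cv=3, scoring="average_precision").mean()
    return -score  # hyperopt minimizes, so negate

best = fmin(objective, space, algo=tpe.suggest, max_evals=25, trials=Trials())
print("Best hyperparameters:", best)
```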

The Outcome:

The detection rate for cases of identity fraud improved by 7.5%, which saved our client time and money, as they no longer had to pay for or carry out additional processes.

We used a variety of skills and tech stack tools including: TensorFlow, Python, LGBM, XGBoost, Hyperopt, Deep Learning, Fraud Detection, Forecasting, and Model Interpretability. 

Credit Card Default Model
with Deep Learning

The Challenge:

To build better credit card default prediction models for current clients, using only the features already available.

The Factored Solution:

We used deep learning models to encode temporal features that classical approaches could not exploit during optimization.
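
As a rough sketch of the idea, the snippet below feeds per-month customer aggregates through a small recurrent network; the data shapes and feature names are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical per-customer monthly aggregates (e.g., balance, payments, spend).
n_customers, n_months, n_feats = 1000, 6, 3
X = np.random.rand(n_customers, n_months, n_feats).astype("float32")
y = np.random.randint(0, 2, size=(n_customers,)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_months, n_feats)),
    tf.keras.layers.LSTM(32),                        # encodes the temporal pattern
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of default
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
```

Because the recurrent layer consumes the raw monthly sequence directly, hand-crafted temporal aggregations become largely unnecessary.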

The Outcome:

We created a more accurate representation of customers who’d been using the product for more than 6 months (50% of users), and reduced the effort required for feature engineering. 

We used a variety of skills and tech stack tools including: TensorFlow, Python, Deep Learning, RNNs, Forecasting, and Credit Scoring. 

Creating Alternative Credit
Scoring Methods

The Challenge:

Traditional credit scoring requires clients to have a credit history; without one, their applications are denied or they are charged excessive interest rates. The challenge here was to build an alternative credit scoring system that would let small and medium enterprises (SMEs) assess and understand their own financial health.

The Factored Solution:

With a combination of machine learning and credit risk expertise, we built a set of models that allowed us to measure the credit health of SMEs using only transactional data from their bank accounts.
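
As a simplified illustration of the kind of features such models can consume, the sketch below derives monthly cash-flow aggregates from raw transactions; the schema and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical bank-transaction extract: one row per transaction,
# positive amounts are inflows and negative amounts are outflows.
tx = pd.DataFrame({
    "sme_id": [1, 1, 1, 2, 2],
    "date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03",
                            "2023-01-10", "2023-02-15"]),
    "amount": [1500.0, -400.0, 2100.0, 900.0, -1200.0],
})

monthly = (tx.set_index("date")
             .groupby("sme_id")["amount"]
             .resample("M").sum()
             .rename("net_cash_flow")
             .reset_index())

# Summary statistics per SME, ready to feed a credit-health model.
features = monthly.groupby("sme_id")["net_cash_flow"].agg(["mean", "std", "min"])
print(features)
```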

The Outcome:

We built a scalable web application for SMEs to link their bank accounts and ultimately monitor their own credit health.

The tech stack and skills we used to implement this include: PostgreSQL, AWS, Docker, Celery, Jenkins, GraphQL, Vue.js, Microservice-Oriented Architectures, APIs, Containerization, and CI/CD.

Customer Service
Ticket Categorization

The Challenge:

Customer experience agents are required to assign a category to each customer inquiry regarding product improvement. This process takes a lot of manual work, making it not only inefficient but also vulnerable to misclassification. We needed a way to detect misclassified cases and assign them to agents for re-labeling.

The Factored Solution:

We used state-of-the-art NLP representation learning to detect when a category is misclassified and active learning techniques to balance out the volume of cases sent back to the agents. 
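
A stripped-down version of the detection logic, using TF-IDF and logistic regression in place of the BERT-style encoders from the production system; the tickets, categories, and confidence threshold are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy tickets with their agent-assigned categories.
texts = ["app crashes on login", "billing charged twice", "cannot reset password",
         "invoice amount is wrong", "login button missing after update"]
assigned = ["bugs", "billing", "account", "billing", "bugs"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, assigned)

proba = clf.predict_proba(texts)
predicted = clf.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)

# Flag a ticket for re-labeling when the model confidently disagrees with
# the agent; active learning would prioritize which ones to send back.
for text, a, p, c in zip(texts, assigned, predicted, confidence):
    if p != a and c > 0.5:
        print(f"Review: {text!r} (assigned={a}, predicted={p}, confidence={c:.2f})")
```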

The Outcome:

Our system detected 8 of every 10 misclassifications across more than 50 classes, and was then extended to route the flagged cases back to agents for re-categorization.

We used a range of tech stack tools and skills including: TensorFlow, Docker, Flask, Natural Language Processing, Transfer Learning, BERT, ALBERT, Siamese Networks, Triplet Loss, and Deployment.

Renaming Transactions for
Improved Customer Experience

The Challenge:

Customer experience in banking is impacted by transactions that don’t have a clear description. Customer service teams receive complaints and queries because sometimes legitimate transactions are not recognized by users due to unclear transaction names and descriptions.

The Factored Solution:

We recovered transaction descriptions using word frequency and clustering, extracting a clear description from confusing text or assigning one from a similar transaction.
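
A minimal sketch of the clustering idea, assuming a handful of hypothetical raw transaction strings:

```python
import re
from collections import Counter
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical raw transaction strings as they appear in the database.
raw = ["POS 4421*STARBUCKS 0233 SEA", "POS STARBUCKS #99 NYC",
       "ACH NETFLIX.COM 8821", "NETFLIX COM PAYMENT 0031"]

# Vectorize on alphabetic tokens only, dropping store codes and digits.
vec = TfidfVectorizer(token_pattern=r"[A-Za-z]{3,}")
X = vec.fit_transform(raw)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Name each cluster after its most frequent tokens, a crude stand-in
# for the production renaming logic.
for cluster in sorted(set(labels)):
    tokens = Counter()
    for text, label in zip(raw, labels):
        if label == cluster:
            tokens.update(re.findall(r"[A-Za-z]{3,}", text.upper()))
    print(cluster, tokens.most_common(2))
```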

The Outcome:

The share of transactions in the database with clear descriptions went from 0.5% to 40%.

We utilized various skills and a tech stack including: Scikit-Learn, TensorFlow, Dask, Natural Language Processing, and Clustering. 

Quality of Service Evaluation
Using Audio Calls

The Challenge:

To create a system that would be able to automatically evaluate call center call recordings to simplify the job of the human evaluators and give feedback to agents as soon as possible.

The Factored Solution:

We automated the flow of both calls and evaluations from our client's systems to use as training input for a classifier, and served the model through a REST API that determines whether a call needs an additional reviewer.
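
A bare-bones sketch of the serving side, assuming a pre-trained scikit-learn pipeline saved as model.joblib; the endpoint name and payload are illustrative.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # hypothetical transcript classifier

@app.route("/score", methods=["POST"])
def score():
    # Expects JSON like {"transcript": "agent: hello, how can I help ..."}
    transcript = request.get_json()["transcript"]
    needs_review = bool(model.predict([transcript])[0])
    return jsonify({"needs_additional_review": needs_review})

if __name__ == "__main__":
    app.run(port=8080)
```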

The Outcome:

We created an initial system to automatically identify issues within calls and help to improve the overall quality of customer service.

The expertise and tech stack we utilized includes: AWS Transcribe, GCP Speech-to-Text, AWS S3, AWS Lambda, Pydub, Scikit-learn, Boto3, SciPy, spaCy, Flask, MongoDB, Docker, Cloud Computing, Audio File Processing, Multivariate Gaussian Distribution, Logistic Regression, Natural Language Processing, System Design, and REST APIs. 

Automated Model Generation
With Interpretability

The Challenge:

To generate models automatically for a telco company by using just input metadata and a configuration file, and extract the feature importance with directionality.

The Factored Solution:

We implemented a pipeline that could train tree-based and boosting models for regression/classification problems, allowing the use of hyperparameter optimization and random/custom splits as well as the execution of different methods for feature importance extraction.
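
Conceptually, the pipeline boils down to something like the sketch below, where the configuration file is shown inline as a dictionary and the data is synthetic:

```python
import lightgbm as lgb
import shap
from sklearn.datasets import make_regression

# Hypothetical modeling configuration (in production this came from a file).
config = {"task": "regression", "params": {"num_leaves": 31, "n_estimators": 100}}

X, y = make_regression(n_samples=500, n_features=8, random_state=0)
Model = lgb.LGBMRegressor if config["task"] == "regression" else lgb.LGBMClassifier
model = Model(**config["params"]).fit(X, y)

# Signed SHAP values yield feature importance *with directionality*:
# a positive mean pushes predictions up, a negative mean pushes them down.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print("Mean signed SHAP value per feature:", shap_values.mean(axis=0))
```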

The Outcome:

An easy-to-use pipeline that takes one input file and one modeling configuration file, then automatically produces a fitted model, its evaluation, and a file summarizing feature interpretability.

We used an array of tech stack tools and skills including: Treeinterpreter, SHAP, Scikit-learn, LightGBM, SciPy, Hyperopt, AWS EC2, XLWings, Model Interpretability, Ensemble and Boosting Methods, Model Evaluation, and Excel-Python Integration. 

AI Search Assistant
Using NLP

The Challenge:

To formulate the best question possible to find the right information in a search engine or forum such as StackOverflow. 

The Factored Solution:

We built an NLP-powered assistant plug-in that provides real-time feedback to refine and sharpen the question before launching the search.

The Outcome:

The feedback provided by the plug-in was able to improve 70% of the initial questions asked.

We used a range of skills and tech stack tools including: Hugging Face API for SOTA models in NLP (transformers, pre-trained word embeddings), Docker (MLOps), FastAPI & Flask (REST API), TensorFlow Data Pipelines, Interpretable ML with Inferential Algorithms such as SHAP, and a Full Stack end-to-end ML Solution deployable at scale thanks to Docker.

Anomaly Detection
on Network Management

The Challenge:

To improve the existing fault detection network system by significantly reducing the number of false positives, while improving accuracy on detecting global anomalies.

The Factored Solution:

We implemented an unsupervised learning model to detect both anomalous time series and anomalies within individual time series.
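
One building block of such a system, sketched below, is an online detector based on incremental statistics (Welford's algorithm); the threshold and warm-up length are illustrative.

```python
import math

class OnlineAnomalyDetector:
    """Flags points far from the running mean, updated incrementally."""

    def __init__(self, threshold=4.0, warmup=30):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0
        self.threshold, self.warmup = threshold, warmup

    def update(self, x):
        # Welford's incremental mean/variance update.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        if self.n < self.warmup:
            return False  # don't flag before statistics stabilize
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0 and abs(x - self.mean) / std > self.threshold

detector = OnlineAnomalyDetector()
stream = [10.1, 9.8, 10.3] * 20 + [42.0]  # a spike at the end
print([i for i, x in enumerate(stream) if detector.update(x)])
```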

The Outcome:

We created a more efficient online anomaly detection function with far fewer false positives, which fulfilled the client's objective.

We utilized the following tech stack and skills to implement this solution: Apache Kafka, AWS ECR, Docker images, key-value storage databases, Golang/Python, Clustering, Incremental Statistics, and Kernel Estimation.

Social Media
Engagement Prediction

The Challenge:

To predict the type of engagement (like, reply, retweet or retweet with comment) that a user will opt for in relation to a given tweet.

The Factored Solution:

Using the text from tweets and anonymized account information, we created a deep learning model that could predict how users would interact or engage with tweets they see on Twitter.

The Outcome:

We built a system that could automatically predict the type and probability of engagement for a particular tweet by a specific user, achieving PRAUC and RCE scores very close to the state-of-the-art model.

The skills and tech stack that we used included: Apache Spark, Hadoop, TensorFlow, AWS EMR, AWS EC2, and AWS S3.

Real-Time Bidding for
Programmatic Advertising

The Challenge:

To optimize our client's $20,000 per month of advertising spend at the impression level.

The Factored Solution:

We leveraged feature-enriched regressions and Reinforcement Learning to bid on individual impressions.

The Outcome:

We deployed a bidding system utilizing Beeswax physical infrastructure, capable of handling up to 85,000 queries per second. 

We used the following skills and tech stack to build the solution: TensorFlow, Snowflake, Airflow, Astronomer, Beeswax, S3, Data Pipeline Scaling, Programmatic Advertising, and Reinforcement Learning.

Click-Through Rate (CTR)
Prediction

The Challenge:

To determine whether a user on a social media platform will engage with content, using the content’s text and tabular features describing the user type. We had over 30 million users that could possibly interact with over 160 million content pieces.

The Factored Solution:

We built a pipeline for feature engineering and training using Spark, PyTorch, and TensorFlow that supported processing and training with 800 GB of data.
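
A toy version of the feature-engineering step in Spark, assuming a simple interaction log; the column names are hypothetical, and the real pipeline read roughly 800 GB from S3.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ctr-features").getOrCreate()

# Tiny stand-in for the interaction log.
logs = spark.createDataFrame(
    [("u1", "c1", 1), ("u1", "c2", 0), ("u2", "c1", 1), ("u2", "c3", 0)],
    ["user_id", "content_id", "clicked"],
)

# Historical click-through rates per user and per content piece,
# joined back as tabular features for the downstream models.
user_ctr = logs.groupBy("user_id").agg(F.avg("clicked").alias("user_ctr"))
content_ctr = logs.groupBy("content_id").agg(F.avg("clicked").alias("content_ctr"))
features = logs.join(user_ctr, "user_id").join(content_ctr, "content_id")
features.show()
```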

The Outcome:

Our model outperformed competing solutions built on ensembles of tree-based methods, and a publication describing our approach was accepted at RecSys, one of the most important conferences in the field of recommender systems.

We used the following skills and tech stack to create this solution: Spark, PyTorch, TensorFlow, scikit-learn, AWS EMR, AWS EC2, AWS S3, Big Data Processing, Natural Language Processing, Cloud Infrastructure, Clustering, and Recommender Systems.

Microservices for Augmenting
Text Data

The Challenge:

Companies usually have large amounts of raw text data. Without proper structure, the information contained in this text cannot be fully exploited. The challenge here was to create a system that would allow us to better analyze text data.

The Factored Solution:

We deployed a set of microservices, accessible through API calls, to generate sentiment, topic and entity features for raw text.

The Outcome:

Augmenting text data with structured features helped to improve downstream models and enabled better text analytics.

We utilized a variety of skills and tech stack tools to implement this, including: TensorFlow, Docker, Flask, Natural Language Processing, and Microservice-Oriented Architectures. 

Generative Techniques
on Production

The Challenge:

To use state-of-the-art generative techniques to feed machine learning models in real time with realistic augmented data to improve accuracy, particularly in object detection tasks.

The Factored Solution:

We deployed a system that can be used as a simple library or API to generate synthetic images in real time and feed them to our client’s models to improve generalization capabilities in unseen data.
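
The snippet below illustrates the "augment inside the input pipeline" pattern, with classic random transforms standing in for the generative models used in production; shapes and parameters are illustrative.

```python
import tensorflow as tf

# Stand-in augmenter; production used generative models (e.g., style transfer).
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.2),
])

images = tf.random.uniform([32, 64, 64, 3])
labels = tf.random.uniform([32], maxval=10, dtype=tf.int32)

# Synthetic variations are produced on the fly, batch by batch, so the
# downstream model never sees exactly the same image twice.
ds = (tf.data.Dataset.from_tensor_slices((images, labels))
        .batch(8)
        .map(lambda x, y: (augment(x, training=True), y)))
for batch_x, batch_y in ds.take(1):
    print(batch_x.shape, batch_y.shape)
```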

The Outcome:

We achieved a faster convergence of model training than traditional augmentation techniques.

The skills and tech stack utilized here included: TensorFlow, PyTorch, Python, Generative Models, Style Transfer, Data Augmentation, CNNs, Computer Vision, and Image Processing.

Automatic Labeling
of Data

The Challenge:

Despite not having any labeled data to hand, we were asked to classify Twitter data into certain groups, determining whether a specific tweet expresses a certain emotion or sentiment.

The Factored Solution:

We automatically labeled data points using a set of keywords associated with each category of emotion or sentiment (happiness, sadness, anxiety, anger, etc.). We then trained NLP models on the labeled data, removing the words used to label each category so the model could classify tweets without relying on them.
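
A tiny sketch of the weak-labeling step, with an illustrative keyword list; the real keyword sets and categories belonged to the client's taxonomy.

```python
import re

# Illustrative keyword sets; the production lists were far richer.
KEYWORDS = {
    "happiness": {"happy", "joy", "delighted"},
    "anger": {"angry", "furious", "outraged"},
}

def weak_label(tweet):
    words = set(re.findall(r"[a-z']+", tweet.lower()))
    for emotion, kws in KEYWORDS.items():
        if words & kws:
            # Strip the trigger words so the trained model can't just
            # memorize the labeling keywords.
            cleaned = " ".join(w for w in tweet.split() if w.lower() not in kws)
            return emotion, cleaned
    return None, tweet

print(weak_label("So happy with the new release!"))
# -> ('happiness', 'So with the new release!')
```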

The Outcome:

We deployed this model using AWS services, and it now generates hourly predictions for our client of the sentiment or emotion each tweet expresses. Our solution is fully integrated with our client's backend.

The tech stack and skills we used for this model included: TensorFlow, NLP, AWS S3, AWS Lambda, AWS EC2, PostgreSQL, and FastAPI.

Fleet Resource
Planning

The Challenge:

To find a suitable algorithm so that, given certain locations within a U.S. state, we could allocate a fleet of vehicles to satisfy demand and avoid stockout issues.

The Factored Solution:

By looking at the fleet's utilization history, we generated a temporal forecast of the required fleet per location. Then, by running a hybrid allocation algorithm, we were able to allocate vehicles effectively, maximizing efficiency and minimizing stockout issues.
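
As a toy illustration of the allocation step, the greedy pass below assigns a fixed fleet against forecasted demand; the locations and numbers are made up, and the production system combined this kind of heuristic with linear optimization.

```python
# Hypothetical forecasted demand (vehicles needed per location).
forecast = {"Austin": 40, "Dallas": 25, "Houston": 60}
fleet_size = 100

allocation = {loc: 0 for loc in forecast}
remaining = fleet_size
# Serve the largest unmet demand first to limit stockouts.
for loc in sorted(forecast, key=forecast.get, reverse=True):
    allocation[loc] = min(forecast[loc], remaining)
    remaining -= allocation[loc]

print(allocation, "unallocated:", remaining)
```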

The Outcome:

Increased fleet efficiency and optimized downtime scheduling led to improved KPIs for our client.

The skills and tech stack tools we utilized for this included: Python, Scikit-learn, Econometric Packages, Linear Optimization, Resource Allocation, Greedy Algorithms, and Stochastic Simulations. 

KPI Forecasting

The Challenge:

To build KPI forecasts that were valid in 2020, despite unprecedented trends, and make the forecasts available to key decision-makers.

The Factored Solution:

We deployed a system of multiple modules that we scheduled to train daily. We parallelized the process so we could train for various locations in a reduced amount of time.
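
The parallelization pattern looks roughly like the Ray sketch below, with a naive mean forecast standing in for the per-location models:

```python
import ray

ray.init(num_cpus=4)

@ray.remote
def fit_forecast(location, history):
    # Stand-in for the real per-location model: forecast the mean KPI.
    return location, sum(history) / len(history)

# Hypothetical per-location KPI histories (production covered 3,000+ locations).
histories = {f"loc_{i}": [100 + i + d for d in range(30)] for i in range(8)}
futures = [fit_forecast.remote(loc, hist) for loc, hist in histories.items()]
print(dict(ray.get(futures)))
```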

The Outcome:

The deployed model updated daily and could create KPI forecasts for over 3,000 locations in less than an hour. Decision-makers could visualize the updated forecasts in a Tableau dashboard and filter by geographic location and date.

The skills and tech stack tools utilized for this included: PySpark, AWS S3, SciPy, NumPy, pandas, Ray, AWS Redshift, Tableau, Parallel Computing, Dashboard Development, Cloud Computing, and Time Series.

Economic Impact
Predictor

The Challenge:

To predict the social and economic impact on specific geographical regions caused by the Covid-19 pandemic and the public policies taken to contain such an outbreak (for example, lockdowns).

The Factored Solution:

We built a system that used deep learning to forecast new cases for the following week. These incidence predictions, together with public data from the government, were used to detect regions that were similarly impacted economically and socially by the pandemic.

The Outcome:

The system was deployed as part of an early warning platform, hosted and maintained on Factored's cloud (AWS). Additionally, the solution was described in a paper that was accepted and presented at NeurIPS 2020.

We used a variety of skills and tech stack tools, including: Docker, TensorFlow, Elastic Container Service, Kubernetes, S3, Time Series Forecasting, Unsupervised Learning, and DevOps (AWS container services and container orchestration tools).