About Me

My name is Edilson Santos.

I have a degree in Electrical Engineering and work as a Data Scientist in the financial market.

In addition, I engage in personal projects in Data Science to gain experience in solving business problems and mastering data analysis tools.

I am seeking opportunities to work professionally as a Data Scientist to assist the company in decision-making by building solutions using data.

Skills

Programming Languages

  • Python with a focus on data analysis.
  • Strategic Thinking.
  • SQL for data extraction.
  • ETL concepts.
  • SQLite and Postgres databases.

Statistics and Machine Learning

  • Descriptive statistics (location, dispersion, skewness, kurtosis, density).
  • A/B testing (Chi-Squared, Z-Test, One-Sample T-Test, Mann-Whitney, ANOVA).
  • Algorithms for Regression, Classification, Learning to Rank, Clustering, Deep Neural Networks (Deep Learning), Convolutional Neural Networks (CNN), Time Series, Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Natural Language Processing (NLP).
  • Techniques for data balancing, attribute selection, and dimensionality reduction.
  • Performance metrics for algorithms (RMSE, MAE, MAPE, Confusion Matrix, Precision, Recall, ROC Curve, Lift Curve, AUC, Silhouette Score, DB-Index).
  • Machine Learning packages: Scikit-learn, SciPy, TensorFlow, Keras.

Data Visualization

  • Matplotlib, Plotly, Seaborn.
  • Streamlit.
  • Power BI, Metabase, Looker Studio (Google Data Studio).

Software Engineering

  • Git, GitHub, Virtual Environment, Docker.
  • Streamlit Cloud, Flask, Python APIs.
  • Heroku, Render, Amazon Web Services (AWS), Google Cloud Platform (GCP).

Experiences

7+ Complete Data Science Projects

I built data solutions for business problems that resemble the real challenges of companies, using public data from Data Science competitions. Each project runs end to end: framing the business challenge, analyzing the problem with Python for Data Analysis and Exploration, Statistics, and Machine Learning algorithms, and publishing the trained algorithm in production using Cloud Computing tools.

3+ Years as a Data Scientist

I have been working as a Data Scientist in the financial market for over 3 years, with the aim of enhancing the trading techniques used by the proprietary trading desk. My responsibilities include backtesting of strategies, correlation analysis, algorithmic trading, strategic risk management, and the development of predictive models using Machine Learning.

1+ Years as a Data Analyst

I work as a Data Analyst at a financial consulting company, where my role involves prospecting clients to assess their financial situation, identifying their financial goals, developing a customized plan to achieve those goals, implementing the plan, and regularly monitoring the client's progress.

Data Science Projects

Sentiment Analysis of Amazon Product Reviews using LSTM

The goal of this project was to develop an automated machine learning solution that could accurately categorize Amazon reviews as Positive, Neutral, or Negative, going beyond the limitations of traditional star ratings. The intent was to provide vendors with specific insights from reviews to enhance product quality and user experience on Amazon.

Tools such as Python, Natural Language Processing (NLP) libraries, and Long Short-Term Memory (LSTM) neural networks were employed to create the model.

The outcome was a model that identifies and categorizes negative reviews with a recall of 84%. Recall is the key metric here because the goal is to capture the majority of negative reviews. If implemented, Amazon could save an estimated $15,330,000 annually.

The tools used were:

  • Python, Pandas, Keras, TensorFlow
  • Jupyter Notebook
  • Git, GitHub
  • Long Short-Term Memory Neural Networks (LSTM)
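
As a rough illustration of the recall metric mentioned above, the sketch below computes per-class recall from true and predicted labels. The labels are made-up toy values, not the project's data, and `recall_for_class` is a hypothetical helper name chosen for this example.

```python
def recall_for_class(y_true, y_pred, target):
    """Share of samples whose true label is `target` that were also predicted as `target`."""
    predictions_for_target = [p for t, p in zip(y_true, y_pred) if t == target]
    if not predictions_for_target:
        raise ValueError(f"no samples with true label {target!r}")
    hits = sum(1 for p in predictions_for_target if p == target)
    return hits / len(predictions_for_target)


# Toy labels for illustration only (assumed values, not results from the project).
y_true = ["negative", "negative", "negative", "negative",
          "neutral", "positive", "positive", "negative"]
y_pred = ["negative", "negative", "positive", "negative",
          "neutral", "positive", "negative", "negative"]

negative_recall = recall_for_class(y_true, y_pred, "negative")
print(f"Recall for negative reviews: {negative_recall:.2f}")
```

A high recall on the negative class means few negative reviews slip through unnoticed, which is the property this project optimizes for.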

Classification of Brain Tumors using Deep Learning

The aim of the project was to build a classifier that would increase the accuracy of brain tumor diagnoses at the Health Care Hospital.

Tools such as Python, Keras, TensorFlow, and Convolutional Neural Networks (specifically the VGG-16 model) were used to build the model.

The result was a machine learning model with 99% accuracy in correctly classifying brain tumor images.

The tools used were:

  • Python, Pandas, NLP libraries, TensorFlow, Keras
  • Jupyter Notebook
  • Git, GitHub
  • Convolutional Neural Networks (CNN)

A/B Testing for a Mobile Game

The objective of this project was to design and analyze an A/B test for the mobile game Cookie Cats. As players progress in the game, they encounter gates that force them to wait a while before they can proceed or make an in-app purchase. In the test, the first gate was moved from level 30 to level 40, and the analysis focuses on the impact on player retention.

Tools such as Python, Data Visualization libraries, and Statistical concepts were used to understand, model, and apply the A/B test.

The analysis found a statistically significant effect from moving the gate to level 40. However, this effect would lead to a decrease in revenue, so it is more sensible to keep the gate at level 30.

The tools used were:

  • Python, Pandas, Matplotlib, Seaborn
  • Jupyter Notebook
  • Git, GitHub
  • Statistics (Chi-Squared test)
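
A minimal sketch of how a Chi-Squared test can compare retention between two gate positions is shown below. It uses only the standard library, the counts are hypothetical and not taken from the Cookie Cats data, and `chi_squared_2x2` is an illustrative helper rather than the code used in the project.

```python
import math


def chi_squared_2x2(retained_a, total_a, retained_b, total_b):
    """Pearson chi-squared test of independence for a 2x2 table (no continuity correction)."""
    observed = [
        [retained_a, total_a - retained_a],
        [retained_b, total_b - retained_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: players retained out of players assigned to each gate position.
stat, p = chi_squared_2x2(retained_a=200, total_a=1000, retained_b=150, total_b=1000)
print(f"chi2 = {stat:.3f}, p-value = {p:.4f}")
```

A p-value below the chosen significance level (commonly 0.05) suggests the retention rates of the two groups differ by more than chance alone would explain.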

Building a Loyalty Program with Customer Clustering

The objective of this project was to develop a loyalty program to segment the customers of a fictitious E-Commerce company from the United Kingdom using clustering techniques.

Tools such as Python, Statistics, and unsupervised Machine Learning techniques were used to segment a group of customers based on their purchase characteristics. With these characteristics, a Dashboard was built in Metabase to monitor the metrics built in the project.

In addition, I used Software Engineering techniques to build a Cloud infrastructure on Amazon AWS to automate the production process.

If implemented, this solution would increase the company's revenue by about 12.5%.

The tools used were:

  • Python, Pandas, Matplotlib, Seaborn
  • Jupyter Notebook
  • K-Means, Hierarchical Clustering, Gaussian Mixture Model
  • AWS Cloud (EC2, S3, Postgres, SQLite)
  • Git, GitHub
  • Statistics
  • Metabase Visualization
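
To illustrate the idea behind K-Means segmentation, here is a small standard-library sketch of Lloyd's algorithm applied to toy customers. The feature pairs (purchase count, average spend) and the `k_means` helper are assumptions made for this example. The project itself relied on library implementations, and real data would normally be scaled before clustering.

```python
import math
import random


def k_means(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: returns (centroids, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (number of purchases, average spend). Not the project's data.
customers = [(1, 20.0), (2, 25.0), (1, 30.0), (12, 400.0), (15, 380.0), (14, 420.0)]
centroids, labels = k_means(customers, k=2)
print(labels)
```

Customers that share a label form one segment. In a loyalty program, the segment with the highest purchase frequency and spend would be the natural candidate for the top tier.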

Health Insurance Cross-Selling

The goal of this project was to help a health insurance company also sell car insurance to its existing customers, i.e., to cross-sell. For this purpose, a Machine Learning model was developed to rank customers most likely to purchase car insurance, facilitating the company's communication strategy, optimizing its business, and reducing costs.

I used Python for Exploratory Analysis and Data Manipulation, Statistics to derive Business Insights, web tools such as GitHub, and cloud servers such as Heroku. I also used Machine Learning algorithms, specifically Supervised Classification algorithms.

The result was a spreadsheet in Google Sheets that, given a customer base, made predictions through an API that fetched the Machine Learning model from the Cloud and returned this customer base sorted by the highest probability of purchasing car insurance.

The tools used were:

  • Python
  • Jupyter Notebook
  • Heroku Cloud
  • Machine Learning
  • Git, GitHub
  • Statistics
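
The ranking step can be sketched as follows: sort customers by a model's predicted purchase probability, then check how many actual buyers fall in the top slice of the list. The scores and outcomes below are invented for illustration, and `rank_by_propensity` and `cumulative_gain` are hypothetical helper names, not the project's API.

```python
def rank_by_propensity(customers):
    """Sort customers from most to least likely to buy, based on a model score."""
    return sorted(customers, key=lambda c: c["score"], reverse=True)


def cumulative_gain(ranked, top_fraction):
    """Fraction of all actual buyers found in the top `top_fraction` of the ranked list."""
    cutoff = max(1, round(len(ranked) * top_fraction))
    total_buyers = sum(c["bought"] for c in ranked)
    if total_buyers == 0:
        return 0.0
    return sum(c["bought"] for c in ranked[:cutoff]) / total_buyers


# Hypothetical scores (as if produced by a classifier) and observed outcomes.
customers = [
    {"id": 1, "score": 0.10, "bought": 0},
    {"id": 2, "score": 0.85, "bought": 1},
    {"id": 3, "score": 0.40, "bought": 0},
    {"id": 4, "score": 0.92, "bought": 1},
    {"id": 5, "score": 0.05, "bought": 0},
    {"id": 6, "score": 0.67, "bought": 0},
    {"id": 7, "score": 0.73, "bought": 1},
    {"id": 8, "score": 0.22, "bought": 1},
    {"id": 9, "score": 0.31, "bought": 0},
    {"id": 10, "score": 0.58, "bought": 0},
]

ranked = rank_by_propensity(customers)
gain_at_40 = cumulative_gain(ranked, top_fraction=0.4)
print([c["id"] for c in ranked], gain_at_40)
```

The cumulative gain shows the business value of the ranking: the larger the share of buyers captured in the first contacts, the fewer calls the sales team needs to make.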

Sales Forecast for a Pharmacy Chain

In this project, a sales forecast was made for the Rossmann pharmacy chain for the next 6 weeks, which can be accessed by a bot on Telegram from any internet-enabled device.

I used Python for Exploratory Analysis and Data Manipulation, along with web tools such as GitHub and Render. Machine Learning models such as XGBoost and Random Forest, among others, were also used.

The final result was a Telegram Bot that accesses the Machine Learning model published in the Cloud and provides information about the sales forecast for a specific store.

The tools used were:

  • Python
  • Jupyter Notebook
  • Render Cloud
  • Machine Learning
  • GitHub

Development of a Dashboard for a Restaurant Marketplace using Streamlit

In this project, I developed a Management Panel with the main business metrics for a restaurant marketplace operating in several countries and continents. It draws on Python programming concepts, Data Manipulation, Strategic Thinking, and Business Logic, along with web development tools like Streamlit and GitHub.

The final result was a Dashboard hosted in a Cloud environment available through the web. The panel can be accessed using any internet-connected device.

The tools used were:

  • Python
  • Jupyter Lab
  • Terminal
  • Streamlit
  • Streamlit Cloud
  • GitHub

Contact

Feel free to get in touch.