Building a scalable pipeline for processing and analyzing network flow data, with a focus on anomaly detection and bot activity.
- Ingest and sample large network datasets with Polars
- Transform raw flow logs into feature-rich tabular format
- Develop a modular ETL pipeline for local or streamed flow data
- Integrate anomaly detection and classification models (e.g. Isolation Forest, LOF, Random Forest, LGBM)
- Evaluate under real-world class imbalance
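
To illustrate why evaluation under class imbalance matters, here is a minimal sketch in plain Python. It is not the project's actual evaluation code: the function name `imbalance_report`, the label encoding (1 = bot, 0 = benign), and the 1% bot rate are all assumptions made up for this example.

```python
def imbalance_report(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (1 = bot).

    Hypothetical helper for illustration; labels are assumed to be 0/1.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 990 benign flows followed by 10 bot flows (1% positives).
y_true = [0] * 990 + [1] * 10

# A degenerate "model" that never flags anything.
always_benign = [0] * 1000

# A detector that raises 20 false alarms and catches 8 of the 10 bots.
detector = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

baseline = imbalance_report(y_true, always_benign)
detector_report = imbalance_report(y_true, detector)

print("always benign:", baseline)
print("detector:     ", detector_report)
```

With these made-up numbers the do-nothing baseline has higher accuracy than the detector while catching no bots at all, which is why precision, recall, and F1 on the rare class are the more informative metrics here.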
And that's Ritchie Vink, creator of Polars, with my graffiti.
- All of my projects are available at https://github.com/anopsy
- How to reach me: madkowalczuk@gmail.com
- Fun fact: 🎨 I paint graffiti portraits
🎨 Selected Projects
┣━━ Data Science Content Intern at NannyML:
┃   ┣━━ 📈Post-Deployment Data Science blogs
┃   ┃   ┣━━ 📉Data Quality and Covariate Shift
┃   ┃   ┗━━ 🌀Models aren't Forever
┃   ┣━━ contributed to the Research team on Anomaly Detection by evaluating multiple detection algorithms and generating synthetic datasets
┃   ┗━━ contributed to docs
┃
┣━━ PyData and PyLadies Con speaker and volunteer at:
┃   ┣━━ 💽PyData Amsterdam 2024 Talk - Alice in Open Source Land
┃   ┣━━ 🤖PyLadiesCon 2024 Talk
┃   ┗━━ 🏃PyData Open Source Sprint
┃
┣━━ Contributed to OSS at:
┃   ┣━━ 🧱scikit-lego
┃   ┃   ┣━━ contributed to docs
┃   ┃   ┗━━ made ColumnSelector dataframe agnostic using Narwhals
┃   ┣━━ 🐳🦄narwhals
┃   ┃   ┣━━ worked on pyarrow/dask backend implementation
┃   ┃   ┗━━ contributed to docs and tests
┃   ┗━━ 💡embetter
┃       ┣━━ deprecated a method
┃       ┗━━ added pre-commit hooks
┃
┣━━ Juniors_vs_ChatGPT
┃   - Did ChatGPT replace Juniors and Interns?
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/API/polars🐻❄️/hvplot📊
┃
┣━━ Compensation Prediction
┃   - How much do Engineers earn?
┃   ┣━━ data modeling
┃   ┣━━ model evaluation
┃   ┣━━ containerization using docker
┃   ┣━━ building streamlit app
┃   ┗━━ python🐍/scikit-learn/streamlit📈/docker📦
┃
┣━━ MaskMap: Decoding the Hidden Spectrum
┃   - Prototype of a diagnosis support tool using the power of NLP to identify symptoms of Autistic Masking
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling
┃   ┣━━ deploying
┃   ┗━━ python🐍/pandas🐼/FastAPI
┃
┣━━ Equity in Healthcare: Women in Data Science Datathon 2024
┃   - WIDS Datathon Project predicting a timely diagnosis of Metastatic Cancer
┃   ┣━━ data cleaning
┃   ┣━━ data wrangling
┃   ┣━━ data analysis
┃   ┣━━ modeling
┃   ┗━━ python🐍/pandas🐼/ensemble🌳/keras🧠
┃
┣━━ Relative Search Volumes Analysis
┃   - Search Volumes for Autism vs Autism Spectrum Disorder around the world
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃
┣━━ Steelplate Defect Visual EDA
┃   - Colorful joyplots for Visual EDA
┃   ┣━━ data visualization
┃   ┣━━ ensemble
┃   ┗━━ python🐍/pandas🐼/xgb🌳/seaborn🎨
┃
┣━━ hossenfelder - 🦺WIP
┃   - Data Analysis and Prediction of views on Sabine Hossenfelder YT channel
┃   ┣━━ data scraping
┃   ┣━━ data cleaning
┃   ┣━━ modeling WIP
┃   ┗━━ python🐍/pandas🐼
┃
┗━━ MyFalaClassifier - 🦺WIP
    - Detector of surfable waves
    ┣━━ live-stream scraping
    ┣━━ image processing
    ┣━━ transfer learning
    ┣━━ deploying
    ┗━━ python🐍/keras🧠


