Awesome Data Analysis
400+ curated tools, libraries, cheatsheets, roadmaps, and tutorials to master data analysis. Perfect for beginners and experienced data analysts and scientists.
🏆 Awesome Data Science Repositories#
Curated collections of high-quality GitHub repos for inspiration and learning.
Awesome Data Science - A curated list of courses, books, tools, and resources for data science.
Data Science for Beginners - Microsoft’s data science curriculum.
OSSU Data Science - Open Source Society University’s self-study path.
Data Science Best Resources - Carefully curated links for data science resources in one place.
Data Science Articles from CodeCut - A collection of articles, videos, and code related to data science.
Data Science Using Python - Resources for data analysis using Python.
🗺️ Roadmaps#
Step-by-step guides and skill trees to master data science and analytics.
Data Analyst Roadmap - Structured learning path for analysts.
Data Science Roadmap Tutorials - Tutorials for the data science roadmap.
Data Science Roadmap from A to Z - Comprehensive roadmap for data science.
Roadmap for Data Science - Structured roadmap for aspiring data scientists.
Data Analyst RoadMap - Comprehensive roadmap for aspiring data analysts.
66DaysOfData - 66-day data analytics learning challenge.
Data Analyst Roadmap from Zero - Guide to becoming a data analyst from scratch.
Data Analyst Roadmap for Professionals - 8-week program for analysts at all levels.
🐍 Python#
Resources#
A collection of resources for learning and mastering Python programming.
Awesome Python - An opinionated list of awesome Python frameworks, libraries, software, and resources.
30 Days Of Python - A 30-day programming challenge to learn the Python programming language.
Real Python Tutorials - Tutorials on Python from Real Python.
Data Science Python - Common data analysis and machine learning tasks using Python.
Python Data Science Handbook - Full text of the “Python Data Science Handbook” in Jupyter Notebooks.
Python for Algorithms & Interviews - Files for Udemy course on algorithms and data structures.
Tanu N Prabhu Python - This repository helps you understand Python from scratch.
Interactive Coding Challenges - 120+ interactive Python coding interview challenges.
Clean Code Python - Clean Code concepts adapted for Python.
Best of Python - A ranked list of awesome Python open-source libraries and tools.
Awesome Python Applications - Free software that works great, and also happens to be open-source Python.
List of Python API Wrappers - List of Python API wrappers and libraries.
Awesome Time Series in Python - Curated list of Python packages for time series analysis.
Python Tutorial - Python tutorial from GeeksforGeeks.
Data Manipulation with Pandas#
Tutorials and best practices for working with pandas DataFrames.
Awesome Pandas - A curated list of resources for using the Pandas library.
100 data puzzles for pandas - A collection of data puzzles to practice your Pandas skills.
Pandas Tutor - Visualize Pandas operations step-by-step (perfect for beginners).
Pandas Exercises - Exercises designed to help you improve your Pandas skills.
Pandas Cookbook - A cookbook with various recipes for using Pandas effectively.
Hands-On Data Analysis with Pandas - Materials for following along with Hands-On Data Analysis with Pandas.
Effective Pandas - A series focused on writing effective and idiomatic Pandas code.
Useful Python Tools for Data Analysis#
A collection of Python libraries for efficient data manipulation, cleaning, visualization, validation, and analysis.
Data Manipulation & Cleaning#
Pandas-dq - Data type correction and automatic DataFrame cleaning.
Vaex - High-performance Python library for lazy Out-of-Core DataFrames.
DataCleaner - Python tool for automatically cleaning and preparing datasets.
Polars - Multithreaded, vectorized query engine for DataFrames (Rust-powered).
Pandas-flavor - Add custom methods to Pandas.
TheFuzz - Fuzzy string matching (Levenshtein distance).
PandasAI - Conversational data analysis using LLMs and RAG.
DateUtil - Extensions for standard Python datetime features.
Fugue - Unified interface for Pandas, Spark, and Dask.
Pandas-DataReader - Reads data from various online sources into pandas DataFrames.
sklearn-pandas - Bridge between Pandas and Scikit-learn.
fitter - Figures out the distribution your data comes from.
Arrow - Enhanced work with dates and times.
Pendulum - Alternative to datetime with timezone support.
Automated Data Visualization Tools#
AutoViz - Automatic data visualization in 1 line of code.
Vizro - Low-code toolkit for building high-quality data visualization apps.
Great Tables - Create awesome display tables using Python.
DataMapPlot - Create beautiful plots of data maps.
Datashader - Quickly and accurately render even the largest data.
Sweetviz - Automatic EDA with dataset comparison.
Lux - Automatic DataFrame visualization in Jupyter with a click.
Yellowbrick - A suite of visual diagnostic tools for machine learning, extending the Scikit-Learn API.
Data Quality & Profiling#
Pandas-profiling - Automatic DataFrame visualization and profiling (the project has since been renamed YData Profiling).
PyOD - Python library for outlier and anomaly detection.
YData Profiling - 1 line of code data quality profiling & exploratory data analysis.
Missingno - Visualize missing data patterns in matrix format.
Dora - Automate EDA: preprocessing, feature engineering, visualization.
Alibi-detect - Algorithms for outlier, adversarial and drift detection.
Feature Engineering & Selection#
FeatureTools - Open-source automated feature engineering.
Feature Selector - Tool for dimensionality reduction of machine learning datasets.
TSFresh - A Python library for automatically extracting features from time series data.
Feature Engine - A feature engineering library with Scikit-Learn compatibility.
Prince - A Python library for multivariate exploratory data analysis, including PCA, CA, MCA, and more.
Factor Analyzer - A Python package for factor analysis, including exploratory and confirmatory methods.
Testing & Validation#
Pytest - Framework for writing small tests.
Cerberus - Data validation through schemas.
Pandera - Data validation through declarative schemas.
PandasVet - Code style validator for Pandas (similar to ESLint).
ETL & Data Pipelines#
Prefect - Workflow orchestration for building resilient data pipelines.
Airflow - Platform for automating data workflows.
Apache Arrow - Universal columnar format and multi-language toolbox for fast data interchange.
Petl - ETL tool for data cleaning and transformation.
DuckDB - In-process analytical database for fast SQL queries.
Interactive Tools & GUIs#
D-Tale - Interactive GUI for data analysis in a browser.
Pandasgui - GUI for viewing and filtering DataFrames.
QGrid - Interactive grid for sorting, filtering, and editing DataFrames in Jupyter.
PyGWalker - Interactive UIs for visual analysis of pandas DataFrames.
Mito - Jupyter extensions that help you write code faster.
Pivottablejs - Interactive PivotTable.js tables in Jupyter.
Data Generation & Simulation#
Formatting & Logging#
Rich - Rich text and beautiful formatting in the terminal.
Pandas-log - Logs pandas operations for data transformation tracking.
Icecream - Debugging without using print.
Module Dependency & Code Management#
Parallel Computing for DataFrames#
Pandarallel - Parallel operations for pandas DataFrames.
Dask - Parallel computing for arrays and DataFrames.
Modin - Speeds up Pandas by distributing computations.
Documentation#
Sphinx - The Sphinx documentation generator.
Pdoc - API documentation for Python projects.
Mkdocs - Project documentation with Markdown.
File Formats & Documents#
OpenPyXL - Read/write Excel files with support for advanced features.
Tablib - Exports data to XLSX, JSON, CSV via a single API.
PyPDF2 - Reads and writes PDF files.
Python-docx - Reads and writes Word documents.
CleverCSV - Smart CSV reader for messy data.
Xlwings - Integration of Python with Excel.
Xmltodict - Converts XML to Python dictionaries.
Python-markdownify - Convert HTML to Markdown.
MarkItDown - Python tool for converting files and office documents to Markdown.
Additional#
Pillow - Image processing library.
Ftfy - Fixes broken Unicode strings.
Records - SQL queries to databases via Python syntax.
Dataset - JSON-like interface for working with SQL databases.
JMESPath - Queries JSON data (SQL-like for JSON).
Glom - Transforms nested data structures.
Pampy - Pattern matching for Python dictionaries.
Geopy - Geocoding addresses and calculating distances.
Diagrams - Diagrams as code for cloud system architecture prototyping.
Scattertext - Beautiful visualizations of language differences among document types.
Pygorithm - A Python module for learning all major algorithms.
IGraph - A library for creating and manipulating graphs and networks, with bindings for multiple languages.
Joblib - A lightweight pipelining library for Python, particularly useful for saving and loading large NumPy arrays.
🗃️ SQL & Databases#
Resources#
SQL tutorials and database design principles.
SQLZoo - SQL Tutorial - Interactive SQL tutorial.
SQL Bolt - Learn SQL - Learn SQL through interactive lessons.
SQL Tutorial - Comprehensive SQL tutorial resource.
SQL Tutorial by W3Schools - Comprehensive SQL tutorial.
PostgreSQL Tutorial by W3Resource - Tutorial for PostgreSQL.
MySQL Tutorial by W3Resource - Tutorial for MySQL.
MongoDB Tutorial by W3Resource - Tutorial for MongoDB.
InterviewBit - SQL Interview Questions - Collection of SQL interview questions.
EverSQL - AI-powered SQL query optimization and database observability tool.
GeeksforGeeks - SQL Tutorial - Detailed SQL tutorial.
Awesome Postgres - A curated list of awesome PostgreSQL software, libraries, tools and resources.
Awesome MySQL - A curated list of awesome MySQL software, libraries, tools and resources.
Awesome Clickhouse - A curated list of awesome ClickHouse software.
Awesome SQLAlchemy - A curated list of awesome tools for SQLAlchemy.
Awesome SQL - List of tools and techniques for working with relational databases.
Tools#
A collection of Python libraries and drivers for seamless database access and interaction.
PyODBC - Python library for ODBC database access.
SQLAlchemy - SQL toolkit and ORM for Python.
Psycopg2 - PostgreSQL database adapter.
MySQL Connector/Python - MySQL driver for Python.
PonyORM - ORM for Python with dynamic query generation.
PyMongo - Official MongoDB driver for Python.
📊 Data Visualization#
Resources#
Color theory, chart selection guides, and storytelling tips.
Visualization Curriculum - Interactive notebooks designed to teach data visualization concepts.
The Python Graph Gallery - A collection of Python graph examples for data visualization.
FlowingData - Insights on data analysis and visualization.
Data Visualization Catalogue - A comprehensive catalog of data visualization types.
From Data to Viz - A guide to choosing the right visualization based on your data.
Data Viz Project - A resource for selecting suitable visualizations.
Chartopedia - A guide to help you select the appropriate chart types.
DataForVisualization - Tutorials and insights on data visualization techniques.
Truth & Beauty - Exploration of the aesthetics of data visualization.
Tools#
Libraries for static, interactive, and 3D visualizations.
Matplotlib - A comprehensive library for creating static, animated, and interactive visualizations in Python.
Seaborn - A statistical data visualization library based on Matplotlib.
Plotly - A library for creating interactive plots and dashboards.
Altair - A declarative statistical visualization library for Python.
Bokeh - A library for creating interactive visualizations for modern web browsers.
HoloViews - A tool for building complex visualizations easily.
Geopandas - An extension of Pandas for geospatial data.
Folium - A library for visualizing data on interactive maps.
Pygal - A Python SVG charting library.
Plotnine - A grammar of graphics for Python.
Bqplot - A plotting library for IPython/Jupyter notebooks.
PyPalettes - A large collection (2,500+) of color maps for Python.
📈 Dashboards#
Resources#
Tutorials for building and enhancing dashboards and visualizations using various tools and frameworks.
Awesome Dashboards - A collection of outstanding dashboard and visualization resources.
Best of Streamlit - Showcase of community-built Streamlit applications.
Awesome Dash - Comprehensive resources for Dash users.
Awesome Panel - Resources and support for Panel users.
Dash Enterprise Samples - Production-ready Dash apps.
Plotly Dash Tutorial - Tutorial for learning Plotly Dash.
GeeksforGeeks - Tableau Tutorial - Comprehensive tutorial on Tableau.
GeeksforGeeks - Power BI Tutorial - Detailed tutorial on Power BI.
DashTools - Command line tools for Dash applications.
Tools#
Frameworks for building custom dashboard solutions.
Dash - Framework for creating interactive web applications.
Streamlit - Simplified framework for building data applications.
Panel - Framework for creating interactive web applications.
Gradio - Tool for creating and sharing machine learning applications.
Software#
A list of leading tools and platforms for data visualization and dashboard creation.
Tableau - Leading data visualization software.
Microsoft Power BI - Business analytics tool for visualizing data.
QlikView - Tool for data visualization and business intelligence.
Metabase - User-friendly open-source BI tool.
Apache Superset - Open-source data exploration and visualization platform.
Redash - Tool for visualizing and sharing data insights.
Grafana - Dashboarding and monitoring tool.
Datawrapper - User-friendly chart and map creation tool.
ChartBlocks - Online chart creation platform.
Infogram - Tool for creating infographics and visual content.
Google Data Studio (now Looker Studio) - Free tool for creating interactive dashboards and reports.
Rath - Next-generation automated data exploratory analysis and visualization platform.
🕸️ Web Scraping & Crawling#
Resources#
A collection of valuable resources, tutorials, and libraries for web scraping with Python.
Best of Web Python - A ranked list of awesome Python libraries for web development.
Python Scraping - Code samples from the book “Web Scraping with Python”.
Awesome Web Scraping - List of libraries, tools, and APIs for web scraping and data processing.
Easy Scraping Tutorial - Simple but useful Python web scraping tutorial code.
Webscraping from 0 to Hero - An open project repository sharing knowledge and experiences about web scraping with Python.
Trump Lies - Tutorial for web scraping in Python with Beautiful Soup.
Scraping Tutorial - Tutorial for scraping streaming sites.
Scraper Projects - List of mini projects that involve web scraping.
Tools#
A list of Python libraries and tools for web scraping.
BeautifulSoup - A library for parsing HTML and XML documents.
Selenium - A tool for automating web applications for testing purposes.
Scrapy - An open-source and collaborative web crawling framework for Python.
Gerapy - Distributed Crawler Management Framework based on Scrapy, Scrapyd, Django, and Vue.js.
TextAttack - A Python framework for adversarial attacks, data augmentation, and model training in NLP.
AutoScraper - A smart, automatic, fast, and lightweight web scraper for Python.
Feedparser - A library to parse feeds in Python.
Trafilatura - A Python & command-line tool to gather text and metadata on the web.
You-Get - A tiny command-line utility to download media contents (videos, audios, images) from the web.
Dirsearch - A web path scanner.
MechanicalSoup - A Python library for automating interaction with websites.
ScrapeGraph AI - A Python scraper based on AI.
Snscrape - A social networking service scraper in Python.
📖 Natural Language Processing (NLP)#
Resources#
A selection of resources for learning and applying natural language processing in Python.
NLP in Python with Deep Learning - A resource for learning NLP with deep learning.
Awesome NLP - A ranked list of awesome Python libraries for natural language processing (NLP).
Hands on NLTK Tutorial - The hands-on NLTK tutorial for NLP in Python.
NLTK Book - Natural Language Processing with Python.
Tools#
A collection of powerful libraries and frameworks for natural language processing in Python.
Natural Language Toolkit (NLTK) - A leading platform for building Python programs to work with human language data.
TextBlob - A simple library for processing textual data.
SpaCy - An open-source software library for advanced NLP in Python.
TextRank - A library for TextRank algorithm implementation.
Flair - A simple framework for state-of-the-art NLP.
BERT - A transformer-based model for NLP tasks.
Transformers - A library for state-of-the-art NLP models.
🔢 Mathematics, Statistics & Probability#
Mathematics#
A collection of resources for learning and applying mathematics and statistics, particularly in the context of data science and machine learning.
Stats Maths with Python - Collection of Python scripts and notebooks for statistics and mathematics.
Hackermath - Resource for learning statistics and mathematics for data science.
ML Book - Comprehensive resource for mathematics in machine learning.
ML Foundations - Focus on calculus and optimization techniques for ML.
Khan Academy - Math for Data Science - Free online courses covering various math topics.
Towards Data Science - Math Section - Articles and resources on mathematics for data science.
Fast.ai - Computational Linear Algebra - Resource for learning linear algebra computationally.
Immersive Linear Algebra - Interactive resource for understanding linear algebra.
Brilliant.org - Interactive courses - Engaging courses on foundational mathematics.
Cross Validated (Stack Exchange) - Q&A site for statistics and data analysis.
Wolfram Alpha - Online equation solver - Computational knowledge engine for solving equations.
Statistics & Probability#
A selection of resources focused on statistics and probability, including tutorials, interactive tools, and comprehensive guides.
GeeksforGeeks - Probability in Maths - Overview of probability concepts in mathematics.
GeeksforGeeks - Statistics in Maths - Introduction to statistical concepts in mathematics.
Statistical Data Analysis in Python - Tutorial for statistical analysis using Python.
All of Statistics - Resource for studying statistics based on Wasserman’s book.
The Elements of Statistical Learning - Notebooks for understanding statistical learning concepts.
Seeing Theory - Interactive visual resource for learning probability and statistics.
Statistics cookbook - Cookbook for statistical methods and techniques.
NoteBooks Statistics and Machine Learning - Notebooks covering statistics and machine learning topics.
Code repository for O’Reilly book - Companion code for a practical statistics book.
Statistical Learning Theory - Stanford University - Lecture notes on statistical learning theory.
Statistics and probability - Khan Academy resource for statistics and probability.
StatLect - Comprehensive online textbook covering probability and statistics concepts.
Introduction to Statistics With Python - Code and notebooks accompanying the book on statistics with Python.
stanford.edu - Probabilities and Statistics - Refresher course on probabilities and statistics from Stanford University.
Bayesian Methods for Hackers - Resource for learning Bayesian methods in Python.
🧪 A/B Testing#
A collection of resources focused on A/B testing.
Dynamicyield - An online course covering A/B testing and optimization techniques.
Awesome A/B Testing - A collection of articles focused on A/B testing and statistical methods.
Evan’s Awesome A/B Tools - A/B test calculators.
🤖 Machine Learning#
A collection of resources to help you learn and apply machine learning concepts and techniques.
100 Days of ML Coding - A comprehensive coding challenge to learn machine learning over 100 days.
Microsoft ML for Beginners - A beginner-friendly introduction to machine learning concepts and practices.
Made With ML - Resource for building and deploying machine learning applications.
MLU Explain - Visual explanations of ML algorithms (XGBoost, PCA, etc.).
Handson-ml3 - Hands-on guide to machine learning and deep learning using Python.
Jason’s Machine Learning 101 - Presentation covering the basics of machine learning concepts.
🧠 Productivity & Development Tools#
Resources#
A collection of resources and tools to enhance productivity and streamline development processes.
Awesome Jupyter - Curated list of Jupyter projects, libraries, and resources.
Best of Jupyter - Ranked list of notable Jupyter Notebook, Hub, and Lab projects.
Awesome AutoHotkey - A curated list of awesome AutoHotkey libraries, scripts, and resources.
Awesome Productivity - A curated list of delightful productivity resources.
Microsoft To Do - A simple to-do list app from Microsoft.
Google Keep - A note-taking and list-making app.
Bujo - Tools to help transform the way you work and live.
Parabola - An AI-powered workflow builder for organizing data.
Notion - An all-in-one workspace for note-taking and task management.
Trello - A visual project management tool.
Asana - A project management platform for tracking work and projects.
Awesome ChatGPT Prompts - A repository for ChatGPT prompt curation.
Markdown Here - Extension for writing emails in Markdown and rendering them before sending.
Cookiecutter Data Science - A standardized project structure for data science projects.
Sketch - Design toolkit built around designers' workflows.
The Markdown Guide - Comprehensive guide to learning Markdown.
Kittl - Platform for creating and editing charts and data visualizations.
Useful Linux Tools#
A selection of tools to enhance productivity and functionality in Linux environments.
Peek - Simple animated GIF screen recorder with an easy to use interface.
CopyQ - Clipboard manager with advanced features.
Translate Shell - Command-line translator using Google Translate, Bing Translator, Yandex.Translate, etc.
Espanso - Cross-platform Text Expander written in Rust.
Flameshot - Powerful yet simple to use screenshot software.
Inkscape - A powerful, free, and open-source vector graphics editor for creating and editing visualizations.
Rclone - A command-line program to manage files on cloud storage.
Rsync - A fast and versatile file copying tool that can synchronize files and directories between two locations over a network or locally.
Timeshift - System restore tool for Linux that creates filesystem snapshots using rsync+hardlinks or BTRFS snapshots.
Backintime - A comfortable and well-configurable graphical frontend for incremental backups.
Fzf - A command-line fuzzy finder.
Osquery - SQL powered operating system instrumentation, monitoring, and analytics.
GNU Parallel - A tool to run jobs in parallel.
HTop - An interactive process viewer.
Ncdu - A disk usage analyzer with an ncurses interface.
Thefuck - A command line tool to correct your previous console command.
Useful VS Code Extensions#
A collection of extensions to enhance functionality and productivity in Visual Studio Code.
JDBC Adapter - Connect to various databases using JDBC.
DBCode - Connect - Database client for managing and querying databases.
Markdown All in One - Essential tools for Markdown editing.
Markdown Preview GitHub Styles - Changes VS Code’s markdown preview to match GitHub’s styling.
Snippington Python Pandas Basic - Basic tools for working with Pandas in Python.
PDF Viewer for Visual Studio Code - View PDF files directly in VS Code.
Quick Python Print - Quickly handle print operations in Python.
Rainbow CSV - Highlight CSV and TSV files and run SQL-like queries.
Remove Blank Lines - Extension to remove empty lines in documents.
PDF Preview in VSCode - Show PDF previews in VS Code.
CSV to Table - Convert CSV/TSV/PSV files to ASCII formatted tables.
Data Preview - Import, view, slice, and export data.
Data Wrangler - Tool for cleaning and preparing tabular datasets.
Error Lens - Enhances the display of errors and warnings in code.
Indent Rainbow - Makes indentation easier to read.
Markdown Table Editor - Add features to edit Markdown tables.
WYSIWYG Editor for Markdown - View Word and Excel files and edit Markdown.
Prettier - Code formatting extension for VS Code.
Project Manager - Easily switch between projects.
Python Indent - Automatically indent Python code.
SandDance - Visually explore and present your data.
SQL Notebooks - Open SQL files as VSCode Notebooks.
SQL Tools - Database management tools for VSCode.
Kanban Board - A Kanban board extension for organizing tasks within VS Code.
Path Autocomplete - Provides path completion for files and directories in VS Code.
Path Intellisense - Autocompletes filenames in your code.
Python Imports Utils - Utilities for managing Python imports.
Workspace Dashboard - Organize your workspaces in a speed-dial manner.
Remote Development - Open any folder in a container, on a remote machine, or in WSL.
Text Power Tools - An all-in-one solution with 240+ commands for text manipulation.
Toggle Quotes - Toggle between single, double, and backticks for strings.
Comment Translate - Helps translate comments, strings, and variable names in your code.
Text Marker - Select text in your code and mark all matches with configurable highlight color.
Bookmarks - Mark lines in your code and jump to them easily.
Dendron - A hierarchical note-taking tool that grows as you do.
Gitignore Generator - Simplifies the process of generating .gitignore files.
Test Explorer UI - Run your tests in the sidebar of Visual Studio Code.
Python Test Explorer - Run your Python tests in the sidebar of Visual Studio Code.
📚 Skill Development & Career Resources#
Practice Resources#
A collection of resources to enhance skills and advance your career in data analysis and related fields.
Kaggle Competitions - Platform for participating in data analysis and machine learning competitions.
Makeovermonday - A platform focused on enhancing data visualization practices.
Workout Wednesday - Engage in weekly challenges to improve your visualization skills.
Official TidyTuesday Repository - Repository for the TidyTuesday project, promoting data analysis.
DataCamp Projects - Practical projects in data analysis to enhance skills.
DrivenData Competitions - Data analysis competitions with a social impact focus.
LeetCode Data Science Problems - Challenges related to data analysis and algorithms.
Codecademy Data Science Path - Interactive courses for learning data analysis.
Curated Jupyter Notebooks#
A selection of curated Jupyter notebooks to support learning and exploration in data science and analysis.
Awesome Notebooks - Data & AI notebook templates catalog organized by tools.
Data Science Ipython Notebooks - Data science Python notebooks covering various topics.
Pydata Book - Materials and IPython notebooks for “Python for Data Analysis” by Wes McKinney.
Spark py Notebooks - Apache Spark & Python tutorials for big data analysis and machine learning.
DataMiningNotebooks - Example notebooks for data mining accompanying the course at Southern Methodist University.
Pythondataanalysis - Python data repository with Jupyter notebooks and scripts.
Python For Data Analysis - An introduction to data science using Python and Pandas with Jupyter notebooks.
Jdwittenauer Ipython Notebooks - A collection of IPython notebooks covering various topics.
Data Sources & Datasets#
A collection of resources for accessing datasets and data sources for analysis and projects.
Kaggle Datasets - Extensive collection of datasets for practice in data analysis.
Opendatasets - A Python library for downloading datasets from Kaggle, Google Drive, and other online sources.
Datasette - An open source multi-tool for exploring and publishing data.
Awesome Public Datasets - Curated list of high-quality open datasets.
Open Data Sources - Collection of various open data sources.
Free Datasets for Projects - Dataquest’s compilation of free datasets.
Data World - The enterprise data catalog that CIOs, governance professionals, data analysts, and engineers trust in the AI era.
Awesome Public Real Time Datasets - A list of publicly available datasets with real-time data.
Resume and Interview Tips#
A variety of resources to help you prepare for interviews and enhance your resume.
Awesome Interview Questions - A curated awesome list of lists of interview questions.
Data Science Interview Questions Answers - Curated list of data science interview questions and answers.
Data Science Interview Preparation Resources - Resource to help you prepare for your upcoming data science interviews.
Devinterview - Ace your next tech interview with confidence.
Interviewqs - Ace your next data science interview.
Data Interview Qs - A curated list of data science interview questions and answers.
Interview Query - A platform for preparing for data science interviews.
150 Essential Data Science Questions and Answers - A collection of essential data science questions and answers.
Data Science Prep - Sample interview questions for data science.
Analytics Vidhya - 40 interview questions asked at startups in machine learning/data science.
DataScience Interview Questions - A collection of data science interview questions.
Interviewbit - A platform to prepare for data science interviews.
Interview Query - Another platform to prepare for data science interviews.
📋 Cheatsheets#
A collection of cheatsheets across various domains to aid in quick reference and learning.
Python#
Python Cheat Sheet - Comprehensive Python syntax and examples.
Learn Python - Interactive Python learning.
Pythoncheatsheet - Quick reference for Python basics and advanced topics.
Comprehensive Python Cheatsheet - Detailed Python functions and libraries.
Data Science & Machine Learning#
Data Science All Cheat Sheet - Covers ML, DL, and analytics.
DS Cheatsheets - Curated DS/ML concepts and workflows.
Data Science Cheat Sheets (Math) - Cheat sheets for quick reference in data science mathematics.
Pandas Cheat Sheet - Data manipulation with Pandas.
PySpark Cheatsheet - Common PySpark patterns.
Linux & Command Line#
Linux Cheatsheet - Linux commands and shortcuts.
Bash Awesome Cheatsheets - Bash scripting essentials.
Unix Commands Reference - Unix terminal basics.
Git & GitHub#
GitHub Cheat Sheet - Git/GitHub workflows and tips.
Git Awesome Cheatsheets - Git commands and best practices.
Git and Git Flow Cheat Sheet - Branching strategies.
Probability & Statistics#
Stanford CME 106 Cheatsheets - Probability and statistics for engineers.
10-Page Probability Cheatsheet - In-depth probability concepts.
Statistics Cheatsheet - Key statistical methods.
Docker#
Docker Cheat Sheet - Docker commands and workflows.
Docker Awesome Cheatsheets - Containerization basics.
Tools & Workflow#
VSCode Awesome Cheatsheets - VS Code shortcuts.
Markdown Cheatsheet - Formatting for GitHub READMEs.
Emoji Cheat Sheet - Emojis in Markdown.
SQL & Databases#
Quick SQL Cheatsheet - Handy SQL reference guide.
Interview Preparation#
21 Must-Have Cheat Sheets - Interview-focused guides.
Solutions Architect Metrics Cheatsheet - System design metrics.
Miscellaneous#
CheatSheet for CheatSheets - Mega-repository of cheat sheets.
Think Stats Cheatsheet - Book companion.
Dataquest - Power BI Cheat Sheet - A helpful resource for Power BI users.
🌐 Additional Resources#
A wide range of resources designed to facilitate learning, development, and exploration across different domains.
Growth.Design - A collection of product case studies and behavioral psychology insights for data-driven decision-making.
Jupyter Book - Create beautiful, publication-quality books and documents from computational content.
Awesome Quarto - A curated list of Quarto resources, including talks, tools, examples, and articles. Contributions are welcome!
Awesome Vscode - A comprehensive list of useful VS Code extensions and resources.
UC Berkeley - Data 8 - Course materials for the Data Science Foundations course.
A collective list of free APIs - A comprehensive list of free APIs for various purposes.
Introduction to Big Data - Resources and materials for understanding Big Data concepts.
Awesome Readme - Collection of well-crafted README files for inspiration.
Anomaly Detection Resources - Books, papers, videos, and toolboxes related to anomaly detection.
Awesome Code Review - A collection of resources for code review practices.
W3Resource - Online platform offering tutorials, code examples, and exercises for various programming languages.
🤝 Contributing#
We welcome your contributions!
See CONTRIBUTING.md for how to add resources.
📜 License#
This work is dedicated to the public domain under the CC0 1.0 Universal license.