Synopsis
Kirill Eremenko is a Data Science coach and lifestyle entrepreneur. The goal of the Super Data Science podcast is to bring you the most inspiring Data Scientists and Analysts from around the World to help you build your successful career in Data Science. Data is growing exponentially and so are salaries of those who work in analytics. This podcast can help you learn how to skyrocket your analytics career. Topics include Big Data, visualization, predictive modeling, forecasting, analysis, business processes, statistics, R, Python, SQL programming, Tableau, machine learning, Hadoop, databases, data science MBAs, and all the analytics tools and skills that will help you better understand how to crush it in Data Science.
Episodes
-
797: Deep Learning Classics and Trends, with Dr. Rosanne Liu
02/07/2024 Duration: 01h09min Dr. Rosanne Liu, Research Scientist at Google DeepMind and co-founder of the ML Collective, shares her journey and the mission to democratize AI research. She explains her pioneering work on intrinsic dimensions in deep learning and the advantages of curiosity-driven research. Jon and Dr. Liu also explore the complexities of understanding powerful AI models, the specifics of character-aware text encoding, and the significant impact of diversity, equity, and inclusion in the ML community. With publications in NeurIPS, ICLR, ICML, and Science, Dr. Liu offers her expertise and vision for the future of machine learning. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: • How the ML Collective came about [03:31] • The concept of a failure CV [16:12] • ML Collective research topics [19:03] • How Dr. Liu's work on the “intrinsic dimension” of deep learning models inspired the now-standard LoRA approach to fin
-
796: Earth's Coming Population Collapse and How AI Can Help, with Simon Kuestenmacher
28/06/2024 Duration: 42min Want to feel optimistic about your day? In this Friday episode, Simon Kuestenmacher talks to Jon Krohn about demography: What it is, why it’s so important, and why its forecasts should give us reason to hope for a better future. In an increasingly globalized world, and with an aging population in countries with the biggest GDPs, demography is more valuable than ever. Additional materials: www.superdatascience.com/796 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
-
795: Fast-Evolving Data and AI Regulatory Frameworks, with Dr. Gina Guillaume-Joseph
25/06/2024 Duration: 01h07min Gina Guillaume-Joseph talks to Jon Krohn about the data and regulatory frameworks set to transform the AI industry and why that’s important to anyone working with data. This episode offers a solid path to understanding AI regulation’s past, present and future. Gina walks listeners through the AI Bill of Rights, the NIST AI Risk Framework and the MITRE ATLAS threat model. This episode is brought to you by AWS Inferentia and AWS Trainium, by Crawlbase, the ultimate data crawling platform, and by Babbel, the science-backed language-learning platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: • What “responsible AI” means [08:14] • Why the federal government should be behind AI regulation [12:22] • The US vs EU on AI regulation [18:46] • About the AI Bill of Rights [26:14] • About MITRE and the MITRE ATLAS [37:19] • What a systems engineer does [54:11] Additional materials: www.superdatascience.co
-
794: Exciting (and Frightening!) Trends in Open-Source AI
21/06/2024 Duration: 11min Trends in open-source AI: Join Jon Krohn and a panel of data science icons as they discuss the most exciting and concerning developments in open-source AI. Hear insights from Drew Conway, Jared Lander, Emily Zabor, and JD Long on the transformative potential of AI and its future impact. Additional materials: www.superdatascience.com/794 Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information.
-
793: Bayesian Methods and Applications, with Alexandre Andorra
18/06/2024 Duration: 01h33min Bayesian methods take the spotlight in this episode with Alex Andorra, co-founder of PyMC Labs, and Jon Krohn. Learn how Bayesian techniques handle tough problems, make the most of prior knowledge, and work wonders with limited data. Alex and Jon break down essentials like PyMC, PyStan, and NumPyro libraries, show how to boost model efficiency with PyTensor, and talk about using ArviZ for top-notch diagnostics and visualizations. Plus, get into advanced modeling with Gaussian Processes. This episode is brought to you by Crawlbase, the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Email natalie@superdatascience.com for sponsorship information. In this episode you will learn: • Practical introduction to Bayesian statistics [04:54] • Definition and significance of epistemology [17:52] • Explanation of PyMC and Monte Carlo methods [27:57] • How to get started with Bayesian modeling and PyMC [34:26] • PyMC Labs and its consulting services [50:50] • ArviZ for post-m
-
792: In Case You Missed It in May 2024
14/06/2024 Duration: 22min Jon Krohn shares his favorite clips from May. Hear how Navdeep Martin is spearheading a company to tackle the climate crisis, why Sol Rashidi and Demetrios Brinkmann find nailing job titles so necessary in the fast-paced industries of tech and AI, and get the latest on embeddings with Luis Serrano. Additional materials: www.superdatascience.com/792 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
791: Reinforcement Learning from Human Feedback (RLHF), with Dr. Nathan Lambert
11/06/2024 Duration: 57min Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to Jon Krohn about the technique’s origins. He also walks through other ways to fine-tune LLMs, and how he believes generative AI might democratize education. This episode is brought to you by AWS Inferentia (go.aws/3zWS0au) and AWS Trainium (go.aws/3ycV6K0), and Crawlbase (crawlbase.com), the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Why it is important that AI is open [03:13] • The efficacy and scalability of direct preference optimization [07:32] • Robotics and LLMs [14:32] • The challenges to aligning reward models with human preferences [23:00] • How to make sure AI’s decision making on preferences reflects desirable behavior [28:52] • Why Nathan believes AI is closer to alchemy than science [37:38] Additi
-
790: Open-Source Libraries for Data Science at the New York R Conference
07/06/2024 Duration: 07min The experts share their top open-source R libraries with us live from the New York R Conference! This Super Data Science Podcast episode features an exclusive panel with data science trailblazers Drew Conway, Jared Lander, Emily Zabor, and JD Long. They share their favorite R libraries and valuable insights to enhance your data science practice. Additional materials: www.superdatascience.com/790 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
789: ML for Wind-Powered Energy Generation, with Dr. Jason Yosinski
04/06/2024 Duration: 01h14min Machine Learning for Wind Energy is front and center in this episode as Jon Krohn is joined by Dr. Jason Yosinski, CEO of Windscape AI. Dr. Yosinski brings to light the latest ML advancements sparking significant changes in renewable energy. Tune in for a comprehensive review of these cutting-edge technologies and their expansive impact on the industry and the environment's well-being. This episode is brought to you by Crawlbase, the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Enhancing predictability in wind energy with ML [04:52] • Data utilization from wind turbines by energy providers [11:41] • Jason's journey into wind energy [17:55] • Landing the right startup idea [22:47] • Visualizing neural networks with the Deep Vis Toolbox [31:29] • Extreme event forecasting at Uber vs. nowcasting at Windscape AI [45:13] • Discoveries from Loss Change Allocation r
-
788: Multi-Agent Systems: How Teams of LLMs Excel at Complex Tasks
31/05/2024 Duration: 10min Multi-agent systems could mark a significant turning point in generative AI. From mastering increasingly complex tasks to getting LLMs to collaborate, in this Five-Minute Friday, Jon Krohn discusses the systems that are working to bridge the remaining gaps left by the latest large language models (LLMs). Additional materials: www.superdatascience.com/788 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
787: MLOps: The Job and The Key Tools, with Demetrios Brinkmann
28/05/2024 Duration: 56min MLOps, how to build an online community, and tools for scaling LLMs: In this episode, Demetrios Brinkmann speaks to Jon Krohn about the similarities and differences between LLMOps, MLOps and DevOps, and why this should matter to companies looking to hire such engineers. You will also hear how to get involved in the MLOps community wherever you are in the world, and how you can start developing great products with the available tools. This episode is brought to you by AWS Inferentia (go.aws/3zWS0au) and AWS Trainium (go.aws/3ycV6K0). Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • What MLOps is [03:51] • About LLMOps [12:06] • About LlamaIndex and Ollama [18:29] • Insights from Demetrios’ MLOps survey [20:49] • Guidance for using third-party APIs [40:18] • Recommendations for building an online community in tech and AI [47:07] Additional materials: www.superdatascience.com/787
-
786: The Six Keys to Data Scientists' Success, with Kirill Eremenko
24/05/2024 Duration: 27min Learn about the six keys to data science success as host Jon Krohn welcomes back Kirill Eremenko, the mastermind behind SuperDataScience. Kirill shares his top insights on data science careers, from building strong portfolios to leveraging mentors and hands-on labs. With over 2.7 million students, his advice is a must-hear for aspiring and experienced data scientists alike. Additional materials: www.superdatascience.com/786 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
785: Math, Quantum ML and Language Embeddings, with Dr. Luis Serrano
21/05/2024 Duration: 01h06min Dr. Luis Serrano from the Serrano Academy reveals how to make Math and Quantum ML accessible, tackles the challenges of teaching A.I. to beginners, and explores the power of embeddings in enterprise applications. Explore the future of Quantum Machine Learning and the latest trends in AI, including multimodality and autonomous systems. This episode is brought to you by AWS Inferentia and AWS Trainium. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • How math and AI can be made easy to understand [05:21] • The three major categories of learners [16:21] • Why embeddings are the most important component of LLMs [26:19] • How semantic search differs from a traditional keyword search [29:57] • The most exciting emerging application areas for AI [42:41] • The promising application areas for Quantum Machine Learning [49:18] Additional materials: www.superdatascience.com/785
-
784: Aligning Large Language Models, with Sinan Ozdemir
17/05/2024 Duration: 09min Aligning LLMs: How can we teach pre-trained LLMs to hold a conversation and learn new information from each other? This was where Sinan Ozdemir began his investigation into aligning LLMs. In this episode, he talks to Jon Krohn about the limitations of definitions for LLMs, training LLMs, and whether it is possible to train an LLM without alignment. Additional materials: www.superdatascience.com/784 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
783: Generative A.I. for Solar Power Installation, with Navdeep Martin
14/05/2024 Duration: 01h05min Recent advances in GenAI, how to tackle the climate crisis with advanced technology, and addressing the knowledge gap in understanding AI: Jon Krohn speaks to Flypower co-founder and CEO Navdeep Martin about the advances made in GenAI, from products to applications, and how we might use AI to tackle climate change. This episode is brought to you by AWS Inferentia and AWS Trainium. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • How the Washington Post’s recommendation systems work [03:29] • Why product leaders make great CEOs [10:36] • How Flypower uses GenAI to tackle climate change [22:13] • How Flypower identifies its customers’ most pertinent questions [30:03] • How AI might come to tackle climate change [36:52] • How to mitigate hallucination in AI models [41:04] Additional materials: www.superdatascience.com/783
-
782: In Case You Missed It in April 2024
10/05/2024 Duration: 40min Hear Jon Krohn’s five favorite clips from his April interviews. Chief Scientist at Posit PBC Hadley Wickham discusses the subtle differences between Python and R. Professor of Business Analytics Barrett Thomas walks through the variables that companies should consider when using drones or any other tech to improve their business operations and bottom line. Aleksa Gordić, Founder of Runa AI, believes an overhaul of the current educational system is long overdue. Bernard Marr discusses the future of GenAI and its impact on the world of work. And SuperDataScience founder Kirill Eremenko gives a lively workshop on gradient boosting. Additional materials: www.superdatascience.com/782 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
781: Ensuring Successful Enterprise AI Deployments, with Sol Rashidi
07/05/2024 Duration: 01h04min Sol Rashidi, a distinguished data executive who has served in C-suite roles at Fortune 100 companies, joins Jon Krohn to delve into successful enterprise AI strategies and the reasons behind the high turnover among Chief Data Officers. This episode provides an in-depth look at selecting AI projects that succeed and understanding the strategic value of patents in various industries. Benefit from Sol’s extensive experience and practical advice on navigating complex corporate challenges. This episode is brought to you by AWS Inferentia and AWS Trainium. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • Why CDOs and related roles have such high turnover [09:40] • The importance of building relationships in AI projects [17:01] • How Sol's book "The AI Survival Guide" came about [20:44] • How high-criticality, low-complexity AI projects are the ones with the highest probability of success [27:11] • How
-
780: How to Become a Data Scientist, with Dr. Adam Ross Nelson
03/05/2024 Duration: 08min Want to become a data scientist? Jon and Adam discuss the key steps to becoming a data scientist, with a focus on developing portfolio projects. Hear about the 10 project ideas Adam recommends in his book to help you stand out in the data science community. Additional materials: www.superdatascience.com/780 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
-
779: The Tidyverse of Essential R Libraries and their Python Analogues, with Dr. Hadley Wickham
30/04/2024 Duration: 01h27min Tidyverse, ggplot2, and the secret to a tech company’s longevity: Hadley Wickham talks to Jon Krohn about Posit’s rebrand, Tidyverse and why it needs to be in every data scientist’s toolkit, and why getting your hands dirty with open-source projects can be so lucrative for your career. This episode is brought to you by Intel and HPE Ezmeral Software. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information. In this episode you will learn: • All about the Tidyverse [04:46] • Hadley’s favorite R libraries [17:10] • The goal of Posit [30:29] • On bringing multiple programming languages together [36:02] • The principles for a long-lasting tech company [52:10] • How Hadley developed ggplot2 [55:24] • How to contribute to the open-source community [1:05:43] Additional materials: www.superdatascience.com/779
-
778: Mixtral 8x22B: SOTA Open-Source LLM Capabilities at a Fraction of the Compute
26/04/2024 Duration: 06min Mixtral 8x22B is the focus of this week's Five-Minute Friday. Jon Krohn examines how this model from French AI startup Mistral leverages its mixture-of-experts architecture to redefine efficiency and specialization in AI-powered tasks. Tune in to learn about its performance benchmarks and the transformative potential of its open-source license. Additional materials: www.superdatascience.com/778 Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.