Synopsis
Kirill Eremenko is a Data Science coach and lifestyle entrepreneur. The goal of the Super Data Science podcast is to bring you the most inspiring Data Scientists and Analysts from around the world to help you build a successful career in Data Science. Data is growing exponentially, and so are the salaries of those who work in analytics. This podcast can help you learn how to skyrocket your analytics career. It covers Big Data, visualization, predictive modeling, forecasting, analysis, business processes, statistics, R, Python, SQL programming, Tableau, machine learning, Hadoop, databases, data science MBAs, and all the analytics tools and skills that will help you better understand how to crush it in Data Science.
Episodes
-
677: Digital Analytics with Avinash Kaushik
09/05/2023 Duration: 01h27min
How does one use marketing analytics to drive business success? Avinash Kaushik, Chief Strategy Officer at Croud and former Sr. Director of Global Strategic Analytics at Google, joins Jon Krohn live for an episode that covers the transformative power of AI, his 'four clusters of intent' framework, and the value of hands-on data tools. This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• What is a chief strategy officer? [3:55]
• Brand vs performance analytics [7:23]
• Incrementality-centric marketing [32:53]
• Avinash's time at Google [37:54]
• How to maintain a human touch with AI [48:58]
• Four clusters of intent framework [1:11:28]
• Avinash's most significant career challenges [1:17:18]
Additional materials: ww
-
676: The Chinchilla Scaling Laws
05/05/2023 Duration: 13min
Chinchilla AI, and fine-tuning proprietary tasks with large language models: On this week’s Five-Minute Friday, host Jon Krohn outlines the principles of the Chinchilla Scaling Laws, the incredible power of models such as Cerebras-GPT based on these laws, and the impact of scaling on the number of viable applications and commercial use cases.
Additional materials: www.superdatascience.com/676
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
675: Pandas for Data Analysis and Visualization
02/05/2023 Duration: 01h08min
Wrangling data in Pandas, when to use Pandas, Matplotlib or Seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• The advantages of using pandas over other libraries [07:55]
• Why data wrangling in pandas is so helpful [12:05]
• Stefanie’s Data Morph library [24:27]
• When to use pandas, matplotlib, or seaborn [33:45]
• Understanding the ticker module in matplotlib [36:48]
• Where data analysts should start their learning journey [40:08]
• What it’s like being a software engineer at Bloomberg [51:19]
Additional materials: www.superdatascience.com/675
-
674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)
28/04/2023 Duration: 05min
Models like Alpaca, Vicuña, GPT4All-J and Dolly 2.0 have relatively small model architectures, but they're prohibitively expensive to train even on a small amount of your own data. The standard model-training protocol can also lead to catastrophic forgetting. In this week's episode, Jon explores a solution to these problems, introducing listeners to Parameter-Efficient Fine-Tuning (PEFT) and the leading approach: Low-Rank Adaptation (LoRA).
Additional materials: www.superdatascience.com/674
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
673: Taipy, the open-source Python application builder
25/04/2023 Duration: 01h12min
Vincent Gosselin, CEO and co-founder of Taipy, an open-source Python library, joins Jon Krohn to discuss how to accelerate productivity in Python and build scalable, reusable, and maintainable data pipelines. Gosselin shares his breadth of wisdom honed over his decades-long AI career. This episode is brought to you by Pathway, the reactive data processing framework, and by Posit, the open-source data science company. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• The Taipy library functionality [2:59]
• The future of data pipelines [21:40]
• Common trends of companies that are successful at adopting data pipelines [28:31]
• How no-code and low-code trends impact the data science lifecycle [33:00]
• How Vincent chose the programming languages that underpin Taipy [41:40]
• Common trends on how companies manage their data to learn from it [45:06]
• Vincent's perspective on AI winters [51:03]
Additional materials: www
-
672: Open-source "ChatGPT": Alpaca, Vicuña, GPT4All-J, and Dolly 2.0
21/04/2023 Duration: 16min
Get started with language models: Learn about the commercial-use options available for your business in this week’s Five-Minute Friday, where host Jon Krohn discusses four models that have many of the capabilities of ChatGPT and can run at a fraction of the cost.
Additional materials: www.superdatascience.com/672
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
671: Cloud Machine Learning
18/04/2023 Duration: 01h03min
Get to grips with AWS, Azure, and Google Cloud Platform on this week’s episode. Host Jon Krohn speaks with Kirill Eremenko and Hadelin de Ponteves about CloudWolf, a cloud computing educational platform that prepares students for certification in AWS (Amazon Web Services). Find out why an accreditation in cloud computing could be the safest investment for your data science career. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• About CloudWolf [07:04]
• Why learning the cloud is important for data scientists [09:12]
• Is learning cloud computing complex? [22:30]
• Essential AWS services [28:31]
• Database options on AWS [33:47]
• How to run analytics on AWS [40:58]
• Why an AWS certification is so helpful [56:35]
Additional materials: www.superdatascience.com/671
-
670: LLaMA: GPT-3 performance, 10x smaller
14/04/2023 Duration: 13min
How does Meta AI's natural language model, LLaMA, compare to the rest? Based on the Chinchilla scaling laws, LLaMA is designed to be smaller but more performant. But how exactly does it achieve this feat? It's all done by training a small model for a longer period of time. Discover how LLaMA compares to its competition, including GPT-3, in this week's episode.
Additional materials: www.superdatascience.com/670
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
669: Streaming, reactive, real-time machine learning
11/04/2023 Duration: 01h40min
In this episode, Jon Krohn welcomes Adrian Kosowski, Co-Founder and Chief Product Officer at Pathway, who shares insights on streaming data processing and reactive data processing, and how they're shaping the future of machine learning. Tune in now for an unforgettable episode. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• About Pathway's reactive data processing framework [04:45]
• Reactive data processing use cases [17:08]
• What is the difference between batch and streaming processing [33:18]
• Transformers in data engineering and data streaming [53:44]
• The benefits of Adrian's technical background as a CPO [1:04:17]
• Adrian's responsibilities and favorite tools as a CPO [1:15:25]
• Emerging ML approaches and tools for startups [1:28:49]
Additional materials: www.superdatascience.com/669
-
668: GPT-4: Apocalyptic stepping stone?
07/04/2023 Duration: 55min
AI risks, RLHF, and inner alignment: GPT stands to give the business world a major boost. But with everyone racing either to develop products that incorporate GPT or use it to carry out critical tasks, what dangers could lie ahead in working with a tool that applies essentially unknowable means (inner alignments) to reach its goals? This week’s guest Jérémie Harris speaks with Jon Krohn about the essential need for anyone working with GPT to understand the impact of a system comprising inner alignments that cannot – and may never – be fully understood.
Additional materials: www.superdatascience.com/668
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
667: Harnessing GPT-4 for your Commercial Advantage
04/04/2023 Duration: 01h04min
GPT-4, augmenting human tasks with AI, and using GPT-4 commercially: Vin Vashishta speaks to host Jon Krohn about how to leverage GPT-4 and outperform your competitors in both speed and value. Learn how GPT-4 has outmatched its predecessors – and many skilled workers – in this latest iteration of large language models. This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by epic LinkedIn Learning instructor Keith McCormick. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• Using GPT-4 to screen for jobs [06:26]
• A framework for improving systems with GPT [13:32]
• Teaming, tooling and collaborating with GPT-4 [29:58]
• How to accelerate data science with generative A.I. [45:36]
• How to prepare for opportunities with GPT-4 [52:09]
Additional materials: www.superdatascience.com/667
-
666: GPT-4
31/03/2023 Duration: 11min
GPT-4 has landed! But how well does it compare to GPT-3.5? Tune in to hear Jon stack its performance against its predecessor. The results might just blow your mind.
Additional materials: www.superdatascience.com/666
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
665: How to be both socially impactful and financially successful in your data career
28/03/2023 Duration: 01h27min
Angel investor and data science consultant Josh Wills sits down with Jon Krohn to discuss his former roles (Google, Slack, and Cloudera) and the essential skills for engineering scalable machine learning projects. This episode is brought to you by Pathway, the reactive data processing framework, and by epic LinkedIn Learning instructor Keith McCormick. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• Josh's 'Data Engineering for Machine Learning' course [06:50]
• Contextual bandits [10:52]
• Data quality and monitoring [16:45]
• The “infinite loop of sadness” in data product development [25:12]
• Josh’s definition of a data scientist [30:02]
• Josh's role at WeaveGrid [37:36]
• Management-Track vs Independent Contributor [48:47]
• Josh's work on the Covid pandemic [1:06:46]
• Josh’s favorite tech stack [1:11:13]
Additional materials: www.superdatascience.com/665
-
664: MIT Study: ChatGPT Dramatically Increases Productivity
24/03/2023 Duration: 05min
Can ChatGPT make us better and faster in our work, and is it the future or just another fad? In this episode, Jon Krohn delves into a new study from MIT about the tool’s potential productivity for white-collar tasks.
Additional materials: www.superdatascience.com/664
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
663: Astonishing CICERO negotiates and builds trust with humans using natural language
21/03/2023 Duration: 01h17min
NLP, transformer architectures, and machines beating humans at their own game: Jon Krohn talks to Alexander H. Miller about his work in building a machine that can outsmart humans in the game of Diplomacy by engineering powers of persuasion and collusion to its own advantage. This episode is brought to you by epic LinkedIn Learning instructor Keith McCormick (linkedin.com/learning/instructors/keith-mccormick). Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• Training a natural language model to interact with Diplomacy players [05:07]
• Processing speeds for a Diplomacy bot [29:32]
• Using transformer architectures [37:25]
• How Diplomacy AI actually works [43:25]
• CICERO's potential real-world applications [55:28]
• How to R&D an AI project [59:27]
• How to become an AI Research Manager [1:06:12]
Additional materials: www.superdatascience.com/663
-
662: The Most Popular SuperDataScience Podcast Episodes of 2022
17/03/2023 Duration: 07min
Our list of the top 10 SuperDataScience podcast episodes for 2022 is here. From Pandas to causality, AI breakthroughs and data storytelling, these were your most popular episodes of the year gone by.
Additional materials: www.superdatascience.com/662
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
661: Designing Machine Learning Systems
14/03/2023 Duration: 01h16min
Chip Huyen, co-founder of Claypot AI and author of O'Reilly's best-selling "Designing Machine Learning Systems", is here to share her expertise on designing production-ready machine learning applications, the importance of iteration in real-world deployment, and the critical role of real-time machine learning in various applications. Technical listeners like data scientists and machine learning engineers will definitely enjoy this one! This episode is brought to you by Pathway, the reactive data processing framework (pathway.com), and by epic LinkedIn Learning instructor Keith McCormick (linkedin.com/learning/instructors/keith-mccormick). Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• Why Chip wrote 'Designing Machine Learning Systems' [08:58]
• How Chip ended up teaching at Stanford [13:18]
• About Chip's book 'Designing Machine Learning Systems' [21:12]
• What makes ML feel like magic [30:53]
• How to align bus
-
660: Five Ways to Use ChatGPT for Data Science
10/03/2023 Duration: 03min
ChatGPT is well-known for its potential to disrupt the writing industry, but in what other, perhaps less explored, ways can we use the tool? In this episode, Jon Krohn outlines five critical ways that ChatGPT can augment a data scientist’s work. From generating code to acting as a translation tool for programming languages, listen in to hear why ChatGPT could become a vital part of every data scientist’s toolkit.
Additional materials: www.superdatascience.com/660
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
-
659: Open-Source Tools for Natural Language Processing
07/03/2023 Duration: 01h20min
NLP practitioners: this episode is for you. From the awareness of linguistic elements and annotation to getting the necessary people in the room, Vincent Warmerdam presents to Jon Krohn a recipe for a successful project and the open-source NLP tools to get there. This episode is brought to you by epic LinkedIn Learning instructor Keith McCormick (linkedin.com/learning/instructors/keith-mccormick). Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• How Vincent came to work with De Speld [08:57]
• Vincent’s role at Explosion [18:59]
• How users can apply spaCy [21:46]
• Prodigy: Annotate training data more efficiently with scripts [26:28]
• How to manage “skill anxiety” with Calmcode [32:32]
• How Vincent fixed bad labels [42:47]
• The value of understanding linguistics for NLP [54:42]
• How to constrain artificial stupidity [1:02:38]
Additional materials: www.superdatascience.com/659
-
658: How to Build Data and ML Products Users Love
03/03/2023 Duration: 35min
What makes data products popular? Brian T. O'Neill, Founder and Principal of Designing for Analytics, returns to the podcast to help us crack the code on building data products that people love.
Additional materials: www.superdatascience.com/658
Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.