Superdatascience

  • Author: Various
  • Narrator: Various
  • Publisher: Podcast
  • Duration: 694:35:06

Information:

Synopsis

Kirill Eremenko is a Data Science coach and lifestyle entrepreneur. The goal of the Super Data Science podcast is to bring you the most inspiring Data Scientists and Analysts from around the world to help you build your successful career in Data Science. Data is growing exponentially and so are salaries of those who work in analytics. This podcast can help you learn how to skyrocket your analytics career. Big Data, visualization, predictive modeling, forecasting, analysis, business processes, statistics, R, Python, SQL programming, Tableau, machine learning, Hadoop, databases, data science MBAs, and all the analytics tools and skills that will help you better understand how to crush it in Data Science.

Episodes

  • 681: XGBoost: The Ultimate Classifier, with Matt Harrison

    23/05/2023 Duration: 01h12min

    Unlock the power of XGBoost by learning how to fine-tune its hyperparameters and discover its optimal modeling situations. This and more, when best-selling author and leading Python consultant Matt Harrison teams up with Jon Krohn for yet another jam-packed technical episode! Are you ready to upgrade your data science toolkit in just one hour? Tune in now! This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Matt's book ‘Effective XGBoost’ [07:05] • What is XGBoost [09:09] • XGBoost's key model hyperparameters [19:01] • XGBoost's secret sauce [29:57] • When to use XGBoost [34:45] • When not to use XGBoost [41:42] • Matt’s recommended Python libraries [47:36] • Matt's production tips [57:57] Additional materials: www.superdatascience.

  • 680: Automating Industrial Machines with Data Science and the Internet of Things (IoT)

    19/05/2023 Duration: 30min

    Industrial machinery’s dependence on data science, tech stacks to build IoT platforms, and transitioning from data science to product: This week’s Friday episode with Allegra Alessi explores the minutiae of product ownership for the Internet of Things at packaging company Bobst. Join host Jon Krohn and his guest as they unpack how the IoT is leading factory production. Additional materials: www.superdatascience.com/680 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 679: The A.I. and Machine Learning Landscape, with investor George Mathew

    16/05/2023 Duration: 01h34min

    Generative AI, MLOps, and making smart investments in AI: This week’s episode is critical listening for AI investors and generative AI creators. AI investor George Mathew talks with host Jon Krohn about the emerging generative AI stack, the critical elements of MLOps to ensure a scalable model, and the tools developers can use for a saleable product. This episode is brought to you by Posit, the open-source data science company, by AWS Inferentia, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Venture capital’s role in the technology startup ecosystem [05:59] • How RLHF helps UI become more intuitive [12:53] • The four layers of the generative AI stack [34:16] • The risks for generative AI business founders and investors [46:50] • How MLOps drive best practices and help implementation [56:33] • The importance of PLG (Product-Led Growth) [1:04:15] • How g

  • 678: StableLM: Open-source "ChatGPT"-like LLMs you can fit on one GPU

    12/05/2023 Duration: 11min

    StableLM, the new family of open-source language models from the brilliant minds behind Stable Diffusion, is out! Small but mighty, these models have been trained on an unprecedented amount of data for single-GPU LLMs. This week, Jon breaks down the mechanics of this model. See you there! Additional materials: www.superdatascience.com/678 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 677: Digital Analytics with Avinash Kaushik

    09/05/2023 Duration: 01h27min

    How does one use marketing analytics to drive business success? Avinash Kaushik, Chief Strategy Officer at Croud and former Sr. Director of Global Strategic Analytics at Google, joins Jon Krohn live for an exciting episode that covers the transformative power of AI, his 'four clusters of intent' framework, and the value of hands-on data tools. This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by Anaconda, the world's most popular Python distribution. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • What is a chief strategy officer? [3:55] • Brand vs performance analytics [7:23] • Incrementality-centric marketing [32:53] • Avinash's time at Google [37:54] • How to maintain a human touch with AI [48:58] • Four clusters of intent framework [1:11:28] • Avinash's most significant career challenges [1:17:18] Additional materials: ww

  • 676: The Chinchilla Scaling Laws

    05/05/2023 Duration: 13min

    Chinchilla AI, and fine-tuning proprietary tasks with large language models: On this week’s Five-Minute Friday, host Jon Krohn outlines the principles of the Chinchilla Scaling Laws, the incredible power of models such as Cerebras-GPT based on these laws, and the impact of scaling on the number of viable applications and commercial use cases. Additional materials: www.superdatascience.com/676 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 675: Pandas for Data Analysis and Visualization

    02/05/2023 Duration: 01h08min

    Wrangling data in Pandas, when to use Pandas, Matplotlib or Seaborn, and why you should learn to create Python packages: Jon Krohn speaks with guest Stefanie Molin, author of Hands-On Data Analysis with Pandas. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • The advantages of using pandas over other libraries [07:55] • Why data wrangling in pandas is so helpful [12:05] • Stefanie’s Data Morph library [24:27] • When to use pandas, matplotlib, or seaborn [33:45] • Understanding the ticker module in matplotlib [36:48] • Where data analysts should start their learning journey [40:08] • What it’s like being a software engineer at Bloomberg [51:19] Additional materials: www.superdatascience.com/675

  • 674: Parameter-Efficient Fine-Tuning of LLMs using LoRA (Low-Rank Adaptation)

    28/04/2023 Duration: 05min

    Models like Alpaca, Vicuña, GPT4All-J and Dolly 2.0 have relatively small model architectures, but they're prohibitively expensive to train even on a small amount of your own data. The standard model-training protocol can also lead to catastrophic forgetting. In this week's episode, Jon explores a solution to these problems, introducing listeners to Parameter-Efficient Fine-Tuning (PEFT) and the leading approach: Low-Rank Adaptation (LoRA). Additional materials: www.superdatascience.com/674 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 673: Taipy, the open-source Python application builder

    25/04/2023 Duration: 01h12min

    Vincent Gosselin, CEO and co-founder of Taipy, an open-source Python library, joins Jon Krohn to discuss how to accelerate productivity in Python and build scalable, reusable, and maintainable data pipelines. Gosselin shares his breadth of wisdom honed over his decades-long AI career. This episode is brought to you by Pathway, the reactive data processing framework, and by Posit, the open-source data science company. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • The Taipy library functionality [2:59] • The future of data pipelines [21:40] • Common trends of companies that are successful at adopting data pipelines [28:31] • How no-code and low-code trends impact the data science lifecycle [33:00] • How Vincent chose the programming languages that underpin Taipy [41:40] • Common trends on how companies manage their data to learn from it [45:06] • Vincent's perspective on AI winters [51:03] Additional materials: www

  • 672: Open-source "ChatGPT": Alpaca, Vicuña, GPT4All-J, and Dolly 2.0

    21/04/2023 Duration: 16min

    Get started with language models: Learn about the commercial-use options available for your business in this week’s Five-Minute Friday, where host Jon Krohn discusses four models that have many of the capabilities of ChatGPT and can run at a fraction of the cost. Additional materials: www.superdatascience.com/672 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 671: Cloud Machine Learning

    18/04/2023 Duration: 01h03min

    Get to grips with AWS, Azure, and Google Cloud Platform on this week’s episode. Host Jon Krohn speaks with Kirill Eremenko and Hadelin de Ponteves about CloudWolf, a cloud computing educational platform that prepares students for certification in AWS (Amazon Web Services). Find out why an accreditation in cloud computing could be the safest investment for your data science career. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • About CloudWolf [07:04] • Why learning the cloud is important for data scientists [09:12] • Is learning cloud computing complex? [22:30] • Essential AWS services [28:31] • Database options on AWS [33:47] • How to run analytics on AWS [40:58] • Why an AWS certification is so helpful [56:35] Additional materials: www.superdatascience.com/671

  • 670: LLaMA: GPT-3 performance, 10x smaller

    14/04/2023 Duration: 13min

    How does Meta AI's natural language model, LLaMA, compare to the rest? Based on the Chinchilla scaling laws, LLaMA is designed to be smaller but more performant. But how exactly does it achieve this feat? It's all done by training a small model for a longer period of time. Discover how LLaMA compares to its competition, including GPT-3, in this week's episode. Additional materials: www.superdatascience.com/670 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 669: Streaming, reactive, real-time machine learning

    11/04/2023 Duration: 01h40min

    In this episode, Jon Krohn welcomes Adrian Kosowski, Co-Founder and Chief Product Officer at Pathway, who shares insights on streaming data processing and reactive data processing, and how they're shaping the future of machine learning. Tune in now for an unforgettable episode. This episode is brought to you by Posit, the open-source data science company, and by AWS Inferentia. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • About Pathway's reactive data processing framework [04:45] • Reactive data processing use cases [17:08] • What is the difference between batch and streaming processing [33:18] • Transformers in data engineering and data streaming [53:44] • The benefits of Adrian's technical background as a CPO [1:04:17] • Adrian's responsibilities and favorite tools as a CPO [1:15:25] • Emerging ML approaches and tools for startups [1:28:49] Additional materials: www.superdatascience.com/669

  • 668: GPT-4: Apocalyptic stepping stone?

    07/04/2023 Duration: 55min

    AI risks, RLHF, and inner alignment: GPT stands to give the business world a major boost. But with everyone racing either to develop products that incorporate GPT or use it to carry out critical tasks, what dangers could lie ahead in working with a tool that applies essentially unknowable means (inner alignments) to reach its goals? This week’s guest Jérémie Harris speaks with Jon Krohn about the essential need for anyone working with GPT to understand the impact of a system comprising inner alignments that cannot – and may never – be fully understood. Additional materials: www.superdatascience.com/668 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 667: Harnessing GPT-4 for your Commercial Advantage

    04/04/2023 Duration: 01h04min

    GPT-4, augmenting human tasks with AI, and using GPT-4 commercially: Vin Vashishta speaks to host Jon Krohn about how to leverage GPT-4 and outperform your competitors in both speed and value. Learn how GPT-4 has outmatched its predecessors – and many skilled workers – in this latest iteration of large language models. This episode is brought to you by Pathway, the reactive data processing framework, by Posit, the open-source data science company, and by epic LinkedIn Learning instructor Keith McCormick. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Using GPT-4 to screen for jobs [06:26] • A framework for improving systems with GPT [13:32] • Teaming, tooling and collaborating with GPT-4 [29:58] • How to accelerate data science with generative A.I. [45:36] • How to prepare for opportunities with GPT-4 [52:09] Additional materials: www.superdatascience.com/667

  • 666: GPT-4

    31/03/2023 Duration: 11min

    GPT-4 has landed! But how well does it compare to GPT-3.5? Tune in to hear Jon stack its performance against its predecessor. The results might just blow your mind. Additional materials: www.superdatascience.com/666 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 665: How to be both socially impactful and financially successful in your data career

    28/03/2023 Duration: 01h27min

    Angel investor and data science consultant Josh Wills sits down with Jon Krohn to discuss his former roles (Google, Slack, and Cloudera) and the essential skills for engineering scalable machine learning projects. This episode is brought to you by Pathway, the reactive data processing framework, and by epic LinkedIn Learning instructor Keith McCormick. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Josh's 'Data Engineering for Machine Learning' course [06:50] • Contextual bandits [10:52] • Data quality and monitoring [16:45] • The “infinite loop of sadness” in data product development [25:12] • Josh’s definition of a data scientist [30:02] • Josh's role at WeaveGrid [37:36] • Management-Track vs Independent Contributor [48:47] • Josh's work on the Covid pandemic [1:06:46] • Josh’s favorite tech stack [1:11:13] Additional materials: www.superdatascience.com/665

  • 664: MIT Study: ChatGPT Dramatically Increases Productivity

    24/03/2023 Duration: 05min

    Can ChatGPT make us better and faster in our work, and is it the future or just another fad? In this episode, Jon Krohn delves into a new study from MIT about the tool’s potential productivity for white-collar tasks. Additional materials: www.superdatascience.com/664 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.

  • 663: Astonishing CICERO negotiates and builds trust with humans using natural language

    21/03/2023 Duration: 01h17min

    NLP, transformer architectures, and machines beating humans at their own game: Jon Krohn talks to Alexander H. Miller about his work in building a machine that can outsmart humans in the game of Diplomacy by engineering powers of persuasion and collusion to its own advantage. This episode is brought to you by epic LinkedIn Learning instructor Keith McCormick (linkedin.com/learning/instructors/keith-mccormick). Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information. In this episode you will learn: • Training a natural language model to interact with Diplomacy players [05:07] • Processing speeds for a Diplomacy bot [29:32] • Using transformer architectures [37:25] • How Diplomacy AI actually works [43:25] • CICERO's potential real-world applications [55:28] • How to R&D an AI project [59:27] • How to become an AI Research Manager [1:06:12] Additional materials: www.superdatascience.com/663

  • 662: The Most Popular SuperDataScience Podcast Episodes of 2022

    17/03/2023 Duration: 07min

    Our list of the top 10 SuperDataScience podcast episodes for 2022 is here. From Pandas to causality, AI breakthroughs and data storytelling, these were your most popular episodes of the year gone by. Additional materials: www.superdatascience.com/662 Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
