Synopsis
Kirill Eremenko is a Data Science coach and lifestyle entrepreneur. The goal of the Super Data Science podcast is to bring you the most inspiring Data Scientists and Analysts from around the world to help you build a successful career in Data Science. Data is growing exponentially and so are the salaries of those who work in analytics. This podcast can help you learn how to skyrocket your analytics career. It covers Big Data, visualization, predictive modeling, forecasting, analysis, business processes, statistics, R, Python, SQL programming, Tableau, machine learning, Hadoop, databases, data science MBAs, and all the analytics tools and skills that will help you better understand how to crush it in Data Science.
Episodes
-
595: Data Engineering 101
26/07/2022 Duration: 01h19min
Tune in as Joe Reis and Matt Housley, co-founders of Ternary Data and co-authors of the book “Fundamentals of Data Engineering”, join Jon Krohn to discuss major undercurrents across the data engineering lifecycle, and their top tools and techniques. In this episode you will learn: • What is data engineering? [3:55] • Why Joe and Matt identify as “recovering data scientists” [6:12] • What kinds of people tend to become data scientists vs. data engineers [10:38] • Key components of Joe and Matt’s book [26:31] • Major undercurrents across the data engineering lifecycle [28:26] • The most under-utilized tool in a data engineer's toolbox [34:39] • How there are tradeoffs in any data pipeline latency considerations, but faster is typically the default assumption [38:55] • Joe and Matt’s favorite data engineering tools and techniques [43:39] Additional materials: www.superdatascience.com/595
-
594: Why CEOs Care About A.I. More than Other Technologies
22/07/2022 Duration: 05min
This week, Jon Krohn and A.I. industry veteran Ben Taylor discuss the driving factors that push CEOs to prioritize A.I. over other technologies. Additional materials: www.superdatascience.com/594
-
593: The Real-World Impact of Cross-Disciplinary Data Science Collaboration
19/07/2022 Duration: 01h21min
Jon welcomes Professor Philip Bourne, Founding Dean of the School of Data Science at the University of Virginia, to discuss his biomedical data science research, the importance of open-source and open-access within the industry, and the data science skills you need to succeed today. In this episode you will learn: • Why Philip founded a School of Data Science [6:08] • How computing and data science have evolved across academic departments [15:55] • The improvements needed in higher education [26:44] • The most important data science skills for academia and industry and the 4+1 model [36:49] • Philip’s biomedical data science research and its fascinating practical applications [43:24] • The essential roles of open-source code and open-access publishing in data science [1:01:27] Additional materials: www.superdatascience.com/593
-
592: How to Sell a Multimillion Dollar A.I. Contract
15/07/2022 Duration: 03min
In this episode, Jon Krohn welcomes A.I. industry veteran Ben Taylor to discuss how to sell multimillion dollar A.I. contracts. Tune in to hear why trust and proof of value are some of the critical steps in his sales process. Additional materials: www.superdatascience.com/592
-
591: Simulations and Synthetic Data for Machine Learning
12/07/2022 Duration: 01h14min
Mars Buttfield-Addison, PhD Candidate at the University of Tasmania, joins Jon Krohn for a high-energy episode covering everything from Machine Learning simulations to Swift, space junk, and more! In this episode you will learn: • What simulations and synthetic data are, and why they can be invaluable for real-life applications [5:47] • How simulated bots can solve any problem [9:07] • Practical uses of simulated data [21:49] • Why the mobile operating system language Swift is interesting for A.I. [25:46] • Why it's critical to track the amount of junk in space [35:47] • Whether programming or statistical skills are more important in data science [47:05] • What it’s like creating video games in a "secret" games lab [56:45] • Why you might want to do a data science internship in industry before pursuing academia [1:01:54] Additional materials: www.superdatascience.com/591
-
590: Artificial General Intelligence is Not Nigh (Part 2 of 2)
08/07/2022 Duration: 05min
In this episode, Jon continues his two-part series on artificial general intelligence (AGI) and why we are unlikely to realize it anytime soon. Listen in as Jon reviews Meta's Yann LeCun's seven-part perspective on the topic. Additional materials: www.superdatascience.com/590
-
589: Narrative A.I. with Hilary Mason
05/07/2022 Duration: 56min
Hilary Mason, Co-Founder and CEO of Hidden Door, joins Jon Krohn for a live discussion that explores narrative A.I., emerging ML techniques, and how her OSEMN data science process developed. In this episode you will learn: • How narrative A.I. can assist creativity [5:14] • How to build ML products that have no quantitative error function to optimize [10:31] • How to ensure creative A.I. systems do not output nonsense or explicit content [16:58] • Hilary's OSEMN data science process [21:05] • The emerging ML technique she’s most excited about [24:58] • What it takes to be successful as CEO of an early-stage A.I. company [27:20] • What she looks for in engineering hires [32:28] • How she’s hopeful A.I. will transform our lives for the better in the decades to come [38:48] Additional materials: www.superdatascience.com/589
-
588: Artificial General Intelligence is Not Nigh
01/07/2022 Duration: 05min
In this episode, Jon kicks off a two-part series that sees him explore the popular topic of artificial general intelligence and why it might–or might not–be only a few years away. Listen in as Jon explains the several reasons why he doesn't believe that AGI is nigh. Additional materials: www.superdatascience.com/588
-
587: Data Engineering for Data Scientists
28/06/2022 Duration: 01h25min
Mark Freeman, Senior Data Scientist at Humu, joins Jon Krohn to talk about all things data engineering and offers listeners some critical tips for their data science career journey, from what it takes to get promoted to his number one tip for getting hired at a fast-growing venture capital-backed startup. In this episode you will learn: • How Humu leverages data and machine learning to improve workplace behaviors [10:38] • What is data engineering? [14:21] • What it takes to get promoted into more senior data science roles [20:55] • The differences between junior, senior, and staff data scientists [30:21] • Mark’s top tools for data extraction, modeling, and pipeline engineering [37:08] • Mark’s number one tip for getting hired at a fast-growing venture capital-backed startup [53:10] • Why all data scientists should be interested in Web3 [1:11:53] Additional materials: www.superdatascience.com/587
-
586: Daily Habit #10: Limit Social Media Use
24/06/2022 Duration: 04min
In this episode, Jon dives into the popular topic of social media and its impact on his productivity. Tune in to hear how minimizing the use of social media can positively impact your days, mental health and work. Additional materials: www.superdatascience.com/586
-
585: PyMC for Bayesian Statistics in Python
21/06/2022 Duration: 01h26min
In this episode, Dr. Thomas Wiecki, Core Developer of the PyMC Library and CEO of PyMC Labs, joins Jon for a masterclass in Bayesian statistics. Tune in to hear about PyMC, and discover why Bayesian statistics can be more powerful and interpretable than any other data modeling approach. In this episode you will learn: • What Bayesian statistics is [7:30] • Why Bayesian statistics can be more powerful and interpretable than any other data modeling approach [17:20] • How PyMC was developed [20:41] • Commercial applications of Bayesian stats [43:07] • How to build a successful company culture [1:03:14] • What Thomas looks for when hiring [1:11:13] • Thomas’s top resources for learning Bayesian stats yourself [1:13:57] Additional materials: www.superdatascience.com/585
-
584: OpenAI Codex
17/06/2022 Duration: 04min
In this episode, Jon reviews the remarkable natural language model Codex by OpenAI. Learn why it has amassed a waitlist and how you can leverage its practical applications in your work. Additional materials: www.superdatascience.com/584
-
583: The State of Natural Language Processing
14/06/2022 Duration: 01h14min
In this episode, natural language processing (NLP) expert and Lead Data Scientist at CB Insights, Rongyao Huang, joins Jon Krohn to discuss NLP. Listen in for a thorough review of the field over the past decade and how the coming iron age of NLP will help us overcome the limitations of today's approaches. In this episode you will learn: • The evolution of NLP techniques over the past decade [4:14] • What's next in the coming iron age of NLP [35:33] • Rongyao’s Bauhaus-inspired model for effective data science [43:12] • Rongyao's long-term career pathfinding framework [51:50] • Rongyao’s top tips for staying sane while juggling career and family [1:00:30] Additional materials: www.superdatascience.com/583
-
582: Model Speed vs Model Accuracy
10/06/2022 Duration: 03min
In this episode, Jon wraps up his three-part series on business value and machine learning. Listen in as he explains why starting with simple models is best, and why speed is likely more important to your users than accuracy. Additional materials: www.superdatascience.com/582
-
581: Bayesian, Frequentist, and Fiducial Statistics in Data Science
07/06/2022 Duration: 01h24min
In this episode, Prof. Xiao-Li Meng, founding Editor-in-Chief of the Harvard Data Science Review and Professor of Statistics at Harvard University, joins Jon Krohn to dive into the trade-offs that abound in data and to share his view on the paradoxical downside of having lots of data. In this episode you will learn: • What the Harvard Data Science Review is and why Xiao-Li founded it [5:31] • The difference between data science and statistics [17:56] • The concept of 'data minding' [22:27] • The concept of 'data confession' [30:31] • Why there’s no “free lunch” with data, and the tricky trade-offs that abound [35:20] • The surprising paradoxical downside of having lots of data [43:23] • What the Bayesian, Frequentist, and Fiducial schools of statistics are, and when each of them is most useful in data science [55:47] Additional materials: www.superdatascience.com/581
-
580: Collecting Valuable Data
03/06/2022 Duration: 05min
In this episode, Jon resumes his series on strategies for getting business value from machine learning. Part one saw him review several ways to identify a commercial problem before starting data collection or ML model development. And now, in part two, Jon digs into the data collection process. Additional materials: www.superdatascience.com/580
-
579: Transforming Dentistry with A.I.
31/05/2022 Duration: 47min
In this episode, the CEO of Overjet, Dr. Wardah Inam, joins Jon Krohn to discuss the classification and quantification of dental diagnoses with computer vision, her data labeling challenges, and tips for building a successful A.I. business. In this episode you will learn: • How Overjet leverages computer vision to qualify and quantify dental diagnoses [5:11] • How A.I. solutions reduce the under-diagnosis of common diseases like periodontal disease [8:15] • Overjet's particular ML challenges within the dental industry [15:45] • Wardah's experience in introducing A.I. to the dental industry [20:12] • Wardah's tips for building a successful A.I. business [23:34] • What she looks for in the data scientists and software engineers she hires [39:36] Additional materials: www.superdatascience.com/579
-
578: Identifying Commercial ML Problems
27/05/2022 Duration: 03min
In this episode, Jon kicks off a new Five-Minute Friday series that explores the strategies for getting business value from machine learning. Part one sees him review several ways to identify a commercial problem before starting data collection or ML model development. Additional materials: www.superdatascience.com/578
-
577: Scaling A.I. Startups Globally
24/05/2022 Duration: 55min
In this episode, Husayn Kassai, former CEO and co-founder of Onfido, an AI-based ID verification company, joins Jon Krohn to discuss his path to start-up success. In this episode you will learn: • How Husayn's start-up journey began [5:55] • How Husayn determined that his challenge could be solved by machine vision [11:18] • Onfido's initial seed stages [18:23] • Launching and scaling your start-up in the U.S. market [22:00] • The most important component in building the best product [26:30] • Husayn's latest start-up [28:52] • Husayn’s startup project decision-making process [37:49] • Choosing your co-founding team [44:04] Additional materials: www.superdatascience.com/577
-
576: Tech Startup Dramas
20/05/2022 Duration: 03min
Hollywood has officially fallen for the drama of tech startups! Tune in to hear Jon Krohn review the small-screen adaptations of WeWork (WeCrashed), Uber (Super Pumped), and Theranos (The Dropout). Additional materials: www.superdatascience.com/576