Xiao-Li Meng
@XiaoLiMeng1
Seeking simplicity in statistics, complexity in wine, and everything else in fortune cookies.
The soup analogy is excellent, but it requires three assumptions: (1) the soup is well mixed in the pot, (2) nothing is altered in the spoonful, and (3) the taster knows how to judge and report. (2) & (3) are especially critical when human review is involved. scientificamerican.com/article/the-se… via @sciam
I saw @szhang_ds tweet out an excellent analogy for this. You only need to sample a spoonful to know if the soup is good. Doesn’t matter what size the pot is.
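The spoonful-versus-pot claim can be checked with the textbook formula for the standard error of a sample mean under simple random sampling without replacement, SE = sqrt((1 − n/N) · S²/n), where S is the population standard deviation. The Python sketch below is illustrative only: the function name `srs_standard_error` and the numbers are hypothetical, and the formula presumes a genuinely random draw, i.e., assumption (1) above that the soup is well mixed. It does nothing to check assumptions (2) and (3).

```python
import math


def srs_standard_error(pop_sd: float, n: int, N: int) -> float:
    """Standard error of the sample mean under simple random sampling
    without replacement (hypothetical helper, for illustration only).

    Assumes the sample is a truly random draw from the population
    ("well mixed" soup); the pot size N enters only through the
    finite population correction (1 - n/N).
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    fpc = 1 - n / N  # finite population correction
    return math.sqrt(fpc * pop_sd**2 / n)


# Same spoonful (n = 1,000), pots of very different sizes.
for N in (10_000, 1_000_000, 1_000_000_000):
    print(f"N = {N:>13,}  SE = {srs_standard_error(1.0, 1_000, N):.5f}")
```

With a spoonful of n = 1,000 and S = 1, the standard error stays between about 0.030 and 0.032 whether the pot holds ten thousand or a billion units, because 1 − n/N is close to 1 whenever the sample is a small fraction of the population.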
New podcast! For this month’s episode, we talk to @colbyhall & @LelandVittert for an in-depth look at how the media uses #data to report & analyze U.S. presidential races past & present. Listen! hdsr.mitpress.mit.edu/podcast #2024PresidentialRace #Election2024 #ElectionPolls #voting
Or above their heads: teaching design without having carried out one is like teaching recipes without ever cooking a dish. I learned a lot by teaching sample surveys, but I failed miserably when I got involved in an alumni survey; I couldn’t even get a pilot study going …
Unfortunately, you are not alone. Providing timely data and examples requires instructors to stay current with practice and to keep investing time and energy in pedagogy, neither of which is suitably incentivized at many universities. Perhaps demand and funding from industry can help?
My personal experience in grad school when I took these courses is that they were too old, in the sense that the data and examples my professors used were way too outdated. If they could use modern data and examples to deliver the core ideas in these topics, I would be very excited.
Yes, your points are well taken. Well-designed and well-implemented data collection involves a lot of “behavioral statistics”, which is a lot messier to formulate, investigate, and write articles about. Same for preprocessing, another critical area we statisticians stay away from :-(
Fully agree, and they were not exciting or challenging because those of us who taught them were not excited or challenged ourselves. We taught them out of textbooks, not out of our experience or headaches. But things are changing; e.g., I know you are excited & challenged! :-)
The major issue in bringing these courses back is making them both really challenging and exciting. The reason they were taken out was that the courses often were not able to show the challenges of applying them to practical use cases. @XiaoLiMeng1 maybe you can teach one😅
The best line for fundraising, from @TimRitchieMOS: “People give to what they value when they are asked by somebody they trust.” And of course a bit of luck also helps. So good luck! :-)
For our latest issue @XiaoLiMeng1 interviewed @TimRitchieMOS of @museumofscience for an insightful discussion that touched on many topics pertaining to the role of #museums in society, particularly in the context of #DataScience #AI, & #fundraising hdsr.mitpress.mit.edu/pub/9wczyq5o
From the keynote speech by Nancy Potok at AI Day for Federal Statistics: CNSTAT Public Event, NASEM. The webinar on May 21-22 is free to all. Registration info is in the post. Check out the special issue on Democratizing Data on the HDSR site. Thanks!

Thank YOU @francescadomin8 and @IavorBojinov for all your effort and innovation! I gather by “muse”, you meant “amused”, i.e., by how I had managed to mistype “causal conversations” as “casual conversations”, causing some last-minute frantic changes before launching :-)
Just launched with @IavorBojinov a new column Causal inference for everyone @TheHDSR @harvard_data. We thank our "muse" and editor-in-chief @XiaoLiMeng1 hdsr.mitpress.mit.edu/pub/laxlndnv/r…
My first presentation inside a barrel, and hopefully it’s not the last one. How did I do? Well, it would depend on what’s in your glass or bottle …
I was often told that the common wisdom is that it’s unwise to launch an issue on Fridays. But I hope collectively we can prove that “wisdom of crowd” is not a part of HSI, and that this issue on HI, AI, and HSI will provide provocative and relaxing reading for the weekend. Thanks!
Our fall issue (5.4) has launched!🎉Read it now starting with the editorial, "Human Intelligence, Artificial Intelligence, & Homo Sapiens Intelligence?" by Editor-in-Chief @XiaoLiMeng1 @harvard_data @francescadomin8 @mitpress #AI #ChatGPT #DataScience hdsr.mitpress.mit.edu/pub/nw5bilq7
GPT-n is certainly a disruptive technology in many senses. It is therefore fitting that this special issue also kicks off the open call for submissions to HDSR, which has been by invitation only (upon successful proposal screening). Please help get the word out, and submit!
Announcing an open call for our special issue on #GenerativeAI! Edited by @FranBerman @rherbrich & David Leslie ( @turinginst ) Submit today! Click here for details: assets.pubpub.org/tbjq9yfk/Futur…
Great to be back in the loop, before the n in GPT-n becomes too large to catch up. :-) Writing a thematized editorial for HDSR is my quarterly intense intellectual exercise. I’m grateful to authors, reviewers, and editors for providing all the heavy equipment. More, please!
Our latest issue is out! Read it now starting with the editorial, "Data Science & Engineering With Human in the Loop, Behind the Loop, & Above the Loop" by Founding Editor-in-Chief @XiaoLiMeng1 #HITL #DataScience @harvard_data @mitpress hdsr.mitpress.mit.edu/pub/812vijgg
While we are at it, check out the podcast on the DP issue, and the special issue editors @RuobinGong, @EricaGroshen, and Salil Vadhan's great editorial hdsr.mitpress.mit.edu/pub/fgyf5cne Thanks again Robin, Erica, and Salil! Thanks much @Rmcleodb for putting everything together!
Do you want to know the Whats, Whys & the Hows of the U.S. #Census? Check out the latest #HDSRPodcast episode and listen to our conversation with guests @EricaGroshen & @RuobinGong #DifferentialPrivacy @uscensusbureau hdsr.podbean.com/e/differential…
Thank you for not making this seminar private! :-) Data privacy is truly a topic for everyone. Abstract for those interested (see you 8:30am EST Dec 15!) dropbox.com/s/usdmpl5s81ja… Please also check out the special issue on DP for 2020 US census in HDSR: hdsr.mitpress.mit.edu/specialissue2
Join us virtually tomorrow at 14.30 CET for the international roundtable on CSS “Privacy, Data Privacy, and Differential Privacy” with @XiaoLiMeng1 from Harvard University! More info on our website: liu.se/en/article/sem…
Fully agree, since no new values are created by sampling. But subsampling can enhance DP. A lot more to be investigated; there are interesting/tough questions to answer, e.g., under what conditions is it the right tradeoff to prioritize outliers over most states in a distribution?
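A commonly cited way to make “subsampling can enhance DP” concrete is the amplification-by-subsampling bound: if a mechanism is ε-differentially private and it is run on a Poisson subsample that keeps each record independently with probability q, the combined procedure is ε′-DP with ε′ = ln(1 + q(e^ε − 1)). The Python sketch below assumes pure ε-DP, Poisson subsampling, and add/remove-one neighboring datasets; the function name `amplified_epsilon` is hypothetical, and the bound is a standard result from the DP literature rather than something stated in the tweet.

```python
import math


def amplified_epsilon(epsilon: float, q: float) -> float:
    """Privacy-amplification-by-subsampling bound for pure epsilon-DP
    (hypothetical helper, for illustration only).

    Assumes each record is included independently with probability q
    (Poisson subsampling) and that neighboring datasets differ by
    adding or removing one record.
    """
    if epsilon < 0 or not 0 <= q <= 1:
        raise ValueError("need epsilon >= 0 and 0 <= q <= 1")
    # log1p/expm1 keep the computation accurate for small q or epsilon.
    return math.log1p(q * math.expm1(epsilon))


for q in (1.0, 0.1, 0.01):
    print(f"q = {q:<5} epsilon' = {amplified_epsilon(1.0, q):.4f}")
```

At q = 1 the bound returns ε unchanged; at ε = 1 and q = 0.1 it gives about 0.16. The privacy gain is paid for with a smaller, noisier sample, which is one face of the tradeoff raised above.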
Thank you! My wonderful coauthor Keli Liu put together this figure. He was an undergraduate student when we wrote that article. He’s so original that when I recommended a Ph.D. student, I had to write that he was my “best Ph.D. student” in order to avoid a comparison with Keli. :-)
One of my favorite figures by @XiaoLiMeng1. Never get tired of discussing the relevance-robustness trade-off!
Thanks much for the invitation! As for TBA, perhaps TBW (to be written) or TBR (to be researched) would be more accurate :-). In any case, here is the missing title: “Privacy, Data Privacy, and Differential Privacy”. Wish me luck in having it ready by 12/15 8am! :-)
The fall program for the IAS Seminar Series is now out, and we welcome everyone to join our meetings! Among the speakers: @joshua_a_becker @joakimjansson @giulia_ndr @CBicchieri @JennieBrand1 @UzziLeadership @kenbenoit @XiaoLiMeng1 and many others!
Many thanks also to Nick Lindsay @bluenoser2 at @mitpress for bringing this topic to HDSR and for making the initial connections! Properly sharing and managing data is critical for a healthy evolution of the data science ecosystem. Grateful to all authors, editors, and reviewers.
Many thanks to @memartone, @SciTechProf, & Richard Nakamura for editing our special collection of 10 articles on the new NIH #data sharing policy! Read their intro: #DataManagement #DataSharing hdsr.mitpress.mit.edu/pub/yb0lddel
Thanks! What’s needed here is data minding and data confession rss.onlinelibrary.wiley.com/doi/abs/10.111…, not so much data mining or AI. A sampling rate of 0.00005% could be OK. The million- or billion-dollar question is whether there was strict quality control of the sampling & human review processes ...