
Generating Synthetic Descriptive Data in PySpark | by Matt Collins | Jan, 2024

Use various data source types to quickly generate text data for artificial datasets. Image generated with DALL-E 3. In a previous article, we explored creating many-to-one relationships between columns in a synthetic PySpark DataFrame. This DataFrame consisted only of foreign key information, and we didn’t produce any textual information that might be useful in a demo…


How Artificial Intelligence Might be Worsening the Reproducibility Crisis in Science and Technology | by LucianoSphere (Luciano Abriata, PhD) | Jan, 2024

Discussion backed up by some concrete examples, sketching broad guidelines on how to develop better AI systems. Photo by National Cancer Institute on Unsplash. Artificial Intelligence has become an integral tool in scientific research, but concerns are growing that the misuse of these powerful tools is leading to a reproducibility crisis in science and its technological…


Python “Tuple+”: Named Tuples. Tuples are a powerful Python type — but… | by Marcin Kozak | Jan, 2024

PYTHON PROGRAMMING. Tuples are a powerful Python type, but named tuples are even more so! Named tuples join the strengths of names and tuples. Photo by Ainur Iman on Unsplash. The three most popular Python data types are the list, the dictionary, and the tuple. Lists and dictionaries are mutable, meaning that their elements can be…


Geospatial Indexing Explained: A Comparison of Geohash, S2, and H3 | by Ben Feifke | Jan, 2024

Geospatial indexing, or Geocoding, is the process of indexing latitude-longitude pairs to small subdivisions of geographical space, and it is a technique that we data scientists often find ourselves using when faced with geospatial data. Though the first popular geospatial indexing technique, “Geohash”, was invented as recently as 2008, indexing latitude-longitude pairs to manageable subdivisions…
