A Collection of Fun Databases For Programming Exploration

April 5, 2023 10:25

Longtime Slashdot reader Esther Schindler writes: When you learn a new tool/technology, you need to create a sample application, which cannot use real in-house data. Why not use something fun for the sample application's data, such as a Star Wars API or a data collection about World Cup contests? Esther Schindler, Slashdot user #16185, assembled a groovy collection of datasets that may be useful but also may be a source of fascinating internet rabbit holes. For those interested in datasets, Esther also recommends the Data is Plural newsletter and the website ResearchBuzz, which shares dataset descriptions as well as archive-related news and tools. "Google Research maintains a search site for test datasets, too, if you know what you're looking for," adds Esther. There's also, of course, Kaggle.com.
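Here is a minimal sketch of the idea. It assumes Python's built-in `sqlite3` module and a tiny hand-typed table, seeds a throwaway practice database with a few real World Cup finals (year, host, winner), and runs one query against it. The table name `finals` and the helper functions are hypothetical. A real exploration project would load a full dataset from one of the sources above instead.

```python
import sqlite3

# A handful of real FIFA World Cup finals, used only as illustrative
# seed data for a practice app (deliberately not a complete dataset).
FINALS = [
    (1994, "United States", "Brazil"),
    (1998, "France", "France"),
    (2002, "South Korea/Japan", "Brazil"),
    (2006, "Germany", "Italy"),
    (2010, "South Africa", "Spain"),
    (2014, "Brazil", "Germany"),
    (2018, "Russia", "France"),
    (2022, "Qatar", "Argentina"),
]


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory database seeded with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE finals (year INTEGER PRIMARY KEY, host TEXT, winner TEXT)"
    )
    conn.executemany("INSERT INTO finals VALUES (?, ?, ?)", FINALS)
    return conn


def repeat_winners(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return winners with more than one title in the sample, most titles first."""
    return conn.execute(
        """
        SELECT winner, COUNT(*) AS titles
        FROM finals
        GROUP BY winner
        HAVING COUNT(*) > 1
        ORDER BY titles DESC, winner
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    print(repeat_winners(conn))  # [('Brazil', 2), ('France', 2)]
```

Because the data lives in an in-memory database, every run starts clean. That suits a throwaway learning project where nothing should touch real in-house data.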

Read more of this story at Slashdot.

Free Data-Center Heat Is Allegedly Saving a Struggling Public Pool $24K a Year

Slashdot

Author: BeauHD

March 18, 2023 07:40

An anonymous reader quotes a report from Ars Technica: A public pool in the UK is expected to save [about $24,000] and cut carbon emissions by 25.8 tons annually by warming a 25-meter children's pool with waste heat from a data center from startup Deep Green. UK-based Deep Green is a newcomer in the data-center heat game and is making its entrance notable by putting a monetary figure on potential savings, which are fueled by the heat's low, low rate of free. Deep Green's paying customers are machine-learning and AI firms seeking computing resources. As reported by Datacenter Dynamics on Tuesday, clients can leverage Deep Green's 28 kW system with high-performance computing (HPC) capabilities. The HPC cluster at the Exmouth Leisure Centre swimming pool has 12 four-CPU cards and could eventually be used for cloud services and video rendering, Deep Green CEO Mark Bjornsgaard told the publication. According to the BBC, the server is about the size of a washing machine. The computers are submerged in mineral oil that captures their heat, which is then transferred into the pool water with a heat exchanger. The pool still has a gas boiler to boost the water's temperature if required. Deep Green claims it's transferring about 96 percent of the energy used by its computers and reducing a pool's gas heat usage by 62 percent. Deep Green is paying the Exmouth Leisure Centre for all the electricity its data center uses, as well as any setup costs, and the Exmouth Leisure Centre gets the heat for free. Deep Green CTO Mat Craggs told Datacenter Dynamics: "Our expected heat transfer from the kit is 139,284 kWh a year, equivalent to 62 percent of the pool's heat needs." He noted that adding more servers to the tub could extend the figure to 70 or 80 percent. Deep Green's data center can heat the Exmouth Leisure Centre's 25-meter pool to 86 degrees Fahrenheit for about 60 percent of the time, BBC reported.
The startup has plans to set up data centers in seven more UK locations and has a 2023 target of 20 locations.
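Craggs's two figures can be checked with a little arithmetic. Assuming the 62 percent refers to the pool's total annual heat demand, 139,284 kWh / 0.62 implies a demand of roughly 224,652 kWh a year. Reaching the 70 or 80 percent he mentions would then mean supplying roughly 157,256 or 179,721 kWh a year. The sketch below is illustrative only. The derived numbers come from this arithmetic, not from Deep Green, and the constant and function names are hypothetical.

```python
# The two inputs are the figures Deep Green's CTO gave Datacenter Dynamics;
# everything derived below is plain arithmetic on them, not a reported value.
HEAT_FROM_KIT_KWH = 139_284  # expected annual heat transfer from the servers
SHARE_OF_POOL_NEEDS = 0.62   # stated share of the pool's heat needs


def implied_annual_demand(heat_kwh: float, share: float) -> float:
    """Total annual demand implied by 'heat_kwh covers `share` of it'."""
    return heat_kwh / share


def heat_needed_for_share(total_kwh: float, share: float) -> float:
    """Annual heat the servers would have to supply to cover `share` of demand."""
    return total_kwh * share


if __name__ == "__main__":
    total = implied_annual_demand(HEAT_FROM_KIT_KWH, SHARE_OF_POOL_NEEDS)
    print(f"implied demand: {total:,.0f} kWh/yr")  # implied demand: 224,652 kWh/yr
    for target in (0.70, 0.80):
        need = heat_needed_for_share(total, target)
        print(f"{target:.0%} coverage: {need:,.0f} kWh/yr")
```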

Baserow Challenges Airtable With an Open Source No-Code Database Platform

Slashdot

Author: BeauHD

July 7, 2022 07:02

An anonymous reader quotes a report from TechCrunch: The burgeoning low-code and no-code movement is showing little sign of waning, with numerous startups continuing to raise sizable sums to help the less-technical workforce develop and deploy software with ease. Arguably one of the most notable examples of this trend is Airtable, a 10-year-old business that recently attained a whopping $11 billion valuation for a no-code platform used by firms such as Netflix and Shopify to create relational databases. In tandem, we're also seeing a rise in "open source alternatives" to some of the big-name technology incumbents, from Google's backend-as-a-service platform Firebase to open source scheduling infrastructure that seeks to supplant the mighty Calendly. A young Dutch company called Baserow sits at the intersection of both these trends, pitching itself as an open source Airtable alternative that helps people build databases with minimal technical prowess. Today, Baserow announced that it has raised $5.2 million in seed funding to launch a suite of new premium and enterprise products in the coming months, transforming the platform from its current database-focused foundation into a "complete, open source no-code toolchain," co-founder and CEO Bram Wiepjes told TechCrunch. So what, exactly, does Baserow do in its current guise? Well, anyone with even the most rudimentary spreadsheet skills can use Baserow for use cases spanning content marketing, such as managing brand assets collaboratively across teams; managing and organizing events; helping HR teams or startups manage and track applicants for a new role; and countless more, for which Baserow provides pre-built templates. [...]
Baserow's open source credentials are arguably its core selling point, with the promise of greater extensibility and customizations (users can create their own plug-ins to enhance its functionality, similar to how WordPress works) -- this is a particularly alluring proposition for businesses with very specific or niche use cases that aren't well supported by an off-the-shelf SaaS solution. On top of that, some sectors require full control of their data and technology stack for security or compliance purposes. This is where open source really comes into its own, given that businesses can host the product themselves and circumvent vendor lock-in. With a fresh 5 million euros in the bank, Baserow is planning to double down on its commercial efforts, starting with a premium incarnation that's officially launching out of an early access program later this month. This offering will be available as a SaaS and self-hosted product and will include various features such as the ability to export in different formats; user management tools for admin; Kanban view; and more. An additional "advanced" product will also be made available purely for SaaS customers and will include a higher data storage limit and service level agreements (SLAs). Although Baserow has operated under the radar somewhat since its official foundation in Amsterdam last year, it claims to have 10,000 active users, 100 sponsors who donate to the project via GitHub and 800 users already on the waiting list for its premium version. Later this year, Baserow plans to introduce a paid enterprise version for self-hosting customers, with support for specific requirements such as audit logs, single sign-on (SSO), role-based access control and more.

SQLite or PostgreSQL? It's Complicated!

Slashdot

Author: msmash

July 5, 2022 07:00

Miguel Grinberg, a Principal Software Engineer for Technical Content at Twilio, writes in a blog post: We take blogging very seriously at Twilio. To help us understand what content works well and what doesn't on our blog, we have a dashboard that combines the metadata that we maintain for each article such as author, team, product, publication date, etc., with traffic information from Google Analytics. Users can interactively request charts and tables while filtering and grouping the data in many different ways. I chose SQLite for the database that supports this dashboard, which, in early 2021 when I built this system, seemed like a perfect choice for what I thought would be a small, niche application that my teammates and I could use to improve our blogging. But almost a year and a half later, this application tracks daily traffic for close to 8,000 articles across the Twilio and SendGrid blogs, with about 6.5 million individual daily traffic records, and with a user base that grew to over 200 employees. At some point I realized that some queries were taking a few seconds to produce results, so I started to wonder if a more robust database such as PostgreSQL would provide better performance. Having publicly professed my dislike of performance benchmarks, I resisted the urge to look up any comparisons online, and instead embarked on a series of experiments to accurately measure the performance of these two databases for the specific use cases of this application. What follows is a detailed account of my effort, the results of my testing (including a surprising twist!), and my analysis and final decision, which ended up being more involved than I expected. [...] If you are going to take one thing away from this article, I hope it is that the only benchmarks that are valuable are those that run on your own platform, with your own stack, with your own data, and with your own software. And even then, you may need to add custom optimizations to get the best performance.
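Grinberg's advice to benchmark on your own platform and data can be sketched with nothing but Python's standard library. This is not his code. It builds a hypothetical `daily_traffic` table filled with synthetic rows (a stand-in for a copy of your real data) and times a representative aggregate query several times with `time.perf_counter`, reporting the median. It then re-times the query after adding an index, as one example of a custom optimization. The table, columns, and query are all assumptions for illustration.

```python
import random
import sqlite3
import statistics
import time


def make_demo_db(n_articles: int = 500, n_days: int = 90) -> sqlite3.Connection:
    """Build a hypothetical traffic table; swap in a copy of your real data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_traffic (article_id INTEGER, day INTEGER, views INTEGER)"
    )
    rng = random.Random(42)  # fixed seed so every run sees the same data
    rows = (
        (article, day, rng.randint(0, 1000))
        for article in range(n_articles)
        for day in range(n_days)
    )
    conn.executemany("INSERT INTO daily_traffic VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def time_query(conn: sqlite3.Connection, sql: str, params: tuple = (), repeats: int = 5):
    """Run `sql` `repeats` times; return (median seconds, rows from the last run)."""
    timings = []
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), rows


# Hypothetical "top articles over recent days" query, similar in spirit to a
# dashboard aggregate.
TOP_ARTICLES_SQL = """
    SELECT article_id, SUM(views) AS total
    FROM daily_traffic
    WHERE day >= ?
    GROUP BY article_id
    ORDER BY total DESC
    LIMIT 10
"""

if __name__ == "__main__":
    conn = make_demo_db()
    before, top = time_query(conn, TOP_ARTICLES_SQL, (60,))
    # One example of a custom optimization: index the filtered column, then re-time.
    conn.execute("CREATE INDEX idx_traffic_day ON daily_traffic(day)")
    after, _ = time_query(conn, TOP_ARTICLES_SQL, (60,))
    print(f"top article: {top[0]}")
    print(f"median before index: {before * 1000:.2f} ms, after: {after * 1000:.2f} ms")
```

To compare against PostgreSQL, you would run the same queries on the same data through that database's driver. Parameter placeholders differ between drivers (`?` for `sqlite3`, `%s` for psycopg), so the SQL text may need small adjustments. Taking the median of several runs damps one-off noise, and whether the index helps at all is exactly the kind of result that depends on your own data.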

MongoDB 6.0 Brings Encrypted Queries, Time-Series Data Collection

Slashdot

Author: msmash

June 8, 2022 01:41

The developers behind the open source MongoDB, and its commercial service counterpart MongoDB Atlas, have been busy making the document database easier to use for developers. From a report: Available in preview, Queryable Encryption provides the ability to query encrypted data, with the entire query transaction encrypted -- an industry first according to MongoDB. This feature will be of interest to organizations with a lot of sensitive data, such as banks, health care institutions and the government. This eliminates the need for developers to be experts in encryption, Davidson said. This end-to-end client-side encryption uses novel encrypted index data structures; the data being searched remains encrypted at all times on the database server, including in memory and in the CPU. The keys never leave the application, and the company maintains that neither query speed nor overall application performance is impacted by the new feature. MongoDB is also now supporting time-series data, which are important for monitoring physical systems, quick-moving financial data, or other temporally oriented datasets. In MongoDB 6.0, time-series collections can have secondary indexes on measurements, and the database system has been optimized to sort time-based data more quickly. Although there are a number of databases geared specifically towards time-series data, such as InfluxDB, many organizations may not want to stand up an entire database system for this specific use, since a separate system costs more in terms of support and expertise, Davidson argued. Another feature is Cluster-to-Cluster Synchronization, which provides continuous data synchronization of MongoDB clusters across environments. It works with Atlas, in private cloud, on-premises, or on the edge. This sets the stage for using data in multiple places for testing, analytics, and backup.

Google Cloud Launches AlloyDB, a New Fully-Managed PostgreSQL Database Service

Slashdot

Author: BeauHD

May 12, 2022 22:00

An anonymous reader quotes a report from TechCrunch: Google today announced the launch of AlloyDB, a new fully managed PostgreSQL-compatible database service that the company claims to be twice as fast for transactional workloads as AWS's comparable Aurora PostgreSQL (and four times faster than standard PostgreSQL for the same workloads and up to 100 times faster for analytical queries). [...] AlloyDB is the standard PostgreSQL database at its core, though the team did modify the kernel to allow it to use Google's infrastructure to its fullest, all while allowing the team to stay up to date with new versions as they launch. Andi Gutmans, who joined Google as its GM and VP of Engineering for its database products in 2020 after a long stint at AWS, told me that one of the reasons the company is launching this new product is that while Google has done well in helping enterprise customers move their MySQL and PostgreSQL servers to the cloud with the help of services like Cloud SQL, the company didn't necessarily have the right offerings for those customers who wanted to move their legacy databases (Gutmans didn't explicitly say so, but I think you can safely insert 'Oracle' here) to an open-source service. "There are different reasons for that," he told me. "First, they are actually using more than one cloud provider, so they want to have the flexibility to run everywhere. There are a lot of unfriendly licensing gimmicks, traditionally. Customers really, really hate that and, I would say, whereas probably two to three years ago, customers were just complaining about it, what I notice now is customers are really willing to invest resources to just get off these legacy databases. They are sick of being strapped and locked in."
Add to that Postgres' rise to becoming somewhat of a de facto standard for relational open-source databases (and MySQL's decline) and it becomes clear why Google decided that it wanted to be able to offer a dedicated high-performance PostgreSQL service. The report also says Google spent a lot of effort on making Postgres perform better for customers that want to use their relational database for analytics use cases. "The changes the team made to the Postgres kernel, for example, now allow it to scale the system linearly to over 64 virtual cores while on the analytical side, the team built a custom machine learning-based caching service to learn a customer's access patterns and then convert Postgres' row format into an in-memory columnar format that can be analyzed significantly faster."
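The row-to-columnar idea in the AlloyDB report can be illustrated with a toy example. This is not how AlloyDB implements it; the report describes only a machine-learning-based cache that learns access patterns and converts data. The sketch stores the same hypothetical `orders` records two ways: row-wise as tuples and column-wise as one typed `array` per field. An aggregate over `amount_cents` then touches only that column. In a real engine the benefit comes from reading less data and scanning contiguous values efficiently. This Python sketch shows the layout, not the speedup.

```python
from array import array

# Hypothetical rows: (order_id, customer_id, amount_cents)
ROWS = [
    (1, 10, 2_500),
    (2, 11, 1_200),
    (3, 10, 7_300),
    (4, 12, 400),
]


def to_columnar(rows):
    """Pivot row tuples into one contiguous typed array per column.

    Assumes at least one row (zip(*rows) needs something to unpack).
    """
    order_ids, customer_ids, amounts = zip(*rows)
    return {
        "order_id": array("q", order_ids),
        "customer_id": array("q", customer_ids),
        "amount_cents": array("q", amounts),
    }


def total_amount_rowwise(rows) -> int:
    # Every full row is visited even though only one field is needed.
    return sum(row[2] for row in rows)


def total_amount_columnar(columns) -> int:
    # Only the single column involved in the aggregate is scanned.
    return sum(columns["amount_cents"])


if __name__ == "__main__":
    cols = to_columnar(ROWS)
    print(total_amount_rowwise(ROWS), total_amount_columnar(cols))  # 11400 11400
```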

Breach of Washington State Database May Expose Personal Info of Millions

Slashdot

Author: BeauHD

February 8, 2022 07:02

An anonymous reader quotes a report from The Associated Press: The Washington State Department of Licensing said the personal information of potentially millions of licensed professionals may have been exposed after it detected suspicious activity on its online licensing system. The agency licenses about 40 categories of businesses and professionals, from auctioneers to real estate agents, and it shut down its online platform temporarily after learning of the activity in January, agency spokesperson Christine Anthony said Friday. Data stored on the system, which is called POLARIS, could include Social Security numbers, birth dates and driver's licenses. The agency doesn't yet know whether such data was actually accessed or how many individuals may have been affected, Anthony said. Anthony said the agency has been working with the state Office of Cybersecurity, the state Attorney General's Office and a third-party cybersecurity firm to understand the scope of the incident, The Seattle Times reported Friday. In the meantime, the shutdown of the POLARIS system is causing problems for some professionals and firms that need to apply for, renew or modify their licensing. The size of the breach remains unclear. Data from 23 professions and business types licensed by the state is processed via POLARIS, Anthony said. Within those 23 categories, which also include bail bonds agents, funeral directors, home inspectors and notaries, the agency has around 257,000 active licenses in its system, Anthony said, adding that "there are likely more records that may be identified while conducting our investigation."

The Case Against SQL

Slashdot

Author: EditorDavid

July 19, 2021 05:34

Long-time Slashdot reader RoccamOccam shares "an interesting take on SQL and its issues from Jamie Brandon (who describes himself as an independent researcher who's built database engines, query planners, compilers, developer tools and interfaces)." Its title? "Against SQL." The relational model is great... But SQL is the only widely-used implementation of the relational model, and it is: Inexpressive, Incompressible, Non-porous. This isn't just a matter of some constant programmer overhead, like SQL queries taking 20% longer to write. The fact that these issues exist in our dominant model for accessing data has dramatic downstream effects for the entire industry:
- Complexity is a massive drag on quality and innovation in runtime and tooling
- The need for an application layer with hand-written coordination between database and client renders useless most of the best features of relational databases
The core message that I want people to take away is that there is potentially a huge amount of value to be unlocked by replacing SQL, and more generally in rethinking where and how we draw the lines between databases, query languages and programming languages... I'd like to finish with this quote from Michael Stonebraker, one of the most prominent figures in the history of relational databases: "My biggest complaint about System R is that the team never stopped to clean up SQL... All the annoying features of the language have endured to this day. SQL will be the COBOL of 2020..." It's been interesting to follow the discussion on Twitter, where the post's author tweeted screenshots of actual SQL code to illustrate various shortcomings. But he also notes that "The SQL spec (part 2 = 1732 pages) is more than twice the length of the JavaScript 2021 spec (879 pages), almost matches the C++ 2020 spec (1853 pages) and contains 411 occurrences of 'implementation-defined', occurrences which include type inference and error propagation."
His Twitter feed also includes a supportive retweet from Rust creator Graydon Hoare, and from a Tetrane developer who says "The Rust of SQL remains to be invented. I would like to see it come."

LexisNexis To Provide Giant Database of Personal Information To ICE

Slashdot

Author: BeauHD

April 6, 2021 09:02

An anonymous reader quotes a report from The Intercept: The popular legal research and data brokerage firm LexisNexis signed a $16.8 million contract to sell information to U.S. Immigration and Customs Enforcement, according to documents shared with The Intercept. The deal is already drawing fire from critics and comes less than two years after the company downplayed its ties to ICE, claiming it was "not working with them to build data infrastructure to assist their efforts." Though LexisNexis is perhaps best known for its role as a powerful scholarly and legal research tool, the company also caters to the immensely lucrative "risk" industry, providing, it says, 10,000 different data points on hundreds of millions of people to companies like financial institutions and insurance companies who want to, say, flag individuals with a history of fraud. LexisNexis Risk Solutions is also marketed to law enforcement agencies, offering "advanced analytics to generate quality investigative leads, produce actionable intelligence and drive informed decisions" -- in other words, to find and arrest people. The LexisNexis ICE deal appears to be providing a replacement for CLEAR, a risk industry service operated by Thomson Reuters that has been crucial to ICE's deportation efforts. In February, the Washington Post noted that the CLEAR contract was expiring and that it was "unclear whether the Biden administration will renew the deal or award a new contract." LexisNexis's February 25 ICE contract was shared with The Intercept by Mijente, a Latinx advocacy organization that has criticized links between ICE and tech companies it says are profiting from human rights abuses, including LexisNexis and Thomson Reuters. 
The contract shows LexisNexis will provide Homeland Security investigators access to billions of different records containing personal data aggregated from a wide array of public and private sources, including credit history, bankruptcy records, license plate images, and cellular subscriber information. The company will also provide analytical tools that can help police connect these vast stores of data to the right person. In a statement to The Intercept, a LexisNexis Risk Solutions spokesperson said: "Our tool contains data primarily from public government records. The principal non-public data is authorized by Congress for such uses in the Drivers Privacy Protection Act and Gramm-Leach-Bliley Act statutes." They declined to say exactly what categories of data the company would provide ICE under the new contract, or what policies, if any, will govern how the agency uses it.

SEGA Lawyers Demand 'Immediate Suspension' of Steam Database Over Alleged Piracy

Slashdot

Author: BeauHD

March 31, 2021 07:40

An anonymous reader quotes a report from TorrentFreak: The popular and entirely legal Steam Database has found itself in a precarious position following two erroneous DMCA notices from SEGA. Steam Database's host is being asked to suspend the platform due to a claimed lack of response to the first notice. This prompted the site to take down entirely legal content in an effort to address the problem. [...] TorrentFreak was able to review the notice sent by SEGA to SteamDB's host and it pulls no punches. SEGA doubles down by stating that SteamDB is illegally distributing the game Yakuza: Like a Dragon, noting that it has tried to inform SteamDB but was "not able" to resolve the issue. Worryingly, it then implies that legal action might be taken against SteamDB for non-compliance, adding that the host should "immediately suspend" SteamDB due to the alleged ongoing infringement. Which, of course, is not taking place. This puts SteamDB's host in a tough position. Failure to act against an allegedly infringing customer can put the host at risk in terms of liability but disabling a customer's website can cause a whole new set of problems, especially when that customer has not infringed anyone's rights. In an effort to sort the problem out, SteamDB's host asked for additional input from the operators of SteamDB but nevertheless warned that if that information was not received, it may still block the SteamDB server within 24 hours, as demanded in the SEGA takedown notice. In order to defuse the situation, SteamDB took down the allegedly infringing page, which, as far as SEGA goes (and at least in theory), should solve the disconnection threat problem. However, the entire situation has proven counterproductive for SEGA too.

Tinder Users Will Soon Be Able To Access a Background Check Database

Slashdot

Author: BeauHD

March 16, 2021 06:25

Tinder and Match have announced a new partnership with Garbo, a non-profit, female-founded background check platform. In theory, it should allow Tinder (and Match Group's other sites) to ping Garbo's database and proactively show users when it finds something they might want to be aware of. Engadget reports: If you're not familiar with Garbo, it was founded by Kathryn Kosmides, a "survivor of gender-based violence" who wanted to make it easier to find information about people you may connect with online. Garbo's platform aggregates numerous data sources to provide details on an individual, including "arrests, convictions, restraining orders, harassment, and other violent crimes." The organization's site says that oftentimes, you don't even need a last name to find some details on an individual -- a first name and phone number will work. As part of the deal, Garbo's platform will be available to people using Match Group apps, starting with Tinder later this year. [...] Garbo cites making ridesharing services safer as another core initiative for the non-profit in addition to working with dating services, so it wouldn't surprise us to see a similar partnership appear between Garbo and companies like Uber or Lyft -- but for now, it's starting with Tinder.

Uber and Lyft Create a Shared Database of Drivers Banned For Assault

Slashdot

Author: BeauHD

March 12, 2021 19:00

Uber and Lyft will work together to share information on US drivers and delivery people accused of physical and sexual assault to ensure those individuals are banned on both platforms, the two companies announced on Thursday in separate blog posts. Engadget reports: HireRight, a company that specializes in conducting background checks, will oversee the Industry Sharing Safety Program database. Other transportation and delivery companies in the US will have the chance to contribute to and access the database as long as they adhere to the same data accuracy and privacy policies that Uber and Lyft must follow. "We want to share this information with each other and hopefully in the near future with other companies, so that our peers in this space can be informed and make decisions for their own platforms to keep those platforms safe," Jennifer Brandenburger, Lyft's head of policy development, told NBC News. The database won't include information on victims. Additionally, the incident that landed a driver in the database will fall into broad categories.

Hackers Are Selling More Than 85,000 MySQL Databases On a Dark Web Portal

Slashdot

Author: EditorDavid

December 13, 2020 07:34

An anonymous Slashdot reader writes: For the past year, hackers have been breaking into MySQL databases, downloading tables, deleting the originals, and leaving ransom notes behind, telling server owners to contact the attackers to get their data back. If database owners don't respond and ransom their data back within nine days, the databases are then put up on auction on a dark web portal. "More than 85,000 MySQL databases are currently on sale on a dark web portal for a price of only $550/database," reports ZDNet: This suggests that both the DB intrusions and the ransom/auction web pages are automated and that attackers don't analyze the hacked databases for data that could contain a higher concentration of personal or financial information. Signs of these ransom attacks have been piling up over the course of 2020, with complaints from server owners who found the ransom note inside their databases popping up on Reddit, the MySQL forums, tech support forums, Medium posts, and private blogs.
