11 July, 2025

Crypto Security: Lessons From Breaches and Regulation!

 



The crypto world is buzzing with activity, and with it comes some serious challenges regarding security and pesky regulations. With high-profile breaches like Bitcoin Depot's latest incident making headlines, trust in our beloved digital currency is at stake. And just when you thought things couldn't get more intense, New Zealand decides to ban crypto ATMs across the board. This move aims to tackle money laundering, but it also looks like it will complicate things for businesses that want to use crypto. Strap in, as we try to navigate the tricky waters of crypto security and regulation.

Cybersecurity Breaches: A Troubling Trend

Bitcoin Depot's data breach made major headlines. The breach happened back in June 2024 but was only disclosed recently because of an ongoing federal investigation. Oh, the irony, right? It compromised sensitive customer data, such as names, driver’s license numbers, and contact information. As if that weren't enough, 2025 has already seen its fair share of major hacks. Who could forget the $40 million GMX V1 exploit or the $1.4 billion Bybit hack? These incidents show that crypto isn't as safe as we assumed, and they underscore the need for strong security measures.

According to a Bitcoin Depot spokesperson, the company couldn't notify affected customers sooner because federal investigators asked it not to. Now, though, it is enhancing its security infrastructure with improved monitoring and increased internal data protection awareness. That seems like a solid start, but it raises the question: just how exposed are we?

Regulating Crypto: A Double-Edged Sword

The regulatory environment is shifting. The U.S. has moved away from requiring banks to ask for permission before engaging in crypto activities. Now, banks can go ahead and do their thing as long as they comply with safety and legal standards. This is a win for innovation, but it also raises questions about how far deregulation should go.

But it's not just the U.S. – New Zealand's strict ban on crypto ATMs is a whole other can of worms. With over 220 kiosks gone, countries are tightening their grip on the industry. This can be good for crypto's image, but what does it mean for businesses looking to capitalize on these solutions?

Strategic Security Measures: Keeping It Real

For those of us navigating these waters, now is the time to take strategic measures. We all need to plow some cash into cybersecurity infrastructure to keep our precious data safe. A formal verification process and quality smart contract audits can reduce the likelihood of being hacked. And let's not forget about leveraging blockchain technology and AI for better transaction security and compliance processes.

In this age of collaboration, partnerships with traditional institutions and regulators can give startups the edge they need. And keep your compliance game strong. With crypto in the picture, you can bet that maintaining rigorous compliance, especially anti-money laundering (AML) controls, is crucial. Always be monitoring.



Summary

We are in an interesting chapter of the crypto story. With incidents like Bitcoin Depot's breach reminding us of the challenges ahead, the regulatory framework is evolving to support innovation but also ensure oversight. It’s a tightrope walk. For businesses, bolstering security and collaborating with regulators is vital for building trust. The journey isn't over until the balance between security and innovation is struck – and that is a precarious line.


#ResearchDataExcellence #DataAnalysisAwards #InternationalDataAwards #ResearchDataAwards #DataExcellence #ResearchData #DataAnalysis #DataAwards #GlobalDataExcellence #DataInnovationAwards #DataResearch #ExcellenceInData #DataAwardWinners#DataAnalysisExcellence #ResearchDataInsights #GlobalResearchAwards #DataExcellenceAwards #ExcellenceInResearchData #ResearchDataLeadership #DataResearchExcellence #AwardWinningData #InternationalResearchAwards #DataAnalysisInnovation #ResearchDataAchievement #ExcellenceInDataAnalysis #GlobalDataInsights #ResearchDataSuccess #DataAwards2024

Website: International Research Data Analysis Excellence Awards

Visit Our Website : researchdataanalysis.com
Nomination Link : researchdataanalysis.com/award-nomination
Registration Link : researchdataanalysis.com/award-registration
member link : researchdataanalysis.com/conference-abstract-submission
Awards-Winners : researchdataanalysis.com/awards-winners
Contact us : rdat@researchdataanalysis.com

Get Connected Here:
==================
Facebook : www.facebook.com/profile.php?id=61550609841317
Twitter : twitter.com/Dataanalys57236
Pinterest : in.pinterest.com/dataanalysisconference
Blog : dataanalysisconference.blogspot.com
Instagram : www.instagram.com/eleen_marissa

10 July, 2025

Neo4j goes serverless, bringing graph analytics to any data source!








Neo4j Inc. today announced a new serverless offering that dramatically simplifies the deployment of its graph database, making it easier to use with artificial intelligence applications.

Most critically, it works with any data source, and there’s no need to fiddle around with messy extract, transform and load (ETL) operations, the company said. The new offering is meant to bring the powerful capabilities of graph analytics to any user, with one of the major implications being that they’ll be able to build smarter AI applications.

Graph databases such as Neo4j are very different from traditional Structured Query Language-based data platforms such as Oracle and Microsoft SQL Server. Instead of storing data in tables consisting of rows and columns, they use a graph structure made up of nodes, edges and properties to represent and store information. It’s a more versatile format that, in most cases, makes data easier to retrieve within a single operation.
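As a rough illustration of that model, here is a minimal sketch using the official Neo4j Python driver and a Cypher query; the connection details, node labels and relationship type are hypothetical and not part of the announcement.

# Minimal sketch: modeling and querying connected data in Neo4j.
# Assumes a reachable Neo4j instance and the official "neo4j" Python driver
# (pip install neo4j). The URI, credentials, labels and relationships below
# are illustrative only.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Nodes and relationships instead of rows and columns.
    session.run(
        "MERGE (c:Customer {name: $name}) "
        "MERGE (p:Product {sku: $sku}) "
        "MERGE (c)-[:PURCHASED]->(p)",
        name="Alice", sku="SKU-42",
    )

    # A single traversal retrieves connected records that would need joins in SQL.
    result = session.run(
        "MATCH (c:Customer)-[:PURCHASED]->(p:Product) "
        "RETURN c.name AS customer, p.sku AS product"
    )
    for record in result:
        print(record["customer"], "->", record["product"])

driver.close()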

Perhaps the biggest advantage of graph databases is that they enable what’s known as “vector search,” where unstructured data such as images and handwritten notes can be represented as vector embeddings. These capture both the explicit and implicit relationships between data and any patterns that can be drawn from it. These properties make them ideal for large language models, enabling them to retrieve a much richer variety of information, enhancing their ability to reason and infer.
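To make the idea of vector search concrete, here is a tiny self-contained sketch of similarity lookup over embeddings; the random vectors stand in for real embeddings, and a production system would rely on the database's own vector index rather than brute-force NumPy.

# Toy illustration of vector search: rank stored items by cosine similarity
# to a query embedding. The vectors are random stand-ins for real embeddings.
import numpy as np

rng = np.random.default_rng(0)
item_vectors = rng.normal(size=(1000, 384))   # e.g. 384-dim embeddings of documents
query_vector = rng.normal(size=384)           # embedding of the user's query

def cosine_similarity(matrix: np.ndarray, vector: np.ndarray) -> np.ndarray:
    matrix_norm = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    vector_norm = vector / np.linalg.norm(vector)
    return matrix_norm @ vector_norm

scores = cosine_similarity(item_vectors, query_vector)
top_k = np.argsort(scores)[::-1][:5]          # indices of the 5 closest items
print(top_k, scores[top_k])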

As Neo4j explains, graph analytics can improve AI decision-making by “uncovering hidden patterns and relationships in complex data, delivering more accurate insights with richer context than traditional analytics.”

The analyst firm Gartner Inc. said in a 2024 report that one of the main challenges in AI development is that enterprise data is “sparse and replete with gaps,” which makes it difficult to find and link important information.

“Data and analytics leaders should use graph analytics as a preferred technology in specific use cases to fill data gaps and blend data assets even when they have diverse data quality,” the report recommended.

That’s all sound advice, but the challenge with graph analytics has always been implementation: the Neo4j database and similar systems are notoriously difficult to set up and use. That’s no longer the case, the company says, with the launch of today’s new serverless offering, known as Neo4j Aura Graph Analytics.

Available starting today, Neo4j Aura Graph Analytics is said to work with any kind of data source, including Oracle, Microsoft SQL Server, Databricks, Google BigQuery, Snowflake and Microsoft OneLake. It’s said to make graph analytics accessible to any company by removing the biggest barriers to adoption — namely, the need to set up ETL pipelines, write custom queries in the Cypher language, or have specialized expertise in graph analytics.

So instead of spending weeks struggling to get up and running, companies can now deploy Neo4j Aura Graph Analytics on the cloud infrastructure of their choice and start collecting, organizing, analyzing and visualizing unstructured data in a matter of minutes, the company said.

Neo4j Aura Graph Analytics comes with more than 65 ready-to-use graph algorithms and is optimized for high-performance AI applications, with support for parallel workflows ensuring any app can scale in a seamless way. Under its pay-as-you-go pricing model, customers will be billed based on the processing power and storage consumed.

“By removing hurdles like complex queries, ETL and costly infrastructure setup, organizations can tap into the full power of graph analytics without needing to be graph experts,” said Neo4j Chief Product Officer Sudhir Hasbe. “The result will be better decisions on any enterprise data source, built on a deeper understanding of how everything connects.”

The company makes some big claims regarding the kind of performance boost its new service will provide to the average AI application. Among other things, it says it can boost the accuracy of LLMs by up to 80% by helping them to uncover deeper patterns and relationships in complex connected data. Moreover, those models will be able to adapt in real time as the underlying data itself changes.

By using graph analytics, AI models can derive insights from their underlying datasets twice as fast as before, thanks to Neo4j’s use of parallelized in-memory processing. It also reduces coding tasks by up to 75%, as there’s no need for any ETL. Finally, because the offering is serverless, there’s no need to worry about administrative overhead, which can translate to a lower total cost of ownership, since it eliminates the need to provision and maintain servers.

International Data Corp. analyst Devin Pratt said the launch of Neo4j’s serverless platform is an “exciting move” by the company that will significantly boost the accessibility of graph analytics.

“It will allow enterprises to scale analytics across any data source or cloud platform, transforming their data into a wealth of actionable knowledge, providing deeper insights for improved organizational decision-making,” he said.

Neo4j said its serverless offering will soon be joined in general availability by its native integration with Snowflake, which was first announced last year. With that integration, Snowflake users will be able to employ more than 65 graph algorithms directly, without needing to move information from that cloud data warehouse environment first.




09 July, 2025

Interview: Pure Storage on the AI data challenge beyond hardware!








We talk to Pure Storage’s vice-president of AI infrastructure about data quality for artificial intelligence and the need for data engineering to ensure the integrity, completeness and appropriateness of data for AI training

By Antony Adshead, Storage Editor
Published: 24 Jun 2025 14:23

Successfully tackling artificial intelligence (AI) workloads is not just about throwing compute and storage resources at them. Sure, you need enough processing power and the storage to supply it with data at the correct rate, but before any such operations can achieve success, it’s critical to ensure the quality of the data used in AI training.


That’s the core message from Par Botes, vice-president of AI infrastructure at Pure Storage, whom we caught up with last week at the company’s Accelerate event in Las Vegas.

Botes emphasised the need for enterprises tackling AI to capture, organise, prepare and align data. That’s because data can often be incomplete or inappropriate to the questions AI tries to answer.

We talked to Botes about data engineering, data management, the use of data lakehouses and making sure datasets fit the need being addressed by AI.
What does Pure Storage view as the key upcoming or emerging storage challenges in AI?

I think it’s hard to create systems that solve problems using AI without having a really good way of organising data, capturing data, then preparing it and aligning it to the processing elements, the GPUs [graphics processing units], so that they can access data fast enough.
What in particular makes those challenges difficult?

I’ll start with the most obvious one: how do I get GPUs to consume the data? The GPUs are incredibly powerful, and they drive a tremendous amount of bandwidth.

It’s hard to feed GPUs with data at the pace they consume it. That is increasingly becoming a solved problem, particularly at the high end. But for a regular enterprise type of company, these are new types of systems and new types of skills to implement.

“As your data improves, as your insights change, your data has to change with it. Thus, your model has to evolve with it. This becomes a continuous process”

Par Botes, Pure Storage

It’s not a hard problem on the science side, it’s a hard problem in operations, because these are not muscles that have existed in enterprise for a long time.

The next part of that problem is: How do I prepare my data? How do I gather it? How do I know where I have the correct data? How do I assess it? How do I track it? How do I apply lineage to it to see that this model is trained with this set of data? How do I know that it has a complete dataset? That’s a very hard problem.
Is that a problem that varies between customer and workload? Because I can imagine one might know, just by the expertise that resides within an organisation, that you have all the data you need. Or, in another situation, it might be unclear whether you do or not.

It’s pretty hard to know, without reasoning about [whether] you have all the data you need. I’ll give you an example.

I spent many years building a self-driving car – perception networks, driving systems – but frequently, we found the car didn’t perform as well in some conditions.

Say the road turned left and slightly uphill, with other cars around it; we then realised we didn’t have enough training data for that situation. So, having a principled way of reasoning about the data, about completeness, about the range [of data], about having all the data for that, and analysing it mathematically, is not a discipline that’s super common outside of high-end training companies.
Having looked at the issues that tend to arise, the difficulties that can arise with AI workloads, how would you say that customers can begin to mitigate those?

The general approach I recommend is to think about your data engineering processes. So, we partner with data engineering companies that do things like lakehouses.

Think about: How do I apply a lakehouse to my incoming data? How do I use my lakehouse to clean it and prepare it? In some cases, maybe even transform it and make it ready for the training system. I will start by thinking about the data engineering discipline in my company and how do I prepare that to be ready for AI?
What does data engineering consist of if you drill down into it?

Data engineering generally consists of: how do I get access to other datasets that exist in corporate databases, in structured systems, or in other systems we have? How do I ingest that into an intermediate form, in a lakehouse? And how do I then transform that and select data from those sets, which might sit across different repositories, to create a dataset that represents the data I want to train against?

That’s the discipline we typically call data engineering. And it’s becoming a very distinct skill and a very distinct discipline.
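As a loose illustration of that ingest-transform-select loop, the sketch below uses pandas with hypothetical file names and columns; real pipelines would typically run on a lakehouse engine rather than a single machine.

# Hypothetical sketch of a small data-engineering step: ingest raw extracts,
# clean and transform them, then select the slice used for training.
import pandas as pd

# 1. Ingest: pull extracts from (hypothetical) corporate sources into frames.
orders = pd.read_csv("exports/orders.csv", parse_dates=["order_date"])
customers = pd.read_parquet("exports/customers.parquet")

# 2. Transform: join, drop obvious junk, normalize units.
raw = orders.merge(customers, on="customer_id", how="inner")
raw = raw.dropna(subset=["order_value"])
raw["order_value_usd"] = raw["order_value"] / 100.0   # cents -> dollars

# 3. Select: keep only the columns and date range the model should train on.
training_set = raw.loc[
    raw["order_date"] >= "2024-01-01",
    ["customer_id", "order_date", "order_value_usd", "region"],
]
training_set.to_parquet("curated/training_set.parquet", index=False)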
When it comes to storage, how do customers support data lakehouses with storage? In what forms?

Today, what’s common is you have the cloud companies, which provide the data lakehouses, and for the on-prem, we have the system houses.

We work with several of them. We provide complete solutions that include data lakehouse vendors. And we partner with those.

And then, of course, the underlying storage that makes it perform fast and work well. And so the key components, I’d say, are the popular data lakehouse databases and the infrastructure beneath that, and then connect those over into other storage systems for the training side.
Looking at data engineering, is it really a one-time, one-off challenge, or is it something that’s ongoing as organisations tackle AI?

Data engineering is kind of hard to disentangle from storage. They’re not exactly the same thing, but they’re closely related.

Once you start using AI, you want to record all new data. You want to transform it and make it part of your AI system, whether you’re using that with RAG [retrieval augmented generation] or fine-tuning, or if you are advanced, you build your own model.

You’re constantly going to increase it and make it better. As your data improves, as your insights change, your data has to change with it. Thus, your model has to evolve with it.

This becomes a continuous process.

You have to think about a few things, such as lineage. What’s the history of this data? What originated from where? What’s consumed where? You also want to think about when people use your model, or when you use it internally: what’s the question being asked, and what comes back with it?

And you want to store and use that for quality assurance, also for further training in the future. This becomes what we call an AI flywheel of data. The data is constantly ingested, consumed, computed, ingested, consumed, computed.

And that circle doesn’t stop.
Is there anything else you think customers ought to be looking at?

You should also think about what this data really is: what does the data represent? If this data represents something you observe or something you do, and you have gaps in the data, the AI will fill in those gaps. When it fills them in wrongly, we call it hallucination.

The trick is to know your data well enough that you know where there are gaps. And if you have gaps, can you find ways to fill out those gaps? When you get to that level of sophistication, you’re starting to have a really impressive system to use.

Even if you start with the very basics of using a cloud service, start by recording what you send and what you’re getting back. Because that forms the basis for your data management discipline. And when I use the term data engineering, in between data engineering and storage is this discipline called data management.

This is the organisation of data, which you want to start as early as you can. Because by the time you get ready to do something beyond just using the service, you now have the first body of data for your data engineers and for your storage.




08 July, 2025

Understanding how data fabric enhances data security and governance!









Data fabric is a powerful architectural approach for integrating and managing data across diverse sources and platforms.

As enterprises navigate increasingly complex data environments, the need for seamless data access coupled with robust security has never been more critical. Data fabric has the potential to enhance both data security and governance, but it’s not always a straight path for organizations to achieve the optimal outcome.
Understanding data fabric architecture

To get the most out of data fabric, it’s important to first understand what it is and what it can actually provide to an organization. Unfortunately, defining data fabric can itself be a challenge.



“Thanks to multiple, often vendor-centric, definitions, there remains confusion in the industry about the precise nature of data fabric,” Matt Aslett, director with the global technology research and advisory firm ISG, told InfoWorld.


ISG defines data fabric as a technology-driven approach to automating data management and data governance in a distributed architecture that includes on-premises, cloud and hybrid environments. Aslett added that a common misconception is that enterprises must discard existing data platforms and management products to embrace data fabric.
Quick answers on data fabric, security and governance
What is data fabric? A data fabric is an architectural approach that integrates and manages data across diverse, distributed sources (on-premises, public cloud and hybrid cloud), automating data management and governance.
How does data fabric enhance security? It centralizes security policies, automates data discovery and classification, enables granular access controls (RBAC/ABAC), facilitates encryption and masking, and supports comprehensive governance frameworks across the entire data estate.
What are the main security challenges with data fabric? Key challenges include persistent data silos, complex compliance requirements, talent shortages, unmanaged shadow IT data, overall data fragmentation, and the inherent complexity of integrating diverse IT environments.
Why is data validation important for data fabric? Data validation ensures the quality and consistency of data, building trust for business intelligence and AI initiatives. It helps reduce errors by checking data closer to its source.
What are real-world benefits of data fabric? Organizations using data fabric have seen significant improvements in areas like faster regulatory reporting, improved patient data accuracy, reduced supply chain errors, and quicker customer insights.

Key elements of a data fabric architecture include the following:

Metadata-driven data identification and classification
Knowledge graphs
Automated, ML-driven data management

“These capabilities provide the connective tissue that traverses disparate data silos and can complement the use of existing bespoke data tools by facilitating an abstracted view of data from across the business to support business intelligence and artificial intelligence initiatives that rely on the data unification,” Aslett said.
Data security challenges

Implementing a data fabric architecture presents several security challenges that enterprises must address to ensure the integrity, confidentiality and availability of data assets.

Among the security challenges are these six:


Data silos and fragmentation
Despite the promise of integration, many organizations struggle with persistent data silos in their initial data fabric implementations.

“The biggest challenge is fragmentation; most enterprises operate across multiple cloud environments, each with its own security model, making unified governance incredibly complex,” Dipankar Sengupta, CEO of Digital Engineering Services at Sutherland Global, told InfoWorld.

Compliance and regulatory complexity
Adhering to industry standards and regulations such as General Data Protection Regulation (GDPR), Health Insurance Portability and Accountability Act (HIPAA) and California Consumer Privacy Act of 2018 (CCPA) is a significant challenge in data fabric implementations.



Different data types and sources may fall under different regulatory frameworks. As such, implementing consistent compliance measures across the entire fabric requires careful planning and execution.

Talent
According to Sengupta, the other blind spot is talent, with 82% of firms struggling to hire skilled data professionals.

Shadow IT
Shadow IT is also a persistent threat and challenge. According to Sengupta, some enterprises discover nearly 40% of their data exists outside governed environments. Proactively discovering and onboarding those data sources has become non-negotiable.



Data fragmentation
Another major obstacle to effective data security and governance is fragmentation.

“There are too many data silos to secure and govern, and too many tools required to get the job done,” Edward Calvesbert, Vice President, Product Management, for IBM watsonx.data, told InfoWorld.

IT complexity
According to Anil Inamdar, Global Head of Data Services at NetApp Instaclustr, the potential for data fabric security/governance challenges really begins with the complexity of the organization’s IT environment.



“If security is already inconsistent across hybrid or multi-cloud setups, teams will subsequently struggle to get their data fabric architecture as secure as it needs to be,” Inamdar said.
How data fabric enhances security

While there are some challenges, the reason why so many organizations choose to deploy data fabric is because it does significantly enhance data security and governance.

Data fabric architectures offer significant advantages for enhancing security when properly implemented across a number of different domains.


Centralized security policies
Organizations are using data fabric to fix the challenge of fragmentation. IBM’s Calvesbert noted that with data fabric organizations can create a centralized set of policies and rules capable of reaching all data within the organization. Policies and rules can be linked to any and all data assets through metadata like classifications, business terms, user groups, and roles – and then enforced automatically whenever data is accessed or moved.

Regulatory compliance
A data fabric deepens organizations’ understanding and control of their data and consumption patterns. “With this deeper understanding, organizations can easily detect sensitive data and workloads in potential violation of GDPR, CCPA, HIPAA and similar regulations,” Calvesbert commented. “With deeper control, organizations can then apply the necessary data governance and security measures in near real time to remain compliant.”


Metadata management
Automated metadata management and data cataloging are integral components and benefits of data fabric.




“It’s a big deal, because when metadata is automatically tagged and tracked across both cloud and on-prem environments, you are getting the level of visibility that is going to make security folks and compliance officers happy,” NetApp’s Inamdar commented. “Automating this process within a data fabric creates that digital breadcrumb trail that follows data wherever it goes.”

Automated data discovery and classification
Automated tools within data fabric discover and classify data, reducing manual effort and enhancing governance. This involves identifying sensitive data across environments, categorizing it and applying appropriate security measures.

Data access control and authorization
Data fabric supports granular access control through Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC), ensuring only authorized users can access sensitive data. This is vital for minimizing unauthorized access risks, with mechanisms like dynamic masking complementing these controls.
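To make the idea of attribute-based checks concrete, here is a small hypothetical sketch in Python; in a real data fabric, such rules would be defined in the platform's policy engine and enforced automatically rather than hand-coded in applications.

# Hypothetical ABAC-style check: access is granted only when the user's
# attributes satisfy the policy attached to the data asset.
from dataclasses import dataclass

@dataclass
class User:
    role: str
    department: str
    clearance: int

@dataclass
class DataAsset:
    classification: str        # e.g. "public", "confidential", "restricted"
    owning_department: str
    min_clearance: int

def can_access(user: User, asset: DataAsset) -> bool:
    # Role-based shortcut: auditors may read any asset.
    if user.role == "auditor":
        return True
    # Attribute-based rules: department match and sufficient clearance.
    return (
        user.department == asset.owning_department
        and user.clearance >= asset.min_clearance
    )

analyst = User(role="analyst", department="finance", clearance=2)
ledger = DataAsset(classification="confidential", owning_department="finance", min_clearance=2)
print(can_access(analyst, ledger))   # True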



Data encryption and masking
Data fabric facilitates data encryption, protecting data at rest and in transit, and data masking to obscure sensitive information. Encryption transforms data so that it is unreadable to anyone without the keys, while masking replaces data with realistic but fake values, preserving privacy.
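Below is a minimal, standard-library-only sketch of static masking and pseudonymization, included purely to illustrate the concept; production deployments would rely on the fabric's built-in dynamic masking and key-managed encryption instead.

# Hypothetical masking sketch: pseudonymize identifiers and obscure direct
# contact details before data leaves a governed zone. Not a substitute for
# the encryption and dynamic masking built into data fabric platforms.
import hashlib

def pseudonymize(value: str, salt: str = "demo-salt") -> str:
    # Deterministic token: the same input always maps to the same token,
    # so joins still work, but the original value is not exposed.
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

def mask_email(email: str) -> str:
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

record = {"customer_id": "C-1001", "email": "jane.doe@example.com"}
masked = {
    "customer_id": pseudonymize(record["customer_id"]),
    "email": mask_email(record["email"]),
}
print(masked)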

Data governance frameworks
Data fabric supports implementing and enforcing data governance frameworks, providing tools for policy definition, monitoring, and enforcement. This ensures data is managed according to organizational policies, enhancing control and accountability.


Why data validation is critical for data fabric success

Data security and governance inside a data fabric shouldn’t just be about controlling access to data, it should also come with some form of data validation.



The cliched saying “garbage-in, garbage-out” is all too true when it comes to data. After all, what’s the point of ensuring security and governance on data that isn’t valid in the first place?

“Validating the quality and consistency of data is essential to establishing trust and encouraging data usage for both BI and AI projects,” Aslett said.

So how can and should enterprises use data validation within a data fabric? Sutherland Global’s Sengupta commented that the most effective validation strategies he has seen start with pushing checks as close to the source as possible. He noted that validating data upfront, rather than downstream, has helped reduce error propagation by over 50% in large-scale implementations. This distributed approach improves accuracy and lightens the processing load later in the pipeline.



Machine learning is playing a growing role as well. Statistical baselines and anomaly detection models can flag issues that rigid rule-based systems often miss. In one case cited by Sengupta, this approach helped increase trust in critical data assets by nearly 80%.


“What’s often overlooked, though, is the value of context-aware validation—cross-domain consistency checks can expose subtle misalignments that may look fine in isolation,” Sengupta said. “For real-time use cases, stream validation ensures time-sensitive data is assessed in-flight, with accuracy rates approaching 99.8%.”
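As a rough sketch of what validation pushed close to the source can look like, the following combines simple rule checks with a statistical baseline; the column names, thresholds and file paths are hypothetical and not drawn from Sengupta's implementations.

# Hypothetical validation-at-the-source sketch: rule checks plus a simple
# statistical baseline run before a batch is forwarded downstream.
import pandas as pd

def validate_batch(batch: pd.DataFrame, history_mean: float, history_std: float) -> list[str]:
    issues = []
    # Rule-based checks.
    if batch["order_id"].duplicated().any():
        issues.append("duplicate order_id values")
    if (batch["order_value_usd"] < 0).any():
        issues.append("negative order values")
    if batch["order_date"].isna().any():
        issues.append("missing order dates")
    # Statistical baseline: flag batches whose mean drifts far from history.
    batch_mean = batch["order_value_usd"].mean()
    if abs(batch_mean - history_mean) > 3 * history_std:
        issues.append(f"mean order value {batch_mean:.2f} deviates from baseline")
    return issues

batch = pd.read_parquet("landing/orders_batch.parquet")   # hypothetical landing zone
problems = validate_batch(batch, history_mean=42.0, history_std=5.0)
if problems:
    raise ValueError("Batch rejected: " + "; ".join(problems))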
Benefits and use cases: data fabric in the real world

The real-world impact of data fabric is impressive. While the term can often be used as mere marketing by vendors, there are tangible return-on-investment opportunities too.



“In our work with large enterprises, the most tangible impact of data fabric initiatives comes from their ability to speed up access to trustworthy data, safely and at scale,” Sengupta said.

For instance, a global financial institution reduced its regulatory reporting time by 78% and accelerated access provisioning by 60% after re-architecting its data governance model around unified security policies.

In healthcare, a provider network improved patient data accuracy from 87% to 99%, while also cutting integration time for new data sources by 45%, a critical gain when onboarding new partners or navigating compliance audits.

A manufacturing client saw a 52% drop in supply chain data errors and significantly improved the processing of IoT sensor data, boosting integration speed by 68%.



In retail, better orchestration of policies and quality controls translated into 85% faster delivery of customer insights, a 3x increase in analyst productivity, and a 30% reduction in storage costs through better data hygiene.

“What these outcomes show is that when data is treated not just as an infrastructure component but as an enabler of business velocity, the returns are both measurable and strategic,” Sengupta said.




05 July, 2025

DeviQA Launches OwlityAI – the First Fully Autonomous AI-Driven QA Platform!



DeviQA redefines QA with OwlityAI: a QA platform that scans your app, writes tests, runs them, maintains them, and reports bugs – no prompts or manual steps needed.

DeviQA, a global leader in software testing services, proudly announces the launch of OwlityAI – the world’s first truly autonomous AI-driven QA solution, built to remove the need for manual testing and redefine how software quality is delivered.

After 15+ years of working with engineering teams, DeviQA saw a clear need: testing needed to be faster, smarter, and radically more efficient. OwlityAI was built to solve that – not just with automation, but with full autonomy.

“We didn’t just want to improve QA, we wanted to reinvent it,” said Oleg Sadikov, co-founder of OwlityAI and DeviQA.


What makes OwlityAI different?

OwlityAI is not just another testing tool. It’s an AI-driven QA platform that understands your product, generates test cases, runs tests, and reports issues – with no human intervention.

Getting started with OwlityAI is simple. Just input a URL to your web application, and OwlityAI instantly:



04 July, 2025

Lessons Learned After 6.5 Years Of Machine Learning!





by Anthony Tori


When I started learning machine learning more than six years ago, the field was in the midst of really gaining traction. Around 2018, when I took my first university courses on classic machine learning, key methods were already being developed behind the scenes that would lead to AI’s boom in the early 2020s. The GPT models were being published, and other companies followed suit, pushing the limits of both performance and parameter size with their models. For me, it was a great time to start learning machine learning, because the field was moving so fast that there was always something new.

From time to time, usually every 6 to 12 months, I look back on those years, mentally fast-forwarding from university lectures to doing commercial AI research. In looking back, I often find new principles that have accompanied me while learning ML. In this review, I found that working deeply on one narrow topic has been a key driver of my progress over the last few years. Beyond deep work, I’ve identified three other principles. They are not necessarily technical insights, but rather patterns of mindset and method.
The Importance of Deep Work

Winston Churchill is famous not only for his oratory but also for his incredible quickness of mind. There’s a popular story about a verbal dispute between him and Lady Astor, the first woman in British Parliament. Trying to end an argument with him, she quipped:


If I were your wife, I’d put poison in your tea.

Churchill, with his trademark sharpness, replied:


And if I were your husband, I’d drink it.

Witty repartee like that is admired because it’s a rare skill; not everyone is born with such reflexive brilliance. Luckily, in our domain of ML research and engineering, quick wit is not the superpower that gets you far. What does get you far is the ability to focus deeply.

Machine learning work, especially the research side, is not fast-paced in the traditional sense. It requires long stretches of uninterrupted, intense thought. Coding ML algorithms, debugging obscure data issues, crafting a hypothesis — it all demands deep work.

By “deep work,” I mean both:
The skill to concentrate deeply for extended periods
The environment that allows and encourages such focus

Over the past two to three years, I’ve come to see deep work as essential to making meaningful progress. The hours I’ve spent in focused immersion, several times a week, have been far more productive than fragmented blocks of distracted productivity ever could be. And, thankfully, working deeply can be learned, and your environment can be set up to support it.

For me, the most fulfilling periods are always those leading up to paper submission deadlines. These are times when you can laser-focus: the world narrows down to your project, and you’re in flow. Richard Feynman said it well:


To do real good physics, you need absolute solid lengths of time… It needs a lot of concentration.

Replace “physics” with “machine learning,” and the point still holds.
You Should (Mostly) Ignore Trends

Have you heard of large language models? Of course, you have — names like LLaMA, Gemini, Claude, or Bard fill the tech news cycle. They’re the cool kids of generative AI, or “GenAI,” as it’s now stylishly called.

But here’s the catch: when you’re just starting out, chasing trends can make gaining momentum hard.

I once worked with a researcher when we were both just starting out in ML. We’ll call my former colleague John. For his research, he dove head-first into the then-hot field of retrieval-augmented generation (RAG), hoping to improve language model outputs by integrating external document search. He also wanted to analyze emergent capabilities of LLMs — things these models can do even though they weren’t explicitly trained for them — and distill those into smaller models.

The problem for John? The models he based his work on evolved too fast. Just getting a new state-of-the-art model running took weeks. By the time he did, a newer, better model was already published. That pace of change, combined with unclear evaluation criteria for his niche, made it nearly unmanageable for him to keep his research going. Especially for someone still new to research, like John and me back then.

This isn’t a criticism of John (I likely would have failed too). Instead, I am telling this story to make you consider: does your progress rely on continually surfing the foremost wave of the latest trend?
Doing Boring Data Analysis (Over and Over)

Every time I get to train a model, I mentally breathe a sigh of relief.

Why? Because it means I’m done with the hidden hard part: data analysis.

Here’s the usual sequence:
You have a project.
You acquire some (real-world) dataset.
You want to train ML models.
But first…you need to prepare the data.

A lot can go wrong in that last step.

Let me illustrate this with a mistake I made while working with ERA5 weather data — a massive, gridded dataset from the European Centre for Medium-Range Weather Forecasts. I wanted to predict NDVI (Normalized Difference Vegetation Index), which indicates vegetation density, using historical weather patterns from the ERA5 data.

For my project, I had to merge the ERA5 weather data with NDVI satellite data I got from the NOAA, the US weather agency. I translated the NDVI data to ERA5’s resolution, added it as another layer, and, getting no shape mismatch, happily proceeded to train a Vision Transformer.

A few days later, I visualized the model predictions and… surprise! The model thought Earth was upside down. Literally — my input data showed a normally oriented world, but my vegetation data was flipped at the Equator.

What went wrong? I had overlooked how the resolution translation flipped the orientation of the NDVI data.
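A quick orientation check before training would have caught it. Here is a rough sketch of that kind of sanity check; the latitude arrays and the random grid are placeholders for the real ERA5 and NDVI data.

# Rough sketch of the sanity check that would have caught the flipped grid:
# compare latitude ordering of the two datasets and visualize one layer
# before any training. Array contents and shapes are placeholders.
import numpy as np
import matplotlib.pyplot as plt

era5_lat = np.linspace(90, -90, 721)      # ERA5 latitudes run north -> south
ndvi_lat = np.linspace(-90, 90, 721)      # regridded NDVI came out south -> north
ndvi_grid = np.random.rand(721, 1440)     # stand-in for the regridded NDVI layer

# If the latitude axes run in opposite directions, flip the data to match.
if np.sign(era5_lat[1] - era5_lat[0]) != np.sign(ndvi_lat[1] - ndvi_lat[0]):
    ndvi_grid = np.flipud(ndvi_grid)
    ndvi_lat = ndvi_lat[::-1]

# Always eyeball one layer: a world map that looks upside down is hard to miss.
plt.imshow(ndvi_grid)
plt.title("NDVI layer after aligning latitude orientation")
plt.show()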

Why did I miss that? Simple: I did not want to do the data engineering; I wanted to skip straight ahead to the machine learning. But the reality is this: in real-world ML work, getting the data right is the work.

Yes, academic research often lets you work with curated datasets like ImageNet, CIFAR, or SQuAD. But for real projects? You’ll need to:
Clean, align, normalize, and validate
Debug weird edge cases
Visually inspect intermediate data

And then repeat this until it’s truly ready

I learned this the hard way by skipping steps I thought were not necessary for my data. Don’t do the same.
(Machine Learning) Research Is a Specific Kind of Trial and Error

From the outside, scientific progress always seems to be elegantly smooth:

Problem → Hypothesis → Experiment → Solution

But in practice, it’s much messier. You’ll make mistakes — some small, some facepalm-worthy. (e.g., Earth flipped upside down.) That’s okay. What matters is how you treat those mistakes.

Bad mistakes just happen. But insightful mistakes teach you something.

To help myself learn faster from the perceived failures, I now maintain a simple lab notebook. Before running an experiment, I write down:
My hypothesis
What I expect to happen
Why I expect it

Then, when the experimental results come back (often as a “nope, did not work”), I can reflect on why it might have failed and what that says about my assumptions.

This transforms errors into feedback, and feedback into learning. As the saying goes:


An expert is someone who has made all the mistakes that can be made in a very narrow field.

That’s research.
Final Thoughts

After 6.5 years, I’ve come to realize that doing machine learning well has little to do with flashy trends or just tuning (large language) models. In hindsight, I think it’s more about:
Creating time and space for deep work
Choosing depth over hype
Taking data analysis seriously
Embracing the messiness of trial and error

If you’re just starting out — or even are a few years in — these lessons are worth internalizing. They won’t show up in conference keynotes, but they’ll show up in your actual progress.

The Feynman quote is from the book Deep Work, by Cal Newport.
For Churchill’s quote, several variations exist, some with coffee and some with tea being poisoned.




03 July, 2025

How to Learn AI for Data Analytics in 2025!




Data analytics has changed. It is no longer sufficient to know tools like Python, SQL, and Excel to be a data analyst.



As a data professional at a tech company, I am experiencing firsthand the integration of AI into every employee’s workflow. There is an ocean of AI tools that can now access and analyze your entire database and help you build data analytics projects, machine learning models, and web applications in minutes.




If you are an aspiring data professional and aren’t using these AI tools, you are losing out. And soon, you will be surpassed by other data analysts: people who are using AI to optimize their workflows.

In this article, I will walk you through AI tools that will help you stay ahead of the competition and 10X your data analytics workflows.

With these tools, you can:
Build and deploy creative portfolio projects to get hired as a data analyst
Use plain English to create end-to-end data analytics applications
Speed up your data workflows and become a more efficient data analyst

Additionally, this article will be a step-by-step guide on how to use AI tools to build data analytics applications. We will focus on two AI tools in particular - Cursor and Pandas AI.





AI Tool 1: Cursor


Cursor is an AI code editor that has access to your entire codebase. You just have to type a prompt into Cursor’s chat interface, and it will access all the files in your directory and edit code for you.

If you are a beginner and can’t write a single line of code, you can even start with an empty code folder and ask Cursor to build something for you. The AI tool will then follow your instructions and create code files according to your requirements.

Here is a guide on how you can use Cursor to build an end-to-end data analytics project without writing a single line of code.


Step 1: Cursor Installation and Setup

Let’s see how we can use Cursor AI for data analytics.

To install Cursor, just go to www.cursor.com, download the version that is compatible with your OS, follow the installation instructions, and you will be set up in seconds.

Here’s what the Cursor interface looks like:

[Screenshot: the Cursor AI interface]
To follow along with this tutorial, download the train.csv file from the Sentiment Analysis Dataset on Kaggle.

Then create a folder named “Sentiment Analysis Project” and move the downloaded train.csv file into it.

Finally, create an empty file named app.py. Your project folder should now look like this:

[Screenshot: the Sentiment Analysis Project folder, containing train.csv and app.py]
This will be our working directory.

Now, open this folder in Cursor by navigating to File -> Open Folder.

The right side of the screen has a chat interface where you can type prompts into Cursor. Notice that there are a few selections here. Let’s select “Agent” in the drop-down.

This tells Cursor to explore your codebase and act as an AI assistant that will refactor and debug your code.

Additionally, you can choose which language model you’d like to use with Cursor (GPT-4o, Gemini-2.5-Pro, etc). I suggest using Claude-4-Sonnet, a model that is well-known for its advanced coding capabilities.


Step 2: Prompting Cursor to Build an Application

Let’s now type this prompt into Cursor, asking it to build an end-to-end sentiment analysis model using the training dataset in our codebase:
Create a sentiment analysis web app that:
1. Uses a pre-trained DistilBERT model to analyze the sentiment of text (positive, negative, or neutral)
2. Has a simple web interface where users can enter text and see results
3. Shows the sentiment result with appropriate colors (green for positive, red for negative)
4. Runs immediately without needing any training
Please connect all the files properly so that when I enter text and click analyze, it shows me the sentiment result right away.




After you enter this prompt into Cursor, it will automatically generate code files to build the sentiment analysis application.
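For orientation, here is a minimal sketch of the kind of app.py such a prompt tends to produce, built on the Hugging Face transformers sentiment pipeline and Flask; Cursor's actual output will differ, and this is not the code it generates.

# Minimal sketch of the kind of app.py the prompt asks for: a DistilBERT
# sentiment pipeline behind a tiny Flask page. Cursor's generated code will
# differ; this is only an illustration.
# Requires: pip install flask transformers torch
from flask import Flask, request, render_template_string
from transformers import pipeline

app = Flask(__name__)
# Defaults to a DistilBERT model fine-tuned for sentiment (positive/negative).
classifier = pipeline("sentiment-analysis")

PAGE = """
<form method="post">
  <textarea name="text" rows="4" cols="60">{{ text }}</textarea><br>
  <button type="submit">Analyze</button>
</form>
{% if label %}
  <p style="color: {{ 'green' if label == 'POSITIVE' else 'red' }};">
    {{ label }} ({{ '%.2f'|format(score) }})
  </p>
{% endif %}
"""

@app.route("/", methods=["GET", "POST"])
def index():
    text, label, score = "", None, None
    if request.method == "POST":
        text = request.form.get("text", "")
        if text.strip():
            result = classifier(text)[0]   # e.g. {'label': 'POSITIVE', 'score': 0.99}
            label, score = result["label"], result["score"]
    return render_template_string(PAGE, text=text, label=label, score=score)

if __name__ == "__main__":
    app.run(debug=True)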

Step 3: Accepting Changes and Running Commands

As Cursor creates new files and generates code, you need to click on “Accept” to confirm the changes made by the AI agent.

After Cursor writes out all the code, it might prompt you to run some commands on the terminal. Executing these commands will allow you to install the required dependencies and run the web application.

Just click on “Run,” which allows Cursor to run these commands for us:

[Screenshot: Cursor prompting to run the suggested terminal command]
Once Cursor has built the application, it will tell you to copy and paste a link into your browser:

[Screenshot: the local app link provided by Cursor]
Doing so will take you to the sentiment analysis web application.




