Agencies Must Take an Authentic Approach to Synthetic Data – MeriTalk

The Accenture Federal Technology Vision 2022 highlights four technology trends that will have a significant impact on how government operates in the near future. Today we look at Trend #3, The Unreal: Making Synthetic Genuine.

Artificial intelligence (AI) is one of the most strategic technologies impacting all parts of government. From protecting our nation to serving its citizens, AI has proven itself mission critical. However, at its core lies a growing paradox.

Synthetic data – data that, while manufactured, mimics features of real-world data – is increasingly being used to fill some AI methods’ need for large amounts of data. Gartner predicts that 60 percent of the data used for AI development and analytics projects will be synthetically generated by 2024.

At the same time, the growing use of synthetic data presents challenges. Bad actors are using these same technologies to create deepfakes and disinformation that undermine trust. For example, social media was weaponized using a deepfake in the early days of the Russia-Ukraine war in an unsuccessful effort to sow confusion.

In our latest research, we found that by judging data based on its authenticity – rather than its “realness” – we can begin to put in place the safeguards needed to use synthetic data confidently.

Where Synthetic Data is Making a Difference Today

Government is already leveraging synthetic data to create meaningful outcomes.

During the height of the COVID crisis, for example, researchers needed extensive data about how the virus affected the human body and public health. Much of this data was being collected in patients’ electronic medical records, but researchers typically face barriers in obtaining such data due to privacy concerns.

Using synthetic data, a wide array of COVID research data was artificially generated and informed by – though not directly derived from – actual patient data. For instance, the National Institutes of Health (NIH) in 2021 partnered with the California-based startup Syntegra to generate and validate a nonidentifiable replica of the NIH’s considerable database of COVID-19 patient records, the National COVID Cohort Collaborative (N3C) Data Enclave. Today, N3C covers more than 5 million COVID-positive individuals. The synthetic data set precisely duplicates the original data set’s statistical properties but has no links to the original data, so it can be shared and used by scientists around the world working to develop insights, treatments, and vaccines.
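The core idea – fit a statistical model to real records, then sample entirely new rows that preserve the aggregate statistics but correspond to no real individual – can be sketched in a few lines. This is a minimal illustration using a multivariate Gaussian; the column names are hypothetical, and this is not the method Syntegra or N3C actually uses.

```python
# Sketch: a nonidentifiable synthetic table that preserves the statistical
# properties (here, mean and covariance) of a "real" data set.
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for real patient records: age, length of stay, lab value.
real = rng.multivariate_normal(
    mean=[55.0, 6.0, 1.2],
    cov=[[120.0, 8.0, 0.5],
         [8.0, 9.0, 0.2],
         [0.5, 0.2, 0.04]],
    size=5000,
)

# Fit the model to the real data, then sample fresh synthetic rows:
# no row in `synthetic` corresponds to any individual in `real`.
mu, sigma = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mu, sigma, size=5000)

# Aggregate statistics match closely, so research conclusions carry over.
print(real.mean(axis=0), synthetic.mean(axis=0))
```

Real medical records are far from Gaussian, of course; production systems use richer generative models, but the fit-then-sample pattern is the same.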

The U.S. Census Bureau has leveraged synthetic data as well. Its Survey of Income and Program Participation (SIPP) gives insight into national income distributions, the impacts of government assistance programs, and the complex relationships between tax policy and economic activity. But that data is highly detailed and could be used to identify specific individuals.

To make the data safe for public use, while also retaining its research value, the Census Bureau created synthetic data from the SIPP data sets.

A Framework for Synthetic Data

To create a framework for when the use of synthetic data is appropriate, agencies can start by considering potential use cases and identifying which ones align with their mission.

For example, a healthcare organization or financial institution might be particularly interested in leveraging synthetic data to protect personally identifiable information (PII).

Synthetic data can also be used to understand rare, or “edge,” events – for example, training a self-driving car to respond to infrequent occurrences such as debris falling onto a highway at night. There won’t be much real-world data on something that happens so infrequently, but synthetic data can fill in the gaps.

Synthetic data could likewise be of interest to agencies looking to control for bias in their models. It can be used to improve fairness and reduce bias in credit and loan decisions, for instance, by generating training data that removes protected variables such as gender and race.
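In its simplest form, removing protected variables from training data is a column-selection step before model training or synthetic generation. The table and column names below are hypothetical:

```python
# Illustrative sketch: dropping protected attributes from loan-application
# data before it is used to train a model or seed a synthetic generator.
import pandas as pd

applications = pd.DataFrame({
    "income":   [52000, 87000, 43000],
    "debt":     [12000, 30000, 9000],
    "gender":   ["F", "M", "F"],   # protected attribute
    "race":     ["A", "B", "C"],   # protected attribute
    "approved": [1, 1, 0],
})

PROTECTED = ["gender", "race"]

# Keep only the non-protected features; the label stays separate.
features = applications.drop(columns=PROTECTED + ["approved"])
labels = applications["approved"]

print(list(features.columns))  # protected columns are gone
```

One design caveat worth noting: simply dropping protected columns does not guarantee fairness, since remaining variables can act as proxies for the removed ones; it is a first step, not a complete remedy.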

In addition, many agencies can benefit from the reduced cost of synthetic data. Rather than having to collect and/or mine vast troves of real-life data, they can turn to machine-generated data to build models more quickly and cost-effectively.

In the near future, artificial intelligence “factories” could even be used to generate synthetic data. Generative AI refers to the use of AI to create synthetic data rapidly, at great scale, and accurately. It enables computers to learn patterns from large amounts of real-life data – including text, visual data, and multimedia – and to produce new content that mimics those underlying patterns.

One common approach to generative AI is the generative adversarial network (GAN) – a modeling architecture that pits two neural networks, a generator and a discriminator, against each other. This creates a feedback loop in which the generator continually learns to produce more realistic data, while the discriminator gets better at differentiating fake data from real data. However, this same technology is also being used to enable deepfakes.
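The generator/discriminator feedback loop described above can be sketched in a few dozen lines of PyTorch. This is a deliberately tiny, hypothetical example – the networks, data distribution, and hyperparameters are illustrative, not from any production system:

```python
# Minimal GAN sketch: a generator learns to mimic samples from a 1-D
# Gaussian while a discriminator learns to tell real from generated.
import torch
import torch.nn as nn

torch.manual_seed(0)

def real_batch(n):
    # "Real" data: samples from N(4, 1.25), standing in for real records.
    return 4 + 1.25 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(200):
    # Discriminator step: label real samples 1, generated samples 0.
    real, fake = real_batch(64), G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator call fakes "real".
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, the generator turns random noise into synthetic samples.
synthetic = G(torch.randn(1000, 8)).detach()
print(synthetic.mean().item(), synthetic.std().item())
```

The same adversarial loop, scaled up to image or text generators, is what powers both useful synthetic data pipelines and deepfakes – the technique itself is neutral.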

Principles of Authenticity

As this synthetic realness progresses, conversations about AI that equate good and bad with real and fake will shift to focus instead on authenticity. Instead of asking “Is this real?” we’ll begin to evaluate “Is this authentic?” based on four primary tenets:

  • Provenance (what is its history?)
  • Policy (what are its restrictions?)
  • People (who is responsible?)
  • Purpose (what is it trying to do?)
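One way an agency might make these four tenets operational is to attach a small authenticity record to every synthetic data set it produces. The field contents below are invented for illustration:

```python
# Hypothetical sketch: a metadata record capturing the four authenticity
# tenets (provenance, policy, people, purpose) for a synthetic data set.
from dataclasses import dataclass, asdict

@dataclass
class AuthenticityRecord:
    provenance: str  # what is its history?
    policy: str      # what are its restrictions?
    people: str      # who is responsible?
    purpose: str     # what is it trying to do?

record = AuthenticityRecord(
    provenance="Generated 2022-05-01 by model X, trained on data set Y",
    policy="Approved for public release; no re-identification attempts",
    people="Data steward: agency analytics office",
    purpose="Train and evaluate eligibility-screening models",
)
print(asdict(record))
```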

Many already understand the urgency here: 98 percent of U.S. federal government executives say their organizations are committed to authenticating the origin of their data as it pertains to AI.

With these principles, synthetic realness can push AI to new heights. By solving for issues of data bias and data privacy, it can bring next-level improvements to AI models in terms of both fairness and innovation. And synthetic content will enable customers and employees alike to have more seamless experiences with AI, not only saving valuable time and energy but also enabling novel interactions.

As AI progresses and models improve, enterprises are building the unreal world. But whether we use synthetic data in ways that improve the world or fall victim to malicious actors is yet to be determined. Most likely, we will land somewhere in the expansive in-between, and that’s why elevating authenticity within your organization is so important. Authenticity is the compass and the framework that will guide your agency to use AI in a genuine way – across mission sectors, use cases, and time – by considering provenance, policy, people, and purpose.

Learn more about synthetic data and how federal agencies can use it successfully and authentically in Trend 3 of the Accenture Federal Technology Vision 2022: The Unreal.


  • Nilanjan Sengupta: Managing Director – Applied Intelligence Chief Technology Officer
  • Marc Bosch Ruiz, Ph.D.: Managing Director – Computer Vision Lead
  • Viveca Pavon-Harr, Ph.D.: Applied Intelligence Discovery Lab Director
  • David Lindenbaum: Director of Machine Learning
  • Shauna Revay, Ph.D.: Machine Learning Center of Excellence Lead
  • Jennifer Sample, Ph.D.: Applied Intelligence Growth and Strategy Lead