The Secret Of Google Professional-Data-Engineer Dump

we provide Breathing Google Professional-Data-Engineer exam question which are the best for clearing Professional-Data-Engineer test, and to get certified by Google Google Professional Data Engineer Exam. The Professional-Data-Engineer Questions & Answers covers all the knowledge points of the real Professional-Data-Engineer exam. Crack your Google Professional-Data-Engineer Exam with latest dumps, guaranteed!

Online Google Professional-Data-Engineer free dumps demo Below:

NEW QUESTION 1

How would you query specific partitions in a BigQuery table?

  • A. Use the DAY column in the WHERE clause
  • B. Use the EXTRACT(DAY) clause
  • C. Use the PARTITIONTIME pseudo-column in the WHERE clause
  • D. Use DATE BETWEEN in the WHERE clause

Answer: C

Explanation:
Partitioned tables include a pseudo column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as Jan 1st and 2nd of 2017), use a clause similar to this:
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
Reference: https://cloud.google.com/bigquery/docs/partitioned-tables#the_partitiontime_pseudo_column

NEW QUESTION 2

Why do you need to split a machine learning dataset into training data and test data?

  • A. So you can try two different sets of features
  • B. To make sure your model is generalized for more than just the training data
  • C. To allow you to create unit tests in your code
  • D. So you can use one dataset for a wide model and one for a deep model

Answer: B

Explanation:
The flaw with evaluating a predictive model on training data is that it does not inform you on how well the model has generalized to new unseen data. A model that is selected for its accuracy on the training dataset rather than its accuracy on an unseen test dataset is very likely to have lower accuracy on an unseen test dataset. The reason is that the model is not as generalized. It has specialized to the structure in the training dataset. This is called overfitting.
Reference: https://machinelearningmastery.com/a-simple-intuition-for-overfitting/

NEW QUESTION 3

When using Cloud Dataproc clusters, you can access the YARN web interface by configuring a browser to connect through a proxy.

  • A. HTTPS
  • B. VPN
  • C. SOCKS
  • D. HTTP

Answer: C

Explanation:
When using Cloud Dataproc clusters, configure your browser to use the SOCKS proxy. The SOCKS proxy routes data intended for the Cloud Dataproc cluster through an SSH tunnel.
Reference: https://cloud.google.com/dataproc/docs/concepts/cluster-web-interfaces#interfaces

NEW QUESTION 4

Your company produces 20,000 files every hour. Each data file is formatted as a comma separated values (CSV) file that is less than 4 KB. All files must be ingested on Google Cloud Platform before they can be processed. Your company site has a 200 ms latency to Google Cloud, and your Internet connection bandwidth is limited as 50 Mbps. You currently deploy a secure FTP (SFTP) server on a virtual machine in Google Compute Engine as the data ingestion point. A local SFTP client runs on a dedicated machine to transmit the CSV files as is. The goal is to make reports with data from the previous day available to the executives by 10:00 a.m. each day. This design is barely able to keep up with the current volume, even though the bandwidth utilization is rather low.
You are told that due to seasonality, your company expects the number of files to double for the next three months. Which two actions should you take? (choose two.)

  • A. Introduce data compression for each file to increase the rate file of file transfer.
  • B. Contact your internet service provider (ISP) to increase your maximum bandwidth to at least 100 Mbps.
  • C. Redesign the data ingestion process to use gsutil tool to send the CSV files to a storage bucket in parallel.
  • D. Assemble 1,000 files into a tape archive (TAR) fil
  • E. Transmit the TAR files instead, and disassemble the CSV files in the cloud upon receiving them.
  • F. Create an S3-compatible storage endpoint in your network, and use Google Cloud Storage Transfer Service to transfer on-premices data to the designated storage bucket.

Answer: CE

NEW QUESTION 5

Which Java SDK class can you use to run your Dataflow programs locally?

  • A. LocalRunner
  • B. DirectPipelineRunner
  • C. MachineRunner
  • D. LocalPipelineRunner

Answer: B

Explanation:
DirectPipelineRunner allows you to execute operations in the pipeline directly, without any optimization. Useful for small local execution and tests
Reference:
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/runners/DirectPipelineRun

NEW QUESTION 6

You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?

  • A. Use Cloud Dataprep to find null values in sample source dat
  • B. Convert all nulls to ‘none’ using a Cloud Dataproc job.
  • C. Use Cloud Dataprep to find null values in sample source dat
  • D. Convert all nulls to 0 using a Cloud Dataprep job.
  • E. Use Cloud Dataflow to find null values in sample source dat
  • F. Convert all nulls to ‘none’ using a Cloud Dataprep job.
  • G. Use Cloud Dataflow to find null values in sample source dat
  • H. Convert all nulls to using a custom script.

Answer: C

NEW QUESTION 7

Cloud Bigtable is a recommended option for storing very large amounts of _____ ?

  • A. multi-keyed data with very high latency
  • B. multi-keyed data with very low latency
  • C. single-keyed data with very low latency
  • D. single-keyed data with very high latency

Answer: C

Explanation:
Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, allowing you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Cloud Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.
Reference: https://cloud.google.com/bigtable/docs/overview

NEW QUESTION 8

You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They should only see certain tables based on their team membership. How should you set user permissions?

  • A. Assign the users/groups data viewer access at the table level for each table
  • B. Create SQL views for each team in the same dataset in which the data resides, and assign the users/groups data viewer access to the SQL views
  • C. Create authorized views for each team in the same dataset in which the data resides, and assign theusers/groups data viewer access to the authorized views
  • D. Create authorized views for each team in datasets created for each tea
  • E. Assign the authorized views data viewer access to the dataset in which the data reside
  • F. Assign the users/groups data viewer access to the datasets in which the authorized views reside

Answer: C

NEW QUESTION 9

A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT (MODEL ‘dataset.model’, table user_features). How should you create the ML pipeline?

  • A. Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.
  • B. Create an Authorized View with the provided quer
  • C. Share the dataset that contains the view with the application service account.
  • D. Create a Cloud Dataflow pipeline using BigQueryIO to read results from the quer
  • E. Grant the Dataflow Worker role to the application service account.
  • F. Create a Cloud Dataflow pipeline using BigQueryIO to read predictions for all users from the query.Write the results to Cloud Bigtable using BigtableI
  • G. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Cloud Bigtable.

Answer: D

NEW QUESTION 10

Suppose you have a table that includes a nested column called "city" inside a column called "person", but when you try to submit the following query in BigQuery, it gives you an error.
SELECT person FROM `project1.example.table1` WHERE city = "London"
How would you correct the error?

  • A. Add ", UNNEST(person)" before the WHERE clause.
  • B. Change "person" to "person.city".
  • C. Change "person" to "city.person".
  • D. Add ", UNNEST(city)" before the WHERE clause.

Answer: A

Explanation:
To access the person.city column, you need to "UNNEST(person)" and JOIN it to table1 using a comma. Reference:
https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#nested_repeated_resu

NEW QUESTION 11

Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center through parallel uploads to a data transfer server running on GCP. Management informs you that the daily transfers take too long and have asked you to fix the problem. You want to maximize transfer speeds. Which action should you take?

  • A. Increase the CPU size on your server.
  • B. Increase the size of the Google Persistent Disk on your server.
  • C. Increase your network bandwidth from your datacenter to GCP.
  • D. Increase your network bandwidth from Compute Engine to Cloud Storage.

Answer: C

NEW QUESTION 12

Which of the following statements is NOT true regarding Bigtable access roles?

  • A. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a project.
  • B. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
  • C. You can configure access control only at the project level.
  • D. To give a user access to only one table in a project, you must configure access through your application.

Answer: B

Explanation:
For Cloud Bigtable, you can configure access control at the project level. For example, you can grant the ability to:
Read from, but not write to, any table within the project.
Read from and write to any table within the project, but not manage instances. Read from and write to any table within the project, and manage instances. Reference: https://cloud.google.com/bigtable/docs/access-control

NEW QUESTION 13

Your neural network model is taking days to train. You want to increase the training speed. What can you do?

  • A. Subsample your test dataset.
  • B. Subsample your training dataset.
  • C. Increase the number of input features to your model.
  • D. Increase the number of layers in your neural network.

Answer: D

Explanation:
Reference: https://towardsdatascience.com/how-to-increase-the-accuracy-of-a-neural-network-9f5d1c6f407d

NEW QUESTION 14

You’ve migrated a Hadoop job from an on-prem cluster to dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffing operations and initial data are parquet files (on average
200-400 MB size each). You see some degradation in performance after the migration to Dataproc, so you’d like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you’d like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.
What should you do?

  • A. Increase the size of your parquet files to ensure them to be 1 GB minimum.
  • B. Switch to TFRecords formats (app
  • C. 200MB per file) instead of parquet files.
  • D. Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.
  • E. Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.

Answer: C

NEW QUESTION 15

You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data that is in-flight is processed and written to the output. Which of the following commands can you use on the Dataflow monitoring console to stop the pipeline job?

  • A. Cancel
  • B. Drain
  • C. Stop
  • D. Finish

Answer: B

Explanation:
Using the Drain option to stop your job tells the Dataflow service to finish your job in its current state. Your job will immediately stop ingesting new data from input sources, but the Dataflow
service will preserve any existing resources (such as worker instances) to finish processing and writing any buffered data in your pipeline.
Reference: https://cloud.google.com/dataflow/pipelines/stopping-a-pipeline

NEW QUESTION 16

When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these four values are required: project, region, name, and .

  • A. zone
  • B. node
  • C. label
  • D. type

Answer: A

Explanation:
At a minimum, you must specify four values when creating a new cluster with the projects.regions.clusters.create operation:
The project in which the cluster will be created The region to use
The name of the cluster
The zone in which the cluster will be created
You can specify many more details beyond these minimum requirements. For example, you can
also specify the number of workers, whether preemptible compute should be used, and the network settings.
Reference:
https://cloud.google.com/dataproc/docs/tutorials/python-library-example#create_a_new_cloud_dataproc_cluste

NEW QUESTION 17

You are building a new data pipeline to share data between two different types of applications: jobs generators and job runners. Your solution must scale to accommodate increases in usage and must accommodate the addition of new applications without negatively affecting the performance of existing ones. What should you do?

  • A. Create an API using App Engine to receive and send messages to the applications
  • B. Use a Cloud Pub/Sub topic to publish jobs, and use subscriptions to execute them
  • C. Create a table on Cloud SQL, and insert and delete rows with the job information
  • D. Create a table on Cloud Spanner, and insert and delete rows with the job information

Answer: A

NEW QUESTION 18

Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_partitioned. To reduce the cost of queries, your organization created a view called events, which queries only the last 14 days of data. The view is described in legacy SQL. Next month, existing applications will be connecting to BigQuery to read the events data via an ODBC connection. You need to ensure the applications can connect. Which two actions should you take? (Choose two.)

  • A. Create a new view over events using standard SQL
  • B. Create a new partitioned table using a standard SQL query
  • C. Create a new view over events_partitioned using standard SQL
  • D. Create a service account for the ODBC connection to use for authentication
  • E. Create a Google Cloud Identity and Access Management (Cloud IAM) role for the ODBC connectionand shared “events”

Answer: AE

NEW QUESTION 19

You need to migrate a 2TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database and cost to operate is of primary concern.
Which service do you select for storing and serving your data?

  • A. Cloud Spanner
  • B. Cloud Bigtable
  • C. Cloud Firestore
  • D. Cloud SQL

Answer: D

NEW QUESTION 20

Flowlogistic’s CEO wants to gain rapid insight into their customer base so his sales team can be better informed in the field. This team is not very technical, so they’ve purchased a visualization tool to simplify the creation of BigQuery reports. However, they’ve been overwhelmed by all the data in the table, and are spending a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-effective way. What should you do?

  • A. Export the data into a Google Sheet for virtualization.
  • B. Create an additional table with only the necessary columns.
  • C. Create a view on the table to present to the virtualization tool.
  • D. Create identity and access management (IAM) roles on the appropriate columns, so only they appear in a query.

Answer: C

NEW QUESTION 21
......

P.S. Thedumpscentre.com now are offering 100% pass ensure Professional-Data-Engineer dumps! All Professional-Data-Engineer exam questions have been updated with correct answers: https://www.thedumpscentre.com/Professional-Data-Engineer-dumps/ (239 New Questions)